Linux Containers Ecosystem

Internals

Process Management

The most obvious way:

proc = subprocess.Popen(...)
proc.wait()

Low level API:

pid = os.spawnve(os.P_NOWAIT, ...)
os.waitpid(pid, 0)

Even more low level:

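pid = os.fork()
if pid == 0:
    # child: execve() replaces the process image and never returns on success
    os.execve(...)
# parent: os.fork() raises OSError on failure, so pid > 0 here
os.waitpid(pid, 0)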
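The three levels above fit together in one runnable sketch. It assumes a POSIX system and Python 3.9+ (for `os.waitstatus_to_exitcode`), and the function name `fork_exec_wait` is only illustrative. Note the `os._exit()` in the child: if `execve` fails, the child must exit immediately rather than fall back into the parent's code.

```python
import os


def fork_exec_wait(argv: list[str]) -> int:
    """Run argv[0] in a child process and return its exit code."""
    pid = os.fork()  # raises OSError on failure instead of returning -1
    if pid == 0:
        # Child: replace the process image. On success execve never returns.
        try:
            os.execve(argv[0], argv, os.environ)
        finally:
            # Reached only if execve failed; skip the parent's cleanup handlers.
            os._exit(127)
    # Parent: block until this specific child terminates.
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

For example, `fork_exec_wait(["/bin/sh", "-c", "exit 3"])` returns 3. The 127 for a failed exec mirrors the shell's "command not found" convention.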

fork() at system call level (strace):

clone(child_stack=0,
    flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD,
    child_tidptr=0x7fcd0e8539d0) = 8697

Finally namespaces (bad example!):

import signal, ctypes, os

libc = ctypes.CDLL('libc.so.6', use_errno=True)
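CLONE_NEWPID = 0x20000000   # new PID namespace: the child sees itself as PID 1
CLONE_NEWUSER = 0x10000000  # new user namespace: no root privileges needed
# clone() expects a pointer to the top of the child's stack,
# since the stack grows downwards on x86 and most other CPUs
stack = ctypes.byref(ctypes.create_string_buffer(0x200000), 0x200000)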
CHILDFUNC = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_void_p)
@CHILDFUNC
def childfunc(_):
    print("CHILD", "pid:", os.getpid(), "uid:", os.getuid())
    return 0
pid = libc.clone(childfunc, stack,
    CLONE_NEWPID|CLONE_NEWUSER|signal.SIGCHLD, 0)
assert pid > 0, "Clone error"
print("PARENT", "pid:", os.getpid(),
    "uid:", os.getuid(), "child:", pid)
os.waitpid(pid, 0)

Output (the order of lines is arbitrary):

$ python3 clone.py
PARENT pid: 14305 uid: 1000 child: 14306
CHILD pid: 1 uid: 65534
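The child's uid is 65534 because no uid mapping has been written for the new user namespace yet, so the kernel reports its overflow uid (`nobody`). A less fragile version of the same experiment, sketched under the assumptions of Linux with unprivileged user namespaces enabled and Python 3.12+ (which added `os.unshare()` and the `os.CLONE_*` constants), forks a helper, unshares there, and maps the caller's uid and gid to 0 inside the namespace. The function names are hypothetical.

```python
import os
import sys


def id_map_line(inside: int, outside: int, count: int = 1) -> str:
    """One line of /proc/<pid>/uid_map or gid_map: 'inside outside count'."""
    return f"{inside} {outside} {count}\n"


def _write(path: str, text: str) -> None:
    with open(path, "w") as f:
        f.write(text)


def run_as_namespaced_root() -> int:
    uid, gid = os.getuid(), os.getgid()
    sys.stdout.flush()  # don't duplicate buffered output into the children
    helper = os.fork()
    if helper == 0:
        try:
            # The user namespace grants the capabilities the PID namespace needs.
            os.unshare(os.CLONE_NEWUSER | os.CLONE_NEWPID)
            _write("/proc/self/uid_map", id_map_line(0, uid))
            # Unprivileged writers must disable setgroups() before gid_map.
            _write("/proc/self/setgroups", "deny")
            _write("/proc/self/gid_map", id_map_line(0, gid))
            # unshare(CLONE_NEWPID) does not move us; our next child is PID 1.
            child = os.fork()
            if child == 0:
                print("CHILD", "pid:", os.getpid(), "uid:", os.getuid())
                sys.stdout.flush()
                os._exit(0)
            _, status = os.waitpid(child, 0)
            os._exit(os.waitstatus_to_exitcode(status))
        except BaseException:
            os._exit(1)
    _, status = os.waitpid(helper, 0)
    return os.waitstatus_to_exitcode(status)
```

When user namespaces are available this should print a line like `CHILD pid: 1 uid: 0`; the helper returns 1 if any step fails. Unsharing in a forked helper keeps the calling process in its original namespaces and also satisfies the kernel rule that `CLONE_NEWUSER` requires a single-threaded caller.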

Namespaces:

But we have had chroot since 1993!

/var/lib/lxc/ubuntu/rootfs

  • /usr
  • /var
  • /dev
  • ...

/var/lib/lxc/nix/rootfs

  • /nix
  • /run
  • ...

With root privileges you can just:

os.chroot('/var/lib/lxc/ubuntu/rootfs')
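`os.chroot()` on its own only changes how absolute paths are resolved: the current directory still points outside the new root, so the usual pattern is chroot, then `chdir("/")`, then exec. A minimal sketch, assuming root privileges and a rootfs that contains the target binary; doing the work in a forked child keeps the calling process unconfined. The function name is hypothetical. Keep in mind that a process which stays root inside a chroot can escape it, which is part of why namespaces exist.

```python
import os


def run_chrooted(new_root: str, argv: list[str]) -> int:
    """Run argv inside new_root (needs root) and return the exit code."""
    pid = os.fork()
    if pid == 0:
        try:
            os.chroot(new_root)
            # Without this, the cwd still refers to a directory outside new_root.
            os.chdir("/")
            # From here on argv[0] is resolved relative to the new root.
            os.execv(argv[0], argv)
        finally:
            os._exit(127)  # only reached if chroot, chdir or execv failed
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)
```

Usage would be along the lines of `run_chrooted("/var/lib/lxc/ubuntu/rootfs", ["/bin/bash"])`.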

CLONE_NEWNS

mount namespaces

available in 2.4.19 (2002)

mount --bind

available in 2.4.0 (2001)

Create hierarchy in new mount namespace:

mount --bind /var/lib/lxc/ubuntu/rootfs \
             /usr/lib/lxc/rootfs
mount --bind /dev \
             /usr/lib/lxc/rootfs/dev
mount -t tmpfs tmpfs /usr/lib/lxc/rootfs/tmp
# note: .../rootfs/{dev,tmp} are still empty
chroot /usr/lib/lxc/rootfs bash
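The same hierarchy can be built programmatically. This sketch assumes Linux, root privileges, and Python 3.12+ for `os.unshare()`; because the `os` module has no `mount`, it calls the C library's `mount(2)` wrapper through ctypes, with `MS_*` values copied from `<sys/mount.h>`. The extra `MS_REC | MS_PRIVATE` remount of `/` matters on systemd hosts, where `/` is a shared mount and new mounts would otherwise propagate back into the host's namespace. The function names are hypothetical, and `build_rootfs` changes the mount namespace of whichever process calls it.

```python
import ctypes
import os

# Flag values from <sys/mount.h> on Linux.
MS_BIND = 4096
MS_REC = 16384
MS_PRIVATE = 1 << 18


def _path(s):
    return None if s is None else os.fsencode(s)


def mount(source, target, fstype=None, flags=0, data=None):
    """Thin wrapper over mount(2) that raises OSError on failure."""
    libc = ctypes.CDLL(None, use_errno=True)  # the C library Python links against
    libc.mount.argtypes = [ctypes.c_char_p] * 3 + [ctypes.c_ulong, ctypes.c_char_p]
    if libc.mount(_path(source), _path(target), _path(fstype), flags, _path(data)) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), target)


def build_rootfs(image: str, mountpoint: str) -> None:
    # Give this process its own copy of the mount table.
    os.unshare(os.CLONE_NEWNS)
    # Stop our mounts from propagating back to the host.
    mount(None, "/", flags=MS_REC | MS_PRIVATE)
    mount(image, mountpoint, flags=MS_BIND)
    mount("/dev", os.path.join(mountpoint, "dev"), flags=MS_BIND)
    mount("tmpfs", os.path.join(mountpoint, "tmp"), fstype="tmpfs")
```

Calling `build_rootfs("/var/lib/lxc/ubuntu/rootfs", "/usr/lib/lxc/rootfs")` followed by `os.chroot("/usr/lib/lxc/rootfs")` mirrors the shell commands above.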

CLONE_NEWPID

CLONE_NEWIPC

CLONE_NEWUTS

CLONE_NEWNET

useful on its own using ip netns
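With `ip netns`, a named namespace is created by `ip netns add NAME` and entered with `ip netns exec NAME COMMAND`. The isolation itself is easy to observe from Python: a fresh network namespace typically contains only the loopback interface. This sketch assumes Linux, Python 3.12+, and unprivileged user namespaces (the new user namespace supplies the privileges `CLONE_NEWNET` needs; running as root also works). The function name is hypothetical.

```python
import os
import socket


def interfaces_in_new_netns() -> list[str]:
    """List interface names seen from a brand-new network namespace."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        os.close(read_fd)
        code = 1
        try:
            # The new user namespace supplies the privileges CLONE_NEWNET needs.
            os.unshare(os.CLONE_NEWUSER | os.CLONE_NEWNET)
            names = [name for _index, name in socket.if_nameindex()]
            os.write(write_fd, "\n".join(names).encode())
            code = 0
        finally:
            os._exit(code)
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        data = pipe.read()
    _, status = os.waitpid(pid, 0)
    if os.waitstatus_to_exitcode(status) != 0:
        raise OSError("could not create a network namespace")
    return data.decode().split("\n")
```

Where it works, the result is usually just `["lo"]`; kernels with tunnel modules loaded may add fallback devices such as `sit0`.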

CLONE_NEWUSER

containers by unprivileged users

Security

Running as Root

... we don’t claim Docker out-of-the-box is suitable for containing untrusted programs with root privileges ...

Source: Solomon Hykes

Running as non-Root

docker run --user=1000 something

Lie

Any binary with the setuid bit set can still make you root:

(e.g. su, sudo)

So it can be broken with untrusted images

(e.g. by replacing /etc/sudoers)
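Checking an image for such binaries is straightforward: walk its root filesystem and list regular files with the setuid bit. A minimal sketch; the function name is hypothetical, and it only reports, it does not decide which binaries are safe.

```python
import os
import stat


def find_setuid(root: str) -> list[str]:
    """Return setuid regular files under root, without following symlinks."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):  # does not follow dir links
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # vanished or unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_mode & stat.S_ISUID:
                found.append(path)
    return sorted(found)
```

For example, `find_setuid("/var/lib/lxc/ubuntu/rootfs")` would normally list `su`, `sudo` and similar tools.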

Docker

Docker Socket

Docker command workflow:

docker run ubuntu bash

--> HTTP --> /var/run/docker.sock -->

docker -d
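The daemon's API is plain HTTP over that UNIX socket, so anything that can open the socket can drive the daemon without the `docker` client at all. A minimal sketch using only the standard library; `/version` is a read-only endpoint of the Docker Remote API, and the function name is hypothetical. HTTP/1.0 is used so the server closes the connection after one response.

```python
import socket


def docker_api_get(path: str, sock_path: str = "/var/run/docker.sock") -> tuple[int, bytes]:
    """Send one GET request over the daemon's UNIX socket; return (status, body)."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(sock_path)
        # HTTP/1.0: the server closes the connection after the response.
        request = f"GET {path} HTTP/1.0\r\nHost: docker\r\n\r\n"
        sock.sendall(request.encode("ascii"))
        chunks = []
        while chunk := sock.recv(65536):
            chunks.append(chunk)
    head, _, body = b"".join(chunks).partition(b"\r\n\r\n")
    status = int(head.split(b" ", 2)[1])  # "HTTP/1.x 200 OK" -> 200
    return status, body
```

`docker_api_get("/version")` succeeds for any member of the `docker` group, which is exactly the problem with the socket permissions shown next.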

Docker socket permissions:

srw-rw---- 1 root docker Oct  7 23:23 \
    /var/run/docker.sock

Which is basically equivalent to:

%docker ALL=(ALL) NOPASSWD: ALL

In case it's not obvious:

docker run -it --rm \
    --privileged \
    --volume /:/host \
    ubuntu rm -rf /host

Never run:

docker -d -H 127.0.0.1

(any hostname, even localhost)

Without:

docker -d --tlscacert --tlsverify

But that's not enough!

SkyDock

Running as:

docker run -d \
-v /var/run/docker.sock:/docker.sock \
crosbymichael/skydock

breaking skydock

=

breaking host system

Breaking Clusters

Untrusted Images

Untrusted Infrastructure Images

Insufficiently Authenticated Repositories

Tools

Low Level Tools

Docker

docker run ubuntu bash
sudo docker run -it --rm \
    --user $(id -u) \
    --volume $(pwd):/workdir \
    --workdir /workdir \
    our.repo.local/foobar:$(get_version) \
    bash
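Driving the same invocation from a script is safer with an argv list than with a shell string, since a working directory containing spaces then needs no quoting. A sketch; the function names are hypothetical, and the image tag is passed in because `get_version` is not defined in these slides. As a design choice it passes `uid:gid` to `--user` (Docker accepts that form), so files created in `/workdir` get the caller's group as well as its user.

```python
import os
import subprocess


def docker_dev_argv(image: str, workdir: str, uid: int, gid: int) -> list[str]:
    """Build the `docker run` command from the slide as an argv list."""
    return [
        "sudo", "docker", "run", "-it", "--rm",
        "--user", f"{uid}:{gid}",
        "--volume", f"{workdir}:/workdir",
        "--workdir", "/workdir",
        image,
        "bash",
    ]


def run_dev_shell(image: str) -> int:
    # No shell involved: each list element reaches docker as one argument.
    argv = docker_dev_argv(image, os.getcwd(), os.getuid(), os.getgid())
    return subprocess.call(argv)
```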

Dev Env Tools

Vagga

# vagga.yaml
containers:
  'react':
    builder: npm
    parameters:
      packages: react-tools
commands:
  'build':
    container: react
    description: "Build static files"
    run: "jsx jsx/page.jsx > public/js/page.js"

$ git clone git://github.com/.../foobar
$ cd foobar
$ vagga
Available commands:
    build       Build static files
    run         Run nginx+app+redis
    build-docs  Build docs
$ vagga build

# docker tree
-+= 00001 root systemd --system
 |-+- 10771 root docker -d
 | \--= 32029 root bash   << our process
 \-+= 30029 pc tmux
   \-+= 10718 pc -zsh     << our shell
     \--= 32021 pc docker run -it --rm bash

# vagga tree
-+= 00001 root systemd --system
 \-+= 30029 pc tmux
   \-+= 10358 pc -zsh        << our shell
     \-+= 00940 pc vagga bash
       \-+- 00941 pc vagga bash
         \--= 00942 pc bash  << our process
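Both trees can be reproduced by following parent PIDs through `/proc`. A Linux-only sketch; the function name is hypothetical. Given the host PID of the containerized bash, the chain for Docker runs through `docker -d` up to `systemd`, while for vagga it passes through your own shell, which is the difference the two trees show.

```python
import os


def ancestry(pid: int) -> list[tuple[int, str]]:
    """Return [(pid, command name), ...] from pid up to the root of the tree."""
    chain = []
    while pid > 0:
        with open(f"/proc/{pid}/stat") as f:
            stat = f.read()
        # The command name is in parentheses and may contain spaces or ')'.
        start, end = stat.index("("), stat.rindex(")")
        chain.append((pid, stat[start + 1:end]))
        fields = stat[end + 2:].split()
        pid = int(fields[1])  # fields[0] is the state letter, fields[1] the parent PID
    return chain
```

For example, `for pid, name in ancestry(os.getpid()): print(pid, name)` prints the current process first and PID 1 (or the top of the current PID namespace) last.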

Production Tools

dokku

maestro-ng

cocaine

kubernetes

weave

deis

flynn

mesos

geard

coreos (fleet)

Docker+CoreOS

[Service]
TimeoutStartSec=0
ExecStartPre=-/usr/bin/docker kill busybox1
ExecStartPre=-/usr/bin/docker rm busybox1
ExecStartPre=/usr/bin/docker pull busybox
ExecStart=/usr/bin/docker run --name ...
ExecStop=/usr/bin/docker stop busybox1

Nix
