March 05, 2014, 2:39 PM — A long time ago, a computer program was a stack of punch cards, and moving the program from computer to computer was easy as long as you didn't drop the box. Every command, instruction, and subroutine was in one big, fat deck. Editors, compilers, and code repositories have liberated us from punch cards, but somehow deploying software has grown more complicated. Moving a program from the coding geniuses to the production team is fraught with errors, glitches, and hassles. There's always some misconfiguration, and it's never as simple as carrying that deck down the hall.
Into this world comes open source Docker, the latest layer of virtualization to bundle everything together in a stable package or, in the current parlance, "container." (If the computer industry ever runs out of synonyms for "box," we're in big trouble.) The software opens up the process of creating and building virtual machines to anyone who can work with the Linux command line. You put the instructions for starting up your machine in one file called the Dockerfile, issue the build command, and voilà, your new machine is running in its own Shangri-La or Private Idaho or La-La Land. (Choose your own metaphor.)
If you take the right steps down the path to creating a Dockerfile, the results can be incredible. I whipped up a few virtual machines in a few minutes, and the building and deploying process was lightning quick. Anyone who has waited for other virtualization layers to start up will be surprised by how quickly you can type docker run and watch the virtual machine spring to life. This might be because Docker containers are typically more lightweight than traditional virtual machines. I suspect it might also be because everything runs from the command line. There are no mouse clicks to distract Docker. It's all about communicating with other machines through shell scripts, not those pesky humans who need cute icons in their GUIs.
Life with Docker is tightly integrated with the Linux command line. Docker depends entirely upon the hooks for containers in the newer versions of the Linux kernel, which allow isolated bundles of apps, services, and the libraries they depend on to live side by side on the Linux host. The Linux kernel team did most of the clever work, and now Docker is making it easy for people to access the power. The simplest way to use it is with a newer version of Ubuntu (say, 12.04) or one of the close cousins. There are instructions for using Docker with Mac OS X or Windows, but they involve installing VirtualBox and running the Linux kernel in a virtual machine.
Docker containers are built out of text written in the Dockerfile, the equivalent of a makefile. There's not much to the syntax. Most of the lines in a Dockerfile will begin with RUN, which passes the rest of the line to a shell inside the container being built. These are usually lines that say things like RUN sudo apt-get install.... Much of the code in the Dockerfile is effectively a shell script for building your machine and installing the software you need.
The real action occurs when you start playing with the other commands that poke holes in the container's flexible layer. The command ADD . /src maps your current directory and makes it appear inside the container as the directory src. I used it to put some Web pages for the version of Node.js I fired up inside a container. My Web page appeared to be both outside and inside the virtual world at the same time, but it seems like this is an illusion. Docker is really zipping up your files and passing a copy. You will also poke holes in the container for the TCP/IP ports, mapping the ports of the existing machine to the ports inside the container.
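To make the pieces concrete, here is a minimal sketch of the kind of Dockerfile described above. It is not the file I used; the base image, the package name, the port number, the server.js file name, and the example/node-site tag are all assumptions chosen for illustration.

```dockerfile
# Illustrative sketch only: base image, package, port, and file names are assumptions.

# Start from a stock Ubuntu image pulled from the public repository.
FROM ubuntu

# RUN hands the rest of the line to a shell inside the image being built.
# Builds run as root by default, so sudo is left out here.
RUN apt-get update && apt-get install -y nodejs

# ADD copies the directory holding this Dockerfile into the image as /src.
# It is a copy made at build time, not a live view of the host directory.
ADD . /src

# EXPOSE records the port the service inside the container listens on.
EXPOSE 8080

# Command to run when the container starts (server.js is a placeholder;
# on older Ubuntu releases the binary may be named nodejs instead of node).
CMD ["node", "/src/server.js"]

# Build (from the directory containing this file):
#   docker build -t example/node-site .
# Run, mapping port 8080 on the host to port 8080 in the container:
#   docker run -p 8080:8080 example/node-site
```

The -p flag on docker run is what actually pokes the hole for the port; EXPOSE on its own only declares which port the container intends to use.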
Two clever tricks start the moment when you ask Docker to build the machine. First, you can start accessing previously built containers from Docker's repositories. Most of the standard distros are there, as well as a number of common configurations with tools like MongoDB. You name one of them as the starting point of your Dockerfile with the FROM command, and its slices will be downloaded to your new machine. The basic repository is public, but the company behind the Docker project is looking into building private repositories for enterprise work.
The second is the way the new machine is built up with slices, much like a cold-cut sandwich. Docker is clever enough to keep the changes in layers, potentially saving space and complexity. The changes you make are stored separately as diffs between the containers. These diffs are also mobile, and it's possible to juggle them to deploy your software. Your developers create the container with all the right libraries, then hand it over to the ops staff, which treats it like a little box that just needs to run.
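The layering can be sketched with a second, smaller Dockerfile that stacks one slice on top of an existing image. Everything named here is hypothetical: it assumes an image tagged example/base-app has already been built, and that a config.json file sits next to the Dockerfile.

```dockerfile
# Illustrative sketch only: example/base-app and config.json are assumed names.

# Reuse every layer of the existing image as the bottom of the sandwich.
FROM example/base-app

# This instruction adds one new layer holding just the copied file;
# the layers underneath are shared with the base image, not duplicated.
ADD config.json /src/config.json

# Build the new image:
#   docker build -t example/base-app-prod .
# List the stack of layers and their sizes:
#   docker history example/base-app-prod
```

Because only the top slice is new, handing this image to the ops staff should mean moving a small diff rather than a whole machine, provided they already have the base layers.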
For all of the cleverness, though, it's important to recognize that the software is very new and some parts are being redesigned as I type this. The Docker website says, "Please note Docker is currently under heavy development. It should not be used in production (yet)." The project plans to have an official release of a new version each month. It also notes that the current master branch of the open repository is the current release candidate. You can get it and build it yourself.
From what I saw, Docker is far enough along to be used in lightweight projects that don't overstress the machine or risk damage if something fails. Many users report issues with stuck containers and "ghosts" that clog up machines. These can be swept away by restarting everything, a bit of a pain that undermines one of the selling points of the lightning-quick layer of virtualization. The bigger danger for you is that the Docker team will revise the API or add a new feature that trashes your hard work. This is bound to happen; I already stumbled upon several deprecated commands.
The development team is also starting to tackle the growing pains that emerge when a project goes from a fun experiment for hackers to a serious part of infrastructure. Docker just announced a new "responsible security" program to help people report holes. While the Docker sandbox may stop some security leaks, it is quite new and relatively untested. Is there a way for one Docker container to reach inside another running next door? It's certainly not part of the official API, but these are untested waters. I wouldn't trust my bitcoin password at Mt. Gox to a Docker container.
Some of these qualms might be eased by the company's decision to open-source the code under the generous Apache 2.0 license. Developers can see the code and -- if they have the time -- look for the kind of holes that should be patched. The company wants to encourage non-employees to contribute, so it's working to broaden the team of developers to extend outside the company.
This is paying off in a burgeoning community of startups that want to add something to the Docker ecosystem. Companies like Tutum, Orchard, and StackDock, for instance, let you build up your Dockerfile interactively in a browser. When it's done, you push a button, and it's deployed to their cloud at prices that begin at $5 per month for 1GB of RAM. There are others like Quay.io, which offers to host your Docker repositories, and Serf, a service discovery and orchestration tool that will help Docker containers learn about one another.
There are also plenty of other, more established corners of the devops world, including Chef and Puppet, that are taking notice and adapting to the new opportunity to let users build Dockerfiles. This list of names will probably change by the time you read this because it's one of the most exciting segments of a very dynamic world. There will be plenty of mergers, flameouts, and new startups in this area.
These startups show the promise of the technology. StackDock, for instance, lets you assemble your machine from a few standard cards. These will be kept cached locally, and all the machines will start with the same OS and kernel for now. This can dramatically reduce the memory devoted to keeping the same copy of the OS for all of the instances.
Build once, run anywhere
Several people I've spoken with sounded a bit leery when hearing there was another virtual machine solution promising to make code that runs almost anywhere. They've lived through the interest in Pascal, Java, and the rest. The difference is that Docker is much more narrowly focused on packaging the Linux machines that act as the backbone of the Internet. There are no pretenses of taking over the desktop or any other part of the computing world. Docker doesn't want to translate some neutral byte code into local binaries. It wants to package x86 code that works with the Linux kernel. These are simpler goals.
Docker began as a tool to help the developer package up a Linux application, and even after all the hype, it remains just that: a container-building tool that works efficiently and cleverly. Will it sweep through data centers? Many Linux developers will love it. They'll be able to build up nice machines on their desk and ship them off to the cloud without having to waste extra time figuring out how to reconfigure their cloud. Docker shifts the focus to the most important part of the equation: the app. Instead of buying multiple machine instances, they'll be buying compute time. It's entirely possible that many of the clouds will morph into farms for running Docker containers.
There's no doubt that the ease and simplicity of Docker mean that many will start incorporating it into their stacks. It will become one of the preferred ways to ship around code. But for all of its promise, I still feel like everything is a bit too new.
Toward the end of the process, I started wondering about this entire operation. It's wholly possible to put a Docker container inside a Vagrant or VirtualBox VM that is sitting on the operating system. If this is a cloud machine, the operating system itself could be sitting on some hypervisor. There's plenty of virtualization going on. If it were a thriller mystery, the protagonist would be peeling off masks again and again and again.
At its root, Docker is solving a problem caused by a failure of operating system design. The old ideas of isolating users and jobs in an operating system aren't good enough. Somehow the developers and the staff need another, more powerful force field to stop each software package from messing with the others. The success of Docker is one step toward this redesign, but it's clearly more of a Band-Aid than the kind of unifying vision that the operating system world needs.
Who knows when this newer, better, and cleaner model will emerge, but until it does, Docker is one of the simplest ways of using some virtual duct tape to wall off the applications from each other. The issues with ghosts and disk space will be solved. The tool will become less command-line driven. Anyone building software to run in production on Linux boxes will love the flexibility it brings, and that will drive plenty of interest over the next five years.