The hunt for a better Dockerfile

Time to thank Dockerfiles for their service and send them on their way

I explained in an earlier post why I don't think Dockerfiles are good enough anymore. After writing about my dislike of Dockerfiles and what I think is a major regression in the tools Operations teams had to work with, I got a lot of recommendations of things to look at. I'm going to take a deeper look at some of these options and see if there is a reasonable one to switch to.

My ideal solution would be an API I could call, supplying the parameters for the containers. This would let me standardize the process in the same language I use for the app, write some tests around the containers and hook in things like CI logging conventions and exception tracking.

BuildKit

BuildKit is a child of the Moby project, an open-source project designed to advance the container space to allow for more specialized uses for containers. Judging from its about page, it seems to be staffed by some Docker employees and some folks from elsewhere in the container space.

What is the Moby project? Honestly I have no idea. They have on their list of projects high-profile things like containerd, runc, etc. You can see the list here. This seems to be the best explanation of what the Moby project is:

Docker uses the Moby Project as an open R&D lab, to experiment, develop new components, and collaborate with the ecosystem on the future of container technology. All our open source collaboration will move to the Moby project.

My guess is the Moby project is how Docker gets involved in open-source projects and in turn open-sources some elements of its stack. Like many things Docker does, it is a bit inscrutable from the outside. I'm not exactly sure who staffs most of this project or what their motivations are.

BuildKit walkthrough

BuildKit is built around a totally new model for building images. At its core is a new format for defining builds called LLB. It's an intermediate binary format, defined as Protocol Buffers messages, that the Go client produces by marshaling your build definition. This new model allows for actual concurrency in your builds, as well as a better model for caching. You can see more about the format here.

LLB is really about decoupling the container build process from Dockerfiles, which is nice. This is done through the use of frontends, of which the Dockerfile frontend is one of many. You run a frontend to convert a build definition (most often a Dockerfile) into LLB. This concept seems strange, but if you look at the Dockerfile frontend you will get a better idea of the new options open to you. That can be found here.

Of most interest to most folks is the inclusion of a variety of different mounts. You have --mount=type=cache, which takes advantage of the more precise caching available due to LLB to persist the cache between build invocations. There is also --mount=type=secret, which allows you to give the container access to secrets while ensuring they aren't baked into the image. Finally there is --mount=type=ssh, which uses SSH agents to allow containers to connect through the host's SSH agent to things like git over ssh.
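To make the three mount types concrete, here is a minimal sketch that renders a Dockerfile using each of them. The base image, the secret id api_token and the repository URL are placeholders I made up for illustration, and a real build would also need known_hosts set up before git over ssh would succeed.

```python
def render_dockerfile(base_image="golang:1.21", secret_id="api_token"):
    """Return the text of a Dockerfile that uses the three BuildKit mount types.

    base_image, secret_id and the git URL are illustrative placeholders.
    """
    lines = [
        "# syntax=docker/dockerfile:1",
        f"FROM {base_image}",
        "WORKDIR /src",
        "COPY . .",
        "# Cache mount: the Go build cache persists between build invocations.",
        "RUN --mount=type=cache,target=/root/.cache/go-build go build -o /out/app .",
        "# Secret mount: readable during this RUN only, never written to a layer.",
        f"RUN --mount=type=secret,id={secret_id} "
        f"sh -c 'wc -c < /run/secrets/{secret_id}'",
        "# SSH mount: forwards the host's SSH agent for git over ssh.",
        "RUN --mount=type=ssh git ls-remote git@github.com:example/private-repo.git",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print the Dockerfile so it can be redirected to a file.
    print(render_dockerfile(), end="")
```

With BuildKit enabled, the rendered file could then be built with something like docker build --secret id=api_token,src=./token.txt --ssh default . where the token path is again a placeholder.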

In theory this allows you to build images using a ton of tooling. Any language that supports Protocol Buffers could be used to make images, meaning you can move your entire container build process to a series of scripts. I like this a lot, not only because the output of the build process gives you a lot of precise data about what was done, but also because you can add testing and whatever else.

In practice, while many Docker users are currently enjoying the benefits of LLB and BuildKit, this isn't a feasible tool to use right now to build containers using Go unless you are extremely dedicated to your own tooling. The basic building blocks are still shell commands you are executing against the frontend of Docker, although at least you can write tests.

If you are interested in what a Golang Dockerfile looks like, they have some good examples here.

buildah

With the recent announcement of Docker Desktop's new licensing restrictions, along with the IP-based rate limiting of image pulls from Docker Hub, the community opinion of Docker has never been lower. There has been an explosion of interest in Docker alternatives, with podman being the frontrunner. Along with podman is a docker build alternative called buildah. I started playing around with the two for an example workflow and have to say I'm pretty impressed.

podman is a big enough topic that I'll need to spend more time on it another time, but buildah is the build system for podman. It actually predates podman and in my time testing it, offers substantial advantages over docker build with conventional Dockerfiles. The primary way that you use buildah is through writing shell scripts to construct images, but with much more precise control over layers. I especially enjoyed being able to start with an empty container that is just a directory and build up from there.
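A script-driven build along these lines might be sketched as follows. This is only an illustration: the container name, image name and binary path are hypothetical, and I'm generating the buildah commands from Python instead of a shell script so the plan can be inspected and tested without buildah installed.

```python
import json
import shlex
import subprocess


def buildah_plan(container, image, binary_path, entrypoint="/app"):
    """Return the buildah commands, in order, that build `image` from scratch.

    All names passed in are illustrative; nothing is executed here.
    """
    return [
        # Start from an empty container instead of a base image.
        ["buildah", "from", "--name", container, "scratch"],
        # Copy a prebuilt binary into the working container.
        ["buildah", "copy", container, binary_path, entrypoint],
        # Set the entrypoint using the JSON array form.
        ["buildah", "config", "--entrypoint", json.dumps([entrypoint]), container],
        # Commit the working container as an image, then clean up.
        ["buildah", "commit", container, image],
        ["buildah", "rm", container],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it too when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_plan(buildah_plan("hello-build", "hello:latest", "./hello"))
```

Keeping the plan as plain data is a deliberate choice: it can be asserted on in unit tests, and the same list can be handed to run_plan with dry_run=False on a machine that has buildah.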

If you want to integrate buildah into your existing flow, you can also use it to build containers from Dockerfiles. Red Hat has a series of good tutorials to get you started, which you can check out here. In general the whole setup works well and I like moving away from the brittle Dockerfile model towards something more sustainable and less dependent on Docker.

PouchContainer

I had never heard of PouchContainer, an offering from Alibaba, before, but playing around with it has been eye-opening. It's much more ambitious than a simple Docker replacement, instead adding on a ton of shims to various container technologies. The project's architecture diagram lays out just what we're talking about here.

The CLI, called simply pouch, includes some standard options like building from a Dockerfile with pouch build. However, this tool is much more flexible in terms of where you can get containers from, including concepts like pouch load, which loads a tar file full of containers that it will parse. Beyond the CLI, you have a full API for doing all sorts of things. Interested in creating a container with an API call? Check this out.

There is also a cool technology they call a "rich container", which seems to be designed for legacy applications where the model of one process running isn't sufficient and you need to kick off a nested series of processes. They aren't wrong: this is actually a common problem when migrating legacy applications to containers and it's not a bad solution to what is an antipattern. You can check out more about it here.

PouchContainer is designed around Kubernetes as well, allowing it to serve as the container plugin for k8s without needing to recompile. This, combined with a P2P model for sharing containers using Dragonfly, makes for a really fascinating approach to the creation and distribution of containers. I'm surprised I've never heard of it before, but alas, looking at the repo, it doesn't look like it's currently maintained.

Going through what is here though, I'm very impressed with the ambition and scope of PouchContainer. There are some great ideas here, from models around container distribution to easy-to-use APIs. If anyone has more information about what happened here, or if there is a sandbox somewhere I can use to learn more about this, please let me know on Twitter.

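Packer

Packer, for those unfamiliar with it, is maybe the most popular tool out there for the creation of AMIs. These are the images that are used when an EC2 instance is launched, allowing organizations to install whatever software they need for things like autoscaling groups. Packer uses two different concepts for the creation of images:

  • Builders, which create the image for a particular platform (an AMI, a Docker image and so on).
  • Provisioners, which install and configure software inside the image before it is saved.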

This allows for organizations that are using things like Ansible to configure boxes after they launch to switch to baking the AMI before the instance is started. This saves time and involves less overhead. What's especially interesting for us is this allows us to set up Docker as a builder, meaning we can construct our containers using any technology we want.

How this works in practice is we can create a list of provisioners in our Packer JSON file like so:

"provisioners": [{
        "type": "ansible",
        "user": "root",
        "playbook_file": "provision.yml"
    }],

So if we want to write most of our configuration in Ansible and construct the whole thing with Packer, that's fine. We can also use shell scripts, Chef, Puppet or whatever other tooling we like. In practice you define a provisioner with whatever you want to run, then a post-processor pushing the image to your registry. All done.
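Putting the pieces together, a complete template could be generated from code along these lines. This is a sketch under assumptions: the base image and repository name are placeholders, the provisioner mirrors the snippet above, and the docker-tag post-processor is written with the tags list form, which depends on your Packer version (older releases used a single tag string).

```python
import json


def packer_template(base_image, playbook, repository, tag="latest"):
    """Build a Packer JSON template as a plain dict.

    base_image and repository are illustrative placeholders.
    """
    return {
        # Docker as the builder: start a container and commit it as an image.
        "builders": [
            {"type": "docker", "image": base_image, "commit": True}
        ],
        # Same provisioner as the snippet above: run an Ansible playbook.
        "provisioners": [
            {"type": "ansible", "user": "root", "playbook_file": playbook}
        ],
        # A nested list runs these post-processors as one sequence:
        # tag the committed image, then push it to the registry.
        "post-processors": [
            [
                {"type": "docker-tag", "repository": repository, "tags": [tag]},
                {"type": "docker-push"},
            ]
        ],
    }


if __name__ == "__main__":
    template = packer_template(
        "ubuntu:22.04", "provision.yml", "registry.example.com/app"
    )
    print(json.dumps(template, indent=2))
```

Redirecting the output to a file such as template.json lets you check it with packer validate before running packer build, and because the template is just a dict it is easy to unit test.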

Summary

I'm glad that there exist options for organizations looking to streamline their container experience. If I were starting out today and had existing Ansible/Puppet/Chef infrastructure as code, I would go with Packer. It's easy to use and allows you to keep what you have with some relatively minor tweaks. If I were starting out fresh, I'd see how far I could get with buildah. There seems to be more community support around it, and Docker as a platform is not looking particularly robust at this moment.

While I strongly prefer using Ansible for creating containers vs Dockerfiles, I think the closest to the "best" solution is the buildkit Go client approach. You would still get the benefits of buildkit while being able to very precisely control exactly how a container is made, cached, etc. However, the buildah process is an excellent middle ground, allowing for shell scripts to create images that, ideally, contain the optimizations inherent in the newer process.

Outstanding questions I would love the answers to:

  • Is there a library or abstraction that allows for a less complicated time dealing with buildkit? Ideally something in Golang or Python, where we could more easily interact with it?
  • Or are there better docs for how to build containers in code with buildkit that I missed?
  • With buildah, are there client libraries out there to interact with its API? Shell scripts are fine, but again ideally I'd like to be writing critical pieces of infrastructure in a language with some tests and something where the amount of domain-specific knowledge would be minimal.
  • Is there another system like PouchContainer that I could play around with? An API that allows for the easy creation of containers through standard REST calls?

Know the answers to any of these questions or know of a Dockerfile alternative I missed? I'd love to know about it and I'll test it. Let me know on Twitter.