Lately, I've noticed many people bragging (or moaning) about the size of a docker image.

One of docker's selling points is that containers are lightweight when compared to virtual machines, so why does it matter that your image is a little overweight? I can think of a few reasons. If you know of other reasons, please share them with me in the comments.

1. Large images take longer to download

The time it takes to push an image to, or pull it from, a docker registry is the main problem with large images. On a fresh DigitalOcean droplet, it took me 9 seconds to docker pull busybox, 36 seconds to docker pull postgres, and 68 seconds to docker pull ruby. While these don't seem like long times, keep in mind that some cloud providers can now provision a small virtual machine for you in under a minute. That means it can take longer to download your ruby app image to a new docker host than to provision the host itself!
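If you want to repeat this kind of measurement yourself, here is a rough sketch. The helper name elapsed_seconds is my own invention, not part of docker, and it only has whole-second resolution. Keep in mind that an image already cached on the host will pull almost instantly, so remove the local copy first if you want a fair number.

```shell
# Print how many whole seconds a command takes to run.
# (Hypothetical helper, not a docker feature.)
elapsed_seconds() {
  start=$(date +%s)
  "$@" >/dev/null 2>&1
  end=$(date +%s)
  echo $((end - start))
}

# Example usage on a host with docker installed:
#   for image in busybox postgres ruby; do
#     echo "$image: $(elapsed_seconds docker pull "$image")s"
#   done
```

Your numbers will depend on the network, the registry, and the host, so treat them as relative rather than absolute.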

2. Large images take up more disk space

This is the least compelling reason to worry about your image size. Disk space is relatively cheap these days, and you can always add more with block storage if needed. Launching a new container from an image requires very little extra disk space because docker uses UnionFS to add a read-write layer on top of the read-only image layers. This means all containers started from the same image share the same read-only layers.

The easiest way I found to tell the size of a docker image is to use the docker images command. The image size is given in the last column on the right.

~# docker images
REPOSITORY          TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
ruby                latest              f0f149c3d6f7        12 days ago         777.5 MB
postgres            latest              b733b00eb1ae        13 days ago         213.9 MB
busybox             latest              8c2e06607696        3 weeks ago         2.433 MB

You can also use docker inspect to find the image size. Docker inspect prints a huge blob of JSON, so I used some extra command line fu to find the VirtualSize field and convert it from bytes to something easier for humans to read.

~# docker inspect ruby | grep -i VirtualSize | awk '{$2/=1000000; printf "%.2f MB\n",$2}'
777.48 MB
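The conversion step in that one-liner can also be pulled out into a small function, which makes it easier to reuse. This is only a sketch: bytes_to_mb is a name I made up, and it uses decimal megabytes (1 MB = 1,000,000 bytes) to match the awk command above.

```shell
# Convert a byte count to decimal megabytes (1 MB = 1,000,000 bytes).
# (Hypothetical helper, not a docker feature.)
bytes_to_mb() {
  awk -v bytes="$1" 'BEGIN { printf "%.2f MB\n", bytes / 1000000 }'
}

# Example usage on a docker host. The field name is an assumption that
# depends on your docker version: the output above shows VirtualSize,
# while other releases may expose the value as Size.
#   bytes_to_mb "$(docker inspect --format '{{.VirtualSize}}' ruby)"
```

Using --format to ask for a single field avoids grepping through the JSON, at the cost of needing to know the exact field name.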

Finally, you can actually run the image and see how much disk space the container uses.

~# docker run $image sh -c 'du -sh / 2>/dev/null'
ruby: -> 804M       /

Notice that the size reported inside the container is 3.4% (26.5 MB) larger than the reported virtual size of the image. I do not yet know exactly why this is.
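For reference, here is how that difference works out. The function name size_difference is hypothetical, and the sketch treats both numbers as the same unit, as the comparison above does. That is a simplifying assumption: du -h typically reports sizes in powers of 1024, while the docker figure above is in decimal megabytes, so the two are not strictly comparable.

```shell
# Print the absolute and relative difference between two sizes.
# Assumes both arguments are in the same unit (hypothetical helper).
size_difference() {
  awk -v image="$1" -v container="$2" 'BEGIN {
    diff = container - image
    printf "%.1f MB (%.1f%%)\n", diff, diff / image * 100
  }'
}

size_difference 777.5 804   # prints: 26.5 MB (3.4%)
```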

If you want to see something really cool, CenturyLink Labs hosts a site that lets you visualize docker images.

3. Large images contain unnecessary components

The most common reason people use docker is to package their application into an image that is easy to verify, distribute, and run at scale. Depending on the base image you select and how you install (and clean up after) your application's dependencies, your image could contain mostly files and programs that you will never need to run your app.

~# docker run -it busybox /bin/sh
/ # ls -1 /bin /usr/bin /sbin /usr/sbin /usr/local/bin | wc -l
ls: /usr/local/bin: No such file or directory
259
~# docker run -it ruby /bin/sh
# ls -1 /bin /usr/bin /sbin /usr/sbin /usr/local/bin | wc -l
1082

As you can see, the ruby image has about four times as many commands available as the busybox image, including compilers and other build tools. The running application typically does not require most of these files or programs. Additionally, package managers like apt and yum can leave behind lots of artifacts after installation that are not necessary for running your app. If you're not careful, that little Sinatra app you wrote could easily become a 1+ GB image!
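One caveat about the counts above: when ls -1 is given several directories, it also prints a header line for each directory and blank lines between them, so piping it straight into wc -l overstates the number of commands a little. Here is a sketch of a more careful count. The function name count_commands is my own, and it still counts a command twice if it appears in two directories (for example when /bin is a symlink to /usr/bin).

```shell
# Count the entries in each directory that exists, skipping any that
# are missing. (Hypothetical helper, not a docker feature.)
count_commands() {
  total=0
  for dir in "$@"; do
    [ -d "$dir" ] || continue          # e.g. /usr/local/bin on busybox
    count=$(ls -1 "$dir" | wc -l)      # one directory: no header lines
    total=$((total + count))
  done
  echo "$total"
}

# Example usage, pasted into a shell inside the container:
#   count_commands /bin /usr/bin /sbin /usr/sbin /usr/local/bin
```

The totals should come out slightly lower than the ones shown above, but the gap between the two images should remain about the same.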

How to build smaller images

If you're like me, you're already thinking to yourself, "large docker images are not so great, how can I make mine smaller?" Fortunately, many smart people have found ways to build smaller images, so I will defer to their solutions.