Docker Volumes Can Be Tricky

November 24, 2019 geeky-tomato

Every technology has a few tricks up its sleeve: moments when it behaves unexpectedly, sometimes even illogically. You need to run into such issues, spend some time figuring them out, and then (hopefully) never stumble upon them again.

Docker is a straightforward and logical container management platform, and reading docs builds a complete understanding of the technology in your mind step by step, page by page. On the other hand, it’s all useless without practice, and building actual containers for the first time often makes people feel confused, as if they just graduated college and got their first job, realizing they don’t really know how to do the actual work.

Docker Volumes might appear as one of the simplest topics of the Docker docs, but don’t let that first impression deceive you: there are quite a few things you need to know. Volumes can be shared between containers, and even exist by themselves, lost and forgotten, taking up space and waiting for their chance to reappear from the abyss.

Volumes vs. Bind-Mounts

There are multiple types of data storage in Docker, but at the moment we’re interested in the two most popular ones: docker-managed volumes and bind-mounts. I won’t go around retyping the official documentation here, so I’ll just mention the main difference.

Docker-managed volumes have data stored by Docker, and usually you will only be able to access it through a container the volume is attached to. It can be called a single-side bind, where you define the path in the container, but never have to worry about its location on the host machine.
Bind-mounts are basically shared directories, where you map a certain path in the container to a local directory.
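
If you are curious where Docker actually keeps a managed volume, docker volume inspect can tell you. Here is a minimal sketch that wraps it in a helper; the function name volume_path is made up for illustration, and on Docker Desktop the reported path lives inside the Docker VM rather than on your own filesystem.

```shell
# Hypothetical helper: print the host directory backing a docker-managed volume.
# Usage: volume_path <volume-name>
volume_path() {
    sudo docker volume inspect --format '{{ .Mountpoint }}' "$1"
}
```

On a typical Linux install, the printed path is usually somewhere under /var/lib/docker/volumes, which is exactly the place you are not supposed to edit by hand.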

Looking at the docker run documentation, you’ll probably notice that there are two command-line options: --volume (-v) and --mount. You’ll probably think the former is used for docker-managed volumes, and the latter for bind-mounts. Well, that would be wrong: both options can set up docker-managed volumes as well as bind-mounts. The main difference is the format, as --mount requires a more verbose key=value syntax.
For the purposes of this article we’ll only use the laconic --volume option, as it’s more than sufficient for our needs.
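
Since the two options differ mainly in format, the mapping between them can be sketched in a few lines of shell. The v_to_mount function below is a hypothetical helper written for this article, not part of Docker; it covers only the three plain forms used here and ignores extras such as read-only flags.

```shell
# Hypothetical helper: translate a simple -v spec into the matching --mount spec.
v_to_mount() {
    spec=$1
    case "$spec" in
        /*:*)  # host path:container path -> bind-mount
            printf 'type=bind,source=%s,target=%s\n' "${spec%%:*}" "${spec#*:}" ;;
        *:*)   # volume name:container path -> named volume
            printf 'type=volume,source=%s,target=%s\n' "${spec%%:*}" "${spec#*:}" ;;
        *)     # container path only -> anonymous volume
            printf 'type=volume,target=%s\n' "$spec" ;;
    esac
}

v_to_mount "$PWD/localstorage:/volumetest"   # bind-mount
v_to_mount "heisenberg:/volumetest4"         # prints type=volume,source=heisenberg,target=/volumetest4
v_to_mount "/volumetest2"                    # prints type=volume,target=/volumetest2
```

Each printed string is what you would pass to --mount instead of handing the original spec to -v.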

Persistent Storage? Not So Fast!

Both volumes and bind-mounts are supposed to be persistent (unlike inherently temporary tmpfs), but you still need to keep your eyes open – data is permanently stored in a volume, but to access it you need to make sure you attach the right volume to the container.

Let’s begin with a bind-mount volume:

# create the storage directory
mkdir localstorage

# create and run new container
# make it write into the localstorage directory
sudo docker run --rm --name devtest \
    -v "$(pwd)/localstorage":/volumetest busybox \
    sh -c 'echo "Hey Lucky!" > /volumetest/hey.txt'
# --rm option means the container is removed as soon as it's done writing into the file

# create another container with the same volume mapping
# and make it read from the file we just created
sudo docker run --rm --name devtest \
    -v "$(pwd)/localstorage":/volumetest busybox \
    cat /volumetest/hey.txt
# It should output "Hey Lucky!"

As you can see, bind-mount volumes are pretty straightforward. The file is stored in our local directory, and it’s not going anywhere (unless you delete it). Now let’s try to repeat that using a docker-managed volume:

sudo docker run --rm --name devtest \
    -v /volumetest2 busybox \
    sh -c 'echo "Hello Again!" > /volumetest2/hello.txt'

sudo docker run --rm --name devtest \
     -v /volumetest2 busybox \
     cat /volumetest2/hello.txt

# cat: can't open '/volumetest2/hello.txt': No such file or directory

It didn’t work, and that’s logical; you essentially told docker to create two separate anonymous volumes, and the fact that they both point to the same directory inside the containers doesn’t mean the files will be transferred from one volume to another.

Besides, the --rm option removes not only the container after it’s done running, but also all anonymous volumes attached to it. This means that by the time we create the second container and try to run the cat command, the volume holding “Hello Again!” has already been removed along with the original container.
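
To watch that cleanup happen, you can repeat the write without --rm and look at what is left behind. This is only a sketch: the container name devtest-keep and the function name anon_volume_demo are invented for this example.

```shell
# Hypothetical demo: keep the container, look at its anonymous volume, then clean up.
anon_volume_demo() {
    # No --rm this time, so the container and its anonymous volume survive the exit
    sudo docker run --name devtest-keep \
        -v /volumetest2 busybox \
        sh -c 'echo "Hello Again!" > /volumetest2/hello.txt'

    # Anonymous volumes still get a name: a long random hex string
    sudo docker inspect --format '{{ range .Mounts }}{{ .Name }}{{ end }}' devtest-keep

    # -v removes the anonymous volume together with the container
    sudo docker rm -v devtest-keep
}
```

If you dropped the -v from the last command, the volume would stay behind with no container attached, and sudo docker volume ls --filter dangling=true would list it.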

So how do we preserve data in a Docker-managed volume? There are two ways: use a named volume, or import it from another container using --volumes-from. I’ll briefly touch on both.

Named Docker-Managed Volumes

That’s pretty simple and straightforward. We create a container with a named volume, and the volume doesn’t go anywhere until we say so.

sudo docker run --rm --name devtest \
    -v heisenberg:/volumetest4 busybox \
    sh -c 'echo "Heisenberg" > /volumetest4/saymyname.txt'
# --rm removes the container, but not the volume

sudo docker run --rm --name devtest \
    -v heisenberg:/volumetest4 busybox \
    cat /volumetest4/saymyname.txt
# this command will output "Heisenberg"

# it's time to get rid of Heisenberg
sudo docker volume rm heisenberg

Importing Volumes

We will create a container storing our volumes, and use --volumes-from to import them into another container:

# first we create an empty container with a docker-managed volume attached
sudo docker run --name mystorage \
     -v /songs busybox
# it doesn't do anything but exists, thus serving as a volume holder
# please note that we don't use --rm option here

# now we create another container and import volumes
sudo docker run --rm --name mysongs \
     --volumes-from mystorage busybox \
     sh -c 'echo "I am still standing!" > /songs/elton.txt'

# the mysongs container is stopped and removed, we create another one
sudo docker run --rm --name mysongs \
     --volumes-from mystorage busybox \
     cat /songs/elton.txt
# it should output "I am still standing!"

# now let's remove the storage container
# don't forget the -v option to remove the unnamed volume attached to it
sudo docker rm -v mystorage

This usage pattern can also be applied with named volumes or bind-mounts. Two use cases are the most common:

If your container has a huge number of volumes, you can use it to keep them all in one place, simplifying your work and minimizing the chance of messing something up by mistake.
If you have dynamic containers using the volumes, a dedicated volumes container would be a single place to look for all the volumes in case you need to modify them or back up the data.
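
The backup case can be sketched with the same --volumes-from trick: a throwaway container imports the volumes, bind-mounts the current directory, and writes an archive into it. The function name backup_volumes and the archive name backup.tar are assumptions made for this example.

```shell
# Hypothetical helper: archive one path from a volume-holder container
# into backup.tar in the current directory.
# Usage: backup_volumes <holder-container> <path-inside-container>
backup_volumes() {
    sudo docker run --rm --volumes-from "$1" \
        -v "$(pwd)":/backup busybox \
        tar cf /backup/backup.tar "$2"
}
```

With the containers from the example above, backup_volumes mystorage /songs should leave a backup.tar containing elton.txt next to you, as long as mystorage has not been removed yet.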

How Docker Compose Handles Volumes

Docker Compose behavior doesn’t differ much from the command line. You can create a docker-managed volume (named or unnamed), or a bind-mount, and almost everything works as expected.

The only difference is that unnamed docker-managed volumes persist data stored in them after rebuilding an image or recreating a container.

Let’s create a docker-compose.yml to set up the container, attach an unnamed volume, and write into a new file in that volume:

version: "3.5"

services:
    pinkfloyd:
        image: busybox
        volumes:
            - /data/lyrics
        command: sh -c 'echo "For a lead role in a cage?" > /data/lyrics/wish.txt'

Now run sudo docker-compose up to perform the command.

As you can see, the volume doesn’t have a name, and in regular circumstances it would’ve been lost after recreating the container.
However, the data persists. Let’s edit the docker-compose.yml to replace the write command with an output from the file:

...
        command: cat /data/lyrics/wish.txt

We no longer write into the volume, but only read from it. Let’s apply the changes to the container:

sudo docker-compose up
# you should see "For a lead role in a cage?" displayed

As you can see, the data is still there. Although this behavior differs from how Docker usually handles unnamed volumes, it makes a lot of sense. But you need to be aware of what’s going on with your volumes, so I wouldn’t recommend storing data in an unnamed docker-managed volume, as it makes accidental data loss much more likely.
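
For comparison, here is a sketch of the same service with a named volume instead. The file name docker-compose.named.yml and the volume name lyrics are picked for this example only; the snippet just writes the file, so nothing runs until you start Compose yourself.

```shell
# Write a Compose file that declares a named volume (hypothetical names throughout)
cat > docker-compose.named.yml <<'EOF'
version: "3.5"

services:
    pinkfloyd:
        image: busybox
        volumes:
            - lyrics:/data/lyrics
        command: sh -c 'echo "For a lead role in a cage?" > /data/lyrics/wish.txt'

# Named volumes must be declared at the top level
volumes:
    lyrics:
EOF
```

Running sudo docker-compose -f docker-compose.named.yml up creates the volume on first use. It then shows up in sudo docker volume ls under a readable name (Compose normally prefixes it with the project name), which makes it much harder to lose track of.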

Persistent data in unnamed volumes might also lead to unexpected results. Let’s say you create a MySQL container via Docker Compose. After the first run it creates the mysql database, and inserts the user entry with the password you provided. Even though you’ve chosen to store the data in an unnamed docker-managed volume, the data is still in there.

You modify docker-compose.yml to change the MySQL root password, and run docker-compose up -d once again. The container gets recreated, but the volume is still attached, so the database already exists and the new root password will not be set. Long story short, you cannot connect to MySQL because your new root password doesn’t work, and you’ve lost the old one.
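
If you end up in this state and the stored data is disposable, one way out is to ask Compose not to reuse the old anonymous volumes. The sketch below assumes your Compose version supports the --renew-anon-volumes flag of docker-compose up; the function name is invented. It discards whatever was stored in those volumes, so treat it as a reset, not a repair.

```shell
# Hypothetical helper: recreate the services with fresh anonymous volumes.
# WARNING: data in the old anonymous volumes is not carried over.
fresh_anon_volumes() {
    sudo docker-compose up -d --renew-anon-volumes
}
```

After that, MySQL should initialize an empty data directory again and pick up the password from the updated docker-compose.yml.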