Data Persistence

Milan Deket | Categories: Back-end development | Date: 21-Jan-2020 | 5 minutes to read

    So far, we have written about the data needed to run containers. We have not yet mentioned the data created inside a container while it runs. Containers are meant to be immutable and temporary: we do not change a running container but replace it with a new one, and a container is not expected to run for long before it is destroyed.
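
    A minimal sketch of what "temporary" means in practice, assuming Docker is installed and the public alpine image can be pulled. The function name ephemeral_demo, the container name scratch and the file /note.txt are illustrative choices and do not come from the article.

```shell
# Sketch: a file written inside a container disappears together with it.
ephemeral_demo() {
    # Write a file into the container's own writable layer.
    docker container run --name scratch alpine sh -c 'echo hello > /note.txt'
    # Removing the container discards that layer, and the file with it.
    docker container rm scratch
    # A fresh container from the same image starts without the file.
    docker container run --rm alpine sh -c 'test -e /note.txt || echo "note.txt is gone"'
}
# Usage: ephemeral_demo
```

    The second container is created from the same image, yet it knows nothing about the file written by the first one. Data that must outlive a container therefore has to be stored outside of it.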

    In a nutshell, the idea is that a container can be discarded or destroyed at any time, because a new container can always be created from the Docker image. Since containers are temporary and do not change, we do not modify them once they are running. If the configuration changes, or if we want to run a new version, we stop and destroy the old containers and run new ones.

    This concept has its advantages and disadvantages. For instance, what happens when a container holds a database that we need to keep? We cannot simply destroy it, because we would lose all the data that matters to the users of the app. Ideally, a container should not hold such unique data, for example the contents of a database, together with the files that run the application itself. This idea is known as separation of concerns. A container keeps the data written inside it until the container is removed, so merely stopping a container such as nginx or mysql does not destroy the data created during its work.

    A couple of years ago this problem did not exist, because all data was simply saved directly on the computer's disk. There were no containers, so everything was persistent by default. In the age of containers and automatic scaling of applications, persistence becomes a real problem. Docker offers two solutions to this problem:

    1. data volumes
    2. bind mounts

    The first option, data volumes, creates a special location outside of the container where the data that requires persistence is stored. The data is kept even if we delete the container, and the same location can later be attached to any other container. The container sees it as an ordinary path in its own file system.

    The second option, bind mounts, lets us share a file or directory directly between the host, that is the computer on which the Docker container runs, and the container itself. For the container, this path looks like any other local path; it is not aware that the directory or file is actually located on the host.
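
    On the command line both options are passed with the same -v flag, and Docker tells them apart by the form of the source before the colon. The helper below is a hypothetical illustration of that rule, not a Docker command. It assumes Linux or macOS style paths and an argument of the form SOURCE:TARGET, and it treats anything that is neither an absolute path nor a plain name as unknown.

```shell
# Hypothetical helper: guess how Docker treats "-v SOURCE:TARGET".
mount_kind() {
    source_part=${1%%:*}                # text before the first colon
    case $source_part in
        /*) echo "bind mount" ;;        # absolute path on the host
        *[!A-Za-z0-9_.-]*|"") echo "unknown" ;;
        *)  echo "volume" ;;            # plain name, storage managed by Docker
    esac
}

mount_kind "mysql-db:/var/lib/mysql"                                        # prints: volume
mount_kind "/Users/milandeket/index.html:/usr/share/nginx/html/index.html"  # prints: bind mount
```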

    Data Volumes

    If we use a Docker image from Docker Hub and look at its Dockerfile, we can see whether a volume will be created and used. VOLUME is another instruction that can appear in a Dockerfile; it defines the path inside the container at which the volume will be mounted. It is important to remember that a volume created this way has to be deleted manually. It does not disappear when we stop or delete the container. This is a safety measure, so that we do not accidentally delete the data together with the container. We can check this by running mysql and keeping track of what happens to its data, which is stored in a volume. The command for running mysql is:

    docker container run -d --name mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=True mysql
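
    Without opening the Dockerfile, the image itself can be asked which volumes it declares. This is a small sketch, assuming the mysql image is already present locally (running the command above pulls it). The function name declared_volumes is a hypothetical helper.

```shell
# Sketch: print the VOLUME paths recorded in an image's configuration.
declared_volumes() {
    # $1 = image name; the output is a JSON object keyed by the declared paths.
    docker image inspect --format '{{json .Config.Volumes}}' "$1"
}
# Usage: declared_volumes mysql
```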

    If we run docker container inspect on the mysql container and check the Mounts section, we will see that a volume has been created and that Docker has assigned it a name. If we list all of the volumes with the command:

    docker volume ls

    we will see a volume with the same name that appeared in the Mounts section when we inspected the mysql container. This means that the container got its own unique location on the host to store its data. All of this happens in the background: that location is mapped into the container, so the container thinks it is storing everything under /var/lib/mysql in its local file system.
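
    The inspection and the manual cleanup described above can be sketched as two small functions. The names show_mounts and remove_container_and_volume are hypothetical, and the sketch assumes the mysql container from the earlier command exists.

```shell
# Sketch: see which volumes a container uses, then delete one by hand.
show_mounts() {
    # $1 = container name; prints "<volume name> -> <path in the container>" per mount.
    docker container inspect \
        --format '{{range .Mounts}}{{println .Name "->" .Destination}}{{end}}' "$1"
}

remove_container_and_volume() {
    # $1 = container name, $2 = volume name as printed by show_mounts
    docker container rm -f "$1"   # the volume survives this step
    docker volume rm "$2"         # deleting the data is a separate, explicit step
}
# Usage: show_mounts mysql
```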

    If we do not specify a name for the volume, Docker assigns one itself, and that name is a long string of seemingly unrelated letters and numbers that is not easy to remember. This is not a problem when there is only one volume, but with two or more unnamed volumes it quickly becomes hard to tell them apart.

    To see which volume is used by which container, we would have to run inspect on each container and check its Mounts section, and even that would not give the full picture, because there can be volumes that are not attached to any container. To make volumes easier to use, we can give a volume a name when running a container, using the -v option with the volume name and the path to the directory inside the container.
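
    The Docker CLI also offers filters that can shorten this search. The sketch below wraps two of them. Both function names are hypothetical, and the volume name passed in is whatever docker volume ls printed.

```shell
# Sketch: answer "who uses this volume?" and "which volumes are unused?".
containers_using_volume() {
    # $1 = volume name; -a also lists stopped containers.
    docker container ls -a --filter "volume=$1" --format '{{.Names}}'
}

unused_volumes() {
    # Dangling volumes are those not referenced by any container.
    docker volume ls --filter dangling=true --format '{{.Name}}'
}
# Usage: containers_using_volume mysql-db
```

    Even with these filters, giving the volume a readable name with -v when the container is started remains the simpler route.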

    The example for mysql is:

    docker container run -d --name mysql -e MYSQL_ALLOW_EMPTY_PASSWORD=True -v mysql-db:/var/lib/mysql mysql

    Using this command we have defined the following:

    1. -d - The container will be run in the background (detached) rather than attached to the terminal.
    2. --name mysql - The container will be named mysql.
    3. -e MYSQL_ALLOW_EMPTY_PASSWORD=True - An environment variable that tells the mysql container that no password is needed to access the database.
    4. -v mysql-db:/var/lib/mysql - Creates a volume named mysql-db (or reuses it if it already exists) and mounts it at /var/lib/mysql in the container.
    5. mysql - The name of the Docker image used to create the container.
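
    Because the volume now has a name, the container can be replaced without losing the database files. The following is a sketch under the assumption that a container named mysql, started with the command above, already exists. The function name recreate_mysql is hypothetical.

```shell
# Sketch: replace the mysql container while keeping its data in mysql-db.
recreate_mysql() {
    # Remove the old container; the named volume is left untouched.
    docker container rm -f mysql
    # Start a new container that mounts the same named volume.
    docker container run -d --name mysql \
        -e MYSQL_ALLOW_EMPTY_PASSWORD=True \
        -v mysql-db:/var/lib/mysql mysql
    # Show where Docker keeps the volume's files on the host.
    docker volume inspect --format '{{.Mountpoint}}' mysql-db
}
# Usage: recreate_mysql
```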

    Bind Mounts

    A bind mount is simply a mapping of a file or directory on the host to a file or directory inside the container. The two paths refer to the same physical location on disk. In use, it behaves much like a volume. Deleting the container does not delete the files used for the bind mount; they stay on the host until we delete them there. Unlike volumes, bind mounts cannot be defined in a Dockerfile, because they depend on paths that exist on a particular host. They can only be defined when executing docker container run. The form is very similar, as we can see in the following example:

    docker container run -d --name nginx -p 80:80 -v /Users/milandeket/index.html:/usr/share/nginx/html/index.html nginx

    By using this command we have defined the following:

    1. -d - The container will be run in the background (detached).
    2. --name nginx - The container will be named nginx.
    3. -p 80:80 - Port 80 on the host is forwarded to port 80 in the container, on which nginx listens by default. This means that we can open a browser and see the page served by nginx at http://localhost.
    4. -v /Users/milandeket/index.html:/usr/share/nginx/html/index.html - Maps the file /Users/milandeket/index.html on the host to the path /usr/share/nginx/html/index.html in the container.
    5. nginx - The Docker image which will be run.

    One option we can add to a bind mount is read-only, so that the container can read the file but cannot modify it.
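
    With the -v syntax, a mount is made read-only by appending :ro after the container path. The sketch below reuses the nginx command from above. The function name run_nginx_readonly is hypothetical, and the host path is passed in as an argument.

```shell
# Sketch: the same bind mount as before, but read-only inside the container.
run_nginx_readonly() {
    # $1 = path of index.html on the host
    docker container run -d --name nginx -p 80:80 \
        -v "$1":/usr/share/nginx/html/index.html:ro nginx
}
# Usage: run_nginx_readonly /Users/milandeket/index.html
```

    The file can still be edited on the host, and the container sees those changes; only writes from inside the container are refused.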

    We have explained how containers can be run on a single host. In the next part of the Docker series, we will explain how to run containers in a cluster. Stay tuned.

    Milan Deket Partner & Tech Lead
    Engineer and Tech lead with nearly a decade of experience. Startup enthusiast. Passionate about building systems that help clients' businesses grow and succeed.