TDM 40200: Project 12 — 2023

Motivation: Containers are everywhere and a very popular method of packaging an application with all of the requisite dependencies. In the previous series of projects you’ve built a web application. While right now it may be easy to share and run your application with another individual, as time goes on and packages are updated, this is less and less likely to be the case. Containerizing your application ensures that the application will have the proper versions of the proper packages available in the proper location to run.

Context: This is the second in a series of projects focused on containers. The end goal of this series is to solidify the concept of a container, and enable you to "containerize" the application you’ve spent the semester building. You will even get the opportunity to deploy your containerized application!

Scope: Python, containers, UNIX

Learning Objectives
  • Improve your mental model of what a container is and why it is useful.

  • Use docker to build a container image.

  • Understand the difference between the ENTRYPOINT and CMD commands in a Dockerfile.

  • Use docker to run a container.

  • Use docker to run a shell inside of a container.

Make sure to read about and use the template found here, and read the important information about project submissions here.

Questions

Question 1

In this project, we will begin learning about the various docker commands. In addition, we will learn about some of the Dockerfile commands, and even build and use some simple containers!

Like the last project, we will be working in a virtual machine (VM). By using a VM on Anvil, we are able to ensure that everyone has the proper permissions in order to run each and every one of the necessary commands to build and run Docker images. The steps below get a VM up and running on Anvil resources, and then connect to it to get a shell on the VM from Anvil.

  1. Get a terminal on Anvil. You may complete this part however you like. I like to use ssh to connect to Anvil from my local machine; however, you may also use ondemand.anvil.rcac.purdue.edu, launch a Jupyter Lab session, and launch a terminal from within Jupyter Lab. Either approach works equally well.

  2. Clear out any potential SLURM environment variables:

    for i in $(env | awk -F= '/SLURM/ {print $1}'); do unset $i; done;

  3. Launch a SLURM job with 4 cores and about 8 GB of memory, and get a shell on the assigned backend node:

    salloc -A cis220051 -p shared -n 4 -c 1 -t 04:00:00

    This job will only buy you 4 hours of time on the backend node. If you need more time, you will need to re-launch the job and change the arguments to salloc to request more time.

  4. Once you have a shell on the backend node, you will need to load the qemu module:

    module load qemu

  5. Next, copy over a fresh VM image to use for this project:

    cp /anvil/projects/tdm/apps/qemu/images/alpine.qcow2 $SCRATCH

    If at any time you want to start fresh, you can simply copy over a new VM image from /anvil/projects/tdm/apps/qemu/images/alpine.qcow2 to your $SCRATCH directory. Any changes you made to the previous image will be lost. This is good to know in case you want to try something crazy but are worried about breaking something! No need to worry, you can simply re-copy the VM image and start fresh anytime!

  6. The previous command will result in a new file called alpine.qcow2 in your $SCRATCH directory. This is the VM image you will be using for this project. Now, you will need to launch the VM:

    qemu-system-x86_64 -vnc none,ipv4 -hda $SCRATCH/alpine.qcow2 -m 8G -smp 4 -enable-kvm -net nic -net user,hostfwd=tcp::2200-:22 &

    The last part of the previous command forwards traffic from port 2200 on Anvil to port 22 on the VM. If you receive an error about port 2200 being used, you can change this number to any other unused port number. To find an unused port, you can use a utility we have made available to you.

    module use /anvil/projects/tdm/opt/core
    module load tdm
    find_port

    The find_port command will output an unused port for you to use. If, for example, it output 12345, then you would change the qemu command to the following.

    qemu-system-x86_64 -vnc none,ipv4 -hda $SCRATCH/alpine.qcow2 -m 8G -smp 4 -enable-kvm -net nic -net user,hostfwd=tcp::12345-:22 &

  7. After launching the VM, it will be running in the background as a process (this is what the & at the end of the command does). After about 15-30 seconds, the VM will be fully booted and you can connect to the VM from Anvil using the ssh command.

    ssh -p 2200 tdm@localhost -o StrictHostKeyChecking=no

    You may be prompted for a password for the user tdm. The password is simply purdue.

    If in a previous step you changed the port from say 2200 to something like 12345, you would change the ssh command accordingly.

  8. Finally, you should be connected to the VM and have a new shell running inside the VM, great! If you were successful, the contents of your terminal should look very similar to the following.

Figure 1. Successfully connected to the VM

For this question, just include a screenshot of your terminal after you have successfully connected to the VM.
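If you are curious what the SLURM cleanup loop from step 2 actually does, here is a minimal sketch you can run in any POSIX shell. SLURM_DEMO_VAR is a made-up variable used only for this demonstration, and the awk pattern is slightly stricter than the one in step 2 (it only matches names that start with SLURM), which is our own assumption about what is intended.

```shell
# SLURM_DEMO_VAR is a made-up variable, created only for this demonstration.
export SLURM_DEMO_VAR=example

# List the names of environment variables that start with SLURM, then unset each.
for name in $(env | awk -F= '/^SLURM[A-Za-z0-9_]*=/ {print $1}'); do
    unset "$name"
done

# ${VAR+set} expands to "set" only if VAR still exists.
if [ -z "${SLURM_DEMO_VAR+set}" ]; then
    echo "SLURM_DEMO_VAR was removed"
else
    echo "SLURM_DEMO_VAR is still set"
fi
```

The point of the loop is simply that a job launched from inside another job does not inherit stale SLURM settings.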

If at any time you would like to "save" your progress and resume the project later, first exit the VM by running the exit command. Next, type jobs to find the qemu job number (probably 1). Finally, bring the qemu process to the foreground by typing either fg 1 or fg %1, and then press Ctrl+c. This will kill the VM. You can resume the project later by launching the VM again with the same alpine.qcow2 image you used previously.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 2

Whoa, you may have noticed things look a little bit different from the previous project — that’s okay! We made a few modifications that will be useful to you during this project. Let’s test out the most useful new feature.

In your terminal in the VM, list all of the files as follows:

ls -la

Okay, nothing special yet. That is to be expected. Now, in the same terminal session, type the letter "l" and immediately pause for a second. You will quickly notice that the terminal shows "s -la" in light grey text after your initially typed "l". We’ve installed a program that remembers your shell history and does its best to predict what you will type based on your previous commands. If you press the right arrow on your keyboard, the rest of the "ls -la" command will be typed out fully. This is an extremely useful feature, especially as we are juggling various docker commands that can be long and confusing. For example, you can type "docker" and then press the up arrow on your keyboard repeatedly, and this tool will cycle through all of your previous commands that started with "docker".

Another way to recall previous commands you’ve run is to open the shell history search interface by pressing Ctrl+r and then typing your search.

Okay, try running the commands docker, docker ps, and docker images. Follow these commands by typing "docker" and pressing the up arrow on your keyboard to cycle through your previous commands. Once you are comfortable with this functionality, go ahead and take a screenshot of some of your outputs from these docker commands and include it in your submission.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 3

Next, let’s build a barebones Docker image using the docker command and a Dockerfile.

A Dockerfile is a text file that contains a set of instructions for building a Docker image.

The docker command is a command line interface (CLI) that allows you to interact with Docker.

A Docker image is essentially a zipped up tarball of a file system that contains all of the files and dependencies needed to run a program. You can think of it as the ubuntu filesystem that you extracted and used with chroot in the previous project.

For a very barebones image, your Dockerfile will only need to contain two lines. The first line will be a FROM command. This command will tell Docker what base image to build on top of. It is very common to choose a barebones operating system image like alpine or ubuntu as the base image.

The second line will be a CMD command. This command will tell Docker what command to run when a container is started from the image.

Dockerfile
FROM OS
CMD ["command", "arg1", "arg2"]

Use your favorite command line text editor (the image has nano, vim, and emacs installed already) to create a new file called Dockerfile in your home directory. Replace "OS" with the base image you want to use. For this project, we will use an ubuntu image from here, specifically the newest stable version of ubuntu, which is currently 22.04: ubuntu:22.04. Here, ubuntu is the repository namespace and 22.04 is the tag specifying a version of the image. While technically ubuntu could put all sorts of different containers in the ubuntu namespace under different tags, it is customary to use the tag to specify the version of the image.

Next, replace "command" with the command you want to run when a container is started from the image. For now, let’s use the most basic shell that is available on many Linux operating systems, /bin/sh. If we had arguments to pass to the command, we would add them to the list after the first element. For example, if we wanted to run the command echo "hello world", we would use CMD ["echo", "hello world"]. Each element of the list is handed to the program exactly as written, so no extra quoting is needed inside the string.
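To get a feel for how a list of arguments reaches a program without a shell re-parsing it, here is a small sketch using plain shell positional parameters. This is only an analogy for the exec form of CMD, not Docker itself: the first element is the program and the remaining elements are its arguments, passed through untouched.

```shell
# "$@" runs the list as a command: first element is the program, the rest are arguments.
set -- echo "'hello world'"
with_quotes=$("$@")       # the single quotes are part of the argument itself

set -- echo "hello world"
without_quotes=$("$@")    # one argument containing a space, no extra quotes

printf '%s\n' "$with_quotes"
printf '%s\n' "$without_quotes"
```

The first line printed keeps its single quotes, because nothing ever strips them. The second prints just the two words.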

Once complete, save the file and close the text editor. Now, it’s time to build our image! Run the following command to build the image:

cd
docker build -t myfirstimage .

Here, we are using the -t flag to specify a tag for our image. This tag will be used to refer to our image in the future. In this case, we are using the tag myfirstimage. If you want to use a different tag, you can replace myfirstimage with whatever you want. The "." denotes the current working directory, which is where our Dockerfile is located. If there were no file named Dockerfile in our current working directory, we would have to specify the name of the file we want to use with the -f <filename> flag. This is useful if you have multiple Dockerfiles in a single directory.
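As a rough sketch of the -f flag in action, the following writes a Dockerfile under a non-default name and points docker build at it. The file name Dockerfile.shell and the scratch directory are assumptions made up for this example. The build itself only runs if the docker command is available.

```shell
# Scratch directory and file name are illustrative assumptions.
build_dir=$(mktemp -d)

# Write a two-line Dockerfile under a non-default name.
cat > "$build_dir/Dockerfile.shell" <<'EOF'
FROM ubuntu:22.04
CMD ["/bin/sh"]
EOF

# Only attempt the build where the docker CLI is installed.
if command -v docker >/dev/null 2>&1; then
    # -f names the Dockerfile to use; the last argument is the build context directory.
    docker build -t myfirstimage -f "$build_dir/Dockerfile.shell" "$build_dir" \
        || echo "docker build failed (is the daemon running?)"
else
    echo "docker not found; skipping build"
fi
```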

Once the image is built, you can check to see that it is there by running the following command:

docker images

You will notice that there are 2 images available: ubuntu:22.04 and myfirstimage. The ubuntu:22.04 image is the base image we used to build our image on top of. The myfirstimage image is the image we just built. Very cool!

Now, let’s run our image! Run the following command to run our image:

docker run -it myfirstimage

The -i stands for interactive: it keeps the container’s standard input open, and without it we would not be able to interact with the container (typed commands would produce no output). The -t stands for tty: it allocates a pseudo-terminal, and without it we would not have a functioning terminal. Essentially, we need both of these flags in order to have a shell running in our container.
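One way to build intuition for what "having a terminal" means is the shell test [ -t 0 ], which is true only when standard input is attached to a terminal. The sketch below runs that check locally with standard input redirected from /dev/null, which mimics a process started without a terminal. The check_tty function is a hypothetical helper written for this example.

```shell
# check_tty is a hypothetical helper for this example.
check_tty() {
    # [ -t 0 ] succeeds only when file descriptor 0 (stdin) is a terminal.
    if [ -t 0 ]; then
        echo "stdin is a terminal"
    else
        echo "stdin is not a terminal"
    fi
}

# Redirecting stdin from /dev/null mimics a process started without a terminal.
no_tty_result=$(check_tty < /dev/null)
echo "$no_tty_result"
```

You could try the same [ -t 0 ] test inside a container started with and without -t and compare the answers.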

Congratulations! You now have a shell (/bin/sh) running in a container!

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 4

Okay, now that you have a shell running in the container, let’s take a minute to clarify the last line of our Dockerfile.

There are two important commands that we need to delineate: CMD and ENTRYPOINT. It is kind of a mess, but it is important to take the time to understand the differences — otherwise it will be more difficult to debug your containers in the future.

The following Dockerfile has a single CMD line. In a Dockerfile, only one CMD takes effect: if there is more than one, only the last CMD line is used.

Dockerfile
FROM ubuntu:22.04
CMD ["/bin/sh", "-c", "echo cwd: $PWD"]

This CMD is in exec form. This means that:

  1. There is the use of square brackets around the arguments.

  2. The first argument is an executable file.

Build the image and run it. What was your output?

docker build -t myfirstimage .
docker run -it myfirstimage

Now, run it and pass in a different command. What was your output?

docker run -it myfirstimage /bin/sh

Modify the Dockerfile and rebuild your image.

Dockerfile
FROM ubuntu:22.04
CMD ["echo", "cwd: $PWD"]

docker build -t myfirstimage .
docker run -it myfirstimage

What happens? Instead of "cwd: /" you get "cwd: $PWD". This is because variable substitution doesn’t occur since a shell isn’t processing the commands. We can, however, use the shell form. This means that:

  1. There is no use of square brackets around the arguments.

  2. The commands and arguments are passed to the sh shell.

Dockerfile
FROM ubuntu:22.04
CMD echo "cwd: $PWD"

docker build -t myfirstimage .
docker run -it myfirstimage

You once again get "cwd: /" since the sh shell is performing the variable substitution! Behind the scenes it is really running /bin/sh -c "echo cwd: $PWD".
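The difference between the two forms can be imitated without Docker. In the sketch below (an analogy only), the first command hands the text straight to the echo program, and the second hands it to /bin/sh to be parsed first. The cd / is there so the expanded value matches the container’s working directory.

```shell
# Exec-form analogue: the echo program receives the argument exactly as written.
# Single quotes keep our own shell from expanding $PWD before echo sees it.
exec_form=$(env echo 'cwd: $PWD')

# Shell-form analogue: /bin/sh parses the string and expands $PWD itself.
shell_form=$(cd / && /bin/sh -c 'echo "cwd: $PWD"')

echo "$exec_form"
echo "$shell_form"
```

The first line printed still contains the literal text $PWD, while the second shows the directory, just like the two container runs.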

Finally, there is another series of scenarios that we can explore that have to do with ENTRYPOINT. The first: what happens if we are not using the shell form of CMD and our first argument is not an executable like echo or /bin/sh? Let’s find out!

Dockerfile
FROM ubuntu:22.04
CMD ["cwd: $PWD"]

docker build -t myfirstimage .
docker run -it myfirstimage

What happens? It fails! Docker doesn’t understand what to do with that, since it isn’t anything executable. In these scenarios, you need to specify an ENTRYPOINT.

Dockerfile
FROM ubuntu:22.04
CMD ["cwd: $PWD"]
ENTRYPOINT ["echo"]

docker build -t myfirstimage .
docker run -it myfirstimage

It works just like when we did the following!

Dockerfile
FROM ubuntu:22.04
CMD ["echo", "cwd: $PWD"]

Or does it? In this case, once again there is no variable substitution because a shell is not processing the commands. However, you will find things are different than before. Before, you could run the following:

docker run -it myfirstimage /bin/sh

The result would be that the contents of the CMD, CMD ["echo", "cwd: $PWD"], would be effectively replaced and a shell would be spawned. However, try running it with the following Dockerfile.

Dockerfile
FROM ubuntu:22.04
CMD ["cwd: $PWD"]
ENTRYPOINT ["echo"]

docker build -t myfirstimage .
docker run -it myfirstimage /bin/sh

What happens? It does not do what one might expect! It simply prints out "/bin/sh". Why? Well, the arguments after docker run do not replace our ENTRYPOINT, just our CMD. So, in this case, we essentially ran echo /bin/sh! In fact, if you gave multiple parameters to ENTRYPOINT, none of them would be replaced.

Dockerfile
FROM ubuntu:22.04
CMD ["cwd: $PWD"]
ENTRYPOINT ["echo", "$PWD"]

docker build -t myfirstimage .
docker run -it myfirstimage /bin/sh

result
$PWD /bin/sh
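If it helps, the rule for exec-form ENTRYPOINT and CMD can be modeled as "ENTRYPOINT stays, and the arguments after the image name replace CMD". The function below is a hypothetical helper that mimics an image with ENTRYPOINT ["echo"] and CMD ["cwd: $PWD"]. It is a sketch of the rule, not of how Docker is implemented.

```shell
# simulate_run is a hypothetical helper, not part of Docker.
# It models ENTRYPOINT ["echo"] with a default CMD ["cwd: $PWD"].
simulate_run() {
    if [ "$#" -eq 0 ]; then
        # No arguments after the image name: fall back to the default CMD.
        set -- 'cwd: $PWD'
    fi
    # The ENTRYPOINT is always kept; CMD (or the run arguments) is appended to it.
    echo "$@"
}

default_out=$(simulate_run)             # like: docker run -it myfirstimage
override_out=$(simulate_run /bin/sh)    # like: docker run -it myfirstimage /bin/sh

echo "$default_out"
echo "$override_out"
```

The first call prints the default CMD text, and the second prints /bin/sh, which matches the echo /bin/sh behavior described above.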

In this example, ENTRYPOINT is in exec form — it has the square brackets. ENTRYPOINT also has a shell form.

Dockerfile
FROM ubuntu:22.04
CMD ["cwd: $PWD"]
ENTRYPOINT echo $PWD

docker build -t myfirstimage .
docker run -it myfirstimage /bin/sh

result
/

Here, what is actually being run is /bin/sh -c "echo $PWD". When ENTRYPOINT is written in shell form, CMD is completely ignored, and, because CMD is completely ignored, the /bin/sh argument passed as a part of the docker run command is also ignored. A side effect is that signals are not passed along properly with this method: your program runs as a child of /bin/sh rather than as the first process in the container, which affects how the container is stopped.

Hopefully this gives you a taste of the myriad of capabilities that CMD and ENTRYPOINT provide. It is a mess; however, Docker does provide some "best practices". In a nutshell:

  1. Stick to the exec forms for both CMD and ENTRYPOINT.

  2. If you want variable substitution to work, directly execute the shell of your choice. Some examples:

    Dockerfile
    FROM ubuntu:22.04
    CMD ["/bin/bash", "-c", "echo cwd: $PWD"]

    Dockerfile
    FROM ubuntu:22.04
    ENTRYPOINT ["/bin/sh", "-c"]
    CMD ["echo cwd: $PWD"]

For this question, submit the text for a Dockerfile that would result in the following output when run.

docker run -it myfirstimage

result
/bin/bash

docker run -it myfirstimage /bin/sh

result
# you get an `sh` shell prompt in the container

docker run -it myfirstimage 'echo $SHELL -- cool'

result
/bin/bash -- cool

docker run -it myfirstimage "echo $SHELL -- cool"

result
/bin/zsh -- cool

For this last example, remember that we are in the zsh shell ourselves, and that anything in double quotes is first interpreted by our current shell before the command is executed.
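The quoting difference is easy to see on its own, with no containers involved. In this small sketch, DEMO_SHELL is a made-up variable standing in for your host’s $SHELL, and its value is an assumption chosen for illustration.

```shell
# DEMO_SHELL stands in for the host's $SHELL; the value is an assumption.
DEMO_SHELL=/bin/zsh

single='echo $DEMO_SHELL -- cool'   # single quotes: the text is passed along untouched
double="echo $DEMO_SHELL -- cool"   # double quotes: our shell expands the variable first

printf '%s\n' "$single"
printf '%s\n' "$double"
```

The single-quoted string still contains the variable name, so whatever receives it gets to do the expansion. The double-quoted string already has the host’s value baked in.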

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 5

Okay, we are making some progress, but the complexity of ENTRYPOINT and CMD is certainly enough to slow us down a bit. Rebuild myfirstimage with the following Dockerfile.

Dockerfile
FROM ubuntu:22.04
CMD ["/bin/bash"]

docker build -t myfirstimage .
docker run -it myfirstimage

Once you are in a shell in your container, go ahead and run the following to create a file called "imhere" in the /root/ directory.

cd /root
touch imhere

Verify that the file exists:

ls -la /root/

Great! Now go ahead and exit the container. Next, run the image again and check whether the file is still there.

docker run -it myfirstimage
ls -la /root/

It is no longer there! The container executes our shell, we run some commands, and as soon as we exit the container our changes are gone from view. Each docker run creates a brand-new container from the image, so changes made inside an earlier container do not carry over to the next one. In that sense, containers are ephemeral: changes live only in the container where they were made, and when our shell exits, that container stops.
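As a loose analogy (Docker uses a layered filesystem rather than a literal copy), you can picture every docker run as starting from a fresh copy of the image. The sketch below imitates that with plain directories. All of the directory names are stand-ins invented for this example.

```shell
# image_dir stands in for the read-only image; its one file is a made-up placeholder.
image_dir=$(mktemp -d)
echo "base files" > "$image_dir/base.txt"

# First "container": start from a copy of the image, then make a change.
first_run=$(mktemp -d)
cp -R "$image_dir/." "$first_run"
touch "$first_run/imhere"

# Second "container": another fresh copy of the same image.
second_run=$(mktemp -d)
cp -R "$image_dir/." "$second_run"

ls -A "$first_run"     # contains base.txt and imhere
ls -A "$second_run"    # contains only base.txt
```

The change made in the first copy never touches the image, so the second copy starts clean.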

There are a couple of ways around this. One is to use a VOLUME to bind a location outside of our container somewhere inside our container. We will play with this later on. Another way that is super straightforward is to not let our process exit! We can do this by using the -d flag to run the container in the background. Give it a try!

docker run -dit myfirstimage

Whoa, this is way different — there is a long string of characters that are printed, and then we have our regular shell prompt. What is going on?

The long string of characters is the container ID. This is a unique identifier for the container. We can use this to interact with the container.

Okay, great, but do we have to remember that? No! We don’t! You can see all running containers using the following command.

docker ps

output
CONTAINER ID   IMAGE          COMMAND       CREATED          STATUS          PORTS     NAMES
b34dd462b664   myfirstimage   "/bin/bash"   40 seconds ago   Up 39 seconds             goofy_goldwasser

Here, we have an abbreviated container id as well as the name of the image that was used to create the container. Okay, now how do we interact with the container? We can use the docker exec command. This command allows us to execute a command inside of a running container. In this case, we want a shell, so let’s give it a try.

docker exec -it b34dd462b664 /bin/bash

# or, since b34dd462b664 is hard to type, they give us a user friendly name we can use instead

docker exec -it goofy_goldwasser /bin/bash
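If you would rather not copy the ID or name by hand, the docker ps table can be picked apart with awk. The sketch below works on a saved copy of the sample output shown above, so it runs without Docker. On a real system you would pipe docker ps into the same awk commands, and your ID and name will differ.

```shell
# A saved copy of the sample "docker ps" output from above.
ps_output='CONTAINER ID   IMAGE          COMMAND       CREATED          STATUS          PORTS     NAMES
b34dd462b664   myfirstimage   "/bin/bash"   40 seconds ago   Up 39 seconds             goofy_goldwasser'

# Skip the header row (NR > 1) and keep rows whose IMAGE column is myfirstimage.
container_id=$(printf '%s\n' "$ps_output" | awk 'NR > 1 && $2 == "myfirstimage" {print $1}')
container_name=$(printf '%s\n' "$ps_output" | awk 'NR > 1 && $2 == "myfirstimage" {print $NF}')

echo "id:   $container_id"
echo "name: $container_name"
```

Either value could then be handed to docker exec in place of the hand-typed ID or name.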

Great! Let’s repeat the previous steps to create the file in /root/ called "imhere".

cd /root
touch imhere
ls -la /root/

Now, let’s exit the container. Is the container still running? Use docker ps to find out. Okay, great! It is still running — that should mean that if we get another shell inside the container and look for the file, it should still be there. Let’s give it a try.

docker exec -it goofy_goldwasser /bin/bash
ls -la /root/

Indeed, it is! Excellent! While this is great, if the container were removed and a new one created from the image, this file would once again disappear. If we really need some data to persist, we need to use a VOLUME, but we will mess around with this in a future project. What else can we do? What other commands are useful? Well, let’s make our running container stop.

docker stop goofy_goldwasser

Here is a cool feature: we have the shell configured with a docker plugin, which means we have autocompletion on docker related commands. For example, type only "docker stop goo" (or instead of "goo", the very beginning of your container name), then press the Tab key. It will autocomplete and type out the rest of the container name! This is super useful!

After that, check on your container with docker ps. It is no longer listed, because it has stopped! Very cool.

Remember how earlier in the project we mentioned that Docker images are just layers of a read-only filesystem compressed into a zipped up tarball? Well, up until this point we haven’t seen any actual transferable files, have we? With docker we have the ability to export them using the docker save command. Let’s do this.

docker save myfirstimage > myfirstimage.tar

# or

docker save -o myfirstimage.tar myfirstimage

The result is a tarball that you can transfer to any other system (at least, any with the same architecture) with Docker and use docker load to load it up.

docker load < myfirstimage.tar
docker images

# or

docker load -i myfirstimage.tar
docker images

For this question, just include a screenshot of the terminal contents that demonstrates that you were able to persist the "imhere" file after exiting the container.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Please make sure to double check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure that what you think you submitted is what you actually submitted.

In addition, please review our submission guidelines before submitting your project.