STAT 39000: Project 12 — Fall 2021

Motivation: Containers are a modern solution to packaging and shipping some sort of code in a reproducible and portable way. When dealing with R and Python code in industry, it is highly likely that you will eventually have a need to work with Docker, or some other container-based solution. It is best to learn the basics so the basic concepts aren’t completely foreign to you.

Context: This is the first project in a 2 project series where we learn about containers, and one of the most popular container-based solutions, Docker.

Scope: Docker, unix, Python

Learning Objectives
  • Understand the various components involved with containers: Dockerfile/build file, container image, container registry, etc.

  • Understand how to push and pull images to and from a container registry.

  • Understand the basic Dockerfile instructions.

  • Understand how to build a container image.

  • Understand how to run a container image.

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Questions

Question 1

First thing first. Please read this fantastic article for a great introduction to containers. Afterwards, please review the content we have available here.

In this project, we have a special challenge. Brown does not have Docker installed. This is due to a variety of reasons. Brown does have a tool called Singularity installed, however, it is different enough from more common containerization tools, that it does not make sense to learn for your first "container" experience.

To solve this issue, we’ve created a virtual machine that runs Ubuntu, and has Docker pre-installed and configured for you to use. To be clear, the majority of this project will revolve around the command line from within Jupyter Lab. We will specifically state the "deliverables" which will mainly be text or images that are copied and pasted in Markdown cells.

Please login and launch a Jupyter Lab session. Create a new notebook to put your solutions, and open up a terminal window beside your notebook.

In your terminal, navigate to /depot/datamine/apps/qemu/scripts/. You should find 4 scripts. They perform the following operations, respectively.

  1. Copies our VM image from /depot/datamine/apps/qemu/images/ to /scratch/brown/$USER/, so you each get to work on your own (virtual) machine.

  2. Creates a SLURM job and provides you a shell to that job. The job will last 4 hours, provide you with 4 cores, and will have ~6GB of RAM.

  3. Runs the virtual machine in the background, in your SLURM job.

  4. SSH’s into the virtual machine.

Run the scripts in your Terminal, in order, from 1-4.

cd /depot/datamine/apps/qemu/scripts/
./1_copy_vm.sh
./2_grab_a_node.sh
./3_run_a_vm.sh

You may need to press enter to free up the command line.

./4_connect_to_vm.sh

You will eventually be asked for a password. Enter thedatamine.

Remember, to add an image or screenshot to a markdown cell, you can use the following syntax:

![](/home/kamstut/my_image.png)
Items to submit
  • A screenshot of your terminal window after running the 4 scripts.

Question 2

Awesome! Your terminal is now connected to an instance of Ubuntu with Docker already installed and configured for you! Now, let’s get to work.

First thing is first. Let’s test out pulling an image from the Docker Hub. wernight/funbox is a fun image to do some wacky things on a command line. Pull the image (hub.docker.com/r/wernight/funbox), and verify that the image is available on your system using docker images.

Run the following to get an ascii aquarium.

docker run -it wernight/funbox asciiquarium

Wow! That is wild! You can run this program on any system where an OCI compliant runtime exists — very cool!

To quit the program, press kdb:[Ctrl + c].

For this question, submit a screenshot of the running asciiquarium program.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 3

Okay, that was fun, but let’s do something a little bit more practical. Check out the ~/projects/whin directory in your VM. You should pretty quickly realize that this is our version of the WHIN API that we used earlier on in project (8).

If you recall, we had a lot of "extra" steps we had to take in order to run the API. We had to:

  • Install the Python dependencies.

  • Activate the appropriate Python environment.

  • Set the DATABASE_PATH environment variable.

  • Remember some long and complicated command.

This is a fantastic example of when containerizing your app could be a great idea!

Let’s begin by writing our own Dockerfile.

First thing is first. We want our image to contain the correct version of Python for our app. Our app requires at least Python 3.9. Let’s see if we can find a base image that has Python 3.9 or later. Google "Python docker image" and you will find the following link: hub.docker.com/_/python

Here, we will find a wide variety of different "official" Python docker images. A great place to start. If you click on the "Tags" tab, you will be able to scroll through a wide variety of different versions of Python + operating systems. A great Linux distribution is Debian.

Fun fact: Debian/the Debian project (one of the, if not the most popular linux distribution) was founded by a Purdue alum, Ian Murdock.

Okay, let’s go for the Python 3.9.9 + Bullseye (Debian) image. The tag for the image is python:3.9.9-bullseye. But wait a second. If you look at the space required for the base image — it is already up to 370 or so MB — that is quite a bit! Maybe there is a lighter weight option? If you search for "slim" you will find an image with the tag python:3.9.9-slim-bullseye that takes up only 45 MB by default — much better.

Create a file called Dockerfile in the ~/projects/whin directory. Use vim/emacs/nano to edit the file to look like this:

Dockerfile
FROM python:3.9.9-slim-bullseye

Now, let’s build our image.

docker build -t whin:0.0.1 .

Once created, you should be able to view your image by running the following.

docker images

Now, let’s run our image. After running docker images, if you look under the IMAGE column, you should see an id for you image — something like 3dk35bdl. To run your image, do the following.

docker run -dit 3dk35bdl

Be sure to replace 3dk35bdl with the id of your image. Great! Your image should now be running. Find out by running the following.

docker ps

Under the NAMES column, you will see the name of your running container — very cool! How does this test out anything? Don’t we want to see if we have Python 3.9 running like we want it to? Yes! Let’s get a bash shell inside our container. To do so run the following.

docker exec -it suspicious_lumiere /bin/bash

Replace suspicious_lumiere with the name of your container. You should now be in a bash shell. Awesome! Run the following to see what version of Python we have installed.

python --version
Output
Python 3.9.9

Awesome! So far so good! To exit the container, type and run exit. Take a screenshot of your terminal after following these steps and add it to your notebook in a markdown cell.

To clean up and stop the container, run the following.

docker stop suspicious_lumiere
Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 4

Okay, great! We have version 0.0.1 of our whin image. Great.

Now let’s make this thing useful. Use vim/emacs/nano to edit the ~/projects/whin/Dockerfile to look like this:

Dockerfile
FROM python:3.9.9-slim-bullseye

WORKDIR /app

RUN python -m pip install fastapi[all] pandas aiosql fastapi-responses cyksuid httpie

COPY . .

EXPOSE 21650

CMD ["uvicorn", "app.main:app", "--reload", "--port", "21650", "--host", "0.0.0.0"]

Here, do your best to explain what each line of code does. Build version 0.0.2 of your image, and run it.

Okay, in theory, that last line should run our API — awesome! Let’s check the logs to see if it is working.

docker logs my_container_name

Remember, to get your container name, run docker ps and look under the NAME column.

What you should get is a Python error! Something about NoneType. Whoops! We forgot to include the DATABASE_PATH environment variable so our API knows where our WHIN database is. That is critical to our API.

This command will be very useful to achieve this!

Modify our Dockerfile to include the DATABASE_PATH environment variable with a value /home/tdm-user/projects/whin/whin.db. Rebuild your image (as version 0.0.2), and run it. Check the logs again, does it appear to be working?

Items to submit
  • The fixed Dockerfile contents in a markdown cell as code (surrounded by 3 backticks).

  • A screenshot (or more) of the terminal output from running the various commands.

Question 5

Okay, there is one step left. Let’s see if the API is really fully working by making a request to it. First, get a shell to the running container.

docker exec -it container_name /bin/bash

Remember, to get your container_name list the running containers using docker ps.

One inside the container, let’s make a request to the API that is running. Run the following:

python -m httpie localhost:21650

If all is well you should get:

Output
HTTP/1.1 200 OK
content-length: 25
content-type: application/json
date: Thu, 18 Nov 2021 20:28:47 GMT
server: uvicorn

{
    "message": "Hello World"
}

Awesome! You can see our API is definitely working, cool!

Okay, one final test. Let’s exit the container and make a request to the API again. After all, it wouldn’t be that useful if we had to essentially login to a container when we want to access an API running in that container, would it?

http localhost:21650

Uh oh! Although our API is running smoothly inside of the container, we have no way of accessing it outside of the container. Remember, EXPOSE only signals that we want to expose that port, it doesn’t actually do that for us. No worries, this can be easily fixed.

docker run -dit -p 21650:21650 --name my_container_name 3kdgj024jn

Here, we named the resulting container my_container_name. This is a cool trick if you get tired of running docker ps to get the name of a newly running container.

Where 3kdgj024jn is the id of your image. Now, let’s try and access the API again.

http localhost:21650

Voila! It works! The following is an equivalent run statement:

docker run -dit -p 21650 --name my_container_name 3kdgj024jn

However, if you want to specify that the API internally is using port 21650, but we want to expose the API running inside our container to outside our container on a different port, say, port 5555, we could run the following.

docker run -dit -p 5555:21650 --name my_container_name 3kdgj024jn

Then, you could access the API by running the following:

http localhost:5555

While our request goes to port 5555, once the request hits the container, it is routed to port 21650 inside the container, which is where our API is running. This can be confusing a may take some experimentation until you are comfortable with it.

Items to submit
  • Screenshot(s) showing the input and output from the terminal.

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.