TDM 20100: Project 2 - Introduction to Bash

Project Objectives

This project introduces you to some of the most useful UNIX tools, helps you navigate the filesystem, and enables you to run UNIX commands directly from within your Jupyter notebook.

Learning Objectives
  • Distinguish between /home, /anvil/scratch, and /anvil/projects/tdm

  • Run Bash commands from within JupyterLab

  • Use man to read and learn about UNIX utilities

  • Navigate the UNIX filesystem

  • Analyze files in the UNIX filesystem

  • Create and delete files and folders in UNIX

Dataset

  • /anvil/projects/tdm/data/icecream/breyers/

  • /anvil/projects/tdm/data/flights/

Ways to run bash

There are three different ways to run bash code in Anvil.
1. In a terminal
2. Using !
3. Magic cell

Terminal

Go to File > New > Terminal
This should open a terminal in a new window.

The terminal is the only one of the three approaches that supports man lookups; the man command won’t work with the other two.

Try to run this code in the terminal:

# man is short for manual. To quit, press "q".
# Use "k" or the up arrow to scroll up, and "j" or the down arrow to scroll down.

man grep

Using '!'

The ! method lets you run a bash command in the same cell as Python code.

For example, in your Jupyter notebook, you can run this code in your cell:

!ls

import pandas as pd
myDF = pd.DataFrame({'col1': [1, 2, 3], 'col2': [4, 5, 6]})
myDF.head()

!echo "Hello World!"

Magic cell

Code cells that start with % or %% are sometimes referred to as magic cells. Any cell that begins with %%bash will run the bash code in that cell.

A bash cell does not know what code another cell has run. For example, if you create a new variable in one cell and then write some new bash code in another cell, the second cell will not recognize the variable from the first.
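
The sketch below illustrates this behavior outside of Jupyter. It assumes that each bash cell behaves like a separate shell process, which is modeled here with two independent bash -c calls. The variable name greeting is made up for the example.

```bash
# First shell: define a variable and use it, then the shell exits.
bash -c 'greeting="hello"; echo "first shell sees: $greeting"'

# Second shell: a brand new process, so the variable no longer exists.
# ${greeting:-<unset>} prints "<unset>" when greeting is empty or undefined.
second=$(bash -c 'echo "second shell sees: ${greeting:-<unset>}"')
echo "$second"
```

If you need a value in several cells, define it again in each cell that uses it.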

To see a list of available magics, run %lsmagic in a cell.

The commands listed in the "line" section are run with a single % and can be mixed with other code.

To answer the project questions, please run your bash code using the %%bash magic in Jupyter notebook. This ensures that we are all using the correct shell (there are many shells) and that your work is displayed properly for your grader.

Before moving on to the questions, you may also want to take a look at this video, which shows different ways of running bash commands on Anvil.

Dr. Ward has some more examples with the ls and cd commands in this video (please use notebook.anvilcloud.rcac.purdue.edu instead of ondemand in your project).

Questions

Question 1 (2 points)

The / is the root directory in a UNIX-like system. You can think of it as the top-level folder that contains all other folders on the system. For example, /home is a folder located within the root directory.

The $HOME variable refers to the absolute path of your personal home directory. Inside the root / directory, there is a folder called home, which contains a subfolder named x-<your-username>. This x-<your-username> folder is your personal home directory.

Let’s explore more by doing some exercises below.

  1. Write a bash command to display both your home directory ($HOME) and your current working directory (pwd). These two directories should be the same. Ensure you run the command in the terminal immediately after opening it, without making any changes to the home directory sidebar.

  2. Write a bash command to change your current directory to /anvil/projects/tdm/data using the cd command.

  3. Run the same command from Step 1 above again.

  4. Explain what you observe in the results from Step 1 and Step 3. Explain the difference between $HOME and pwd.

Relevant topics: home, pwd, cd, echo

Deliverables

1a. Code used to answer Steps 1, 2, and 3
1b. Output from Steps 1 and 3
1c. Written answer for Step 4

Question 2 (2 points)

Relative paths are an important concept to understand, especially when you navigate files and folders in a UNIX-like operating system.

. represents the current directory. You can think of it as "here."

  • cd . means to stay in the current directory

  • ./myscript.sh means to run the myscript.sh file in the current directory

  • mv ./myfile.txt $HOME means to move myfile.txt from the current directory to your home directory

.. represents the parent directory, relative to the rest of the path.

  • cd .. means to move up one directory

  • mv ../myscript.sh ./ means to move the myscript.sh file from the parent directory to the current directory
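
The bullets above can be tried out safely in a throwaway directory. This is only a sketch: the folder names parent and child are hypothetical, and mktemp -d is used so that nothing in your real directories is touched.

```bash
# Build a temporary tree: <sandbox>/parent/child, with a file in parent.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/parent/child"
touch "$sandbox/parent/myscript.sh"

cd "$sandbox/parent/child"
here=$(pwd)

cd .                     # "." is the current directory, so nothing changes
same=$(pwd)

mv ../myscript.sh ./     # move the file from the parent into the current directory
moved=$(ls)

cd ..                    # move up one level, into parent
up=$(pwd)

echo "$here"
echo "$same"
echo "$moved"
echo "$up"

# Clean up the temporary tree.
cd /
rm -r "$sandbox"
```

The first two printed paths are identical, and the last one is one level shorter.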

Let’s explore more by doing some exercises below.

  1. Write a bash command to change your current directory to /anvil/projects/tdm/data/zillow using the cd command.

  2. Run each of the commands individually and print the current working directory for parts a–d. After executing each command, make sure to return to the /anvil/projects/tdm/data/zillow directory. Explain the functionality of each command based on your observations.

    a. cd

    b. cd .

    c. cd ..

    d. cd ../../

    e. ls or ls .

    f. ls -la or ls -la .

    g. ls ../

Relevant topics: pwd, cd, ., .., ls, echo

Deliverables

2a. Code used to answer Steps 1 and 2
2b. Final current working directory for a, b, c, d
2c. Output of e, f, g
2d. Written explanation of what each command does
2e. How does using relative paths benefit you for particular commands like ls? Hint: check your current working directory for g.

Question 3 (2 points)

There’s a quick way to get some information about a file without first reading it in, as you would in R or Python.

Quick Tip: Tab completion is a very handy trick. When you partially type a directory name in the terminal, you can press the Tab key to see all available options, or to autocomplete the name if there’s only one match. Give it a try!

cd /anvil/p # then press the Tab key, then Enter
  1. Go to /anvil/projects/tdm/data/icecream/breyers

  2. Print the first five rows of reviews.csv using head

  3. Print the last five rows of reviews.csv using tail

  4. Print only the column names (first row) of reviews.csv using the -n option

  5. Run wc reviews.csv and identify which parts of the output represent what information

  6. Get the line count only for the given file using the -l option

Relevant topics: head, tail, wc
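
As a warm-up, here is a minimal sketch of these tools on a tiny stand-in file. The file contents are invented for illustration and are not the course data, so the numbers will differ from what you see for reviews.csv.

```bash
# Create a small temporary CSV file (hypothetical data).
tmpfile=$(mktemp)
printf 'id,flavor\n1,vanilla\n2,chocolate\n3,mint chip\n' > "$tmpfile"

first=$(head -n 1 "$tmpfile")   # -n 1 keeps only the first line
last=$(tail -n 2 "$tmpfile")    # -n 2 keeps only the last two lines
lines=$(wc -l < "$tmpfile")     # -l restricts wc to the line count

echo "$first"
echo "$last"
wc "$tmpfile"                   # the default summary that wc reports
echo "line count: $lines"

# Remove the temporary file.
rm "$tmpfile"
```

Reading the file through < makes wc -l print only the number, without the file name.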

Deliverables

3a. The code used to solve all the steps above
3b. The output from Steps 2, 3, 4, 5, and 6
3c. A written explanation for Step 5 (describing the parts of the wc output)

For more practice, please refer to the following video from Dr. Ward, which includes examples with the head, cut, and wc commands (please use notebook.anvilcloud.rcac.purdue.edu to practise). In this video, the cut command is used to extract all of the origin and destination airports from the 1987.csv file in the flights subset directory. The resulting origin and destination airports are stored in a file in the home directory.

Question 4 (2 points)

The following directories have been discussed so far:

  • $HOME or /home/$USER: your home directory

  • /anvil/projects/tdm/: TDM directory

  • /anvil/projects/tdm/data: where public data lives in TDM directory

There’s one more directory you should know about: $SCRATCH or /anvil/scratch/$USER

Run the command below to see your quota and usage (myquota works only from the terminal):

myquota
  1. What are the size limits for your home directory and scratch directory?

  2. Copy the reviews.csv file to your SCRATCH directory using cp

  3. Copy the entire icecream directory to your SCRATCH

  4. Print the list of files and folders in your SCRATCH directory

  5. Delete the copied reviews.csv from your SCRATCH

  6. Delete the copied icecream directory from your SCRATCH

  7. Print the list of files and folders of your SCRATCH directory again

Relevant topics: cp, rm, rmdir
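
The sketch below shows how cp, rm, and rmdir behave on files and directories. It uses two temporary directories as stand-ins, so the names src, dest, flavors, and notes.txt are assumptions for illustration and not part of the project data.

```bash
src=$(mktemp -d)     # stand-in for a data directory
dest=$(mktemp -d)    # stand-in for a scratch directory
mkdir "$src/flavors"
echo "vanilla" > "$src/flavors/notes.txt"

cp "$src/flavors/notes.txt" "$dest"   # copy a single file
cp -r "$src/flavors" "$dest"          # -r copies a directory and its contents
after_copy=$(ls "$dest")
echo "$after_copy"

rm "$dest/notes.txt"                  # remove a single file
# rmdir only removes empty directories, so this attempt fails.
rmdir "$dest/flavors" 2>/dev/null || echo "rmdir refused: directory is not empty"
rm -r "$dest/flavors"                 # -r removes a directory and its contents
after_delete=$(ls "$dest")
echo "remaining entries: ${after_delete:-none}"

# Clean up both temporary directories.
rm -r "$src" "$dest"
```

Be careful with rm -r: it deletes without asking and there is no trash folder to recover from.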

Dr. Ward shows how to move some large files in the following video. You can compare your SCRATCH directory space (myquota works only from the terminal) with what Dr. Ward says in the video. Is it the same?

Also, there are additional examples with rmdir and mkdir in Dr. Ward's video below for extra practice. The video also demonstrates the use of the grep command. No worries, next week's project will cover the grep command in detail.

Deliverables

4a. Written answer for Step 1 (size limits for home and scratch directories)
4b. Code used to solve Steps 1 through 7
4c. Output from Steps 1, 4, 7

Question 5 (2 points)

  1. Create a new directory called mydinner in your home directory

  2. Inside the mydinner directory, create the following files using the touch command:

    1. spaghetti.txt

    2. bread.txt

    3. broccoli.txt

    4. smoothie.txt

    5. tiramisu.txt

    6. Optional: Feel free to create additional files for other dinner items you enjoy

  3. Display the contents of the mydinner directory using ls

  4. Edit each of the files to include the following ingredients:

    1. spaghetti.txt: noodle, tomato sauce

    2. bread.txt: bread, garlic, butter, cheese

    3. broccoli.txt: broccoli, salt, pepper

    4. smoothie.txt: strawberry, banana, milk

    5. tiramisu.txt: top-secret tiramisu recipe from granny

    6. Optional: Add ingredients to any additional files you created

  5. Use the cat command to print the contents of each file

  6. Move the mydinner directory to SCRATCH and rename it to mybreakfast

  7. Display the contents of the SCRATCH directory

  8. Delete the mybreakfast directory

Relevant topics: mkdir, touch, cat, vi, echo, >>
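
Here is a small sketch of the relevant commands working together, run in a temporary directory. The names mylunch, mysnack, and salad.txt are made up for the example and are deliberately different from the ones in the question.

```bash
work=$(mktemp -d)                            # throwaway working directory
mkdir "$work/mylunch"
touch "$work/mylunch/salad.txt"              # create an empty file

echo "lettuce" > "$work/mylunch/salad.txt"   # > overwrites the file
echo "tomato" >> "$work/mylunch/salad.txt"   # >> appends a new line
contents=$(cat "$work/mylunch/salad.txt")
echo "$contents"

mv "$work/mylunch" "$work/mysnack"           # mv can move and rename in one step
listing=$(ls "$work")
echo "$listing"

# Clean up the temporary directory.
rm -r "$work"
```

Using echo with >> is one way to add text without opening an editor such as vi.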

Deliverables

5a. Code used to solve all the steps above
5b. Output from Steps 3, 5, and 7

Submitting your Work

Once you have completed the questions, save your Jupyter notebook. You can then download the notebook and submit it to Gradescope.

Items to submit
  • firstname_lastname_project2.ipynb

You must double check your .ipynb after submitting it in Gradescope. A very common mistake is to assume that your .ipynb file has been rendered properly and contains your code, markdown, and code output even though it may not. Please take the time to double check your work. See here for instructions on how to double check this.

You will not receive full credit if your .ipynb file does not contain all of the information you expect it to, or if it does not render properly in Gradescope. Please ask a TA if you need help with this.