bash Overview

ls

ls (think list) is one of the most-used commands in Unix-like operating systems. ls prints the files and sub-directories located in the current working directory, or, in the directory specified as an argument by the user.

ls is incredibly useful for navigation. When working with a graphical user interface (GUI), like Finder in MacOS or File Explorer in Windows, the files and folders in a directory are shown to you each time you navigate to and from a specific directory.

When navigating files and folders via a shell, this is not quite as straightforward. When using cd to navigate to a particular directory, you will not have a nice list of files and folders presented to you upon navigating to said directory. Instead, we use ls to list the files and folders, and see what options we have for navigation. For example, let’s say our current working directory is /home/john/, and we want to go to our projects folder (/home/john/projects) to see what projects we have. To navigate to our projects folder, we can use cd.

Examples

The year range of flight data in the two directories below, and which directory has bigger file sizes:

/anvil/projects/tdm/data/flights/subset/
/anvil/projects/tdm/data/flights/

Click to see solution
# in its own cell
%%bash
ls -la /anvil/projects/tdm/data/flights/subset/*.csv

# in its own cell
ls -la /anvil/projects/tdm/data/flights/*.csv
The year range for `/anvil/projects/tdm/data/flights/`is 1987-2023.

The year range for `/anvil/projects/tdm/data/flights/subset/` is 1987-2008.

The files are larger in `/anvil/projects/tdm/data/flights/`

cut

We can use cut to extract information from a text file. We usually just need to specify the delimited between the fields of data, using the -d option, and we also need to specify the fields to extract, using the -f option. For example, we can display the city and state of the donations to federal election campaigns.

Examples

Use the cut command to extract all of the origin airports and destination airports from the 1987.csv file in the flights subset directory, and store the resulting origin and destination airports into a file in your home directory.

Click to see solution
%%bash
cut -d, -f17,18 /anvil/projects/tdm/data/flights/subset/1987.csv > $HOME/originsanddestinations.csv

head -n6 $HOME/originsanddestinations.csv
Origin,Dest
SAN,SFO
SAN,SFO
SAN,SFO
SAN,SFO
SAN,SFO

grep

The grep utility is used for searching for regular expressions in files. There are many variants of the grep command.

Examples

Use the grep command to find data in the 1987.csv file in the flights subset directory that contain the pattern IND. Save all of the lines of that 1987.csv file into a new file in your home directory called indyflights.csv.

Click to see solution
%%bash
grep "IND" /anvil/projects/tdm/data/flights/subset/1987.csv > $HOME/indyflights.csv

head $HOME/indyflights.csv
1987,10,1,4,700,700,804,755,TW,76,NA,64,55,NA,9,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,2,5,700,700,805,755,TW,76,NA,65,55,NA,10,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,3,6,659,700,757,755,TW,76,NA,58,55,NA,2,-1,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,4,7,700,700,756,755,TW,76,NA,56,55,NA,1,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,6,2,702,700,806,755,TW,76,NA,64,55,NA,11,2,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,7,3,700,700,804,755,TW,76,NA,64,55,NA,9,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,8,4,658,700,804,755,TW,76,NA,66,55,NA,9,-2,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,9,5,700,700,805,755,TW,76,NA,65,55,NA,10,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,10,6,700,700,804,755,TW,76,NA,64,55,NA,9,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,11,7,700,700,752,755,TW,76,NA,52,55,NA,-3,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA