bash Overview
ls
ls (think list) is one of the most-used commands in Unix-like operating systems. ls prints the files and subdirectories located in the current working directory or, if the user specifies a directory as an argument, in that directory.
ls is incredibly useful for navigation. When working with a graphical user interface (GUI), like Finder in macOS or File Explorer in Windows, the files and folders in a directory are shown to you each time you navigate to that directory.
When navigating files and folders via a shell, this is not quite as straightforward. When you use cd to navigate to a particular directory, you are not presented with a list of its files and folders. Instead, we use ls to list the files and folders and see what options we have for navigation. For example, let’s say our current working directory is /home/john/, and we want to go to our projects folder (/home/john/projects) to see what projects we have. To navigate to our projects folder, we can use cd.
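As a minimal sketch of that workflow, the commands below build a throwaway directory tree, move into it with cd, and list its contents with ls. The temporary directory and the project1 and project2 folder names are hypothetical stand-ins for /home/john/projects. In a notebook, the same commands would go in a cell that starts with %%bash.

```shell
# Create a scratch directory to stand in for /home/john (hypothetical layout).
demo_home=$(mktemp -d)
mkdir -p "$demo_home/projects/project1" "$demo_home/projects/project2"

# Move into the projects folder, then list what is there.
cd "$demo_home/projects"
listing=$(ls)
echo "$listing"

# ls also accepts a directory argument, so we can look elsewhere without cd.
parent_listing=$(ls "$demo_home")
echo "$parent_listing"
```

The first ls shows the two project folders, and the second shows the projects folder itself, because it was pointed at the parent directory.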
Examples
Find the year range of the flight data in each of the two directories below, and determine which directory has the larger files:
/anvil/projects/tdm/data/flights/subset/
/anvil/projects/tdm/data/flights/
Click to see solution
# in its own cell
%%bash
ls -la /anvil/projects/tdm/data/flights/subset/*.csv
# in its own cell
%%bash
ls -la /anvil/projects/tdm/data/flights/*.csv
The year range for `/anvil/projects/tdm/data/flights/` is 1987-2023. The year range for `/anvil/projects/tdm/data/flights/subset/` is 1987-2008. The files are larger in `/anvil/projects/tdm/data/flights/`.
cut
We can use cut to extract information from a text file. We usually just need to specify the delimiter between the fields of data, using the -d option, and the fields to extract, using the -f option. For example, we can display the city and state of the donations to federal election campaigns.
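The two options can be tried on a small made-up file. The sketch below writes a few comma-separated rows to a temporary file and extracts the second and third fields, which here stand in for city and state. The rows are hypothetical sample data, not the election dataset.

```shell
# Hypothetical sample data with four fields: name, city, state, amount.
sample=$(mktemp)
printf '%s\n' \
  'name,city,state,amount' \
  'Smith,Lafayette,IN,250' \
  'Jones,Chicago,IL,100' > "$sample"

# -d, sets the delimiter to a comma; -f2,3 keeps only fields 2 and 3.
city_state=$(cut -d, -f2,3 "$sample")
echo "$city_state"
```

Because cut works line by line, the header row is processed like any other row and comes out as city,state.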
Examples
Use the cut command to extract all of the origin airports and destination airports from the 1987.csv file in the flights subset directory, and store the resulting origin and destination airports into a file in your home directory.
Click to see solution
%%bash
cut -d, -f17,18 /anvil/projects/tdm/data/flights/subset/1987.csv > $HOME/originsanddestinations.csv
head -n6 $HOME/originsanddestinations.csv
Origin,Dest
SAN,SFO
SAN,SFO
SAN,SFO
SAN,SFO
SAN,SFO
grep
The grep utility searches files for lines that match a regular expression. There are many variants of the grep command.
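A small sketch of the basic usage follows, run against a temporary file of made-up origin and destination pairs (hypothetical data, not the flights files). It prints the matching lines, counts them with -c, and uses the ^ anchor to show that the pattern is treated as a regular expression.

```shell
# Hypothetical sample rows: origin and destination airport codes.
routes=$(mktemp)
printf '%s\n' 'STL,IND' 'SAN,SFO' 'IND,ORD' 'JFK,LAX' > "$routes"

# Print every line that contains the pattern IND anywhere in the line.
matches=$(grep "IND" "$routes")
echo "$matches"

# -c counts the matching lines instead of printing them.
count=$(grep -c "IND" "$routes")
echo "$count"

# ^ anchors the match to the start of the line, so only origins match here.
starts_with=$(grep "^IND" "$routes")
echo "$starts_with"
```

Note that an unanchored pattern such as IND matches wherever those characters appear in a line, not only in one particular field.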
Examples
Use the grep command to find the lines in the 1987.csv file in the flights subset directory that contain the pattern IND. Save all of those lines into a new file in your home directory called indyflights.csv.
Click to see solution
%%bash
grep "IND" /anvil/projects/tdm/data/flights/subset/1987.csv > $HOME/indyflights.csv
head $HOME/indyflights.csv
1987,10,1,4,700,700,804,755,TW,76,NA,64,55,NA,9,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,2,5,700,700,805,755,TW,76,NA,65,55,NA,10,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,3,6,659,700,757,755,TW,76,NA,58,55,NA,2,-1,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,4,7,700,700,756,755,TW,76,NA,56,55,NA,1,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,6,2,702,700,806,755,TW,76,NA,64,55,NA,11,2,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,7,3,700,700,804,755,TW,76,NA,64,55,NA,9,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,8,4,658,700,804,755,TW,76,NA,66,55,NA,9,-2,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,9,5,700,700,805,755,TW,76,NA,65,55,NA,10,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,10,6,700,700,804,755,TW,76,NA,64,55,NA,9,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA
1987,10,11,7,700,700,752,755,TW,76,NA,52,55,NA,-3,0,STL,IND,229,NA,NA,0,NA,0,NA,NA,NA,NA,NA