git is a powerful version control system originally written by Linus Torvalds, the creator of the Linux kernel.
git has a reputation for being complicated and confusing. While there may be some truth to this, git is used extensively in industry and is relied upon by many of the applications being developed today.
While there are certainly advantages to understanding git in detail, doing so is unnecessary for many projects. It is possible to get the "gist" of git by learning some basic terminology, a few workflows, and how git fits into a typical data science or software project. That is our goal for this book: to give you the smallest amount of information that lets you incorporate git into your own project, or feel comfortable using git in an already established project. If you want to take a deeper dive into git, check out the resources section below.
Although git is often treated as synonymous with GitHub, they are distinct things. git is a piece of software that runs on your computer, that is, on your local system. You can use git without using GitHub (or another host like GitLab).
git works like other familiar command-line programs such as sed: you can read about its commands and options in its man (short for manual) pages.
Microsoft’s GitHub, on the other hand, is an online platform for software development and version control. git is a complicated tool, and platforms like GitHub aim to make using git as pain-free as possible. Creating a GitHub account is free and easy. Competing platforms include GitLab, sourcehut, Gitea, and Bitbucket. Each platform has its own advantages and disadvantages, but they all serve largely the same purpose.
Set up your user name and email.
git config --global user.name "John Smith"
git config --global user.email "[email protected]"
Set up your default text editor.
git config --global core.editor vim
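To check that the settings took effect, you can read them back with git config. The sketch below is meant to be tried in a throwaway shell. As an assumption for the demo only, it first points HOME (and XDG_CONFIG_HOME) at a temporary directory, so --global writes to a scratch ~/.gitconfig rather than your real one. Leave out the two export lines to configure your actual account. The email address is a hypothetical placeholder.

```shell
# Demo sandbox (assumption): make --global write to a temporary ~/.gitconfig.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"

git config --global user.name "John Smith"
git config --global user.email "john.smith@example.com"   # hypothetical address
git config --global core.editor vim

# Read a single value back (prints: John Smith).
git config --global user.name

# List every setting stored in the global configuration.
git config --global --list

# The global configuration is just a plain-text file.
cat "$HOME/.gitconfig"
```

Because the global configuration is an ordinary text file, you can also edit ~/.gitconfig by hand.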
At this stage, if you were to commit to a project, there would be no way to verify that you are really John Smith; anybody could claim to be John Smith. Online document-signing applications let you prove your identity. In the same way, you can create a GPG key, upload it to GitHub, and automatically sign your commits so that others can verify the commits came from you. To do so, follow the steps below.
On macOS, install GnuPG with Homebrew by running brew install gpg2.
Install GPGTools from the GPGTools website.
Open a terminal and type the following.
gpg --full-generate-key --expert
Select ECC (sign only) at the first prompt and Curve 25519 at the second. Choose how many years you’d like your key to be valid for, and enter the information as you are prompted.
If you want your commits to be signed automatically when using GitHub Desktop, it is recommended that you not use a passphrase. Otherwise, you will need to run the following in a terminal before you can commit to the project.
When complete, you can print the public key by running the following.
gpg --export -a "John Smith"
Make sure you replace "John Smith" with the user name you provided when creating the key.
Copy the public key to your clipboard, navigate to github.com, and sign in. Click on your profile in the upper right-hand corner of the screen and navigate to Settings. On the left-hand menu, click SSH and GPG keys and then New GPG key. Paste your public key in the provided text area and click Add GPG key.
Lastly, to sign commits with the newly created key, open $HOME/.gitconfig in a text editor and modify it to use your key:
[user]
    name = John Smith
    email = [email protected]
    signingkey = ABCDEFGHIJKLMNOP
[gpg]
    # or another path to your gpg executable (find it with: command -v gpg)
    program = /usr/local/bin/gpg
[commit]
    gpgsign = true
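If you'd rather not edit the file by hand, git config can write the same signing settings. This is a sketch with some assumptions:

- ABCDEFGHIJKLMNOP is the placeholder key ID from the example above.
- /usr/local/bin/gpg is an assumed path; use the output of command -v gpg on your machine.
- The two export lines sandbox the demo in a temporary HOME. Drop them to change your real configuration.

```shell
# Demo sandbox (assumption): make --global write to a temporary ~/.gitconfig.
export HOME="$(mktemp -d)"
export XDG_CONFIG_HOME="$HOME/.config"

# Equivalent to the signingkey, [gpg], and [commit] entries shown above.
git config --global user.signingkey ABCDEFGHIJKLMNOP   # placeholder key ID
git config --global gpg.program /usr/local/bin/gpg     # assumed path to gpg
git config --global commit.gpgsign true                # sign every commit by default

# Show the resulting file.
cat "$HOME/.gitconfig"
```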
To get your signing key, run the following.
gpg --list-secret-keys --keyid-format=long
Your signing key is the 16-character value following
You can think of a repository (repo) as a version-controlled directory for one or more projects. A repo contains all of the project’s files, code, documentation, etc., along with the project’s entire revision history. A single repo that contains the code and files for many projects is sometimes referred to as a monorepo. Repos are typically either public or private:
Public repos are open to anyone who can access the website.
Private repos are open only to those who have been explicitly given permission to access the repo.
When looking at a project on GitHub, it is obvious what a repo is, but what about on your own computer? What makes a folder a repo? Where are all of the version control components stored? The answer is the hidden .git folder in your project directory. In the example below, my_project is a repo, with all of its commits, remote addresses, and so on stored in the .git folder. If you were to remove the .git folder, my_project would no longer be a repository, just a normal directory.
my_project
├── .git
│   ├── HEAD
│   ├── config
│   ├── description
│   ├── hooks
│   │   └── README.sample
│   ├── info
│   │   └── exclude
│   ├── objects
│   │   ├── info
│   │   └── pack
│   └── refs
│       ├── heads
│       └── tags
├── .gitignore
├── Cargo.toml
├── LICENSE
├── README.md
├── docs
├── scripts
├── src
│   └── main.rs
└── tests

13 directories, 10 files
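You can watch a directory become a repository, and stop being one, with a few commands. The sketch below uses a hypothetical my_project in a temporary directory. It assumes the temporary directory is not itself inside another git repository; otherwise git would find that outer repo instead.

```shell
demo="$(mktemp -d)"
mkdir "$demo/my_project"
cd "$demo/my_project"

# git init creates the hidden .git folder.
git init --quiet
ls -A                                      # .git

# Inside the folder, git recognizes a work tree (prints: true).
before="$(git rev-parse --is-inside-work-tree)"
echo "$before"

# Deleting .git discards the commits, branches, and remote settings with it.
rm -rf .git
if git rev-parse --is-inside-work-tree >/dev/null 2>&1; then
  after="repository"
else
  after="plain directory"
fi
echo "$after"                              # plain directory
```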
Creating a new repository from an existing project directory and publishing it to GitHub takes only a few commands.
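cd my_project                                                      # (1)
git init                                                           # (2)
git add .                                                          # stage the existing files
git commit -m "Initial commit"                                     # a branch needs a commit before it can be pushed
git remote add origin git@github.com:exampleuser/my_project.git    # (3)
git branch -M main                                                 # (4)
git push -u origin main                                            # (5)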
(1) Navigate to the root of the project directory.
(2) Initialize the repository. This is the command that creates the .git folder.
(3) Essentially links the local repo (on your computer) to the remote repo (on GitHub). When we run commands like
(4) Renames the current branch to main; -M forces the rename even if a branch named main already exists.
(5) This command sets the upstream branch for the
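The sequence above can be rehearsed without a GitHub account. In this sketch a local bare repository (remote.git, a hypothetical stand-in) plays the role of GitHub, so no network access is needed. The user name and email are set only for this demo repository, and the address is a placeholder. The sketch commits before pushing, because git cannot push a branch that has no commits yet.

```shell
demo="$(mktemp -d)"

# Hypothetical stand-in for GitHub: a local bare repository.
git init --quiet --bare "$demo/remote.git"

# An existing project directory that is not yet a repository.
mkdir "$demo/my_project"
cd "$demo/my_project"
echo "# my_project" > README.md

git init --quiet
git config user.name "John Smith"               # repo-local identity for this demo
git config user.email "john.smith@example.com"  # hypothetical address
git add .
git commit --quiet -m "Initial commit"
git branch -M main
git remote add origin "$demo/remote.git"
git push --quiet -u origin main

# The remote's main branch now points at the local commit.
git ls-remote origin refs/heads/main
git rev-parse HEAD
```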
The term "clone" is typically heard in reference to "cloning a repo." Cloning a repo is the act of downloading and copying a repository to your local machine, usually from a hosting platform like GitHub.
To clone a GitHub repo, you need read access to the repository. If you’ve set up git to use SSH keys, you can clone a repository as follows.
git clone git@github.com:TheDataMine/the-examples-book.git
If you’ve set up git to use HTTPS with a credential helper, you can clone a repository as follows.
git clone https://github.com/TheDataMine/the-examples-book.git
Both commands copy the entire repository, including the .git folder and its full history, into a new directory named the-examples-book inside your current working directory.
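Cloning works the same way from any location git understands, including a local path. The sketch below needs no network access. It builds a small bare repository named the-examples-book.git to stand in for the GitHub one; this is a hypothetical stand-in containing a single empty commit. It then clones that repository. Notice that git names the new directory after the repository and drops the .git suffix.

```shell
demo="$(mktemp -d)"
cd "$demo"

# Hypothetical upstream: seed one commit, then make a bare copy of it.
git init --quiet seed
git -C seed -c user.name="Alice" -c user.email="alice@example.com" \
    commit --quiet --allow-empty -m "First commit"
git clone --quiet --bare seed the-examples-book.git

# Clone it; git creates ./the-examples-book.
git clone --quiet "$demo/the-examples-book.git"
ls -A the-examples-book                 # includes the .git folder
git -C the-examples-book log --oneline  # the full history came along
```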
New files added to a repo are not tracked automatically. If you modify an untracked file, those changes are not recorded in the .git folder. If you modify a tracked file, git notices the changes once they are saved to disk and marks the file as modified. However, the changes are not staged automatically: you must add them to the staging area before they can be committed.
git add adds a file or folder to the staging area and, if it was untracked, starts tracking it. To add a new file to the staging area, run the following.
git add my_file.txt
To stage everything in the current directory, run the following. When run from the root of the repo, this stages the whole project.
git add .
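git status --porcelain prints a two-letter code before each path: the left column describes the staging area and the right column describes your working files. Watching the codes change is a quick way to see the untracked, staged, and modified states. The sketch below uses a hypothetical my_file.txt in a temporary repository.

```shell
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "John Smith"
git config user.email "john.smith@example.com"   # hypothetical address

echo "hello" > my_file.txt
untracked="$(git status --porcelain)"            # "?? my_file.txt": untracked

git add my_file.txt
staged_new="$(git status --porcelain)"           # "A  my_file.txt": new file, staged

git commit --quiet -m "Add my_file.txt"
echo "more" >> my_file.txt
modified="$(git status --porcelain)"             # " M my_file.txt": modified, NOT staged

git add my_file.txt
staged_mod="$(git status --porcelain)"           # "M  my_file.txt": modified and staged

printf '%s\n' "$untracked" "$staged_new" "$modified" "$staged_mod"
```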
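git add respects the .gitignore file in the root of the repo. A .gitignore is a specially named file containing one pattern per line; each pattern tells git which files to ignore and not track. A common example of a file that should not be tracked is a .env file containing sensitive credentials. Note that .gitignore only affects untracked files: a file that is already tracked stays tracked even if it matches a pattern.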
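Here is a minimal sketch of .gitignore in action, using hypothetical files: a .env with a dummy secret and an app.py. After git add ., only the files that are not ignored end up in the index. git check-ignore -v reports which rule matched.

```shell
demo="$(mktemp -d)"
cd "$demo"
git init --quiet

printf '.env\n' > .gitignore                    # ignore any file named .env
echo "API_KEY=not-a-real-secret" > .env         # hypothetical credentials file
echo "print('hi')" > app.py

git add .
staged="$(git ls-files)"                        # files now in the index
echo "$staged"                                  # .gitignore and app.py, but not .env

# Explain why .env was skipped (prints the .gitignore file, line, and pattern).
git check-ignore -v .env
```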
A commit is a single unit of change, which may touch a single file or several files. Commits allow users to track changes made to the project over time. Ideally, each commit is accompanied by a succinct message describing what changed and why.
To commit a change to the local repository, modify the file or files and save them to disk as you normally would. If the files are tracked, git will "see" the changes and mark the file(s) as modified. Then stage the changes and commit them.
git add my_file.txt
git commit -m "My succinct commit message."
To stage every modified tracked file and commit in one step, you can use git commit -a -m "My succinct commit message." instead.
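The sketch below shows why the staging step matters. It uses a hypothetical notes.txt in a temporary repository. After notes.txt is edited, a plain git commit -m refuses to run because nothing is staged; once the change is added, the commit goes through.

```shell
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "John Smith"
git config user.email "john.smith@example.com"   # hypothetical address

echo "v1" > notes.txt
git add notes.txt
git commit --quiet -m "Add notes.txt"

echo "v2" > notes.txt
# notes.txt is tracked and modified, but nothing is staged,
# so git commit exits with an error instead of creating a commit.
if git commit --quiet -m "Update notes.txt" >/dev/null 2>&1; then
  echo "committed"
else
  echo "nothing staged yet"
fi

# Stage the change, then commit it.
git add notes.txt
git commit --quiet -m "Update notes.txt"
git log --oneline
```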
To list the differences between the staged changes and the most recent commit, run the following.
git pull "pulls down" the changes made to the remote repo into your local repo. By default, it fetches the new commits and then merges them into your current branch. For example, say Alice and Bob are working on a project together. Alice makes a change to the project and pushes all of her changes to GitHub. Bob wants his local repo to be up to date, so he runs git pull. Assuming Bob hasn’t made any conflicting changes locally, Alice’s changes are merged into Bob’s local repo.
To use git pull, your current working directory must be inside the local repo.
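The Alice-and-Bob scenario can be reproduced on a single machine. As an assumption for the demo:

- A local bare repository (shared.git) plays the role of GitHub.
- alice and bob are two clones of it.
- The names and email addresses are hypothetical.

Because Bob has no local commits, his pull is a simple fast-forward.

```shell
demo="$(mktemp -d)"
cd "$demo"

# Hypothetical setup: a bare repo stands in for GitHub, seeded with one commit.
git init --quiet seed
git -C seed -c user.name="Alice" -c user.email="alice@example.com" \
    commit --quiet --allow-empty -m "Initial commit"
git -C seed branch -M main
git clone --quiet --bare seed shared.git

# Alice and Bob each have their own clone.
git clone --quiet shared.git alice
git clone --quiet shared.git bob

# Alice commits a change and pushes it to the shared remote.
cd alice
git config user.name "Alice"
git config user.email "alice@example.com"
echo "Alice's change" > alice.txt
git add alice.txt
git commit --quiet -m "Add alice.txt"
git push --quiet

# Bob pulls: git fetches Alice's commit and brings his main branch up to date.
cd ../bob
git pull --quiet
ls            # alice.txt is now present
```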
git push is the opposite of git pull: it takes your local commits and updates the remote repo so the rest of the team can work with the latest and greatest.
To use git push, your current working directory must be inside the local repo.
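This sketch makes the effect of a push visible. A local bare repository (shared.git) stands in for GitHub, and Bob's name and email are hypothetical. Before pushing, git rev-list counts how many local commits the remote has not seen. After the push, that count drops to zero and the remote's main branch matches Bob's.

```shell
demo="$(mktemp -d)"
cd "$demo"

# Hypothetical setup: a bare repo stands in for GitHub, seeded with one commit.
git init --quiet seed
git -C seed -c user.name="Bob" -c user.email="bob@example.com" \
    commit --quiet --allow-empty -m "Initial commit"
git -C seed branch -M main
git clone --quiet --bare seed shared.git
git clone --quiet shared.git bob

cd bob
git config user.name "Bob"
git config user.email "bob@example.com"
echo "Bob's change" > bob.txt
git add bob.txt
git commit --quiet -m "Add bob.txt"

# Commits on local main that the remote does not have yet.
ahead_before="$(git rev-list --count origin/main..main)"   # 1
git push --quiet
ahead_after="$(git rev-list --count origin/main..main)"    # 0
echo "$ahead_before -> $ahead_after"
```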
Conceptually, a branch is a copy of the repository within the repository. Under the hood, git stores a branch as a lightweight pointer to a commit, which is why branches are cheap to create. Branches provide a logical separation from the live version of the project (usually the main branch, or master in older repositories), giving you the freedom to work without fear of messing something up. Typically your default branch is named main. You can create as many branches as you want within a repository, and switch between them using git checkout. A new branch starts as a copy of an existing branch; often this will be the main branch.
One common use of branches is the so-called "feature" branch: a branch created specifically to develop a feature, which can later be merged into the main branch.
To create a new branch called my-branch, first check out the branch you’d like to branch off from, for example, main.
git checkout main
You can confirm which branch is checked out by running git branch; the current branch is marked with an asterisk.
Next, create the branch.
git branch my-branch
Once the branch is created, you can switch to it.
git checkout my-branch
It is very common to create a new branch and immediately switch to it. To do both in one step, run the following.
git checkout -b my-new-branch
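The commands above can be tried end to end in a scratch repository. In the sketch below, the repository, identity, and initial empty commit are hypothetical setup. Because nothing has been committed on the new branches yet, main, my-branch, and my-new-branch all point at the same commit. That is a small demonstration of branches being pointers rather than full copies.

```shell
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "John Smith"
git config user.email "john.smith@example.com"   # hypothetical address
git commit --quiet --allow-empty -m "Initial commit"
git branch -M main

git branch my-branch                     # create, but stay on main
git branch                               # lists main (marked with *) and my-branch
git checkout --quiet my-branch           # switch to it
git checkout --quiet -b my-new-branch    # create from my-branch and switch in one step

git rev-parse --abbrev-ref HEAD          # my-new-branch
```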
git checkout is the command that allows you to switch between different branches. To switch to a branch called "my-branch" simply run the following.
git checkout my-branch
Upon switching to my-branch, the files and folders in your working directory change to match the files on that branch. If my-branch has a drastically different file and folder structure than my-other-branch, files and folders will appear and disappear on your local machine as you switch between the two.
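Here is a minimal sketch of files appearing and disappearing as you switch branches. A hypothetical feature.txt is committed only on my-branch, so it vanishes from the working directory when you check out main.

```shell
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "John Smith"
git config user.email "john.smith@example.com"   # hypothetical address

echo "shared" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main

# Commit a file that exists only on my-branch.
git checkout --quiet -b my-branch
echo "only on my-branch" > feature.txt
git add feature.txt
git commit --quiet -m "Add feature.txt"
ls                          # README.md  feature.txt

# Back on main, feature.txt is no longer in the working directory.
git checkout --quiet main
ls                          # README.md
```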
Merging is the process of combining the changes and commits from one branch (or fork) into another. Ultimately, all accepted modifications made on other (non-live) branches need to be merged into the live branch.
To merge a branch called my-branch into the main branch, you must first switch to the branch you want to merge into. In this case, that is the main branch.
git checkout main
Then, it is as straightforward as running the merge command.
git merge my-branch
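Here is a sketch of a merge in a scratch repository. Both main and my-branch gain a commit touching different, hypothetical files, so the histories diverge but do not conflict. In that case git records a merge commit with two parents. The --no-edit flag accepts git's default merge message instead of opening your editor.

```shell
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "John Smith"
git config user.email "john.smith@example.com"   # hypothetical address

echo "v1" > README.md
git add README.md
git commit --quiet -m "Add README"
git branch -M main

# Work on a feature branch.
git checkout --quiet -b my-branch
echo "feature" > feature.txt
git add feature.txt
git commit --quiet -m "Add feature.txt"

# Meanwhile, main moves on with an unrelated change.
git checkout --quiet main
echo "docs" > docs.txt
git add docs.txt
git commit --quiet -m "Add docs.txt"

# Merge my-branch into main (we are on main).
git merge --quiet --no-edit my-branch
git log --oneline --graph
```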
When there is a conflict, merging is not so straightforward. See the example of resolving a conflict in the GitHub Desktop section.