1. Basics of Querying
Concepts
| Command | Description | Example | Example Description |
|---|---|---|---|
|
Used to select data from a database. |
|
Selects all columns from the table called |
|
Used to select unique values from one or more columns. Duplicate rows (for the given columns) will not be selected. |
|
Selects all rows that are not duplicate combinations fro columns 1 and 2 from the table called |
|
Used to specify the number of records to return. |
|
Selects all columns from the table called |
Aliasing
Aliasing is the process of giving a table or a table column a temporary name. Aliases are commonly used to either make the query easier to write, or more readable.
Note that aliases only last for the duration of a single query. If we were to run the previous query, and subsequently run the following query, it would fail.
Demo
|
Fulya Add Aliasing examples below the-examples-book.com/tools/sql/aliasing |
For today’s practices, please select the seminar kernel. To run SQL queries in a JupyterLab notebook, first run the following in a cell at the top of your notebook to establish a connection with the database:
%sql sqlite:////anvil/projects/tdm/data/movies_and_tv/imdb.db
|
In sqlite, you can show the tables using the following query:
Unfortunately, sqlite-specific functions can’t be run in a JupyterLab cell like that. Instead, we need to use a different query as follows:
|
See first five lines of people table and titles:
SELECT * FROM people LIMIT 5;
SELECT * FROM titles LIMIT 5;
It is a good idea to run this query on each table in the dataset to identify the common keys that can be used to connect the tables in SQL. It is also helpful to limit the number of rows you display (LIMIT), viewing an entire table at once is not practical.
|
In SQL and certain other languages like bash or batch, the asterisk (*) acts as a wildcard that represents "everything". For instance, the command |
Now, let’s check how many records are in the people table:
SELECT COUNT(*) FROM people;
As you can see, there are more than 11 million records, so displaying the full table would not be feasible.
Let’s take a closer look at the titles table by running the following query:
SELECT * FROM titles LIMIT 5;
As you can see, each row includes a title_id that corresponds to the title of a movie, TV show, or another type of media. But what exactly is this title_id? Take a look at the following link:
Practice on your own
Now, you’ll practice the concepts above on your own with the Lahman Baseball data.
%sql sqlite:////anvil/projects/tdm/data/lahman/lahman.db
1.1. Load the SQL database into a queryable SQLite database.
1.2. How many tables are in the database?
1.3. List the names of all the tables in the database.
1.4. Write a SQL query to select all columns from the People table, limiting the results to 5 records.
Now focus on the People table.
1.5. How many columns are in the People table?
1.6. What is the name of the column that contains the player’s first name?
1.7. For a players deathYear, what is put in the column if the player is still alive?
1.8. What is Don Aase’s playerID?