TDM 20100: Project 8 - SQL Introduction 4
Project Objectives
In this project, we will be copying the Lahman database to your local storage in anvil so that you are able to modify it, and learn core SQL concepts that involve writing and updating database tables. We will also see how these concepts apply to an alternate dataset.
Dataset
-
/anvil/projects/tdm/data/lahman/lahman.db
-
/anvil/projects/tdm/data/chinook/chinook.db (Chinook Music Database)
If AI is used in any cases, such as for debugging, research, etc., we now require that you submit a link to the entire chat history. For example, if you used ChatGPT, there is an “Share” option in the conversation sidebar. Click on “Create Link” and please add the shareable link as a part of your citation. The project template in the Examples Book now has a “Link to AI Chat History” section; please have this included in all your projects. If you did not use any AI tools, you may write “None”. We allow using AI for learning purposes; however, all submitted materials (code, comments, and explanations) must all be your own work and in your own words. No content or ideas should be directly applied or copy pasted to your projects. Please refer to the-examples-book.com/projects/fall2025/syllabus#guidance-on-generative-ai. Failing to follow these guidelines is considered as academic dishonesty. |
This project primarily uses the Lahman baseball database, but students may also practice with the Chinook music database for additional examples. The Chinook database contains information about music artists, albums, tracks, customers, and sales - providing a different domain to practice the same SQL concepts. There is a dedicated page for Chinook database in the example book: the-examples-book.com/projects/data-sets/chinook All deliverables are required to use the Lahman dataset. The Chinook dataset is included for practice purposes. |
Questions
Question 1 (2 points)
The first thing we need to do is to copy the Lahman database and save it in your local folder in anvil. We can do this buy first loading the database from the path above like normal, and then using the python sqlite3 library to create a new database file in your local folder and copy the contents of the Lahman database into it.
import sqlite3
# Load the Lahman database
read_only_lahman = sqlite3.connect('/anvil/projects/tdm/data/lahman/lahman.db')
local_lahman = sqlite3.connect('./lahman.db')
read_only_lahman.backup(local_lahman) # Backup the database to your local file
read_only_lahman.close()
local_lahman.close()
Now, connect to your local copy of the Lahman database and write a query that returns the first 5 rows of the "Teams" table. You can connect to your local version by using this command:
%sql sqlite:///./lahman.db
SELECT * FROM Teams LIMIT 5;
We can do the same with Chinook data for more practice.
Database Setup for Chinook
Chinook represents a digital media store with tables for artists, albums, media tracks, invoices and customers. The updated version of the chinook data is available here:
github.com/lerocha/chinook-database/tree/master/ChinookDatabase/DataSources
import sqlite3
# Load the Chinook database
read_only_chinook = sqlite3.connect('/anvil/projects/tdm/data/chinook/Chinook_Sqlite.sqlite')
local_chinook = sqlite3.connect('./chinook.sqlite')
read_only_chinook.backup(local_chinook)
read_only_chinook.close()
local_chinook.close()
Connect to your local copy of the chinook database and write a query that returns the first 5 rows of the "Artist" table:
%sql sqlite:///./chinook.sqlite
SELECT * FROM Artist LIMIT 5;
1.1. Lahman database copied to your local folder in anvil
1.2. Query that returns the first 5 rows of the "Teams" table from Lahman database.
Question 2 (2 points)
Now that we have a modifiable Lahman database, let’s talk about creating a new table. The CREATE TABLE
command is used to create a new table in the database. For example, if you wanted to create a new table called "MyPlayers" with columns for "playerID", "name", and "teamID", you would use the following command:
CREATE TABLE MyPlayers (
playerID TEXT PRIMARY KEY,
name TEXT NOT NULL,
teamID TEXT NOT NULL
);
This will create a new table called "MyPlayers" with the specified columns. You should also notice that each column has a data type specified, such as TEXT
or INTEGER
. This is important because it tells the database what kind of data will be stored in each column. Additionally, the PRIMARY KEY
constraint is used to ensure that each row in the table has a unique identifier. Finally, the NOT NULL
constraint is used to ensure that a column cannot have a NULL value. There are many other constraints that can be used, such as UNIQUE
, FOREIGN KEY
, and CHECK
, but we will not cover those in this project.
SQLite doesn’t have very many data types. The most common ones are However, more robust databases like PostgreSQL can have a wide rang of data types, including Arrays, JSON, UUIDs, etc. You can read more about PostgreSQL data types here. |
Now that we have this table, how do we insert data into it? The INSERT INTO
command is used to insert data into a table. For example, if you wanted to insert a new player into the "MyPlayers" table, you would use the following command:
INSERT INTO MyPlayers (playerID, name, teamID)
VALUES ('player1', 'John Doe', 'team1');
Now, if you query the "MyPlayers" table, you should see the new player that you just inserted:
SELECT * FROM MyPlayers;
This is simple, but impractical to add thousands of players one by one. Instead, we can use the INSERT INTO
command to insert multiple rows at once. For example, if you wanted to insert multiple players into the "MyPlayers" table, you would use the following command:
INSERT INTO MyPlayers (playerID, name, teamID)
VALUES
('player2', 'Jane Smith', 'team1'),
('player3', 'Bob Johnson', 'team2'),
('player4', 'Alice Brown', 'team2');
For the Chinook dataset, we can create a new table called
And we can insert multiple playlists into the
|
Additionally, we can use the INSERT INTO
command to insert data from another table. For example, if you wanted to insert all players from the "People" table into the "MyPlayers" table for Lahman, you would use the following command:
INSERT INTO MyPlayers (playerID, name, teamID)
SELECT p.playerID, p.nameFirst || ' ' || p.nameLast AS name, MIN(s.teamID) as teamID
FROM People p
JOIN Salaries s on s.playerID = p.playerID
WHERE s.yearID = ( SELECT MAX(yearID) FROM Salaries s2 WHERE s2.playerID = s.playerID )
GROUP BY p.playerID;
The above SQL query is quite complex. This is because the Salaries table does not require playerID to be unique, as a player could be on different teams or have a different salary in different years, or even get traded mid-season. Therefore, we need to aggregate both the teamID and the yearID to ensure that we are only getting one row per player. We do this by using the |
Once you’ve inserted the data, query the first 5 rows of the "MyPlayers" table to see the data you just inserted.
For the Chinook dataset, we can insert playlists based on the first 5 albums in the
This query is also complex, as we have to cast album IDs to text datatypes and prefix them with "playlist_". We also limit it to the first 5 albums by ensuring AlbumId is in the list containing numbers 1-5. Query the first 5 rows of
|
2.1. Create a new table called "MyPlayers",
2.2. Insert a single new player into the "MyPlayers" table using the INSERT INTO
command,
2.3. Insert multiple players into the "MyPlayers" table using the INSERT INTO
command,
2.4. Insert all players from the "Master" table into the "MyPlayers" table using the INSERT INTO`
command,
2.5. Query the first 5 rows of the "MyPlayers" table.
Question 3 (2 points)
Now that you know how to create a new table and insert data into it, let’s talk about altering an existing table. The ALTER
command is used to modify an existing table in the database. For example, if you wanted to add a new column called "age" to the "MyPlayers" table, you would use the following command:
ALTER TABLE MyPlayers
ADD COLUMN age INTEGER;
Or, if you wanted to remove a column called "teamID" from the "MyPlayers" table, you would use the following command:
ALTER TABLE MyPlayers
DROP COLUMN teamID;
Another common thing that
|
Now, if you query the "MyPlayers" table, you should see the changes that you just made:
SELECT * FROM MyPlayers LIMIT 5;
Now that you know how to alter an existing table, please modify your "MyPlayers" table to add a new column called "height" of type REAL
and a new column called "weight" of type REAL
. You can do this by using the ALTER TABLE
command as shown above. After you have added the new columns, insert 5 example players into the "MyPlayers" table, giving them each a name, unique playerID, height, weight and age. Then, query the table to return the 5 tallest players in the "MyPlayers" table, ordered by height in descending order.
For the Chinook dataset, we can add new columns called
For the Chinook dataset, we can insert 5 new playlists, providing data for the new columns.
And we can query the 5 playlists with the most tracks.
|
3.1. Alter the "MyPlayers" table to add a new column called "height" of type REAL
,
3.2. Alter the "MyPlayers" table to add a new column called "weight" of type REAL
,
3.3. Insert a new player into the "MyPlayers" table with the specified data,
3.4. Query the "MyPlayers" table to return the 5 tallest players, ordered by height in descending order.
Question 4 (2 points) - Deleting Data
Now that you know how to alter an existing table and insert data into it, let’s talk about deleting data from a table. The DELETE
command is used to delete data from a table. For example, if you wanted to delete a player from the "MyPlayers" table, you would use the following command:
DELETE FROM MyPlayers
WHERE playerID = 'player1';
This will delete the player with the specified playerID from the "MyPlayers" table. If you query the "MyPlayers" table after running this command, you should see that the player has been deleted:
SELECT * FROM MyPlayers WHERE playerID = 'player1' LIMIT 5;
In a similar manner, we could delete all players from the "MyPlayers" table that have a height less than 6 feet (72 inches). For example, if you wanted to delete all players with a height less than 6 feet, you would use the following command:
DELETE FROM MyPlayers
WHERE height < 72.0;
This will delete all players from the "MyPlayers" table that have a height less than 6 feet. If you query the "MyPlayers" table after running this command, you should see that the players with a height less than 6 feet have been deleted:
SELECT * FROM MyPlayers WHERE height < 6.0 LIMIT 5;
For the Chinook dataset, we can delete a single playlist, such as the one with
and
And, we can delete all playlists that are not public (i.e.,
and
|
4.1. Delete a player from the "MyPlayers" table using the DELETE
command,
4.2. Delete all players from the "MyPlayers" table that have a height less than 6 feet using the DELETE
command,
4.3. Query the "MyPlayers" table to verify that the players have been deleted.
Question 5 (2 points)
Finally, let’s talk about greatly removing data from a table or the table as a whole. If you want to remove all data from a table but keep the structure, column names and types, etc., you can use the DELETE
command instead without a WHERE
clause. This will remove all data from the table but keep the table itself intact. For example, if you wanted to truncate the "MyPlayers" table, you would use the following command:
DELETE FROM MyPlayers;
The DROP TABLE
command is used to delete an entire table from the database. For example, if you wanted to drop the "MyPlayers" table, you would use the following command:
DROP TABLE MyPlayers;
For the Chinook dataset, we can remove all data from the
And, we can drop the
|
This will delete the entire "MyPlayers" table from the database. If you query the "MyPlayers" table after running this command, you should see that the table no longer exists:
SELECT * FROM MyPlayers LIMIT 5;
Now, dropping a table is a very destructive operation, so be careful when using this command. You should only use it when you are sure that you no longer need the table and its data. |
5.1. Remove all data from the "MyPlayers" table using the TRUNCATE
command,
5.2. Drop the "MyPlayers" table using the DROP TABLE
command,
5.3. Verify that the "MyPlayers" table has been dropped by attempting to query it.
Submitting your Work
Once you have completed the questions, save your Jupyter notebook. You can then download the notebook and submit it to Gradescope.
-
firstname_lastname_project8.ipynb
You must double check your You will not receive full credit if your |