TDM 20100: Project 9 - SQL Introduction 5
Project Objectives
In this project, we will be learning about triggers, views, and transactions in SQL. We will continue working with your local copy of the Lahman database, and see how these concepts apply to an alternate dataset.
Dataset
- /anvil/projects/tdm/data/lahman/lahman.db
- Alternative: /anvil/projects/tdm/data/chinook/chinook.sqlite (Chinook Music Database)
If AI is used in any capacity, such as for debugging or research, we now require that you submit a link to the entire chat history. For example, if you used ChatGPT, there is a "Share" option in the conversation sidebar. Click "Create Link" and add the shareable link as part of your citation. The project template in the Examples Book now has a "Link to AI Chat History" section; please include this in all your projects. If you did not use any AI tools, you may write "None". We allow using AI for learning purposes; however, all submitted materials (code, comments, and explanations) must be your own work and in your own words. No content or ideas should be directly applied or copy-pasted into your projects. Please refer to the-examples-book.com/projects/fall2025/syllabus#guidance-on-generative-ai. Failing to follow these guidelines is considered academic dishonesty.
This project primarily uses the Lahman baseball database, but students may also practice with the Chinook music database for additional examples. The Chinook database contains information about music artists, albums, tracks, customers, and sales, providing a different domain in which to practice the same SQL concepts. There is a dedicated page for the Chinook database in the Examples Book: the-examples-book.com/projects/data-sets/chinook. All deliverables are required to use the Lahman dataset; the Chinook dataset is included for practice purposes only.
Introduction
If you have already worked on the previous week’s project (project 8) and/or have the local copy of the Lahman database, you can skip this section and begin with Question 1. If not, please follow the steps below.
Copy the Lahman database and save it in your local folder on Anvil. We can do this by first opening the database from its path on Anvil, and then using the Python sqlite3 library to create a new database file in your local folder and copy the contents of the Lahman database into it.
import sqlite3
# Load the Lahman database
read_only_lahman = sqlite3.connect('/anvil/projects/tdm/data/lahman/lahman.db')
local_lahman = sqlite3.connect('./lahman.db')
read_only_lahman.backup(local_lahman) # Backup the database to your local file
read_only_lahman.close()
local_lahman.close()
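The same backup pattern can be sketched end-to-end with throwaway in-memory databases (the table and values here are illustrative, not the real Lahman schema); `Connection.backup` copies the full schema and all rows into the destination connection:

```python
import sqlite3

# Source database with a toy table standing in for the Lahman data
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE Teams (teamID TEXT, name TEXT)")
source.execute("INSERT INTO Teams VALUES ('team1', 'Test Team')")
source.commit()

# Destination database, initially empty
dest = sqlite3.connect(":memory:")
source.backup(dest)  # copies the entire database: schema and rows

# The destination now holds an identical copy of the Teams table
rows = dest.execute("SELECT * FROM Teams").fetchall()
print(rows)  # [('team1', 'Test Team')]

source.close()
dest.close()
```

In the project itself, the source is the read-only file on Anvil and the destination is `./lahman.db` in your own folder.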
Now, connect to your local copy of the Lahman database and confirm the copy worked by querying the first 5 rows of the "Teams" table. You can connect to your local version by using this command:
%sql sqlite:///./lahman.db
SELECT * FROM Teams LIMIT 5;
Questions
Question 1 (2 points)
To start, let’s connect to your local copy of the Lahman database and write a query that returns the first 5 rows of the "Teams" table. You can connect to your local version by using this command, assuming the database file is in the same directory as your Jupyter notebook:
%sql sqlite:///./lahman.db
Last project, you learned how to modify a database and its tables with commands like INSERT, ALTER, and DELETE. In this project, we will continue to use these commands, but learn how to use them in a more advanced way.
A trigger is a type of stored procedure that is automatically executed whenever certain event(s) occur in the database. For example, you can create a trigger that automatically updates a column in a table whenever a row is inserted or updated. Or, you could delete a row from a table whenever a row is deleted from another table. Triggers can be very useful for maintaining the integrity of your data.
To start, let’s learn how to create a trigger. The trigger command structure is a lot longer than the other commands we have learned so far, but it is still relatively simple. The basic structure of a trigger is as follows:
CREATE TRIGGER
IF NOT EXISTS trigger_name
(BEFORE|AFTER|INSTEAD OF)
(INSERT|UPDATE OF column_name|DELETE)
ON table_name
[FOR EACH ROW]
[WHEN condition]
BEGIN
-- SQL statement(s) to execute
END;
Although that may seem complex at first glance, it is relatively simple. For the sections surrounded by parentheses, exactly one of the listed options must be chosen. Sections in square brackets are optional and change how the trigger works. Below is a line-by-line explanation of this code:
- The CREATE TRIGGER command is used to create a new trigger.
- The IF NOT EXISTS clause is optional and is used to prevent an error if the trigger already exists.
- The BEFORE, AFTER, or INSTEAD OF keywords specify when the trigger should be executed relative to the event that triggers it.
- The INSERT, UPDATE, or DELETE keywords specify which event should trigger the execution of the trigger.
- The ON keyword specifies the table that the trigger is associated with.
- The FOR EACH ROW clause specifies that the trigger's statements run once for each row affected by the event. For example, if a trigger fires after an INSERT and you insert 10 rows in a single INSERT statement, the trigger body runs 10 times, once for each row inserted. Note that in SQLite this is the only supported behavior: FOR EACH ROW is implied even if you omit the clause.
- The WHEN clause is also optional and specifies a condition that must be met for the trigger to fire, similar to a WHERE clause in a SELECT statement. If the condition is not met for a given row, the trigger body is not executed for that row.
- Finally, the BEGIN and END keywords delimit the body of the trigger, which contains the SQL statements to be executed when the trigger is fired.
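Putting those pieces together, here is a minimal, self-contained sketch of a trigger in action (the tables `orders` and `audit` are toy examples, not part of the Lahman database): inserting into one table automatically writes a row into another.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE orders (id INTEGER, item TEXT);
CREATE TABLE audit  (order_id INTEGER, note TEXT);

-- Fires once per inserted row, after the insert into orders completes
CREATE TRIGGER IF NOT EXISTS log_order
AFTER INSERT ON orders
FOR EACH ROW
BEGIN
    INSERT INTO audit VALUES (NEW.id, 'order inserted');
END;
""")

con.execute("INSERT INTO orders VALUES (1, 'baseball')")
print(con.execute("SELECT * FROM audit").fetchall())  # [(1, 'order inserted')]
```

No one ever inserts into `audit` directly; the trigger does it automatically whenever `orders` changes.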
You can read more about triggers in the SQLite documentation.
To test this out, let’s create a trigger that automatically updates the "yearID" column in the "Teams" table to the current year whenever a new row is inserted. The trigger will look like this:
CREATE TRIGGER IF NOT EXISTS update_yearID
AFTER INSERT ON Teams
FOR EACH ROW
BEGIN
UPDATE Teams
SET yearID = strftime('%Y', 'now')
WHERE teamID = NEW.teamID;
END;
For the Chinook dataset, you can practice by writing a similar AFTER INSERT trigger on one of its tables, such as Playlist or Track.
There are a few new things in this trigger that we haven't seen before. Firstly, the AFTER INSERT ON Teams clause means the trigger fires right after a row is inserted into the "Teams" table. Secondly, the strftime('%Y', 'now') function returns the current year. Finally, the NEW keyword refers to the row that was just inserted, so NEW.teamID is the teamID value of the new row.
To test if this works, let's insert a new row into the "Teams" table and see if the yearID column is automatically updated to the current year. You can do this by using the following command:
INSERT INTO Teams (teamID, name, yearID)
VALUES ('team1', 'Test Team', 0);
For the Chinook dataset, you would test your practice trigger the same way: insert a row into the table the trigger watches.
Then, let's query the "Teams" table to see if the yearID column has been updated:
SELECT * FROM Teams WHERE teamID = 'team1';
For the Chinook dataset, query the table your practice trigger modified to confirm the change.
As you can see, we originally inserted a yearID of 0, but after the trigger was executed, the yearID column was automatically updated to the current year (2025).
1.1. Create a trigger called "update_yearID" that automatically updates the "yearID" column in the "Teams" table to the current year whenever a new row is inserted
1.2. Insert a new row into the "Teams" table with a yearID of 0
1.3. Query the "Teams" table to see if the yearID column has been updated.
Question 2 (2 points)
Now that you know how to create a trigger, try one out on your own. Firstly, create a new table called "DeathRecords" with playerID, yearID, monthID, and dayID columns that are NOT NULL, using the CREATE TABLE command. The playerID column should be of type TEXT, and the yearID, monthID, and dayID columns should be of type INTEGER. Look back at the previous project for an example of how to create a table.
Now that you have this table, we want a trigger so that every time a player is added to the "DeathRecords" table, the same player's deathYear, deathMonth, and deathDay columns in the "People" table are updated to match the values in the "DeathRecords" table.
For the Chinook dataset, an equivalent practice exercise is to create a simple activity-log table and a trigger that inserts a row into it whenever a new playlist is created.
Hint: we don't want to get the current date like we did before; we want the same values that were inserted into the "DeathRecords" table. For example, NEW.yearID refers to the yearID value of the row just inserted into "DeathRecords".
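The hint above can be sketched with toy tables (`events` and `people_log` are illustrative names, not the project's schema): the trigger copies the inserted values themselves, via NEW.*, rather than computing the current date.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE events     (person TEXT, year INTEGER);
CREATE TABLE people_log (person TEXT, year INTEGER);

CREATE TRIGGER copy_event
AFTER INSERT ON events
FOR EACH ROW
BEGIN
    -- NEW.year is the value that was just inserted, not the current date
    INSERT INTO people_log VALUES (NEW.person, NEW.year);
END;
""")

con.execute("INSERT INTO events VALUES ('abreubo01', 1999)")
print(con.execute("SELECT * FROM people_log").fetchall())  # [('abreubo01', 1999)]
```

Your trigger on "DeathRecords" should use the same NEW.* pattern, except with an UPDATE on "People" instead of an INSERT.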
Once you have made your trigger, let’s test it out by inserting a new row into the "DeathRecords" table. You can do this by using the following command:
INSERT INTO DeathRecords (playerID, yearID, monthID, dayID)
VALUES ('abreubo01', 2025, 1, 1);
Then, let's query the "People" table to see if the deathYear, deathMonth, and deathDay columns have been updated:
SELECT * FROM People WHERE playerID = 'abreubo01';
Once you have verified that the trigger works, let’s test if it works for multiple rows at the same time. Run the below command to insert multiple players into the "DeathRecords" table:
INSERT INTO DeathRecords (playerID, yearID, monthID, dayID)
VALUES
('pintore01', 2025, 2, 1),
('hundlni01', 2025, 3, 1),
('detwiro01', 2025, 4, 1);
Then, query the "People" table to see if the deathYear, deathMonth, and deathDay columns have been updated for all of the players:
SELECT playerID, deathYear, deathMonth, deathDay FROM People
WHERE playerID IN ('pintore01', 'hundlni01', 'detwiro01');
2.1. Create a new table called "DeathRecords" with playerID, yearID, monthID, and dayID columns,
2.2. Create a trigger that automatically updates the deathYear, deathMonth, and deathDay columns in the "People" table whenever a new row is inserted into the "DeathRecords" table,
2.3. Insert a new row into the "DeathRecords" table and verify that the trigger works by checking the "People" table,
2.4. Insert multiple rows into the "DeathRecords" table and verify that the trigger works for all of the rows by checking the "People" table.
Question 3 (2 points)
Now that you know how to create a trigger, let’s learn how to create a view. A view is a virtual table that is based on the result of a query. It is similar to a table, but it does not store any data itself. Instead, it stores the SQL query that defines the view. When you query a view, the SQL query is executed and the result is returned as if it were a table.
The basic structure of a view is as follows:
CREATE VIEW IF NOT EXISTS view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
- The CREATE VIEW command is used to create a new view.
- The IF NOT EXISTS clause is optional and is used to prevent an error if the view already exists, similar to triggers.
- The AS keyword is used to specify the SQL query that defines the view.
- The SELECT statement selects the columns to include in the view, and the FROM clause specifies the table that the view is based on.
- The WHERE clause is optional and can be used to filter the rows that are included in the view.
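The syntax above can be seen end-to-end with a small self-contained sketch (the `players` table and its values are toy data, not the Lahman "People" table): the view stores only the query, and running a SELECT against it re-executes that query.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE players (name TEXT, height INTEGER);
INSERT INTO players VALUES ('Al', 70), ('Bo', 75), ('Cy', 78);

-- The view stores this SELECT, not a copy of the rows
CREATE VIEW IF NOT EXISTS Tall AS
SELECT name, height FROM players WHERE height > 72;
""")

# Querying the view re-runs its SELECT against the current table contents
print(con.execute("SELECT name FROM Tall ORDER BY height").fetchall())
# [('Bo',), ('Cy',)]
```

Because no data is duplicated, the view always reflects the underlying table: insert another tall player into `players` and they appear in `Tall` immediately.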
Try making a view called "TallPlayers" that returns the playerID, nameFirst, nameLast, height, and weight columns from the "People" table, but only for players who are taller than 6 feet (72 inches).
To test this out, let’s query the "TallPlayers" view to see if it returns the expected results:
SELECT * FROM TallPlayers LIMIT 5;
You should see that the view returns only players who are taller than 6 feet. However, what exactly does this view do for us? Why not just run the SELECT statement directly? The benefit of using a view is that it allows you to save a complex query and reuse it multiple times without having to rewrite the entire query each time. For example, if you wanted to find the top 5 heaviest players who were born in 1994 and are over 6 feet tall, you could use the following command:
SELECT * FROM People
WHERE height > 72
AND weight IS NOT NULL
AND birthYear = 1994
ORDER BY weight DESC
LIMIT 5;
Or, you could use the "TallPlayers" view to simplify the query:
SELECT * FROM TallPlayers
WHERE weight IS NOT NULL
AND birthYear = 1994
ORDER BY weight DESC
LIMIT 5;
Create a view called "TallHeavyPlayers" with those same conditions (height > 72, weight IS NOT NULL, birthYear = 1994) that returns the playerID, nameFirst, nameLast, height, and weight columns from the "People" table, ordered by weight in descending order.
Now, query the "TallHeavyPlayers" view to get the top 5 heaviest players who were born in 1994 and are over 6 feet tall:
SELECT * FROM TallHeavyPlayers LIMIT 5;
What if we wanted to see the top 7 players instead? Instead of having to copy all of the conditions, ordering, etc. from our previous queries, we can simply query the view again with a different limit:
SELECT * FROM TallHeavyPlayers LIMIT 7;
For the Chinook dataset, you can practice by creating an analogous view (for example, a view over the Track table that filters on track length) and querying it with different LIMIT values.
3.1. Create a view called "TallPlayers" that returns the playerID, nameFirst, nameLast, height, and weight columns from the "People" table, but only for players who are taller than 6 feet (72 inches),
3.2. Query the "TallPlayers" view to see if it returns the expected results,
3.3. Create a view called "TallHeavyPlayers" that returns the playerID, nameFirst, nameLast, height, and weight columns from the "People" table, but only for players who are taller than 6 feet (72 inches), have a non-NULL weight, and were born in 1994, ordered by weight in descending order,
3.4. Query the "TallHeavyPlayers" view to get the top 5 heaviest players who were born in 1994 and are over 6 feet tall,
3.5. Query the "TallHeavyPlayers" view again to get the top 7 heaviest players who were born in 1994 and are over 6 feet tall.
Question 4 (2 points)
Finally, let’s look at transactions. A transaction is a sequence of one or more SQL statements that are executed as a single unit of work. Transactions are used to ensure that a series of SQL statements are executed together, and if any of the statements fail, the entire transaction is rolled back and none of the changes are made to the database.
For example, let’s say you want to insert a new player into the "People" table, and then insert a new team into the "Teams" table, and then give that player a salary on the new team in the "Salaries" table. If you successfully did two of these but the third one failed, you would end up with an inconsistent state in your database. To prevent this, you can use a transaction to ensure that either all of the statements are executed successfully, or none of them are.
The basic structure of a transaction is as follows:
BEGIN TRANSACTION;
-- SQL statements to execute
COMMIT;
The BEGIN TRANSACTION command is used to start a new transaction, and the COMMIT command saves the changes made by the transaction to the database. If one of the SQL statements fails, the COMMIT will not be reached; in that case, the ROLLBACK command undoes all of the changes made by the transaction. The ROLLBACK command is used by itself, as follows:
ROLLBACK;
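The all-or-nothing behavior can be sketched with a toy table (the table `t` and its NOT NULL constraint are illustrative; the connection uses autocommit mode so we issue BEGIN/ROLLBACK ourselves): the first insert succeeds, the second violates a constraint, and the rollback erases both.

```python
import sqlite3

# isolation_level=None puts the driver in autocommit mode,
# so BEGIN/COMMIT/ROLLBACK are entirely under our control
con = sqlite3.connect(":memory:", isolation_level=None)
con.execute("CREATE TABLE t (x INTEGER NOT NULL)")

con.execute("BEGIN")
con.execute("INSERT INTO t VALUES (1)")      # succeeds, but only inside the transaction
try:
    con.execute("INSERT INTO t VALUES (NULL)")  # fails: violates NOT NULL
    con.execute("COMMIT")
except sqlite3.IntegrityError:
    con.execute("ROLLBACK")                  # undo everything since BEGIN

print(con.execute("SELECT COUNT(*) FROM t").fetchone())  # (0,)
```

Even though the first INSERT succeeded on its own, the rollback leaves the table exactly as it was before the transaction started.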
One caveat is that not all SQL environments handle transactions the same way, and SQLite in particular does not support nesting BEGIN TRANSACTION blocks. The core concept is still what matters: disallowing partial updates to a database and ensuring data integrity. SQLite provides an alternative approach with SAVEPOINT and RELEASE. These work very similarly to transactions, but are more powerful because savepoints can be nested. For simplicity, when we refer to 'transactions' in this project, we are referring to SQLite's SAVEPOINT, ROLLBACK TO, and RELEASE commands.
The basic structure of a transaction using SAVEPOINT is as follows:
SAVEPOINT savepoint_name; -- Start a new savepoint that can be later referenced to release from memory or rollback to
-- SQL statements to execute
RELEASE savepoint_name; -- Commit the changes made since the savepoint was created and remove the savepoint from memory
-- OR --
ROLLBACK TO savepoint_name; -- Undo all changes made since the savepoint was created, but keep the savepoint in memory for potential future use
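The SAVEPOINT/ROLLBACK TO/RELEASE sequence can be sketched on a toy table (the table `t` and savepoint name `sp1` are illustrative): work after the savepoint can be undone without losing work done before it.

```python
import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
con.execute("CREATE TABLE t (x INTEGER)")

con.execute("INSERT INTO t VALUES (1)")   # committed before the savepoint
con.execute("SAVEPOINT sp1")              # mark a point we can return to
con.execute("INSERT INTO t VALUES (2)")
con.execute("ROLLBACK TO sp1")            # undo the insert of 2; sp1 stays open
con.execute("INSERT INTO t VALUES (3)")
con.execute("RELEASE sp1")                # keep everything since sp1 (the 3)

print(con.execute("SELECT x FROM t ORDER BY x").fetchall())  # [(1,), (3,)]
```

Note that ROLLBACK TO only discards work done after the savepoint; the row inserted before it survives, and the savepoint remains usable until it is released.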
There are some interesting interactions with how SQLite is run in our Jupyter notebooks through ipython-sql and SQLAlchemy. By default, each statement is committed automatically, which would close any open transaction or savepoint between statements. Before running any transaction code, run a code cell with the following configuration:
%config SqlMagic.autocommit = False
This configuration change is required for both the Lahman and Chinook databases when running in the Jupyter environment.
Let's try this out with our "DeathRecords" table. Start a transaction that has 3 INSERT statements, inserting 3 new rows into the DeathRecords table. The data you should insert is as follows:
- playerID: 'bandyje01', yearID: 2025, monthID: 5, dayID: 1
- playerID: 'meyeral01', yearID: 2025, monthID: 6, dayID: 1
- playerID: 'perdoan01', yearID: 2025, monthID: 7, dayID: 1
SAVEPOINT insert_deathrecords;
INSERT INTO DeathRecords VALUES ('bandyje01', 2025, 5, 1);
INSERT INTO DeathRecords VALUES ('meyeral01', 2025, 6, 1);
INSERT INTO DeathRecords VALUES ('perdoan01', 2025, 7, 1);
RELEASE insert_deathrecords;
COMMIT;
Then, run the transaction. If you run the following command, you should see that the transaction was successful and all 3 rows were inserted into the "DeathRecords" table:
SELECT * FROM DeathRecords;
Instead of just using RELEASE, we also issue a COMMIT at the end: because autocommit is disabled, the released changes still need to be explicitly committed to the database file.
For the Chinook dataset, you can practice by running a similar batch insert inside a savepoint on one of its tables, then verifying the rows with a SELECT.
Now, let's see what happens if one of the INSERT statements fails. Try to insert the following rows into the "DeathRecords" table without a transaction:
- playerID: 'castidi01', yearID: 2025, monthID: 8, dayID: 1
- playerID: 'quezajo01', yearID: 2025, monthID: 9, dayID: NULL
The second insert will fail because the dayID column is not allowed to be NULL. You should see the output say 0 rows affected, indicating that this row was ignored instead of added to the table. Now, let's query the "DeathRecords" table to see if any of the rows were inserted:
SELECT * FROM DeathRecords WHERE playerID in ('castidi01', 'quezajo01');
You should see that the first row was inserted because that INSERT statement was successful, but the second row was not inserted because its INSERT statement failed.
Now, let's try the same thing, but this time using a transaction and the ROLLBACK TO command. Start a transaction that has 2 INSERT statements, inserting the same 2 new rows into the "DeathRecords" table, making sure that one of the INSERT statements fails. Use the following command:
BEGIN;
SAVEPOINT deathrecords_insert2;
INSERT OR IGNORE INTO DeathRecords VALUES ('castidi01', 2025, 8, 1);
INSERT OR IGNORE INTO DeathRecords VALUES ('quezajo01', 2025, 9, NULL);
ROLLBACK TO deathrecords_insert2;
COMMIT;
Use a different playerID for the first insert to avoid a conflict with the previous insert. For example, you could use 'castidi02' instead of 'castidi01'. Alternatively, you could first remove the previously inserted row with a DELETE statement.
For the Chinook dataset, you can practice the same pattern: a savepoint, one valid insert, one failing insert, then ROLLBACK TO.
Then, query the "DeathRecords" table:
SELECT * FROM DeathRecords WHERE playerID = 'castidi01';
You should see that the row with playerID 'castidi01' was removed, because the transaction was rolled back.
4.1. Start a transaction that has 3 INSERT statements, inserting 3 new rows into the "DeathRecords" table,
4.2. Run the transaction and verify that all 3 rows were inserted into the "DeathRecords" table,
4.3. Start a transaction that has 2 INSERT statements, inserting 2 new rows into the "DeathRecords" table, but make sure that one of the INSERT statements fails,
4.4. Rollback the transaction and verify that no rows were inserted into the "DeathRecords" table.
Question 5 (2 points)
Now, let's put all of these concepts together. Please create a trigger that automatically updates the "People" table whenever a new player is added to the "DeathRecords" table. The trigger will update the deathYear, deathMonth, and deathDay columns in the "People" table to match the values in the "DeathRecords" table. Then, create a view called "DeceasedPlayers" that returns the playerID, nameFirst, nameLast, deathYear, deathMonth, and deathDay columns from the "People" table, but only for players who have a non-NULL value in the deathYear column.
Finally, start a transaction that has 3 INSERT statements, inserting 3 new rows into the "DeathRecords" table. The data you should insert is as follows:
- playerID: 'johndoe01', yearID: 2025, monthID: 10, dayID: 1
- playerID: 'janedoe01', yearID: 2025, monthID: 11, dayID: 1
- playerID: 'foobar01', yearID: 2025, monthID: 12, dayID: 1
Then, run the transaction. If you run the following command, you should see that the transaction was successful and all 3 rows were inserted into the "DeathRecords" table:
SELECT * FROM DeathRecords WHERE playerID IN ('johndoe01', 'janedoe01', 'foobar01');
Then, query the "DeceasedPlayers" view to see if the new players were added to the view:
SELECT * FROM DeceasedPlayers WHERE playerID IN ('johndoe01', 'janedoe01', 'foobar01');
CREATE TRIGGER IF NOT EXISTS update_myplayers_death
AFTER INSERT ON DeathRecords
FOR EACH ROW
BEGIN
UPDATE People
SET deathYear = NEW.yearID,
deathMonth = NEW.monthID,
deathDay = NEW.dayID
WHERE playerID = NEW.playerID;
END;
CREATE VIEW IF NOT EXISTS DeceasedPlayers AS
SELECT playerID, nameFirst, nameLast, deathYear, deathMonth, deathDay
FROM People
WHERE deathYear IS NOT NULL;
For the Chinook dataset, the combined practice exercise follows the same sequence: a trigger (for example, one that logs new playlists into an activity-log table), a view over the logged data, and a transaction that inserts several rows, followed by verification queries.
5.1. Create a trigger that automatically updates the "People" table whenever a new player is added to the "DeathRecords" table,
5.2. Create a view called "DeceasedPlayers" that returns the playerID, nameFirst, nameLast, deathYear, deathMonth, and deathDay columns from the "People" table, but only for players who have a non-NULL value in the deathYear column,
5.3. Start a transaction that has 3 INSERT statements, inserting 3 new rows into the "DeathRecords" table,
5.4. Run the transaction and verify that all 3 rows were inserted into the "DeathRecords" table,
5.5. Query the "DeceasedPlayers" view to see if the new players were added to the view.
Submitting your Work
Once you have completed the questions, save your Jupyter notebook. You can then download the notebook and submit it to Gradescope.
- firstname_lastname_project9.ipynb
You must double check your .ipynb file after submitting it to Gradescope. You will not receive full credit if your submitted .ipynb file does not render correctly in Gradescope.