TDM 20100: Project 13 — 2022

Motivation: We’ve covered a lot about SQL in a relatively short amount of time, but we still haven’t touched on some other important SQL topics. In this final project, we will touch on some other important SQL topics.

Context: In the previous project, you had the opportunity to take the time to insert data into a sqlite3 database. There are still many common tasks that you may need to perform using a database: triggers, views, transaction, and even a few sqlite3-specific functionalities that may prove useful.

Scope: SQL

Learning Objectives
  • Create a trigger on your sqlite3 database and demonstrate that it works.

  • Create one or more views on your sqlite3 database and demonstrate that they work.

  • Describe and use a database transaction. Rollback a transaction.

  • Optionally, use the sqlite3 "savepoint", "rollback to", and "release" commands.

  • Optionally, use the sqlite3 "attach" and "detach" commands to execute queries across multiple databases.

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset(s)

The following questions will use the following dataset(s):

  • /anvil/projects/tdm/data/flights/subset/flights_sample.db

  • /anvil/projects/tdm/data/movies_and_tv/imdb.db

Questions

Question 1

Begin by copying the database from the previous project to your $HOME directory. Open up a terminal and run the following.

cp /anvil/projects/tdm/data/flights/subset/flights_sample.db $HOME

Go ahead and launch sqlite3 and connect to the database.

sqlite3 $HOME/flights_sample.db

From within sqlite3, test things out to make sure the data looks right.

.header on
SELECT * FROM flights LIMIT 5;
expected output
Year|Month|DayofMonth|DayOfWeek|DepTime|CRSDepTime|ArrTime|CRSArrTime|UniqueCarrier|FlightNum|TailNum|ActualElapsedTime|CRSElapsedTime|AirTime|ArrDelay|DepDelay|Origin|Dest|Distance|TaxiIn|TaxiOut|Cancelled|CancellationCode|Diverted|CarrierDelay|WeatherDelay|NASDelay|SecurityDelay|LateAircraftDelay
2007|1|1|1|1232|1225|1341|1340|WN|2891|N351|69|75|54|1|7|SMF|ONT|389|4|11|0||0|0|0|0|0|0
2007|1|1|1|1918|1905|2043|2035|WN|462|N370|85|90|74|8|13|SMF|PDX|479|5|6|0||0|0|0|0|0|0
2007|1|1|1|2206|2130|2334|2300|WN|1229|N685|88|90|73|34|36|SMF|PDX|479|6|9|0||0|3|0|0|0|31
2007|1|1|1|1230|1200|1356|1330|WN|1355|N364|86|90|75|26|30|SMF|PDX|479|3|8|0||0|23|0|0|0|3
2007|1|1|1|831|830|957|1000|WN|2278|N480|86|90|74|-3|1|SMF|PDX|479|3|9|0||0|0|0|0|0|0

With any luck, things should be working just fine.

Let’s go ahead and create a trigger. A trigger is what it sounds like, given a specific action, do a specific action. This is a powerful tool. One of the most common uses of a trigger that you will see in the wild is the "updated_at" field. This is a field that stores a datetime value, and uses a trigger to automatically update to the current date and time anytime a record in the database is updated.

First, we need to create a new column called "updated_at", and set the default value to something. In our case, lets set it to January 1, 1970 at 00:00:00.

ALTER TABLE flights ADD COLUMN updated_at DATETIME DEFAULT '1970-01-01 00:00:00';

If you query the table now, you will see all of the values have been properly added, great!

SELECT * FROM flights LIMIT 5;

Now add a trigger called "update_updated_at" that will update the "updated_at" column to the current date and time whenever a record is updated. Check out the official documentation here for examples of triggers.

Once your trigger has been written, go ahead and test it out by updating the following record.

UPDATE flights SET Year = 5555 WHERE Year = 2007 AND Month = 1 AND DayofMonth = 1 AND DayOfWeek = 1 AND DepTime = 1225 AND Origin = 'SMF';
SELECT * FROM flights WHERE Year = 5555;

If it worked right, your updated_at column should have been updated to the current date and time, cool!

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

  • Output from connecting to the database from inside your Jupyter notebook and running the SELECT * FROM flights WHERE Year = 5555; query.

Question 2

Next, we will touch on views. A view is essentially a virtual table that is created from some query and given a name. Why would you want to create such a thing? Well, there could be many reasons.

Maybe you have a complex query that you need to run frequently, and it would just be easier to see the final result with a click? Maybe the database has horrible naming conventions and you want to rename things in a view to make it more readable and/or queryable?

After some thought, it may occur to you that we’ve had such an instance where a view could be nice using our imdb.db database!

Copy the imdb.db to your $SCRATCH directory, and navigate to your $SCRATCH directory.

cp /anvil/projects/tdm/data/movies_and_tv/imdb.db $SCRATCH
cd $SCRATCH

Sometimes, it would be nice to have the rating and votes from the ratings table available directly from the titles table, wouldn’t it? It has been a bit of a hassle to access that information and use a JOIN whenever we’ve had a need to see rating information. In fact, if you think about it, the rating information living in its own table doesn’t really make that much sense.

Create a view called titles_with_ratings that has all of the information from the titles table along with the rating and votes from the ratings table. You can find the official documentation here.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

  • Output from connecting to the database from inside your Jupyter notebook and running SELECT * FROM titles_with_ratings LIMIT 5; query.

Question 3

Read the offical sqlite3 documentation for transactions here. As you will read, you’ve already been using transactions each time you run a query! What we will focus on is how to use transactions to rollback changes, as this is probably the most useful use case you’ll run into.

Connect to our flights_sample.db database from question (1), start a deferred transaction, and update a row, similar to what we did before, using the following query.

UPDATE flights SET Year = 7777 WHERE Year = 5555;

Now, query the record to see what it looks like.

SELECT * FROM flights WHERE Year = 7777;

You’ll notice our trigger from before is still working, cool!

This is pretty great, until you realized that the year should most definitely not be 7777, but rather be 5555. Oh no! Well, at this stage you haven’t committed your transaction yet, so you can just rollback the changes and everything will be back to normal. Give it a try (again, following the official documentation).

After rolling back, run the following query.

SELECT * FROM flights WHERE Year = 7777;

As you can see, nothing appears! Let’s try with the correct year.

SELECT * FROM flights WHERE Year = 5555;

Nice! Note only was our Year field rolled back to the original values after question (1), but our updated_at field was too, excellent! As you can imagine, this is pretty powerful stuff, especially if you are writing to a database and want to make sure things look right before committing the changes.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

  • A screenshot in your Jupyter notebook showing the series of queries that demonstrated your rollback worked as planned.

Question 4

SQL and sqlite3 are powerful tools, and we’ve barely scratched the surface. Check out the offical documentation, and demonstrate another feature of sqlite3 that we haven’t yet covered.

Some suggestions, if you aren’t interested in browsing the documentation: window functions, math functions, date and time functions, and core functions (there are many we didn’t use!)

Please make sure the queries you run are run from an sql cell in your Jupyter notebook.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Question 5 (optional, 0 pts)

There are two other interesting features of sqlite3: savepoints (kind of a named transaction) and attach and detach. Demonstrate one or both of these functionalities and write 1-2 sentences stating whether or not you think they are practical or useful features, and why or why not?

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted.

In addition, please review our submission guidelines before submitting your project.