TDM 20100: Project 9 — 2022

Motivation: Although SQL syntax may still feel unnatural and foreign, with more practice it will start to make more sense. The ability to read and write SQL queries is a "bread-and-butter" skill for anyone working with data.

Context: We are in the second of a series of projects that focus on learning the basics of SQL. In this project, we will continue to harden our understanding of SQL syntax, and introduce common SQL functions like `AVG`, `MIN`, and `MAX`.

Scope: SQL, sqlite

Learning Objectives
• Explain the advantages and disadvantages of using a database over a tool like a spreadsheet.

• Describe basic database concepts like: rdbms, tables, indexes, fields, query, clause.

• Basic clauses: select, order by, limit, desc, asc, count, where, from, etc.

• Utilize SQL functions like min, max, avg, sum, and count to solve data-driven problems.

Make sure to read about, and use the template found here, and the important information about projects submissions here.

Dataset(s)

The following questions will use the following dataset(s):

• `/anvil/projects/tdm/data/taxi/taxi_sample.db`

Questions

Question 1

In previous projects, we used `awk` to parse through and summarize data. While `awk` is extremely convenient and can work well, but SQL is even better.

Write a query that will return the `fare_amount`, `surcharge`, `tip_amount`, and `tolls_amount` as a percentage of `total_amount`.

 Make sure to limit the output to only 100 rows! Use the `LIMIT` clause to do this.
 Use the `sum` aggregate function to calculate the totals, and division to figure out the percentages.
Items to submit
• Code used to solve this problem.

• Output from running the code.

Question 2

Check out the `payment_type` column. Write a query that counts the number of each type of `payment_type`. The end result should print something like the following.

Output sample
```payment_type, count
CASH, 123```
 You can use aliasing to control the output header names.

Write a query that sums the `total_amount` for `payment_type` of "CASH". What is the total amount of cash payments?

Items to submit
• Code used to solve this problem.

• Output from running the code.

Question 3

Write a query that gets the largest number of passengers in a single trip. How far was the trip? What was the total amount? Answer all of this in a single query.

Whoa, there must be some erroneous data in the database! Not too surprising. Write a query that explores this more, explain what your query does and how it helps you understand what is going on.

 Make sure all queries limit output to only 100 rows.
Items to submit
• Code used to solve this problem.

• Output from running the code.

Question 4

Write a query that gets the average `total_amount` for each year in the database. Which year has the largest average `total_amount`? Use the `pickup_datetime` column to determine the year.

 If you want the headers to be more descriptive, you can use aliases.
Items to submit
• Code used to solve this problem.

• Output from running the code.

Question 5

What percent of data in our database has information on the location of pickup and dropoff? Examine the data, to see if there is a pattern to the rows with that information and without that information.

 There is a distinct pattern. Pay attention to the date and time of the data.

Confirm your hypothesis with the original data set(s) (in `/anvil/projects/tdm/data/taxi/yellow/*.csv`), using bash. This doesn’t have to be anything more thorough than running a simple `head` command with a 1-2 sentence explanation.

 Of course, there will probably be some erroneous data for the latitude and longitude columns. However, you could use the `avg` function on a latitude or longitude column, by year to maybe get a pattern.
Items to submit
• Code used to solve this problem.

• Output from running the code.

 Please make sure to double check that your submission is complete, and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it to make sure what you think you submitted, was what you actually submitted. In addition, please review our submission guidelines before submitting your project.