TDM 10100: Project 8 — 2022

Motivation: Functions are an important part of writing efficient code.
Functions allow us to repeat and reuse code. If you find yourself using the same set of coding steps over and over, a function may be a good way to reduce your lines of code!

Context: We’ve been learning about and using functions these last few weeks.
To learn how to write your own functions we need to learn some of the terminology and components.

Scope: R, functions

Learning Objectives
  • Gain proficiency using split, merge, and subset.

  • Demonstrate the ability to use the following functions to solve data-driven problem(s): mean, var, table, cut, paste, rep, seq, sort, order, length, unique, etc.

  • Read and write basic (csv) data.

  • Comprehend what a function is, and the components of a function in R.

Dataset(s)

We will use the same dataset(s) as last week:

  • /anvil/projects/tdm/data/movies_and_tv/titles.csv

  • /anvil/projects/tdm/data/movies_and_tv/episodes.csv

  • /anvil/projects/tdm/data/movies_and_tv/people.csv

  • /anvil/projects/tdm/data/movies_and_tv/ratings.csv

Please select 6000 memory when launching Jupyter for this project.

Helpful Hints

fread (from the data.table package) is a fast and efficient way to read in data.

library(data.table)

titles <- data.frame(fread("/anvil/projects/tdm/data/movies_and_tv/titles.csv"))
episodes <- data.frame(fread("/anvil/projects/tdm/data/movies_and_tv/episodes.csv"))
people <- data.frame(fread("/anvil/projects/tdm/data/movies_and_tv/people.csv"))
ratings <- data.frame(fread("/anvil/projects/tdm/data/movies_and_tv/ratings.csv"))

Questions

Writing our own function makes a repetitive operation easier by turning it into a single command.

Take care to give the function a concise but meaningful name so that other users can understand what the function does.

Function parameters can also be called formal arguments.
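
As a small illustration of this terminology, here is a sketch that uses base R's formals() to list a function's formal arguments. The function add_tax and its parameters price and rate are made up for this example and are not part of the project.

```r
# Hypothetical example function; 'price' and 'rate' are its formal arguments.
add_tax <- function(price, rate = 0.07) {
  price * (1 + rate)
}

# formals() returns the formal arguments; names() extracts just their names.
arg_names <- names(formals(add_tax))
print(arg_names)

# 'rate' has a default value, so it can be left out of the call.
total <- add_tax(100)
print(total)
```

The values supplied in a call, such as 100 above, are usually called the actual arguments.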

Insider Knowledge

A function is an object that contains multiple interrelated statements, which are run together in a predefined order when the function is called.

Functions can be built-in or created by the user (user-defined).

Some examples of built-in functions are:
  • min(), max(), mean(), median()

  • print()

  • head()

Helpful Hints

Syntax of a function

what_you_name_the_function <- function(parameters) {
  # statement(s) that are executed when the function runs
  # the value of the last expression evaluated is the returned value
}
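
To make the syntax concrete, here is a minimal sketch with the template filled in. The functions value_spread and value_spread_explicit and the vector scores are hypothetical names chosen for illustration. The first version relies on the last expression being returned; the second uses an explicit return().

```r
# Hypothetical function: difference between the largest and smallest value.
value_spread <- function(x) {
  largest <- max(x)
  smallest <- min(x)
  largest - smallest  # the value of the last expression is returned
}

# Same logic, written with an explicit return().
value_spread_explicit <- function(x) {
  return(max(x) - min(x))
}

scores <- c(4, 9, 6, 2)
spread <- value_spread(scores)
print(spread)
print(value_spread_explicit(scores))
```

Both versions give the same result; which style to use is largely a matter of preference.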

ONE

To gain a better insight into our data, let’s make two simple plots:

  1. A grouped bar chart (see an example here)

  2. A line plot (see an example here)

  3. What information are you gaining from either of these graphs?
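
One possible way to draw both kinds of plot with base R graphics is sketched below. The counts are made up and only show the mechanics; the row label "tvSeries" and the years are assumptions for illustration, not values taken from the project data. Other plotting packages would work just as well.

```r
# Made-up counts: rows are title types, columns are years (not real data).
counts <- matrix(
  c(12, 15, 9,
     7, 11, 14),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("movie", "tvSeries"), c("2019", "2020", "2021"))
)

# Grouped bar chart: beside = TRUE draws the bars for each year side by side.
barplot(counts, beside = TRUE, legend.text = rownames(counts),
        xlab = "Year", ylab = "Number of titles")

# Line plot: type = "l" connects the points with lines.
years <- as.numeric(colnames(counts))
plot(years, counts["movie", ], type = "l",
     xlab = "Year", ylab = "Number of movies")
```

With real data, a matrix like counts could be built with table() on two columns of a data frame.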

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

TWO

Now that you have a basic understanding of how to make a function, let's practice by applying that knowledge to our dataset.

Here are the pieces of a function we will use on this dataset; put them in the correct order.

  • results <- merge(ratings_df, titles_df, by.x = "title_id", by.y = "title_id")

  • }

  • function(titles_df, ratings_df, ratings_of_at_least)

  • return(popular_movie_results)

  • {

  • popular_movie_results <- results[results$type == "movie" & results$rating >= ratings_of_at_least, ]

  • find_movie_with_at_least_rating <-
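
If merge() is unfamiliar, the sketch below shows what by.x and by.y do on two tiny made-up data frames. The names students, grades, and their columns are hypothetical and unrelated to the project data.

```r
# Two small made-up data frames whose key columns have different names.
students <- data.frame(student_id = c(1, 2, 3),
                       name = c("Ana", "Ben", "Cy"))
grades <- data.frame(id = c(2, 3, 4),
                     grade = c(88, 92, 75))

# by.x names the key column in the first data frame, by.y in the second.
# By default only rows whose keys appear in both data frames are kept.
combined <- merge(students, grades, by.x = "student_id", by.y = "id")
print(combined)
```

When the key column has the same name in both data frames, as it does in the project pieces above, by.x and by.y simply repeat that name.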

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

THREE

Take the above function and add comments explaining what the function does at each step.

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

FOUR

my_selection <- find_movie_with_at_least_rating(titles, ratings, 7.6)

Using the code above, answer these questions.

  1. How many movies in total have a rating of at least that limit?

  2. Change the comparison in the function from "at least" to "lower than", so that a limit of 5.0 selects movies rated lower than 5.0.

  3. How many movies have ratings lower than 5.0?
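
Counting the rows that meet a condition can be done in more than one way. The sketch below uses a made-up data frame (books and its score column are hypothetical) to show two common approaches.

```r
# Made-up data frame; 'score' stands in for any numeric column.
books <- data.frame(title = c("A", "B", "C", "D"),
                    score = c(4.2, 7.9, 5.0, 8.3))

# Keep the rows that satisfy a condition, then count them with nrow().
high <- books[books$score >= 5.0, ]
n_high <- nrow(high)

# Summing a logical vector counts its TRUE values, so this also works.
n_low <- sum(books$score < 5.0)

print(n_high)
print(n_low)
```

Note that >= includes the limit itself, while < excludes it, so every row falls into exactly one of the two groups.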

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

FIVE

Now create a function that takes a genre as the input and finds either

  1. the movie from that genre that has the largest number of votes, OR

  2. the movie from that genre that has the highest rating.

(You don’t need to do both. In the video, I discuss how to find the movie from that genre that has the highest rating.)
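
As a hint about the general pattern (not the solution itself), the sketch below filters a made-up data frame to one group and then picks the row with the largest value using which.max(). The data frame songs, its columns, and the helper top_song are all hypothetical.

```r
# Made-up data frame; the column names are hypothetical.
songs <- data.frame(name = c("One", "Two", "Three", "Four"),
                    style = c("jazz", "rock", "jazz", "rock"),
                    plays = c(120, 300, 450, 90),
                    stringsAsFactors = FALSE)

# Hypothetical helper: the row with the most plays within one style.
top_song <- function(songs_df, wanted_style) {
  matching <- songs_df[songs_df$style == wanted_style, ]
  matching[which.max(matching$plays), ]  # which.max gives the first maximum
}

best_jazz <- top_song(songs, "jazz")
print(best_jazz)
```

If a column can hold several values in one entry, an exact comparison with == may miss rows, and a pattern match such as grepl() may be a better fit.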

Items to submit
  • Code used to solve this problem.

  • Output from running the code.

Please double-check that your submission is complete and contains all of your code and output before submitting. If you are on a spotty internet connection, it is recommended to download your submission after submitting it, to make sure that what you think you submitted is what you actually submitted.

In addition, please review our submission guidelines before submitting your project.