STAT 19000: Project 14 — Fall 2020

Motivation: Functions are the building blocks of more complex programming. It’s vital that you understand how to read and write functions. In this project we will incrementally build and improve upon a function designed to recommend a beer. Note that you will not be winning any awards for this recommendation system, it is just for fun!

Context: One of the main focuses throughout the semester has been on functions, and for good reason. In this project we will continue to exercise our R skills and build up our recommender function.

Scope: r, functions

Learning objectives
  • Read and write basic (csv) data.

  • Explain and demonstrate: positional, named, and logical indexing.

  • Utilize apply functions in order to solve a data-driven problem.

  • Gain proficiency using split, merge, and subset.

Dataset

The following questions will use the dataset found in Scholar:

/class/datamine/data/beer/

Questions

Question 1

Read /class/datamine/data/beer/beers.csv into a data.frame named beers. Read /class/datamine/data/beer/breweries.csv into a data.frame named breweries. Read /class/datamine/data/beer/reviews.csv into a data.frame named reviews. As in the previous project, make sure you used the fread function from the data.table package, and convert the data.table to a data.frame. We want to create a very basic beer recommender. We will start simple. Create a function called recommend_a_beer that takes as input my_beer_id (a single value) and returns a vector of beer_ids from the same style. Test your function on 2093.

Make sure you do not include the given my_beer_id in the vector of beer_ids containing the `beer_ids`of your recommended beers.

You may find the function setdiff useful. Run the example below to get an idea of what it does.

You will not win any awards for this recommendation system!

x <- c('a','b','b','c')
y <- c('c','b','d','e','f')
setdiff(x,y)
setdiff(y,x)
Items to submit
  • R code used to solve the problem.

  • Length of result from recommend_a_beer(2093).

  • The result of 2093 %in% recommend_a_beer(2093).

Question 2

That is a lot of beer recommendations! Let’s try to narrow it down. Include an argument in your function called min_score with default value of 4.5. Our recommender will only recommend beer_ids with a review score of at least min_score. Test your improved beer recommender with the same beer_id from question (1).

Note that now we need to look at both beers and reviews datasets.

Items to submit
  • R code used to solve the problem.

  • Length of result from recommend_a_beer(2093).

Question 3

There is still room for improvement (obviously) for our beer recommender. Include a new argument in your function called same_brewery_only with default value FALSE. This argument will determine whether or not our beer recommender will return only beers from the same brewery. Test our newly improved beer recommender with the same beer_id from question (1) with the argument same_brewery_only set to TRUE.

You may find the function intersect useful. Run the example below to get an idea of what it does.

x <- c('a','b','b','c')
y <- c('c','b','d','e','f')
intersect(x,y)
intersect(y,x)
Items to submit
  • R code used to solve the problem.

  • Length of result from recommend_a_beer(2093, same_brewery_only=TRUE).

Question 4

Oops! Bad idea! Maybe including only beers from the same brewery is not the best idea. Add an argument to our beer recommender named type. If type=style our recommender will recommend beers based on the style as we did in question (3). If type=reviewers, our recommender will recommend beers based on reviewers with "similar taste". Select reviewers that gave score equal to or greater than min_score for the given beer id (my_beer_id). For those reviewers, find the beer_ids for other beers that these reviewers have given a score of at least min_score. These beer_ids are the ones our recommender will return. Be sure to test our improved recommender on the same beer_id as in (1)-(3).

Items to submit
  • R code used to solve the problem.

  • Length of result from recommend_a_beer(2093, type="reviewers").

Question 5

Let’s try to narrow down the recommendations. Include an argument called abv_range that indicates the abv range we would like the recommended beers to be at. Set abv_range default value to NULL so that if a user does not specify the abv_range our recommender does not consider it. Test our recommender for beer_id 2093, with abv_range = c(8.9,9.1) and min_score=4.9.

You may find the function is.null useful.

Items to submit
  • R code used to solve the problem.

  • Length of result from recommend_a_beer(2093, abv_range=c(8.9, 9.1), type="reviewers", min_score=4.9).

Question 6

Play with our recommend_a_beer function. Include another feature to it. Some ideas are: putting a limit on the number of beer_id`s we will return, error catching (what if we don’t have reviews for a given `beer_id?), including a plot to the output, returning beer names instead of ids or new arguments to decide what `beer_id`s to recommend. Be creative and have fun!

Items to submit
  • R code used to solve the problem.

  • The result from running the improved recommend_a_beer function showcasing your improvements to it.

  • 1-2 sentecens commenting on what you decided to include and why.