# STAT 19000: Project 9 — Fall 2020

Motivation: A key component of writing efficient code is writing functions. Functions allow us to reuse coding steps that we have already written. If you find yourself repeating the same code over and over, a function may be a good way to cut down on the number of lines!

Context: We’ve been learning about and using functions all year! Now we are going to learn more about some of the terminology and components of a function, as you will certainly need to be able to write your own functions soon.

Scope: R, functions

Learning objectives
• Gain proficiency using split, merge, and subset.

• Demonstrate the ability to use the following functions to solve data-driven problem(s): mean, var, table, cut, paste, rep, seq, sort, order, length, unique, etc.

• Read and write basic (csv) data.

• Explain and demonstrate: positional, named, and logical indexing.

• Demonstrate how to use tapply to solve data-driven problems.

• Comprehend what a function is, and the components of a function in R.

## Dataset

The following questions will use the dataset found in Scholar:

`/class/datamine/data/goodreads/csv`

## Questions

Please look at your knit PDF before submitting. PDFs should be relatively short and should not contain huge amounts of printed data. Remember that you can use functions like `head` to print a sample of the data or output. Extremely large PDFs may lose points.
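As a small illustration of keeping printed output short, the sketch below previews a data frame with `head` instead of printing all of it. The built-in `mtcars` dataset is only a stand-in here; it is not part of this project.

```r
# mtcars ships with base R; it stands in for a large dataset here.
preview <- head(mtcars, n = 3)   # keep only the first 3 rows
print(preview)

# For a long vector, summarize it instead of printing every value.
print(length(mtcars$mpg))
print(summary(mtcars$mpg))
```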

### Question 1

We’ve provided you with a function below. How many arguments does the function have, and what are their names? You can get a `book_id` from the URL of a goodreads book’s webpage.

Two examples:

Find 2 or 3 `book_id` values and test out the function until you get two successes. Explain in words what the function is doing and what options you have.

```r
library(imager)

# Return the name of the author whose author_id matches my_author_id.
get_author_name <- function(my_authors_dataset, my_author_id){
  return(my_authors_dataset[my_authors_dataset$author_id==my_author_id,'name'])
}

fun_plot <- function(my_authors_dataset, my_books_dataset, my_book_id, display_cover=TRUE) {
  book_info <- my_books_dataset[my_books_dataset$book_id==my_book_id,]
  all_books_by_author <- my_books_dataset[my_books_dataset$author_id==book_info$author_id,]
  author_name <- get_author_name(my_authors_dataset, book_info$author_id)

  if(display_cover){
    # Assumption: the cover image URL is stored in the `image_url` column.
    img <- load.image(book_info$image_url)
    par(mfrow=c(1,2))
    plot(img, axes=FALSE)
  }

  # Pages vs. rating for every book by the same author
  plot(all_books_by_author$num_pages, all_books_by_author$average_rating,
       ylim=c(0,5.1), pch=21, bg='grey80',
       xlab='Number of pages', ylab='Average rating',
       main=paste('Books by', author_name))

  # Highlight the selected book
  points(book_info$num_pages, book_info$average_rating, pch=21, bg='orange', cex=1.5)
}
```
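One way to inspect a function's arguments is with `formals`, sketched below on a hypothetical toy function. `scale_values` and its arguments are made up for illustration and are not part of the project code.

```r
# Hypothetical toy function (not part of the project code).
scale_values <- function(values, multiplier = 2, verbose = FALSE) {
  if (verbose) message("Scaling by ", multiplier)
  values * multiplier
}

# formals() returns the argument list; names() extracts the argument names.
arg_names <- names(formals(scale_values))
print(arg_names)
print(length(arg_names))

# Arguments with a default value can be omitted or supplied by name.
res_default <- scale_values(c(1, 2, 3))
res_named <- scale_values(values = c(1, 2, 3), multiplier = 10)
print(res_default)
print(res_named)
```

Arguments that have a default value, such as `multiplier` and `verbose` here, act as options the caller may leave alone or override.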
Items to submit
• How many arguments does the function have, and what are their names?

• The result of using the function on 2-3 `book_id`s.

• 1-2 sentences explaining what the function does (generally), and what (if any) options the function provides you with.

### Question 2

You may have encountered a situation where the `my_book_id` was not in our dataset, and hence, didn’t get plotted. When writing functions, it is usually best to try and foresee issues like this and have the function fail gracefully, instead of showing some ugly (and sometimes unclear) warning. Add some code at the beginning of our function that checks to see if `my_book_id` is within our dataset, and if it does not exist, prints "Book ID not found.", and exits the function. Test it out on `book_id=123` and `book_id=19063`.

Run `?stop` to see whether that function may be useful.
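A minimal sketch of two ways a function might fail gracefully, using a hypothetical toy data frame (`toy_items`, `item_id`, and `label` are made-up names). The first option uses `stop`, which raises an error; the second prints a message and returns early. Neither is the required solution; they only illustrate the general pattern.

```r
# Hypothetical toy data; the ids and columns are made up.
toy_items <- data.frame(item_id = c(1, 2, 3),
                        label = c("a", "b", "c"),
                        stringsAsFactors = FALSE)

# Option 1: stop() raises an error with a clear message.
lookup_or_stop <- function(my_dataset, my_item_id) {
  if (!(my_item_id %in% my_dataset$item_id)) {
    stop("Item ID not found.")
  }
  my_dataset[my_dataset$item_id == my_item_id, "label"]
}

# Option 2: print a message and return early without raising an error.
lookup_or_message <- function(my_dataset, my_item_id) {
  if (!(my_item_id %in% my_dataset$item_id)) {
    message("Item ID not found.")
    return(invisible(NULL))
  }
  my_dataset[my_dataset$item_id == my_item_id, "label"]
}

print(lookup_or_message(toy_items, 2))
lookup_or_message(toy_items, 99)

# tryCatch() lets the script continue after the error raised by stop().
caught <- tryCatch(lookup_or_stop(toy_items, 99),
                   error = function(e) conditionMessage(e))
print(caught)
```

Note that an uncaught error from `stop` halts an R Markdown knit unless the chunk sets `error=TRUE`.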
Items to submit
• R code with your new and improved function.

• The results from running `fun_plot` with `my_book_id=123`.

• The results from running `fun_plot` with `my_book_id=19063`.

### Question 3

We have this nice `get_author_name` function that accepts a dataset (in this case, our `authors` dataset) and an `author_id`, and returns the name of the author. Write a new function called `get_author_id` that accepts a dataset and an author's name, and returns the `author_id` of the author.

You can test your function using some of these examples:

```r
get_author_id(authors, "Brandon Sanderson") # 38550
get_author_id(authors, "J.K. Rowling") # 1077326
```
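The lookup pattern used by `get_author_name` is logical indexing: build a TRUE/FALSE vector from a condition, use it to pick rows, then pick a column by name. A small sketch on hypothetical toy data (`toy_cities` and its columns are made up and unrelated to the goodreads files):

```r
# Hypothetical toy data, unrelated to the goodreads files.
toy_cities <- data.frame(city_id = c(10, 20, 30),
                         city = c("Lafayette", "Chicago", "Denver"),
                         stringsAsFactors = FALSE)

# Logical vector: TRUE for rows where the condition holds.
is_match <- toy_cities$city == "Chicago"
print(is_match)

# Rows are selected by the logical vector, the column by its name.
matched_id <- toy_cities[is_match, "city_id"]
print(matched_id)

# A value with no match gives a zero-length result rather than an error.
no_match <- toy_cities[toy_cities$city == "Atlantis", "city_id"]
print(length(no_match))
```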
Items to submit
• R code containing your new function.

• The results of using your new function on a few authors.

### Question 4

See the function below.

```r
search_books_for_word <- function(word) {
  return(books[grepl(word, books$description, fixed=T),]$title)
}
```

Given a word, `search_books_for_word` returns the titles of books where the provided word is inside the book’s description. `search_books_for_word` utilizes the `books` dataset internally. It requires that the `books` dataset has been loaded into the environment prior to running (and with the correct name). By including and referencing objects defined outside of our function’s scope within our function (in this case the variable `books`), our `search_books_for_word` function will be more prone to errors, as any changes to those objects may break our function. For example:

```r
our_function <- function(x) {
  print(paste("Our argument is:", x))
  print(paste("Our variable is:", my_variable))
}
# our variable outside the scope of our_function
my_variable <- "dog"
# run our_function
our_function("first")
# change the variable outside the scope of our function
my_variable <- "cat"
# run our_function again
our_function("second")
# imagine a scenario where "my_variable" doesn't exist, our_function would break!
rm(my_variable)
our_function("third")
```

Fix our `search_books_for_word` function to accept the `books` dataset as an argument called `my_books_dataset` and utilize `my_books_dataset` within the function instead of the global variable `books`.
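The same idea applied to the toy `our_function` example might look like the sketch below, where the value the function depends on is passed in as an argument. The name `our_function_fixed` is hypothetical, and this version returns the string instead of printing it so the result is easy to check.

```r
# The value the function depends on is now an explicit argument.
our_function_fixed <- function(x, my_variable) {
  paste("Our argument is:", x, "| our variable is:", my_variable)
}

first <- our_function_fixed("first", my_variable = "dog")
print(first)

# A global variable with the same name does not affect the result,
# because the argument takes precedence inside the function.
my_variable <- "cat"
second <- our_function_fixed("second", my_variable = "dog")
print(second)
```

Because everything the function needs arrives through its arguments, changing or removing the global `my_variable` can no longer break it.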

Items to submit
• R code with your new and improved function.

• An example using the updated function.

### Question 5

Write your own custom function. Make sure your function includes at least 2 arguments. If you access one of our datasets from within your function (which you definitely should do), use what you learned in Question 4 to avoid future scoping errors. Your function could output a cool plot, interesting tidbits of information, or anything else you can think of. Get creative and make a function that is fun to use!

Items to submit
• R code used to solve the problem.

• Examples using your function with included output.