# STAT 19000: Project 8 — Fall 2020

Motivation: A key component to writing efficient code is writing functions. Functions allow us to repeat and reuse coding steps that we used previously, over and over again. If you find you are repeating code over and over, a function may be a good way to reduce lots of lines of code!

Context: We’ve been learning about and using functions all year! Now we are going to learn more about some of the terminology and components of a function, as you will certainly need to be able to write your own functions soon.

Scope: r, functions

Learning objectives
• Gain proficiency using split, merge, and subset.

• Demonstrate the ability to use the following functions to solve data-driven problem(s): mean, var, table, cut, paste, rep, seq, sort, order, length, unique, etc.

• Read and write basic (csv) data.

• Explain and demonstrate: positional, named, and logical indexing.

• Demonstrate how to use tapply to solve data-driven problems.

• Comprehend what a function is, and the components of a function in R.

## Dataset

The following questions will use the dataset found in Scholar:

`/class/datamine/data/goodreads/csv`

## Questions

 Please make sure to look at your knit PDF before submitting. PDFs should be relatively short and not contain huge amounts of printed data. Remember you can use functions like `head` to print a sample of the data or output. Extremely large PDFs will be subject to lose points.

### Question 1

Read in the same data, in the same way as the previous project (with the same names). We’ve provided you with the function below. How many arguments does the function have? Name all of the arguments. What is the name of the function? Replace the `description` column in our `books` data.frame with the same information, but with stripped punctuation using the function provided.

``````# A function that, given a string (myColumn), returns the string
# without any punctuation.
strip_punctuation <- function(myColumn) {
# Use regular expressions to identify punctuation.
# Replace identified punctuation with an empty string ''.
desc_no_punc <- gsub('[[:punct:]]+', '', myColumn)

# Return the result
return(desc_no_punc)
}``````
 Since `gsub` accepts a vector of values, you can pass an entire vector to `strip_punctuation`.
Items to submit
• R code used to solve the problem.

• How many arguments does the function have?

• What are the name(s) of all of the arguments?

• What is the name of the function?

### Question 2

Use the `strsplit` function to split a string by spaces. Some examples would be:

``````strsplit("This will split by space.", " ")
strsplit("This. Will. Split. By. A. Period.", "\\.")``````

An example string is:

``test_string <- "This is  a test string  with no punctuation"``

Test out `strsplit` using the provided `test_string`. Make sure to copy and paste the code that declares `test_string`. If you counted the words shown in your results, would it be an accurate count? Why or why not?

Relevant topics: [strsplit](#r-strsplit), [functions](#r-writing-functions)

Items to submit
• R code used to solve the problem.

• 1-2 sentences explaining why or why not your count would be accurate.

### Question 3

Fix the issue in (2), using `which`. You may need to `unlist` the `strsplit` result first. After you’ve accomplished this, you can count the remaining words!

Items to submit
• R code used to solve the problem (including counting the words).

### Question 4

We are finally to the point where we have code from questions (2) and (3) that we think we may want to use many times. Write a function called `count_words` which, given a string, `description`, returns the number of words in `description`. Test out `count_words` on the `description` from the second row of `books`. How many words are in the description?

Items to submit
• R code used to solve the problem.

• The result of using the function on the `description` from the second row of `books`.

### Question 5

Practice makes perfect! Write a function of your own design that is intended on being used with one of our datasets. Test it out and share the results.

 You could even pass (as an argument) one of our datasets to your function and calculate a cool statistic or something like that! Maybe your function makes a plot? Who knows?
Items to submit
• R code used to solve the problem.

• An example (with output) of using your newly created function.