Control Flow

If/else statements

if, else, and else if are keywords in R that allow you to direct the progression of your code. These commands test logical statements, executing certain code depending on the statement’s truth.

The structure and requirements of if/else statements are fairly rigid.

  • There cannot be an else without an if.

  • if statements do not need an else — they can stand alone.

  • You may have any amount of else if sections, but they must be within if and else blocks

  • if and else if require logical statements contained within parentheses.

  • All blocks must have curly braces, { to open and } to close, with the desired executable code contained within.

The rules and structure will become clearer in the examples.


Examples

How do I print "Success!" if my expression is TRUE, and "Failure!" otherwise?

Click to see solution
# Randomly assign either TRUE or FALSE to our variable.
t_or_f <- sample(c(TRUE,FALSE),1)

if (t_or_f == TRUE) {
  # If t_or_f is TRUE, print success
  print("Success!")
} else {
  # Otherwise, print failure
  print("Failure!")
}
[1] "Failure!"

For variables that contain either TRUE or FALSE, we actually don’t need the == TRUE part of the if-statement. This is because if/else statements intrinsically utilize TRUE or FALSE values when it comes to executing code.

if (t_or_f) {
  print("Success!")
} else {
  print("Failure!")
}
[1] "Success!"

How do I print "Success!" if my expression is TRUE, "Failure!" if my expression is FALSE, and "Huh?" if it is neither?

Click to see solution
# Randomly assign either TRUE or FALSE to t_or_f.
schrodinger_boolean <- sample(c(TRUE,FALSE,"Something else"), 1)

if (schrodinger_boolean == TRUE) {
  print("Success!")
} else if (schrodinger_boolean == FALSE) {
  print("Failure!")
} else {
  print("Huh?")
}
[1] "Failure!"

Unlike in the first example, the sample space of schrodinger_boolean includes non-logical elements ("Something else"), so it isn’t possible to use the shorthand.

For loops

for loops allow us to execute similar code repeatedly until we’ve looped through all of an object’s elements. For example, if we wanted to format the dates in a list, we could use a for loop to run through each element of the list to format it.

While crucial in other programming languages, R has apply functions that are faster and more powerful than loops. As with many things, apply functions do have situations where they lack efficiency, so learning loops is still vital for mastering R.


Examples

How do print every value in a vector?

Click to see solution
for (i in 1:10) {
  # In the first iteration of the loop, i will be 1. The next, i will be 2, and so on until the vector's values are exhausted
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10

How do I break out of a loop before it finishes?

Click to see solution
for (i in 1:10) {
  if (i==7) {
    # When i==7, we will exit the loop without continuing.
    break
  }
  print(i)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6

How do I loop through a vector of names?

Click to see solution
friends <- c("Phoebe", "Ross", "Rachel", "Chandler", "Joey", "Monica")
my_string <- "So no one told you life was gonna be this way, "
for (friend in friends) {
  print(paste0(my_string, friend, "!"))
}
[1] "So no one told you life was gonna be this way, Phoebe!"
[1] "So no one told you life was gonna be this way, Ross!"
[1] "So no one told you life was gonna be this way, Rachel!"
[1] "So no one told you life was gonna be this way, Chandler!"
[1] "So no one told you life was gonna be this way, Joey!"
[1] "So no one told you life was gonna be this way, Monica!"

Check out the paste & paste0 page if you’re confused about its utility here.

How do I skip a loop if some expression evaluates to TRUE?

Click to see solution
friends <- c("Phoebe", "Ross", "Mike", "Rachel", "Chandler", "Joey", "Monica")
my_string <- "So no one told you life was gonna be this way, "
for (friend in friends) {
  if (friend == "Mike") {
    # `next` skips over the rest of the code for this loop
    # and continues to the next element
    next
  }
  print(paste0(my_string, friend, "!"))
}
[1] "So no one told you life was gonna be this way, Phoebe!"
[1] "So no one told you life was gonna be this way, Ross!"
[1] "So no one told you life was gonna be this way, Rachel!"
[1] "So no one told you life was gonna be this way, Chandler!"
[1] "So no one told you life was gonna be this way, Joey!"
[1] "So no one told you life was gonna be this way, Monica!"

Video Example 1 — Vectorized Functions and Loops

Click to see solution

This is usually how we write loops in other languages if we want to add the first 10 billion integers.

mytotal <- 0
for (i in 1:10000000000) {
  mytotal <- mytotal + i
}
mytotal
[1] 5e+19

This works, but takes a long time to run. The sum function is vectorized, meaning it will consider all values in a vector at the same time. It will very simply take every integer in the parentheses and add them all together.

sum(1:10000000000)
[1] 5e+19

Video Example 2 — Grocery Store Averages, Loop vs. Non-loop

Click to see solution

Let’s use some grocery store data to demonstrate the difference between loop strategies and non-loop strategies.

myDF <- read.csv("/class/datamine/data/8451/The_Complete_Journey_2_Master/5000_transactions.csv")
head(myDF)
  BASKET_NUM HSHD_NUM PURCHASE_ PRODUCT_NUM SPEND UNITS STORE_R WEEK_NUM YEAR
1         24     1809 03-JAN-16     5817389 -1.50    -1   SOUTH        1 2016
2         24     1809 03-JAN-16     5829886 -1.50    -1   SOUTH        1 2016
3         34     1253 03-JAN-16      539501  2.19     1    EAST        1 2016
4         60     1595 03-JAN-16     5260099  0.99     1    WEST        1 2016
5         60     1595 03-JAN-16     4535660  2.50     2    WEST        1 2016
6        168     3393 03-JAN-16     5602916  4.50     1   SOUTH        1 2016

This is how we find the average cost per line in other languages, for instance, C/C++, Python, Java, etc. The for loop being used here calculates the length of myDF$SPEND, and runs just enough times to reach the end.

amountspent <- 0       # we initialize a variable to keep track of the entire price of the purchases
numberofitems <- 0     # and we initialize a variable to keep track of the number of purchases
for (myprice in myDF$SPEND) {
  amountspent <- amountspent + myprice     # we add the price of the current purchase
  numberofitems <- numberofitems + 1       # and we increment (by 1) the number o purchases processed so far
}
amountspent     # this is the total amount spent on all purchases
[1] 3584366
numberofitems   # this is the total number of purchases
[1] 1e+06
amountspent/numberofitems       # so this is the average
[1] 3.584366

Now, that technically works, but it’s not efficient! Let’s try using the mean function instead to get an average:

mean(myDF$SPEND)
[1] 3.584366

As we can see, mean is a much more efficient way to use a vectorized function in R, to accomplish the same purpose. The vector is the column myDF$SPEND (where myDF is a dataframe and the $ allows us to specify the SPEND column in this dataframe). We can just focus our attention on that column from the data frame, and take a mean.

Video Example 3 — New Columns from Existing Data using Conditional Statements

Click to see solution

We’re looking at grocery store information again for this example. This time, we have two days from which purchases are considered contaminated: July 5th-6th, 2016. Let’s refresh on how the data is formatted.

myDF <- read.csv("/class/datamine/data/8451/The_Complete_Journey_2_Master/5000_transactions.csv")
head(myDF)
  BASKET_NUM HSHD_NUM PURCHASE_ PRODUCT_NUM SPEND UNITS STORE_R WEEK_NUM YEAR
1         24     1809 03-JAN-16     5817389 -1.50    -1   SOUTH        1 2016
2         24     1809 03-JAN-16     5829886 -1.50    -1   SOUTH        1 2016
3         34     1253 03-JAN-16      539501  2.19     1    EAST        1 2016
4         60     1595 03-JAN-16     5260099  0.99     1    WEST        1 2016
5         60     1595 03-JAN-16     4535660  2.50     2    WEST        1 2016
6        168     3393 03-JAN-16     5602916  4.50     1   SOUTH        1 2016

We’ll make a vector called "mystatus" that matches the length of our existing data.frame, setting the default value to "safe" for every entry.

mystatus <- rep("safe", times=nrow(myDF))

rep is perfect for our needs.

Looking at PURCHASE_, we see that the format is DD-MMM-YY, corresponding to 2-digit day, 3-character month abbreviation, and last two values of the year. We can now change the entries for the elements of mystatus that occurred on 05-JUL-16 or on 06-JUL-16 to "contaminated".

mystatus[(myDF$PURCHASE_ == "05-JUL-16")|(myDF$PURCHASE_ == "06-JUL-16")] <- "contaminated"

Remember that myDF$PURCHASE_ == will give us an index, and since length(mystatus) == nrow(myDF), the indices will match. We can factor mystatus to create a categorical vector, then rename it and add it to myDF.

myDF$safetystatus <- factor(mystatus)

Now the head of the data.frame looks like this…​

head(myDF)
  BASKET_NUM HSHD_NUM PURCHASE_ PRODUCT_NUM SPEND UNITS STORE_R WEEK_NUM YEAR
1         24     1809 03-JAN-16     5817389 -1.50    -1   SOUTH        1 2016
2         24     1809 03-JAN-16     5829886 -1.50    -1   SOUTH        1 2016
3         34     1253 03-JAN-16      539501  2.19     1    EAST        1 2016
4         60     1595 03-JAN-16     5260099  0.99     1    WEST        1 2016
5         60     1595 03-JAN-16     4535660  2.50     2    WEST        1 2016
6        168     3393 03-JAN-16     5602916  4.50     1   SOUTH        1 2016
  safetystatus
1         safe
2         safe
3         safe
4         safe
5         safe
6         safe

…​and the distribution of contaminated and safe entries is as follows:

table(myDF$safetystatus)
contaminated         safe
        2459       997541