TDM 10100: Project 6 - Control Flow / Conditions

Project Objectives

Motivation: Even though loops are not the most efficient way to code in R, understanding how they work is an essential programming skill. Many other languages rely heavily on loops, so practicing them here will strengthen your overall coding ability and make it easier to transfer your skills across languages. By comparing loops and also conditions with R’s vectorized and indexing methods, you will see why R is so powerful for data analysis.

Context: We will learn how to use conditionals and begin working with loops on actual datasets. While loops are common in languages like C and Python, R is a vector-oriented language, which often allows us to avoid loops. We generally prefer not to use loops in R, but in this project we will practice conditions and loops for comparison purposes.

Scope: R, if, else, loops, iteration

Learning Objectives
  • Understand and use if/else and ifelse

  • Learn some beginner looping techniques and vectorized ways to avoid loops

  • Practice techniques in both example sets and full columns

Make sure to read about and use the template found here, and read the important information about project submissions here.

Dataset

  • /anvil/projects/tdm/data/social_media_addiction/social_media.csv

  • /anvil/projects/tdm/data/starwars/characters.csv

  • /anvil/projects/tdm/data/death_records/DeathRecords.csv (used in the video)

If AI is used in any way, such as for debugging or research, we now require that you submit a link to the entire chat history. For example, if you used ChatGPT, there is a “Share” option in the conversation sidebar. Click on “Create Link” and add the shareable link as part of your citation.

The project template in the Examples Book now has a “Link to AI Chat History” section; please have this included in all your projects. If you did not use any AI tools, you may write “None”.

We allow using AI for learning purposes; however, all submitted materials (code, comments, and explanations) must be your own work and in your own words. No content or ideas should be directly applied or copy-pasted into your projects. Please refer to the-examples-book.com/projects/fall2025/syllabus#guidance-on-generative-ai. Failing to follow these guidelines is considered academic dishonesty.

Social Media Addiction

Social media is now a huge part of everyday life. This dataset aims to show the results of a cross-country survey on the usage patterns, academic impact, and relationship social media has on students and their lives. The sample consists of 705 students aged 16-25, representing 110 countries and 12 different social media platforms.

Spending a lot of time on social media is widely considered unhealthy. From the survey, students shared their most used social media platform, how it affected their physical and mental health, and more. Each was asked to rate their addiction score from 1 (low addiction) to 10 (high addiction), and the vast majority of these results were on the upper end of the range. When asked about their mental health on a scale of 1 to 10, the results leaned towards the lower end.

This dataset contains 13 columns and some of these columns include:

  • Avg_Daily_Usage_Hours: average hours per day on social media

  • Sleep_Hours_Per_Night: average hours slept nightly

  • Mental_Health_Score: self-rated mental health score

Star Wars

This dataset is on the smaller side, containing just one row for each of fewer than 100 main characters from the Star Wars movie franchise. The data here can be useful for finding patterns in character traits.

In the Star Wars dataset, there are 13 columns and 96 rows of data. While some columns such as hair_color and eye_color go to very specific details of each character, and description gives a good summary of how each character is relevant in the movie(s), we will stick to a few specific columns:

  • name: full known name of each character

  • species: species (when known) of each character

  • height: measured height in meters

Questions

Question 1 (2 points)

There are many colors of lightsabers in the Star Wars universe. The most commonly recognized (specifically in the Skywalker Saga) are blue, green, purple, and red. Each color represents its wielder, often signaling their alignment with the Light or Dark Side, or their rank within the Jedi or Sith orders, such as green lightsabers typically being associated with Jedi Consulars.

Before we get back to the movie, let us talk about if statements in R. An if statement runs a block of code only if a certain condition is TRUE. Something like:

if (condition) {
    # do any code here
}

The if and else statements in R are used to apply conditional logic. A standard structure looks like:

if (first_condition) {
    # first result
} else if (second_condition) {
    # second_result
} else {
    # remaining_result
}

For example, let’s check if we need coffee!

need_coffee <- TRUE

if (need_coffee) {
  print("Yes! Grab a coffee first...")
} else {
  print("Wow, you must be a superhero! No coffee needed")
}

Run it one more time with need_coffee = FALSE :)

Conditionals are commonly used to classify values into categories, typically within loops or functions.

While not loops themselves, if and else are often used inside loops to evaluate each item in a vector or data structure.
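As a minimal sketch of that idea, the loop below visits each element of a small vector and uses if/else to label it. The ages, the cutoffs, and the labels are made up for illustration and do not come from the project datasets.

```r
# Made-up ages, purely for illustration
ages <- c(12, 35, 67, 18)

# Create the result at full length before the loop
groups <- character(length(ages))

for (k in seq_along(ages)) {
  if (ages[k] < 18) {
    groups[k] <- "minor"
  } else if (ages[k] < 65) {
    groups[k] <- "adult"
  } else {
    groups[k] <- "senior"
  }
}

print(groups)
```

Each pass through the loop checks exactly one value, which is the situation if/else is designed for.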

Now, write an if - else conditional chain that classifies the lightsaber colors into:

  • Jedi Consular - green

  • Jedi Guardian - blue

  • Unique Balance - purple

  • Sith Lord - red

  • Rare Occurrence - any other colors

Your code should determine the classification of a single value stored in a variable called color.

Each if statement should check whether color matches one of the defined color categories. If it does not match any, the final else statement should assign it as a Rare Occurrence.

Add a print() statement within each condition level to declare the color of the lightsaber in the result of running the if/else.

Assign color to a color of your choice. This needs to be declared above/before your if/else chain so color will be defined when it is time for it to be classified. Try running the if/else for a few different color values.

So far, we have only classified a single value at a time. Now, imagine you need to classify every element of a vector. In these cases, you can use ifelse. For example, using the same color classifying conditions, build a chain of ifelse statements to determine the status of the wielder of each lightsaber. For color, use the vector colors:

colors <- c("green", "blue", "red", "yellow", "blue", "red", "purple", "green", "red", "blue", "red", "blue")

roles <- ifelse(colors == "green", "Jedi Consular",
         ifelse(colors == "blue", "Jedi Guardian",
         ifelse(colors == "purple", "Unique Balance",
         ifelse(colors == "red", "Sith Lord", "Rare Occurrence"))))

If a long if/else chain feels cumbersome for a single value, the switch function is a cleaner alternative:

mystring <- "green"
foo <- switch(EXPR=mystring, green="Jedi Consular", blue="Jedi Guardian", purple="Unique Balance", red="Sith Lord", "Rare Occurrence")
foo
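Keep in mind that switch looks at one value at a time; it is not vectorized the way ifelse is. One possible way to reuse it on a whole vector is to wrap it in a small function and apply that function to each element with sapply. In this sketch, classify_color and some_colors are hypothetical names chosen for the example.

```r
# Hypothetical helper: classify one color string with switch()
classify_color <- function(col) {
  switch(col,
         green  = "Jedi Consular",
         blue   = "Jedi Guardian",
         purple = "Unique Balance",
         red    = "Sith Lord",
         "Rare Occurrence")  # the unnamed last argument is the default
}

# Apply the helper to each element of a small example vector
some_colors <- c("green", "yellow", "red")
some_roles <- sapply(some_colors, classify_color, USE.NAMES = FALSE)

print(some_roles)
```

For a long vector, the nested ifelse shown earlier does the same job without the helper function.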

Deliverables

1.1 Output a few results (at least 3) of testing different colors in the if/else
1.2 Show the status of each wielder from the vector colors
1.3 In your own understanding, what are some differences between if/else and ifelse?

Did you see the note regarding the new AI policy? Click here and read it

Question 2 (2 points)

Read in the Social Media dataset as myDF and show the dimensions and the head() of the data.

It is often the case that for students (ages 18 - 24), there is very little sleep to be had in the day-to-day, but somehow enough time to be on an electronic device - social media alone - for many hours. Looking at the table of both Sleep_Hours_Per_Night and Avg_Daily_Usage_Hours shows that some students are not getting very much sleep (as little as 3.8 hours), while some of the average social media times were as high as a frightening 8.5 hours.

One of the main differences between if/else and ifelse is that if/else checks one condition at a time, and can only be used for single values, not vectors. ifelse is able to work through entire vectors at once. Each ifelse call supports only a single test with one "yes" result and one "no" result, which is why nested ifelse calls are sometimes required.
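A small sketch of this difference, using made-up usage hours and an arbitrary cutoff of 5 (neither comes from the dataset):

```r
# Made-up daily usage hours, for illustration only
usage <- c(3.5, 7.2, 5.0)

# if/else: one condition, one value
first_label <- if (usage[1] > 5) "Heavy" else "Light"

# ifelse: the same test applied to every element at once
all_labels <- ifelse(usage > 5, "Heavy", "Light")

print(first_label)
print(all_labels)
```

In R 4.2.0 and later, handing the whole vector to if, as in if (usage > 5), is an error rather than a warning, because the condition has more than one element.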

To compare the sleep hours to the social media hours, let’s create a new column Status.

Status should be the result of using ifelse to sort by the following:

  • social media hours > sleep hours

  • social media hours = sleep hours

  • Whatever remains (social media hours < sleep hours)

For each of these three choices, add some sort of label reflecting the students and their sleep to phone ratio, such as Bad Habit, Barely Existing, Doing Fine, Doing Good, Doom Scroll, Fine Habit, Good Habit, Healthy, Lump, Sloth, Thriving, Zombie, and so on.

Print the head() of the dataframe to view this new column. Use table() to compare the values between the three categories of the Status column.
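To see the mechanics on something small first, here is a sketch on a toy data frame. The toy values, the column names usage and sleep, and the three labels are all invented for the example; your own labels and the real columns of myDF will differ.

```r
# Toy data frame; the values are invented for illustration
toy <- data.frame(usage = c(8.5, 4.0, 6.0, 2.5),
                  sleep = c(5.0, 7.5, 6.0, 8.0))

# The outer ifelse handles ">", the inner one handles "==" and whatever remains
toy$Status <- ifelse(toy$usage > toy$sleep, "Zombie",
              ifelse(toy$usage == toy$sleep, "Doom Scroll", "Healthy"))

print(toy)
print(table(toy$Status))
```

table() counts how many rows fall into each label, which makes the three categories easy to compare.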

Before you dive into this question, let’s quickly revisit the indexing projects we worked on in previous weeks and see how we can accomplish the same task using indexing:

myDF$Status <- "Good"

myDF$Status[myDF$Avg_Daily_Usage_Hours > myDF$Sleep_Hours_Per_Night] <- "Zombie"

myDF$Status[myDF$Avg_Daily_Usage_Hours == myDF$Sleep_Hours_Per_Night] <- "Doom Scroll"

Deliverables

2.1 What was the longest recorded sleep time of the students? The longest social media time?
2.2 Which habit ratio was the most common among the students?


Question 3 (2 points)

To use for loops, you must know, or be able to easily calculate, the number of times the loop should repeat. In situations where you do not know how many times the desired operations need to be run, you can turn to the while loop. A while loop runs and repeats while a specified condition returns TRUE, and takes the following general form:

while (loopcondition) {
    # do any code in here
}

A while loop uses a single logical-valued loopcondition to control how many times it repeats. Upon execution, the loopcondition is evaluated. If the condition is found to be TRUE, the code inside the braces is executed line by line as usual until complete, at which point the loopcondition is checked again. The loop terminates only when the condition evaluates to FALSE, and it does so immediately: the code inside the braces is not run one last time.

For more information, read about while loops here
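The sketch below illustrates how the condition check works. The variable names battery, runs_none, and runs_three are made up for the example. In the first loop the condition is FALSE from the start, so the body never runs; in the second it is TRUE three times.

```r
# Condition is FALSE right away: the body is skipped entirely
battery <- 0
runs_none <- 0
while (battery > 0) {
  runs_none <- runs_none + 1
  battery <- battery - 1
}
print(runs_none)

# Condition is TRUE three times, then FALSE
battery <- 3
runs_three <- 0
while (battery > 0) {
  runs_three <- runs_three + 1
  battery <- battery - 1
}
print(runs_three)
```

The first print shows 0 and the second shows 3, because the condition is checked before every pass, including the first.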

Say a student’s screen_time is 10 hours. We are not using the Social Media dataset here; just make a simple variable that contains the value 10 to represent this:

screen_time <- 10

Build a while loop that continues while the screen_time is over 2 hours. While this loop is going, it should print out the student’s screen time. After this, the screen_time variable should decrease by 1. This will print out eight lines, each declaring the student’s screen time, each line one less hour than before.

Use either print(paste("", [time_variable], "")) OR cat("", [time_variable], "") to combine printing out text and a variable value. It’s up to you. For example:

screen_time <- 10

while(screen_time > 2) {
    print(paste("Screen time:", screen_time, "hours"))
    # OR
    # cat("Screen time:", screen_time, "hours")
    screen_time <- screen_time - 1
}

Notice how the while loop continues as long as the condition (screen_time > 2) is TRUE. Once it is FALSE, the loop stops running.

Make a second while loop for a variable sleep_time that is equal to 2. This loop should run until sleep_time is no longer less than 10, increasing by 1 each time it finishes. Make sure to print out each value of sleep_time to track its progress.

Finally, build one last while loop that combines screen_time and sleep_time. In this final while loop, print screen_time and sleep_time to track their values. At the end of each iteration, screen_time should decrease by 0.5, and sleep_time should increase by 0.5. This loop should only run while screen_time is greater than 2.

Don’t forget to reset the values of screen_time and sleep_time between uses. After a loop finishes, these variables will hold their final values rather than their initial ones.

Deliverables

3.1 Iterative results from the screen_time loop, and the sleep_time loop
3.2 What are some differences you noticed/read about between print(paste()) and cat()?
3.3 Results showing the final loop decreasing screen_time and increasing sleep_time by 0.5 per iteration.


We can solve the same example without any loop, as follows:

screen_time <- seq(10, 2.5, by = -0.5)   # values from 10 down to 2.5
sleep_time  <- seq(2, 9.5, by = 0.5)     # values from 2 up to 9.5

# sep = "" avoids a stray space at the start of every line after the first
cat(paste0("Log off - screen time: ", screen_time, " hours\n",
           "Sleep more - ", sleep_time, " hours\n"), sep = "")

However, sometimes you may not know the length of the vector or how far the loop should run at the beginning. In such cases, using a while loop becomes more appropriate. For example, let’s assume you need to simulate rolling a die repeatedly until the sum of all rolls exceeds 100, and then report the final total and how many rolls it took to reach that point. Since there is randomness in this example, it is not possible to know in advance when the loop will stop. Therefore, a better solution is to use a while loop with the total as the stopping condition, as shown below:

total <- 0
rolls <- 0

while (total <= 100) {
  roll <- sample(1:6, 1)  # roll a die (random number between 1 and 6)
  total <- total + roll
  rolls <- rolls + 1
}

cat("The total is", total, "and", rolls, "dice rolls were made.\n")
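Because the rolls are random, every run gives a different answer. If you want a run you can repeat exactly, one option is to fix the random seed with set.seed() first. The sketch below wraps the same loop in a hypothetical helper called roll_until; the seed value 2025 is arbitrary.

```r
# Hypothetical helper: roll a die until the running total exceeds `target`
roll_until <- function(target) {
  total <- 0
  rolls <- 0
  while (total <= target) {
    total <- total + sample(1:6, 1)
    rolls <- rolls + 1
  }
  c(total = total, rolls = rolls)
}

set.seed(2025)               # any fixed seed works; 2025 is arbitrary
first_run <- roll_until(100)

set.seed(2025)               # same seed, so the same sequence of rolls
second_run <- roll_until(100)

print(first_run)
print(identical(first_run, second_run))
```

The two runs match because resetting the seed replays the same sequence of random numbers.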

Vector-based and loop-based approaches can differ in how long they take to run. We can measure the cost of each approach using the system.time() function. For example, the following code generates 10,000 random numbers from a uniform distribution:

system.time( v <- runif(10000))

The output shows the user time (the CPU time R spends on calculations), the system time (the CPU time the operating system spends on tasks such as memory handling), and the elapsed time (the actual wall-clock time it took to complete the command).

We can perform an addition operation using a vector-based approach or using a loop-based approach, then compare the difference in processing time.

system.time(sum(1:10000))
system.time({i <- 0 ; for(j in 1:10000) {i <- i+j}; print(i)})

The first line computes the sum of the numbers from 1 to 10,000 with the vectorized sum() function. The second line uses a for loop to calculate and print the same sum. In both cases, system.time() measures how long the calculation takes in R.

You can experiment with numbers larger than 10,000 and observe the difference between loop-based and vector-based calculations.
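One way to set up that experiment is sketched below, assuming n = 1,000,000 (an arbitrary choice). The names t_vec, t_loop, total_vec, and total_loop are made up for the example. The numbers are converted with as.numeric() so that the large total is computed in double precision.

```r
n <- 1e6  # larger than the 10,000 used above; adjust freely

# Vector-based: sum() does the work in one call
t_vec <- system.time(total_vec <- sum(as.numeric(seq_len(n))))

# Loop-based: one addition per iteration
t_loop <- system.time({
  total_loop <- 0
  for (j in seq_len(n)) {
    total_loop <- total_loop + j
  }
})

# Both approaches should agree on the answer
print(total_vec == total_loop)

# Elapsed (wall-clock) seconds for each approach
print(c(vector = t_vec[["elapsed"]], loop = t_loop[["elapsed"]]))
```

The exact timings depend on your machine and on what else is running, so expect them to vary from run to run.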

Although the following exercises will focus on loop-based practice, keep in mind that when working with large datasets in R, vector-based computations are generally much faster.

If you choose to write loops, there are a few important rules to follow:

1 - Initialize new objects to their full length before the loop, rather than expanding them inside the loop.

2 - Avoid performing tasks inside the loop that can be done outside of it.

3 - Avoid loops to produce clearer and possibly more efficient code, not simply to avoid loops
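The first and third rules can be illustrated with a short sketch. The task (squaring the numbers 1 to 10,000) and the variable names are made up for the example.

```r
n <- 10000

# Rule 1: create the result at full length before the loop ...
squares <- numeric(n)
for (k in seq_len(n)) {
  squares[k] <- k^2
}

# ... rather than growing it one element at a time
squares_grown <- c()
for (k in seq_len(n)) {
  squares_grown <- c(squares_grown, k^2)
}

# Rule 3: the vectorized form is shorter and needs no loop at all
squares_vec <- seq_len(n)^2

print(all.equal(squares, squares_grown))
print(all.equal(squares, squares_vec))
```

All three versions produce the same values. Wrapping each one in system.time() is a quick way to see how they compare on your own machine.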

Question 4 (2 points)

Read in the Star Wars Character dataset as characters from /anvil/projects/tdm/data/starwars/characters.csv

In plain terms, the goal of this question is to build a while loop that runs over the first 20 characters. If the character’s species is Human, mark it as such. Otherwise, mark it in a combined category (non-Human).

To actually go about this, make two variables:

  • i <- 1 - go through the rows of the species column

  • char_count <- 0 - count up to 20 characters

While the char_count is less than 20, the loop should continue. At the end of the loop, make sure to increase both i and char_count by 1 each, to move to the next row of the dataset, and increase the running character count, respectively.

In this while loop, we need to use if and else:

i <- 1
char_count <- 0

while (char_count < 20) {
    if (characters$species[i] == "Human") {
        cat(char_count, "This is a human\n")
    } else {
        print("This is not a human")
    }
    i <- i + 1
    char_count <- char_count + 1
}

characters$species[i] indicates that the current row being worked with is number i. For example, if i = 1, it is the first row; if i = 2, the second row; and so on.

Also, you can see in the code that if the character is human, it prints out the character count and the message "This is a human". If they are not, it prints "This is not a human".
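One thing to watch for: if a species value is missing (NA), the comparison inside if returns NA and R stops with an error. Whether the first 20 rows of the real data contain missing species is not stated here, so the sketch below uses a toy vector with invented values. It treats NA as "not Human", which is an assumption you may or may not want, and it keeps a counter so the loop result can be compared with a loop-free count.

```r
# Toy species vector (invented values); NA stands for an unknown species
toy_species <- c("Human", "Droid", NA, "Human", "Wookiee")

i <- 1
non_human <- 0

while (i <= length(toy_species)) {
  # isTRUE() turns an NA comparison into FALSE instead of an error
  if (isTRUE(toy_species[i] == "Human")) {
    cat(i, "This is a human\n")
  } else {
    cat(i, "This is not a human (or the species is unknown)\n")
    non_human <- non_human + 1
  }
  i <- i + 1
}

# The same count without a loop, again treating NA as "not Human"
non_human_vec <- sum(is.na(toy_species) | toy_species != "Human")

print(c(loop = non_human, vectorized = non_human_vec))
```

Both counts come out to 3 for this toy vector.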

Deliverables

4.1 How many of the first 20 characters were non-humans?
4.2 Find how many of the first 20 characters were non-humans without using a loop.


In the following video, Dr. Ward shows some examples of how to run conditions by indexing with the Death Records data:

Question 5 (2 points)

Another option for repeating a set of operations is the repeat statement. The while loop checks the condition at the beginning of each iteration. If the condition is found to be false, the while loop doesn’t run. In a repeat loop, there is no initial condition. This loop would just continue running indefinitely unless there is a break statement in it. The repeat loop will run at least once, regardless of any conditions. The general definition is simple:

repeat {
    # do any code in here
}

A repeat loop is used to iterate over a block of code multiple times. There is no condition check in a repeat loop to exit the loop. We must put a condition explicitly inside the body of the loop ourselves and use the break statement to exit the loop. Failing to do so will result in an infinite loop.
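A minimal sketch of this pattern, with made-up variable names value and steps: the loop doubles a number and only checks whether to stop after the doubling has happened.

```r
value <- 1
steps <- 0

repeat {
  value <- value * 2      # the body always runs at least once
  steps <- steps + 1

  if (value > 100) {      # exit condition checked inside the body
    break
  }
}

cat("Reached", value, "after", steps, "doublings\n")
```

Here the loop stops at 128 after 7 doublings. Without the if and break, nothing would ever end it.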

Let’s walk through an example by first defining my_vec to contain the values 1, 4, 5, 2, 8, 4, 6, 3, 9, 3, 2, 2, 4, 1:

my_vec <- c(1, 4, 5, 2, 8, 4, 6, 3, 9, 3, 2, 2, 4, 1)

Make initial variables i and total_count as follows (Remember that indexing in R starts at 1, unlike Python, where it starts at 0):

i <- 1
total_count <- 0

In a repeat loop, set my_score to the i-th element of my_vec. total_count should increase by my_score each time. The loop will break once total_count is greater than 40, and there will be a celebratory message saying you won. (Do not forget to use i <- i + 1 in the loop.)

repeat {
    my_score <- my_vec[i]
    cat(total_count, "+ ", my_score, "= ")
    total_count <- total_count + my_score
    cat(total_count, "\n")

    if (total_count > 40) {
        print("You win!!!!!!")
        break
    }
    i <- i + 1
}

Notice the following in the code above:

  • After defining my_score but before increasing total_count, cat(total_count, "+ ", my_score, "= ") prints the running total and the score about to be added.

  • After total_count is increased, cat(total_count, "\n") prints the new total and ends the line.

Also, when using cat(), it is sometimes useful to use \n. This creates a new line following whatever has printed.

We can write the same example in fully vectorized form as follows:

my_vec <- c(1, 4, 5, 2, 8, 4, 6, 3, 9, 3, 2, 2, 4, 1)

cum_total <- cumsum(my_vec)
win_index <- which(cum_total > 40)[1]

cat(paste0(c(0, cum_total)[1:win_index], " + ", my_vec[1:win_index],
           " = ", cum_total[1:win_index]), sep = "\n")
cat("\nYou win!!!!!!\n")

Let us go back to the Social Media addiction data from earlier in this project. Using the Mental_Health_Score column from myDF, fill in all ????? in this example:

i <- ?????
total_count <- ?????

repeat {
    student_score <- myDF$Mental_Health_Score[?????]
    cat("Mental health of student", i, "is", student_score, "\n")
    cat("Current mental health score is", total_count, "\n\n")
    total_count <- ????? + student_score

    if (total_count >= 100) {
        print("ALLL DONEEEEE")
        break
    }

    i <- i + 1
}

Deliverables

5.1 How do while and repeat compare?
5.2 Iterative output of counting up to the final mental health score.


Submitting your Work

Once you have completed the questions, save your Jupyter notebook. You can then download the notebook and submit it to Gradescope.

  • firstname_lastname_project6.ipynb

You must double check your .ipynb after submitting it in Gradescope. A very common mistake is to assume that your .ipynb file has been rendered properly and contains your code, markdown, and code output even though it may not. Please take the time to double check your work. See here for instructions on how to double check this.

You will not receive full credit if your .ipynb file does not contain all of the information you expect it to, or if it does not render properly in Gradescope. Please ask a TA if you need help with this.