TDM 10100: Project 6 - Control Flow / Conditions
Project Objectives
Motivation: Even though loops are not the most efficient way to code in R, understanding how they work is an essential programming skill. Many other languages rely heavily on loops, so practicing them here will strengthen your overall coding ability and make it easier to transfer your skills across languages. By comparing loops and conditionals with R’s vectorized and indexing methods, you will see why R is so powerful for data analysis.
Context: We will learn how to use conditionals and begin working with loops on actual datasets. Loops are common in languages like C and Python, but R is a vector-oriented language, which often allows us to avoid loops. We generally prefer not to use loops in R, but in this project we will practice conditionals and loops for comparison purposes.
Scope: R, if, else, loops, iteration
Dataset
- /anvil/projects/tdm/data/social_media_addiction/social_media.csv
- /anvil/projects/tdm/data/starwars/characters.csv
- /anvil/projects/tdm/data/death_records/DeathRecords.csv (used in the video)
If AI is used in any way, such as for debugging, research, etc., we now require that you submit a link to the entire chat history. For example, if you used ChatGPT, there is a “Share” option in the conversation sidebar. Click on “Create Link” and add the shareable link as part of your citation. The project template in the Examples Book now has a “Link to AI Chat History” section; please include it in all your projects. If you did not use any AI tools, you may write “None”. We allow using AI for learning purposes; however, all submitted materials (code, comments, and explanations) must be your own work and in your own words. No content or ideas should be directly applied or copied and pasted into your projects. Please refer to the-examples-book.com/projects/fall2025/syllabus#guidance-on-generative-ai. Failing to follow these guidelines is considered academic dishonesty.
Social Media Addiction
Social media is now a huge part of everyday life. This dataset contains the results of a cross-country survey on students’ social media usage patterns, its academic impact, and the relationship between social media and students’ lives. The sample consists of 705 students aged 16-25, representing 110 countries and 12 different social media platforms.
Spending a lot of time on social media is widely considered unhealthy. In the survey, students shared their most used social media platform, how it affected their physical and mental health, and more. Each was asked to rate their addiction from 1 (low addiction) to 10 (high addiction), and the vast majority of responses were at the upper end of the range. When asked to rate their mental health on a scale of 1 to 10, responses leaned toward the lower end.
This dataset contains 13 columns, including:
- Avg_Daily_Usage_Hours: average hours per day on social media
- Sleep_Hours_Per_Night: average hours slept nightly
- Mental_Health_Score: self-rated mental health score
Star Wars
This dataset is on the smaller side, containing just one row for each of fewer than 100 of the main characters from the Star Wars movie franchise. The data here can be useful for finding patterns in the traits of characters.
The Star Wars dataset has 13 columns and 96 rows. Some columns, such as hair_color and eye_color, record very specific details about each character, and description gives a good summary of how each character is relevant in the movie(s). We will stick to a few specific columns:
- name: full known name of each character
- species: species (when known) of each character
- height: measured height in meters
Questions
Question 1 (2 points)
There are many colors of lightsabers in the Star Wars universe. The most commonly recognized (specifically in the Skywalker Saga) are blue, green, purple, and red. Each color says something about its wielder, often signaling their alignment with the Light or Dark Side, or their rank within the Jedi or Sith orders; for example, green lightsabers are typically associated with Jedi Consulars.
Before we get back to the movies, let us talk about if statements in R. An if statement runs a block of code only if a certain condition is TRUE. The general form is:
if (condition) {
  # run any code here
}
The if and else statements in R are used to apply conditional logic. A standard structure looks like:
if (first_condition) {
  # first result
} else if (second_condition) {
  # second result
} else {
  # remaining result
}
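To see such a chain in action, here is a small sketch that classifies a number as negative, zero, or positive; the variable x and the labels are made up for illustration. Only the first branch whose condition is TRUE runs, and the final else catches everything left over.

```r
x <- -3  # hypothetical value; try 0 or 7 as well

# Conditions are checked top to bottom; the first TRUE one wins
if (x < 0) {
  sign_label <- "negative"
} else if (x == 0) {
  sign_label <- "zero"
} else {
  sign_label <- "positive"
}
print(sign_label)
```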
For example, let’s check if we need coffee!
need_coffee <- TRUE
if (need_coffee) {
  print("Yes! Grab a coffee first...")
} else {
  print("Wow, you must be a superhero! No coffee needed")
}
Run it one more time with need_coffee <- FALSE. :)
Conditionals are commonly used to classify values, typically within loops or functions.
While not loops themselves, …
Now, write an if/else conditional chain that classifies lightsaber colors into:
- Jedi Consular: green
- Jedi Guardian: blue
- Unique Balance: purple
- Sith Lord: red
- Rare Occurrence: any other color
Your code should determine the classification of a single value stored in a variable called color.
Each if statement should check whether color matches one of the defined color categories. If it does not match any, the final else statement should classify it as a Rare Occurrence.
Add a …
Assign color a value of your choice. It must be assigned above (before) your if/else chain so that color is already defined when the chain classifies it. Try running the if/else chain with a few different color values.
So far, we have only classified a single value at a time. Now, imagine you need to classify every element of a vector. In these cases, you can use ifelse, which evaluates its condition element by element. For example, using the same color-classifying conditions, build a chain of nested ifelse calls to determine the status of each lightsaber’s wielder. For color, use the vector colors:
colors <- c("green", "blue", "red", "yellow", "blue", "red", "purple", "green", "red", "blue", "red", "blue")
roles <- ifelse(colors == "green", "Jedi Consular",
         ifelse(colors == "blue", "Jedi Guardian",
         ifelse(colors == "purple", "Unique Balance",
         ifelse(colors == "red", "Sith Lord", "Rare Occurrence"))))
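If your R code feels cumbersome, look for a cleaner tool. For a single value, the switch function is a cleaner alternative to a long if/else chain (note that switch is not vectorized: its EXPR argument must be a single value):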
mystring <- "green"
foo <- switch(EXPR=mystring, green="Jedi Consular", blue="Jedi Guardian", purple="Unique Balance", red="Sith Lord", "Rare Occurrence")
foo
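For a truly vectorized alternative, one option is a named lookup vector: index it by the whole colors vector at once, then fill in the colors that have no entry. This is a sketch; role_lookup and roles_switch are names chosen for illustration. The last part shows the element-by-element route, applying switch to each color with sapply, for comparison.

```r
colors <- c("green", "blue", "red", "yellow", "blue", "red", "purple", "green", "red", "blue", "red", "blue")

# Named vector: each name is a color, each value is the matching role
role_lookup <- c(green = "Jedi Consular", blue = "Jedi Guardian",
                 purple = "Unique Balance", red = "Sith Lord")

# Indexing by name classifies every color at once; unknown colors become NA
roles <- unname(role_lookup[colors])
roles[is.na(roles)] <- "Rare Occurrence"
print(roles)

# switch() handles one value at a time, so apply it to each element
roles_switch <- sapply(colors, function(col) {
  switch(col, green = "Jedi Consular", blue = "Jedi Guardian",
         purple = "Unique Balance", red = "Sith Lord", "Rare Occurrence")
}, USE.NAMES = FALSE)
print(identical(roles, roles_switch))
```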
1.1 Output a few results (at least 3) of testing different colors in the if/else chain
1.2 Show the status of each wielder from the vector colors
1.3 In your own understanding, what are some differences between if/else and ifelse?
Did you see the note regarding the new AI policy? Click here and read it.
Question 2 (2 points)
Read in the Social Media dataset as myDF and show the dimensions and the head() of the data.
Students (ages 18 - 24) often get very little sleep day to day, yet somehow find enough time to spend many hours on electronic devices, on social media alone. Looking at table() of both Sleep_Hours_Per_Night and Avg_Daily_Usage_Hours shows that some students are not getting much sleep (as little as 3.8 hours), while some average daily social media times are as high as a frightening 8.5 hours.
One of the main differences between if/else and ifelse is that if/else checks one condition at a time and can only be used on single values, not vectors, whereas ifelse works through entire vectors at once. Each ifelse call supports only a single if/else pair, which is why nested ifelse calls are sometimes required.
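A small sketch of that difference, using a made-up numeric vector x: ifelse labels every element in one call, while if/else has to be applied to one element at a time, here inside a for loop. Passing the whole vector to if() does not work; in R 4.2.0 and later, an if() condition of length greater than one is an error.

```r
x <- c(3, 8, 1, 6)  # hypothetical values

# Vectorized: one call, one label per element
labels_vec <- ifelse(x > 5, "big", "small")

# Scalar if/else: visit each element and classify it individually
labels_loop <- character(length(x))  # preallocate the result
for (k in seq_along(x)) {
  if (x[k] > 5) {
    labels_loop[k] <- "big"
  } else {
    labels_loop[k] <- "small"
  }
}

print(labels_vec)
print(identical(labels_vec, labels_loop))
```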
To compare the sleep hours to the social media hours, let’s create a new column Status.
Status should be the result of using ifelse to classify each student into one of the following:
- social media hours > sleep hours
- social media hours = sleep hours
- whatever remains (social media hours < sleep hours)
For each of these three cases, add a label reflecting the student’s sleep-to-phone ratio, such as Bad Habit, Barely Existing, Doing Fine, Doing Good, Doom Scroll, Fine Habit, Good Habit, Healthy, Lump, Sloth, Thriving, Zombie, and so on.
Print the head() of the data frame to view this new column. Use table() to compare the counts of the three categories in the Status column.
Before you dive into this question, let’s quickly revisit the indexing projects we worked on in previous weeks and see how we can accomplish the same task using indexing:
myDF$Status <- "Good"
myDF$Status[myDF$Avg_Daily_Usage_Hours > myDF$Sleep_Hours_Per_Night] <- "Zombie"
myDF$Status[myDF$Avg_Daily_Usage_Hours == myDF$Sleep_Hours_Per_Night] <- "Doom Scroll"
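To see how the indexing version behaves, here is a sketch on a tiny made-up data frame (toy and its four rows are hypothetical, not taken from the survey). Every row starts as "Good", and each later assignment overwrites only the rows that match its condition, so the order of the lines matters.

```r
# Hypothetical rows, for illustration only
toy <- data.frame(
  Avg_Daily_Usage_Hours = c(8.5, 4.0, 6.0, 3.0),
  Sleep_Hours_Per_Night = c(3.8, 4.0, 7.0, 8.0)
)

toy$Status <- "Good"  # default label for every row
toy$Status[toy$Avg_Daily_Usage_Hours > toy$Sleep_Hours_Per_Night] <- "Zombie"
toy$Status[toy$Avg_Daily_Usage_Hours == toy$Sleep_Hours_Per_Night] <- "Doom Scroll"

print(toy)
print(table(toy$Status))  # count of each label
```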
2.1 What was the longest recorded sleep time of the students? The longest social media time?
2.2 Which habit ratio was the most common among the students?
Did you see the note regarding the new AI policy? Click here and read it |
Question 3 (2 points)
To use for loops, you must know, or be able to easily calculate, the number of times the loop should repeat. In situations where you do not know how many times the desired operations need to be run, you can turn to the while loop. A while loop repeats as long as a specified condition evaluates to TRUE, and takes the following general form:
while (loopcondition) {
  # run any code here
}
A while loop uses a single logical-valued loopcondition to control how many times it repeats. When the loop starts, loopcondition is evaluated. If it is TRUE, the code inside the braces runs line by line as usual until it is complete, at which point loopcondition is checked again. The loop terminates as soon as the condition evaluates to FALSE; the code in the braces is not run one last time.
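As a sketch of a loop whose number of repetitions is not known in advance, the made-up example below keeps halving a value until it is no longer greater than 1 and counts how many passes that took. The condition is checked before every pass, and the loop stops as soon as it is FALSE.

```r
value <- 100    # hypothetical starting value
halvings <- 0   # how many times the loop body has run

while (value > 1) {
  value <- value / 2
  halvings <- halvings + 1
}

cat("Halved", halvings, "times; final value is", value, "\n")
```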
For more information, read about while loops here.
Say a student’s screen_time is 10 hours. We are not using the Social Media dataset here; just make a simple variable that contains the value 10 to represent this:
screen_time <- 10
Build a while loop that continues while screen_time is over 2 hours. Inside the loop, print the student’s screen time and then decrease screen_time by 1. This will print eight lines (10 down to 3), each one hour less than the line before.
Use either print(paste()) or cat() to print the screen time.
Notice how the …
Make a second while loop for a variable sleep_time that starts at 2. This loop should run while sleep_time is less than 10, increasing sleep_time by 1 on each iteration. Make sure to print each value of sleep_time to track its progress.
Finally, build one last while loop that uses both screen_time and sleep_time. In this loop, print screen_time and sleep_time to track their values. At the end of each iteration, screen_time should decrease by 0.5 and sleep_time should increase by 0.5. This loop should run only while screen_time is greater than 2.
Don’t forget to reset the values of screen_time and sleep_time before this final loop, since the earlier loops changed them.
3.1 Iterative results from the screen_time loop and the sleep_time loop
3.2 What are some differences you noticed/read about between print(paste()) and cat()?
3.3 Results showing the final loop decreasing screen_time and increasing sleep_time by 0.5 per iteration.
Did you see the note regarding the new AI policy? Click here and read it |
We can solve the same example without any loop, as follows:
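One loop-free way to do this, sketched under the assumption that the goal is to print every screen time the while loop prints (10 down to 3): build the whole sequence with seq() and print it with a single cat() call. The message wording is illustrative.

```r
screen_time <- 10

# Every value the loop would visit: 10, 9, ..., 3 (values over 2)
times <- seq(from = screen_time, to = 3, by = -1)

# paste() is vectorized, so this builds all eight lines at once
cat(paste("Screen time:", times, "hours"), sep = "\n")
```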
However, sometimes you may not know the length of the vector, or how far the loop should run, at the start. In such cases, a while loop can be the better choice.
There are differences in processing time between vector-based and loop-based approaches. We can measure the cost of each approach using the system.time() function.
The output shows the user time (the CPU time R spends on calculations), the system time (the CPU time the operating system spends on tasks such as memory handling), and the elapsed time (the actual wall-clock time it took to complete the command). We can perform an addition operation using a vector-based approach or using a loop-based approach, then compare the difference in processing time.
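A minimal sketch of such a comparison, assuming the task is summing the numbers 1 to 10,000 (n and the variable names are chosen for illustration). system.time() evaluates the expression and reports user, system, and elapsed time; the exact numbers will vary from machine to machine.

```r
n <- 10000  # try larger values, e.g. 1e7, to see the gap grow

# Loop-based: add one number at a time
loop_time <- system.time({
  loop_total <- 0
  for (k in 1:n) {
    loop_total <- loop_total + k
  }
})

# Vector-based: sum() adds the whole vector in one call
vec_time <- system.time({
  vec_total <- sum(as.numeric(1:n))
})

print(loop_total)
print(loop_time)
print(vec_time)
```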
This code uses a for loop to calculate and print the sum of numbers from 1 to 10,000, while the vector-based version performs the same calculation without a loop. You can experiment with numbers larger than 10,000 and observe the difference between loop-based and vector-based calculations. Although the following exercises will focus on loop-based practice, keep in mind that when working with large datasets in R, vector-based computations are generally much faster. If you choose to write loops, there are a few important rules to follow:
1 - Initialize new objects to their full length before the loop, rather than expanding them inside the loop.
2 - Avoid performing tasks inside the loop that can be done outside of it.
3 - Avoid loops when doing so produces clearer and possibly more efficient code, not simply for the sake of avoiding loops.
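The first rule can be checked directly. The sketch below (names are illustrative) fills a vector of squares three ways: growing it with c() on every pass, preallocating it with numeric(n) and filling it in, and skipping the loop entirely. All three give the same values; comparing the timings shows the cost of growing.

```r
n <- 10000

# Growing: c() builds a brand-new, longer vector on every pass
grow_time <- system.time({
  grown <- c()
  for (k in 1:n) grown <- c(grown, k^2)
})

# Preallocating: create the full-length vector once, then fill it in
prealloc_time <- system.time({
  filled <- numeric(n)
  for (k in 1:n) filled[k] <- k^2
})

# Vectorized: no loop at all
vectorized <- (1:n)^2

print(grow_time)
print(prealloc_time)
```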
Question 4 (2 points)
Read in the Star Wars character dataset as characters from /anvil/projects/tdm/data/starwars/characters.csv.
In pseudocode, the goal of this question is to build a while loop that runs over the first 20 characters. If a character’s species is Human, mark it as such. Otherwise, mark it in a combined category (non-Human).
To do this, make two variables:
- i <- 1: the row index used to go through the species column
- char_count <- 0: the running count, up to 20 characters
The loop should continue while char_count is less than 20. At the end of each iteration, increase both i and char_count by 1, to move to the next row of the dataset and to increase the running character count, respectively.
In this while loop, we need to use if and else:
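i <- 1          # row index into the species column
char_count <- 0 # number of characters processed so far
while (char_count < 20) {
  # Treat a missing (NA) species as non-human instead of letting if() fail
  if (!is.na(characters$species[i]) && characters$species[i] == "Human") {
    cat(char_count, "This is a human\n")
  } else {
    print("This is not a human")
  }
  i <- i + 1                   # move to the next row
  char_count <- char_count + 1 # one more character processed
}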
Also, notice that if the character is human, the code prints the character count along with the message (using cat()); otherwise, it prints only the message (using print()).
4.1 How many of the first 20 characters were non-humans?
4.2 Find how many of the first 20 characters were non-humans without using a loop.
Did you see the note regarding the new AI policy? Click here and read it |
In the following video, Dr. Ward shows some examples of how to apply conditions through indexing with the Death Records data.
Question 5 (2 points)
Another option for repeating a set of operations is the repeat statement. The while loop checks its condition at the beginning of each iteration, so if the condition is FALSE from the start, the body of the while loop never runs. A repeat loop has no initial condition: it keeps running indefinitely unless a break statement inside it is reached, and its body always runs at least once. The general form is simple:
repeat {
  # run any code here, including a condition that calls break
}
A repeat loop iterates over a block of code any number of times. Because it has no built-in condition check, we must explicitly put a condition inside the body of the loop and use the break statement to exit. Failing to do so will result in an infinite loop.
Let’s walk through an example by first defining my_vec to contain the values 1, 4, 5, 2, 8, 4, 6, 3, 9, 3, 2, 2, 4, 1:
my_vec <- c(1, 4, 5, 2, 8, 4, 6, 3, 9, 3, 2, 2, 4, 1)
Make initial variables i and total_count as follows (remember that indexing in R starts at 1, unlike Python, where it starts at 0):
i <- 1
total_count <- 0
In a repeat loop, set my_score to the i-th element of my_vec. total_count should increase by my_score each time. The loop should break as soon as total_count is greater than 40, printing a celebratory message saying you won. (Do not forget to use i <- i + 1 in the loop.)
repeat {
  my_score <- my_vec[i]
  cat(total_count, "+ ", my_score, "= ")
  total_count <- total_count + my_score
  cat(total_count, "\n")
  if (total_count > 40) {
    print("You win!!!!!!")
    break
  }
  i <- i + 1
}
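Notice the following in the code above: after defining my_score, the code prints the current total and the score being added. Following the increase of total_count, it checks whether the total is greater than 40 and, if so, prints the message and breaks out of the loop. Also, when using repeat, i must be increased manually at the end of each iteration; otherwise, the loop would keep adding the same element.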
We can write the same example in a fully vectorized form as follows:
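One vectorized sketch, assuming the goal is to print the same sequence of steps and find where the running total first exceeds 40: cumsum() gives every running total at once, and which() finds the first one past the threshold. It assumes the threshold is actually reached; otherwise stop_at would be NA.

```r
my_vec <- c(1, 4, 5, 2, 8, 4, 6, 3, 9, 3, 2, 2, 4, 1)

running <- cumsum(my_vec)            # running total after each element
previous <- c(0, head(running, -1))  # running total before each element
stop_at <- which(running > 40)[1]    # first position where total > 40

steps <- seq_len(stop_at)
cat(paste(previous[steps], "+", my_vec[steps], "=", running[steps]), sep = "\n")
print("You win!!!!!!")
```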
Let us go back to the Social Media Addiction data introduced at the beginning of this project. Using the Mental_Health_Score column from myDF, fill in every ????? in this example:
i <- ?????
total_count <- ?????
repeat {
  student_score <- myDF$Mental_Health_Score[?????]
  cat("Mental health of student", i, "is", student_score, "\n")
  cat("Current mental health score is", total_count, "\n\n")
  total_count <- ????? + student_score
  if (total_count >= 100) {
    print("ALLL DONEEEEE")
    break
  }
  i <- i + 1
}
5.1 How do while and repeat compare?
5.2 Iterative output of counting up to the final mental health score.
Did you see the note regarding the new AI policy? Click here and read it |
Submitting your Work
Once you have completed the questions, save your Jupyter notebook. You can then download the notebook and submit it to Gradescope.
- firstname_lastname_project6.ipynb
You must double check your .ipynb after submitting it in Gradescope. A very common mistake is to assume that your .ipynb file has been rendered properly and contains your code, markdown, and code output even though it may not. Please take the time to double check your work. See here for instructions on how to double check this.
You will not receive full credit if your .ipynb file does not contain all of the information you expect it to, or if it does not render properly in Gradescope. Please ask a TA if you need help with this.