mean() is a base R function that calculates the arithmetic average of a vector of values.

You will often use the na.rm argument, short for "NA removal". Real-world data often contains missing values, and na.rm = TRUE drops those values before the function computes its result. Because na.rm = FALSE is the default, a single missing value makes the whole result missing unless you opt in. Include na.rm = TRUE whenever you are unsure whether your data contains missing values.
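A minimal sketch of the default behavior versus na.rm = TRUE. The vector x is a made-up example, not data from this page:

```r
# Hypothetical vector with one missing value
x <- c(4, NA, 8)

mean(x)                 # na.rm = FALSE by default, so the NA propagates
# [1] NA
mean(x, na.rm = TRUE)   # the NA is dropped first: (4 + 8) / 2
# [1] 6
```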

NA indicates a missing value, while NaN ("Not a Number") is the result of an undefined numeric operation such as 0/0. Dividing a nonzero number by zero gives Inf, not NaN. The second example below shows that na.rm does not care about the difference.
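The sketch below shows where NaN comes from and how R's tests tell NA and NaN apart. is.na() returns TRUE for both, which is why removing values by "NA-ness" also removes NaN:

```r
0 / 0          # undefined operation: NaN
# [1] NaN
1 / 0          # nonzero divided by zero is Inf, not NaN
# [1] Inf

v <- c(NA, NaN)
is.na(v)       # TRUE for both NA and NaN
# [1] TRUE TRUE
is.nan(v)      # TRUE only for NaN
# [1] FALSE  TRUE
```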


How do I get the average of a vector of values?

[1] 2.5
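The input vector for this example is not shown. As an assumption, c(1, 2, 3, 4) is one vector whose mean is 2.5; the sketch also computes the same value by hand as sum divided by count:

```r
x <- c(1, 2, 3, 4)   # hypothetical input; the original vector is not shown
mean(x)
# [1] 2.5
sum(x) / length(x)   # the arithmetic mean computed by hand
# [1] 2.5
```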

How do I get the average of a vector when some of its values are NA or NaN? What happens if I leave those values in?


First, here is what happens without na.rm = TRUE:

[1] NaN

That is not what we want. The only reason to leave na.rm = FALSE would be to check whether the data contains missing values at all.
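If the goal is to detect missing values, testing for them directly is clearer than relying on mean() returning a missing result. A minimal sketch with a made-up vector y:

```r
y <- c(1, 2, NA, 4)   # hypothetical data with one missing value

anyNA(y)          # is at least one value NA (or NaN)?
# [1] TRUE
sum(is.na(y))     # how many values are missing
# [1] 1
mean(y)           # without na.rm, a missing value makes the result missing
# [1] NA
```

When a vector contains both NA and NaN, R's documentation notes that arithmetic may return either NA or NaN, so is.na() is the safer test.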

Here are the examples with na.rm = TRUE, which drops the missing values before averaging:

mean(c(1, 2, 3, NaN), na.rm = TRUE)
[1] 2
mean(c(1, 2, 3, NA), na.rm = TRUE)
[1] 2
mean(c(1, 2, NA, NaN, 4), na.rm = TRUE)
[1] 2.333333
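For mean(), na.rm = TRUE behaves like filtering with !is.na() before averaging, which is why NA and NaN are treated identically. The sketch below reuses the last example's vector:

```r
z <- c(1, 2, NA, NaN, 4)
kept <- z[!is.na(z)]    # drops both NA and NaN, leaving 1, 2, 4
kept
# [1] 1 2 4
mean(kept)              # (1 + 2 + 4) / 3
# [1] 2.333333
all.equal(mean(z, na.rm = TRUE), mean(kept))
# [1] TRUE
```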