# Variables

## Overview

A variable is a named reference to a value. Every variable has a type that defines what kind of data it holds.

In R, there are 5 basic variable types: character, integer, double, logical, and complex. Additionally, you might run into the types raw, list, closure, special, builtin, environment, and S4.

## Basic Variables

### Character

In R, a string is technically an element of a character vector; even a single string is a character vector of length 1. Strings start and end with double quotes (`"`) or single quotes (`'`). Strings aren’t as heavily used in R as in some other languages, as its functionality focuses more on statistical analysis.

``````typeof('my_character_vector')
typeof("my_character_vector")``````
```'character'
'character'```
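
As a small illustration of how strings sit inside character vectors, the sketch below uses `length()` and `nchar()`. The variable names `single` and `words` are made up for this example.

```r
# A single string is a character vector of length 1.
single <- "hello"
length(single)   # number of elements: 1
nchar(single)    # number of characters: 5

# Several strings live together in one character vector.
# Single and double quotes can be mixed freely.
words <- c("alpha", 'beta', "it's")
length(words)    # 3
nchar(words)     # 5 4 4
```

Note that `length()` counts elements of the vector, while `nchar()` counts the characters in each element.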

### Integer

An integer is a numeric value without a fractional part (no decimal). In many dynamic languages, when you create a variable to hold a number without a decimal point, it will be an integer. For example, when using Python, the line `variable = 5` will result in `variable` having type integer, while the line `variable = 5.0` will result in `variable` having type float (Python’s version of a decimal number).

In R, however, this is not the case. In R, 5 and 5.0 will both be stored as doubles unless explicitly set as integers using `as.integer()`.

``````my_variable <- 5
typeof(my_variable)

my_variable <- 5.0
typeof(my_variable)``````
```'double'
'double'```
``````my_variable <- as.integer(5)
typeof(my_variable)``````
`'integer'`

A key exception to this rule is the `:` operator, which produces an integer vector when its starting value is a whole number.

``````test <- 1:5
typeof(test)
test <- 1.2:4
typeof(test)``````
```'integer'
'double'```
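
Besides `as.integer()`, an integer can also be written directly with the `L` suffix. The following sketch (with the illustrative name `my_integer`) shows both routes, and also that `as.integer()` truncates rather than rounds.

```r
# The L suffix creates an integer literal directly.
my_integer <- 5L
typeof(my_integer)                     # "integer"

# Both routes give the same value and type.
identical(my_integer, as.integer(5))   # TRUE

# as.integer() truncates toward zero rather than rounding.
as.integer(5.9)                        # 5
as.integer(-5.9)                       # -5
```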

### Double

A double is a numeric value that can have a fractional part. As shown above, R stores numeric literals as doubles by default, and you can verify this using `typeof`. To convert a variable to a double, you can use the `as.double()` function. Notably, `as.double()` will convert a character vector to a double vector if possible.

``````char_to_double <- as.double("4.0")
typeof(char_to_double)``````
`'double'`
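
When a string cannot be read as a number, `as.double()` returns `NA` for that element and raises a warning. A minimal sketch, using the made-up names `inputs` and `converted`:

```r
# Mixed input: the first element parses as a number, the second does not.
inputs <- c("4.0", "abc")

# suppressWarnings() hides the "NAs introduced by coercion" warning.
converted <- suppressWarnings(as.double(inputs))
converted          # 4 NA
typeof(converted)  # "double"
is.na(converted)   # FALSE TRUE
```

Checking the result with `is.na()` is one simple way to find the elements that failed to convert.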

### Logical

Logical constants refer to the boolean values `TRUE` and `FALSE`, plus the missing value `NA`.

You might see `T` and `F` used in place of `TRUE` and `FALSE`. This is not incorrect, but `T` and `F` are ordinary variables provided by R that can be reassigned, while `TRUE` and `FALSE` are constants that cannot be changed. If you assign some other value to `T` or `F`, it will no longer correspond to `TRUE` or `FALSE`.
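
The difference can be seen in a short experiment. This is only a sketch; the final `rm(T)` removes the reassigned copy so that the rest of the session behaves normally.

```r
# T starts out as an ordinary variable whose value is TRUE.
isTRUE(T)   # TRUE

# Nothing stops us from overwriting it.
T <- 0
isTRUE(T)   # FALSE

# Removing our copy makes the original definition visible again.
rm(T)
isTRUE(T)   # TRUE
```

For this reason, spelling out `TRUE` and `FALSE` is the safer habit in scripts.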

Logical values are critical in R. They are constantly used when indexing vectors and `data.frames`. Remember that the result of a comparison using logical operators is a vector of logical values. For example, the following code will return all values in our vector that are greater than 5.

``````vec <- 1:10
vec[vec > 5]``````

The inner statement `vec > 5` will only give you a logical vector, indicating `TRUE` or `FALSE` depending on the statement’s truth at each index:

``````vec <- 1:10
vec > 5``````
`[1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE`

So, `vec[vec > 5]` is really equivalent to the following.

``vec[c(FALSE,FALSE,FALSE,FALSE,FALSE,TRUE,TRUE,TRUE,TRUE,TRUE)]``

What if we wanted to count the number of even values in our vector? One may consider the following code:

``````vec <- 1:10
length(vec[vec %% 2 == 0])``````
`5`

However, a more succinct version of this code is as follows.

``````vec <- 1:10
sum(vec %% 2 == 0)``````
`5`

As you can see, because R coerces the logical values to be 1 for `TRUE` and 0 for `FALSE`, summing the logical vector counts the even values directly.
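
The same coercion makes `mean()` return the proportion of `TRUE` values. The sketch below also shows a few other helpers that are commonly applied to logical vectors; `is_even` is a name chosen for this example.

```r
vec <- 1:10
is_even <- vec %% 2 == 0

mean(is_even)    # proportion of even values: 0.5
which(is_even)   # positions of the TRUE values: 2 4 6 8 10
any(vec > 10)    # FALSE: no element exceeds 10
all(vec > 0)     # TRUE: every element is positive
```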

## Other Structures

### Complex

While it is less likely that you will need to use complex numbers in R, they have first-class support. A complex number is a pair of real numbers, the real and imaginary parts, written in the form `a+bi` (for example, `3+2i`). The following code shows a few ways to create a complex number.

``````0i ^ (-3:3)
1i^2``````

R (like other programming languages) treats any token that starts with a digit as a number, unless it is surrounded by quotes. As such, the `i` suffix in `1i` tells R that the value is imaginary, whereas a bare `i` would be read as a variable name.

Be careful, however, as code that you may expect to produce a complex number might not do that:

``sqrt(-1)``
```[1] NaN
Warning message:
In sqrt(-1) : NaNs produced```
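
One way around this is to make the input complex before calling `sqrt()`. The following sketch also uses `complex()`, `Re()`, `Im()`, and `Mod()` from base R to build a complex number and take it apart; `root` and `z` are illustrative names.

```r
# Convert the input to complex first, and sqrt() returns a complex result.
root <- sqrt(as.complex(-1))
root            # 0+1i
typeof(root)    # "complex"

# Build a complex number and take it apart again.
z <- complex(real = 3, imaginary = 4)
Re(z)           # 3
Im(z)           # 4
Mod(z)          # 5, the modulus sqrt(3^2 + 4^2)
```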

## Coercion

Coercion is the process of changing the type of a variable, either explicitly by using a special function or implicitly by performing an operation on a variable of one type, when the operation was meant for another type. The following is an example of coercion:

``typeof(paste("test", 5.0))``
`'character'`

Here, 5.0 is a double, and "test" is a character vector. `paste` is a function that expects character vector(s) as input and returns the concatenation of the input vectors. We instead passed a character vector and a double, so R coerced the double to a character vector so that the operation would complete successfully.

In general, R will coerce from the more specific type to the more general one. In the above example, the coercion of 5.0 makes sense: it’s easy to represent 5.0 as the string "5", while it’s hard to turn "test" into a double. Another example is the following:

``````my_integer <- as.integer(5)
my_double <- 7.0
my_result <- my_integer + my_double
typeof(my_result)``````
`'double'`

This logic is important for preventing the loss of data — the number 5.12345 cannot be stored as an integer without losing information.
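
The same ordering can be observed with `c()`, which must pick a single type for the combined vector. A brief sketch:

```r
# c() coerces its arguments to the most general type present.
typeof(c(TRUE, 1L))    # "integer"
typeof(c(1L, 2.5))     # "double"
typeof(c(2.5, "a"))    # "character"

# Going the other way must be requested explicitly, and can lose data.
as.integer(5.12345)    # 5

# Logical values become 0 and 1 in arithmetic.
TRUE + TRUE            # 2
```

The implicit direction runs from logical to integer to double to character, which is why no information is lost unless you ask for a narrower type yourself.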

## Factors

A factor is R’s way of representing a categorical variable. There are elements in a factor (just like there are elements in a vector), but they are constrained to only be chosen from a specific set of values, called "levels". They are useful when a vector has only a few different values — "Male"/"Female" or "A"/"B"/"C".

The `factor()` function casts a variable to a factor, the `is.factor()` function tests whether a variable is a factor, and the `levels()` function lists all of the levels of a factor.
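
To make the idea of a constrained set of values concrete, here is a small sketch. The data and the name `sizes` are invented for illustration; the `levels` argument of `factor()` fixes the order of the levels.

```r
# The levels argument fixes the set of allowed values and their order.
sizes <- factor(c("small", "large", "small", "medium"),
                levels = c("small", "medium", "large"))
levels(sizes)        # "small" "medium" "large"
table(sizes)         # counts per level: small 2, medium 1, large 1

# Internally a factor is an integer vector of level positions.
typeof(sizes)        # "integer"
as.integer(sizes)    # 1 3 1 2

# Assigning a value outside the levels gives NA (and a warning).
suppressWarnings(sizes[1] <- "huge")
is.na(sizes[1])      # TRUE
```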

### Examples

#### How do I test whether or not a vector is a factor?

``````test_factor <- factor("Male")
is.factor(test_factor)``````
`[1] TRUE`

#### List the levels we have in `vec`.

``````vec <- factor(c("Male", "Female", "Female"))
levels(vec)``````
`[1] "Female" "Male"`

#### How can I rename the levels of a factor?

``````vec <- factor(c("Male", "Female", "Female"))
levels(vec)``````
`[1] "Female" "Male"`
``````levels(vec) <- c("F", "M")
vec``````
```[1] M F F
Levels: F M```
``````# be careful! Order matters, this is wrong:
vec <- factor(c("Male", "Female", "Female"))
levels(vec)``````
`[1] "Female" "Male"`
``````# here we incorrectly rename "Female"'s to "M" instead of "F"
levels(vec) <- c("M", "F")
vec``````
```[1] F M M
Levels: M F```

#### How can I find the number of levels of a factor?

``````vec <- factor(c("Male", "Female", "Female"))
nlevels(vec)``````
`[1] 2`

## Dates

`Date` is a class that allows you to perform special operations such as subtraction, which returns the number of days between two dates, and addition: adding 30 to a `Date` returns a `Date` 30 days in the future.
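
A minimal sketch of this arithmetic, using the made-up names `first_day` and `last_day`:

```r
# Strings in ISO format (YYYY-MM-DD) need no format argument.
first_day <- as.Date("1990-07-05")
last_day  <- as.Date("1990-12-31")

last_day - first_day               # Time difference of 179 days
as.numeric(last_day - first_day)   # 179, as a plain number
first_day + 30                     # "1990-08-04"
first_day < last_day               # TRUE: dates can be compared
```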

You will usually need to specify the "format" argument based on the format of your date strings.

For example, if you had a string "07/05/1990", the format would be `%m/%d/%Y`, where `%m` matches a zero-padded month value, the `/` characters match literal `/` characters, `%d` matches a zero-padded day value, and `%Y` matches a 4-digit year in the format YYYY. If your string was `31-12-90`, the format string would be `%d-%m-%y`. Replace `%d`, `%m`, `%Y`, and `%y` according to your date strings. A full list of formats can be found in the help page for `strptime` (`?strptime`).

Working with dates can be difficult and confusing. The `lubridate` package provides a much easier interface for working with dates.
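
The format codes work in both directions. The sketch below parses the two-digit-year string mentioned above, then uses `format()` to turn the `Date` back into a string; `d` is an illustrative name.

```r
# Two-digit year: %y reads "90" as 1990.
d <- as.Date("31-12-90", format = "%d-%m-%y")
d                        # "1990-12-31"

# format() goes the other way, from Date to string.
format(d, "%m/%d/%Y")    # "12/31/1990"
format(d, "%Y")          # "1990"

# A string that does not match the format gives NA rather than an error.
as.Date("1990/12/31", format = "%d-%m-%y")   # NA
```

Because a mismatch silently produces `NA`, it is worth checking the result with `is.na()` after converting a column of date strings.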

### Examples

#### How do I convert a string "07/05/1990" to a `Date`?

``````my_string <- "07/05/1990"
my_date <- as.Date(my_string, format="%m/%d/%Y")
my_date``````
`[1] "1990-07-05"`

#### How do I convert a string "31-12-1990" to a `Date`?

``````my_string <- "31-12-1990"
my_date <- as.Date(my_string, format="%d-%m-%Y")
my_date``````
`[1] "1990-12-31"`

#### How do I convert a string "12-31-1990" to a `Date`?

``````my_string <- "12-31-1990"
my_date <- as.Date(my_string, format="%m-%d-%Y")
my_date``````
`[1] "1990-12-31"`

#### How do I convert a string "31121990" to a `Date`?

``````my_string <- "31121990"
my_date <- as.Date(my_string, format="%d%m%Y")
my_date``````
`[1] "1990-12-31"`

## `NA`, `NaN`, and `NULL`

`NA`

`NA` stands for not available. In general, this represents a missing value or a lack of data. Technically, `NA` is a logical value. You can test this with the following code.

``class(NA)``
`[1] "logical"`

`NaN`

`NaN` stands for not a number. This is a special value that indicates that a result exists but cannot be represented as a number (for example, the result of 0/0). Technically, `NaN` is a double value. You can test this with the following code.

``typeof(NaN)``
`[1] "double"`

`NULL`

If you have an understanding of `NULL` from other programming languages, you can carry it over to R. Otherwise, it is safe to think of `NULL` as something that is neither `TRUE` nor `FALSE`. Technically, `NULL` is its own thing. It is not a logical value, double value, etc. `NULL` is commonly used to represent an empty object or something that exists but isn’t really defined. When trying to distinguish between `NA` and `NULL`, think of `NA` as a missing value, and `NULL` as an undefined value.
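
One practical way to see the difference is to put each value inside a vector. In this sketch (the names `with_na` and `with_null` are made up), `NA` occupies a slot while `NULL` simply disappears.

```r
# NA occupies a slot in a vector; NULL disappears.
with_na   <- c(1, NA, 3)
with_null <- c(1, NULL, 3)
length(with_na)     # 3
length(with_null)   # 2

# NA propagates through calculations unless it is removed.
mean(with_na)                 # NA
mean(with_na, na.rm = TRUE)   # 2

# NULL itself has length zero.
length(NULL)        # 0
```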

### Examples

#### How do I tell if a value is `NA`?

``````# test if a value is NA.
value <- NA
is.na(value)``````
`[1] TRUE`
``````# does is.nan return TRUE for NA?
is.nan(value)``````
`[1] FALSE`

#### How do I tell if a value is `NaN`?

``````# test if a value is NaN.
value <- NaN
is.nan(value)``````
`[1] TRUE`
``````value <- 0/0
is.nan(value)``````
`[1] TRUE`
``````# does is.na return TRUE for NaN?
is.na(value)``````
`[1] TRUE`

#### How do I tell if a value is `NULL`?

``````# test if a value is NULL.
value <- NULL
is.null(value)``````
`[1] TRUE`
``class(value)``
`[1] "NULL"`
``````# does is.na return TRUE for NULL?
is.na(value)``````
`logical(0)  # no`

### Resources

A good writeup on the differences between `NA` and `NULL`.