cut

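cut divides the range of a vector x into intervals and returns a factor that records which interval each element falls into. The intervals are set by the argument breaks, which can be a vector of cut points or a single number of equal-width intervals. cut is particularly useful for grouping Date or date-time data into periods such as years (1998, 1999, 2000, etc.), quarters, or weeks. Note that for dates, each factor level is labeled with the first day of its period (for example, "2020-04-01" for the second quarter of 2020), not with a name like "Q2".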
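A minimal sketch on a small numeric vector (the values in x are made up for illustration) shows how breaks works. By default the intervals are closed on the right, so 6 falls in (0,6], and the optional labels argument replaces the interval notation with custom names:

```r
x <- c(1, 5, 6, 7, 12)

# Explicit cut points: two right-closed intervals, (0,6] and (6,12]
f <- cut(x, breaks = c(0, 6, 12))
levels(f)        # "(0,6]"  "(6,12]"
as.integer(f)    # 1 1 1 2 2

# A single number splits the range of x into that many equal-width intervals
nlevels(cut(x, breaks = 3))  # 3

# Custom level names instead of interval notation
cut(x, breaks = c(0, 6, 12), labels = c("low", "high"))
```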

For details on date breaks such as "year", "quarter", and "2 weeks", run ?cut.POSIXt (the same help page also covers cut.Date).

Examples

How can I create a new column in a data.frame df that is a factor based on the year?

df$year <- cut(df$times, breaks="year")
str(df)
'data.frame':    24 obs. of  3 variables:
 $ times: POSIXct, format: "2020-06-01 06:00:00" "2020-07-01 06:00:00" ...
 $ value: int  48 62 55 4 83 77 5 53 68 46 ...
 $ year : Factor w/ 3 levels "2020-01-01","2021-01-01",..: 1 1 1 1 1 1 1 2 2 2 ...
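The df above is not created on this page. As a minimal, hypothetical stand-in (assuming 24 monthly timestamps starting June 2020 in UTC, with random values), the following builds a comparable data frame so the year grouping can be reproduced; the value column will not match the output above:

```r
# Hypothetical stand-in for df: 24 monthly timestamps starting June 2020 (UTC assumed)
set.seed(1)
times <- seq(as.POSIXct("2020-06-01 06:00:00", tz = "UTC"), by = "month", length.out = 24)
df <- data.frame(times = times, value = sample(1:100, 24, replace = TRUE))

# One level per calendar year, labeled by the year's first day
df$year <- cut(df$times, breaks = "year")
table(df$year)   # 7 months in 2020, 12 in 2021, 5 in 2022
```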

How can I create a new column in a data.frame df that is a factor based on the quarter?

df$quarter <- cut(df$times, breaks="quarter")
str(df)
'data.frame':    24 obs. of  4 variables:
 $ times  : POSIXct, format: "2020-06-01 06:00:00" "2020-07-01 06:00:00" ...
 $ value  : int  48 62 55 4 83 77 5 53 68 46 ...
 $ year   : Factor w/ 3 levels "2020-01-01","2021-01-01",..: 1 1 1 1 1 1 1 2 2 2 ...
 $ quarter: Factor w/ 9 levels "2020-04-01","2020-07-01",..: 1 2 2 2 3 3 3 4 4 4 ...
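The quarter levels above are labeled by each quarter's start date, and cut creates a level for every quarter from the one containing the earliest date through the one containing the latest, even if no rows fall in it. A small sketch with three hand-picked dates illustrates this; if labels like "Q2" are wanted instead, base R's quarters() can build them:

```r
d <- as.Date(c("2020-02-15", "2020-05-01", "2020-11-30"))

q <- cut(d, breaks = "quarter")
levels(q)      # "2020-01-01" "2020-04-01" "2020-07-01" "2020-10-01" (third quarter is empty)
as.integer(q)  # 1 2 4

# Named quarters, optionally combined with the year
quarters(d)                          # "Q1" "Q2" "Q4"
paste(format(d, "%Y"), quarters(d))  # "2020 Q1" "2020 Q2" "2020 Q4"
```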

How can I create a new column in a data.frame df that is a factor based on 2-week periods?

df$biweekly <- cut(df$times, breaks="2 weeks")
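No output is shown for the 2-week case. With breaks="2 weeks" and the default start.on.monday=TRUE, cut starts the first interval on the Monday on or before the earliest date and then steps forward 14 days at a time, labeling each level with its start date. A small sketch with hand-picked dates (2024-01-01 was a Monday):

```r
d <- as.Date(c("2024-01-03", "2024-01-10", "2024-01-17", "2024-01-30"))

b <- cut(d, breaks = "2 weeks")
levels(b)      # "2024-01-01" "2024-01-15" "2024-01-29"
as.integer(b)  # 1 1 2 3

# Pass start.on.monday = FALSE to start the periods on Sunday instead
```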

For an example using the FARS 7581 data set:

myDF <- read.csv("/depot/datamine/data/fars/7581.csv")

These are the counts for each value of the HOUR column (only hours 0 through 15 are shown here):

table(myDF$HOUR)
    0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
17704 18671 17262  9908  6438  5463  6749  7088  6308  6275  7311  8401  8929  9872 12066 14138

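We can group these values into intervals of roughly 6 hours. By default cut builds right-closed intervals, so (6,12] covers hours 7 through 12; include.lowest=TRUE also closes the first interval on the left, so [0,6] covers hours 0 through 6. The last interval, (24,99], catches the value 99, which FARS uses to mark an unknown hour:

table( cut(myDF$HOUR, breaks=c(0,6,12,18,24,99), include.lowest=TRUE) )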
[0,6]  (6,12] (12,18] (18,24] (24,99]
82195   44312   85388   86567    1597
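The interval boundaries are easy to check on a few hand-picked hours (the vector below is made up for illustration). The sketch also shows what include.lowest changes, and an alternative, assumed here rather than taken from the example above, that uses left-closed intervals (right = FALSE) so each daytime bin holds exactly 6 integer hours:

```r
hours <- c(0, 6, 7, 12, 13, 23, 24, 99)

# Right-closed intervals with the first one closed on the left too
f <- cut(hours, breaks = c(0, 6, 12, 18, 24, 99), include.lowest = TRUE)
levels(f)   # "[0,6]" "(6,12]" "(12,18]" "(18,24]" "(24,99]"
table(f)    # counts 2 2 1 2 1

# Without include.lowest = TRUE, 0 is outside (0,6] and becomes NA
cut(0, breaks = c(0, 6, 12))

# Alternative: left-closed bins [0,6), [6,12), ... ; 24 and 99 share the last bin
g <- cut(hours, breaks = c(0, 6, 12, 18, 24, 100), right = FALSE)
levels(g)   # "[0,6)" "[6,12)" "[12,18)" "[18,24)" "[24,100)"
table(g)    # counts 1 2 2 1 2
```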

We can then use tapply to find the total number of PERSONS involved in accidents during each interval:

tapply( myDF$PERSONS, cut(myDF$HOUR, breaks=c(0,6,12,18,24,99), include.lowest=TRUE), sum )
 [0,6] (6,12] (12,18] (18,24] (24,99]
187397 119261  238193  230289    2269
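The same grouping can be tried on a tiny made-up data frame (the name toy and its values are hypothetical) to see what tapply returns: a named array with one total per interval. If an interval had no rows, tapply would report NA for it rather than 0. aggregate() gives the same totals as a data frame:

```r
# Hypothetical miniature version of the HOUR and PERSONS columns
toy <- data.frame(HOUR = c(1, 5, 8, 14, 20, 99),
                  PERSONS = c(2, 3, 1, 4, 2, 1))

bins <- cut(toy$HOUR, breaks = c(0, 6, 12, 18, 24, 99), include.lowest = TRUE)

# Sum PERSONS within each interval
totals <- tapply(toy$PERSONS, bins, sum)
totals   # [0,6]=5  (6,12]=1  (12,18]=4  (18,24]=2  (24,99]=1

# Same totals as a data frame
aggregate(PERSONS ~ bin, data = data.frame(bin = bins, PERSONS = toy$PERSONS), FUN = sum)
```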