cut
cut divides a vector x into intervals, given by the argument breaks, and returns the result as a factor. cut is particularly useful for grouping date data into categories such as quarters ("Q1", "Q2", ...) or years (1998, 1999, 2000, ...).
You can find more information by running ?cut.POSIXt.
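As a minimal sketch of the idea, the following applies cut first to a numeric vector and then to Date values. The vectors scores and dates and the break points are made up for illustration and are not part of the original examples.

```r
# Illustrative data only; these values are not from the original examples.
scores <- c(5, 42, 67, 88, 100)

# Numeric input: breaks gives the interval endpoints.
# By default each interval is closed on the right, e.g. (50,75].
score_groups <- cut(scores, breaks = c(0, 50, 75, 100))
print(table(score_groups))

# Date input: breaks may be a unit of time such as "month", "quarter", or "year".
dates <- as.Date(c("2020-02-15", "2020-05-03", "2020-11-20", "2021-01-09"))
date_groups <- cut(dates, breaks = "quarter")

# Each level is labeled with the first day of its period.
print(levels(date_groups))
```

For date input, the levels are labeled by the start date of each period rather than by names like "Q1".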
Examples
How can I create a new column in a data.frame df that is a factor based on the year?
df$year <- cut(df$times, breaks="year")
str(df)
'data.frame': 24 obs. of 3 variables:
$ times: POSIXct, format: "2020-06-01 06:00:00" "2020-07-01 06:00:00" ...
$ value: int 48 62 55 4 83 77 5 53 68 46 ...
$ year : Factor w/ 3 levels "2020-01-01","2021-01-01",..: 1 1 1 1 1 1 1 2 2 2 ...
How can I create a new column in a data.frame df that is a factor based on the quarter?
df$quarter <- cut(df$times, breaks="quarter")
str(df)
'data.frame': 24 obs. of 4 variables:
$ times : POSIXct, format: "2020-06-01 06:00:00" "2020-07-01 06:00:00" ...
$ value : int 48 62 55 4 83 77 5 53 68 46 ...
$ year : Factor w/ 3 levels "2020-01-01","2021-01-01",..: 1 1 1 1 1 1 1 2 2 2 ...
$ quarter: Factor w/ 9 levels "2020-04-01","2020-07-01",..: 1 2 2 2 3 3 3 4 4 4 ...
How can I create a new column in a data.frame df that is a factor based on every 2 weeks?
df$biweekly <- cut(df$times, breaks="2 weeks")
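The examples above assume a data.frame df that is not constructed in this section. The sketch below builds a hypothetical stand-in (24 monthly timestamps starting in June 2020, with random values) so the three calls can be tried end to end. The seed, the UTC time zone, and the extra year_label column are assumptions added for illustration.

```r
# Hypothetical stand-in for df: 24 monthly timestamps starting June 2020.
set.seed(1)
df <- data.frame(
  times = seq(as.POSIXct("2020-06-01 06:00:00", tz = "UTC"),
              by = "month", length.out = 24),
  value = sample(1:100, 24, replace = TRUE)
)

# The same three calls as in the examples above.
df$year     <- cut(df$times, breaks = "year")
df$quarter  <- cut(df$times, breaks = "quarter")
df$biweekly <- cut(df$times, breaks = "2 weeks")

# Optional: label by calendar year ("2020", "2021", ...) instead of the
# period start date, using format() rather than cut().
df$year_label <- factor(format(df$times, "%Y"))

print(table(df$year))
print(table(df$year_label))
```

Using format() for the label is one possible choice when short names are preferred over the start-date labels that cut produces.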
For an example with the 7581 data set:
myDF <- read.csv("/depot/datamine/data/fars/7581.csv")
These are the values of the HOUR column:
table(myDF$HOUR)
    0     1     2     3     4     5     6     7     8     9    10    11    12    13    14    15
17704 18671 17262  9908  6438  5463  6749  7088  6308  6275  7311  8401  8929  9872 12066 14138
We can break these values into 6-hour intervals:
table( cut(myDF$HOUR, breaks=c(0,6,12,18,24,99), include.lowest=TRUE) )
   [0,6]   (6,12]  (12,18]  (18,24]  (24,99]
   82195    44312    85388    86567     1597
We can then find the total number of PERSONS who are involved in accidents during each 6-hour interval:
tapply( myDF$PERSONS, cut(myDF$HOUR, breaks=c(0,6,12,18,24,99), include.lowest=TRUE), sum )
   [0,6]   (6,12]  (12,18]  (18,24]  (24,99]
  187397   119261   238193   230289     2269
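With the default right-closed intervals, hour 6 falls in [0,6], so the first group covers seven hour values (0 through 6) and each later group starts one hour after a multiple of 6. If groups of exactly six hours (0-5, 6-11, and so on) are wanted, one option is right=FALSE. The sketch below uses a small made-up hours vector rather than the 7581 data, and the labels and the final break of 100 are illustrative choices.

```r
# Made-up hour values standing in for myDF$HOUR; 99 is included as an
# out-of-range value like the one seen in the (24,99] group above.
hours <- c(0, 5, 6, 11, 12, 17, 18, 23, 99)

# right = FALSE makes each interval closed on the left: [0,6), [6,12), ...
# The last break is 100 so that the value 99 falls inside [24,100).
hour_groups <- cut(hours,
                   breaks = c(0, 6, 12, 18, 24, 100),
                   right = FALSE,
                   labels = c("00-05", "06-11", "12-17", "18-23", "other"))

print(table(hour_groups))
```

The same factor could be passed to tapply in place of the cut call used above.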