R base functions

split

split is a function with arguments x, a vector or data.frame, and f, a factor vector that will divide the data into smaller groups.

A useful optional argument is drop, which indicates whether or not values not in a group should be removed. The default is drop = FALSE.

Examples

Read the first 10 lines of /anvil/projects/tdm/data/movies_and_tv/imdb2024/basics.tsv/. Using the strsplit function, we can find out how many times each of the individual genres occur.

Click to see solution
myDF <- fread("/anvil/projects/tdm/data/movies_and_tv/imdb2024/basics.tsv", nrows = 10)

myDF$genres

strsplit(myDF$genres, ',')

unlist(strsplit(myDF$genres, ','))

table(unlist(strsplit(myDF$genres, ',')))
'Documentary,Short''Animation,Short''Animation,Comedy,Romance''Animation,Short''Comedy,Short''Short''Short,Sport''Documentary,Short''Romance''Documentary,Short'

'Documentary''Short'
'Animation''Short'
'Animation''Comedy''Romance'
'Animation''Short'
'Comedy''Short'
'Short'
'Short''Sport'
'Documentary''Short'
'Romance'
'Documentary''Short'

'Documentary''Short''Animation''Short''Animation''Comedy''Romance''Animation''Short''Comedy''Short''Short''Short''Sport''Documentary''Short''Romance''Documentary''Short'

  Animation      Comedy Documentary     Romance       Short       Sport
          3           2           3           2           8           1

Using movies_and_tv/imdb2024/basics.tsv, for each of the genres, list how many times it occurs.

Click to see solution
genres <- fread("/anvil/projects/tdm/data/movies_and_tv/imdb2024/basics.tsv", select = "genres", col.names = "genres")

sort(table(unlist(strsplit(genres$genres, ","))), decreasing = TRUE)
       Drama      Comedy   Talk-Show       Short Documentary        News
    3151064     2181847     1372500     1191319     1062294     1051399
    Romance      Family  Reality-TV   Animation      Action       Crime
    1045327      824607      624854      556566      462531      459412
  Adventure   Game-Show       Music       Adult       Sport     Fantasy
     425130      424919      418888      353525      271872      234269
    Mystery      Horror    Thriller     History   Biography      Sci-Fi
     225390      202434      184618      165528      119759      117541
    Musical         War     Western   Film-Noir
      92140       38662       30931         873