R base functions
subset
subset is a function that helps you take subsets of data. By default, subset drops any row for which the condition evaluates to NA.
subset does not perform any operation that can’t be accomplished by indexing.
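To make the comparison with indexing concrete, here is a minimal sketch on a small made-up data frame. The name toyDF and its values are hypothetical and are not taken from the transactions file. It suggests that plain logical indexing keeps a placeholder row when the condition is NA, while wrapping the condition in which() reproduces what subset returns.

```r
# Toy data frame (hypothetical values, not from the transactions file)
toyDF <- data.frame(
  STORE_R = c("EAST", "WEST", "SOUTH", "EAST"),
  SPEND   = c(-1.50, 2.25, NA, -0.38)
)

# subset() treats an NA condition as FALSE, so the SOUTH row is dropped
viaSubset <- subset(toyDF, SPEND < 0)

# Plain logical indexing returns an all-NA row where the condition is NA
viaIndexNA <- toyDF[toyDF$SPEND < 0, ]

# which() keeps only the positions where the condition is TRUE,
# so this selects the same rows as subset()
viaIndex <- toyDF[which(toyDF$SPEND < 0), ]

print(viaSubset)
print(viaIndexNA)
print(viaIndex)
```

If the SPEND column has no missing values, the plain indexing form and subset select the same rows, so the which() wrapper only matters when NA values are present.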
Examples
Using the 5000_transactions csv file, create a smaller data set called refundsDF that contains only the lines of data for which the SPEND column is negative.
Solution:
myDF <- read.csv("/anvil/projects/tdm/data/8451/The_Complete_Journey_2_Master/5000_transactions.csv")
refundsDF <- subset(myDF, SPEND < 0)
head(refundsDF)
     BASKET_NUM HSHD_NUM PURCHASE_ PRODUCT_NUM SPEND UNITS STORE_R WEEK_NUM  YEAR
          <dbl>    <dbl>     <chr>       <dbl> <dbl> <int>   <chr>    <int> <int>
1            24     1809 03-JAN-16     5817389 -1.50    -1   SOUTH        1  2016
2            24     1809 03-JAN-16     5829886 -1.50    -1   SOUTH        1  2016
93         4955     2570 06-JAN-16     5391980 -0.38     1    WEST        1  2016
355       28557     3153 22-JAN-16     5184651 -3.42     9 CENTRAL        3  2016
762       62654     4172 15-FEB-16      300529 -0.71     1    EAST        7  2016
2226     163292     3452 28-APR-16      899378 -0.61     1   SOUTH       17  2016
Using the 5000_transactions csv file, show the number of refunds for each STORE_R value using only indexing, in other words, without using the subset function.
Solution:
storeDF <- read.csv("/anvil/projects/tdm/data/8451/The_Complete_Journey_2_Master/5000_transactions.csv")
table(storeDF$STORE_R[storeDF$SPEND < 0])
CENTRAL    EAST   SOUTH    WEST
   2750    3269    2675    3952
table
table is a function used to build a contingency table, which is a table that shows counts for categorical data, from one or more categories. prop.table is a function that accepts table output, returning proportions of the counts.
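As a rough illustration of counting over one category versus two, the sketch below uses two short made-up vectors. The names region and year and their values are hypothetical. It also shows prop.table with its margin argument, which computes proportions within each row instead of over the grand total.

```r
# Toy data (hypothetical values chosen for illustration)
region <- c("EAST", "WEST", "EAST", "SOUTH", "WEST", "EAST")
year   <- c(2016, 2016, 2017, 2016, 2017, 2017)

# One category: counts per region
regionCounts <- table(region)
print(regionCounts)

# Two categories: rows are regions, columns are years
regionByYear <- table(region, year)
print(regionByYear)

# Proportions of the grand total
regionProps <- prop.table(regionCounts)
print(regionProps)

# margin = 1 computes proportions within each row, so each row sums to 1
rowProps <- prop.table(regionByYear, margin = 1)
print(rowProps)
```

Using margin = 2 instead would compute proportions within each column.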
Examples
Using the refundsDF subset from above, make a table of the STORE_R values in this refundsDF subset, and show the number of times that each STORE_R value appears in the refundsDF subset.
Solution:
regionTable <- table(refundsDF$STORE_R)
print(regionTable)
CENTRAL    EAST   SOUTH    WEST
   2750    3269    2675    3952
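These counts can also be expressed as shares of all refunds with prop.table. The sketch below is self-contained: it re-enters the four counts printed above by hand instead of reading the csv file, and the names regionCounts and regionShares are introduced here only for illustration.

```r
# Counts copied from the printed output above, stored as a named table
regionCounts <- as.table(c(CENTRAL = 2750, EAST = 3269, SOUTH = 2675, WEST = 3952))

# Share of all refunds that falls in each region
regionShares <- prop.table(regionCounts)
print(round(regionShares, 3))
```

In a session where regionTable already exists, prop.table(regionTable) should give the same proportions without retyping the counts.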