R base functions

subset

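subset is a function that returns the rows (and, optionally, the columns) of a data set that meet a condition. Rows for which the condition evaluates to NA are dropped, because subset treats a missing condition as FALSE.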

subset does not perform any operation that can’t be accomplished by indexing.
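As a minimal sketch of that equivalence, the example below compares subset with plain indexing on a small made-up data frame. The name toyDF and its values are hypothetical and are not part of the transactions data. It also shows where the two approaches differ when the condition is NA.

```r
# Hypothetical toy data: one SPEND value is missing (NA).
toyDF <- data.frame(ID = 1:5, SPEND = c(2.50, -1.00, NA, -3.20, 4.00))

# subset() treats an NA condition as FALSE, so the NA row is dropped.
bySubset <- subset(toyDF, SPEND < 0)

# Plain logical indexing returns a row of NAs wherever the condition is NA.
byIndex <- toyDF[toyDF$SPEND < 0, ]

# Wrapping the condition in which() skips the NA positions,
# so the same rows are selected as with subset().
byWhich <- toyDF[which(toyDF$SPEND < 0), ]

nrow(bySubset)  # 2
nrow(byIndex)   # 3
nrow(byWhich)   # 2
```

If the column being tested has no missing values, the plain logical index and subset select the same rows.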

Examples

Using the 5000_transactions.csv file, create a smaller data set called refundsDF that contains only the rows for which the SPEND column is negative.

myDF <- read.csv("/anvil/projects/tdm/data/8451/The_Complete_Journey_2_Master/5000_transactions.csv")

refundsDF <- subset(myDF, SPEND < 0)

head(refundsDF)
      BASKET_NUM  HSHD_NUM  PURCHASE_  PRODUCT_NUM  SPEND  UNITS  STORE_R  WEEK_NUM  YEAR
      <dbl>       <dbl>     <chr>      <dbl>        <dbl>  <int>  <chr>    <int>     <int>
1     24          1809      03-JAN-16  5817389      -1.50  -1     SOUTH    1         2016
2     24          1809      03-JAN-16  5829886      -1.50  -1     SOUTH    1         2016
93    4955        2570      06-JAN-16  5391980      -0.38  1      WEST     1         2016
355   28557       3153      22-JAN-16  5184651      -3.42  9      CENTRAL  3         2016
762   62654       4172      15-FEB-16  300529       -0.71  1      EAST     7         2016
2226  163292      3452      28-APR-16  899378       -0.61  1      SOUTH    17        2016

Using the 5000_transactions.csv file, show the number of refunds for each STORE_R value using only indexing, that is, without using the subset function.

storeDF <- read.csv("/anvil/projects/tdm/data/8451/The_Complete_Journey_2_Master/5000_transactions.csv")

table(storeDF$STORE_R[storeDF$SPEND < 0])
CENTRAL EAST    SOUTH   WEST
   2750    3269    2675    3952

table

table is a function used to build a contingency table, which shows the count for each value of a categorical variable, or for each combination of values when more than one variable is given. prop.table is a function that accepts table output and returns the counts as proportions.
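The short sketch below illustrates both functions on made-up vectors. The names region and refund and their values are hypothetical and are not taken from the transactions file.

```r
# Hypothetical toy data, not taken from the transactions file.
region <- c("EAST", "WEST", "WEST", "SOUTH", "EAST", "WEST")
refund <- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)

# One category: counts per region.
regionCounts <- table(region)
print(regionCounts)

# Two categories: counts for each region/refund combination.
crossCounts <- table(region, refund)
print(crossCounts)

# Proportions of the overall total; they sum to 1.
regionProps <- prop.table(regionCounts)
print(regionProps)
```

For a two-way table, prop.table also accepts a margin argument; for example, prop.table(crossCounts, margin = 1) should give proportions within each row rather than of the overall total.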

Examples

Using the refundsDF subset from above, make a table of its STORE_R values that shows how many times each value appears.

regionTable <- table(refundsDF$STORE_R)

print(regionTable)
CENTRAL EAST    SOUTH   WEST
   2750    3269    2675    3952