Superstore Sales
Description of the Data
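A modified version of the Superstore Sales dataset is available at 'anvil/projects/tdm/data/sales/Superstore_modified.csv'. It adds an 'Order Status' column. To create it, 15% of the unique order IDs were selected at random, the 'Ship Date' and 'Ship Mode' values for every row in those orders were set to missing, and each row was then labeled "Pending" if its 'Ship Date' is missing and "Shipped" otherwise: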
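import numpy as np

# Assumes myDF already holds the original Superstore Sales data.

# Randomly pick 15% of the unique orders (without repeats) to treat as unshipped.
order_ids = myDF["Order ID"].unique()
unshipped_orders = np.random.choice(
    order_ids,
    size=int(0.15 * len(order_ids)),
    replace=False
)

# Blank out the shipping details for every row belonging to those orders.
myDF.loc[
    myDF["Order ID"].isin(unshipped_orders),
    ["Ship Date", "Ship Mode"]
] = np.nan

# A row is "Pending" when it has no ship date, and "Shipped" otherwise.
myDF["Order Status"] = np.where(
    myDF["Ship Date"].isna(),
    "Pending",
    "Shipped"
)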
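Because the selection is made on order IDs rather than on individual rows, every row of a given order ends up with the same status. The sketch below illustrates this on a small made-up table; the name toy_df and its values are hypothetical stand-ins for the real file, and the seeded generator is an assumption added here only so the example is repeatable.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the Superstore Sales file:
# 10 rows spread over 8 unique orders.
toy_df = pd.DataFrame({
    "Order ID": ["A-1", "A-1", "B-2", "C-3", "C-3",
                 "D-4", "E-5", "F-6", "G-7", "H-8"],
    "Ship Date": ["2017-01-05"] * 10,
    "Ship Mode": ["Standard Class"] * 10,
})

# Seeded generator (an assumption, not part of the original snippet).
rng = np.random.default_rng(seed=0)

# Pick 15% of the unique orders; with 8 orders, int(0.15 * 8) is 1.
order_ids = toy_df["Order ID"].unique()
n_pending = int(0.15 * len(order_ids))
unshipped_orders = rng.choice(order_ids, size=n_pending, replace=False)

# Blank the shipping columns for every row of the selected orders.
is_unshipped = toy_df["Order ID"].isin(unshipped_orders)
toy_df.loc[is_unshipped, ["Ship Date", "Ship Mode"]] = np.nan

# Label each row from the presence or absence of a ship date.
toy_df["Order Status"] = np.where(
    toy_df["Ship Date"].isna(), "Pending", "Shipped"
)

# Each order should carry exactly one distinct status.
statuses_per_order = toy_df.groupby("Order ID")["Order Status"].nunique()
pending_orders = toy_df.loc[
    toy_df["Order Status"] == "Pending", "Order ID"
].nunique()

print(toy_df["Order Status"].value_counts())
print("Orders with mixed statuses:", int((statuses_per_order > 1).sum()))
```

The same two checks (status counts, and the number of distinct statuses per order) could be run on the modified file after loading it with pandas.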