Choosing a Model

Here, we overview ways to think about choosing a modeling technique.

These considerations are optional. Maybe there is only one or two which you are absolutely sure of in your problem; other times you might feel like there are many ways you could approach the problem. Going through each of these ways can at least illuminate your uncertainty; sometimes its helpful to know what we don’t know.

Flexibility vs. Interpretability

There is a tradeoff between highly flexible models and highly interpretable models. We showcase that, as models become more complex and able to accomodate our convoluted datasets, it often becomes difficult to try to discern just exactly how it is able to make the predictions or inferences it does- leading to the well known conception of a "black box", a model which works but we just don’t know how.

Classification vs. Regression

Quantitative data, like someone’s age, birth year, or income is considered regression data. Qualitative data, like what kind of eye color someone has, disease status, or voting pattern would be classification data.

Prediction vs. Inference

Sometimes, we want to make predictions, like if an image is a dog or a cat; other times, we want to infer the properties of something, like consumer shopping patterns. Here we explore the contrast between prediction and inference.

Supervision

You saw in the Starter Guide on the meaning of f(x) in data science that there is some associated output $Y$. Well that’s true…​if you are doing supervised learning. Here we explore supervised vs. unsupervised approaches.

Parameterization

Parameterization is when we assign paramaters to $\hat{f}$ and use them to build our function. There are techniques which don’t use parameterization, such as splines, and we explore when and why to parameterize.