Classification vs. Regression

Introduction

Here we identify a common way to organize problems: that of regression vs. classification problems.

Classification

Classification typically outputs qualitative responses, like disease status, eye color, type of car they own, etc. Often, we want to "classify" things into one group or another.

Regression

Regression typically outputs quantitative responses, such as income, age, or height.

Importance Of The Output Type, Not The Input Type

You will notice that these types differ by their responses, not so much their inputs. That is, what matters here is thinking about the output: we want to determine whether we want qualitative or quantitative output.

Challenges With Distinguishing Between The Two

The lines between these distinctions can get blurry, often because converting between them is sometimes relatively easy. For instance, we can easily encode qualitative information like eye color into a number: 1 for blue, 2 for green, etc. This tends to matter most with models that are fairly inflexible.

Additionally, regression in this context doesn’t always mean the type of modeling technique. For instance, logistic regression is often a binary (or qualitative) output, whereas least squares regression is quantitative. Both are regression modeling techniques, but they differ in this context of classification. For this reason, sometimes the terms "quantitative" or "qualitative" are used to be clear here instead of "regression" or "classification".