Frequentism vs. Bayesianism

This article covers the philosophical approaches of frequentism and Bayesianism as they relate to data science. It assumes no background in statistics or philosophy. It then gives examples of how the two paradigms might be used to solve a problem, and where their respective strengths probably lie.

Today, frequentism is still the most commonly used statistical paradigm in data science, although the past 30 years have seen a resurgence in Bayesian research and usage. Each has its own strengths, and a data professional should learn both.

Frequentism

Frequentism claims that the chance of something occurring is its relative frequency over many trials. These probabilities can be discovered by running many trials that emulate or match the conditions of the event. For instance, we can discover the frequency at which heads appears in a coin flip by flipping our coin 10,000 times and seeing what percentage of the time we got heads. Maybe it was 5,022 times in our first trial run, a 50.22% chance of getting heads. Maybe we run the trial of 10,000 flips again, and get 4,997 heads, or 49.97%. Maybe we run 10,000 trials of 10,000 flips, and the overall percentage of heads comes out to 50.002%.
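The specific counts above are just illustrative, but the procedure itself is easy to emulate in code. Here is a minimal sketch, assuming NumPy and a simulated fair coin; the counts will come out slightly differently on every run:

    import numpy as np

    rng = np.random.default_rng()

    # One trial run: flip a simulated fair coin 10,000 times and count heads.
    heads = rng.binomial(n=10_000, p=0.5)
    print(f"Heads in one run of 10,000 flips: {heads} ({heads / 10_000:.2%})")

    # 10,000 trial runs of 10,000 flips each: the overall relative frequency
    # of heads across all flips is our frequentist estimate of P(heads).
    heads_per_trial = rng.binomial(n=10_000, p=0.5, size=10_000)
    print(f"Overall frequency of heads: {heads_per_trial.sum() / 10_000**2:.3%}")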

Bayesianism

Bayesianism claims that probabilities represent beliefs, and that we can vindicate (or contradict) our beliefs by assessing them against the data. The typical Bayesian analysis goes:

  • Develop a prior, that is, your prior belief (typically represented as $p(\theta)$)

  • Assuming our prior is accurate, estimate the output given this prior using our sampling model (typically represented as $p(y|\theta)$, which can be stated as "the probability of y given theta"). What are the chances that y is actually this value, if we think the prior is this value?

  • Develop the posterior, that is, your updated ("posterior" as in after) belief (typically represented as $p(\theta|y)$, which can be stated as "the probability of theta given y")

Bayesianism doesn’t tell us what our beliefs should be; it merely describes how our beliefs should change given new information.
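These three pieces are tied together by Bayes' theorem, which says the posterior is proportional to the sampling model (the likelihood) times the prior:

$$ p(\theta|y) = \frac{p(y|\theta)\,p(\theta)}{p(y)} \propto p(y|\theta)\,p(\theta) $$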

What’s The Big Deal?

There is a fundamental difference in how we learn from our data. Let’s go back to the coin flip example. In frequentism, we would flip the coin n times and get x% heads or tails. But in Bayesianism, we could start off with a prior of 50%, that is, we assume that half the coin flips come up heads (we don’t have to use 50%; we could assume the coin is unbalanced, say 55/45). Then we would flip the coin n times and see x% heads. What are the chances that x% is correct, if we assume that our prior is accurate? For instance, a 50.002% chance of heads seems realistic enough if our prior is 50%. It suggests that maybe we need to nudge our prior ever so slightly higher to match the data; but given that our prior is so close, it might be acceptable to assume our prior is accurate.
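A minimal sketch of this kind of update in code, assuming a Beta prior on the probability of heads (the standard conjugate choice for coin flips; the Beta(500, 500) prior strength is an arbitrary choice for illustration) and reusing the 5,022-heads run from earlier:

    # Prior belief about P(heads), encoded as a Beta(a, b) distribution.
    # Beta(500, 500) says "roughly 50/50, backed by about 1,000 imaginary flips".
    prior_a, prior_b = 500, 500

    # Observed data: the first trial run above saw 5,022 heads in 10,000 flips.
    heads, tails = 5_022, 10_000 - 5_022

    # Conjugate (Beta-Binomial) update: the posterior is also a Beta distribution,
    # with the observed heads and tails simply added to the prior's counts.
    post_a, post_b = prior_a + heads, prior_b + tails

    print(f"Prior mean for P(heads):     {prior_a / (prior_a + prior_b):.4f}")
    print(f"Posterior mean for P(heads): {post_a / (post_a + post_b):.4f}")

The posterior mean lands just above 50%, which matches the intuition above: the data nudges the prior slightly higher, but not by much.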

The Critiques of Bayesianism

Developing Priors

But how did we pick 50% as the correct prior? This is one of the primary problems with Bayesianism: the selection of a prior can seem arbitrary or even outright misleading. In the case of a coin flip, with 50/50 chances on each side, the prior is apparent; but even there, it assumes we know the coin is not weighted toward one side. How do we know that?

Let’s turn to a situation where picking a prior is even more difficult: estimating the colors of flowers in a field. Suppose we have a field with far too many flowers to pick them all. We decide to pick 100 flowers (our sample) and record one exclusive color per flower, whichever of the seven ROYGBIV colors it is closest to, giving us 100 color observations in total. We want to estimate the proportions of the seven rainbow colors across the field; in Bayesianism, we would have to guess this prior distribution of colors from the beginning. But how can we do that? Let’s assume they are more or less equal, that is, each color has a one-in-seven (about 14.3%) chance of appearing. But how can we even justify guessing that? We can’t look at the field and guess, because that would be looking at the data. The only way to develop an intuition about the distribution of flower colors here is to pick the flowers and run our test, that is, to skip over the prior selection entirely. But that isn’t Bayesianism; it’s frequentism.

Bayesian Counter

The Bayesian counter to this argument hinges on accepting a mostly arbitrary prior at the beginning. Bayesians would say: start at the arbitrary point, look at a sample, and if the posterior is telling you to update, follow its directions. There are many methods to achieve this sort of updating that are beyond the scope of this article, but they typically employ sampling techniques such as MCMC (Markov Chain Monte Carlo) methods. The basic idea is that we start with a mostly uninformed guess, but with every batch of 100 flower colors we can tweak our prior a little bit more. The process would look something like:

Arbitrary starting prior → look at 100 flower colors → develop posterior → update prior → look at 100 more flower colors → develop posterior → update prior → look at 100 more flower colors → develop posterior → update prior → rinse → repeat
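A minimal sketch of that loop in code, assuming a uniform Dirichlet prior over the seven colors and synthetic flower data (the "true" proportions below are made up purely for illustration); because the Dirichlet prior is conjugate to these counts, no MCMC machinery is needed for this simple case:

    import numpy as np

    COLORS = ["red", "orange", "yellow", "green", "blue", "indigo", "violet"]
    rng = np.random.default_rng(seed=7)

    # The "true" color proportions in the simulated field, unknown to the analyst.
    true_props = np.array([0.30, 0.05, 0.20, 0.15, 0.10, 0.05, 0.15])

    # Arbitrary starting prior: a uniform Dirichlet (every color equally plausible).
    counts = np.ones(len(COLORS))

    for batch in range(1, 4):
        picked = rng.multinomial(100, true_props)   # look at 100 more flowers
        counts += picked                            # posterior counts = prior + observed
        posterior_mean = counts / counts.sum()      # updated belief about the proportions
        print(f"After batch {batch}: " +
              ", ".join(f"{c} {p:.0%}" for c, p in zip(COLORS, posterior_mean)))

After a few batches, the posterior means drift away from the arbitrary uniform starting point and toward whatever the field actually contains.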

Priors That Don’t Make Sense Are Both Irrational And Possible

Suppose an oil company scout is given a map of many square miles of land that the company has purchased, and he is trying to find the best locations to drill for oil. He has the map chunked into a grid, and has to develop a prior that represents the chance of finding oil on the map. He could in theory have a prior of 0, namely, a 0% chance that there is any oil on this map. In a Bayesian framework this is entirely possible; but realistically, would the scout be working at this company if he thought the company was so poor at buying land that literally none of the land on his map had oil? If they kept buying land with no oil, he would soon be out of a job! It’s highly unlikely that he would ever choose a prior of 0% here, but it is possible to do. Priors can be both highly irrational and possible.

Bad/Old Data Justifies Beliefs

The selection of what data to use can also confirm our arbitrary priors. We can use outdated, selective, or otherwise confirmatory data to ensure that the data matches our expectations. So not only are we picking arbitrary priors, but we are updating those priors in ways that merely confirm our beliefs. What is the point of analysis if we are only confirming what we already believe? It might be therapeutic, but to call it analysis is misleading.

The Critiques of Frequentism

Beliefs Are Inherent In Any Analysis

Bayesians would say that frequentists make assumptions too, but instead of declaring those assumptions up front (that is, in the form of a prior), they hide them in the structure of their analysis. Returning to the coin flip example: why 10,000 flips? Why 10,000 trials of 10,000 flips? Where does the data say "I require 10,000 trials of 10,000 flips to understand my statistical chances accurately"?

What Size n Is Large Enough To Declare A Probability?

Will 10,000 trials be enough? 10 million? 10 trillion? Which one is the right one, if they come away with very different results?
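As a rough illustration of why this matters, here is a quick sketch (again assuming a simulated fair coin with NumPy) showing how much the estimated frequency of heads can vary from run to run at different sample sizes:

    import numpy as np

    rng = np.random.default_rng()

    # For several sample sizes, run 1,000 repeat experiments and see how much
    # the estimated frequency of heads wanders from run to run.
    for n in (100, 10_000, 1_000_000):
        estimates = rng.binomial(n=n, p=0.5, size=1_000) / n
        print(f"n = {n:>9,}: estimates range from "
              f"{estimates.min():.4f} to {estimates.max():.4f}")

Small samples wander a lot and very large samples settle down, but nothing in the data itself announces the n at which the estimate becomes "good enough".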

You Aren’t Discovering The Capital T Truth

This argument is less common and is epistemological (epistemology being, roughly, the study of how we know what we know). It claims that frequentists think they are discovering the capital-T (objective) Truth, when really they are either not sure what Truth they are discovering, or anything they do discover is really just a belief and has no formal contact with the Truth. The full outlines of this line of inquiry are too much for this article, but Bayesians suggest that they are at least honest in their attempt to update their beliefs rather than claiming to discover the Truth.

What Does All This Mean For Data Professionals?

The Historical Context

Historically, frequentism has been both researched and used in practice far more than Bayesian statistical techniques, especially in the form of regression, one of the best-known frequentist tools (although it can be used in a Bayesian context as well). Both paradigms, however, share a roughly similar timeline of a few hundred years of development since their origination.

Strengths of Frequentism

Frequentism benefits from large datasets, many trials, easily emulatable conditions (to match the property or process we are trying to infer), and problems involving general patterns (such as the regression line of a dataset).

Strengths of Bayesianism

Bayesian analysis benefits from smaller datasets (relying on priors to overcome the lack of data), offers flexibility of analysis (since we are working with beliefs, not ground Truths), updates whole distributions instead of point estimates, and can accommodate patterns that are hard to generalize.