Monday, February 20, 2017

Bayes Area

tl;dr: Bayesian Basketball Dashboard here.

One of the themes of my career to this point, from my doctoral research to being a data scientist in industry, to writing this very blog, is the interaction between substantive expertise and quantitative analysis. In some disciplines, such as scientific research, these two domains are inextricably linked; a scientist with domain expertise proposes a model or hypothesis for how some phenomenon works, and then uses data to confirm or reject the hypothesis. In other disciplines, there has been a tension between the two. For example, Nate Silver describes a conflict between traditional political pundits and his form of data journalism.

If you know me, it's no surprise that I care deeply about data. I even feel silly writing that sentence. It seems obvious. Ever since I wrote a senior thesis in college where I analyzed auction prices for sulfur dioxide permits, I have loved getting my hands on data and learning from it. But it also seems like an empty sentence; data is everywhere and people use it in countless ways. To say that you care about data in 2017 feels like saying a fish cares about water.

What I think others would find a little surprising is my willingness to overlook or go beyond data and trust human expertise. Just because I have numbers stored somewhere doesn't mean I have evidence. Obviously, data can be misleading or biased in some way. But what I believe is that in the absence of good data, people can be (in the right contexts) very good at integrating various pieces of qualitative and quantitative information and forming judgements.


I enjoy reading very much, particularly non-fiction and history. I've built up much of my expertise from words on a page, not just numbers in an algorithm. Narratives provide lessons that I (and others) can use to create mental models of the world. Each piece of information in a history book is a piece of data. This data can help me think about predictions and identify causal relationships. But each piece of information may be too unique to simply drop into a statistical algorithm.

Now, humans are not perfect at forming judgements; we fall victim to many heuristics that lead us astray. But instead of interpreting our fallibility as a statement that human judgement should be ignored or cast aside in favor of better data, I think of it as an opportunity to build tools and algorithms that can overcome those heuristics. My PhD work focused on tools for a process called "deliberation with analysis," where stakeholders with different viewpoints could come together and use quantitative analysis to explore across disparate beliefs and come to consensus. As a data scientist, I have taken particular pleasure in going the other direction, building features for models based on learning from experts with years of work experience within the company.

The Bayes Area

The thesis I lay out above has led to my mild obsession with a statistical theorem that describes the process of combining human judgement with data: Bayes' Law. No joke, I have gotten so much leverage, from my dissertation to tech talks to interviews, by being able to write the formula down and go from there.* It's pretty simple; it describes how to combine some probabilistic belief (called a prior) with some new data to create an updated probability (called a posterior).
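For reference, the formula itself fits on one line:

    P(belief | data) = P(data | belief) × P(belief) / P(data)

Here P(belief) is the prior, and P(belief | data) is the posterior: the same belief, reweighted by how likely the new data was under it.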

What I have always found interesting is that Bayes' Law is not very prescriptive about where the prior belief comes from. Statisticians have devised all sorts of complex Bayesian models, where the law is chained together multiple times (a hierarchical model) or into complex networks. Almost always there is some initial prior which is in some shape or form defined by human judgement (or goes to great lengths to avoid it).

And for the past four years, I have been interested in examining a few questions relating to this prior. Where can we draw priors from? How can we get non-technical experts to provide information in the form of priors? How can you make decisions when multiple parties have different priors? When is there enough new information to overcome differences in priors?

Over Christmas break, I built a tool that would let me explore this concept in one of my other obsessions: the Golden State Warriors.

Before I dive into the tool, I want to talk about why I like this example. Sports are a sensible area to consider the relationship between data and human judgement. Sports generate tons of data, from the simple box scores that used to show up in newspapers to the advanced metrics coming from tracking where balls land on a court. However, there are also true subject matter experts: professionals who have experience playing or coaching in the games, writers who accumulate evidence from observation and interviews, data nerds inventing advanced analytics, and fans who spend hours watching games and following the evolution of a team.

Despite the preponderance of data, there are also truly new things that have never been seen before. Steph Curry and Kevin Durant are two of the best basketball players, and each is a unique player. Until this year, they had never played a basketball season together. Statisticians and analysts alike may be able to find historical comparisons, but they are never exact. Whether through data or domain expertise, people are generating models of how the two players will interact with each other and the rest of the team, and what that will mean against their opponents.

In sports, general managers spend time in the off-season constructing teams by trading players, signing free agents, and picking rookies in the draft. Coaches figure out how those combinations of players will work together. Writers and pundits make guesses and predictions in their columns and on TV. Gamblers make futures bets about how many games the team will win over the course of the season or who will make the playoffs. Then the season begins and information starts flowing in. But it isn't until the season is over that anyone really knows if the guesses made in the off-season were right or wrong.

Bayes' Law lets me think about how new information affects my understanding, based on my off-season guesses. I can look at how that information would affect multiple different prior understandings. When can a coach and a statistician, with different priors and different approaches, agree that a team is going to be really good or really bad?

This may sound trivial, but I think it actually applies across many areas of modern society. In my dissertation, I considered how future observations of climate may allow planners with disparate beliefs about climate change today to reach consensus, and how they can structure policy that allows for decisions to be made once the evidence accumulates. I think these concepts also apply to how members of a jury, with their own life experiences and their own prejudices, process the information in a trial to reach a verdict.

The Model

Last year, as the Warriors kicked off their win streak, I explained a simple beta-binomial model for considering how a team did. The basic logic is as follows (a small numeric sketch appears after the list):

  • A beta-distribution describes the probability that a random variable takes a specific value between zero and one. So you can imagine each team's end-of-season win percentage is that random variable.
  • Each individual may have a prior for the team's winning percentage. That prior can be described as an expectation for that random variable and a level of confidence in that expectation (e.g. what is the probability that they are a 50% win team, and what is the probability that they are a 70% win team).
  • As wins and losses are observed, the beta-distribution updates, generating a new probability that a team ends with any given win percentage.
  • With a little algebra, the model translates that win percentage into a probability of achieving any number of wins by the end of the season.
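
To make that concrete, here is a minimal sketch of the update in Python. The record and prior below are made up for illustration, and this is my hand-rolled version (assuming scipy is available) rather than the dashboard's actual code:

    from scipy.stats import beta, betabinom

    # Prior: I think this is roughly an 80% win team, and that belief
    # is worth about 10 games of information (so alpha + beta = 10).
    prior_mean, prior_games = 0.80, 10
    a0, b0 = prior_mean * prior_games, (1 - prior_mean) * prior_games

    # New data: an illustrative record of 48 wins and 9 losses.
    wins, losses = 48, 9
    a1, b1 = a0 + wins, b0 + losses               # posterior over win percentage
    print(beta(a1, b1).mean())                    # updated expected win percentage

    # Translate into end-of-season wins: 82 games total, 25 remaining here.
    games_left = 82 - (wins + losses)
    remaining = betabinom(games_left, a1, b1)     # predictive distribution over remaining wins
    print(1 - remaining.cdf(70 - wins - 1))       # probability of at least 70 total wins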
What's really interesting to me is how the model updates with different priors. My static analyses last year made it hard to explore across priors. I figured some interactive graphics would allow users to consider the impacts of different priors on current information. I also added a database that automatically updates, so that you can look at this for any team and get the most up-to-date information.

You can check out the model here. The interface definitely shows that I am not a front-end engineer, and there is probably room for improvement. There are also lots of follow-up analyses I can do, which will form the basis for follow-up posts.**

The Update Prior tab shows the basic form of this model.

It asks the user to set three things:

  1. A range for what percentile of the league the team will fall in. It's pre-set to show teams at or above the 90th percentile of the league (the top 2 or 3 teams each year). If you wanted to be entirely agnostic, you could widen the range to start at 0 by moving the top two sliders, and it would include all teams.
  2. The team you are currently interested in evaluating (I set it to the Warriors, of course).
  3. How strongly you feel about your prior. In the beta-binomial model, you describe this in terms that are relatively interpretable: how many games of information is this prior worth. If you set this number to 10, you are saying that after ten games, your prior and your new information are equally weighted (see the sketch just below).
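
Mechanically, that strength setting just rescales the beta-distribution fit to the historical data so it keeps the same expected win percentage but is "worth" the chosen number of games. A rough sketch, with made-up fitted parameters (again my own illustration, not the dashboard's code):

    # Rescale a fitted beta prior to be worth a chosen number of games,
    # keeping the same expected win percentage.
    def rescale_prior(a_fit, b_fit, prior_games):
        mean = a_fit / (a_fit + b_fit)            # expected win percentage from the fit
        return mean * prior_games, (1 - mean) * prior_games

    # e.g. a hypothetical fit to historical top teams, rescaled to be worth 10 games
    a0, b0 = rescale_prior(52.0, 28.0, 10)        # -> (6.5, 3.5)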

Then I show you two graphs:

On the left, I present your priors and updated posterior.

  • The bars show the historical distribution of win percentages for teams in the percentiles you chose.
  • The pink line is the beta-distribution that is originally fit to this data.
  • The darker red line is the same beta-distribution, but rescaled to the level of confidence you have. Notice the curve gets tighter for larger numbers and wider for smaller numbers.
  • The green line is the posterior probability that the team has any given win percentage, based on the current record (printed above).
On the right, you can see that posterior as a prediction for the number of wins. Mouse over any dot, and you will see the number of wins and the likelihood of the team having at least that many wins.

Feel free to play with any of the sliders and drop-downs. This will let you explore how different priors about the team (in terms of how good you thought they were historically, and the relative strength of that prior) change the predictions.

If you're like me, you are going to want to play with this a lot. So the Compare Parameters tab is the same thing, just shown three times. This should let you see what happens and compare results directly when you start changing things. What if you have a different prior? What if you are interested in a different team? Well, compare them side by side on this tab.

The Explore Across Parameters tab presents the same model, but with a slightly different approach. Each dot in this graph represents a different prior. The prior's win percentage is on the X-axis, and the relative strength of the prior is on the Y-axis. Choose a team and a set number of wins (using the menu at the top), and the color of each dot will tell you the probability that the team wins that many games.
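
Under the hood, that plot is just the same beta-binomial calculation swept over a grid of priors, roughly like the following (a sketch with illustrative numbers, not the dashboard's code):

    import numpy as np
    from scipy.stats import betabinom

    wins, losses, target = 48, 9, 70              # current record and a win total of interest
    games_left = 82 - (wins + losses)

    grid = []
    for strength in (1, 5, 10, 20, 40):           # prior strength in games (Y-axis)
        for p in np.arange(0.40, 0.96, 0.05):     # prior expected win percentage (X-axis)
            a, b = p * strength + wins, (1 - p) * strength + losses
            prob = 1 - betabinom(games_left, a, b).cdf(target - wins - 1)
            grid.append((p, strength, prob))      # one colored dot per (p, strength) pair

    # e.g. the most and least optimistic corners of the grid
    print(max(grid, key=lambda d: d[2]), min(grid, key=lambda d: d[2]))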

I love this plot, because this is the place where we can see if there is consensus among disparate beliefs. Let's say there was a wager on the table that the Warriors would win 70 games. And let's say if we won the bet, we would win $1000, but if we lost we would lose $1000. This bet is a little too rich for my blood, but I can pony up $500. If I can find one other person who is also willing to pony up $500, we should take the bet if we each believe there is a 50% chance or greater that the Warriors win 70 (at even odds, the bet has positive expected value exactly when that probability crosses 50%). Well, we could use this plot to solve this, by putting each of our priors on the plot and seeing if we are both in the green region. From the looks of it, if we thought the Warriors were an 80% win team or better (usually a 65-win team) before the season, we might be willing to agree.


I can also use this to frame my decision. Even as I followed the Warriors all summer, I would be hard pressed to tell you if I thought they were an 80% or 85% win team, and if I thought my prior was worth 5 or 20 games of information. But those distinctions may only matter in the context of a specific bet.

If I thought the Warriors were anything less than a 60% win team, I would definitely turn down the bet described above. This is a threshold value that tells me whether I can ignore the bet or should seek to further clarify my assessment. And if I thought they were a 90% win team, I should definitely take the bet. I have some clear bounds on what I would need to have believed. If I am inside those boundaries, I may be interested in the bet, and I can do some reading, watch a lot of First Take on ESPN, or do some lightweight analysis to further clarify my thoughts.

I know this situation sounds a bit contrived. In sports betting, people are usually challenging one another and going head to head on their bets. But if you listen to Bill Simmons and Cousin Sal (my best source of information on sports betting***), they talk constantly about bets they go in on together. Even though they don't really use the language of Bayesian analysis, they are frequently describing different priors and what information they would use to update them.

The process of seeing if multiple parties agree on priors and waiting to observe new information is similar to how many decisions in life and business are made. I have described public policy contexts where stakeholders in government coalitions need to agree. Venture capital is another, where there are multiple rounds of funding, with multiple parties joining together in funding rounds and previous investors deciding whether they have seen enough evidence to reinvest.

If you want to go down this rabbit-hole with me, then welcome to the Bayes' Area.

* I once had a two hour argument with my sister about whether my great-grandmother was colorblind. The only information I had was Bayes' Law and the fact that I am colorblind. My sister is getting a Ph.D. in genetics.
** I know validating the model is a big one. It's a little complicated in this instance, but I look forward to showing that off.
*** My second best is my old barber in L.A., Don.
