Sunday, July 31, 2016

Research Matters

One of my first blog posts was about the fantastic book Between the World and Me, by Ta-Nehisi Coates. At that time, I said I was looking forward to writing about some academic research on racism and the use of force by police officers. As with many things in the blog, it took a while. In this case, it wasn't for lack of trying. Over the last six months, I found myself returning over and over again to Google Scholar, but was unable to find any compelling research in this area.

Then, the exact week that officer-involved shootings became a major news story again, with two high-profile incidents, rallies across the country, and then a shooting of police officers, a relatively high-profile piece of economic research came out. A working paper, An Empirical Analysis of Racial Differences in Police Use of Force by Roland G. Fryer, Jr., was posted on the website of the National Bureau of Economic Research (NBER). The paper examines whether African Americans and other minority groups experience disproportionate amounts of force after being stopped or encountered by the police.


Sunday, July 24, 2016

Hadoop... There It Is (Part 2)

Well, at long last, I have completed my Hadoop Raspberry Cluster. It took a couple of months to dive back into this project. I have my own personal cloud, running technology similar to what powers some of the world's most important tech companies. However, my cloud is pretty lame. It is less powerful than the MacBook Air that I am currently writing this post on. But at least it's complete, and it's time to write about it!


Saturday, May 21, 2016

Analyze That: Data Journalism and Trump

In the last week or so, I have encountered a lot of discussion about the failure of data journalists (mostly the good folks at fivethirtyeight.com) to predict Trump's nomination to the Republican ticket. In fact, that's understating it a little bit: they were quite confident that Trump would not win the nomination. Famously, Nate Silver put his chances of winning around 2 percent. In a recent podcast and 538 article, Nate Silver did an interesting post-mortem on the analysis. In part, he critiques his own methods, and in part he chastises himself for issuing a subjective prediction that did not come from a computational model. He states that, in this particular instance, he acted like a pundit. He was too focused on his own priors and underestimated the uncertainty due to a small sample size of "Trump-like" candidates. At the same time, he does defend his use of empirical approaches.


Sunday, May 15, 2016

Analyze That

One of the things I often enjoy doing with my friends is thinking through some political, policy, economic, or business problem. Sometimes this is an issue in the news, sometimes it's something that one of us recently read about or heard about on a podcast. Other times, it's some random topic that we happened to stumble onto over the course of a conversation. Either way, we generally just have a good time breaking such a problem down. We often jokingly refer to this as "consulting the shit" out of a problem.

Tuesday, May 10, 2016

Neuroses

A few weeks ago I posted about the difference between machine learning and econometrics. Though I talked a lot about the applications of the two techniques, I tried to avoid getting very detailed about any of the algorithms involved. Recently, I've also been doing a deep dive on one of these machine-learning algorithms: Neural Networks.

About two years ago, I started hearing a lot about "deep learning" and a powerful algorithm called a neural network. These things seem to be everywhere, from Siri to facial recognition. I must admit, somewhat embarrassingly, it took me quite some time to figure out what exactly a neural network was.

Everything that I read, when trying to understand the Neural Network, suffered from one of two problems. Some pieces just weren't that technical, and described the analogy of this algorithm to a human brain. They talked about things called "hidden layers" without telling me what was actually going on in them. The other type of post had a lot of math very quickly. It's not that I couldn't understand the math, but I wanted the high-level technical summary, knowing I would work through the math later.

But two things became very clear. First, these are the "blackest box" of the machine learning algorithms we have. Everything about inference and decision-making in the last post does not apply when using NNs. Second, they are really, really powerful. They are extraordinarily good at addressing some of the toughest data science problems. Given their recent success, they aren't going anywhere.


So it was time to learn!
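
As a rough sketch of what goes on inside a "hidden layer": each unit takes a weighted sum of its inputs, adds a bias, and squashes the result with a nonlinear function. The tiny network below is purely illustrative. The weights, biases, layer sizes, and the choice of a sigmoid activation are all made-up assumptions, not taken from any real trained model.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Compute one layer: weighted sum of inputs plus bias, then sigmoid, per unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


# Hypothetical, hand-picked parameters: 2 inputs -> 2 hidden units -> 1 output.
hidden_weights = [[0.5, -0.5], [1.0, 1.0]]
hidden_biases = [0.0, -1.0]
output_weights = [[1.0, -1.0]]
output_biases = [0.0]


def forward(inputs):
    """Run the inputs through the hidden layer, then through the output layer."""
    hidden = layer(inputs, hidden_weights, hidden_biases)
    return layer(hidden, output_weights, output_biases)[0]


print(forward([1.0, 1.0]))
```

In a real network the weights are not picked by hand; they are learned from data, which is where the heavier math comes in.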

Friday, April 1, 2016

The Better Angels

A review of The Better Angels of Our Nature: Why Violence Has Declined by Steven Pinker

I recently finished reading The Better Angels of Our Nature: Why Violence Has Declined by Steven Pinker, after hearing Bill Gates recommend it on the Ezra Klein Show.* This book argues that one of the strongest and most important trends throughout human history is a massive decline in violence. Pinker compiles a large number of studies to argue that this trend actually exists and then turns to psychology, sociology, and evolutionary biology to better understand it.

Pinker is a Harvard professor in the Department of Psychology, and a brief look at his bio shows that his work reaches across fields. His book is in no way an argument that there are no injustices left or that humanity has achieved some mission to remove violence. Instead, he is coming at this topic as a social scientist with an eye towards history. He observed a phenomenon, one that he argues is frequently overlooked, and tries to explain it.


Wednesday, March 16, 2016

A cool little thing I did

Ok, so this is a very brief post about how the sausage gets made, but I need it to test what I am doing, and some of you nerds might actually want to do this too.

So it's not really worth getting into all the details here, but I have been trying to figure out for a while how I can get interactive visualizations onto this blog. After all, if it's a data science blog, we need to be able to see the data in a fun, informative way. Oh, the other caveat is, I am totally cheap and didn't want to pay for my blog to be hosted.




Sunday, March 13, 2016

Machine Inference

Hal Varian, the chief economist at Google, put together an awesome presentation about the differences and similarities between machine learning and econometrics. Varian clearly knows his stuff. Working at Google, he has access to some of the foremost experts on machine learning. And he is no slouch when it comes to economics; he is the author of many students' favorite microeconomics textbook.

After getting a Ph.D. in policy analysis, where I studied a fair amount of econometrics, I started working as a data scientist using machine learning. Though I am not an expert at Varian's level on the issue, I too have found myself talking about this a fair amount. So, at the risk of repeating much of the same subject matter, I decided I would write my take on it. My thoughts are a little less technical than Varian's, and are largely aligned with his presentation. By working through a real-ish example, I hope I can describe the risks of using machine learning to address a topic that really requires econometrics.


Sunday, January 31, 2016

Show Me a Power Broker

Review of Power Broker and Show Me A Hero

This summer I read an all-time great book, The Power Broker: Robert Moses and the Fall of New York. Writing about this book was one of the first blog posts I envisioned when designing this blog. It was also a major upset when Power Broker did not top my list of books in 2015, but I suspect it will only rise in esteem over time. In my mind, I have also begun linking it with arguably my favorite piece of media that debuted in 2015: the HBO mini-series, Show Me A Hero.

The Power Broker is a biography of Robert Moses, a historical figure I had previously never heard of, but now constantly see references to. Moses, the chief urban planner in New York City, serving in various roles from the 1920s to the 1960s, was a cross between Leslie Knope and J. Edgar Hoover. He began as a progressive reformer, writing legislation in New York State as an aide to Governor Al Smith. As part of his reformist agenda, he championed the creation of park space. This eventually transitioned him into the head of the Long Island Park Commission, where he designed a masterpiece recreational development.



Monday, January 18, 2016

Balls Out

An analysis of the frequency with which customers choose Powerball numbers

I told you that I got a little obsessed with the Powerball lotto last week, when I broke down the behavioral economics behind my decision to join an office pool. Well, the obsession continued, as I searched for an optimal strategy for playing the lottery. Plus, writing about the Powerball again allows me to pull out more "ball"-related double entendres.

Optimal strategy? Isn't it a lottery? That's a reasonable question, as the whole lottery just comes down to picking some random numbers. But when the lotto gets as popular as this last one did, someone who buys a ticket is not just playing against a random number generator. They are also playing against everyone else who bought a ticket. This is, of course, because as more tickets are bought, the probability that multiple players purchase a winning ticket increases. If this happens, the winners split the jackpot and each winner's share is smaller.
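
To make the jackpot-splitting point concrete, here is a minimal sketch of the arithmetic. It assumes, as a simplification, that every other ticket is an independent random pick with the same chance of matching. The jackpot size and ticket counts are hypothetical, and the function name is made up for illustration. If K other tickets also win, a winner gets jackpot / (1 + K), and for K following a binomial distribution the average of 1 / (1 + K) has a closed form.

```python
import math


def expected_share(jackpot, other_tickets, win_probability):
    """Expected payout to a winning ticket when others may also hold winners.

    Assumes each of the other tickets independently wins with win_probability,
    so the number of co-winners K is Binomial(n, p). Uses the identity
    E[1 / (1 + K)] = (1 - (1 - p) ** (n + 1)) / ((n + 1) * p).
    """
    n, p = other_tickets, win_probability
    if p <= 0:
        return jackpot  # nobody else can win, so no split
    if p >= 1:
        return jackpot / (n + 1)  # everyone wins, so an even split
    # expm1/log1p keep the result accurate when p is tiny.
    prob_term = -math.expm1((n + 1) * math.log1p(-p))
    return jackpot * prob_term / ((n + 1) * p)


# Powerball jackpot odds are 1 in 292,201,338; the other numbers are hypothetical.
odds = 1 / 292_201_338
for tickets_sold in (50_000_000, 300_000_000, 600_000_000):
    share = expected_share(1_000_000_000, tickets_sold, odds)
    print(f"{tickets_sold:>11,} other tickets: expected share ${share:,.0f}")
```

The expected share shrinks as more tickets are sold, which is why a winner's payout depends on the other players and not just on the drawing.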

Tuesday, January 12, 2016

Power Ballin'

Behavioral Economics and the Largest Lottery

On Saturday, my neighbors handed me a lotto ticket as a birthday gift. Of course, this was no ordinary lottery ticket; the Powerball had reached the biggest pot ever at nearly one billion dollars. Later that night, I remembered that I had shoved the ticket in my pocket. I had a fleeting moment of joy, where all of a sudden the future seemed wide open. Anything was possible.

Then, I checked the results. It turned out the only plausible thing occurred: I lost. So did everyone else, and the pot for the next lotto increased to well over a billion dollars. Since that moment, I have been more than a little obsessed.


Even understanding probability as I do, I found myself falling prey to many interesting behavioral phenomena.

Sunday, January 10, 2016

Hadoop... There it is (Part 1)

Adventures in building my own personal cloud

Around Silicon Valley, people talk a lot about Moore's Law, which observes that microprocessors double in power (for a constant cost) roughly every 18 months. This law has produced something else that Silicon Valley types talk a lot about: Big Data.

That's right, Big Data did not just pop out of nowhere. As computers have become cheaper and more powerful, the cost of storing data has dropped dramatically. When computers were slow and expensive, people and companies had to make choices about what data to save. Today, it's probably more expensive to hire people to spend time thinking about what data to store than it would be to just throw it on a computer somewhere.

Actually, that last sentence is not quite correct. As powerful as modern computers are, most are not quite big enough to handle Big Data. In fact, the term Big Data describes data that are too big to store on a single computer. Instead, the data are stored on a whole bunch of computers networked together, called a cluster. So I should have said, just throw the data on a bunch of computers somewhere.

Working as a data scientist, I work on this type of cluster. But I have never actually seen or touched the computers! In an effort to understand a little bit more about how they work, I decided to build my own personal Big Data computer cluster.
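
For a sense of scale, the Moore's Law doubling mentioned at the top of this post compounds quickly. The sketch below takes the 18-month doubling period as given and treats "power" as a single relative number, which is a simplification. The function name is made up for illustration.

```python
def relative_power(years, doubling_period_years=1.5):
    """Relative processing power after `years`, if power doubles every period."""
    return 2 ** (years / doubling_period_years)


# Print the growth factor at a few horizons.
for years in (1.5, 3, 6, 15):
    print(f"After {years} years: about {relative_power(years):,.0f}x the power")
```

Ten doublings fit into 15 years, which works out to roughly a thousandfold increase for the same cost.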

Sunday, January 3, 2016

How I get to work

I know… what could be more self-indulgent and boring to readers than a post about my morning commute? However, I find myself explaining my commute to tons of people, because it’s pretty interesting. I actually believe that my commute is an interesting example of policy at work. I would also love to find out if anyone has estimated the economic value of my commute. So bear with me; hopefully this is not as self-indulgent as it seems.