Sunday, July 31, 2016

Research Matters

One of my first blog posts was about the fantastic book Between the World and Me, by Ta-Nehisi Coates. At that time, I said I was looking forward to writing about academic research on racism and the use of force by police officers. As with many things on this blog, it took a while. In this case, it wasn't for lack of trying. Over the last six months, I found myself returning over and over again to Google Scholar, but I was unable to find any compelling research in this area.

Then, the exact week that officer-involved shootings became a major news story again, with two high-profile incidents, rallies across the country, and then a shooting targeting police officers, a relatively high-profile piece of economic research came out. A working paper, An Empirical Analysis of Racial Differences in Police Use of Force by Roland G. Fryer, Jr., was posted on the website of the National Bureau of Economic Research (NBER). The paper examines whether African Americans and other minority groups experience disproportionate amounts of force after being stopped or encountered by the police.


Sunday, July 24, 2016

Hadoop... There It Is (Part 2)

Well, at long last, I have completed my Hadoop Raspberry Cluster. It took a couple of months to dive back into this project. I have my own personal cloud, running technology similar to what powers some of the world's most important tech companies. However, my cloud is pretty lame. It is less powerful than the MacBook Air that I am currently writing this post on. But at least it's complete, and it's time to write about it!


Saturday, May 21, 2016

Analyze That: Data Journalism and Trump

In the last week or so, I have encountered a lot of discussion about the failure of data journalists (mostly the good folks at fivethirtyeight.com) to predict Trump's nomination to the Republican ticket. In fact, that's understating it a little bit: they were quite confident that Trump would not win the nomination - famously, Nate Silver put his chances of winning around 2 percent. In a recent podcast and 538 article, Nate Silver did an interesting post-mortem on the analysis. In part, he critiques his own methods, and in part he chastises himself for issuing a subjective prediction that did not come from a computational model. He states that, in this particular instance, he acted like a pundit. He was too focused on his own priors and underestimated the uncertainty due to a small sample size of "Trump-like" candidates. At the same time, he does defend his use of empirical approaches.
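To make the small-sample point concrete, here is a minimal sketch. It assumes each past "Trump-like" candidate is an independent trial with the same unknown chance of winning, which is a strong simplification. The function name and the candidate counts are hypothetical and are not taken from the 538 analysis. If none of n comparable candidates won, the largest win probability still consistent with that record at 95% confidence is 1 - 0.05^(1/n):

```python
def upper_bound_zero_wins(n, alpha=0.05):
    """One-sided upper confidence bound on a win probability
    after observing zero wins in n independent tries.

    Solves (1 - p) ** n = alpha for p. The model (independent,
    identical trials) is an illustrative assumption.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 - alpha ** (1.0 / n)


# Hypothetical sample sizes, chosen only for illustration.
for n in (5, 10, 50, 150):
    bound = upper_bound_zero_wins(n)
    print(f"0 wins in {n:>3} tries: p could still be as high as {bound:.3f}")
```

Under these assumptions, a handful of precedents leaves a wide range of win probabilities on the table, and only a much larger sample would support a figure as low as 2 percent.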


Sunday, May 15, 2016

Analyze That

One of the things I often enjoy doing with my friends is thinking through some political, policy, economic, or business problem. Sometimes this is an issue in the news, sometimes it's something that one of us recently read about or heard about on a podcast. Other times, it's some random topic that we happened to stumble onto over the course of a conversation. Either way, we generally just have a good time breaking such a problem down. We often jokingly refer to this as "consulting the shit" out of a problem.

Tuesday, May 10, 2016

Neuroses

A few weeks ago I posted about the difference between machine learning and econometrics. Though I talked a lot about the applications of the two techniques, I tried to avoid getting very detailed about any of the algorithms involved. Recently, I've also been doing a deep dive on one of these machine-learning algorithms: Neural Networks.

About two years ago, I started hearing a lot about "deep learning" and a powerful algorithm called a neural network. These things seem to be everywhere, from Siri to facial recognition. I must admit, somewhat embarrassingly, it took me quite some time to figure out what exactly a neural network was.

Everything that I read, when trying to understand the Neural Network, suffered from one of two problems. Some pieces just weren't that technical, and described the analogy of this algorithm to a human brain. They talked about things called "hidden layers" without telling me what was actually going on in them. The other type of post had a lot of math very quickly. It's not that I couldn't understand the math, but I wanted the high-level technical summary, knowing I would work through the math later.

But two things became very clear. First, these are the "blackest box" of the machine learning algorithms we have. Everything about inference and decision-making in the last post does not apply when using NNs. Second, they are really, really powerful. They are extraordinarily good at addressing some of the toughest data science problems. Given their recent success, they aren't going anywhere.


So it was time to learn!
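As a taste of what actually goes on inside a "hidden layer", here is a minimal sketch of a forward pass through a tiny fully connected network. It assumes sigmoid activations and a single hidden layer. The function names, layer sizes, and weights are made up for illustration. A real network learns its weights from data, which this sketch does not cover.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of the inputs,
    adds its bias, and passes the result through the sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]


def forward(inputs, hidden_w, hidden_b, out_w, out_b):
    """Inputs -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)  # the "hidden" values
    return layer(hidden, out_w, out_b)


# Hypothetical weights: 2 inputs, 3 hidden units, 1 output.
hidden_w = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]]
hidden_b = [0.0, 0.1, -0.1]
out_w = [[0.7, -0.5, 0.2]]
out_b = [0.05]

print(forward([1.0, 2.0], hidden_w, hidden_b, out_w, out_b))
```

The hidden layer is just an intermediate list of numbers that the output layer then treats as its inputs. Stacking more of these layers is what makes a network "deep".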

Friday, April 1, 2016

The Better Angels

A review of The Better Angels of Our Nature: Why Violence Has Declined by Steven Pinker

I recently finished reading The Better Angels of Our Nature: Why Violence Has Declined by Steven Pinker, after hearing Bill Gates recommend it on the Ezra Klein Show.* This book argues that one of the strongest and most important trends throughout human history is a massive decline in violence. Pinker compiles a large number of studies to argue that this trend actually exists and then turns to psychology, sociology, and evolutionary biology to better understand it.

Pinker is a Harvard professor in the Department of Psychology, and a brief look at his bio shows that his work reaches across fields. His book is in no way an argument that there are no injustices left or that humanity has achieved some mission to remove violence. Instead, he is coming at this topic as a social scientist with an eye towards history. He observed a phenomenon, one that he argues is frequently overlooked, and tries to explain it.


Wednesday, March 16, 2016

A cool little thing I did

Ok, so this is a very brief post about how the sausage gets made, but I need it to test what I am doing, and some of you nerds might actually want to do this too.

So it's not really worth getting into all the details here, but I have been trying to figure out for a while how I can get interactive visualizations onto this blog. After all, if it's a data science blog, we need to be able to see the data in a fun, informative way. Oh, the other caveat is, I am totally cheap and didn't want to pay for my blog to be hosted.