I’m on a train back from Boston after attending the 2015 Open Data Science Conference. Two days of serious nerding out among aficionados of open-source software like R and Python. Herein I give some highlights. Notes from talks and workshops I attended are here.
Some main points:
Data scientists do a lot of different things, and it’s still not clear what the job title refers to. Josh Wills of Cloudera says it’s a person who (1) knows more about statistics than any software developer, and (2) knows more about software development than any statistician. I have work to do on both fronts.
Lots of people are advocating for open data, for governments of all levels, NGOs and development institutions, and the private sector. Several talks on this. Here’s one. I love this stuff because it feels like data science coming out into the world and making a difference that isn’t just a marketing insight.
Feature engineering is emerging as one of the most important tools in predictive modeling. This goes hand-in-hand with an emphasis on domain knowledge. To me this sounds a lot like the goal of traditional science, going back to Aristotle: figure out the cause of the observed effect. So a cynic might say that the overinflated empiricism of the data-science community has let out a little air, recognizing that, yes, there is such a thing as causality, and yes, it is a useful thing to seek out (something you can’t do with a learning algorithm). Of course this is totally compatible with the idea that data science is a useful complement to deduction-based science.
Other stuff: I got an introduction to running Hadoop in the cloud with Amazon Web Services. Unclear if I’ll ever have reason to use this, but I do like the nerd cred this gets me.