Intro

After 3 or so years with Octopress, I’m moving my blog/website over to Hugo with the help of the new Blogdown R package. It had gotten to be a little absurd how much work I had to do to twist rendered RMarkdown documents into Octopress posts. Thankfully, Yihui Xie was already on the job of making my ridiculous blogging workflow obsolete.
Less than 48 hours after learning of Blogdown’s existence, I’m well on my way to having my blog fully migrated.
Contents: Problem, Solution, Coda, Recap.

Problem

As I try to get my blog up and running using the new Hugo/blogdown framework, I find myself needing to reformat a bunch of dates in the YAML front matter from mm/dd/yyyy to yyyy-mm-dd for all of the .Rmd files in a directory. For this I’m going to use sed, something I’ve been meaning to get more practice with. I’m going to try to come up with the syntax before consulting Google, and then see how close I was.
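For a sense of the kind of command involved, here is one GNU-sed way to do that transformation (the filename and front-matter layout are made up for illustration; this is a sketch, not necessarily the answer the post arrives at):

```shell
# A sample post with the old date format in its YAML front matter
printf 'date: 03/14/2016\ntitle: example\n' > post.Rmd

# Capture month, day, and year, then reorder them as yyyy-mm-dd.
# -E enables extended regexps; -i edits the files in place (GNU sed).
sed -i -E 's|^date: ([0-9]{2})/([0-9]{2})/([0-9]{4})|date: \3-\1-\2|' *.Rmd

head -n 1 post.Rmd   # date: 2016-03-14
```

Using `|` as the delimiter in the `s` command avoids having to escape the slashes in the dates themselves.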
Contents: Motivation, Flint maps (from winter 2016), Sentinel tests, Animations.

Motivation

The tweenr R package looks really cool! I want (1) to try it, (2) to revisit some old interests that never made it to the blog, and (3) to keep one foot in the water sector, despite some uncertainty as to its part in my professional future.
Early in 2016 I was playing around with Flint, Michigan lead data and wanted to see what trends I could pick out spatially.
I’m taking a break from helping DataFest participants untangle their R code to work on clearing my backlog of unfinished blog posts. One thing I’ve been meaning to talk about is a series of R packages I’ve written, in various stages of development. This is something I’ve done before, but it’s still new enough that I’m excited about having a working product.
All these packages are built to access data from US Federal agency APIs.
Suppose you want to know something about a random variable that is only measurable when its value is above a certain threshold. For example, you have a dataset of the concentration of nitrate in a river (measured monthly for the last 100 months), but the analytical instrument has a detection limit below which it stops working. Such data are called censored in the statistical community. How can you do inference on such data?
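One standard approach (a sketch of the censored-likelihood idea; the post may take a different route) is to treat non-detects as left-censored observations and let each kind of observation contribute what is actually known about it. With density $f$, CDF $F$, detection limit $d$, and parameters $\theta$,

$$
L(\theta) \;=\; \prod_{i:\, x_i \ge d} f(x_i \mid \theta) \;\times\; \prod_{i:\, x_i < d} F(d \mid \theta),
$$

so an observed value contributes its density while a non-detect contributes only the probability of lying below $d$. Maximizing $L$ makes use of the censored observations instead of dropping them or substituting something like $d/2$.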
I’ve been writing a lot using both RMarkdown and Markdown lately. In the past I would typically default to RMarkdown because that’s the default in RStudio, but more recently I’ve been writing some pretty math-heavy documents and wanted something that could render them in real time. This led me to Haroopad, which has mostly worked really well (aside from once crashing so hard that it actually deleted a previously saved version of the document I was working on). Alas, it doesn’t recognize the code chunks in my RMarkdown docs.
I’ve been working with water data collected by other people for the past 3 years or so. Typically, I’ve gotten my hands on it in 3 different ways:
1. Email, Dropbox, etc. from collaborators. Usually in .xlsx or .csv format; inconsistently formatted, seldom tidy.
2. Downloading manually from a website, e.g. USGS instantaneous flows pre-2007.
3. Using an API for web services such as the Water Quality Portal (WQP) or NWIS. I’ve mostly done this using R functions in the dataRetrieval package put out by USGS.
I’m on a train back from Boston after attending the 2015 Open Data Science Conference. Two days of serious nerding out among aficionados of open-source software like R and Python. Herein I give some highlights. Notes from talks and workshops I attended are here.
Train from Boston
Some main points:
Data scientists do a lot of different things, and it’s still not clear what the job title refers to.