Quick practice exercise with sed
Problem
As I try to get my blog up and running using the new Hugo/blogdown framework, I find myself needing to reformat a bunch of dates in the yaml front matter from mm/dd/yyyy to yyyy-mm-dd for all of the .Rmd files in a directory. For this I’m going to use sed, something I’ve been meaning to get more practice with. I’m going to try to come up with the syntax before consulting google, and then see how close I was.
OK, it should be something like:
sed 's_^date: \([0:1][0:9]\)/\([0:3][0:9]\)/\(1[4:7]\)_20\3-\1-\2_ '
I realize I’m guessing for way too much of this. First of all, I have no idea how to do this for all files in a directory. Second, I forget what to put at the end to make it do the replacement and write that to the file. I could go on. But first I’m going to copy everything to a new folder and try it.
mkdir sedtestfolder
cp post/*.Rmd sedtestfolder/
cd sedtestfolder
Try for a single file. I renamed it to file1.Rmd. I’m going to try:
sed s_'^date: \([0:1][0:9]\)/\([0:3][0:9]\)/\(1[4:7]\)_20\3-\1-\2_ ' file1.Rmd > better.Rmd
That did nothing except create a copy of file1.Rmd in better.Rmd. Time to consult the guidebook.
Starting more simply.
echo "11/23/2015" | sed 's_11/23/(2015)_\1_'
Nope! Escape those parens.
echo "11/23/2015" | sed 's_11/23/\(2015\)_\1_'
## 2015
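The backslashes are needed because sed defaults to basic regular expressions, where plain parentheses are literal characters. A minimal sketch, assuming a sed that supports the -E flag for extended regular expressions (GNU and BSD sed both do), shows the same capture written both ways:

```shell
# Basic regex (the default): capture groups need escaped parens.
bre_result=$(echo "11/23/2015" | sed 's_11/23/\(2015\)_\1_')

# Extended regex (-E): unescaped parens form the capture group.
ere_result=$(echo "11/23/2015" | sed -E 's_11/23/(2015)_\1_')

echo "$bre_result"
echo "$ere_result"
```

Both commands print the captured year; which style to use is mostly a matter of taste.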
Good. Now put into the correct order:
echo "11/23/2015" | sed 's_\(11\)/\(23\)/\(2015\)_\3-\1-\2_'
## 2015-11-23
That’s it. Next with wildcards.
echo "date: 11/23/2015" | sed 's_\([0-1][0-9]\)/\([0-3][0-9]\)/\(20[0-3][0-9]\)_\3-\1-\2_'
## date: 2015-11-23
Hyphens not colons. Hyphens not colons. Hyphens not colons. Got it.
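To see why the colon version quietly matched nothing, here is a small sketch (the single-digit input is just an illustration): inside brackets, a colon is an ordinary character, so [0:9] matches only 0, : or 9.

```shell
# [0:9] is a set of three literal characters: 0, :, and 9.
colon_result=$(echo "5" | sed 's/[0:9]/X/')

# [0-9] is a range covering every digit from 0 to 9.
hyphen_result=$(echo "5" | sed 's/[0-9]/X/')

echo "$colon_result"   # the 5 is left alone
echo "$hyphen_result"  # the 5 is replaced with X
```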
Next step: get from file.
sed 's_\([0-1][0-9]\)/\([0-3][0-9]\)/\(20[0-3][0-9]\)_\3-\1-\2_' < file1.Rmd
Yup. And put into a new file.
sed 's_\([0-1][0-9]\)/\([0-3][0-9]\)/\(20[0-3][0-9]\)_\3-\1-\2_' < file1.Rmd > file2.Rmd
Now the hard part. Doing it for every file in the folder. Well, not really hard, but will require more than guessing and checking.
Solution
And the solution is… use find | xargs sed, as seen in this post.
find ./ -type f -name "*.Rmd" | xargs sed -i 's_\([0-1][0-9]\)/\([0-3][0-9]\)/\(20[0-3][0-9]\)_\3-\1-\2_'
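Since -i overwrites files in place, a more cautious variant may be worth having on hand. The sketch below is not what the linked post shows: it swaps xargs for find -exec so that file names with spaces survive, and uses -i.bak so sed keeps a backup of each original. The scratch directory and file name are made up for illustration.

```shell
# Hypothetical scratch directory and file, for illustration only.
demo_dir=$(mktemp -d)
printf '%s\n' 'date: 11/23/2015' > "$demo_dir/my post.Rmd"

# -exec ... {} + hands file names straight to sed, so spaces are safe.
# -i.bak edits in place and keeps the original as <name>.bak.
find "$demo_dir" -type f -name "*.Rmd" -exec \
  sed -i.bak 's_\([0-1][0-9]\)/\([0-3][0-9]\)/\(20[0-3][0-9]\)_\3-\1-\2_' {} +

cat "$demo_dir/my post.Rmd"      # edited copy
cat "$demo_dir/my post.Rmd.bak"  # untouched original
```

Once the results look right, the .bak files can be deleted.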
Coda
Now suppose I want to get the date from the file name. (Turns out the existing yaml front matter didn’t have consistent mm/dd/yyyy format, but all the file names already include the yyyy-mm-dd string.) I think I’ll need to do this in a loop.
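for FILE in $(find ./ -type f -name "*.Rmd"); do
  # Pull the yyyy-mm-dd string out of the file name.
  DATESTRING=$(echo "$FILE" | grep -o '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}')
  # Double quotes so the shell expands $DATESTRING; sed works line by line, so no \n is needed.
  sed -i "s/^date: .*/date: $DATESTRING/" "$FILE"
done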
Hooray! That did it!
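A slightly more defensive take on the same loop might look like the sketch below. It is only one possible variant, and the scratch directory and file name are hypothetical. It uses a glob instead of find (so it covers a single directory only), skips any file whose name has no date, and keeps .bak backups.

```shell
# Hypothetical scratch directory and file name, for illustration only.
demo_dir=$(mktemp -d)
printf '%s\n' '---' 'title: Test' 'date: 11/23/15' '---' \
  > "$demo_dir/2015-11-23-test post.Rmd"

# A glob keeps each file name intact, even with spaces in it.
for FILE in "$demo_dir"/*.Rmd; do
  # Take the first yyyy-mm-dd string found in the file name.
  DATESTRING=$(basename "$FILE" | grep -o '[0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}' | head -n 1)
  # Skip files whose names carry no date rather than blanking the field.
  [ -n "$DATESTRING" ] || continue
  sed -i.bak "s/^date: .*/date: $DATESTRING/" "$FILE"
done

cat "$demo_dir/2015-11-23-test post.Rmd"
```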
Recap.
My first (WRONG) try looked like this:
sed 's_^date: \([0:1][0:9]\)/\([0:3][0:9]\)/\(1[4:7]\)_20\3-\1-\2_ '
In the course of getting it right, learned a few things about sed.
- Character ranges are like this [0-9], not this [0:9]. (This is a regex thing, not a sed thing.)
- Use the -i flag to do the sed edits in the same file the data are coming from.
- Escape your parens (\(, etc.) for storing and later referencing. (I did this in my first try but lost it partway through.)
- Specify number of occurrences with \{3\}, etc. (regex, not sed)
- Double quotes allow for variable expansion; single quotes do not.