Friday, January 2, 2015

Something I Learnt About sed

Note: I’m writing this down to remind myself for the next time I’m using sed and any kind of regex.

I was working on a very simple task: replace a string in a file with a different string.
All of this should happen on our build machine during the product build. Nothing too fancy. But then I got bitten by the different versions of sed available on the build machines.
We build our product on OSX, Windows and Linux, and I had to realize that the regex I used didn’t work with every sed version on the build machines.

My Regex Confusion

To make it clear: I’m not a regex guru. I use it for simple things. When it gets complex, I usually consult a tool like regex101.com to get it right.
I started with this simple regex to replace the number after the - (the number of commits). The build number looks like 1.0.0-12345.
Here is the command I started with:
`cat build.prop | sed "s/(brackets_build_version=\d+.\d+.\d+-)(\d+)/\1${NEW_COMMITS}/" > build.prop.new`.
The result:
`sed: 1: "s/(brackets_build_versi ...": \1 not defined in the RE`
Okay, that doesn’t help much. After some research I found the solution: -E needs to be added to the command.
Okay, that error went away, but the replace didn’t work either. Hm, now I had to debug the regex and find the issue.
After several rounds of experimentation, I started to read the documentation for re_format. That explained a lot: there are modern REs and obsolete REs, which POSIX calls Extended Regular Expressions (ERE) and Basic Regular Expressions (BRE) respectively. The obsolete ones mainly exist for backward compatibility. They are not as powerful as the modern REs and lack some features.
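To make the difference concrete, here is a minimal sketch of the same replacement written as a basic RE. In a basic RE, plain parentheses are literal characters and a group is written `\(...\)`, which would explain the "\1 not defined" error above. The sample line and the value 678 are made up for illustration.

```shell
# Hypothetical sample line, modelled on the build number format above.
line='brackets_build_version=1.0.0-12345'

# Basic RE (sed's default mode, no extra flag needed):
#  - groups are written \( ... \)
#  - "one or more digits" is spelled [0-9][0-9]* because "+" is not
#    a portable operator in basic REs
#  - \. matches a literal dot
bre_result=$(printf '%s\n' "$line" |
  sed 's/\(brackets_build_version=[0-9][0-9]*\.[0-9][0-9]*\.[0-9][0-9]*\)-[0-9][0-9]*/\1-678/')

echo "$bre_result"
```

Written this way the command is longer, but it should not depend on any flag for extended regular expressions.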
The last working version looked like:
`cat build.prop | sed -E "s/(brackets_build_version=[[:digit:]]+.[[:digit:]]+.[[:digit:]]+)-([[:digit:]]+)/\1-${COMMITS}/"`
This worked! Eureka.
But wait, why do I have to use [[:digit:]]+ instead of the much shorter \d+? I didn’t know at first. It turns out that \d is a Perl-style shorthand that is not part of POSIX regular expressions, so sed doesn’t understand it.
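One more detail worth a sketch: in the command above the dots are not escaped, so each `.` matches any character and not only a literal dot. The variant below escapes them. It assumes a sed that accepts -E (BSD sed does, and as far as I know recent GNU sed releases do too). The function name `replace_commits` and the value 678 are made up for this example.

```shell
# COMMITS is the new commit count; 678 is a made-up value for this sketch.
COMMITS=678

# Same substitution as above, but with the dots escaped so that "\."
# only matches a literal dot instead of any character.
replace_commits() {
  sed -E "s/(brackets_build_version=[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+)-[[:digit:]]+/\1-${COMMITS}/"
}

# A well-formed build number gets the new commit count.
printf '%s\n' 'brackets_build_version=1.0.0-12345' | replace_commits

# A line that only looks similar is left untouched once the dots are escaped.
printf '%s\n' 'brackets_build_version=1x0y0-12345' | replace_commits
```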

Deploy on Build Machines

Happy that I had finally found a solution that worked on OSX, I totally forgot about the other OSes. Once I made the changes and started the build, it failed on Linux and Windows.
WTH? What was I missing? Why doesn’t it even work on Linux? I quickly started my Linux VM and gave it a try. The solution was simple: I had to call sed with -r in order to make it work.

Conclusion or What I have Learnt

sed behaves differently on different operating systems. This is probably no exciting news, but I thought I’d keep this as a reminder for myself.
Using bash, or an environment that provides it such as Cygwin or Git Bash on Windows, does make a difference for some tools. sed in particular comes in two variants: BSD and GNU.
OSX uses BSD sed, and I guess Linux and Cygwin come with GNU sed. They might have different command line switches for the same option. So be careful: testing is always required.
Another difference is the regex engine that can be used. sed supports basic regular expressions (BREs) and extended (ERE or modern) regular expressions.
As I mentioned before, to enable extended regular expressions you need to call sed -E on OSX and sed -r on Linux and Cygwin.
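One way to avoid hard-coding the flag per OS is to probe the local sed and use whichever flag it accepts. This is only a sketch under assumptions: the function name `detect_sed_ere_flag`, the variable `SED_ERE_FLAG` and the value 678 are made up here, and I have not tried it on every build machine setup.

```shell
# Print the flag that switches the local sed to extended regular expressions.
# Probing with a tiny substitution avoids guessing from the OS name.
detect_sed_ere_flag() {
  for flag in -E -r; do
    # The probe only succeeds if sed accepts the flag AND treats ( ) as a group.
    if printf 'a\n' | sed "$flag" 's/(a)+/\1/' >/dev/null 2>&1; then
      printf '%s\n' "$flag"
      return 0
    fi
  done
  return 1
}

# SED_ERE_FLAG and the value 678 are hypothetical names/values for this sketch.
SED_ERE_FLAG=$(detect_sed_ere_flag) || echo 'no extended-regex flag found' >&2
COMMITS=678

new_line=$(printf '%s\n' 'brackets_build_version=1.0.0-12345' |
  sed "$SED_ERE_FLAG" "s/(brackets_build_version=[[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+)-[[:digit:]]+/\1-${COMMITS}/")

echo "$new_line"
```

The same build script can then run unchanged on OSX, Linux and Cygwin, as long as one of the two flags is supported.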