Wulf's Webden

The Webden on WordPress

Building it up

Another day, another task involving wrangling a text file to extract data and another chance to wield the mighty hammer of Vim to crack the problem. My starting point was a database extract with two columns of related identifiers, which we’ll call A and B. I wanted to construct a line of HTML from each pair so that I could quickly visit a page in an application to assess its contents. With just a few lines, this is easily done by hand but I’ve got fifty lines in this case. Fortunately, I also have the power of backreferences, which I learned last year.

The new trick I have added in the intervening time is a method of constructing the resulting scary-looking statements step by step. Firstly, I search in a way that highlights the whole line. However, rather than using .*, which grabs everything in one chunk, I break the pattern down into smaller pieces that together match the same text. In the replacement, rather than changing anything, I use an ampersand, which stands for the whole match, to simply put back everything I found. For example, this morning’s data had a four-character ID consisting of two letters followed by two numbers, a tab and then another identifier, so I used:

:%s/..\d\d\t.*/&/

Next, I can use escaped parentheses to group the chunks of my search and backreferences to return them:

:%s/\(..\d\d\)\(\t\)\(.*\)/\1\2\3/
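
The same two do-nothing steps can also be tried outside Vim. What follows is only a minimal sketch using Python's re module, not part of the original workflow; the sample rows are made up to fit the description (two letters, two digits, a tab, then a second identifier). Python groups with plain parentheses and spells Vim's & as \g<0>.

```python
import re

# Hypothetical sample rows: two letters, two digits, a tab, a second identifier.
rows = ["AB12\tX9001", "CD34\tY9002"]

# Step 1: match the line in chunks and put the whole match back unchanged.
# \g<0> is Python's equivalent of & in a Vim replacement.
step1 = [re.sub(r"..\d\d\t.*", r"\g<0>", row, count=1) for row in rows]

# Step 2: group the chunks and return them in their original order.
step2 = [re.sub(r"(..\d\d)(\t)(.*)", r"\1\2\3", row, count=1) for row in rows]

# Both steps should leave the data exactly as it was.
print(step1 == rows, step2 == rows)
```

If either comparison came out False, the pattern would not be describing the data the way it was meant to, and that is far better discovered here than after a real substitution.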

Again, it doesn’t change anything but I am now poised to finish off by rearranging how I call the references. I can insert them into the HTML statement, repeating as I want (for example, one of the components is useful as part of the label and as part of the URL) and ignoring \2, which is always a tab character but necessary in the search to allow me to pick up \1 and \3 cleanly.
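
To make that last step concrete, here is a hedged sketch of the rearrangement, written in Python's re module rather than Vim. The URL layout, the label format and the sample rows are all hypothetical, since the real application is not named here; only the grouping pattern comes from the description above.

```python
import re

# Hypothetical sample rows: two letters, two digits, a tab, a second identifier.
rows = ["AB12\tX9001", "CD34\tY9002"]

# Three groups: the first identifier, the tab, the second identifier.
pattern = re.compile(r"(..\d\d)(\t)(.*)")

# Hypothetical link layout: \1 is used in both the URL and the label,
# \3 appears in both as well, and \2 (the tab) is deliberately left out.
template = r'<a href="https://app.example.com/items/\1/\3">\1 (\3)</a>'

links = [pattern.sub(template, row, count=1) for row in rows]

for link in links:
    print(link)
```

When carrying the same replacement back into Vim, note that a literal ampersand in the replacement has to be written as \& (a bare & means the whole match) and that any / in the URL must be escaped or a different delimiter chosen for the substitute command.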

It has similarities to that game where you start with one word and by a succession of alterations to a single letter change it into another word. However, while that is an interesting mental challenge, it is a diversion and costs time. The kind of stepwise development I am describing, approaching the problem crabwise, is all about saving time, making sure that before I strike and change the data, everything is lined up and that the problem is a series of easy steps rather than one complex and involved one.
