A line based file splitter for the command line.

Have you ever wanted to extract only a certain set of lines from a file? Maybe you wanted to get everything from line 400 onwards, or just lines 25 to 50? Well I did. I call the end result ‘splitter’.

Splitter is a program designed to be used on the command line and it has been written entirely in Haskell. I have uploaded Splitter so that it is available on Hackage. You can find the source code for splitter on BitBucket, along with the source code for ‘range’ the library that I wrote in order to make the splitter program easier to deal with. The repositories are here:

But words are just words and I really need to show you some examples.

Show me an Example!

For this demo lets make a file that has twenty lines in it and, on every line, are the numbers one to twenty, like this one: Twenty Numbers.

If you were to get that file (calling it ‘twenty.txt’) then the following commands would have the following results. You could get single lines from files:

$ cat twenty.txt | splitter 3
three
$

You could get an entire range of lines from a file:

$ cat twenty.txt | splitter 5-9
five
six
seven
eight
nine
$

You could get multiple ranges from the file:

$ cat twenty.txt | splitter 10-14,2-4
two
three
four
ten
eleven
twelve
thirteen
fourteen
$

You can get ranges that are only bounded on one side:

$ cat twenty.txt | splitter -5,15-
one
two
three
four
five
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
$

You can invert the selection if you chose to:

$ cat twenty.txt | splitter -i -5,15-
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen
$

And you can specify an infinite range if you really want to (even though it would be the same as ‘cat’):

$ cat twenty.txt | splitter *
one
two
three
four
five
six
seven
eight
nine
ten
eleven
twelve
thirteen
fourteen
fifteen
sixteen
seventeen
eighteen
nineteen
twenty
$

And the are a few more options that you can choose from that you can see by running ‘splitter –help’. I would recommend that you have a play around with it yourself. It will be possible to install it on any platform that has a cabal-install installed. Which will be part of the Haskell Platform.

Concluding Words

The bottom line is that splitter makes it really easy to only extract certain lines from your files. It also has the following features so that you can:

  • Select any range that you like; whether infinite or fixed.
  • Select infinite ranges.
  • Invert your selection so that you get all of the lines that you did NOT specify.
  • You can get the line numbers printed out with the lines in the file.
  • Lines are printed out when they are ready. Meaning that you can use splitter on a logfile in the same way that you can use ‘tail -f’.

I have tried to make it a highly useful and focussed tool to get certain lines from files using an easy to understand format to specify which lines that you want. For more detailed information you should check out the README file on BitBucket. It is perhaps the most comprehensive and up to date resource on the way to use the splitter tool.

Extra: Range code huh? That sounds useful.

While I was writing this I did indeed look around for Range libraries that would meet my criteria. I discovered the following:

  • ranges
    A nice looking package that has been marked as Obsolete by the Author. I did not want to have to be stuck on an obsolete version of code that would not be updated. Also, this library cannot handle infinite ranges.
  • Ranged-set
    T
    his is a nice library and it makes good use of Haskell classes but it does not support infinite ranges either and thus was not suitable for this project

So, getting excited and wanting to start from scratch, I wrote my own library called: range. That I have now placed on Hackage. Please feel free to use it for your own purposes and I will happily accept pull requests on that work.

About these ads

2 thoughts on “A line based file splitter for the command line.

  1. Hi,
    on a practical point of view you can do all this with classic unix commands, example:

    sed -n ’1,5p;12,$p’ twenty.txt | tac

    Will print lines 1 to 5, 12 to the EOF, in reverse.

    Cheers

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s