Linux User

Module Release May 2002: Search Text Files Using Regular Expressions

This month's module release covers LPI 101, Topic 1.3, Objective 7, which requires Linux users and administrators to create:

  • "simple regular expressions and ... (use) related tools such as grep and sed to perform searches"

For those who are completely new to Unix, this module introduces one of the most powerful techniques developed on the platform and exported to the rest of an ungrateful IT world: regular expressions.

Regular expressions are patterns of characters that you wish to find, often in order to modify, replace or delete them. Patterns can be simple strings of characters, like words in prose, but are more likely to include 'special' characters which enable you to match very precise patterns. For example, ^ \+ (a caret, a single space, then \+) will match one or more spaces at the beginning of a line: ^ anchors the match to the start of the line, and \+, a GNU extension to grep's basic syntax, means 'one or more of the preceding character'.
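For readers who would rather experiment in Python (one of the languages mentioned at the end of this module), here is a minimal sketch of the same leading-spaces pattern using the standard re module. Python's syntax is closer to egrep's than to grep's basic syntax, so + needs no backslash. The names LEADING_SPACES and leading_spaces are purely illustrative.

```python
import re

# Python's re syntax resembles egrep's, so "+" needs no backslash:
# "^ +" means "start of the string, then one or more spaces".
LEADING_SPACES = re.compile(r"^ +")


def leading_spaces(line: str) -> str:
    """Return the run of spaces at the start of line, or "" if there is none."""
    match = LEADING_SPACES.search(line)  # ^ pins the match to the line start
    return match.group(0) if match else ""


for text in ["   three spaces", "no spaces", " one space"]:
    print(repr(text), "starts with", len(leading_spaces(text)), "space(s)")
```

Note that only spaces are counted; a line indented with a tab does not match, just as with the grep pattern above.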

Once you have composed a pattern which matches the object you are looking for, tools like grep (named after the ed editor command g/re/p: globally search for a regular expression and print the matching lines) or descendants like egrep (extended grep) can be used to search for a match in any part of your system. Indeed, if required, grep can be directed to search through every single file on your system in order to find matches.
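The search step can be sketched in Python as well. This is a rough imitation of grep -n (print line numbers) and, for the whole-directory case, GNU grep's -r option. The function names grep_lines and grep_tree are hypothetical, and the sketch assumes text files that can be read as UTF-8 (undecodable bytes are replaced rather than rejected).

```python
import os
import re
from typing import Iterable, Iterator, Tuple


def grep_lines(pattern: str, lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) for each line containing a match, like grep -n."""
    regex = re.compile(pattern)
    for number, line in enumerate(lines, start=1):
        if regex.search(line):  # search: the match may occur anywhere on the line
            yield number, line.rstrip("\n")


def grep_tree(pattern: str, root: str) -> Iterator[Tuple[str, int, str]]:
    """Search every regular file under root, roughly like GNU grep -rn."""
    for dirpath, _dirnames, filenames in os.walk(root):  # unreadable dirs are skipped
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):  # skip sockets, FIFOs and broken links
                continue
            try:
                with open(path, encoding="utf-8", errors="replace") as handle:
                    for number, line in grep_lines(pattern, handle):
                        yield path, number, line
            except OSError:
                continue  # unreadable file: stay quiet, as grep -s would


if __name__ == "__main__":
    passwd_like = ["root:x:0:0:root:/root:/bin/bash", "daemon:x:1:1::/usr/sbin:", "roots and leaves"]
    for number, line in grep_lines(r"^root:", passwd_like):
        print(f"{number}:{line}")  # prints 1:root:x:0:0:root:/root:/bin/bash
```

Calling grep_tree(r"^root:", "/etc") would then walk the whole directory tree, which is where the "search any part of your system" claim comes from; on a large tree this is slow, just as grep -r is.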

Having found lines matching a pattern, you can immediately pipe them through a text filtering tool like sed (the Stream Editor), which can transform them in some way, e.g. by replacing the matched pattern with something completely different. Sed's output can then be written back over the original file, so that the modifications land in their correct place.
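A sed-style substitution can be sketched in Python too. Here substitute mirrors sed's s/pattern/replacement/ command, and substitute_in_file approximates GNU sed's -i (in-place) option by writing to a temporary file and renaming it over the original. Both names are hypothetical. Note that Python replacement strings use \g<0> for the whole match where sed uses &, although \1-style group references work in both.

```python
import os
import re
import shutil
import tempfile
from typing import Iterable, Iterator


def substitute(pattern: str, replacement: str, lines: Iterable[str], count: int = 0) -> Iterator[str]:
    """Apply one regex substitution to each line, like sed 's/pattern/replacement/'.

    count=0 replaces every match on a line (sed's g flag); count=1 replaces only
    the first match on each line, which is sed's default behaviour.
    """
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(replacement, line, count=count)


def substitute_in_file(pattern: str, replacement: str, path: str, count: int = 0) -> None:
    """Rewrite path in place, roughly what GNU sed -i does."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write the edited text to a temporary file beside the original...
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as dst, open(path, encoding="utf-8") as src:
            dst.writelines(substitute(pattern, replacement, src, count))
        shutil.copymode(path, tmp_path)  # keep the original file's permissions
        # ...then swap it in with a rename, so a failure never leaves a half-written file.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise
```

For example, substitute_in_file(r"^ +", "", "notes.txt") (with notes.txt standing in for any file of your own) strips leading spaces much as sed -i 's/^ \+//' notes.txt would with GNU sed.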

Traditional Unix hackers would typically use a tool like awk to program large and very complex series of modifications to their systems in one run of a very short script. Years ago this was only possible on Unix: because of regular expressions, because Unix systems stored their files as plain (ASCII) text delimited by newline characters, and because Unix programs could be connected through pipes.

Modern developers tend to prefer full-blown programming languages like Perl or Python over awk. These have grep- and sed-like functionality built into them, provide more powerful general programming support and are portable, i.e. they can be deployed on non-Unix systems ... if you must.