Regular Expressions

From Elvanör's Technical Wiki

Revision as of 11:28, 17 February 2009

This article is an introduction to an extremely powerful tool available to any programmer, Regular Expressions.

Testing

  • The easiest way to test regular expressions (PCRE style) is probably to use Perl directly, like this:
echo "input" | perl -n -e '/regexpHere/ and print'
  • You can also use grep, but by default it uses POSIX basic regular expressions rather than PCRE syntax. GNU grep has the -P switch, which enables PCRE.

Operators

  • ^ matches at the start of the string to search; in multiline mode it also matches at the start of each line (just after a newline). It is optional. Note that when searching, a leading '^.*' adds nothing: it is better to drop both the ^ and the .* in this case.
  • $ matches at the end of the string (or just before a final trailing newline); in multiline mode it also matches just before each newline. It is optional.
  • *? is the non-greedy version of *: it matches as little text as possible. +?, ??, and {m,n}? are also available.
  • ? makes the preceding expression optional (zero or one occurrence). It is placed after the expression, e.g. ab? will match either a or ab.
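
The operators above can be tried out in a few lines of Python. This is only a sketch: the sample strings and variable names are made up for illustration, and Python's re module is close to, but not identical with, PCRE.

```python
import re

text = "first line\nsecond line"   # hypothetical sample input

# Without MULTILINE, ^ only matches at the very start of the string.
start_default = re.search(r"^second", text)
# With MULTILINE, ^ also matches right after each newline.
start_multiline = re.search(r"^second", text, re.MULTILINE)

# $ matches before an inner newline only in MULTILINE mode.
end_default = re.search(r"first line$", text)
end_multiline = re.search(r"first line$", text, re.MULTILINE)

html = "<b>one</b><b>two</b>"      # hypothetical sample input
# Greedy .* runs to the last </b>; non-greedy .*? stops at the first one.
greedy = re.search(r"<b>.*</b>", html).group(0)
lazy = re.search(r"<b>.*?</b>", html).group(0)

# ab? matches "a" optionally followed by a single "b".
optional = re.findall(r"ab?", "a ab abb")

print(greedy, lazy, optional)
```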

Regular expressions in Python

Official documentation available here: http://docs.python.org/lib/module-re.html

Basic Operations

You can perform two basic operations: search and match. Perl's matching operator always behaves like search.

  • match(): Determine if the RE matches at the beginning of the string.
  • search(): Scan through a string, looking for any location where this RE matches.
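
A minimal sketch of the difference between the two operations, using a made-up pattern and made-up input strings:

```python
import re

pattern = re.compile(r"\d+")   # hypothetical pattern: one or more digits

# match() only succeeds if the pattern matches at position 0.
at_start_fail = pattern.match("order 42")
at_start_ok = pattern.match("42 orders")

# search() scans the whole string for the first matching location.
anywhere = pattern.search("order 42")

print(at_start_fail, at_start_ok.group(0), anywhere.group(0))
```

Both methods return None when nothing matches, so the result should be checked before calling group().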

Groups

In a regular expression, we often want to extract a particular piece of information from a string. To do so, enclose the relevant sub-expression in parentheses. In Python, we can then refer to this group by its number or, if the group starts with ?P<name>, by its name. To create a group that will not be available later for retrieval (a non-capturing group), write (?:expression).

  • Example:
    • regular_expression = re.compile(r"&price=(?P<Price>.*)&quantity=(?P<Quantity>.*)")
    • match_object = regular_expression.search(query_string)
    • We could access the price value by match_object.group(1) or match_object.group('Price'). For the quantity, it would be group(2) or group('Quantity').
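
The example above can be made runnable as follows. The query string is a made-up value, and the second pattern (with a non-capturing group and [^&]* instead of .*) is an illustrative variant, not part of the original example.

```python
import re

# Hypothetical input; any string containing "&price=...&quantity=..." works.
query_string = "item=book&price=12.50&quantity=3"

regular_expression = re.compile(
    r"&price=(?P<Price>.*)&quantity=(?P<Quantity>.*)"
)
match_object = regular_expression.search(query_string)

# Groups can be read by number or by name.
price_by_number = match_object.group(1)
price_by_name = match_object.group("Price")
quantity = match_object.group("Quantity")

# Variant (assumption): (?:^|&) is non-capturing, so it takes no group
# number, and Price is still group 1. [^&]* stops at the next parameter.
loose = re.compile(r"(?:^|&)price=(?P<Price>[^&]*)")
loose_match = loose.search("price=9.99&quantity=1")

print(price_by_number, quantity, loose_match.groups())
```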

Regular expressions in Java

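  • Documentation for the Pattern (http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html) and Matcher (http://java.sun.com/javase/6/docs/api/java/util/regex/Matcher.html) classes.
  • The matches() method of the Matcher class is similar to match() in Python, except that it requires the entire input to match the pattern (which is closer to Python's fullmatch()); lookingAt() anchors only at the beginning, as Python's match() does. find() is similar to search().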
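
To make the comparison concrete, here is a sketch on the Python side only (the pattern and inputs are made up). It shows the three behaviours that the Java methods are being compared against; it does not call any Java API.

```python
import re

pattern = re.compile(r"hel")   # hypothetical pattern

# Anchored at the start only: succeeds on "hello".
prefix = pattern.match("hello")

# Whole string must match: fails on "hello", succeeds on "hel".
whole_fail = pattern.fullmatch("hello")
whole_ok = pattern.fullmatch("hel")

# Scans for a match anywhere in the string.
anywhere = pattern.search("say hello")

print(prefix is not None, whole_fail is None, anywhere.start())
```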

Regular expressions in Groovy

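  • Documentation: http://groovy.codehaus.org/Regular+Expressions
  • Groovy has the following shortcuts:
    • ==~ for matches() (the whole string must match the pattern).
    • =~ for creating a matcher. The matcher is coerced to a Boolean via its find() method, so it can be used directly as a condition:
if ("hello" =~ /hel/) { println "found" }
  • A pattern can be created directly via ~/foo/.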