Regular Expressions
Revision as of 11:28, 17 February 2009
This article introduces regular expressions, an extremely powerful tool available to any programmer.
Testing
- The easiest way to test regular expressions (PCRE style) is probably to use Perl directly, like this:
echo "input" | perl -n -e '/regexpHere/ and print'
- You can also use grep, but it does not support PCRE syntax by default (GNU grep has the -P switch, which enables it).
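The same kind of quick test can be sketched in Python, the language used later in this article. This is only an illustration: the helper name grep_lines and the sample lines are made up for the example, and Python's re syntax is close to PCRE but not identical to it.

```python
import re

def grep_lines(pattern, lines):
    """Return the lines in which `pattern` is found anywhere (hypothetical helper)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Made-up sample input, standing in for the text piped to Perl above.
sample = ["input", "other text", "more input here"]
print(grep_lines(r"inp.t", sample))
```

Like the Perl one-liner, the helper keeps a line as soon as the pattern is found anywhere in it.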
Operators
- ^ matches at the start of the string to search, and also just after each newline in multiline mode. It is optional. Note that if a search pattern begins with '^.*', you can usually remove that prefix altogether: it's better just not to use ^ in this case.
- $ matches at the end of the string, and also just before each newline in multiline mode. It is optional.
- *? is the non-greedy form of *: it matches as little text as possible. +?, ??, and {m,n}? are also available.
- ? makes the preceding expression optional, matching it zero or one time. It is placed after the expression, e.g. ab? matches either a or ab.
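A minimal sketch of these operators, written with Python's re module (covered in the next section). The sample strings are invented for the illustration, and re.MULTILINE is Python's name for multiline mode.

```python
import re

text = "first line\nsecond line"

# Without re.MULTILINE, ^ only matches at the very start of the string.
starts_default = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches right after each newline.
starts_multiline = re.findall(r"^\w+", text, re.MULTILINE)
# With re.MULTILINE, $ also matches just before each newline.
ends_multiline = re.findall(r"\w+$", text, re.MULTILINE)

html = "<b>one</b><b>two</b>"
greedy = re.findall(r"<b>.*</b>", html)  # .* takes as much text as possible
lazy = re.findall(r"<b>.*?</b>", html)   # .*? takes as little text as possible

# ab? means "a, optionally followed by b".
optional = re.findall(r"ab?", "a ab abb")

print(starts_default, starts_multiline, ends_multiline)
print(greedy, lazy, optional)
```

The greedy pattern returns a single match spanning both tags, while the non-greedy one returns each tag separately.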
Regular expressions in Python
Official documentation: http://docs.python.org/lib/module-re.html
Basic Operations
You can perform two basic operations: search and match. In Perl, search is always used.
- match(): Determine if the RE matches at the beginning of the string.
- search(): Scan through a string, looking for any location where this RE matches.
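The difference between the two operations can be seen in a short sketch. The pattern and the sample strings are arbitrary choices for the illustration.

```python
import re

pattern = re.compile(r"\d+")
sentence = "item 42 shipped"

# match() only tries at the beginning of the string, so it fails here.
at_start = pattern.match(sentence)
# search() scans forward until it finds a location where the pattern matches.
anywhere = pattern.search(sentence)
# match() succeeds when the string itself begins with digits.
leading = pattern.match("42 items")

print(at_start)
print(anywhere.group(), leading.group())
```

Both methods return None when nothing matches, so the result should be checked before calling group() on it.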
Groups
In a regular expression, we often want to extract a particular piece of information from a string. To do so, we enclose the relevant sub-expression in parentheses, which creates a group. In Python, we can then refer to this group by its number or, if the group starts with ?P<name>, by its name. To create a group which will not be available later for retrieval, write (?:expression).
- Example:
  - regular_expression = re.compile(r"&price=(?P<Price>.*)&quantity=(?P<Quantity>.*)")
  - match_object = regular_expression.search(query_string)
  - We could access the price value by match_object.group(1) or match_object.group('Price'). For the quantity, it would be group(2) or group('Quantity').
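A runnable variant of this example is sketched below. The query string is made up, and the sketch uses [^&]* instead of .* so that each value stops at the next &; that substitution is a choice made for this illustration, not part of the original example.

```python
import re

# Hypothetical query string, invented for the illustration.
query_string = "id=7&price=9.99&quantity=3&currency=EUR"

regular_expression = re.compile(
    r"&price=(?P<Price>[^&]*)&quantity=(?P<Quantity>[^&]*)"
)
match_object = regular_expression.search(query_string)

# Groups can be read by number or by name.
price_by_number = match_object.group(1)
price_by_name = match_object.group("Price")
quantity = match_object.group("Quantity")

# A (?:...) group is used for matching but is not stored:
# only the digits end up in groups().
non_capturing = re.search(r"(?:USD|EUR) (\d+)", "EUR 15")

print(price_by_number, price_by_name, quantity)
print(match_object.groupdict(), non_capturing.groups())
```

With the greedy .* of the original pattern, the Quantity group would also swallow the trailing &currency=EUR in this particular input.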
Regular expressions in Java
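- Documentation for the Pattern (http://java.sun.com/javase/6/docs/api/java/util/regex/Pattern.html) and Matcher (http://java.sun.com/javase/6/docs/api/java/util/regex/Matcher.html) classes.
- The matches() method of the Matcher class is similar to match() in Python, except that matches() requires the entire input to match, which makes it closer to Python's fullmatch(); lookingAt() anchors only at the beginning, like match(). find() is similar to search().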
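One caveat when moving between the two languages: Java's matches() succeeds only if the whole input matches, whereas Python's match() anchors only at the beginning. The Python side of the comparison can be sketched as follows; the Java method named in each comment is the nearest counterpart, and the sample strings are arbitrary.

```python
import re

pattern = re.compile(r"\d+")

# fullmatch(): the whole string must match (nearest to Java's matches()).
whole = pattern.fullmatch("123abc")
# match(): only the beginning must match (nearest to Java's lookingAt()).
prefix = pattern.match("123abc")
# search(): a match anywhere is enough (nearest to Java's find()).
anywhere = pattern.search("abc123")

print(whole, prefix.group(), anywhere.group())
```

Here fullmatch() returns None because of the trailing letters, while match() still accepts the leading digits.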
Regular expressions in Groovy
- Documentation: http://groovy.codehaus.org/Regular+Expressions
- Groovy has the following shortcuts:
  - ==~ for matches().
  - =~ for creating a matcher. The matcher is coerced to a Boolean via its find() method, so you can write, for example:
    if ("hello" =~ /hel/)
  - A pattern can be created directly via ~/foo/.