Regular Expressions

From Elvanör's Technical Wiki

Revision as of 10:15, 24 October 2007

This article is an introduction to regular expressions, an extremely powerful tool available to any programmer.

Operators

  • ^ represents the start of the string to search. In multiline mode it also matches right after each newline. Anchors are optional: without ^, a search may begin anywhere in the string. Note that with a pattern such as '^.*', the ^ can usually be dropped, since .* can already match from the beginning.
  • $ represents the end of the string (or the position just before a final newline). In multiline mode it also matches just before each newline. Like ^, it is optional.
  • *? is the non-greedy version of *: it matches as little text as possible. The non-greedy forms +?, ??, and {m,n}? are also available.
  • ? makes the preceding expression optional (zero or one occurrence). It is placed after the expression: for example, ab? matches either a or ab.
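The operators above can be tried out with a small sketch using Python's re module. The sample strings are made up for illustration; the sketch shows how re.MULTILINE changes ^ and $, how *? differs from *, and what ab? accepts.

```python
import re

text = "first line\nsecond line\n"  # illustrative input

# Without re.MULTILINE, ^ only matches at the very start of the string.
starts = re.findall(r"^\w+", text)
# With re.MULTILINE, ^ also matches right after each newline.
starts_ml = re.findall(r"^\w+", text, re.MULTILINE)

# $ matches at the end of the string or just before a final newline;
# with re.MULTILINE it also matches just before every newline.
ends = re.findall(r"\w+$", text)
ends_ml = re.findall(r"\w+$", text, re.MULTILINE)

html = "<b>bold</b> and <i>italic</i>"  # illustrative input
greedy = re.findall(r"<.*>", html)   # .* grabs as much text as it can
lazy = re.findall(r"<.*?>", html)    # .*? stops at the first possible '>'

# ab? means "a, optionally followed by one b".
optional = [bool(re.fullmatch(r"ab?", s)) for s in ("a", "ab", "abb")]

print(starts, starts_ml)  # ['first'] ['first', 'second']
print(ends, ends_ml)      # ['line'] ['line', 'line']
print(greedy)             # ['<b>bold</b> and <i>italic</i>']
print(lazy)               # ['<b>', '</b>', '<i>', '</i>']
print(optional)           # [True, True, False]
```

The greedy pattern swallows everything between the first < and the last >, which is usually not what is wanted when extracting tags.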

Regular expressions in Python

The official documentation is the re module section of the Python Library Reference.

Basic Operations

Python provides two basic operations: search and match. Perl makes no such distinction: its patterns always use search semantics, matching anywhere in the string unless anchored.

  • match(): Determine if the RE matches at the beginning of the string.
  • search(): Scan through a string, looking for any location where this RE matches.
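The difference between the two operations is easiest to see on a string where the pattern does not occur at position 0. This minimal sketch uses an illustrative digit pattern and sample strings; it also shows that anchoring a search() with ^ gives match()-like behaviour (outside multiline mode).

```python
import re

pattern = re.compile(r"\d+")  # one or more digits

m = pattern.match("42 apples")      # digits at the very beginning: match() succeeds
no_m = pattern.match("apples: 42")  # string starts with a letter: match() returns None
s = pattern.search("apples: 42")    # search() scans the whole string

# Anchoring with ^ makes search() behave like match() when MULTILINE is off.
anchored = re.search(r"^\d+", "apples: 42")

print(m.group())             # 42
print(no_m)                  # None
print(s.group(), s.start())  # 42 8
print(anchored)              # None
```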

Groups

In a regular expression, we often want to extract a particular piece of information from a string. To do so, we enclose the relevant sub-expression in parentheses to form a group. In Python, we can then refer to this group by its number or, if it is written as (?P<name>...), by its name. To create a group that will not be available for later retrieval, write (?:expression).
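A short sketch of the three kinds of groups described above; the sample strings and group names are illustrative only. Numbered groups count opening parentheses from left to right, (?:...) groups without capturing, and (?P<name>...) can be read either by name or by number.

```python
import re

# Capturing groups are numbered left to right by their opening parenthesis.
m1 = re.search(r"(\w+)@(\w+)\.com", "contact: alice@example.com")
print(m1.group(1), m1.group(2))  # alice example

# (?:...) groups the alternation but does not create a numbered group.
m2 = re.search(r"(?:Mr|Ms)\. (\w+)", "Dear Ms. Smith")
print(m2.groups())  # ('Smith',)

# (?P<name>...) creates a group that can be read by name or by number.
m3 = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "date: 2007-10")
print(m3.group("year"), m3.group(1))  # 2007 2007
print(m3.groupdict())                 # {'year': '2007', 'month': '10'}
```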

  • Example:
    • regular_expression = re.compile(r"&price=(?P<Price>.*)&quantity=(?P<Quantity>.*)")
    • match_object = regular_expression.search(query_string)
    • The price value can then be accessed with match_object.group(1) or match_object.group('Price'). For the quantity, it would be group(2) or group('Quantity').
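The price/quantity example above can be made runnable as follows. The value of query_string is a hypothetical sample, since the example does not define it. The sketch also shows a pitfall of the greedy .*: if more parameters follow, they end up inside the Quantity group. Replacing .* with [^&]* (any run of characters other than &) is one possible fix, assumed here for illustration.

```python
import re

# Hypothetical query string; the example above leaves query_string undefined.
query_string = "item=book&price=12.50&quantity=3"

regular_expression = re.compile(r"&price=(?P<Price>.*)&quantity=(?P<Quantity>.*)")
match_object = regular_expression.search(query_string)

if match_object:  # search() returns None when nothing matches
    print(match_object.group(1), match_object.group("Price"))     # 12.50 12.50
    print(match_object.group(2), match_object.group("Quantity"))  # 3 3

# With an extra parameter, the greedy .* swallows it into Quantity.
longer = query_string + "&color=red"
greedy_quantity = regular_expression.search(longer).group("Quantity")
print(greedy_quantity)  # 3&color=red

# [^&]* stops at the next '&', so each group holds a single value.
stricter = re.compile(r"&price=(?P<Price>[^&]*)&quantity=(?P<Quantity>[^&]*)")
strict_quantity = stricter.search(longer).group("Quantity")
print(strict_quantity)  # 3
```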