Mirror of https://git.postgresql.org/git/postgresql.git (synced 2025-02-17 19:30:00 +08:00)
Docs: add an explicit example about controlling overall greediness of REs.
Per discussion of bug #13538.
Parent: 3bdd7f90fc
Commit: 1b5d34ca62
@@ -5203,10 +5203,37 @@ SELECT SUBSTRING('XY1234Z', 'Y*?([0-9]{1,3})');
The quantifiers <literal>{1,1}</> and <literal>{1,1}?</>
can be used to force greediness or non-greediness, respectively,
on a subexpression or a whole RE.
This is useful when you need the whole RE to have a greediness attribute
different from what's deduced from its elements. As an example,
suppose that we are trying to separate a string containing some digits
into the digits and the parts before and after them. We might try to
do that like this:
<screen>
SELECT regexp_matches('abc01234xyz', '(.*)(\d+)(.*)');
<lineannotation>Result: </lineannotation><computeroutput>{abc0123,4,xyz}</computeroutput>
</screen>
That didn't work: the first <literal>.*</> is greedy so
it <quote>eats</> as much as it can, leaving the <literal>\d+</> to
match at the last possible place, the last digit. We might try to fix
that by making it non-greedy:
<screen>
SELECT regexp_matches('abc01234xyz', '(.*?)(\d+)(.*)');
<lineannotation>Result: </lineannotation><computeroutput>{abc,0,""}</computeroutput>
</screen>
That didn't work either, because now the RE as a whole is non-greedy
(it takes its greediness from its first quantified atom, the non-greedy
<literal>(.*?)</>), and so it ends the overall match as soon as possible.
We can get what we want by forcing the RE as a whole to be greedy:
<screen>
SELECT regexp_matches('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');
<lineannotation>Result: </lineannotation><computeroutput>{abc,01234,xyz}</computeroutput>
</screen>
Controlling the RE's overall greediness separately from its components'
greediness allows great flexibility in handling variable-length patterns.
</para>
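
<para>
As a sketch only, the greedy-forced pattern can be applied to a column
to split each value into three result columns. The sample strings and the
names <literal>t</>, <literal>s</>, <literal>m</>, <literal>parts</>,
<literal>before_digits</>, <literal>digits</>, and <literal>after_digits</>
are illustrative assumptions, not part of the original text:
<screen>
-- Hypothetical sample data; substitute any table with a text column.
SELECT t.s,
       m.parts[1] AS before_digits,
       m.parts[2] AS digits,
       m.parts[3] AS after_digits
FROM (VALUES ('abc01234xyz'), ('no digits here')) AS t(s),
     -- {1,1} makes the whole RE greedy; each group keeps its own greediness
     regexp_matches(t.s, '(?:(.*?)(\d+)(.*)){1,1}') AS m(parts);
</screen>
The first row splits into <literal>abc</>, <literal>01234</>, and
<literal>xyz</>, as in the single-value example. A function call in
<literal>FROM</> is implicitly lateral, and <function>regexp_matches</>
returns no row when the pattern does not match. As a result, the string
without digits drops out of the result. Using
<literal>LEFT JOIN LATERAL ... ON true</> instead would keep that row, with
null parts.
</para>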

<para>
When deciding what is a longer or shorter match,
match lengths are measured in characters, not collating elements.
An empty string is considered longer than no match at all.
For example:
<literal>bb*</>