ABENREGEX_POSIX_PCRE_IMPROVE - REGEX POSIX PCRE IMPROVE

This documentation is copyright by SAP AG.

New Features in PCRE Compared to POSIX

While the topic Incompatibilities between POSIX and PCRE deals with incompatibilities and missing features when migrating from POSIX to PCRE, it is also worth taking a look at the many new features PCRE has to offer.

The following sections introduce some of these features. The list is far from complete.

Making Use of New Features for Patterns

Lazy Quantifiers

The most obvious downside of POSIX regular expressions in ABAP is the lack of lazy (also known as non-greedy or reluctant) quantifiers.

In PCRE, a quantifier can be made lazy by adding a trailing ?:

PCRE Syntax   Description
??            0 or 1, preferred 0
*?            0 or more, as few as possible
+?            1 or more, as few as possible
{n,m}?        at least n, no more than m, as few as possible
{n,}?         at least n, as few as possible

Example: difference between greedy and non-greedy behavior.
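The difference can be sketched with the built-in function match. This is a minimal illustration, assuming a release in which the pcre parameter of the built-in functions is available; the variable names and the sample text are made up for this sketch.

```abap
DATA(text) = `<b>bold</b> and <i>italic</i>`.

" Greedy: .+ consumes as much as possible, so the match ends at the last >
DATA(greedy) = match( val = text pcre = `<.+>` ).

" Lazy: .+? consumes as little as possible, so the match ends at the first >
DATA(lazy) = match( val = text pcre = `<.+?>` ).

ASSERT greedy = `<b>bold</b> and <i>italic</i>`.
ASSERT lazy   = `<b>`.
```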

Look-behind Assertions

PCRE Syntax   Description
(?<=...)      positive look-behind assertion; succeeds if the current match position is preceded by the given pattern
(?<!...)      negative look-behind assertion; succeeds if the current match position is not preceded by the given pattern

Leading and trailing look-behind assertions, like look-ahead assertions, are not part of the actual match.
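A small sketch of both assertions, again using the built-in function match with the pcre parameter. The sample text and variable names are illustrative only. Since whitespace in ABAP PCRE patterns is assumed to be ignored by default (extended mode), the blank is written as \s.

```abap
DATA(text) = `tax 19 EUR, net 100 EUR`.

" Positive look-behind: digits that are preceded by `tax `
DATA(tax) = match( val = text pcre = `(?<=tax\s)\d+` ).

" Negative look-behind: digits preceded neither by `tax ` nor by another digit
DATA(net) = match( val = text pcre = `(?<!tax\s)(?<!\d)\d+` ).

" The tested prefix is not part of the match itself
ASSERT tax = `19`.
ASSERT net = `100`.
```

The second negative look-behind (?<!\d) is needed here because otherwise the match could start in the middle of the number 19.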

Multiline Mode

In some scenarios it is necessary to respect line feeds during matching, e.g. matching something only if it is located at the beginning of a line. PCRE makes this very convenient by providing a large amount of control over the handling of multiple lines in the matching process.

When a regular expression is created using method CREATE_PCRE of system class CL_ABAP_REGEX, multi line handling can be controlled by the following parameters:

Parameter          Description
DOT_ALL            single line mode; if enabled, special character . also matches line feed characters
ENABLE_MULTILINE   multi line mode; if enabled, special characters ^ and $ not only match the start and the end of the character string, but also the start and the end of a line respectively; a line is ended by a line feed character
NEWLINE_MODE       controls what gets recognized as a line feed character

Despite their names, single line and multi line mode are not mutually exclusive and can be combined.

It is also possible to set these options directly in the pattern. This is especially useful for regular expressions specified after the addition PCRE of the statements FIND and REPLACE or passed to the parameter pcre of built-in functions:

  • Single line mode can be enabled using the option setting syntax (?s) anywhere in the pattern.
  • Multi line mode can be enabled using the option setting syntax (?m) anywhere in the pattern.
  • What gets recognized as a line feed character can be controlled by the following syntax that can only appear at the start of the pattern:
      • (*CR) carriage return only
      • (*CRLF) carriage return followed by linefeed
      • (*ANY) any Unicode line feed sequence
      • (*NUL) the NUL character (binary zero)

While the first regular expression matches only the beginning of the character string, the second one also matches the beginning of new lines that are defined by the syntax \n for line feeds in a string template.
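Two expressions of this kind can be sketched as follows, with the options set directly in the pattern. The sketch assumes that \n in a string template is recognized as a line feed by the default newline mode; the text and variable names are illustrative.

```abap
DATA(text) = |Line1\nLine2\nLine3|.

" Without multi line mode, ^ matches only at the start of the string
DATA(start_only) = count( val = text pcre = `^Line` ).

" With (?m), ^ also matches directly after each line feed
DATA(every_line) = count( val = text pcre = `(?m)^Line` ).

ASSERT start_only = 1.
ASSERT every_line = 3.

" Without single line mode, . stops at the line feed; with (?s) it does not
ASSERT match( val = text pcre = `Line1.*` )     = `Line1`.
ASSERT match( val = text pcre = `(?s)Line1.*` ) = text.
```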

Named Capture Groups

PCRE supports the naming of capture groups, meaning you can assign a name to a capture group, e.g. using the (?<name>...) syntax. You can refer to a named capture group by its name, e.g. in a backreference using the \k<name> syntax.

The regular expression matches the character string. The capture group is used by its name to match further occurrences of the pattern defined for the group.
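A minimal sketch of a named group and a backreference to it, using the predicate function matches. The group name word and the sample strings are chosen only for illustration.

```abap
" The group named word captures a word; \k<word> must then repeat the same text
DATA(pattern) = `(?<word>\w+)\s\k<word>`.

ASSERT matches( val = `hello hello` pcre = pattern ).
ASSERT NOT matches( val = `hello world` pcre = pattern ).
```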

Subroutine Calls and Recursion

Apart from referring to the content of a group via backreferences, PCRE supports calling groups as subroutines using the (?n) syntax. It is also possible to call a named group as a subroutine, e.g. using the (?&name) syntax.

The example shows the calling of groups as subroutines in three blocks:

  • In the first block, the backreference \1 simply matches whatever the first capture group actually matched most recently, instead of matching all the things the capture group could match.
  • The second block shows how matching everything the capture group could match can be achieved by calling the group as a subroutine using the (?n) syntax.
  • The third block shows how recursing over the whole pattern using the (?R) syntax in one branch of the alternation ensures that there is a balanced but arbitrary number of opening (\() and closing (\)) parentheses to either side of the digits. Note that the pattern makes use of the possessive quantifier ++, which acts the same as + but prevents backtracking into what was matched by the quantifier.
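The three blocks can be sketched as follows. This is a reconstruction under assumptions, not the original example: the patterns follow the well-known "sense and sensibility" illustration from the PCRE documentation, and the variable names are made up.

```abap
" Block 1: the backreference \1 repeats the text group 1 actually matched
DATA(backref) = `(sens|respons)e\sand\s\1ibility`.
ASSERT matches( val = `sense and sensibility` pcre = backref ).
ASSERT NOT matches( val = `sense and responsibility` pcre = backref ).

" Block 2: the subroutine call (?1) applies the pattern of group 1 again
DATA(subroutine) = `(sens|respons)e\sand\s(?1)ibility`.
ASSERT matches( val = `sense and responsibility` pcre = subroutine ).

" Block 3: (?R) recurses over the whole pattern, balancing the parentheses
DATA(balanced) = `\((?R)\)|\d++`.
DATA(nested)   = match( val = `x((123))y` pcre = balanced ).
DATA(partial)  = match( val = `((123)` pcre = balanced ).

ASSERT nested  = `((123))`.
ASSERT partial = `(123)`.
```

In the last case the outermost opening parenthesis has no closing counterpart, so only the balanced inner part is matched.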

Callouts

Callouts are another powerful feature. A callout invokes ABAP code from within the pattern during the matching process and passes data from the pattern to the callout routine.

Callouts are specified with the (?C...) syntax. A callout routine can access not only the numeric data n provided by the callout (?Cn) or the string data str provided by the callout (?C"str"), but also many other properties and details of the current matcher state.

PCRE Regular Expression with Callouts

Making Use of New Features for Replacements

Conditional Substitution

PCRE's conditional substitution syntax allows you to check whether a certain capture group participated in the match and to specify different replacement strings for the two cases.

Conditional substitutions with {id...}.
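A minimal sketch with the built-in function replace. It assumes that the replacement string follows the extended substitution syntax of PCRE2, in which ${n:+set:unset} inserts the first string if group n participated in the match and the second string otherwise. The pattern and variable names are illustrative.

```abap
" Group 1 is optional and participates only if the prefix un is present
DATA(pattern) = `(un)?known`.

DATA(with_prefix)    = replace( val = `unknown` pcre = pattern with = `${1:+negated:plain}` ).
DATA(without_prefix) = replace( val = `known`   pcre = pattern with = `${1:+negated:plain}` ).

ASSERT with_prefix    = `negated`.
ASSERT without_prefix = `plain`.
```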

Case Conversion

Using the \u, \U, \l and \L modifiers in the PCRE replacement string, the case of the inserted text can be converted to uppercase or lowercase. While \u and \l only affect the first character following them, \U and \L affect all following characters, until a different case conversion modifier or the termination operator \E is reached.

The case conversion syntax can also be combined with conditional substitution.

Replacements with case conversions. The latter two use conditional substitutions.
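Three replacements of this kind can be sketched as follows. The latter two combine case conversion with the conditional syntax ${n:+set:unset}, which is assumed here to behave as in the extended substitution of PCRE2; they are adapted from an example in the PCRE2 documentation. Variable names are illustrative.

```abap
" \u uppercases the next character only, \U everything up to \E
DATA(mixed) = replace( val  = `hello world`
                       pcre = `(\w+)\s(\w+)`
                       with = `\u$1 \U$2\E!` ).

" ${1:+\U:\L} switches to uppercase if group 1 participated, else to lowercase
DATA(upper) = replace( val = `somebody` pcre = `(some)?(body)` with = `${1:+\U:\L}HeLLo` ).
DATA(lower) = replace( val = `body`     pcre = `(some)?(body)` with = `${1:+\U:\L}HeLLo` ).

ASSERT mixed = `Hello WORLD!`.
ASSERT upper = `HELLO`.
ASSERT lower = `hello`.
```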





