ABENREGEX_PCRE_SYNTAX_SPECIALS - REGEX PCRE SYNTAX SPECIALS

This documentation is copyright by SAP AG.

Special Characters

The following tables summarize the special characters in PCRE regular expressions.

Note

See also the pcre2syntax man page in the PCRE2 documentation.

Pattern Syntax

Quoting

Syntax	Description
\x	handle x as a literal if x has no special meaning
\Q...\E	handle enclosed characters as literals

Escaped Characters

Syntax	Description
\a	alarm (BEL character, 0x07)
\cx	control-x, where x is any ASCII printing character
\e	escape (0x1B)
\f	form feed (0x0C)
\n	line feed (by default 0x0A; depends on the active line-feed-mode)
\r	carriage return (0x0D)
\t	tab (0x09)
\0dd	character with octal code 0dd
\ddd	character with octal code ddd, or backreference
\o{ddd..}	character with octal code ddd..
\N{U+hh..}	character with Unicode code point hh..
\xhh	character with hex code hh
\x{hh..}	character with hex code hh..

Character Types

Syntax	Description
.	any character except line feed (unless in dotall mode, then any character)
\C	one code unit; only allowed in regular expressions created with the class CL_ABAP_REGEX with UNICODE_HANDLING set to RELAXED, as it could partially match UTF-16 characters otherwise
\d	a digit (respecting Unicode character properties)
\D	a character that is not a digit
\h	a horizontal white space character (respecting Unicode character properties)
\H	a character that is not a horizontal white space character
\N	a character that is not a line feed (depends on the active line-feed-mode)
\p{xx}	a character with the xx Unicode character property (see below)
\P{xx}	a character without the xx Unicode character property (see below)
\R	a line feed sequence; by default matches any Unicode line feed sequence
\s	a white space character (respecting Unicode character properties)
\S	a character that is not a white space character
\v	a vertical white space character (respecting Unicode character properties)
\V	a character that is not a vertical white space character
\w	a "word" character (respecting Unicode character properties)
\W	a "non-word" character
\X	a Unicode extended grapheme cluster

General Categories for Properties \p and \P

Based on the general categories as defined by the Unicode standard.

Category Identifier	Description
C	Other
Cc	Control
Cf	Format
Cn	Unassigned
Co	Private use
Cs	Surrogate
L	Letter
Ll	Lower case letter
Lm	Modifier letter
Lo	Other letter
Lt	Title case letter
Lu	Upper case letter
L&	Ll, Lu, or Lt
M	Mark
Mc	Spacing mark
Me	Enclosing mark
Mn	Non-spacing mark
N	Number
Nd	Decimal number
Nl	Letter number
No	Other number
P	Punctuation
Pc	Connector punctuation
Pd	Dash punctuation
Pe	Close punctuation
Pf	Final punctuation
Pi	Initial punctuation
Po	Other punctuation
Ps	Open punctuation
S	Symbol
Sc	Currency symbol
Sk	Modifier symbol
Sm	Mathematical symbol
So	Other symbol
Z	Separator
Zl	Line separator
Zp	Paragraph separator
Zs	Space separator

Script Names for \p and \P

Adlam

Ahom

Anatolian_Hieroglyphs

Arabic

Armenian

Avestan

Balinese

Bamum

Bassa_Vah

Batak

Bengali

Bhaiksuki

Bopomofo

Brahmi

Braille

Buginese

Buhid

Canadian_Aboriginal

Carian

Caucasian_Albanian

Chakma

Cham

Cherokee

Chorasmian

Common

Coptic

Cuneiform

Cypriot

Cyrillic

Deseret

Devanagari

Dives_Akuru

Dogra

Duployan

Egyptian_Hieroglyphs

Elbasan

Elymaic

Ethiopic

Georgian

Glagolitic

Gothic

Grantha

Greek

Gujarati

Gunjala_Gondi

Gurmukhi

Hangul

Hanifi_Rohingya

Hanunoo

Hatran

Hebrew

Hiragana

Imperial_Aramaic

Inherited

Inscriptional_Pahlavi

Inscriptional_Parthian

Javanese

Kaithi

Kannada

Katakana

Kayah_Li

Kharoshthi

Khitan_Small_Script

Khmer

Khojki

Khudawadi

Latin

Lepcha

Limbu

Linear_A

Linear_B

Lisu

Lycian

Lydian

Mahajani

Makasar

Malayalam

Mandaic

Manichaean

Marchen

Masaram_Gondi

Medefaidrin

Meetei_Mayek

Mende_Kikakui

Meroitic_Cursive

Meroitic_Hieroglyphs

Miao

Modi

Mongolian

Multani

Myanmar

Nabataean

Nandinagari

New_Tai_Lue

Newa

Nushu

Nyakeng_Puachue_Hmong

Ogham

Ol_Chiki

Old_Hungarian

Old_Italic

Old_North_Arabian

Old_Permic

Old_Persian

Old_Sogdian

Old_South_Arabian

Old_Turkic

Oriya

Osage

Osmanya

Pahawh_Hmong

Palmyrene

Pau_Cin_Hau

Phags_Pa

Phoenician

Psalter_Pahlavi

Rejang

Runic

Samaritan

Saurashtra

Sharada

Shavian

Siddham

SignWriting

Sinhala

Sogdian

Sora_Sompeng

Soyombo

Sundanese

Syloti_Nagri

Syriac

Tagalog

Tagbanwa

Tai_Le

Tai_Tham

Tai_Viet

Takri

Tamil

Tangut

Telugu

Thaana

Thai

Tibetan

Tifinagh

Tirhuta

Ugaritic

Wancho

Warang_Citi

Yezidi

Zanabazar_Square

Character Classes

Syntax	Description
[...]	positive character class
[^...]	negative character class
[x-y]	range
[[:xxx:]]	positive POSIX named set (see below)
[[:^xxx:]]	negative POSIX named set (see below)

Names for POSIX Named Sets

Syntax	Description
alnum	alphanumeric
alpha	alphabetic
ascii	0-127
blank	space or tab
cntrl	control character
digit	decimal digit
graph	printing, excluding space
lower	lower case letter
print	printing, including space
punct	printing, excluding alphanumeric
space	white space
upper	upper case letter
word	same as \w
xdigit	hexadecimal digit

POSIX named sets also make use of Unicode character properties if applicable.

Quantifiers

Syntax	Description
?	0 or 1, greedy
?+	0 or 1, possessive
??	0 or 1, lazy
*	0 or more, greedy
*+	0 or more, possessive
*?	0 or more, lazy
+	1 or more, greedy
++	1 or more, possessive
+?	1 or more, lazy
{n}	exactly n
{n,m}	at least n, no more than m, greedy
{n,m}+	at least n, no more than m, possessive
{n,m}?	at least n, no more than m, lazy
{n,}	n or more, greedy
{n,}+	n or more, possessive
{n,}?	n or more, lazy
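
The difference between greedy and lazy quantifiers can be sketched as follows. This is a minimal illustration that uses Python's re module as a stand-in (an assumption; ABAP itself uses PCRE2), restricted to constructs that behave the same way in both. The sample strings are hypothetical.

```python
import re

# Assumption: Python's re module stands in for PCRE here; only the greedy
# and lazy forms are shown because they behave identically in both engines.
text = "<a><b>"

# Greedy: .+ takes as much as possible, then backtracks to the last ">".
greedy = re.search(r"<.+>", text).group()

# Lazy: .+? takes as little as possible and stops at the first ">".
lazy = re.search(r"<.+?>", text).group()

# Bounded repetition follows the same rule.
bounded_greedy = re.search(r"\d{2,3}", "12345").group()
bounded_lazy = re.search(r"\d{2,3}?", "12345").group()

print(greedy, lazy, bounded_greedy, bounded_lazy)
```

The greedy forms return the longest candidate (here the whole string and three digits), the lazy forms the shortest (the first tag and two digits).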

Anchors and Basic Assertions

Syntax	Description
\b	word boundary
\B	not a word boundary
^	start of subject (also after an internal line feed, that is a line feed that does not occur at the end of the subject, in multiline mode)
\A	start of subject (if matching on a subject is done with a starting offset, \A can never match)
$	end of subject and before a line feed at the end of the subject (also before line feed in multiline mode)
\Z	end of subject and before a line feed at the end of the subject
\z	end of subject
\G	first matching position in subject (true if the current matching position is at the start point of the matching process, which may differ from the start of the subject e.g. if a starting offset is specified)
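
A short sketch of how ^, $ and \b behave with and without multiline mode. Python's re module is used as a stand-in for PCRE (an assumption); \A, \Z, \z and \G are left out on purpose because Python either lacks them or defines them differently. The subject strings are hypothetical.

```python
import re

# Assumption: Python's re module stands in for PCRE; ^, $ and \b behave
# the same way in both engines for the cases shown here.
subject = "one\ntwo\n"

# Without multiline mode, ^ matches only at the start of the subject.
first_only = re.findall(r"^\w+", subject)

# With (?m), ^ also matches after an internal line feed.
each_line = re.findall(r"(?m)^\w+", subject)

# $ matches before a line feed at the very end of the subject.
before_final_lf = re.search(r"two$", subject) is not None

# \b matches only at word boundaries, so "concat" is not a hit.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

print(first_only, each_line, before_final_lf, whole_words)
```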

Reported Match Point Setting

Syntax	Description
\K	set reported start of match; e.g. the regex foo\Kbar matches foobar but reports that it has matched only bar

\K is respected in positive assertions, but ignored in negative ones.

Alternation

Syntax	Description
|	start of alternative branch

Grouping and Capturing

Syntax	Description
(...)	capture group
(?<name>...)	named capture group (Perl style)
(?'name'...)	named capture group (Perl style)
(?P<name>...)	named capture group (Python style)
(?:...)	non-capture group
(?|...)	non-capture group; reset group numbers for capture groups in each alternative
(?>...)	atomic non-capture group
(*atomic:...)	atomic non-capture group

A name must not start with a digit. Unicode names are allowed.
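
The following sketch shows a Python-style named capture group next to a non-capture group. It uses Python's re module as a stand-in for PCRE (an assumption); the Perl-style spellings are not shown because Python does not accept them. The group names year and month and the sample strings are hypothetical.

```python
import re

# Assumption: Python's re module stands in for PCRE; only the Python-style
# named group syntax is used because it is valid in both engines.
m = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "Date: 2024-04-25")
year = m.group("year")
month = m.group("month")

# (?:...) groups without capturing, so only (c) is numbered.
n = re.search(r"(?:ab)+(c)", "ababc")
captured = n.groups()

print(year, month, captured)
```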

Comments

Syntax	Description
(?#...)	comment (cannot be nested)
#...	extended mode: comment

In extended mode, an unescaped # introduces a comment that continues to immediately after the next line feed character or character sequence in the pattern. This must be a literal line feed character or character sequence; escape sequences that merely represent a line feed, such as \n, do not count.
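
A minimal sketch of both comment forms, using Python's re module as a stand-in for PCRE (an assumption). The pattern name PHONE and the sample strings are hypothetical. The line feeds that end each # comment are literal line feeds inside the pattern string.

```python
import re

# Assumption: Python's (?x) flag stands in for PCRE extended mode.
PHONE = r"""(?x)     # extended mode: white space and comments are ignored
    (\d{3})          # group 1: three-digit prefix
    [-\s]?           # optional separator; a class is not stripped by (?x)
    (\d{4})          # group 2: four-digit line number
"""

parts = re.search(PHONE, "call 555-1234 now").groups()

# (?#...) works without extended mode and is simply skipped.
inline = re.search(r"a(?#this is ignored)b", "ab").group()

print(parts, inline)
```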

Option Setting

Syntax	Description
(?i)	caseless / case-insensitive search
(?J)	allow duplicate named groups
(?m)	multiline mode
(?n)	no auto capture
(?s)	single line mode (dotall)
(?U)	default ungreedy quantifiers (lazy)
(?x)	extended mode: ignore white space except in classes
(?xx)	same as (?x) but also ignore space and tab in classes
(?-...)	unset option(s)
(?^)	unset i, m, n, s and x options

Changes of these options within a group are automatically cancelled at the end of the group.

Several options may be set at once, and a mixture of setting and unsetting such as (?i-x) is allowed, but there may be only one hyphen. Setting (but not unsetting) is allowed after (?^, for example (?^in). An option setting may appear at the start of a non-capture group, for example (?i:...).
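
How an option confined to a non-capture group differs from one set for the whole pattern can be sketched like this. Python's re module is used as a stand-in for PCRE (an assumption); it supports (?i:...) and (?s) but not (?^), so only the former are shown. The sample strings are hypothetical.

```python
import re

# Assumption: Python's re module stands in for PCRE for (?i:...) and (?s).

# Case-insensitivity applies only inside the group, so "def" stays strict.
scoped_hit = re.fullmatch(r"(?i:abc)def", "ABCdef") is not None
scoped_miss = re.fullmatch(r"(?i:abc)def", "ABCDEF") is not None

# By default the dot does not match a line feed; (?s) turns on dotall mode.
dot_default = re.search(r"a.b", "a\nb") is not None
dot_all = re.search(r"(?s)a.b", "a\nb") is not None

print(scoped_hit, scoped_miss, dot_default, dot_all)
```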

Special Control Verbs

Special control verbs are only recognized at the very start of a pattern.

Syntax	Description
(*LIMIT_DEPTH=d)	set the backtracking limit to d
(*LIMIT_HEAP=d)	set the heap size limit to d * 1024 bytes
(*LIMIT_MATCH=d)	set the match limit to d
(*NOTEMPTY)	lock out matching of empty strings entirely
(*NOTEMPTY_ATSTART)	lock out matching of empty strings at the start of the subject
(*NO_AUTO_POSSESS)	prevents quantifiers from automatically being made possessive when what follows cannot match the repeated item (e.g. by default, a+b is handled as a++b as an optimization)
(*NO_DOTSTAR_ANCHOR)	disable optimizations that apply to patterns whose top-level branches start with .*
(*NO_JIT)	do not JIT-compile this pattern
(*NO_START_OPT)	disable several optimizations for quickly reaching a "no match" result; this can be useful if you want callouts or backtracking control verbs to be executed in any case
(*UTF)	enable UTF-mode; this verb cannot be used in regular expressions created with the class CL_ABAP_REGEX with UNICODE_HANDLING set to RELAXED, as it would clash with usages of \C
(*UCP)	enable usage of Unicode character properties; for ABAP regular expressions this option is already enabled by default

The following special control verbs control the line break convention, i.e. what gets recognized as a line feed character. They do not affect \R:

Syntax	Description
(*CR)	carriage return only
(*LF)	line feed only
(*CRLF)	carriage return followed by line feed
(*ANYCRLF)	all three of the above
(*ANY)	any Unicode line feed sequence
(*NUL)	the NUL character (binary zero)

The following special control verbs control what \R matches:

Syntax	Description
(*BSR_ANYCRLF)	CR, LF and CRLF
(*BSR_UNICODE)	any Unicode line feed sequence

Look-ahead and Look-behind Assertions

Syntax	Description
(?=...)	positive look-ahead
(*pla:...)	positive look-ahead
(*positive_lookahead:...)	positive look-ahead
(?!...)	negative look-ahead
(*nla:...)	negative look-ahead
(*negative_lookahead:...)	negative look-ahead
(?<=...)	positive look-behind
(*plb:...)	positive look-behind
(*positive_lookbehind:...)	positive look-behind
(?<!...)	negative look-behind
(*nlb:...)	negative look-behind
(*negative_lookbehind:...)	negative look-behind

Each top-level branch of a look-behind must be of fixed length.
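
A brief sketch of the symbolic look-around forms, using Python's re module as a stand-in for PCRE (an assumption); the alphabetic verb forms such as (*pla:...) are PCRE2-specific and are not shown. The price strings are hypothetical.

```python
import re

# Assumption: Python's re module stands in for PCRE for the symbolic
# look-around forms, which behave the same in both engines here.
prices = "10 EUR, 20 USD, 30 EUR"

# Positive look-ahead: digits that are followed by " EUR".
eur_amounts = re.findall(r"\d+(?= EUR)", prices)

# Negative look-ahead: whole numbers that are not followed by " EUR".
other_amounts = re.findall(r"\b\d+\b(?! EUR)", prices)

# Positive look-behind (fixed length): digits preceded by a dollar sign.
dollar_amounts = re.findall(r"(?<=\$)\d+", "$5 and 7 and $12")

print(eur_amounts, other_amounts, dollar_amounts)
```

The look-around itself consumes nothing, so the currency marker never appears in the reported match.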

Non-Atomic Look-around Assertions

Syntax	Description
(?*...)	non-atomic positive look-ahead
(*napla:...)	non-atomic positive look-ahead
(*non_atomic_positive_lookahead:...)	non-atomic positive look-ahead
(?<*...)	non-atomic positive look-behind
(*naplb:...)	non-atomic positive look-behind
(*non_atomic_positive_lookbehind:...)	non-atomic positive look-behind

Backreferences

Syntax	Description
\n	reference by number n (can be ambiguous, see octal escapes)
\gn	reference by number n
\g{n}	reference by number n
\g+n	relative reference by number n
\g-n	relative reference by number n
\g{+n}	relative reference by number n
\g{-n}	relative reference by number n
\k<name>	reference by name (Perl style)
\k'name'	reference by name (Perl style)
\g{name}	reference by name (Perl style)
\k{name}	reference by name (.NET style)
(?P=name)	reference by name (Python style)
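
A minimal sketch of a numbered and a named backreference, using Python's re module as a stand-in for PCRE (an assumption); of the forms listed above, only \n and (?P=name) exist in Python. The group name quote and the sample strings are hypothetical.

```python
import re

# Assumption: Python's re module stands in for PCRE; \1 and (?P=name)
# are the two backreference forms that both engines share.

# \1 must repeat exactly what group 1 captured: finds doubled words.
doubled = re.findall(r"\b(\w+) \1\b", "the the quick brown fox fox")

# (?P=quote) requires the closing quote to equal the opening one.
quoted = re.search(r'(?P<quote>["\']).*?(?P=quote)', "say \"hi\" or 'yo'").group()

print(doubled, quoted)
```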

Subroutine References

Syntax	Description
(?R)	recurse whole pattern
(?n)	call subroutine by absolute number n
\g<n>	call subroutine by absolute number n (Oniguruma style)
\g'n'	call subroutine by absolute number n (Oniguruma style)
(?+n)	call subroutine by relative number n
(?-n)	call subroutine by relative number n
\g<+n>	call subroutine by relative number n
\g'+n'	call subroutine by relative number n
\g<-n>	call subroutine by relative number n
\g'-n'	call subroutine by relative number n
(?&name)	call subroutine by name (Perl style)
(?P>name)	call subroutine by name (Python style)
\g<name>	call subroutine by name (Oniguruma style)
\g'name'	call subroutine by name (Oniguruma style)

Subroutine calls can be recursive. However, left recursion is not possible.

Conditional Patterns

Syntax	Description
(?(condition)yes-pattern)	match the yes-pattern if the condition holds
(?(condition)yes-pattern|no-pattern)	match the yes-pattern if the condition holds, otherwise match the no-pattern

The condition can be one of the following, or any assertion such as a look-ahead or look-behind assertion:

Syntax	Description
n	absolute number n reference condition
+n	relative number n reference condition
-n	relative number n reference condition
<name>	named reference condition (Perl style)
'name'	named reference condition (Perl style)
R	overall recursion condition
Rn	specific number n group recursion condition
R&name	specific named group recursion condition
DEFINE	define groups for reference (always yields false)
VERSION[>]=n.m	test PCRE2 version
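
An absolute number reference condition can be sketched as follows: a closing parenthesis is required only if an opening one was captured. Python's re module is used as a stand-in for PCRE (an assumption); it supports numbered and named reference conditions but not R, DEFINE, or VERSION. The name BALANCED and the sample strings are hypothetical.

```python
import re

# Assumption: Python's re module stands in for PCRE for the (?(1)...) form.
# Group 1 optionally captures "("; the condition (?(1)\)) then demands a
# matching ")" only when group 1 took part in the match.
BALANCED = re.compile(r"^(\()?\d+(?(1)\))$")

samples = ["123", "(123)", "(123", "123)"]
results = {s: BALANCED.search(s) is not None for s in samples}

print(results)
```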

Backtracking Control

The following backtracking control verbs are triggered immediately when they are reached:

Syntax	Description
(*ACCEPT)	force successful match; if triggered inside a group that is called as a subroutine, only the group is ended successfully; if triggered inside a positive assertion, the assertion succeeds; if triggered inside a negative assertion, the assertion fails
(*FAIL)	force backtrack
(*F)	force backtrack
(?!)	force backtrack; actually a negative look-ahead for the empty string, which always matches, so the look-ahead always fails; (*FAIL) and (*F) are synonyms for (?!)
(*MARK:NAME)	mark a position with name NAME, see (*SKIP:NAME) below; synonym: (*:NAME); NAME can contain any sequence of characters that does not include the closing parenthesis; an empty name will cause the mark to have no effect

The following backtracking control verbs are only triggered when a subsequent match failure causes a backtrack to reach them. All of them force a match failure, but differ in what happens afterwards:

Syntax	Description
(*COMMIT)	overall failure, no advance of starting point
(*PRUNE)	advance to next starting character; this only applies if the pattern is not anchored, in an anchored pattern it behaves the same as (*COMMIT)
(*SKIP)	advance to current matching position
(*SKIP:NAME)	advance to position corresponding to an earlier (*MARK:NAME); if not found, the (*SKIP) is ignored
(*THEN)	local failure, backtrack to next alternation

If one of these verbs is located in a group called as a subroutine, its effects are confined to the subroutine call.

Callouts

Syntax	Description
(?C)	callout (assumed number 0)
(?Cn)	callout with numerical data n
(?C"text")	callout with string data text

Replacement Syntax

Capture Group Substitution

Syntax	Description
$id	substitute for the content of the capture group identified by id, $0 being the content of the whole match; id can either be a number referring to a capture group or the name of a named capture group
${id}	substitute for the content of the capture group identified by id, ${0} being the content of the whole match; id can either be a number referring to a capture group or the name of a named capture group

When referring to a capture group that is not set (i.e. was not participating) in the match, the empty string will be substituted.

When referring to a capture group that does not exist, it is assumed unset and thus the empty string is substituted.

Conditional Substitution

Syntax	Description
${id:+matched:unmatched}	substitute for matched if the capture group identified by id was set in the match, otherwise for unmatched; id can either be a number referring to a capture group or the name of a named capture group
${id:-default}	substitute for the content of the capture group identified by id if said capture group was set in the match, otherwise for default; shorthand for ${id:+${id}:default}; id can either be a number referring to a capture group or the name of a named capture group

When referring to a capture group that is not set (i.e. was not participating) in the match, the empty string will be substituted.

When referring to a capture group that does not exist, it is assumed unset and thus the empty string is substituted.

Case Conversion

Syntax	Description
\u	the first character after \u that is inserted into the replacement text is converted to uppercase
\U	all characters after \U up to the next \L or \E that are inserted into the replacement text are converted to uppercase
\l	the first character after \l that is inserted into the replacement text is converted to lowercase
\L	all characters after \L up to the next \U or \E that are inserted into the replacement text are converted to lowercase
\E	terminates the current upper- or lowercase transformation

Substituting Special Characters

Syntax	Description
\t	insert a tab (0x09)
\r	insert a carriage return (0x0D)
\n	insert a line feed
\f	insert a form feed (0x0C)
\xhh	insert the character with hex code hh
\x{hh..}	insert the character with hex code hh..

Quoting

Syntax	Description
\\	insert a literal \
\$	insert a literal $
\x	if \x has no special meaning, insert a literal x
