YuPcre2 v1.24.0 for Delphi 11-12 Athens
YuPcre2 is a library of Delphi components and procedures that implement regular expression pattern matching using the same syntax and semantics as Perl, with just a few differences. There are two matching algorithms, the standard Perl algorithm and an alternative DFA algorithm:
The Perl algorithm is what you are used to from Perl and JavaScript. It is fast and supports the complete pattern syntax. You will likely be using it most of the time.
DFA is a special-purpose algorithm. It finds all possible matches and, in particular, it finds the longest. It never backtracks and has better support for partial matching, in particular multi-segment matching of very long subject strings.
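The difference between the two algorithms can be illustrated with a small sketch. It uses Python's standard `re` module, which is a backtracking engine, and is not the YuPcre2 API. The function names `perl_style_match` and `dfa_style_matches` are hypothetical, and the second function only emulates the result a DFA-style matcher reports (every match at one starting point, longest first) by brute force; it does not implement a DFA.

```python
import re
from typing import List, Optional


def perl_style_match(pattern: str, subject: str) -> Optional[str]:
    """Backtracking match at the start of subject.

    The first alternative that leads to a successful match wins,
    even if a later alternative would match more text.
    """
    m = re.match(pattern, subject)
    return m.group(0) if m else None


def dfa_style_matches(pattern: str, subject: str) -> List[str]:
    """Emulation only: every prefix of subject that the pattern matches.

    Each prefix length is tested separately with fullmatch(), then the
    results are sorted so that the longest match comes first.
    """
    compiled = re.compile(pattern)
    found = [
        subject[:end]
        for end in range(len(subject) + 1)
        if compiled.fullmatch(subject, 0, end)
    ]
    return sorted(found, key=len, reverse=True)


if __name__ == "__main__":
    print(perl_style_match("a|ab|abc", "abcd"))
    print(dfa_style_matches("a|ab|abc", "abcd"))
```

For the pattern `a|ab|abc` and the subject `abcd`, the backtracking match stops at `a`, while the emulated DFA-style result lists `abc`, `ab`, and `a`.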
YuPcre2 has native interfaces for 8-bit, 16-bit, and 32-bit strings. Component wrappers are available for UnicodeString / WideString and AnsiString / Utf8String / RawByteString.
The YuPcre2 RegEx2 classes descend from common ancestors which implement the core functionalities:
Match strings and extract full or substring matches.
Search for regular expressions within streams and memory buffers. TDIRegExSearchStream descendants employ a buffered search within streams and files (of virtually unlimited size) and use little memory.
Replace full matches or partial substrings.
List full matches or partial substrings.
Format full matches or partial substrings by adding static or dynamic text.
Users familiar with DIRegEx might be interested in the differences between YuPcre2 and DIRegEx.
Pattern Syntax
The YuPcre2 regular expression pattern syntax is mostly compatible with Perl. It includes the following:
Quoting
Escaped Characters
Character Types
General Category Properties for \p and \P
PCRE2 Special Category Properties for \p and \P
Script Names for \p and \P
Character Classes
Quantifiers
Anchors and Simple Assertions
Match Point Reset
Alternation
Capturing
Atomic Groups
Comment
Option Setting
Newline Convention
What \R Matches
Lookahead and Lookbehind Assertions
Backreferences
Subroutine References (possibly recursive)
Conditional Patterns
Backtracking Control
Callouts
YuPcre2 RegEx2 String Processing
YuPcre2 can Replace, List, or Format regular expression matches or any of their substrings, which is useful for text editors and word processors. Variable portions of the match can be included in the result text. The full match can be referenced by number, substrings also by name. The character that introduces these references is freely configurable. FormatOptions allow turning features on or off as required.
Replace returns the original subject string with matches replaced, similar to but more flexible than Delphi's StringReplace() function.
List collects all string matches into a single string. It can extract multiple phone numbers, e-mail addresses, or URLs in a single call.
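The Replace and List operations described above can be sketched as follows. This is an illustration with Python's standard `re` module, not the YuPcre2 classes; the pattern, the group names `prefix` and `line`, and the function names are hypothetical. The replacement text references one substring by number and one by name, in Python's own reference syntax.

```python
import re

# Hypothetical pattern: a 3-digit prefix and a 4-digit line number, both named.
PHONE = re.compile(r"(?P<prefix>\d{3})-(?P<line>\d{4})")


def replace_matches(subject: str) -> str:
    """Return subject with every match rewritten.

    \\1 references the first substring by number,
    \\g<line> references the second substring by name.
    """
    return PHONE.sub(r"(\1) \g<line>", subject)


def list_matches(subject: str, separator: str = ", ") -> str:
    """Collect all full matches into a single string."""
    return separator.join(m.group(0) for m in PHONE.finditer(subject))


if __name__ == "__main__":
    text = "Call 555-1234 or 555-9876."
    print(replace_matches(text))
    print(list_matches(text))
```

With the sample text, the replace step yields `Call (555) 1234 or (555) 9876.` and the list step yields `555-1234, 555-9876`.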
YuPcre2 RegEx2 MaskControls
YuPcre2 includes two regular expression mask edits: TDIRegEx2MaskEdit and TDIRegEx2ComboBox. Both controls validate keyboard input against a regular expression. They work similarly to Delphi's TMaskEdit, but are more flexible and powerful.
The regular expression mask edits can:
accept / reject specific characters at determined positions;
allow / reject particular characters if they follow defined character(s);
restrict input text to begin / end with exact character(s);
flag incomplete text to show that more input is needed.
Examples: Numbers, number ranges, dates, phone numbers, e-mail addresses, URLs, currency, and more.
Workbench Application
The YuPcre2 RegEx2 Workbench helps to design and test regular expressions. It lets you set options, measure execution times, and save and load settings for later use.
The YuPcre2 RegEx2 Workbench is available as
Design-Time Component Editor and
Standalone Application.
Add the YuRegularExpressions unit with a new implementation of the TRegEx record wrapper for all Delphi versions with advanced record helpers. It is mostly interface-compatible with the Delphi implementation, updates Delphi's outdated PCRE engine to the latest PCRE2, includes multiple function improvements, and fixes many bugs which are present in all Delphi versions, up to the most recent.
Update to PCRE2 v10.45. This is a comparatively large release, incorporating new features, some bugfixes, and a few changes with minor backward-compatibility implications for patterns:
Add a new feature called scan substring. This is a new type of assertion which matches the content of a capturing block to a sub-pattern.
Add support for Perl-style extended character classes, using the syntax (?[…]). This also allows expressing subtractions and intersections of character classes, but using a different syntax to UTS#18.
Add support for UTS#18 compatible character classes, using the new option PCRE2_ALT_EXTENDED_CLASS. This adds '[' as a metacharacter within character classes and the operators '&&', '--' and '~~', allowing subtractions and intersections of character classes to be easily expressed.
Significant improvements to the character class match engine. Compiled character classes are now more compact, and have faster matching for large or complex character sets, using binary search through the set.
New options:
PCRE2_EXTRA_NO_BS0: Disallow \0 as an escape for the #0 character.
PCRE2_EXTRA_PYTHON_OCTAL: Use Python disambiguation rules for deciding whether \12 is a backreference or an octal escape.
PCRE2_EXTRA_NEVER_CALLOUT: Disable callout syntax entirely.
PCRE2_EXTRA_TURKISH_CASING: Use Turkish rules for case-insensitive matching.
Add new API function pcre2_set_optimize for controlling which optimizations are enabled.
A variety of extensions have been made to pcre2_substitute and its syntax for replacement strings. These now support:
\123 octal escapes.
titlecasing \u\L.
\1 backreferences.
\g<1> and $<NAME> backreferences.
$&, $`, $', and $_.
New function pcre2_set_substitute_case_callout to allow locale-aware case transformation.
Update Unicode support to UCD 16.
Case-insensitive matching of Unicode properties Ll, Lt, and Lu has been changed to match Perl. Previously, \p{Ll} would match only lower-case characters (even if case-insensitive matching was specified). This also affects case-insensitive matching of POSIX classes such as [:lower:].
Case-insensitive matching of backreferences now respects the PCRE2_EXTRA_CASELESS_RESTRICT option.
Parsing of the \x escape is stricter: \x is no longer treated as an escape for the #0 character if it is not followed by '{' or a hexadecimal digit. Use \x00 instead.
JIT compilation now fails with the new error code PCRE2_ERROR_JIT_UNSUPPORTED for patterns which use features not supported by the JIT compiler.