YuPcre2 1.8.0 D7-XE10.2

YuPcre2 is a library of Delphi components and procedures that implement regular expression pattern matching using the same syntax and semantics as Perl, with just a few differences. It offers two matching algorithms: the standard Perl algorithm and an alternative DFA algorithm.

The Perl algorithm is what you are used to from Perl and JavaScript. It is fast and supports the complete pattern syntax. You will likely use it most of the time.
DFA is a special-purpose algorithm. It finds all possible matches at a given starting point and, in particular, it finds the longest one. It never backtracks and has better support for partial matching, especially multi-segment matching of very long subject strings.
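The difference between the two result sets can be sketched in Python, whose standard `re` module is a backtracking matcher like the Perl algorithm. The helper `all_matches_at` is hypothetical and does not implement a DFA, because a real DFA matcher makes a single forward pass without backtracking. It only reproduces, by brute force over every possible end position, the set of matches the DFA algorithm would report from one starting point, longest first. None of this is the YuPcre2 API.

```python
import re

def standard_match(pattern, subject, start=0):
    """Backtracking match, as the Perl algorithm would report it."""
    m = re.compile(pattern).match(subject, start)
    return m.group(0) if m else None

def all_matches_at(pattern, subject, start=0):
    """Every string the pattern can match beginning at `start`, longest first.

    Hypothetical helper: it emulates the DFA algorithm's *result set* by trying
    each end position with fullmatch; it is not how a DFA matcher works.
    """
    rx = re.compile(pattern)
    found = []
    for end in range(len(subject), start - 1, -1):
        if rx.fullmatch(subject, start, end):
            found.append(subject[start:end])
    return found

subject = "<something> <something else> <something further>"
print(standard_match(r"<.*?>", subject))  # -> <something>
print(all_matches_at(r"<.*?>", subject))
# -> ['<something> <something else> <something further>',
#     '<something> <something else>', '<something>']
```

With the lazy `.*?`, the backtracking matcher stops at the first `>`. The DFA-style result set lists all three strings that begin at the first `<` and end at some `>`.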

YuPcre2 has native interfaces for 8-bit, 16-bit, and 32-bit strings. Component wrappers are available for UnicodeString / WideString and AnsiString / Utf8String / RawByteString.

The YuPcre2 RegEx2 classes descend from common ancestors that implement the core functionality:

Match strings and extract full or substring matches.
Search for regular expressions within streams and memory buffers. TDIRegExSearchStream descendants employ a buffered search within streams and files (of virtually unlimited size) and use little memory.
Replace full matches or partial substrings.
List full matches or partial substrings.
Format full matches or partial substrings by adding static or dynamic text.
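One common way to search a stream of unlimited size in bounded memory is to read fixed-size chunks and carry over only the unfinished tail of the buffer. The Python sketch below is not how TDIRegExSearchStream is implemented, since the source does not say how that works. It assumes that no match is longer than `max_len` bytes and that the pattern uses no lookaround, `\b`, or `$`. Those assumptions are what make it safe to keep at most `max_len - 1` bytes between reads. `search_stream` is a hypothetical name.

```python
import io
import re

def search_stream(stream, pattern, max_len, chunk_size=4096):
    """Yield (offset, match) for non-overlapping matches in a binary stream.

    Assumes every match is at most max_len bytes long and the pattern uses no
    lookaround or end-sensitive assertions, so at most max_len - 1 bytes need
    to be carried over between reads.
    """
    rx = re.compile(pattern)
    buf = b""
    base = 0  # stream offset of buf[0]
    while True:
        chunk = stream.read(chunk_size)
        eof = not chunk
        buf += chunk
        # A start at or below `limit` has its whole possible match inside buf,
        # so its result cannot change when more data arrives.
        limit = len(buf) if eof else len(buf) - max_len
        pos = 0
        while pos <= len(buf):
            m = rx.search(buf, pos)
            if m is None or m.start() > limit:
                break
            yield base + m.start(), m.group(0)
            # Step past the match; never retry the same empty match.
            pos = m.end() if m.end() > m.start() else m.end() + 1
        if eof:
            return
        keep = max(pos, limit + 1)  # first start whose result is not final yet
        base += keep
        buf = buf[keep:]

data = b"call 555-0101 or 555-0199; fax 555-0123"
hits = list(search_stream(io.BytesIO(data), rb"\d{3}-\d{4}", max_len=8, chunk_size=5))
print(hits)
```

The result is the same for any chunk size, which is the property a buffered search has to preserve. Memory use stays below `chunk_size + max_len` bytes regardless of stream length.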

Users familiar with DIRegEx might be interested in the differences between YuPcre2 and DIRegEx.

Pattern Syntax

The YuPcre2 regular expression pattern syntax is mostly compatible with Perl. It includes the following:

Quoting
Escaped Characters
Character Types
General Category Properties for \p and \P
PCRE2 Special Category Properties for \p and \P
Script Names for \p and \P
Character Classes
Quantifiers
Anchors and Simple Assertions
Match Point Reset
Alternation
Capturing
Atomic Groups
Comment
Option Setting
Newline Convention
What \R Matches
Lookahead and Lookbehind Assertions
Backreferences
Subroutine References (possibly recursive)
Conditional Patterns
Backtracking Control
Callouts

YuPcre2 RegEx2 String Processing

YuPcre2 can Replace, List, or Format regular expression matches or any of their substrings, which is useful for text editors and word processors. Variable portions of the match can be included in the result text. The full match and its substrings can be referenced by number, and substrings also by name. The character that introduces these references is freely configurable. FormatOptions turn individual features on or off as required.

Replace returns the original subject string with matches replaced, similar to but more flexible than Delphi's StringReplace() function.
List collects all string matches into a single string. It can extract multiple phone numbers, e-mail addresses, or URLs with a single call.
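Replace and List can be illustrated with Python's `re` module as a stand-in. In the replacement template, `\g<area>` refers to a substring by name and `\g<2>` refers to one by number. `\g<0>` would be the full match. YuPcre2 uses its own configurable introducer character, so Python's fixed `\g<...>` template syntax is not YuPcre2's. The phone-number pattern and sample text are made up for illustration.

```python
import re

# Hypothetical pattern: a named group "area" plus an unnamed group for the local number.
PHONE = re.compile(r"\((?P<area>\d{3})\) (\d{3}-\d{4})")

text = "Call (555) 010-0199 or (555) 010-0142 today."

# Replace: the template mixes static text with a named and a numbered reference.
replaced = PHONE.sub(r"\g<area>-\g<2>", text)

# List: collect every full match (group 0) into a single string, one per line.
listed = "\n".join(m.group(0) for m in PHONE.finditer(text))

print(replaced)  # -> Call 555-010-0199 or 555-010-0142 today.
print(listed)
```

Python numbers named groups along with unnamed ones, so `area` is also group 1 and the unnamed group is group 2.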

YuPcre2 RegEx2 MaskControls

YuPcre2 includes two regular expression mask edits: TDIRegEx2MaskEdit and TDIRegEx2ComboBox. Both controls validate keyboard input against a regular expression. They work similarly to Delphi's TMaskEdit, but are more flexible and powerful.

The regular expression mask edits can:

accept / reject specific characters at determined positions;
allow / reject particular characters if they follow defined character(s);
restrict input text to begin / end with exact character(s);
flag incomplete text to show that more input is needed.

Examples: Numbers, number ranges, dates, phone numbers, e-mail addresses, URLs, currency, and more.
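A mask edit has to distinguish three outcomes: accept the text as complete, flag it as incomplete, or reject it. These can be sketched with a technique much simpler than a regular expression engine: a hypothetical per-position mask in which each input position has its own single-character pattern. This is a swapped-in simplification for illustration, not how TDIRegEx2MaskEdit works. `Status`, `DATE_MASK`, and `check` are invented names.

```python
import re
from enum import Enum

class Status(Enum):
    REJECT = "reject"          # the text can never become valid
    INCOMPLETE = "incomplete"  # valid so far, more input is needed
    COMPLETE = "complete"      # the text fills the whole mask

# Hypothetical ISO-date mask: one single-character regex per input position.
DATE_MASK = [r"\d"] * 4 + ["-"] + [r"[01]", r"\d"] + ["-"] + [r"[0-3]", r"\d"]

def check(text, mask=DATE_MASK):
    """Classify the text typed so far against a per-position mask."""
    if len(text) > len(mask):
        return Status.REJECT
    for ch, cls in zip(text, mask):
        if not re.fullmatch(cls, ch):
            return Status.REJECT
    return Status.COMPLETE if len(text) == len(mask) else Status.INCOMPLETE

for typed in ["2024-1", "2024/", "2024-12-31"]:
    print(typed, check(typed).value)
```

Because each position is checked in isolation, the sketch accepts impossible dates such as `2024-19-39`. A mask driven by a full regular expression can also express rules that depend on earlier characters, such as allowing only 0 to 2 after a leading month digit 1. That is the "follow defined character(s)" rule listed above.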

Workbench Application

The YuPcre2 RegEx2 Workbench helps you design and test regular expressions. It lets you set options, measure execution times, and save and load settings for later use.

The YuPcre2 RegEx2 Workbench is available as

Design-Time Component Editor and
Standalone Application.

YuPcre2 1.8.0

Added new pcre2_config options: PCRE2_CONFIG_NEVER_BACKSLASH_C and PCRE2_CONFIG_COMPILED_WIDTHS.
Defined public names for all the pcre2_compile error numbers.
When an assertion contained (*ACCEPT) it caused all open capturing groups to be closed (as for a non-assertion ACCEPT), which was wrong and could lead to misbehaviour for subsequent references to groups that started outside the assertion. ACCEPT in an assertion now closes only those groups that were started within that assertion.
Although pcre2_jit_match checks whether the pattern was compiled in a given mode, it also assumed that at least one mode was available. This is fixed: pcre2_jit_match now returns PCRE2_ERROR_JIT_BADOPTION when the pattern has not been optimized by JIT at all.
If a backreference with a minimum repeat count of zero was first in a pattern, apart from assertions, an incorrect first matching character could be recorded. For example, for the pattern (?=(a))\1?b, “b” was incorrectly set as the first character of a match.
Characters in a leading positive assertion are considered for recording a first character of a match when the rest of the pattern does not provide one. However, a character in a non-assertive group within a leading assertion such as in the pattern (?=(a))\1?b caused this process to fail. This was an infelicity rather than an outright bug, because it did not affect the result of a match, just its speed. (In fact, in this case, the starting 'a' was subsequently picked up in the study.)
Allocate a single callout block on the stack at the start of pcre2_match and set its never-changing fields once only. Do the same for pcre2_dfa_match.
The extra compile options (set in the compile context) are now saved with the compiled pattern (previously they were not), and PCRE2_INFO_EXTRAOPTIONS was added to retrieve them.
Added PCRE2_CALLOUT_STARTMATCH and PCRE2_CALLOUT_BACKTRACK bits to a new field callout_flags in callout blocks. The bits are set by pcre2_match, but not by JIT or pcre2_dfa_match. These bits are provided to help with tracking how a backtracking match is proceeding.
When PCRE2_FIRSTLINE without PCRE2_NO_START_OPTIMIZE was used in non-JIT matching (both pcre2_match and pcre2_dfa_match) and the matched string started with the first code unit of a newline sequence, matching failed because it was not tried at the newline.
Code for giving up a non-partial match after failing to find a starting code unit anywhere in the subject was missing when searching for one of a number of code units (the bitmap case) in both pcre2_match and pcre2_dfa_match. This was a missing optimization rather than a bug.
The JIT compiler has been updated.
Avoid pointer overflow for unset captures in pcre2_substring_list_get. This could not actually cause a crash because it was always used in a memcpy() call with zero length.
Auto-possessification at the end of a capturing group was dependent on what follows the group (e.g. (a+)b would auto-possessify the a+) but this caused incorrect behaviour when the group was called recursively from elsewhere in the pattern where something different might follow. Iterators at the ends of capturing groups are no longer considered for auto-possessification if the pattern contains any recursions.
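The pattern from the backreference item above, `(?=(a))\1?b`, shows why recording "b" as the first character was a real bug. The leading lookahead forces every match to start with "a", so a matcher that only tries positions holding "b" skips every valid start. The Python sketch uses the `re` module (not PCRE2) for the correct result. `buggy_first_char_search` is a hypothetical helper that simulates the faulty optimization.

```python
import re

PATTERN = re.compile(r"(?=(a))\1?b")

def correct_search(subject):
    """Ordinary leftmost search: every starting position is tried."""
    m = PATTERN.search(subject)
    return (m.start(), m.group(0)) if m else None

def buggy_first_char_search(subject, first_char="b"):
    """Simulates the old bug: only positions holding first_char are tried."""
    for pos, ch in enumerate(subject):
        if ch == first_char:
            m = PATTERN.match(subject, pos)
            if m:
                return (m.start(), m.group(0))
    return None

print(correct_search("xab"))           # -> (1, 'ab')
print(buggy_first_char_search("xab"))  # -> None
```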

