DIHtmlParser v8.0.0 for Delphi 10.3 Rio Cracked

DIHtmlParser is a component suite to parse, analyze, extract information from, and generate HTML, XHTML, and XML documents for Delphi (Embarcadero, CodeGear, Borland).

Overview
Full Unicode support (UnicodeString or WideString, depending on Delphi version).
Reads and writes over 70 character sets natively (independent of the OS). More than 150 are supported with the help of DIConverters.
Operates on TStreams, memory buffers or strings.
Returns a single piece of HTML to the application at a time.
Extends easily via the TDIHtmlParserPlugin interface.

Recognized HTML Pieces
DIHtmlParser recognizes 10 kinds of HTML pieces plus 4 kinds of non-HTML pieces.

The HTML pieces are:

CData Sections: CData Sections, found in XML, are used to escape blocks of text containing characters which would otherwise be recognized as markup. A CData section begins with <![CDATA[ and ends with ]]>.
Comments: The contents of comments are returned already stripped of the comment markers. A comment starts with <!-- and ends with -->.
Document Type Definitions: A Document Type Definition defines the syntax of markup constructs. It begins with <!DOCTYPE and ends with >.
HTML Processing Instructions: HTML Processing Instructions are a mechanism to capture platform-specific idioms. They start with <? and end with >.
HTML-Tags: HTML-Tags are returned already parsed into name, attributes, and values. DIHtmlParser recognizes Start Tags, End Tags and Empty Element Tags. Example: <TagName Attribute="Value" />.
Scripts: DIHtmlParser returns the contents between the <SCRIPT> and </SCRIPT> tags as simple text. The surrounding HTML tags are reported separately.
Styles: DIHtmlParser returns the contents between the <STYLE> and </STYLE> tags as simple text. The surrounding HTML tags are reported separately.
Text: Text is everything that is not markup. If the NormalizeWhiteSpace option is enabled, DIHtmlParser collapses each run of white space into a single character. Preformatted text wrapped by <PRE> and </PRE> is never normalized.
Titles: DIHtmlParser returns the contents between the <TITLE> and </TITLE> tags as simple text. Titles are not normal text because they are parsed differently.
XML Processing Instructions: XML Processing Instructions are similar to the HTML Processing Instructions with a slightly different syntax: They begin with <?XML and end with ?>.
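
The NormalizeWhiteSpace behaviour described under Text can be pictured with a minimal Python sketch. It assumes that "white space" means space, tab, CR and LF and that each run becomes a single space; DIHtmlParser's exact character set may differ, and `normalize_white_space` and `in_pre` are hypothetical names, not part of its API.

```python
import re

# Assumption: white space = space, tab, CR, LF; each run becomes one space.
_WS_RUN = re.compile(r"[ \t\r\n]+")

def normalize_white_space(text: str, in_pre: bool) -> str:
    """Collapse each run of white space to one space, unless inside <PRE>."""
    if in_pre:
        return text  # preformatted text is passed through unchanged
    return _WS_RUN.sub(" ", text)

print(normalize_white_space("Hello \n\t  world", in_pre=False))  # Hello world
print(repr(normalize_white_space("a\n  b", in_pre=True)))
```

Note that even a single line break outside <PRE> becomes a space in this sketch, which matches how browsers render such text.
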
The Non-HTML pieces are:

Active Server Pages (ASP): Active Server Page markup is often used to enclose scripting macros. It begins with <% and runs up to %>.
Custom-Tags: Custom Tags are similar to HTML-Tags and to what Delphi's Help calls Transparent Tags. For DIHtmlParser, a Custom Tag's name must begin with a user-defined start character such as #, as in <#Name Attribute="Value" />.
PHP: PHP is a powerful and popular scripting language. Its markup begins with <?PHP and ends with ?>.
Server Side Includes (SSI): SSI, an extension supported by the Apache web server, starts with <!--# and continues up to -->. It allows include files and other data to be inserted into HTML documents on the fly.
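
Most of the pieces listed above can be told apart by their opening delimiter alone. The Python sketch below shows one way to do this; it is not DIHtmlParser's implementation. The kind labels and `classify_piece` are hypothetical names. It assumes keywords such as DOCTYPE, XML and PHP match case-insensitively. Scripts, styles and titles are left out because they are recognized by their surrounding tags, not by a prefix.

```python
# Opening delimiters from the lists above, ordered so that longer, more
# specific prefixes are tested before the shorter ones they start with
# (e.g. "<!--#" before "<!--", "<?XML" and "<?PHP" before "<?").
_PREFIXES = [
    ("<![CDATA[", "cdata"),
    ("<!--#", "ssi"),
    ("<!--", "comment"),
    ("<!DOCTYPE", "dtd"),
    ("<?XML", "xml_pi"),
    ("<?PHP", "php"),
    ("<?", "html_pi"),
    ("<%", "asp"),
]

def classify_piece(markup: str, custom_tag_start: str = "#") -> str:
    """Return a (hypothetical) kind label for the piece starting `markup`."""
    upper = markup.upper()  # assumption: keywords match case-insensitively
    for prefix, kind in _PREFIXES:
        if upper.startswith(prefix):
            return kind
    if markup.startswith("<" + custom_tag_start):
        return "custom_tag"  # user-defined start character, "#" by default
    if markup.startswith("</"):
        return "end_tag"
    if markup.startswith("<"):
        return "tag"
    return "text"

print(classify_piece("<!--# include file='a.inc' -->"))
print(classify_piece('<#Name Attribute="Value" />'))
```

The ordering of `_PREFIXES` is the important design choice here. Testing "<!--" before "<!--#" would misreport every SSI directive as a comment.
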

Parsing Efficiency

DIHtmlParser is extremely fast, especially when parsing huge files. Thanks to its internal buffer mechanism, it does not need to load the entire file into memory but reads it in small chunks, one at a time. DIHtmlParser parses up to 50,000 tags per second even on an outdated 166 MHz processor. On modern machines, throughput exceeds 15 MB of HTML data per second.
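
The idea of reading one small chunk at a time can be illustrated with a simplified Python sketch; it is not DIHtmlParser's buffer mechanism. Between chunks, only the unfinished remainder (for example, a tag cut in half at a chunk boundary) is kept, so memory use does not grow with the document size. `iter_tag_names` is a hypothetical name. The scanner deliberately ignores comments, CDATA and ">" inside attribute values.

```python
import io
from typing import Iterator, TextIO

def iter_tag_names(stream: TextIO, chunk_size: int = 4096) -> Iterator[str]:
    """Yield upper-cased start/end tag names, reading `stream` in chunks."""
    pending = ""  # unfinished data carried over from the previous chunk
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        pending += chunk
        start = 0
        while True:
            lt = pending.find("<", start)
            if lt < 0:
                start = len(pending)  # only text left: nothing to keep
                break
            gt = pending.find(">", lt)
            if gt < 0:
                start = lt  # tag continues in the next chunk
                break
            body = pending[lt + 1:gt].strip().lstrip("/")
            parts = body.split(None, 1)
            if parts:
                yield parts[0].rstrip("/").upper()
            start = gt + 1
        pending = pending[start:]  # drop everything already processed

doc = "<html><body><p>Hi</p><br/></body></html>"
print(list(iter_tag_names(io.StringIO(doc), chunk_size=3)))
```

A chunk size of 3 is used here only to force tags to be split across chunk boundaries. Real buffers would be much larger.
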

DIHtmlParser only parses what it needs to parse. Thanks to its filtering mechanism, the parser can skip all pieces of HTML that the application did not request. Even though the parser must eventually touch every character of an HTML document, it may need to store only a fraction of that data for further processing. We call this "Smart Parsing", since not storing unnecessary data is one of the greatest time savers.
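
Here is a simplified Python illustration of scanning everything while storing only what was requested. It is not DIHtmlParser's filtering API. `filtered_pieces` and its three kind labels are hypothetical. The scanner still walks over every character, but it copies a piece out of the input only when its kind is in `wanted`.

```python
from typing import Iterator, Set, Tuple

def filtered_pieces(html: str, wanted: Set[str]) -> Iterator[Tuple[str, str]]:
    """Scan all of `html` but only copy out pieces whose kind is wanted.

    Kinds are simplified to 'tag', 'comment' and 'text'.
    """
    i, n = 0, len(html)
    while i < n:
        if html.startswith("<!--", i):
            end = html.find("-->", i + 4)
            end = n if end < 0 else end + 3
            kind = "comment"
        elif html[i] == "<":
            end = html.find(">", i + 1)
            end = n if end < 0 else end + 1
            kind = "tag"
        else:
            end = html.find("<", i)
            end = n if end < 0 else end
            kind = "text"
        if kind in wanted:
            yield kind, html[i:end]  # only requested pieces are stored
        i = end  # unrequested pieces are skipped without copying

print(list(filtered_pieces("<p>Hi<!-- c --></p>", {"text"})))
```
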

Another trick of "Smart Parsing" is to convert relevant tag and attribute strings into ordinal number IDs. As a result, the parser never needs to compare lengthy strings consisting of many characters but can get away with one simple number comparison instead. This improves performance and reduces processor load. Your own code benefits from this technique, too, as tag and attribute IDs are part of the DIHtmlParser interface.
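
The ID technique can be sketched in a few lines of Python. The table below is hypothetical. DIHtmlParser ships its own tag and attribute ID constants, but their actual names and values are not reproduced here. Each name is looked up once. After that, every test is an integer comparison.

```python
# Hypothetical ID table; not DIHtmlParser's actual constants.
TAG_IDS = {"A": 1, "IMG": 2, "TABLE": 3, "TD": 4, "TR": 5}
TAG_UNKNOWN = 0
TAG_IMG = TAG_IDS["IMG"]

def tag_id(name: str) -> int:
    """Map a tag name to its ordinal ID once, at parse time."""
    return TAG_IDS.get(name.upper(), TAG_UNKNOWN)

ids = [tag_id(n) for n in ("table", "Img", "td", "IMG")]
images = sum(1 for i in ids if i == TAG_IMG)  # integer comparisons only
print(images)  # 2
```
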

Individual Tag Filtering

Tag filtering extends the general filtering to individual tags. It enables the programmer to instruct the parser to hold back all tags that are not relevant to the application. Why bother with <TABLE> tags if you are only interested in the images of an HTML document? Instead of having the application check each tag for an <IMG> tag, simply instruct the parser to report only <IMG> tags in the first place. This allows DIHtmlParser to optimize its parsing, and your application no longer has to deal with unwanted tags.
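
The Python sketch below shows the <IMG> scenario with the filter placed inside the reporting routine rather than in the application. It is a simplified illustration, not DIHtmlParser's API, and `report_tags` is a hypothetical name. The tag name is checked before any attribute parsing, so an unwanted tag costs only a set lookup. Only double-quoted attribute values are recognized.

```python
import re
from typing import Dict, Iterator, Set

_TAG = re.compile(r"<([A-Za-z][A-Za-z0-9]*)([^>]*)>")
_ATTR = re.compile(r'([A-Za-z_:][-A-Za-z0-9_:.]*)\s*=\s*"([^"]*)"')

def report_tags(html: str, wanted: Set[str]) -> Iterator[Dict[str, str]]:
    """Yield the attributes of start tags whose upper-cased name is wanted."""
    for m in _TAG.finditer(html):
        if m.group(1).upper() not in wanted:
            continue  # filtered out: attributes are never parsed
        yield dict(_ATTR.findall(m.group(2)))

page = '<table><tr><td><img src="a.png" alt="A"></td></tr></table><IMG src="b.png"/>'
for attrs in report_tags(page, {"IMG"}):
    print(attrs["src"])
```
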

DIHtmlParser 8.0.0 – 8 Oct 2019
Extend character support to the full range of Unicode Code Points from $000000 to $10FFFF.

Until now, DIHtmlParser stored code points as WideChars. This limited Unicode support to the Basic Multilingual Plane (BMP) from $0000 to $FFFF. Code points from the Supplementary Planes were converted to the $FFFD replacement character. This worked well for a great number of languages, but less common scripts did not work, and neither did the increasingly popular emojis from the Symbols and Pictographs Unicode blocks.
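
Why a single WideChar cannot hold a supplementary code point follows from the UTF-16 encoding rules. This Python sketch applies the standard surrogate-pair arithmetic. `to_utf16_units` is a hypothetical helper, not a DIHtmlParser routine. For example, the emoji U+1F600 needs two 16-bit code units, $D83D and $DE00.

```python
def to_utf16_units(cp: int) -> list:
    """Return the UTF-16 code units for one Unicode code point."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]  # BMP: one 16-bit unit (one WideChar) suffices
    v = cp - 0x10000  # 20 bits remain, split 10/10 across the pair
    return [0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)]

print([hex(u) for u in to_utf16_units(0x1F600)])  # ['0xd83d', '0xde00']
```
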

DIHtmlParser 8.0.0 overcomes these limitations and now covers the complete Unicode range. Changes are almost entirely internal and maintain backwards compatibility as far as possible. Existing applications should compile with few or no changes. WideChar routines are marked as deprecated, and their deprecation hints point to the new UCP counterparts.

TDIHtmlParser.Data is still a WideChar buffer. However, its contents are now fully UTF-16 encoded. This means that it may contain code points above $FFFF, each of which takes up two WideChars (a surrogate pair). As a result, an index into the buffer no longer necessarily corresponds to a single character. Methods related to TDIHtmlParser.Data, such as TDIHtmlParser.DataAsStrTrimW, are adjusted accordingly.
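
The loss of direct indexing can be seen by decoding a small UTF-16 buffer by hand. In this Python sketch, the buffer for "H😀!" holds four code units but only three code points, and unit 2 is a low surrogate, not a character. `iter_code_points` is a hypothetical helper. Replacing unpaired surrogates with U+FFFD is an assumption made for this sketch.

```python
from typing import Iterator, List

def iter_code_points(units: List[int]) -> Iterator[int]:
    """Decode a buffer of UTF-16 code units into code points."""
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < n and 0xDC00 <= units[i + 1] <= 0xDFFF:
            # High + low surrogate: combine into one supplementary code point.
            yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            # BMP unit, or an unpaired surrogate replaced by U+FFFD.
            yield 0xFFFD if 0xD800 <= u <= 0xDFFF else u
            i += 1

buf = [0x48, 0xD83D, 0xDE00, 0x21]  # "H", U+1F600, "!"
print(len(buf), len(list(iter_code_points(buf))))  # 4 3
```
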

UnicodeString utility routines are rewritten to handle full UTF-16, including surrogate pairs. Most of them are in DIUtils.pas. YuUtf.pas also contains new utility routines for UTF-16 testing, encoding, and decoding. Where possible, string-handling routines now take NativeInt parameters for buffer lengths.
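
A typical UTF-16 test checks that every surrogate in a buffer is properly paired. The Python sketch below shows this check. It is not the actual YuUtf.pas routine, whose names and signatures are not given here, and `is_valid_utf16` is a hypothetical name.

```python
def is_valid_utf16(units) -> bool:
    """True if every surrogate in the buffer belongs to a proper pair."""
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate: a low one must follow
            if i + 1 >= n or not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:  # low surrogate without a high one
            return False
        else:
            i += 1
    return True

print(is_valid_utf16([0x48, 0xD83D, 0xDE00]), is_valid_utf16([0xDE00, 0xD83D]))
```
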

Other noteworthy changes:

TDIHtmlParser.UCP complements TDIHtmlParser.Char.
The WideChar property TDIHtmlParser.CustomTagStartChar has a new UCS4Char complement, CustomTagStartUcp. The same holds for TDIHtmlWriterPlugin.CustomTagStartChar and CustomTagStartUcp.
TDICustomTag.GetStartCode has a new UCS4Char overload. So do GetEmptyElementCode and GetEndCode.
Changed the type of TDIHtmlParser.StartCol, EndCol, StartLine, EndLine, StartPos, and EndPos from unsigned Cardinal to signed NativeInt.
Removed conditional compilation directives DI_No_Classes and DI_No_Unicode_Component (source code only). TDIHtmlParser and TDIHtmlParserPlugin now always descend from TComponent, and the Classes unit is always used.
Improve DIUtils.pas Unicode processing to support Unicode Code Points from $000000 to $10FFFF. Adjust remaining source code accordingly.
Update DIUtils.pas Unicode functions to Unicode 12.1.0.
Delphi 4 and Delphi 5 crash when compiling DIUtils.pas. There is no error message, so it is not possible to work around the problem. Support for these compilers is therefore dropped. Delphi 6 or later is now required to compile DIHtmlParser.
Remove DI.inc include file. Directly link in DICompilers.inc instead.

iamDeveloper from 31-10-2019, 01:32