YuStemmer 4.0.0 Full Source

YuStemmer 4.0.0 Full Source | 8 Mb


YuStemmer is a natural language stemming library for 15 languages. It reduces an inflected word to a common root form. Because YuStemmer is algorithmic, it is small and fast. Stemming is typically applied in query and search systems, where it lets a search match words with related meaning but slightly different spelling. For example, the English stemmer returns “write” for “write”, “writes”, “writing”, and “writings”.
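To make the idea of algorithmic stemming concrete, here is a minimal Python sketch. It is not YuStemmer's code or API: `toy_stem`, `has_vowel`, and `ends_cvc` are hypothetical names, and the two rules are only loosely modeled on Porter-style suffix stripping. It shows how a few rules, with no dictionary, can map the four words from the example above to one stem, and how a search index keyed by stems then finds related documents.

```python
VOWELS = set("aeiou")


def has_vowel(s: str) -> bool:
    """True if s contains at least one vowel."""
    return any(ch in VOWELS for ch in s)


def ends_cvc(s: str) -> bool:
    """True if s ends consonant-vowel-consonant, the last one not w, x or y."""
    if len(s) < 3:
        return False
    c1, v, c2 = s[-3], s[-2], s[-1]
    return c1 not in VOWELS and v in VOWELS and c2 not in VOWELS and c2 not in "wxy"


def toy_stem(word: str) -> str:
    """Hypothetical toy stemmer: handles only plural -s and -ing."""
    w = word.lower()
    # Rule 1: drop a plural "s" (but keep "ss", as in "class").
    if len(w) > 3 and w.endswith("s") and not w.endswith("ss"):
        w = w[:-1]
    # Rule 2: drop "ing" when a vowel remains, then restore a silent "e".
    if w.endswith("ing") and has_vowel(w[:-3]):
        w = w[:-3]
        if ends_cvc(w):
            w += "e"
    return w


words = ["write", "writes", "writing", "writings"]
stems = {toy_stem(w) for w in words}
print(stems)  # {'write'}

# A tiny inverted index keyed by stem: a query for "writing"
# matches documents containing "writes" or "writings".
index = {}
for doc_id, text in enumerate(["She writes daily", "Collected writings"]):
    for token in text.split():
        index.setdefault(toy_stem(token), set()).add(doc_id)

print(sorted(index[toy_stem("writing")]))  # [0, 1]
```

A real stemmer has many more rules and exceptions per language; the sketch only illustrates why the approach needs so little memory.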

Stemmers are available for these languages:

Danish, Dutch, English, Finnish, French, German, German2 (different Umlaut handling), Hungarian, Italian, Norwegian, Porter (English), Portuguese, Romanian, Russian, Spanish, Swedish, Turkish.
YuStemmer is fully algorithmic and needs no extensive lookup dictionaries, which results in a small memory footprint and excellent performance.

YuStemmer was initially developed for the DISQLite3 Full Text Search (FTS) engine, which supports it out of the box. Beyond that, YuStemmer suits many other purposes.

YuStemmer is organized into different classes, each of them optimized for a particular string type and text encoding:

TYuStemmer class:
    ANSI text, 8-bit, usually in ISO-8859-1, unless otherwise noted.
    Some stemmers might not be available for this class.
TYuStemmer_8 class:
    UTF-8 text, 8-bit.
    All stemmers are available for this class.
TYuStemmer_16 class:
    UTF-16 text, 16-bit. This corresponds to Delphi's WideString / UnicodeString.
    All stemmers are available for this class.
Make sure to choose the stemmer class matching your string type and character set. Otherwise you will suffer a performance penalty caused by avoidable string conversions. In Delphi, such conversions usually happen implicitly and go unnoticed by most developers. Therefore, pay close attention here to make the most of YuStemmer!
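The cost of a mismatch is easier to see with the bytes laid out. The following Python sketch is only an illustration, not Delphi or YuStemmer code, and the sample word is an arbitrary choice. It encodes one word in the three encodings named above and shows that moving from one to another requires a full transcoding pass over the text.

```python
word = "Häuser"  # German plural of "Haus"; contains one non-ASCII letter

latin1 = word.encode("iso-8859-1")  # 8-bit, one byte per character
utf8 = word.encode("utf-8")         # 8-bit code units, variable length
utf16 = word.encode("utf-16-le")    # 16-bit code units

print(len(latin1), len(utf8), len(utf16))  # 6 7 12

# A routine that expects UTF-16 cannot read UTF-8 bytes directly:
# the text must be decoded and re-encoded, which copies every character.
transcoded = utf8.decode("utf-8").encode("utf-16-le")
print(transcoded == utf16)  # True
```

The same word has three different byte layouts, so a stemmer built for one layout can only accept another after a conversion like the one in the last step. Repeated for every word in a large text, that extra pass is the avoidable overhead.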

YuStemmer 4.0.0 – 3 Apr 2017
Support Delphi 10.2 Tokyo Win32 and Win64.
New stemmers:
Arabic: TYuStemmer_Arabic_8, TYuStemmer_Arabic_16.
Kraaij Pohlmann (Dutch): TYuStemmer_Kraaij_Pohlmann, TYuStemmer_Kraaij_Pohlmann_8, TYuStemmer_Kraaij_Pohlmann_16.
Latin: TYuStemmer_Latin, TYuStemmer_Latin_8, TYuStemmer_Latin_16.
Lovins (English): TYuStemmer_Lovins, TYuStemmer_Lovins_8, TYuStemmer_Lovins_16.
Slovene: TYuStemmer_Slovene_8, TYuStemmer_Slovene_16.
Tamil: TYuStemmer_Tamil_8, TYuStemmer_Tamil_16.
Fix TYuStemmer_Czech_8 and TYuStemmer_Czech_16 to handle Unicode properly.
Portuguese stemmer fix: Replace Spanish suffixes with Portuguese ones.
Greatly expand test cases.

