2018-11-12: Hunspell 1.7.0 release:

New features and bug fixes by László Németh, supported by FSF.hu Foundation:

- No more annoyingly long suggestion times, especially in languages with
  compound word handling and complex morphology. With balanced
  multi-level time limits, suggestions are now guaranteed to arrive
  within half a second, not seconds (or dozens of seconds or more
  in extreme cases), for longer misspellings, too.

- Add SPELLML support for run-time dictionary extension with optional
  affixation of user words. See the new "Grammar By" feature of
  language-specific user dictionaries of LibreOffice 6.0:

  News: https://wiki.documentfoundation.org/ReleaseNotes/6.0#.E2.80.9CGrammar_By.E2.80.9D_spell_checking

  Screencast with English example: https://www.youtube.com/watch?v=EsS3gaBTfOo

  Screencast with German example: https://www.youtube.com/watch?v=aYVFDqCUb6I

- Improved, highly customizable suggestions at the level of dictionary words:
  Pronunciations and typical misspellings defined by optional "ph:" fields of
  the dictionary words are used not only in n-gram suggestions, but also as
  elements of the REP replacement list, getting the highest priority in normal
  suggestions and giving the best suggestions for short words, too.
  More information: see "ph:" in man 5 hunspell.

- Handling multiple word suggestions is much easier. As in a
  traditional spelling dictionary, for example, to get the correct suggestion
  "a lot" for the typical misspelling "alot" in the first place, it is now
  enough to put the following line into the dic(tionary) file:

  a lot

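The word pair lookup described above can be pictured with a short Python sketch. This is an illustration only, not Hunspell's implementation: the function name and the in-memory set standing in for the dic file are assumptions.

```python
def word_pair_suggestions(misspelling, dictionary):
    """Suggest 'left right' splits that are listed verbatim as word pairs.

    `dictionary` is a plain set of entries standing in for the dic file
    (an assumption of this sketch).
    """
    pairs = []
    for i in range(1, len(misspelling)):
        candidate = misspelling[:i] + " " + misspelling[i:]
        if candidate in dictionary:
            pairs.append(candidate)
    return pairs


# "a lot" is listed as a word pair, so it is offered for "alot".
DIC_WITH_PAIR = {"a", "lot", "allot", "a lot"}
DIC_WITHOUT_PAIR = {"a", "lot", "allot"}
print(word_pair_suggestions("alot", DIC_WITH_PAIR))
```

Without the "a lot" line in the set, the same call returns an empty list, which mirrors why the extra dic entry is needed.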
- Limit compound overgeneration by dictionary-based word pairs:
  It is now possible to filter bad compound words by listing
  the correct word pairs, separated by a space, in the dictionary, as in a
  traditional spelling dictionary.

- Clean up suggestions:

  - no n-gram and compound word suggestions if a "good" suggestion
    exists, i.e. an uppercase, REP, ph: or dictionary word pair suggestion

  - word pairs are always suggested if they exist in the dic file

  - word pairs have top priority in suggestions, and
    these are the only suggestions if there is no other good suggestion.

  - dictionary word pairs separated by a dash instead of a space
    are also handled specially in two-word suggestions (depending on the
    language)

- Limit bad suggestions by improved n-gram suggestion rules:

  don't suggest capitalized dictionary words for lowercase
  misspellings in n-gram suggestions, except

  - with PHONE usage, or
  - in the case of German, where not only proper
    nouns are capitalized, or
  - when the capitalized word has a special pronunciation

  and don't suggest if the lengths of the misspelling and the
  suggestion differ by 5 or more characters.

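The two n-gram filters above can be sketched in Python as follows. The function and parameter names are hypothetical, and the three exceptions (PHONE, German, special pronunciation) are folded into a single flag for brevity; this is not the library's code.

```python
def keep_ngram_suggestion(misspelling, candidate, capitalized_allowed=False):
    """Return True if an n-gram candidate passes both filters.

    `capitalized_allowed` stands for any of the listed exceptions
    (an assumption of this sketch).
    """
    # Filter 1: reject candidates whose length differs by 5 or more characters.
    if abs(len(misspelling) - len(candidate)) >= 5:
        return False
    # Filter 2: reject capitalized candidates for lowercase misspellings.
    if (misspelling.islower() and candidate[:1].isupper()
            and not capitalized_allowed):
        return False
    return True


print(keep_ngram_suggestion("obvus", "obvious"))  # lowercase, close length
print(keep_ngram_suggestion("obvus", "Corvus"))   # capitalized candidate
```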
- Extend dotless i and dotted I rules to the Crimean Tatar language:
  allow dotted I in the dictionary, and disable bad capitalization of i.

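For readers unfamiliar with the dotless i rules, here is a minimal Python sketch of Turkic-style case conversion. The language codes in the set and the function names are assumptions made for illustration; the sketch does not reproduce Hunspell's internal casing tables.

```python
# Languages assumed (for this sketch) to pair I with dotless i and
# dotted capital I with i: Turkish, Azerbaijani, Crimean Tatar.
DOTLESS_I_LANGS = {"tr", "az", "crh"}


def lower_i_aware(text, lang):
    """Lowercase text, mapping I -> dotless i and dotted I -> i where needed."""
    if lang in DOTLESS_I_LANGS:
        text = text.replace("I", "\u0131").replace("\u0130", "i")
    return text.lower()


def upper_i_aware(text, lang):
    """Uppercase text, mapping i -> dotted I and dotless i -> I where needed."""
    if lang in DOTLESS_I_LANGS:
        text = text.replace("i", "\u0130").replace("\u0131", "I")
    return text.upper()


print(upper_i_aware("izmir", "crh"))
print(upper_i_aware("izmir", "en"))
```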
- BREAK: extended the recursive word breaking algorithm to handle words, or
  words with suffixes, that already contain word break characters.
  For example, "e-mail" is a dictionary word with a word break character, and
  it wasn't accepted before in compounds in some languages.

- FORBIDDENWORD precedes BREAK: it is now possible to forbid compound
  forms recognized by BREAK word breaking by adding the bad compounds to
  the dictionary with FORBIDDENWORD flags.

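The interaction of the two BREAK changes above can be sketched in Python. This is a simplified model under stated assumptions: a single break character "-", plain sets instead of affix-aware lookups, and a hypothetical function name. Real BREAK patterns in aff files are richer.

```python
def accepts(word, dictionary, forbidden, break_chars="-"):
    """Recursive word-breaking check in which forbidden forms win."""
    if word in forbidden:       # FORBIDDENWORD is checked before breaking
        return False
    if word in dictionary:      # also covers entries such as "e-mail"
        return True
    for i, ch in enumerate(word):
        # Break only at inner break characters, then check both halves.
        if ch in break_chars and 0 < i < len(word) - 1:
            left, right = word[:i], word[i + 1:]
            if (accepts(left, dictionary, forbidden, break_chars)
                    and accepts(right, dictionary, forbidden, break_chars)):
                return True
    return False


DICTIONARY = {"e-mail", "address", "foo", "bar"}
FORBIDDEN = {"foo-bar"}
print(accepts("e-mail-address", DICTIONARY, FORBIDDEN))
print(accepts("foo-bar", DICTIONARY, FORBIDDEN))
```

In this model "e-mail-address" is accepted because the split after "e-mail" succeeds, while "foo-bar" is rejected even though both halves are valid words.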
- Lowered limit for the "doubletwochars" suggestion algorithm:
  one of the typical misspellings recognized by the Hunspell suggestion
  mechanism is syllable duplication. Alongside the old pattern
  ABABA -> ABA, for example nutrITITIon -> nutrITIon, the
  simpler ABAB -> AB pattern is now also recognized in non-starting position,
  for example, regretTETEd -> regretTEd.

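A minimal Python sketch of the ABAB -> AB idea follows. It is an illustration under assumptions (hypothetical function name, a set as the dictionary), not the library's doubletwochars code; removing one copy of a doubled two-character sequence also covers the older ABABA -> ABA case.

```python
def doubled_syllable_candidates(word, dictionary):
    """Drop one copy of a doubled two-character sequence (ABAB -> AB).

    Scanning starts at index 1, so a duplication at the very start of
    the word is ignored (the "non-starting position" rule).
    """
    found = []
    for i in range(1, len(word) - 3):
        if word[i:i + 2] == word[i + 2:i + 4]:
            candidate = word[:i + 2] + word[i + 4:]
            if candidate in dictionary and candidate not in found:
                found.append(candidate)
    return found


WORDS = {"regretted", "nutrition", "vacation", "pa"}
print(doubled_syllable_candidates("regretteted", WORDS))
print(doubled_syllable_candidates("nutritition", WORDS))
```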
- Lowered limit for longswapchar and movechar: only distances of at most
  4 characters are recognized, to avoid slow and bad suggestions.

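As a rough picture of a distance-limited long swap, consider the Python sketch below. The function name, the inclusive reading of the 4-character bound, and the choice to skip adjacent swaps are assumptions of this sketch rather than statements about the library's code.

```python
def long_swap_candidates(word, dictionary, max_distance=4):
    """Swap two non-adjacent characters at most `max_distance` apart."""
    found = []
    chars = list(word)
    last = len(chars) - 1
    for i in range(len(chars)):
        for j in range(i + 2, min(i + max_distance, last) + 1):
            if chars[i] == chars[j]:
                continue
            chars[i], chars[j] = chars[j], chars[i]
            candidate = "".join(chars)
            chars[i], chars[j] = chars[j], chars[i]  # undo the swap
            if candidate in dictionary and candidate not in found:
                found.append(candidate)
    return found


print(long_swap_candidates("permenant", {"permanent"}))
```

Capping the distance keeps the number of tried swaps roughly linear in the word length instead of quadratic, which is the performance point of the change.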
- Fix compound handling for the new Hungarian orthography reform

- Allow suggestion search for prefix + *two suffixes*:
  Remove an artificial performance limit to get correct
  suggestions for relatively simple misspellings in
  Hungarian, etc., when the word form contains a prefix
  and both derivative and inflectional suffixes, too:

  lefikszálása -> lefixálása

Improvements for command-line Hunspell:

- Remove false alarms when checking OpenDocument (ODF)
  documents by ignoring <text:span> elements. (LibreOffice
  creates a lot of <text:span> elements, also within words,
  during text re-editing, which often resulted in a huge number of
  broken words before this fix.)

- List filenames when filtering multiple files on the command line:

  Examples:

  $ hunspell -l *.odt
  a.odt: mispelling
  b.odt: egzample

  $ hunspell -l -G *.odt
  a.odt: good
  b.odt: words

- Dictionary search by option -D doesn't wait for the standard input
  (fixed by Siva Mahadevan)

Other improvements:

- makealias dictionary compression: add option --minimize-diff
  to reuse free positions of alias lists to create minimal and
  readable diffs for alias compressed dictionaries stored in
  revision control systems, such as the dictionaries of LibreOffice.

- Brazilian-Portuguese translation by Rafael Fontenelle

- Catalan translation by robert dot buj at gmail

- Minor bug fixes by several contributors, see git log

2017-09-03: Hunspell 1.6.2 release:
- Library changes: none. Same as 1.6.1.
- Command line tool:
  - Added German translation
  - Fixed bug with wrong output encoding, not respecting system locale.

2017-03-25: Hunspell 1.6.1 release:
- Library changes:
  - Performance improvements in suggest()
  - Fixes regressions for Hungarian related to compounding.
  - Fixes regressions for Korean related to ICONV.
- Command line tool:
  - Added Tajik translation
  - Fix regarding searching of OOo dicts installed in user folder
- Manpages:
  - Fix microsoft-cp1251 to cp1251. Dicts should not use the former.
  - Typos.

2016-12-22: Hunspell 1.6.0 release:
- Library changes:
  - Performance improvement in ngsuggest(), suggestions should be faster.
  - Revert MAXWORDLEN to 100 as in 1.3.3 for performance reasons.
  - MAXWORDLEN can be set during build time with -D defines.
  - Fix crash when a word with 102 consecutive X is spelled.
- Command line tool:
  - -D shows all loaded dictionaries instead of only the first.
  - -D properly lists all available dictionaries on Windows.

2016-11-30: Hunspell 1.5.4 release:
- Fixes the command COMPOUNDSYLLABLE used in Hungarian dictionary.

2016-11-28: Hunspell 1.5.3 release:
- Removed a #include from hunspell.hxx that was creating trouble

2016-11-27: Hunspell 1.5.2 release:
- Reverted full backward compatibility with 1.4 public API, again

2016-11-27: Hunspell 1.5.1 release:
- Reverted full backward compatibility with 1.4 public API

2016-11-18: Hunspell 1.5.0 release:
- Lots of stability fixes
- Fixed compilation errors on various systems (Windows, FreeBSD)
- Small performance improvement compared to 1.4.0
- The C++ API is updated to use modern C++ types (string, vector).
  Backward compatibility is kept for most of the functions except for
  the following:
  - get_wordchars();
  - get_version();
  - input_conv(string, string);
  - removed get_csconv();

2016-04-15: Hunspell 1.4.0 release:
- various ABI changes due to moving away from char* to std::string

2014-06-02: Hunspell 1.3.3 release:
- OpenDocument (ODF and Flat ODF) support (ODF needs unzip program)
- various bug fixes

2011-02-02: Hunspell 1.3.2 release:
- fix library versioning
- improved manual

2011-02-02: Hunspell 1.3.1 release:
- bug fixes

2011-01-26: Hunspell 1.2.15/1.3 release:
- new features: MAXDIFF, ONLYMAXDIFF, MAXCPDSUGS, FORBIDWARN, see manual
- bug fixes

2011-01-21:
- new features: FORCEUCASE and WARN, see manual
- new options: -r to filter potential mistakes (rare words
  marked by the WARN flag in the dictionary)
- limited and optimized suggestions

2011-01-06: Hunspell 1.2.14 release:
- bug fix
2011-01-03: Hunspell 1.2.13 release:
- bug fixes
- improved compound handling and
  other improvements supported by OpenTaal Foundation, Netherlands
2010-07-15: Hunspell 1.2.12 release
2010-05-06: Hunspell 1.2.11 release:
- Maintenance release bug fixes
2010-04-30: Hunspell 1.2.10 release:
- Maintenance release bug fixes
2010-03-03: Hunspell 1.2.9 release:
- Maintenance release bug fixes and warnings
- MAP support for composed characters or character sequences
2008-11-01: Hunspell 1.2.8 release:
- Default BREAK feature and better hyphenated word suggestion to accept
  and fix (compound) words with hyphen characters in the spell checker
  instead of in the word breaking code of OpenOffice.org. With this feature
  it's possible to accept hyphenated compound words, such as "scot-free",
  where "scot" is not a correct English word.

- ICONV & OCONV: input and output conversion tables for optional character
  handling or using a special inner format. Example:

  # Accepting de facto replacements of the Romanian comma-accented letters
  SET UTF-8
  ICONV 4
  ICONV ş ș
  ICONV ţ ț
  ICONV Ş Ș
  ICONV Ţ Ț

  Typical usage of ICONV/OCONV is to manage an inner format for a segmental
  writing system, like the Ethiopic script of the Amharic language.

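To show what such a conversion table does to input text, here is a small Python sketch. The parser and the longest-match replacement strategy are assumptions made for illustration; only the ICONV lines themselves come from the example above.

```python
def parse_conv_table(aff_text, keyword="ICONV"):
    """Collect 'KEYWORD pattern replacement' lines into a dict.

    The count line ('ICONV 4') has only two fields and is skipped.
    """
    table = {}
    for line in aff_text.splitlines():
        fields = line.split()
        if len(fields) == 3 and fields[0] == keyword:
            table[fields[1]] = fields[2]
    return table


def convert(text, table):
    """Replace table patterns left to right, trying longer patterns first."""
    patterns = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for pattern in patterns:
            if text.startswith(pattern, i):
                out.append(table[pattern])
                i += len(pattern)
                break
        else:
            out.append(text[i])  # no pattern matched at this position
            i += 1
    return "".join(out)


AFF_TEXT = """\
SET UTF-8
ICONV 4
ICONV ş ș
ICONV ţ ț
ICONV Ş Ș
ICONV Ţ Ț
"""
ICONV_TABLE = parse_conv_table(AFF_TEXT)
print(convert("ţară", ICONV_TABLE))
```

The cedilla forms typed by users are mapped to the comma-accented forms before lookup, so the dictionary only needs to store one spelling.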
- Extended CHECKCOMPOUNDPATTERN to handle compound word alternations, like
  the sandhi feature of Telugu and other writing systems.

- SIMPLIFIEDTRIPLE compound word feature: allow simplified Swedish and
  Norwegian compound word forms, like tillåta (till|låta) and
  bussjåfør (buss|sjåfør)

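The simplified triple rule can be illustrated from the generation side with a few lines of Python. Hunspell itself checks existing word forms rather than building them, so this sketch, with its hypothetical function name, only shows which spelling the feature accepts.

```python
def join_compound(left, right, simplified_triple=True):
    """Join two compound parts, writing a triple letter as a double one."""
    if (simplified_triple and len(left) >= 2 and right
            and left[-1] == left[-2] == right[0]):
        return left + right[1:]
    return left + right


print(join_compound("till", "låta"))
print(join_compound("buss", "sjåfør"))
```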
- wordforms: word generator script for dictionary developers (Hunspell
  version of unmunch).

- bug fixes

2008-08-15: Hunspell 1.2.7 release:
- FULLSTRIP: new option for affix handling. With FULLSTRIP, affix rules can
  strip whole words, not only all but one of their characters.
- COMPOUNDRULE works with all flag types. (COMPOUNDRULE is for pattern
  matching. For example, the en_US dictionary of OpenOffice.org uses COMPOUNDRULE
  for ordinal number recognition: 1st, 2nd, 11th, 12th, 22nd, 112th, 1000122nd,
  etc.)
- optimized suggestions:
  - modified 1-character distance suggestion algorithms: search for a TRY character
    in all positions instead of all TRY characters in a character position
    (this can give a more readable suggestion order and better suggestions
    in the first positions when TRY characters are sorted by frequency).
    For example, suggestions for "moze":
    ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6),
    maze, more, mote, ooze, mole etc. (Hunspell 1.2.7).
  - extended compound word checking for better COMPOUNDRULE-related
    suggestions, for example English ordinal numbers: 121323th -> 121323rd
    (this also needs a th->rd REP definition).
- bug fixes

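The change in loop order for 1-character replacements in the 1.2.7 notes above can be reproduced with a short Python sketch. The TRY string and the tiny word list are assumptions chosen for illustration (a lowercase, frequency-sorted character list and six English words); the function is not the library's code.

```python
def replace_one_char(word, dictionary, try_chars, char_major=True):
    """Try single-character replacements in one of two loop orders.

    char_major=True:  for each TRY character, try every position (1.2.7 order).
    char_major=False: for each position, try every TRY character (older order).
    """
    positions = range(len(word))
    if char_major:
        order = [(c, i) for c in try_chars for i in positions]
    else:
        order = [(c, i) for i in positions for c in try_chars]
    found = []
    for c, i in order:
        if word[i] == c:
            continue
        candidate = word[:i] + c + word[i + 1:]
        if candidate in dictionary and candidate not in found:
            found.append(candidate)
    return found


TRY_CHARS = "esianrtolcdugmphbyfvkwz"   # hypothetical TRY line, lowercase only
WORD_LIST = {"ooze", "doze", "maze", "more", "mote", "mole"}
print(replace_one_char("moze", WORD_LIST, TRY_CHARS, char_major=False))
print(replace_one_char("moze", WORD_LIST, TRY_CHARS, char_major=True))
```

With the character-major order, replacements using frequent letters surface first regardless of where in the word they occur, which is why "maze" and "more" move ahead of "ooze".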
2008-07-15: Hunspell 1.2.6 release:
- bug fix release (fix affix rule condition checking of sk_SK dictionary,
  iconv support in stemming and morphological analysis of the Hunspell
  utility, see also Changelog)

2008-07-09: Hunspell 1.2.5 release:
- bug fix release (fix affix rule condition checking of en_GB dictionary,
  also morphological analysis by dictionaries with two-level suffixes)

2008-06-18: Hunspell 1.2.4-2 release:
- fix GCC compiler warnings

2008-06-17: Hunspell 1.2.4 release:
- add free_list() for C, C++ interfaces to deallocate suggestion lists

- bug fixes

2008-06-17: Hunspell 1.2.3 release:
- extended XML interface to use morphological functions by standard
  spell checking interface, spell() and suggest(). See hunspell.3 manual page.

- default dash suggestions for compound words: newword -> new word and new-word

- new manual pages: hunspell.3, hzip.1, hunzip.1.

- bug fixes

2008-04-12: Hunspell 1.2.2 release:
- extended dictionary (dic file) support to use multiple base and
  special dictionaries.

- new and improved options of command line hunspell:
  -m: morphological analysis or flag debug mode (without affix
      rule data it shows the flags of the affix rules)
  -s: stemming mode
  -D: list available dictionaries and search path
  -d: support extra dictionaries by comma separated list. Example:

  hunspell -d en_US,en_med,de_DE,de_med,de_geo UNESCO.txt

- forbidding in personal dictionary (with an asterisk; / marks affixation)

- optional compressed dictionary format "hzip" for aff and dic files
  usage:
  hzip example.aff example.dic
  mv example.aff example.dic /tmp
  hunspell -d example
  hunzip example.aff.hz >example.aff
  hunzip example.dic.hz >example.dic

- new affix compression tool "affixcompress": compression tool for
  large (millions of words) dictionaries.

- support encrypted dictionaries for closed OpenOffice.org extensions or
  other commercial programs

- improved manual

- bug fixes

2007-11-01: Hunspell 1.2.1 release:
- new memory efficient condition checking algorithm for affix rules

- new morphological functions:
  - stem() for stemming
  - analyze() for morphological analysis
  - generate() for morphological generation

- new demos:
  - analyze: stemming, morphological analysis and generation
  - chmorph: morphological conversion of texts

2007-09-05: Hunspell 1.1.12 release:
- dictionary based phonetic suggestion for words with
  special or foreign pronunciation or alternative (bad) transliteration
  (see Changelog, tests/phone.* and manual).

- improved data structure and memory optimization for dictionaries
  with variable count fields

- bug fixes for Unicode encoding dictionaries and ngram suggestions

- improved REP suggestions with space: it works without dictionary
  modification

- updated and new project files for Windows API

2007-08-27: Hunspell 1.1.11 release:
- portability fixes

2007-08-23: Hunspell 1.1.10 release:
- pronunciation based suggestion using Björn Jacke's original Aspell
  phonetic transcription algorithm (http://aspell.net), relicensed under
  GPL/LGPL/MPL tri-license with the permission of the author

- keyboard based suggestion by KEY (see manual)

- better time limits for suggestion search

- test environment for suggestion based on Wikipedia data

- bug fixes for non standard Mozilla platforms etc.

2007-07-25: Hunspell 1.1.9 release:
- better tokenization:
  - for URLs, mail addresses and directory paths (default: skip these tokens)
  - for colons in words (for Finnish and Swedish)

- new examples:
  - affixation of personal dictionary words
  - digits in words

- bug fixes (see ChangeLog)

2007-07-16: Hunspell 1.1.8 release:
- better Mac OS X/Cygwin and Windows compatibility

- fix Hunspell's Valgrind environment and memory handling errors
  detected by Valgrind

- other bug fixes (see ChangeLog)

2007-07-06: Hunspell 1.1.7 release:
- fix warning messages of OpenOffice.org build

2007-06-29: Hunspell 1.1.6 release:
- check capitalization of the following word forms
  - words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG
  - allcap words and suffixes: UNICEF's - UNICEF'S
  - prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA

- suggestion for missing sentence spacing: something.The -> something. The

- Hunspell executable: improved locale support
  - -i option: custom input encoding
  - use locale data for default dictionary names.
  - tools/hunspell.cxx: fix 8-bit tokenization (letters without
    casing, like ß or Hebrew characters now are handled well)
  - dictionary search path (automatic detection of OpenOffice.org directories)
  - DICPATH environmental variable
  - -D option: show directory path of loaded dictionary

- patches and bug fixes for Mozilla, OpenOffice.org.

2007-03-19: Hunspell 1.1.5 release:
- optimizations: 10-100% speed up, smaller code size and memory footprint
  (conditional experimental code and warning messages)

- extended Unicode support:
  - non BMP Unicode characters in dictionary words and affixes (except
    affix rules and conditions)
  - support BOM sequence in aff and dic files

- IGNORE feature for Arabic diacritics and other optional characters

- New edit distance suggestion methods:
  - capitalisation: nasa -> NASA
  - long swap: permenant -> permanent
  - long move: Ghandi -> Gandhi, greatful -> grateful
  - double two characters: vacacation -> vacation
  - spaces in REP sug.: REP alot a_lot (NOTE: "a lot" must be a dictionary word)

- patches and bug fixes for Mozilla, OpenOffice.org, Emacs, MinGW, Aqua,
  German and Arabic language, etc.

2006-02-01: Hunspell 1.1.4 release:
- Improved suggestion for typical OCR bugs (missing spaces between
  capitalized words). For example: "aNew" -> "a New".
  http://qa.openoffice.org/issues/show_bug.cgi?id=58202

- tokenization fixes (fix incomplete tokenization of input texts on big-endian
  platforms, and locale-dependent tokenization of dictionary entries)

2006-01-06: Hunspell 1.1.3.2 release:
- fix Visual C++ compiling errors

2006-01-05: Hunspell 1.1.3 release:
- GPL/LGPL/MPL tri-license for Mozilla integration

- Alias compression of flag sets and morphological descriptions.
  (For example, 16 MB Arabic dic file can be compressed to 1 MB.)

- Improved suggestion.

- Improved, language independent German sharp s casing with CHECKSHARPS
  declaration.

- Unicode tokenization in Hunspell program.

- Bug fixes (at new and old compound word handling methods), etc.

2005-11-11: Hunspell 1.1.2 release:

- Bug fixes (MAP Unicode, COMPOUND pattern matching, ONLYINCOMPOUND
  suggestions)

- Checked with 51 regression tests in Valgrind debugging environment,
  and tested with 52 OOo dictionaries on i686-pc-linux platform.

2005-11-09: Hunspell 1.1.1 release:

- Compound word patterns for complex compound word handling and
  simple word-level lexical scanning. Ideal for checking
  Arabic and Roman numbers, ordinal numbers in English, affixed
  numbers in agglutinative languages, etc.
  http://qa.openoffice.org/issues/show_bug.cgi?id=53643

- Support ISO-8859-15 encoding for French (French oe ligatures are
  missing from the latin-1 encoding).
  http://qa.openoffice.org/issues/show_bug.cgi?id=54980

- Implemented a flag to forbid obscene word suggestion:
  http://qa.openoffice.org/issues/show_bug.cgi?id=55498

- Checked with 50 regression tests in Valgrind debugging environment,
  and tested with 52 OOo dictionaries.

- other improvements and bug fixes (see ChangeLog)

2005-09-19: Hunspell 1.1.0 release

* complete comparison with MySpell 3.2 (from OpenOffice.org 2 beta)

* improved ngram suggestion with swap character detection and
  case insensitivity

------ examples for ngram improvement (input word and suggestions) -----

1. pernament (instead of permanent)

MySpell 3.2: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
ornament, ornamentals, ornamental, ornamentally

Hunspell 1.0.9: ornamental, ornament, tournament

Hunspell 1.1.0: permanent

Note: swap character detection


2. PERNAMENT (instead of PERMANENT)

MySpell 3.2: -

Hunspell 1.0.9: -

Hunspell 1.1.0: PERMANENT


3. Unesco (instead of UNESCO)

MySpell 3.2: Genesco, Ionesco, Genesco's, Ionesco's, Frescoing, Fresco's,
Frescoed, Fresco, Escorts, Escorting

Hunspell 1.0.9: Genesco, Ionesco, Fresco

Hunspell 1.1.0: UNESCO


4. siggraph's (instead of SIGGRAPH's)

MySpell 3.2: serigraph's, photograph's, serigraphs, physiography's,
physiography, digraphs, serigraph, stratigraphy's, stratigraphy,
epigraphs

Hunspell 1.0.9: serigraph's, epigraph's, digraph's

Hunspell 1.1.0: SIGGRAPH's

--------------- end of examples --------------------

* improved testing environment with suggestion checking and memory debugging

  memory debugging of all tests with a simple command:

  VALGRIND=memcheck make check

* lots of other improvements and bug fixes (see ChangeLog)


2005-08-26: Hunspell 1.0.9 release

* improved related character map suggestion

* improved ngram suggestion

------ examples for ngram improvement (O = old, N = new ngram suggestions) --

1. Permenant (instead of Permanent)

O: Endangerment, Ferment, Fermented, Deferment's, Empowerment,
Ferment's, Ferments, Fermenting, Countermen, Weathermen

N: Permanent, Supermen, Preferment

Note: Ngram suggestions were case sensitive.

2. permenant (instead of permanent)

O: supermen, newspapermen, empowerment, endangerment, preferments,
preferment, permanent, preferment's, permanently, impermanent

N: permanent, supermen, preferment

Note: new suggestions are also weighted with longest common subsequence,
first letter and common character positions

3. pernemant (instead of permanent)

O: pimpernel's, pimpernel, pimpernels, permanently, permanents, permanent,
supernatant, impermanent, semipermanent, impermanently

N: permanent, supernatant, pimpernel

Note: the new method also prefers the root word over irrelevant
affixes ('s, s and ly)


4. pernament (instead of permanent)

O: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
ornament, ornamentals, ornamental, ornamentally

N: ornamental, ornament, tournament

Note: Both ngram methods miss here.


5. obvus (instead of obvious):

O: obvious, Corvus, obverse, obviously, Jacobus, obtuser, obtuse,
obviates, obviate, Travus

N: obvious, obtuse, obverse

Note: the new method also prefers common first letters.


6. unambigus (instead of unambiguous)

O: unambiguous, unambiguity, unambiguously, ambiguously, ambiguous,
unambitious, ambiguities, ambiguousness

N: unambiguous, unambiguity, unambitious


7. consecvence (instead of consequence)

O: consecutive, consecutively, consecutiveness, nonconsecutive, consequence,
consecutiveness's, convenience's, consistences, consistence

N: consequence, consecutive, consecrates


An example in a language with rich morphology:

8. Misisipiben (instead of Mississippiben [`in Mississippi' in Hungarian]):

O: Misikédéiben, Pisisedéiben, Misikéiéiben, Pisisekéiben, Misikéiben,
Misikéidéiben, Misikékéiben, Misikéikéiben, Misikéiméiben, Mississippiiben

N: Mississippiben, Mississippiiben, Misiiben

Note: Suggesting irrelevant affixes was the biggest fault in ngram
suggestion for languages with a lot of affixes.

--------------- end of examples --------------------

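The notes in the examples above mention weighting n-gram suggestions with the longest common subsequence, the first letter and common character positions. The Python sketch below combines these four signals with equal weights. The weights, function names and scoring details are assumptions for illustration and do not reproduce Hunspell's actual scoring.

```python
def ngram_score(a, b, n=3):
    """Count substrings of `a` with length 1..n that also occur in `b`."""
    score = 0
    for size in range(1, n + 1):
        for i in range(len(a) - size + 1):
            if a[i:i + size] in b:
                score += 1
    return score


def lcs_length(a, b):
    """Length of the longest common subsequence (row-by-row dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ch in a:
        cur = [0]
        for j, other in enumerate(b, 1):
            if ch == other:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rank(misspelling, candidates):
    """Sort candidates by an equal-weight sum of the four signals."""
    def key(word):
        first_letter = 1 if word[:1] == misspelling[:1] else 0
        same_position = sum(x == y for x, y in zip(misspelling, word))
        return (ngram_score(misspelling, word) + lcs_length(misspelling, word)
                + first_letter + same_position)
    return sorted(candidates, key=key, reverse=True)


print(rank("obvus", ["Corvus", "obvious", "obtuse"]))
```

Even with these naive weights, "Corvus" drops behind "obtuse" for "obvus", because it shares neither the first letter nor any character position with the misspelling.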
* support twofold prefix cutting

* lots of other improvements and bug fixes (see ChangeLog)

* test Hunspell with 54 OpenOffice.org dictionaries:

source: ftp://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries

testing shell script:
-------------------------------------------------------
for i in `ls *zip | grep '^[a-z]*_[A-Z]*[.]'`
do
    dic=`basename $i .zip`
    mkdir $dic
    echo unzip $dic
    unzip -d $dic $i 2>/dev/null
    cd $dic
    echo unmunch and test $dic
    unmunch $dic.dic $dic.aff 2>/dev/null | awk '{print$0"\t"}' |
    hunspell -d $dic -l -1 >$dic.result 2>$dic.err || rm -f $dic.result
    cd ..
done
--------------------------------------------------------

test result (0 size is o.k.):

$ for i in *_*/*.result; do wc -c $i; done
0 af_ZA/af_ZA.result
0 bg_BG/bg_BG.result
0 ca_ES/ca_ES.result
0 cy_GB/cy_GB.result
0 cs_CZ/cs_CZ.result
0 da_DK/da_DK.result
0 de_AT/de_AT.result
0 de_CH/de_CH.result
0 de_DE/de_DE.result
0 el_GR/el_GR.result
6 en_AU/en_AU.result
0 en_CA/en_CA.result
0 en_GB/en_GB.result
0 en_NZ/en_NZ.result
0 en_US/en_US.result
0 eo_EO/eo_EO.result
0 es_ES/es_ES.result
0 es_MX/es_MX.result
0 es_NEW/es_NEW.result
0 fo_FO/fo_FO.result
0 fr_FR/fr_FR.result
0 ga_IE/ga_IE.result
0 gd_GB/gd_GB.result
0 gl_ES/gl_ES.result
0 he_IL/he_IL.result
0 hr_HR/hr_HR.result
200694989 hu_HU/hu_HU.result
0 id_ID/id_ID.result
0 it_IT/it_IT.result
0 ku_TR/ku_TR.result
0 lt_LT/lt_LT.result
0 lv_LV/lv_LV.result
0 mg_MG/mg_MG.result
0 mi_NZ/mi_NZ.result
0 ms_MY/ms_MY.result
0 nb_NO/nb_NO.result
0 nl_NL/nl_NL.result
0 nn_NO/nn_NO.result
0 ny_MW/ny_MW.result
0 pl_PL/pl_PL.result
0 pt_BR/pt_BR.result
0 pt_PT/pt_PT.result
0 ro_RO/ro_RO.result
0 ru_RU/ru_RU.result
0 rw_RW/rw_RW.result
0 sk_SK/sk_SK.result
0 sl_SI/sl_SI.result
0 sv_SE/sv_SE.result
0 sw_KE/sw_KE.result
0 tet_ID/tet_ID.result
0 tl_PH/tl_PH.result
0 tn_ZA/tn_ZA.result
0 uk_UA/uk_UA.result
0 zu_ZA/zu_ZA.result

In the en_AU dictionary, there is an abbreviation with two dots (`eqn..'), but
`eqn.' is missing. Presumably it is a dictionary bug. MySpell did not
accept it either.

The Hungarian dictionary contains pseudoroots and forbidden words.
Unmunch does not support these features yet, and generates bad words, too.

* check affix rules and OOo dictionaries. Detected bugs in the cs_CZ,
es_ES, es_NEW, es_MX, lt_LT, nn_NO, pt_PT, ro_RO, sk_SK and sv_SE dictionaries.

Details:
--------------------------------------------------------
cs_CZ
warning - incompatible stripping characters and condition:
SFX D us ech [^ighk]os
SFX D us y [^i]os
SFX Q os ech [^ghk]es
SFX M o ech [^ghkei]a
SFX J ém ej ám
SFX J ém ejme ám
SFX J ém ejte ám
SFX A oužit up oupit
SFX A oužit upme oupit
SFX A oužit upte oupit
SFX A nout l [aeiouyáéíóúýůěr][^aeiouyáéíóúýůěrl][^aeiouy
SFX A nout l [aeiouyáéíóúýůěr][^aeiouyáéíóúýůěrl][^aeiouy

es_ES
warning - incompatible stripping characters and condition:
SFX W umar úse [ae]husar
SFX W emir iñáis eñir

es_NEW
warning - incompatible stripping characters and condition:
SFX I unan únen unar

es_MX
warning - incompatible stripping characters and condition:
SFX A a ote e
SFX W umar úse [ae]husar
SFX W emir iñáis eñir

lt_LT
warning - incompatible stripping characters and condition:
SFX U ti siuosi tis
SFX U ti siuosi tis
SFX U ti siesi tis
SFX U ti siesi tis
SFX U ti sis tis
SFX U ti sis tis
SFX U ti simės tis
SFX U ti simės tis
SFX U ti sitės tis
SFX U ti sitės tis

nn_NO
warning - incompatible stripping characters and condition:
SFX D ar rar [^fmk]er
SFX U Øre orde ere
SFX U Øre ort ere

pt_PT
warning - incompatible stripping characters and condition:
SFX g ãos oas ão
SFX g ãos oas ão

ro_RO
warning - bad field number:
SFX L 0 le [^cg] i
SFX L 0 i [cg] i
SFX U 0 i [^i] ii
warning - incompatible stripping characters and condition:
SFX P l i l (<- there is an unnecessary tabulator here)
SFX I a ii [gc] a
warning - bad field number:
SFX I a ii [gc] a
SFX I a ei [^cg] a

sk_SK
warning - incompatible stripping characters and condition:
SFX T ľať olú klať
SFX T ľať olúc klať
SFX T sľať šlú slať
SFX T sľať šlúc slať
SFX R ľcť lčiem ĺcť
SFX R iásť ätie miasť
SFX R iezť iem [^i]ezť
SFX R iezť ieš [^i]ezť
SFX R iezť ie [^i]ezť
SFX R iezť eme [^i]ezť
SFX R iezť ete [^i]ezť
SFX R iezť ú [^i]ezť
SFX R iezť úc [^i]ezť
SFX R iezť z [^i]ezť
SFX R iezť me [^i]ezť
SFX R iezť te [^i]ezť

sv_SE
warning - bad field number:
SFX C 0 net nets [^e]n
--------------------------------------------------------

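The "incompatible stripping characters and condition" warning listed above can be understood with a small Python check. It is a sketch under assumptions: it handles only suffix rules, literal characters, dots and bracket classes, and the helper names are hypothetical; it does not reproduce Hunspell's own checker.

```python
import re


def condition_tokens(condition):
    """Split a condition such as '[^ighk]os' into one token per character slot."""
    return re.findall(r"\[[^\]]*\]|.", condition)


def token_matches(token, ch):
    """Check one character against a literal, '.', or a bracket class."""
    if token == ".":
        return True
    if token.startswith("[") and token.endswith("]"):
        body = token[1:-1]
        if body.startswith("^"):
            return ch not in body[1:]
        return ch in body
    return token == ch


def suffix_rule_is_consistent(strip, condition):
    """A suffix rule is consistent if the stripped ending can satisfy the
    end of the condition, compared character by character from the right."""
    if strip == "0":            # "0" means nothing is stripped
        return True
    tokens = condition_tokens(condition)
    for ch, token in zip(reversed(strip), reversed(tokens)):
        if not token_matches(token, ch):
            return False
    return True


# SFX D us ech [^ighk]os : the word must end in "os" but "us" is stripped.
print(suffix_rule_is_consistent("us", "[^ighk]os"))
```

A rule that strips "us" can never apply to a word that must end in "os", so such a rule is dead weight in the aff file, which is what the warning reports.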
2005-08-01: Hunspell 1.0.8 release

- improved compound word support
- fix German S handling
- port MySpell files and MAP feature

2005-07-22: Hunspell 1.0.7 release

2005-07-21: new home page: http://hunspell.sourceforge.net