2018-11-12: Hunspell 1.7.0 release:

New features and bug fixes by László Németh, supported by FSF.hu Foundation:

- No more annoying suggestion delays, especially in languages with
  compound word handling and complex morphology. Thanks to balanced
  multi-level time limits, suggestions are now guaranteed within half
  a second, instead of seconds (or dozens of seconds or more in extreme
  cases), even for longer misspellings.

- add SPELLML support for run-time dictionary extension with optional
  affixation of user words. See the new "Grammar By" feature of
  language-specific user dictionaries in LibreOffice 6.0:

  News: https://wiki.documentfoundation.org/ReleaseNotes/6.0#.E2.80.9CGrammar_By.E2.80.9D_spell_checking

  Screencast with English example: https://www.youtube.com/watch?v=EsS3gaBTfOo

  Screencast with German example: https://www.youtube.com/watch?v=aYVFDqCUb6I

- Improved, highly customizable suggestions at the level of dictionary words:
  pronunciations and typical misspellings defined by optional "ph:" fields
  of dictionary words are used not only in n-gram suggestions, but also as
  elements of the REP replacement list, which gets the highest priority in
  normal suggestions and gives the best suggestions for short words, too.
  More information: see "ph:" in man 5 hunspell.

- Handling multiple-word suggestions is much easier. As in a traditional
  spelling dictionary, to get the correct suggestion "a lot" in first place
  for the typical misspelling "alot", it is now enough to put the following
  line into the dic(tionary) file:

  a lot

- Limit compound overgeneration by dictionary-based word pairs:
  now it's possible to filter out bad compound words by listing the
  correct word pairs, separated by a space, in the dictionary, as in a
  traditional spelling dictionary.

- clean-up of suggestions:

  - no n-gram or compound word suggestions if a "good" suggestion
    exists, i.e. uppercase, REP, ph: or dictionary word pair suggestions

  - word pairs are always suggested if they exist in the dic file

  - word pairs have top priority in suggestions, and
    they are the only suggestions if there is no other good suggestion.

  - dictionary word pairs separated by a dash instead of a space
    are also handled specially in two-word suggestions (depending on the
    language)

  - limit bad suggestions by improved n-gram suggestion rules:

    don't suggest capitalized dictionary words for lowercase
    misspellings in n-gram suggestions, except

    - PHONE usage, or
    - in the case of German, where not only proper
      nouns are capitalized, or
    - the capitalized word has a special pronunciation

    and don't suggest words whose length differs from the
    misspelling by 5 or more characters.

- Extend the dotless i and dotted I rules to the Crimean Tatar language:
  allow dotted I in the dictionary, and disable bad capitalization of i.

- BREAK: extended the recursive word breaking algorithm to handle words
  (or words with suffixes) that already contain word break characters.
  For example, "e-mail" is a dictionary word with a word break character,
  and it wasn't accepted in compounds in some languages before.

- FORBIDDENWORD precedes BREAK: now it's possible to forbid compound
  forms recognized by BREAK word breaking by adding the bad compounds to
  the dictionary with the FORBIDDENWORD flag.

- lower limit for the "doubletwochars" suggestion algorithm:
  syllable duplication is one of the typical misspellings recognized by
  Hunspell's suggestion mechanism. Besides the old pattern
  ABABA -> ABA, for example nutrITITIon -> nutrITIon, the simpler
  ABAB -> AB pattern is now also recognized in non-starting positions,
  for example regretTETEd -> regretTEd.

- lower limit for longswapchar and movechar: only distances of at most
  4 characters are recognized, to avoid slow and bad suggestions.

- fix compound handling for the new Hungarian orthography reform

- Allow suggestion search for prefix + *two suffixes*:
  remove an artificial performance limit to get correct
  suggestions for relatively simple misspellings in
  Hungarian, etc., when the word form contains a prefix
  and both derivational and inflectional suffixes, too:

  lefikszálása -> lefixálása
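
The doubletwochars and longswapchar rules described above can be modelled as candidate generators whose output is filtered against the dictionary. The Python sketch below is illustrative only, not Hunspell's implementation: the function names mirror the option names, while the `i > 0` test standing in for "non-starting position" and the 2..4 swap distance are assumptions read from these notes.

```python
def doubletwochars(word):
    """Drop one copy of a repeated two-character chunk (ABAB -> AB).

    A chunk that continues as ABABA may start anywhere; a plain ABAB
    repetition is only tried away from the start of the word.
    """
    candidates = []
    for i in range(len(word) - 3):
        if word[i:i + 2] == word[i + 2:i + 4]:
            ababa = i + 4 < len(word) and word[i + 4] == word[i]
            if ababa or i > 0:
                cand = word[:i] + word[i + 2:]
                if cand not in candidates:
                    candidates.append(cand)
    return candidates


def longswapchar(word, max_distance=4):
    """Swap two characters that are 2..max_distance positions apart."""
    chars = list(word)
    candidates = []
    for i in range(len(chars)):
        for j in range(i + 2, min(i + max_distance, len(chars) - 1) + 1):
            if chars[i] == chars[j]:
                continue  # swapping equal characters changes nothing
            swapped = chars[:]
            swapped[i], swapped[j] = swapped[j], swapped[i]
            cand = "".join(swapped)
            if cand not in candidates:
                candidates.append(cand)
    return candidates


def suggest(word, dictionary):
    """Keep only the candidates that are dictionary words."""
    return [c for c in doubletwochars(word) + longswapchar(word) if c in dictionary]
```

For example, `suggest("permenant", {"permanent"})` yields `["permanent"]`, and `doubletwochars("nutritition")` includes `"nutrition"`.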

Improvements for command-line Hunspell:

- Remove false alarms when checking OpenDocument (ODF)
  documents by ignoring <text:span> elements. (LibreOffice
  creates a lot of <text:span> elements, even within words,
  during text re-editing, which often resulted in a huge number
  of broken words before this fix.)

- List file names when filtering multiple files on the command line:

  Examples:

  $ hunspell -l *.odt
  a.odt: mispelling
  b.odt: egzample

  $ hunspell -l -G *.odt
  a.odt: good
  b.odt: words

- Dictionary search with option -D no longer waits for standard input
  (fixed by Siva Mahadevan)

Other improvements:

- makealias dictionary compression: add option --minimize-diff
  to reuse free positions of alias lists to create minimal and
  readable diffs for alias-compressed dictionaries stored in
  revision control systems, such as the dictionaries of LibreOffice.

- Brazilian Portuguese translation by Rafael Fontenelle

- Catalan translation by robert dot buj at gmail

- Minor bug fixes by several contributors, see git log

2017-09-03: Hunspell 1.6.2 release:
- Library changes: none. Same as 1.6.1.
- Command line tool:
  - Added German translation
  - Fixed a bug with wrong output encoding that did not respect the system locale.

2017-03-25: Hunspell 1.6.1 release:
- Library changes:
  - Performance improvements in suggest()
  - Fixes regressions for Hungarian related to compounding.
  - Fixes regressions for Korean related to ICONV.
- Command line tool:
  - Added Tajik translation
  - Fix regarding searching of OOo dicts installed in user folder
- Manpages:
  - Fix microsoft-cp1251 to cp1251. Dicts should not use the former.
  - Typos.

2016-12-22: Hunspell 1.6.0 release:
- Library changes:
  - Performance improvement in ngsuggest(), suggestions should be faster.
  - Revert MAXWORDLEN to 100 as in 1.3.3 for performance reasons.
  - MAXWORDLEN can be set at build time with -D defines.
  - Fix crash when a word with 102 consecutive X characters is spelled.
- Command line tool:
  - -D shows all loaded dictionaries instead of only the first.
  - -D properly lists all available dictionaries on Windows.

2016-11-30: Hunspell 1.5.4 release:
- Fixes the command COMPOUNDSYLLABLE used in the Hungarian dictionary.

2016-11-28: Hunspell 1.5.3 release:
- Removed a #include from hunspell.hxx that was creating trouble

2016-11-27: Hunspell 1.5.2 release:
- Reverted full backward compatibility with 1.4 public API, again

2016-11-27: Hunspell 1.5.1 release:
- Reverted full backward compatibility with 1.4 public API

2016-11-18: Hunspell 1.5.0 release:
- Lots of stability fixes
- Fixed compilation errors on various systems (Windows, FreeBSD)
- Small performance improvement compared to 1.4.0
- The C++ API is updated to use modern C++ types (string, vector).
  Backward compatibility is kept for most of the functions except for
  the following:
  - get_wordchars();
  - get_version();
  - input_conv(string, string);
  - removed get_csconv();

2016-04-15: Hunspell 1.4.0 release:
- various ABI changes due to moving away from char* to std::string

2014-06-02: Hunspell 1.3.3 release:
- OpenDocument (ODF and Flat ODF) support (ODF needs the unzip program)
- various bug fixes

2011-02-02: Hunspell 1.3.2 release:
- fix library versioning
- improved manual

2011-02-02: Hunspell 1.3.1 release:
- bug fixes

2011-01-26: Hunspell 1.2.15/1.3 release:
- new features: MAXDIFF, ONLYMAXDIFF, MAXCPDSUGS, FORBIDWARN, see manual
- bug fixes

2011-01-21:
- new features: FORCEUCASE and WARN, see manual
- new options: -r to filter potential mistakes (rare words
  marked with the WARN flag in the dictionary)
- limited and optimized suggestions

2011-01-06: Hunspell 1.2.14 release:
- bug fix

2011-01-03: Hunspell 1.2.13 release:
- bug fixes
- improved compound handling and
  other improvements supported by OpenTaal Foundation, Netherlands

2010-07-15: Hunspell 1.2.12 release

2010-05-06: Hunspell 1.2.11 release:
- Maintenance release bug fixes

2010-04-30: Hunspell 1.2.10 release:
- Maintenance release bug fixes

2010-03-03: Hunspell 1.2.9 release:
- Maintenance release bug fixes and warnings
- MAP support for composed characters or character sequences

2008-11-01: Hunspell 1.2.8 release:
- Default BREAK feature and better hyphenated word suggestion to accept
  and fix (compound) words with hyphen characters in the spell checker
  instead of in the word breaking code of OpenOffice.org. With this feature
  it's possible to accept hyphenated compound words, such as "scot-free",
  where "scot" is not a correct English word.

- ICONV & OCONV: input and output conversion tables for optional character
  handling or for using a special inner format. Example:

  # Accepting de facto replacements of the Romanian comma-below letters
  SET UTF-8
  ICONV 4
  ICONV ş ș
  ICONV ţ ț
  ICONV Ş Ș
  ICONV Ţ Ț

  Typical usage of ICONV/OCONV is to manage an inner format for a segmental
  writing system, like the Ethiopic script of the Amharic language.

- Extended CHECKCOMPOUNDPATTERN to handle compound word alternations, like
  the sandhi feature of Telugu and other writing systems.

- SIMPLIFIEDTRIPLE compound word feature: allow simplified Swedish and
  Norwegian compound word forms, like tillåta (till|låta) and
  bussjåfør (buss|sjåfør)

- wordforms: word generator script for dictionary developers (Hunspell
  version of unmunch).

- bug fixes
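
An input conversion table can be pictured as a substitution pass applied to each word before dictionary lookup. The Python sketch below applies the four Romanian pairs from the ICONV example; the left-to-right, longest-match strategy and the names `ICONV_TABLE` and `convert` are assumptions for illustration, so check man 5 hunspell for the exact ICONV/OCONV semantics.

```python
# Hypothetical in-memory version of the four ICONV lines in the example:
# cedilla forms (de facto replacements) -> Romanian comma-below letters.
ICONV_TABLE = {
    "\u015f": "\u0219",  # ş -> ș
    "\u0163": "\u021b",  # ţ -> ț
    "\u015e": "\u0218",  # Ş -> Ș
    "\u0162": "\u021a",  # Ţ -> Ț
}


def convert(text, table):
    """Rewrite text left to right, trying longer patterns first."""
    patterns = sorted(table, key=len, reverse=True)
    out = []
    i = 0
    while i < len(text):
        for pattern in patterns:
            if text.startswith(pattern, i):
                out.append(table[pattern])
                i += len(pattern)
                break
        else:
            out.append(text[i])  # no pattern starts here: copy as-is
            i += 1
    return "".join(out)
```

With this table, a word typed with ţ is looked up with ț, so both spellings hit the same dictionary entry.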

2008-08-15: Hunspell 1.2.7 release:
- FULLSTRIP: new option for affix handling. With FULLSTRIP, affix rules can
  strip entire words, not just all but one of their characters.
- COMPOUNDRULE works with all flag types. (COMPOUNDRULE is for pattern
  matching. For example, the en_US dictionary of OpenOffice.org uses COMPOUNDRULE
  for ordinal number recognition: 1st, 2nd, 11th, 12th, 22nd, 112th, 1000122nd
  etc.).
- optimized suggestions:
  - modified 1-character distance suggestion algorithms: search a TRY character
    in all positions instead of all TRY characters in a character position
    (this can give a more readable suggestion order, and better suggestions
    in the first positions when TRY characters are sorted by frequency.)
    For example, suggestions for "moze":
    ooze, doze, Roze, maze, more etc. (Hunspell 1.2.6),
    maze, more, mote, ooze, mole etc. (Hunspell 1.2.7).
  - extended compound word checking for better COMPOUNDRULE related
    suggestions, for example English ordinal numbers: 121323th -> 121323rd
    (this also needs a th->rd REP definition).
- bug fixes
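
The set of ordinals that the en_US COMPOUNDRULE entries are meant to accept can be written as a regular expression, which makes the 11th/12th/13th exception and the 121323th -> 121323rd correction easy to see. This Python sketch models that language only; it does not use COMPOUNDRULE syntax or REP tables, and `is_ordinal`, `ordinal_suffix` and `suggest_ordinal` are hypothetical helpers.

```python
import re

# Ordinals whose last two digits are 10-19 always take "th";
# otherwise the last digit selects st/nd/rd/th.
ORDINAL = re.compile(r"\d*1\dth|(?:\d*[02-9])?(?:1st|2nd|3rd|[04-9]th)")


def is_ordinal(word):
    return ORDINAL.fullmatch(word) is not None


def ordinal_suffix(number):
    if number % 100 in (11, 12, 13):
        return "th"
    return {1: "st", 2: "nd", 3: "rd"}.get(number % 10, "th")


def suggest_ordinal(word):
    """Return the corrected ordinal, or None if word is fine or not a number."""
    m = re.fullmatch(r"(\d+)(?:st|nd|rd|th)", word)
    if m is None or is_ordinal(word):
        return None
    return m.group(1) + ordinal_suffix(int(m.group(1)))
```

For example, `suggest_ordinal("121323th")` returns `"121323rd"`, while every ordinal listed above passes `is_ordinal`.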

2008-07-15: Hunspell 1.2.6 release:
- bug fix release (fix affix rule condition checking of the sk_SK dictionary,
  iconv support in stemming and morphological analysis of the Hunspell
  utility, see also Changelog)

2008-07-09: Hunspell 1.2.5 release:
- bug fix release (fix affix rule condition checking of the en_GB dictionary,
  and morphological analysis with dictionaries with two-level suffixes)

2008-06-18: Hunspell 1.2.4-2 release:
- fix GCC compiler warnings

2008-06-17: Hunspell 1.2.4 release:
- add free_list() for C, C++ interfaces to deallocate suggestion lists

- bug fixes

2008-06-17: Hunspell 1.2.3 release:
- extended XML interface to use morphological functions through the standard
  spell checking interface, spell() and suggest(). See hunspell.3 manual page.

- default dash suggestions for compound words: newword -> new word and new-word

- new manual pages: hunspell.3, hzip.1, hunzip.1.

- bug fixes

2008-04-12: Hunspell 1.2.2 release:
- extended dictionary (dic file) support to use multiple base and
  special dictionaries.

- new and improved options of command line hunspell:
  -m: morphological analysis or flag debug mode (without affix
      rule data it shows the flags of the affix rules)
  -s: stemming mode
  -D: list available dictionaries and search path
  -d: support extra dictionaries by comma-separated list. Example:

  hunspell -d en_US,en_med,de_DE,de_med,de_geo UNESCO.txt

- forbidding words in the personal dictionary (with an asterisk; / marks affixation)

- optional compressed dictionary format "hzip" for aff and dic files
  usage:
  hzip example.aff example.dic
  mv example.aff example.dic /tmp
  hunspell -d example
  hunzip example.aff.hz >example.aff
  hunzip example.dic.hz >example.dic

- new affix compression tool "affixcompress": compression tool for
  large (millions of words) dictionaries.

- support encrypted dictionaries for closed OpenOffice.org extensions or
  other commercial programs

- improved manual

- bug fixes

2007-11-01: Hunspell 1.2.1 release:
- new memory efficient condition checking algorithm for affix rules

- new morphological functions:
  - stem() for stemming
  - analyze() for morphological analysis
  - generate() for morphological generation

- new demos:
  - analyze: stemming, morphological analysis and generation
  - chmorph: morphological conversion of texts

2007-09-05: Hunspell 1.1.12 release:
- dictionary based phonetic suggestion for words with
  special or foreign pronunciation or alternative (bad) transliteration
  (see Changelog, tests/phone.* and manual).

- improved data structure and memory optimization for dictionaries
  with variable count fields

- bug fixes for Unicode encoding dictionaries and ngram suggestions

- improved REP suggestions with space: it works without dictionary
  modification

- updated and new project files for Windows API

2007-08-27: Hunspell 1.1.11 release:
- portability fixes

2007-08-23: Hunspell 1.1.10 release:
- pronunciation based suggestion using Björn Jacke's original Aspell
  phonetic transcription algorithm (http://aspell.net), relicensed under
  GPL/LGPL/MPL tri-license with the permission of the author

- keyboard based suggestion by KEY (see manual)

- better time limits for suggestion search

- test environment for suggestion based on Wikipedia data

- bug fixes for non-standard Mozilla platforms etc.
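
KEY lists keyboard rows, so a typo can be repaired by trying the neighbouring keys of each character. The Python sketch assumes the `|`-separated row format shown for QWERTY in man 5 hunspell (`qwertyuiop|asdfghjkl|zxcvbnm`) and tries a single substitution per candidate; the function names are hypothetical and the search is a simplification of Hunspell's suggestion code.

```python
KEY = "qwertyuiop|asdfghjkl|zxcvbnm"  # groups of keys, one per keyboard row


def key_neighbours(key_spec):
    """Map each character to the characters beside it in its KEY group."""
    neighbours = {}
    for group in key_spec.split("|"):
        for i, ch in enumerate(group):
            near = neighbours.setdefault(ch, set())
            if i > 0:
                near.add(group[i - 1])
            if i + 1 < len(group):
                near.add(group[i + 1])
    return neighbours


def keyboard_suggestions(word, key_spec, dictionary):
    """Replace one character by a neighbouring key; keep dictionary words."""
    neighbours = key_neighbours(key_spec)
    found = []
    for i, ch in enumerate(word):
        for alt in sorted(neighbours.get(ch, ())):
            cand = word[:i] + alt + word[i + 1:]
            if cand in dictionary and cand not in found:
                found.append(cand)
    return found
```

For example, `keyboard_suggestions("wotd", KEY, {"word", "ward"})` returns `["word"]`, because r sits next to t on the top row.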

2007-07-25: Hunspell 1.1.9 release:
- better tokenization:
  - for URLs, mail addresses and directory paths (default: skip these tokens)
  - for colons in words (for Finnish and Swedish)

- new examples:
  - affixation of personal dictionary words
  - digits in words

- bug fixes (see ChangeLog)

2007-07-16: Hunspell 1.1.8 release:
- better Mac OS X/Cygwin and Windows compatibility

- fix Hunspell's Valgrind environment and memory handling errors
  detected by Valgrind

- other bug fixes (see ChangeLog)

2007-07-06: Hunspell 1.1.7 release:
- fix warning messages of OpenOffice.org build

2007-06-29: Hunspell 1.1.6 release:
- check capitalization of the following word forms
  - words with mixed capitalisation: OpenOffice.org - OPENOFFICE.ORG
  - allcap words and suffixes: UNICEF's - UNICEF'S
  - prefixes with apostrophe and proper names: Sant'Elia - SANT'ELIA

- suggestion for missing sentence spacing: something.The -> something. The

- Hunspell executable: improved locale support
  - -i option: custom input encoding
  - use locale data for default dictionary names.
  - tools/hunspell.cxx: fix 8-bit tokenization (letters without
    casing, like ß or Hebrew characters, are now handled well)
  - dictionary search path (automatic detection of OpenOffice.org directories)
    - DICPATH environment variable
    - -D option: show directory path of loaded dictionary

- patches and bug fixes for Mozilla, OpenOffice.org.
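
The capitalization checks above amount to accepting the all-caps form of any dictionary word, including mixed-case entries with dots or apostrophes. A minimal Python sketch, assuming a plain set of dictionary words; the capitalization class names and helper functions are invented for the example and do not reflect Hunspell's internal handling.

```python
def captype(word):
    """Rough capitalization class: 'lower', 'init', 'all' or 'mixed'."""
    letters = [c for c in word if c.isalpha()]
    if not letters or all(c.islower() for c in letters):
        return "lower"
    if all(c.isupper() for c in letters):
        return "all"
    if letters[0].isupper() and all(c.islower() for c in letters[1:]):
        return "init"
    return "mixed"


def accept(word, dictionary):
    """Accept exact matches, capitalized lowercase entries and ALL-CAPS forms."""
    if word in dictionary:
        return True
    kind = captype(word)
    if kind == "init":
        # "Hello" is fine when "hello" is listed
        return word.lower() in dictionary
    if kind == "all":
        # "OPENOFFICE.ORG" is fine when "OpenOffice.org" is listed
        return any(entry.upper() == word for entry in dictionary)
    return False
```

With `{"OpenOffice.org", "UNICEF's", "Sant'Elia"}` as the dictionary, the all-caps forms OPENOFFICE.ORG, UNICEF'S and SANT'ELIA are accepted, while the miscapitalized Openoffice.org is not.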

2007-03-19: Hunspell 1.1.5 release:
- optimizations: 10-100% speed-up, smaller code size and memory footprint
  (conditional experimental code and warning messages)

- extended Unicode support:
  - non-BMP Unicode characters in dictionary words and affixes (except
    affix rules and conditions)
  - support BOM sequence in aff and dic files

- IGNORE feature for Arabic diacritics and other optional characters

- New edit distance suggestion methods:
  - capitalisation: nasa -> NASA
  - long swap: permenant -> permanent
  - long move: Ghandi -> Gandhi, greatful -> grateful
  - double two characters: vacacation -> vacation
  - spaces in REP sug.: REP alot a_lot (NOTE: "a lot" must be a dictionary word)

- patches and bug fixes for Mozilla, OpenOffice.org, Emacs, MinGW, Aqua,
  German and Arabic language, etc.
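
A long move takes one character out of the word and reinserts it a few positions away, as in Ghandi -> Gandhi. The Python sketch below generates such candidates, using the 4-character distance limit from the 1.7.0 notes above as its default and skipping moves by one position (those are ordinary adjacent swaps); `movechar` is a hypothetical name, and the candidates would still need a dictionary check.

```python
def movechar(word, max_distance=4):
    """Move one character 2..max_distance places left or right."""
    candidates = []
    for i, ch in enumerate(word):
        rest = word[:i] + word[i + 1:]  # word without the moved character
        for j in range(max(0, i - max_distance), min(len(rest), i + max_distance) + 1):
            if abs(j - i) < 2:
                continue  # 0 gives the word back, 1 is an adjacent swap
            cand = rest[:j] + ch + rest[j:]
            if cand != word and cand not in candidates:
                candidates.append(cand)
    return candidates
```

Both examples from this release are reachable: `movechar("Ghandi")` contains "Gandhi" (h moved three places) and `movechar("greatful")` contains "grateful" (e moved two places).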

2006-02-01: Hunspell 1.1.4 release:
- Improved suggestion for typical OCR bugs (missing spaces between
  capitalized words). For example: "aNew" -> "a New".
  http://qa.openoffice.org/issues/show_bug.cgi?id=58202

- tokenization fixes (fix incomplete tokenization of input texts on big-endian
  platforms, and locale-dependent tokenization of dictionary entries)
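
The OCR fix above can be approximated by splitting a word where a lowercase letter is followed by an uppercase one and checking both halves. The Python sketch assumes a lowercase word set and accepts a capitalized right-hand part when its lowercase form is listed; the function name and this casing rule are assumptions, not Hunspell's code.

```python
def split_at_case_change(word, dictionary):
    """Suggest "a New" for "aNew" when both halves are dictionary words."""
    suggestions = []
    for i in range(1, len(word)):
        if word[i - 1].islower() and word[i].isupper():
            left, right = word[:i], word[i:]
            right_ok = right in dictionary or right.lower() in dictionary
            if left in dictionary and right_ok:
                suggestions.append(left + " " + right)
    return suggestions
```

For example, `split_at_case_change("aNew", {"a", "new"})` returns `["a New"]`.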

2006-01-06: Hunspell 1.1.3.2 release:
- fix Visual C++ compiling errors

2006-01-05: Hunspell 1.1.3 release:
- GPL/LGPL/MPL tri-license for Mozilla integration

- Alias compression of flag sets and morphological descriptions.
  (For example, a 16 MB Arabic dic file can be compressed to 1 MB.)

- Improved suggestion.

- Improved, language-independent German sharp s casing with CHECKSHARPS
  declaration.

- Unicode tokenization in Hunspell program.

- Bug fixes (in new and old compound word handling methods), etc.
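
Alias compression replaces each distinct flag set with its 1-based position in an AF table, so the dic file stores short numbers instead of repeated flag strings. The Python sketch below follows the AF layout described in man 5 hunspell (a count line, then one flag set per line); it handles single-character flags only, sorts each flag set so that identical sets share an alias, and ignores morphological descriptions, so treat it as an illustration rather than a replacement for the makealias tool.

```python
def alias_compress(entries):
    """Return (aff_lines, dic_lines) using AF aliases for flag sets.

    entries is a list of (word, flags) pairs with single-character
    flags such as ("work", "ABC"); an empty string means no flags.
    """
    aliases = {}  # sorted flag string -> 1-based alias number
    dic_lines = []
    for word, flags in entries:
        if not flags:
            dic_lines.append(word)
            continue
        key = "".join(sorted(flags))
        number = aliases.setdefault(key, len(aliases) + 1)
        dic_lines.append(f"{word}/{number}")
    aff_lines = [f"AF {len(aliases)}"] + [f"AF {key}" for key in aliases]
    return aff_lines, dic_lines
```

For four entries sharing two flag sets, the aff side gets `AF 2`, `AF ABC`, `AF D` and the dic lines become `work/1`, `play/1`, `run/2`.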

2005-11-11: Hunspell 1.1.2 release:

- Bug fixes (MAP Unicode, COMPOUND pattern matching, ONLYINCOMPOUND
  suggestions)

- Checked with 51 regression tests in Valgrind debugging environment,
  and tested with 52 OOo dictionaries on i686-pc-linux platform.

2005-11-09: Hunspell 1.1.1 release:

- Compound word patterns for complex compound word handling and
  simple word-level lexical scanning. Ideal for checking
  Arabic and Roman numbers, ordinal numbers in English, affixed
  numbers in agglutinative languages, etc.
  http://qa.openoffice.org/issues/show_bug.cgi?id=53643

- Support ISO-8859-15 encoding for French (French oe ligatures are
  missing from the latin-1 encoding).
  http://qa.openoffice.org/issues/show_bug.cgi?id=54980

- Implemented a flag to forbid obscene word suggestion:
  http://qa.openoffice.org/issues/show_bug.cgi?id=55498

- Checked with 50 regression tests in Valgrind debugging environment,
  and tested with 52 OOo dictionaries.

- other improvements and bug fixes (see ChangeLog)

2005-09-19: Hunspell 1.1.0 release

* complete comparison with MySpell 3.2 (from OpenOffice.org 2 beta)

* improved ngram suggestion with swap character detection and
  case insensitivity

------ examples for ngram improvement (input word and suggestions) -----

1. pernament (instead of permanent)

MySpell 3.2: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
  ornament, ornamentals, ornamental, ornamentally

Hunspell 1.0.9: ornamental, ornament, tournament

Hunspell 1.1.0: permanent

Note: swap character detection

2. PERNAMENT (instead of PERMANENT)

MySpell 3.2: -

Hunspell 1.0.9: -

Hunspell 1.1.0: PERMANENT

3. Unesco (instead of UNESCO)

MySpell 3.2: Genesco, Ionesco, Genesco's, Ionesco's, Frescoing, Fresco's,
  Frescoed, Fresco, Escorts, Escorting

Hunspell 1.0.9: Genesco, Ionesco, Fresco

Hunspell 1.1.0: UNESCO

4. siggraph's (instead of SIGGRAPH's)

MySpell 3.2: serigraph's, photograph's, serigraphs, physiography's,
  physiography, digraphs, serigraph, stratigraphy's, stratigraphy,
  epigraphs

Hunspell 1.0.9: serigraph's, epigraph's, digraph's

Hunspell 1.1.0: SIGGRAPH's

--------------- end of examples --------------------

* improved testing environment with suggestion checking and memory debugging

  memory debugging of all tests with a simple command:

  VALGRIND=memcheck make check

* lots of other improvements and bug fixes (see ChangeLog)
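
The n-gram suggestion described here (and in the 1.0.9 entry below, which also weights by longest common subsequence) ranks dictionary words by the letter sequences they share with the misspelling. The Python sketch below is a deliberately simple scorer: it compares words in lower case, counts shared 1- to 3-grams and breaks ties by LCS length. The weights and function names are assumptions, and it does not implement swap character detection, so it is not Hunspell's ngsuggest() algorithm.

```python
def ngram_score(a, b, max_n=3):
    """Count n-grams (n = 1..max_n) of a that also occur in b, ignoring case."""
    a, b = a.lower(), b.lower()
    score = 0
    for n in range(1, max_n + 1):
        for i in range(len(a) - n + 1):
            if a[i:i + n] in b:
                score += 1
    return score


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, start=1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def rank(word, candidates, max_n=3):
    """Sort candidates by n-gram score, then LCS length, best first."""
    return sorted(
        candidates,
        key=lambda c: (ngram_score(word, c, max_n), lcs_length(word.lower(), c.lower())),
        reverse=True,
    )
```

Because the comparison is case-insensitive, `rank("Unesco", ["Genesco", "UNESCO"])` puts "UNESCO" first (score 15 against 12).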

2005-08-26: Hunspell 1.0.9 release

* improved related character map suggestion

* improved ngram suggestion

------ examples for ngram improvement (O=old, N = new ngram suggestions) --

1. Permenant (instead of Permanent)

O: Endangerment, Ferment, Fermented, Deferment's, Empowerment,
   Ferment's, Ferments, Fermenting, Countermen, Weathermen

N: Permanent, Supermen, Preferment

Note: Ngram suggestions were case sensitive.

2. permenant (instead of permanent)

O: supermen, newspapermen, empowerment, endangerment, preferments,
   preferment, permanent, preferment's, permanently, impermanent

N: permanent, supermen, preferment

Note: new suggestions are also weighted with longest common subsequence,
first letter and common character positions

3. pernemant (instead of permanent)

O: pimpernel's, pimpernel, pimpernels, permanently, permanents, permanent,
   supernatant, impermanent, semipermanent, impermanently

N: permanent, supernatant, pimpernel

Note: the new method also prefers the root word over irrelevant
affixes ('s, s and ly)

4. pernament (instead of permanent)

O: tournaments, tournament, ornaments, ornament's, ornamenting, ornamented,
   ornament, ornamentals, ornamental, ornamentally

N: ornamental, ornament, tournament

Note: Both ngram methods miss here.

5. obvus (instead of obvious):

O: obvious, Corvus, obverse, obviously, Jacobus, obtuser, obtuse,
   obviates, obviate, Travus

N: obvious, obtuse, obverse

Note: the new method also prefers common first letters.

6. unambigus (instead of unambiguous)

O: unambiguous, unambiguity, unambiguously, ambiguously, ambiguous,
   unambitious, ambiguities, ambiguousness

N: unambiguous, unambiguity, unambitious

7. consecvence (instead of consequence)

O: consecutive, consecutively, consecutiveness, nonconsecutive, consequence,
   consecutiveness's, convenience's, consistences, consistence

N: consequence, consecutive, consecrates

An example in a language with rich morphology:

8. Misisipiben (instead of Mississippiben [`in Mississippi' in Hungarian]):

O: Misik<69>d<EFBFBD>iben, Pisised<65>iben, Misik<69>i<EFBFBD>iben, Pisisek<65>iben, Misik<69>iben,
   Misik<69>id<69>iben, Misik<69>k<EFBFBD>iben, Misik<69>ik<69>iben, Misik<69>im<69>iben, Mississippiiben

N: Mississippiben, Mississippiiben, Misiiben

Note: Suggesting irrelevant affixes was the biggest fault in ngram
suggestion for languages with a lot of affixes.

--------------- end of examples --------------------

* support twofold prefix cutting

* lots of other improvements and bug fixes (see ChangeLog)

* test Hunspell with 54 OpenOffice.org dictionaries:

  source: ftp://ftp.services.openoffice.org/pub/OpenOffice.org/contrib/dictionaries

  testing shell script:
  -------------------------------------------------------
  for i in `ls *zip | grep '^[a-z]*_[A-Z]*[.]'`
  do
    dic=`basename $i .zip`
    mkdir $dic
    echo unzip $dic
    unzip -d $dic $i 2>/dev/null
    cd $dic
    echo unmunch and test $dic
    unmunch $dic.dic $dic.aff 2>/dev/null | awk '{print$0"\t"}' |
    hunspell -d $dic -l -1 >$dic.result 2>$dic.err || rm -f $dic.result
    cd ..
  done
  --------------------------------------------------------

  test result (0 size is o.k.):

  $ for i in *_*/*.result; do wc -c $i; done
  0 af_ZA/af_ZA.result
  0 bg_BG/bg_BG.result
  0 ca_ES/ca_ES.result
  0 cy_GB/cy_GB.result
  0 cs_CZ/cs_CZ.result
  0 da_DK/da_DK.result
  0 de_AT/de_AT.result
  0 de_CH/de_CH.result
  0 de_DE/de_DE.result
  0 el_GR/el_GR.result
  6 en_AU/en_AU.result
  0 en_CA/en_CA.result
  0 en_GB/en_GB.result
  0 en_NZ/en_NZ.result
  0 en_US/en_US.result
  0 eo_EO/eo_EO.result
  0 es_ES/es_ES.result
  0 es_MX/es_MX.result
  0 es_NEW/es_NEW.result
  0 fo_FO/fo_FO.result
  0 fr_FR/fr_FR.result
  0 ga_IE/ga_IE.result
  0 gd_GB/gd_GB.result
  0 gl_ES/gl_ES.result
  0 he_IL/he_IL.result
  0 hr_HR/hr_HR.result
  200694989 hu_HU/hu_HU.result
  0 id_ID/id_ID.result
  0 it_IT/it_IT.result
  0 ku_TR/ku_TR.result
  0 lt_LT/lt_LT.result
  0 lv_LV/lv_LV.result
  0 mg_MG/mg_MG.result
  0 mi_NZ/mi_NZ.result
  0 ms_MY/ms_MY.result
  0 nb_NO/nb_NO.result
  0 nl_NL/nl_NL.result
  0 nn_NO/nn_NO.result
  0 ny_MW/ny_MW.result
  0 pl_PL/pl_PL.result
  0 pt_BR/pt_BR.result
  0 pt_PT/pt_PT.result
  0 ro_RO/ro_RO.result
  0 ru_RU/ru_RU.result
  0 rw_RW/rw_RW.result
  0 sk_SK/sk_SK.result
  0 sl_SI/sl_SI.result
  0 sv_SE/sv_SE.result
  0 sw_KE/sw_KE.result
  0 tet_ID/tet_ID.result
  0 tl_PH/tl_PH.result
  0 tn_ZA/tn_ZA.result
  0 uk_UA/uk_UA.result
  0 zu_ZA/zu_ZA.result

  In the en_AU dictionary, there is an abbreviation with two dots (`eqn..'), but
  `eqn.' is missing. Presumably it is a dictionary bug. MySpell didn't
  accept it either.

  The Hungarian dictionary contains pseudoroots and forbidden words.
  Unmunch doesn't support these features yet, and generates bad words, too.

* check affix rules and OOo dictionaries. Detected bugs in cs_CZ,
  es_ES, es_NEW, es_MX, lt_LT, nn_NO, pt_PT, ro_RO, sk_SK and sv_SE dictionaries.

  Details:
  --------------------------------------------------------
  cs_CZ
  warning - incompatible stripping characters and condition:
  SFX D us ech [^ighk]os
  SFX D us y [^i]os
  SFX Q os ech [^ghk]es
  SFX M o ech [^ghkei]a
  SFX J <20>m ej <20>m
  SFX J <20>m ejme <20>m
  SFX J <20>m ejte <20>m
  SFX A ou<6F>it up oupit
  SFX A ou<6F>it upme oupit
  SFX A ou<6F>it upte oupit
  SFX A nout l [aeiouy<75><79><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>r][^aeiouy<75><79><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>rl][^aeiouy
  SFX A nout l [aeiouy<75><79><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>r][^aeiouy<75><79><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD><EFBFBD>rl][^aeiouy

  es_ES
  warning - incompatible stripping characters and condition:
  SFX W umar <20>se [ae]husar
  SFX W emir i<><69>is e<>ir

  es_NEW
  warning - incompatible stripping characters and condition:
  SFX I unan <20>nen unar

  es_MX
  warning - incompatible stripping characters and condition:
  SFX A a ote e
  SFX W umar <20>se [ae]husar
  SFX W emir i<><69>is e<>ir

  lt_LT
  warning - incompatible stripping characters and condition:
  SFX U ti siuosi tis
  SFX U ti siuosi tis
  SFX U ti siesi tis
  SFX U ti siesi tis
  SFX U ti sis tis
  SFX U ti sis tis
  SFX U ti sim<69>s tis
  SFX U ti sim<69>s tis
  SFX U ti sit<69>s tis
  SFX U ti sit<69>s tis

  nn_NO
  warning - incompatible stripping characters and condition:
  SFX D ar rar [^fmk]er
  SFX U <20>re orde ere
  SFX U <20>re ort ere

  pt_PT
  warning - incompatible stripping characters and condition:
  SFX g <20>os oas <20>o
  SFX g <20>os oas <20>o

  ro_RO
  warning - bad field number:
  SFX L 0 le [^cg] i
  SFX L 0 i [cg] i
  SFX U 0 i [^i] ii
  warning - incompatible stripping characters and condition:
  SFX P l i l [<- there is an unnecessary tabulator here)
  SFX I a ii [gc] a
  warning - bad field number:
  SFX I a ii [gc] a
  SFX I a ei [^cg] a

  sk_SK
  warning - incompatible stripping characters and condition:
  SFX T <20>a<EFBFBD> ol<6F> kla<6C>
  SFX T <20>a<EFBFBD> ol<6F>c kla<6C>
  SFX T s<>a<EFBFBD> <20>l<EFBFBD> sla<6C>
  SFX T s<>a<EFBFBD> <20>l<EFBFBD>c sla<6C>
  SFX R <20>c<EFBFBD> l<>iem <20>c<EFBFBD>
  SFX R i<>s<EFBFBD> <20>tie mias<61>
  SFX R iez<65> iem [^i]ez<65>
  SFX R iez<65> ie<69> [^i]ez<65>
  SFX R iez<65> ie [^i]ez<65>
  SFX R iez<65> eme [^i]ez<65>
  SFX R iez<65> ete [^i]ez<65>
  SFX R iez<65> <20> [^i]ez<65>
  SFX R iez<65> <20>c [^i]ez<65>
  SFX R iez<65> z [^i]ez<65>
  SFX R iez<65> me [^i]ez<65>
  SFX R iez<65> te [^i]ez<65>

  sv_SE
  warning - bad field number:
  SFX C 0 net nets [^e]n
  --------------------------------------------------------

2005-08-01: Hunspell 1.0.8 release

- improved compound word support
- fix German S handling
- port MySpell files and MAP feature

2005-07-22: Hunspell 1.0.7 release

2005-07-21: new home page: http://hunspell.sourceforge.net