Update internal Hunspell to latest release (1.6.2)

(cherry picked from commit 3b89cd9c28)
This commit is contained in:
Juergen Spitzmueller 2017-09-18 18:12:21 +02:00
parent 2c9fd212fd
commit cf1a276ec4
110 changed files with 31656 additions and 40795 deletions

View File

@ -1,12 +0,0 @@
GPL 2.0/LGPL 2.1/MPL 1.1 tri-license
The contents of this software may be used under the terms of
the GNU General Public License Version 2 or later (the "GPL"), or
the GNU Lesser General Public License Version 2.1 or later (the "LGPL",
see COPYING.LGPL) or (excepting the LGPLed GNU gettext library in the
intl/ directory) the Mozilla Public License Version 1.1 or later
(the "MPL", see COPYING.MPL).
Software distributed under these licenses is distributed on an "AS IS" basis,
WITHOUT WARRANTY OF ANY KIND, either express or implied. See the licences
for the specific language governing rights and limitations under the licenses.

View File

@ -1,515 +0,0 @@
GNU LESSER GENERAL PUBLIC LICENSE
Version 2.1, February 1999
Copyright (C) 1991, 1999 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
[This is the first released version of the Lesser GPL. It also counts
as the successor of the GNU Library Public License, version 2, hence
the version number 2.1.]
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
Licenses are intended to guarantee your freedom to share and change
free software--to make sure the software is free for all its users.
This license, the Lesser General Public License, applies to some
specially designated software packages--typically libraries--of the
Free Software Foundation and other authors who decide to use it. You
can use it too, but we suggest you first think carefully about whether
this license or the ordinary General Public License is the better
strategy to use in any particular case, based on the explanations
below.
When we speak of free software, we are referring to freedom of use,
not price. Our General Public Licenses are designed to make sure that
you have the freedom to distribute copies of free software (and charge
for this service if you wish); that you receive source code or can get
it if you want it; that you can change the software and use pieces of
it in new free programs; and that you are informed that you can do
these things.
To protect your rights, we need to make restrictions that forbid
distributors to deny you these rights or to ask you to surrender these
rights. These restrictions translate to certain responsibilities for
you if you distribute copies of the library or if you modify it.
For example, if you distribute copies of the library, whether gratis
or for a fee, you must give the recipients all the rights that we gave
you. You must make sure that they, too, receive or can get the source
code. If you link other code with the library, you must provide
complete object files to the recipients, so that they can relink them
with the library after making changes to the library and recompiling
it. And you must show them these terms so they know their rights.
We protect your rights with a two-step method: (1) we copyright the
library, and (2) we offer you this license, which gives you legal
permission to copy, distribute and/or modify the library.
To protect each distributor, we want to make it very clear that
there is no warranty for the free library. Also, if the library is
modified by someone else and passed on, the recipients should know
that what they have is not the original version, so that the original
author's reputation will not be affected by problems that might be
introduced by others.
^L
Finally, software patents pose a constant threat to the existence of
any free program. We wish to make sure that a company cannot
effectively restrict the users of a free program by obtaining a
restrictive license from a patent holder. Therefore, we insist that
any patent license obtained for a version of the library must be
consistent with the full freedom of use specified in this license.
Most GNU software, including some libraries, is covered by the
ordinary GNU General Public License. This license, the GNU Lesser
General Public License, applies to certain designated libraries, and
is quite different from the ordinary General Public License. We use
this license for certain libraries in order to permit linking those
libraries into non-free programs.
When a program is linked with a library, whether statically or using
a shared library, the combination of the two is legally speaking a
combined work, a derivative of the original library. The ordinary
General Public License therefore permits such linking only if the
entire combination fits its criteria of freedom. The Lesser General
Public License permits more lax criteria for linking other code with
the library.
We call this license the "Lesser" General Public License because it
does Less to protect the user's freedom than the ordinary General
Public License. It also provides other free software developers Less
of an advantage over competing non-free programs. These disadvantages
are the reason we use the ordinary General Public License for many
libraries. However, the Lesser license provides advantages in certain
special circumstances.
For example, on rare occasions, there may be a special need to
encourage the widest possible use of a certain library, so that it
becomes
a de-facto standard. To achieve this, non-free programs must be
allowed to use the library. A more frequent case is that a free
library does the same job as widely used non-free libraries. In this
case, there is little to gain by limiting the free library to free
software only, so we use the Lesser General Public License.
In other cases, permission to use a particular library in non-free
programs enables a greater number of people to use a large body of
free software. For example, permission to use the GNU C Library in
non-free programs enables many more people to use the whole GNU
operating system, as well as its variant, the GNU/Linux operating
system.
Although the Lesser General Public License is Less protective of the
users' freedom, it does ensure that the user of a program that is
linked with the Library has the freedom and the wherewithal to run
that program using a modified version of the Library.
The precise terms and conditions for copying, distribution and
modification follow. Pay close attention to the difference between a
"work based on the library" and a "work that uses the library". The
former contains code derived from the library, whereas the latter must
be combined with the library in order to run.
^L
GNU LESSER GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License Agreement applies to any software library or other
program which contains a notice placed by the copyright holder or
other authorized party saying it may be distributed under the terms of
this Lesser General Public License (also called "this License").
Each licensee is addressed as "you".
A "library" means a collection of software functions and/or data
prepared so as to be conveniently linked with application programs
(which use some of those functions and data) to form executables.
The "Library", below, refers to any such software library or work
which has been distributed under these terms. A "work based on the
Library" means either the Library or any derivative work under
copyright law: that is to say, a work containing the Library or a
portion of it, either verbatim or with modifications and/or translated
straightforwardly into another language. (Hereinafter, translation is
included without limitation in the term "modification".)
"Source code" for a work means the preferred form of the work for
making modifications to it. For a library, complete source code means
all the source code for all modules it contains, plus any associated
interface definition files, plus the scripts used to control
compilation
and installation of the library.
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running a program using the Library is not restricted, and output from
such a program is covered only if its contents constitute a work based
on the Library (independent of the use of the Library in a tool for
writing it). Whether that is true depends on what the Library does
and what the program that uses the Library does.
1. You may copy and distribute verbatim copies of the Library's
complete source code as you receive it, in any medium, provided that
you conspicuously and appropriately publish on each copy an
appropriate copyright notice and disclaimer of warranty; keep intact
all the notices that refer to this License and to the absence of any
warranty; and distribute a copy of this License along with the
Library.
You may charge a fee for the physical act of transferring a copy,
and you may at your option offer warranty protection in exchange for a
fee.
2. You may modify your copy or copies of the Library or any portion
of it, thus forming a work based on the Library, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) The modified work must itself be a software library.
b) You must cause the files modified to carry prominent notices
stating that you changed the files and the date of any change.
c) You must cause the whole of the work to be licensed at no
charge to all third parties under the terms of this License.
d) If a facility in the modified Library refers to a function or a
table of data to be supplied by an application program that uses
the facility, other than as an argument passed when the facility
is invoked, then you must make a good faith effort to ensure that,
in the event an application does not supply such function or
table, the facility still operates, and performs whatever part of
its purpose remains meaningful.
(For example, a function in a library to compute square roots has
a purpose that is entirely well-defined independent of the
application. Therefore, Subsection 2d requires that any
application-supplied function or table used by this function must
be optional: if the application does not supply it, the square
root function must still compute square roots.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Library,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Library, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote
it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Library.
In addition, mere aggregation of another work not based on the Library
with the Library (or with a work based on the Library) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may opt to apply the terms of the ordinary GNU General Public
License instead of this License to a given copy of the Library. To do
this, you must alter all the notices that refer to this License, so
that they refer to the ordinary GNU General Public License, version 2,
instead of to this License. (If a newer version than version 2 of the
ordinary GNU General Public License has appeared, then you can specify
that version instead if you wish.) Do not make any other change in
these notices.
^L
Once this change is made in a given copy, it is irreversible for
that copy, so the ordinary GNU General Public License applies to all
subsequent copies and derivative works made from that copy.
This option is useful when you wish to copy part of the code of
the Library into a program that is not a library.
4. You may copy and distribute the Library (or a portion or
derivative of it, under Section 2) in object code or executable form
under the terms of Sections 1 and 2 above provided that you accompany
it with the complete corresponding machine-readable source code, which
must be distributed under the terms of Sections 1 and 2 above on a
medium customarily used for software interchange.
If distribution of object code is made by offering access to copy
from a designated place, then offering equivalent access to copy the
source code from the same place satisfies the requirement to
distribute the source code, even though third parties are not
compelled to copy the source along with the object code.
5. A program that contains no derivative of any portion of the
Library, but is designed to work with the Library by being compiled or
linked with it, is called a "work that uses the Library". Such a
work, in isolation, is not a derivative work of the Library, and
therefore falls outside the scope of this License.
However, linking a "work that uses the Library" with the Library
creates an executable that is a derivative of the Library (because it
contains portions of the Library), rather than a "work that uses the
library". The executable is therefore covered by this License.
Section 6 states terms for distribution of such executables.
When a "work that uses the Library" uses material from a header file
that is part of the Library, the object code for the work may be a
derivative work of the Library even though the source code is not.
Whether this is true is especially significant if the work can be
linked without the Library, or if the work is itself a library. The
threshold for this to be true is not precisely defined by law.
If such an object file uses only numerical parameters, data
structure layouts and accessors, and small macros and small inline
functions (ten lines or less in length), then the use of the object
file is unrestricted, regardless of whether it is legally a derivative
work. (Executables containing this object code plus portions of the
Library will still fall under Section 6.)
Otherwise, if the work is a derivative of the Library, you may
distribute the object code for the work under the terms of Section 6.
Any executables containing that work also fall under Section 6,
whether or not they are linked directly with the Library itself.
^L
6. As an exception to the Sections above, you may also combine or
link a "work that uses the Library" with the Library to produce a
work containing portions of the Library, and distribute that work
under terms of your choice, provided that the terms permit
modification of the work for the customer's own use and reverse
engineering for debugging such modifications.
You must give prominent notice with each copy of the work that the
Library is used in it and that the Library and its use are covered by
this License. You must supply a copy of this License. If the work
during execution displays copyright notices, you must include the
copyright notice for the Library among them, as well as a reference
directing the user to the copy of this License. Also, you must do one
of these things:
a) Accompany the work with the complete corresponding
machine-readable source code for the Library including whatever
changes were used in the work (which must be distributed under
Sections 1 and 2 above); and, if the work is an executable linked
with the Library, with the complete machine-readable "work that
uses the Library", as object code and/or source code, so that the
user can modify the Library and then relink to produce a modified
executable containing the modified Library. (It is understood
that the user who changes the contents of definitions files in the
Library will not necessarily be able to recompile the application
to use the modified definitions.)
b) Use a suitable shared library mechanism for linking with the
Library. A suitable mechanism is one that (1) uses at run time a
copy of the library already present on the user's computer system,
rather than copying library functions into the executable, and (2)
will operate properly with a modified version of the library, if
the user installs one, as long as the modified version is
interface-compatible with the version that the work was made with.
c) Accompany the work with a written offer, valid for at
least three years, to give the same user the materials
specified in Subsection 6a, above, for a charge no more
than the cost of performing this distribution.
d) If distribution of the work is made by offering access to copy
from a designated place, offer equivalent access to copy the above
specified materials from the same place.
e) Verify that the user has already received a copy of these
materials or that you have already sent this user a copy.
For an executable, the required form of the "work that uses the
Library" must include any data and utility programs needed for
reproducing the executable from it. However, as a special exception,
the materials to be distributed need not include anything that is
normally distributed (in either source or binary form) with the major
components (compiler, kernel, and so on) of the operating system on
which the executable runs, unless that component itself accompanies
the executable.
It may happen that this requirement contradicts the license
restrictions of other proprietary libraries that do not normally
accompany the operating system. Such a contradiction means you cannot
use both them and the Library together in an executable that you
distribute.
^L
7. You may place library facilities that are a work based on the
Library side-by-side in a single library together with other library
facilities not covered by this License, and distribute such a combined
library, provided that the separate distribution of the work based on
the Library and of the other library facilities is otherwise
permitted, and provided that you do these two things:
a) Accompany the combined library with a copy of the same work
based on the Library, uncombined with any other library
facilities. This must be distributed under the terms of the
Sections above.
b) Give prominent notice with the combined library of the fact
that part of it is a work based on the Library, and explaining
where to find the accompanying uncombined form of the same work.
8. You may not copy, modify, sublicense, link with, or distribute
the Library except as expressly provided under this License. Any
attempt otherwise to copy, modify, sublicense, link with, or
distribute the Library is void, and will automatically terminate your
rights under this License. However, parties who have received copies,
or rights, from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.
9. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Library or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Library (or any work based on the
Library), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Library or works based on it.
10. Each time you redistribute the Library (or any work based on the
Library), the recipient automatically receives a license from the
original licensor to copy, distribute, link with or modify the Library
subject to these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties with
this License.
^L
11. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Library at all. For example, if a patent
license would not permit royalty-free redistribution of the Library by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Library.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply, and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
12. If the distribution and/or use of the Library is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Library under this License
may add an explicit geographical distribution limitation excluding those
countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
13. The Free Software Foundation may publish revised and/or new
versions of the Lesser General Public License from time to time.
Such new versions will be similar in spirit to the present version,
but may differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the Library
specifies a version number of this License which applies to it and
"any later version", you have the option of following the terms and
conditions either of that version or of any later version published by
the Free Software Foundation. If the Library does not specify a
license version number, you may choose any version ever published by
the Free Software Foundation.
^L
14. If you wish to incorporate parts of the Library into other free
programs whose distribution conditions are incompatible with these,
write to the author to ask for permission. For software which is
copyrighted by the Free Software Foundation, write to the Free
Software Foundation; we sometimes make exceptions for this. Our
decision will be guided by the two goals of preserving the free status
of all derivatives of our free software and of promoting the sharing
and reuse of software generally.
NO WARRANTY
15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO
WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW.
EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR
OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE
LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME
THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN
WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY
AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU
FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR
CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE
LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING
RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A
FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF
SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH
DAMAGES.
END OF TERMS AND CONDITIONS
^L
How to Apply These Terms to Your New Libraries
If you develop a new library, and you want it to be of the greatest
possible use to the public, we recommend making it free software that
everyone can redistribute and change. You can do so by permitting
redistribution under these terms (or, alternatively, under the terms
of the ordinary General Public License).
To apply these terms, attach the following notices to the library.
It is safest to attach them to the start of each source file to most
effectively convey the exclusion of warranty; and each file should
have at least the "copyright" line and a pointer to where the full
notice is found.
<one line to give the library's name and a brief idea of what it
does.>
Copyright (C) <year> <name of author>
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License as published by the Free Software Foundation; either
version 2 of the License, or (at your option) any later version.
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; if not, write to the Free Software
Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Also add information on how to contact you by electronic and paper
mail.
You should also get your employer (if you work as a programmer) or
your
school, if any, to sign a "copyright disclaimer" for the library, if
necessary. Here is a sample; alter the names:
Yoyodyne, Inc., hereby disclaims all copyright interest in the
library `Frob' (a library for tweaking knobs) written by James
Random Hacker.
<signature of Ty Coon>, 1 April 1990
Ty Coon, President of Vice
That's all there is to it!

View File

@ -1,182 +0,0 @@
About Hunspell
--------------
Hunspell is a spell checker and morphological analyzer library and program
designed for languages with rich morphology and complex word compounding or
character encoding. Hunspell interfaces: Ispell-like terminal interface
using Curses library, Ispell pipe interface, OpenOffice.org UNO module.
Hunspell's code base comes from the OpenOffice.org MySpell
(http://lingucomponent.openoffice.org/MySpell-3.zip). See README.MYSPELL,
AUTHORS.MYSPELL and license.myspell files.
Hunspell is designed to eventually replace Myspell in OpenOffice.org.
Main features of Hunspell spell checker and morphological analyzer:
- Unicode support (affix rules work only with the first 65535 Unicode characters)
- Morphological analysis (in custom item and arrangement style) and stemming
- Max. 65535 affix classes and twofold affix stripping (for agglutinative
languages, like Azeri, Basque, Estonian, Finnish, Hungarian, Turkish, etc.)
- Support complex compoundings (for example, Hungarian and German)
- Support language specific features (for example, special casing of
Azeri and Turkish dotted i, or German sharp s)
- Handle conditional affixes, circumfixes, fogemorphemes,
forbidden words, pseudoroots and homonyms.
- Free software (LGPL, GPL, MPL tri-license)
Compiling on Unix/Linux
-----------------------
./configure
make
make install
For dictionary development, use the --with-warnings option of configure.
For interactive user interface of Hunspell executable, use the --with-ui option.
The developer packages you need to compile Hunspell's interface:
glibc-devel
optional developer packages:
ncurses (need for --with-ui), eg. libncursesw5 for UTF-8
readline (for fancy input line editing,
configure parameter: --with-readline)
locale and gettext (but you can also use the
--with-included-gettext configure parameter)
Hunspell distribution uses new Autoconf (2.59) and Automake (1.9).
Compiling on Windows
--------------------
1. Compiling with Windows SDK
Download the free Windows SDK of Microsoft, open a command prompt
window and cd into hunspell/src/win_api. Use the following command
to compile hunspell:
vcbuild
2. Compiling in Cygwin environment
Download and install Cygwin environment for Windows with the following
extra packages:
make
gcc-g++ development package
mingw development package (for cygwin.dll free native Windows compilation)
ncurses, readline (for user interface)
iconv (character conversion)
2.1. Cygwin1.dll dependent compiling
Open a Cygwin shell, cd into the hunspell root directory:
./configure
make
make install
For dictionary development, use the --with-warnings option of configure.
For interactive user interface of Hunspell executable, use the --with-ui option.
readline configure parameter: --with-readline (for fancy input line editing)
1.2. Cygwin1.dll free compiling
Open a Cygwin shell, cd into the hunspell/src/win_api and
make -f Makefile.cygwin
Testing
-------
Testing Hunspell (see tests in tests/ subdirectory):
make check
or with Valgrind debugger:
make check
VALGRIND=[Valgrind_tool] make check
For example:
make check
VALGRIND=memcheck make check
Documentation
-------------
features and dictionary format:
man 5 hunspell
man hunspell
hunspell -h
http://hunspell.sourceforge.net
Usage
-----
The src/tools dictionary contains ten executables after compiling
(or some of them are in the src/win_api):
affixcompress: dictionary generation from large (millions of words) vocabularies
analyze: example of spell checking, stemming and morphological analysis
chmorph: example of automatic morphological generation and conversion
example: example of spell checking and suggestion
hunspell: main program for spell checking and others (see manual)
hunzip: decompressor of hzip format
hzip: compressor of hzip format
makealias: alias compression (Hunspell only, not back compatible with MySpell)
munch: dictionary generation from vocabularies (it needs an affix file, too).
unmunch: list all recognized words of a MySpell dictionary
wordforms: word generation (Hunspell version of unmunch)
After compiling and installing (see INSTALL) you can
run the Hunspell spell checker (compiled with user interface)
with a Hunspell or Myspell dictionary:
hunspell -d en_US text.txt
or without interface:
hunspell
hunspell -d en_UK -l <text.txt
Dictionaries consist of an affix and dictionary file, see tests/
or http://wiki.services.openoffice.org/wiki/Dictionaries.
Using Hunspell library with GCC
-------------------------------
Including in your program:
#include <hunspell.hxx>
Linking with Hunspell static library:
g++ -lhunspell example.cxx
Dictionaries
------------
Myspell & Hunspell dictionaries:
http://extensions.libreoffice.org
http://cgit.freedesktop.org/libreoffice/dictionaries
http://extensions.openoffice.org
http://wiki.services.openoffice.org/wiki/Dictionaries
Aspell dictionaries (need some conversion):
ftp://ftp.gnu.org/gnu/aspell/dict
Conversion steps: see relevant feature request at http://hunspell.sf.net.
László Németh
nemeth at numbertext org

View File

@ -1,21 +0,0 @@
Hunspell spell checker and morphological analyser library
Documentation, tests, examples: http://hunspell.sourceforge.net
Author of Hunspell:
László Németh (nemethl (at) gyorsposta.hu)
Hunspell based on OpenOffice.org's Myspell. MySpell's author:
Kevin Hendricks (kevin.hendricks (at) sympatico.ca)
License: GPL 2.0/LGPL 2.1/MPL 1.1 tri-license
The contents of this library may be used under the terms of
the GNU General Public License Version 2 or later (the "GPL"), or
the GNU Lesser General Public License Version 2.1 or later (the "LGPL",
see http://gnu.org/copyleft/lesser.html) or the Mozilla Public License
Version 1.1 or later (the "MPL", see http://mozilla.org/MPL/MPL-1.1.html).
Software distributed under these licenses is distributed on an "AS IS" basis,
WITHOUT WARRANTY OF ANY KIND, either express or implied. See the licences
for the specific language governing rights and limitations under the licenses.

File diff suppressed because it is too large Load Diff

View File

@ -1,144 +0,0 @@
#ifndef _AFFIX_HXX_
#define _AFFIX_HXX_
#include "hunvisapi.h"
#include "atypes.hxx"
#include "baseaffix.hxx"
#include "affixmgr.hxx"
/* A Prefix Entry */
class LIBHUNSPELL_DLL_EXPORTED PfxEntry : protected AffEntry
{
private:
PfxEntry(const PfxEntry&);
PfxEntry& operator = (const PfxEntry&);
private:
AffixMgr* pmyMgr;
PfxEntry * next;
PfxEntry * nexteq;
PfxEntry * nextne;
PfxEntry * flgnxt;
public:
PfxEntry(AffixMgr* pmgr, affentry* dp );
~PfxEntry();
inline bool allowCross() { return ((opts & aeXPRODUCT) != 0); }
struct hentry * checkword(const char * word, int len, char in_compound,
const FLAG needflag = FLAG_NULL);
struct hentry * check_twosfx(const char * word, int len, char in_compound, const FLAG needflag = NULL);
char * check_morph(const char * word, int len, char in_compound,
const FLAG needflag = FLAG_NULL);
char * check_twosfx_morph(const char * word, int len,
char in_compound, const FLAG needflag = FLAG_NULL);
inline FLAG getFlag() { return aflag; }
inline const char * getKey() { return appnd; }
char * add(const char * word, int len);
inline short getKeyLen() { return appndl; }
inline const char * getMorph() { return morphcode; }
inline const unsigned short * getCont() { return contclass; }
inline short getContLen() { return contclasslen; }
inline PfxEntry * getNext() { return next; }
inline PfxEntry * getNextNE() { return nextne; }
inline PfxEntry * getNextEQ() { return nexteq; }
inline PfxEntry * getFlgNxt() { return flgnxt; }
inline void setNext(PfxEntry * ptr) { next = ptr; }
inline void setNextNE(PfxEntry * ptr) { nextne = ptr; }
inline void setNextEQ(PfxEntry * ptr) { nexteq = ptr; }
inline void setFlgNxt(PfxEntry * ptr) { flgnxt = ptr; }
inline char * nextchar(char * p);
inline int test_condition(const char * st);
};
/* A Suffix Entry */
class LIBHUNSPELL_DLL_EXPORTED SfxEntry : protected AffEntry
{
private:
SfxEntry(const SfxEntry&);
SfxEntry& operator = (const SfxEntry&);
private:
AffixMgr* pmyMgr;
char * rappnd;
SfxEntry * next;
SfxEntry * nexteq;
SfxEntry * nextne;
SfxEntry * flgnxt;
SfxEntry * l_morph;
SfxEntry * r_morph;
SfxEntry * eq_morph;
public:
SfxEntry(AffixMgr* pmgr, affentry* dp );
~SfxEntry();
inline bool allowCross() { return ((opts & aeXPRODUCT) != 0); }
struct hentry * checkword(const char * word, int len, int optflags,
PfxEntry* ppfx, char ** wlst, int maxSug, int * ns,
// const FLAG cclass = FLAG_NULL, const FLAG needflag = FLAG_NULL, char in_compound=IN_CPD_NOT);
const FLAG cclass = FLAG_NULL, const FLAG needflag = FLAG_NULL, const FLAG badflag = 0);
struct hentry * check_twosfx(const char * word, int len, int optflags, PfxEntry* ppfx, const FLAG needflag = NULL);
char * check_twosfx_morph(const char * word, int len, int optflags,
PfxEntry* ppfx, const FLAG needflag = FLAG_NULL);
struct hentry * get_next_homonym(struct hentry * he);
struct hentry * get_next_homonym(struct hentry * word, int optflags, PfxEntry* ppfx,
const FLAG cclass, const FLAG needflag);
inline FLAG getFlag() { return aflag; }
inline const char * getKey() { return rappnd; }
char * add(const char * word, int len);
inline const char * getMorph() { return morphcode; }
inline const unsigned short * getCont() { return contclass; }
inline short getContLen() { return contclasslen; }
inline const char * getAffix() { return appnd; }
inline short getKeyLen() { return appndl; }
inline SfxEntry * getNext() { return next; }
inline SfxEntry * getNextNE() { return nextne; }
inline SfxEntry * getNextEQ() { return nexteq; }
inline SfxEntry * getLM() { return l_morph; }
inline SfxEntry * getRM() { return r_morph; }
inline SfxEntry * getEQM() { return eq_morph; }
inline SfxEntry * getFlgNxt() { return flgnxt; }
inline void setNext(SfxEntry * ptr) { next = ptr; }
inline void setNextNE(SfxEntry * ptr) { nextne = ptr; }
inline void setNextEQ(SfxEntry * ptr) { nexteq = ptr; }
inline void setFlgNxt(SfxEntry * ptr) { flgnxt = ptr; }
inline char * nextchar(char * p);
inline int test_condition(const char * st, const char * begin);
};
#endif

File diff suppressed because it is too large Load Diff

View File

@ -1,252 +0,0 @@
#ifndef _AFFIXMGR_HXX_
#define _AFFIXMGR_HXX_
#include "hunvisapi.h"
#include <stdio.h>
#include "atypes.hxx"
#include "baseaffix.hxx"
#include "hashmgr.hxx"
#include "phonet.hxx"
#include "replist.hxx"
// check flag duplication
#define dupSFX (1 << 0)
#define dupPFX (1 << 1)
class PfxEntry;
class SfxEntry;
class LIBHUNSPELL_DLL_EXPORTED AffixMgr
{
PfxEntry * pStart[SETSIZE];
SfxEntry * sStart[SETSIZE];
PfxEntry * pFlag[SETSIZE];
SfxEntry * sFlag[SETSIZE];
HashMgr * pHMgr;
HashMgr ** alldic;
int * maxdic;
char * keystring;
char * trystring;
char * encoding;
struct cs_info * csconv;
int utf8;
int complexprefixes;
FLAG compoundflag;
FLAG compoundbegin;
FLAG compoundmiddle;
FLAG compoundend;
FLAG compoundroot;
FLAG compoundforbidflag;
FLAG compoundpermitflag;
int compoundmoresuffixes;
int checkcompounddup;
int checkcompoundrep;
int checkcompoundcase;
int checkcompoundtriple;
int simplifiedtriple;
FLAG forbiddenword;
FLAG nosuggest;
FLAG nongramsuggest;
FLAG needaffix;
int cpdmin;
int numrep;
replentry * reptable;
RepList * iconvtable;
RepList * oconvtable;
int nummap;
mapentry * maptable;
int numbreak;
char ** breaktable;
int numcheckcpd;
patentry * checkcpdtable;
int simplifiedcpd;
int numdefcpd;
flagentry * defcpdtable;
phonetable * phone;
int maxngramsugs;
int maxcpdsugs;
int maxdiff;
int onlymaxdiff;
int nosplitsugs;
int sugswithdots;
int cpdwordmax;
int cpdmaxsyllable;
char * cpdvowels;
w_char * cpdvowels_utf16;
int cpdvowels_utf16_len;
char * cpdsyllablenum;
const char * pfxappnd; // BUG: not stateless
const char * sfxappnd; // BUG: not stateless
FLAG sfxflag; // BUG: not stateless
char * derived; // BUG: not stateless
SfxEntry * sfx; // BUG: not stateless
PfxEntry * pfx; // BUG: not stateless
int checknum;
char * wordchars;
unsigned short * wordchars_utf16;
int wordchars_utf16_len;
char * ignorechars;
unsigned short * ignorechars_utf16;
int ignorechars_utf16_len;
char * version;
char * lang;
int langnum;
FLAG lemma_present;
FLAG circumfix;
FLAG onlyincompound;
FLAG keepcase;
FLAG forceucase;
FLAG warn;
int forbidwarn;
FLAG substandard;
int checksharps;
int fullstrip;
int havecontclass; // boolean variable
char contclasses[CONTSIZE]; // flags of possible continuing classes (twofold affix)
public:
AffixMgr(const char * affpath, HashMgr** ptr, int * md,
const char * key = NULL);
~AffixMgr();
struct hentry * affix_check(const char * word, int len,
const unsigned short needflag = (unsigned short) 0,
char in_compound = IN_CPD_NOT);
struct hentry * prefix_check(const char * word, int len,
char in_compound, const FLAG needflag = FLAG_NULL);
inline int isSubset(const char * s1, const char * s2);
struct hentry * prefix_check_twosfx(const char * word, int len,
char in_compound, const FLAG needflag = FLAG_NULL);
inline int isRevSubset(const char * s1, const char * end_of_s2, int len);
struct hentry * suffix_check(const char * word, int len, int sfxopts,
PfxEntry* ppfx, char ** wlst, int maxSug, int * ns,
const FLAG cclass = FLAG_NULL, const FLAG needflag = FLAG_NULL,
char in_compound = IN_CPD_NOT);
struct hentry * suffix_check_twosfx(const char * word, int len,
int sfxopts, PfxEntry* ppfx, const FLAG needflag = FLAG_NULL);
char * affix_check_morph(const char * word, int len,
const FLAG needflag = FLAG_NULL, char in_compound = IN_CPD_NOT);
char * prefix_check_morph(const char * word, int len,
char in_compound, const FLAG needflag = FLAG_NULL);
char * suffix_check_morph (const char * word, int len, int sfxopts,
PfxEntry * ppfx, const FLAG cclass = FLAG_NULL,
const FLAG needflag = FLAG_NULL, char in_compound = IN_CPD_NOT);
char * prefix_check_twosfx_morph(const char * word, int len,
char in_compound, const FLAG needflag = FLAG_NULL);
char * suffix_check_twosfx_morph(const char * word, int len,
int sfxopts, PfxEntry * ppfx, const FLAG needflag = FLAG_NULL);
char * morphgen(char * ts, int wl, const unsigned short * ap,
unsigned short al, char * morph, char * targetmorph, int level);
int expand_rootword(struct guessword * wlst, int maxn, const char * ts,
int wl, const unsigned short * ap, unsigned short al, char * bad,
int, char *);
short get_syllable (const char * word, int wlen);
int cpdrep_check(const char * word, int len);
int cpdpat_check(const char * word, int len, hentry * r1, hentry * r2,
const char affixed);
int defcpd_check(hentry *** words, short wnum, hentry * rv,
hentry ** rwords, char all);
int cpdcase_check(const char * word, int len);
inline int candidate_check(const char * word, int len);
void setcminmax(int * cmin, int * cmax, const char * word, int len);
struct hentry * compound_check(const char * word, int len, short wordnum,
short numsyllable, short maxwordnum, short wnum, hentry ** words,
char hu_mov_rule, char is_sug, int * info);
int compound_check_morph(const char * word, int len, short wordnum,
short numsyllable, short maxwordnum, short wnum, hentry ** words,
char hu_mov_rule, char ** result, char * partresult);
struct hentry * lookup(const char * word);
int get_numrep() const;
struct replentry * get_reptable() const;
RepList * get_iconvtable() const;
RepList * get_oconvtable() const;
struct phonetable * get_phonetable() const;
int get_nummap() const;
struct mapentry * get_maptable() const;
int get_numbreak() const;
char ** get_breaktable() const;
char * get_encoding();
int get_langnum() const;
char * get_key_string();
char * get_try_string() const;
const char * get_wordchars() const;
unsigned short * get_wordchars_utf16(int * len) const;
char * get_ignore() const;
unsigned short * get_ignore_utf16(int * len) const;
int get_compound() const;
FLAG get_compoundflag() const;
FLAG get_compoundbegin() const;
FLAG get_forbiddenword() const;
FLAG get_nosuggest() const;
FLAG get_nongramsuggest() const;
FLAG get_needaffix() const;
FLAG get_onlyincompound() const;
FLAG get_compoundroot() const;
FLAG get_lemma_present() const;
int get_checknum() const;
const char * get_prefix() const;
const char * get_suffix() const;
const char * get_derived() const;
const char * get_version() const;
int have_contclass() const;
int get_utf8() const;
int get_complexprefixes() const;
char * get_suffixed(char ) const;
int get_maxngramsugs() const;
int get_maxcpdsugs() const;
int get_maxdiff() const;
int get_onlymaxdiff() const;
int get_nosplitsugs() const;
int get_sugswithdots(void) const;
FLAG get_keepcase(void) const;
FLAG get_forceucase(void) const;
FLAG get_warn(void) const;
int get_forbidwarn(void) const;
int get_checksharps(void) const;
char * encode_flag(unsigned short aflag) const;
int get_fullstrip() const;
private:
int parse_file(const char * affpath, const char * key);
int parse_flag(char * line, unsigned short * out, FileMgr * af);
int parse_num(char * line, int * out, FileMgr * af);
int parse_cpdsyllable(char * line, FileMgr * af);
int parse_reptable(char * line, FileMgr * af);
int parse_convtable(char * line, FileMgr * af, RepList ** rl, const char * keyword);
int parse_phonetable(char * line, FileMgr * af);
int parse_maptable(char * line, FileMgr * af);
int parse_breaktable(char * line, FileMgr * af);
int parse_checkcpdtable(char * line, FileMgr * af);
int parse_defcpdtable(char * line, FileMgr * af);
int parse_affix(char * line, const char at, FileMgr * af, char * dupflags);
void reverse_condition(char *);
void debugflag(char * result, unsigned short flag);
int condlen(char *);
int encodeit(affentry &entry, char * cs);
int build_pfxtree(PfxEntry* pfxptr);
int build_sfxtree(SfxEntry* sfxptr);
int process_pfx_order();
int process_sfx_order();
PfxEntry * process_pfx_in_order(PfxEntry * ptr, PfxEntry * nptr);
SfxEntry * process_sfx_in_order(SfxEntry * ptr, SfxEntry * nptr);
int process_pfx_tree_to_list();
int process_sfx_tree_to_list();
int redundant_condition(char, char * strip, int stripl,
const char * cond, int);
void finishFileMgr(FileMgr *afflst);
};
#endif

View File

@ -1,107 +0,0 @@
#ifndef _ATYPES_HXX_
#define _ATYPES_HXX_
#ifndef HUNSPELL_WARNING
#include <stdio.h>
#ifdef HUNSPELL_WARNING_ON
#define HUNSPELL_WARNING fprintf
#else
// empty inline function to switch off warnings (instead of the C99 standard variadic macros)
static inline void HUNSPELL_WARNING(FILE *, const char *, ...) {}
#endif
#endif
// HUNSTEM def.
#define HUNSTEM
#include "hashmgr.hxx"
#include "w_char.hxx"
#define SETSIZE 256
#define CONTSIZE 65536
#define MAXWORDLEN 100
#define MAXWORDUTF8LEN 256
// affentry options
#define aeXPRODUCT (1 << 0)
#define aeUTF8 (1 << 1)
#define aeALIASF (1 << 2)
#define aeALIASM (1 << 3)
#define aeLONGCOND (1 << 4)
// compound options
#define IN_CPD_NOT 0
#define IN_CPD_BEGIN 1
#define IN_CPD_END 2
#define IN_CPD_OTHER 3
// info options
#define SPELL_COMPOUND (1 << 0)
#define SPELL_FORBIDDEN (1 << 1)
#define SPELL_ALLCAP (1 << 2)
#define SPELL_NOCAP (1 << 3)
#define SPELL_INITCAP (1 << 4)
#define SPELL_ORIGCAP (1 << 5)
#define SPELL_WARN (1 << 6)
#define MAXLNLEN 8192
#define MINCPDLEN 3
#define MAXCOMPOUND 10
#define MAXCONDLEN 20
#define MAXCONDLEN_1 (MAXCONDLEN - sizeof(char *))
#define MAXACC 1000
#define FLAG unsigned short
#define FLAG_NULL 0x00
#define FREE_FLAG(a) a = 0
#define TESTAFF( a, b , c ) (flag_bsearch((unsigned short *) a, (unsigned short) b, c))
struct affentry
{
char * strip;
char * appnd;
unsigned char stripl;
unsigned char appndl;
char numconds;
char opts;
unsigned short aflag;
unsigned short * contclass;
short contclasslen;
union {
char conds[MAXCONDLEN];
struct {
char conds1[MAXCONDLEN_1];
char * conds2;
} l;
} c;
char * morphcode;
};
struct guessword {
char * word;
bool allow;
char * orig;
};
struct mapentry {
char ** set;
int len;
};
struct flagentry {
FLAG * def;
int len;
};
struct patentry {
char * pattern;
char * pattern2;
char * pattern3;
FLAG cond;
FLAG cond2;
};
#endif

View File

@ -1,32 +0,0 @@
#ifndef _BASEAFF_HXX_
#define _BASEAFF_HXX_
#include "hunvisapi.h"
class LIBHUNSPELL_DLL_EXPORTED AffEntry
{
private:
AffEntry(const AffEntry&);
AffEntry& operator = (const AffEntry&);
protected:
AffEntry() {}
char * appnd;
char * strip;
unsigned char appndl;
unsigned char stripl;
char numconds;
char opts;
unsigned short aflag;
union {
char conds[MAXCONDLEN];
struct {
char conds1[MAXCONDLEN_1];
char * conds2;
} l;
} c;
char * morphcode;
unsigned short * contclass;
short contclasslen;
};
#endif

File diff suppressed because it is too large Load Diff

View File

@ -1,223 +0,0 @@
#ifndef __CSUTILHXX__
#define __CSUTILHXX__
#include "hunvisapi.h"
// First some base level utility routines
#include <string.h>
#include "w_char.hxx"
#include "htypes.hxx"
#ifdef MOZILLA_CLIENT
#include "nscore.h" // for mozalloc headers
#endif
// casing
#define NOCAP 0
#define INITCAP 1
#define ALLCAP 2
#define HUHCAP 3
#define HUHINITCAP 4
// default encoding and keystring
#define SPELL_ENCODING "ISO8859-1"
#define SPELL_KEYSTRING "qwertyuiop|asdfghjkl|zxcvbnm"
// default morphological fields
#define MORPH_STEM "st:"
#define MORPH_ALLOMORPH "al:"
#define MORPH_POS "po:"
#define MORPH_DERI_PFX "dp:"
#define MORPH_INFL_PFX "ip:"
#define MORPH_TERM_PFX "tp:"
#define MORPH_DERI_SFX "ds:"
#define MORPH_INFL_SFX "is:"
#define MORPH_TERM_SFX "ts:"
#define MORPH_SURF_PFX "sp:"
#define MORPH_FREQ "fr:"
#define MORPH_PHON "ph:"
#define MORPH_HYPH "hy:"
#define MORPH_PART "pa:"
#define MORPH_FLAG "fl:"
#define MORPH_HENTRY "_H:"
#define MORPH_TAG_LEN strlen(MORPH_STEM)
#define MSEP_FLD ' '
#define MSEP_REC '\n'
#define MSEP_ALT '\v'
// default flags
#define DEFAULTFLAGS 65510
#define FORBIDDENWORD 65510
#define ONLYUPCASEFLAG 65511
// fopen or optional _wfopen to fix long pathname problem of WIN32
LIBHUNSPELL_DLL_EXPORTED FILE * myfopen(const char * path, const char * mode);
// convert UTF-16 characters to UTF-8
LIBHUNSPELL_DLL_EXPORTED char * u16_u8(char * dest, int size, const w_char * src, int srclen);
// convert UTF-8 characters to UTF-16
LIBHUNSPELL_DLL_EXPORTED int u8_u16(w_char * dest, int size, const char * src);
// sort 2-byte vector
LIBHUNSPELL_DLL_EXPORTED void flag_qsort(unsigned short flags[], int begin, int end);
// binary search in 2-byte vector
LIBHUNSPELL_DLL_EXPORTED int flag_bsearch(unsigned short flags[], unsigned short flag, int right);
// remove end of line char(s)
LIBHUNSPELL_DLL_EXPORTED void mychomp(char * s);
// duplicate string
LIBHUNSPELL_DLL_EXPORTED char * mystrdup(const char * s);
// strcat for limited length destination string
LIBHUNSPELL_DLL_EXPORTED char * mystrcat(char * dest, const char * st, int max);
// duplicate reverse of string
LIBHUNSPELL_DLL_EXPORTED char * myrevstrdup(const char * s);
// parse into tokens with char delimiter
LIBHUNSPELL_DLL_EXPORTED char * mystrsep(char ** sptr, const char delim);
// parse into tokens with char delimiter
LIBHUNSPELL_DLL_EXPORTED char * mystrsep2(char ** sptr, const char delim);
// parse into tokens with char delimiter
LIBHUNSPELL_DLL_EXPORTED char * mystrrep(char *, const char *, const char *);
// append s to ends of every lines in text
LIBHUNSPELL_DLL_EXPORTED void strlinecat(char * lines, const char * s);
// tokenize into lines with new line
LIBHUNSPELL_DLL_EXPORTED int line_tok(const char * text, char *** lines, char breakchar);
// tokenize into lines with new line and uniq in place
LIBHUNSPELL_DLL_EXPORTED char * line_uniq(char * text, char breakchar);
LIBHUNSPELL_DLL_EXPORTED char * line_uniq_app(char ** text, char breakchar);
// change oldchar to newchar in place
LIBHUNSPELL_DLL_EXPORTED char * tr(char * text, char oldc, char newc);
// reverse word
LIBHUNSPELL_DLL_EXPORTED int reverseword(char *);
// reverse word
LIBHUNSPELL_DLL_EXPORTED int reverseword_utf(char *);
// remove duplicates
LIBHUNSPELL_DLL_EXPORTED int uniqlist(char ** list, int n);
// free character array list
LIBHUNSPELL_DLL_EXPORTED void freelist(char *** list, int n);
// character encoding information
struct cs_info {
unsigned char ccase;
unsigned char clower;
unsigned char cupper;
};
LIBHUNSPELL_DLL_EXPORTED int initialize_utf_tbl();
LIBHUNSPELL_DLL_EXPORTED void free_utf_tbl();
LIBHUNSPELL_DLL_EXPORTED unsigned short unicodetoupper(unsigned short c, int langnum);
LIBHUNSPELL_DLL_EXPORTED unsigned short unicodetolower(unsigned short c, int langnum);
LIBHUNSPELL_DLL_EXPORTED int unicodeisalpha(unsigned short c);
LIBHUNSPELL_DLL_EXPORTED struct cs_info * get_current_cs(const char * es);
// get language identifiers of language codes
LIBHUNSPELL_DLL_EXPORTED int get_lang_num(const char * lang);
// get characters of the given 8bit encoding with lower- and uppercase forms
LIBHUNSPELL_DLL_EXPORTED char * get_casechars(const char * enc);
// convert null terminated string to all caps using encoding
LIBHUNSPELL_DLL_EXPORTED void enmkallcap(char * d, const char * p, const char * encoding);
// convert null terminated string to all little using encoding
LIBHUNSPELL_DLL_EXPORTED void enmkallsmall(char * d, const char * p, const char * encoding);
// convert null terminated string to have initial capital using encoding
LIBHUNSPELL_DLL_EXPORTED void enmkinitcap(char * d, const char * p, const char * encoding);
// convert null terminated string to all caps
LIBHUNSPELL_DLL_EXPORTED void mkallcap(char * p, const struct cs_info * csconv);
// convert null terminated string to all little
LIBHUNSPELL_DLL_EXPORTED void mkallsmall(char * p, const struct cs_info * csconv);
// convert null terminated string to have initial capital
LIBHUNSPELL_DLL_EXPORTED void mkinitcap(char * p, const struct cs_info * csconv);
// convert first nc characters of UTF-8 string to little
LIBHUNSPELL_DLL_EXPORTED void mkallsmall_utf(w_char * u, int nc, int langnum);
// convert first nc characters of UTF-8 string to capital
LIBHUNSPELL_DLL_EXPORTED void mkallcap_utf(w_char * u, int nc, int langnum);
// get type of capitalization
LIBHUNSPELL_DLL_EXPORTED int get_captype(char * q, int nl, cs_info *);
// get type of capitalization (UTF-8)
LIBHUNSPELL_DLL_EXPORTED int get_captype_utf8(w_char * q, int nl, int langnum);
// strip all ignored characters in the string
LIBHUNSPELL_DLL_EXPORTED void remove_ignored_chars_utf(char * word, unsigned short ignored_chars[], int ignored_len);
// strip all ignored characters in the string
LIBHUNSPELL_DLL_EXPORTED void remove_ignored_chars(char * word, char * ignored_chars);
LIBHUNSPELL_DLL_EXPORTED int parse_string(char * line, char ** out, int ln);
LIBHUNSPELL_DLL_EXPORTED int parse_array(char * line, char ** out, unsigned short ** out_utf16,
int * out_utf16_len, int utf8, int ln);
LIBHUNSPELL_DLL_EXPORTED int fieldlen(const char * r);
LIBHUNSPELL_DLL_EXPORTED char * copy_field(char * dest, const char * morph, const char * var);
LIBHUNSPELL_DLL_EXPORTED int morphcmp(const char * s, const char * t);
LIBHUNSPELL_DLL_EXPORTED int get_sfxcount(const char * morph);
// conversion function for protected memory
LIBHUNSPELL_DLL_EXPORTED void store_pointer(char * dest, char * source);
// conversion function for protected memory
LIBHUNSPELL_DLL_EXPORTED char * get_stored_pointer(const char * s);
// hash entry macros
LIBHUNSPELL_DLL_EXPORTED inline char* HENTRY_DATA(struct hentry *h)
{
char *ret;
if (!h->var)
ret = NULL;
else if (h->var & H_OPT_ALIASM)
ret = get_stored_pointer(HENTRY_WORD(h) + h->blen + 1);
else
ret = HENTRY_WORD(h) + h->blen + 1;
return ret;
}
// NULL-free version for warning-free OOo build
LIBHUNSPELL_DLL_EXPORTED inline const char* HENTRY_DATA2(const struct hentry *h)
{
const char *ret;
if (!h->var)
ret = "";
else if (h->var & H_OPT_ALIASM)
ret = get_stored_pointer(HENTRY_WORD(h) + h->blen + 1);
else
ret = HENTRY_WORD(h) + h->blen + 1;
return ret;
}
LIBHUNSPELL_DLL_EXPORTED inline char* HENTRY_FIND(struct hentry *h, const char *p)
{
return (HENTRY_DATA(h) ? strstr(HENTRY_DATA(h), p) : NULL);
}
#define w_char_eq(a,b) (((a).l == (b).l) && ((a).h == (b).h))
#endif

View File

@ -1,182 +0,0 @@
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#include <stdio.h>
#include "dictmgr.hxx"
#include "csutil.hxx"
DictMgr::DictMgr(const char * dictpath, const char * etype) : numdict(0)
{
// load list of etype entries
pdentry = (dictentry *)malloc(MAXDICTIONARIES*sizeof(struct dictentry));
if (pdentry) {
if (parse_file(dictpath, etype)) {
numdict = 0;
// no dictionary.lst found is okay
}
}
}
DictMgr::~DictMgr()
{
dictentry * pdict = NULL;
if (pdentry) {
pdict = pdentry;
for (int i=0;i<numdict;i++) {
if (pdict->lang) {
free(pdict->lang);
pdict->lang = NULL;
}
if (pdict->region) {
free(pdict->region);
pdict->region=NULL;
}
if (pdict->filename) {
free(pdict->filename);
pdict->filename = NULL;
}
pdict++;
}
free(pdentry);
pdentry = NULL;
pdict = NULL;
}
numdict = 0;
}
// read in list of etype entries and build up structure to describe them
int DictMgr::parse_file(const char * dictpath, const char * etype)
{
int i;
char line[MAXDICTENTRYLEN+1];
dictentry * pdict = pdentry;
// open the dictionary list file
FILE * dictlst;
dictlst = myfopen(dictpath,"r");
if (!dictlst) {
return 1;
}
// step one is to parse the dictionary list building up the
// descriptive structures
// read in each line ignoring any that dont start with etype
while (fgets(line,MAXDICTENTRYLEN,dictlst)) {
mychomp(line);
/* parse in a dictionary entry */
if (strncmp(line,etype,4) == 0) {
if (numdict < MAXDICTIONARIES) {
char * tp = line;
char * piece;
i = 0;
while ((piece=mystrsep(&tp,' '))) {
if (*piece != '\0') {
switch(i) {
case 0: break;
case 1: pdict->lang = mystrdup(piece); break;
case 2: if (strcmp (piece, "ANY") == 0)
pdict->region = mystrdup("");
else
pdict->region = mystrdup(piece);
break;
case 3: pdict->filename = mystrdup(piece); break;
default: break;
}
i++;
}
free(piece);
}
if (i == 4) {
numdict++;
pdict++;
} else {
switch (i) {
case 3:
free(pdict->region);
pdict->region=NULL;
/* FALLTHROUGH */
case 2:
free(pdict->lang);
pdict->lang=NULL;
default:
break;
}
fprintf(stderr,"dictionary list corruption in line \"%s\"\n",line);
fflush(stderr);
}
}
}
}
fclose(dictlst);
return 0;
}
// return text encoding of dictionary
int DictMgr::get_list(dictentry ** ppentry)
{
*ppentry = pdentry;
return numdict;
}
// strip strings into token based on single char delimiter
// acts like strsep() but only uses a delim char and not
// a delim string
char * DictMgr::mystrsep(char ** stringp, const char delim)
{
char * rv = NULL;
char * mp = *stringp;
size_t n = strlen(mp);
if (n > 0) {
char * dp = (char *)memchr(mp,(int)((unsigned char)delim),n);
if (dp) {
*stringp = dp+1;
size_t nc = dp - mp;
rv = (char *) malloc(nc+1);
if (rv) {
memcpy(rv,mp,nc);
*(rv+nc) = '\0';
}
} else {
rv = (char *) malloc(n+1);
if (rv) {
memcpy(rv, mp, n);
*(rv+n) = '\0';
*stringp = mp + n;
}
}
}
return rv;
}
// replaces strdup with ansi version
char * DictMgr::mystrdup(const char * s)
{
char * d = NULL;
if (s) {
int sl = strlen(s)+1;
d = (char *) malloc(sl);
if (d) memcpy(d,s,sl);
}
return d;
}
// remove cross-platform text line end characters
void DictMgr:: mychomp(char * s)
{
int k = strlen(s);
if ((k > 0) && ((*(s+k-1)=='\r') || (*(s+k-1)=='\n'))) *(s+k-1) = '\0';
if ((k > 1) && (*(s+k-2) == '\r')) *(s+k-2) = '\0';
}

View File

@ -1,39 +0,0 @@
#ifndef _DICTMGR_HXX_
#define _DICTMGR_HXX_
#include "hunvisapi.h"
#define MAXDICTIONARIES 100
#define MAXDICTENTRYLEN 1024
struct dictentry {
char * filename;
char * lang;
char * region;
};
class LIBHUNSPELL_DLL_EXPORTED DictMgr
{
private:
DictMgr(const DictMgr&);
DictMgr& operator = (const DictMgr&);
private:
int numdict;
dictentry * pdentry;
public:
DictMgr(const char * dictpath, const char * etype);
~DictMgr();
int get_list(dictentry** ppentry);
private:
int parse_file(const char * dictpath, const char * etype);
char * mystrsep(char ** stringp, const char delim);
char * mystrdup(const char * s);
void mychomp(char * s);
};
#endif

View File

@ -1,53 +0,0 @@
#include "license.hunspell"
#include "license.myspell"
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "filemgr.hxx"
#include "csutil.hxx"
int FileMgr::fail(const char * err, const char * par) {
fprintf(stderr, err, par);
return -1;
}
FileMgr::FileMgr(const char * file, const char * key)
: hin(NULL)
, linenum(0)
{
in[0] = '\0';
fin = myfopen(file, "r");
if (!fin) {
// check hzipped file
char * st = (char *) malloc(strlen(file) + strlen(HZIP_EXTENSION) + 1);
if (st) {
strcpy(st, file);
strcat(st, HZIP_EXTENSION);
hin = new Hunzip(st, key);
free(st);
}
}
if (!fin && !hin) fail(MSG_OPEN, file);
}
FileMgr::~FileMgr()
{
if (fin) fclose(fin);
if (hin) delete hin;
}
char * FileMgr::getline() {
const char * l;
linenum++;
if (fin) return fgets(in, BUFSIZE - 1, fin);
if (hin && ((l = hin->getline()) != NULL)) return strcpy(in, l);
linenum--;
return NULL;
}
int FileMgr::getlinenum() {
return linenum;
}

View File

@ -1,28 +0,0 @@
/* file manager class - read lines of files [filename] OR [filename.hz] */
#ifndef _FILEMGR_HXX_
#define _FILEMGR_HXX_
#include "hunvisapi.h"
#include "hunzip.hxx"
#include <stdio.h>
class LIBHUNSPELL_DLL_EXPORTED FileMgr
{
private:
FileMgr(const FileMgr&);
FileMgr& operator = (const FileMgr&);
protected:
FILE * fin;
Hunzip * hin;
char in[BUFSIZE + 50]; // input buffer
int fail(const char * err, const char * par);
int linenum;
public:
FileMgr(const char * filename, const char * key = NULL);
~FileMgr();
char * getline();
int getlinenum();
};
#endif

View File

@ -1,936 +0,0 @@
#include "license.hunspell"
#include "license.myspell"
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#include <limits>
#include "hashmgr.hxx"
#include "csutil.hxx"
#include "atypes.hxx"
// build a hash table from a munched word list
HashMgr::HashMgr(const char * tpath, const char * apath, const char * key)
: tablesize(0)
, tableptr(NULL)
, userword(0)
, flag_mode(FLAG_CHAR)
, complexprefixes(0)
, utf8(0)
, forbiddenword(FORBIDDENWORD) // forbidden word signing flag
, numaliasf(0)
, aliasf(NULL)
, aliasflen(0)
, numaliasm(0)
, aliasm(NULL)
{
langnum = 0;
lang = NULL;
enc = NULL;
csconv = 0;
ignorechars = NULL;
ignorechars_utf16 = NULL;
ignorechars_utf16_len = 0;
load_config(apath, key);
int ec = load_tables(tpath, key);
if (ec) {
/* error condition - what should we do here */
HUNSPELL_WARNING(stderr, "Hash Manager Error : %d\n",ec);
if (tableptr) {
free(tableptr);
tableptr = NULL;
}
tablesize = 0;
}
}
HashMgr::~HashMgr()
{
if (tableptr) {
// now pass through hash table freeing up everything
// go through column by column of the table
for (int i=0; i < tablesize; i++) {
struct hentry * pt = tableptr[i];
struct hentry * nt = NULL;
while(pt) {
nt = pt->next;
if (pt->astr && (!aliasf || TESTAFF(pt->astr, ONLYUPCASEFLAG, pt->alen))) free(pt->astr);
free(pt);
pt = nt;
}
}
free(tableptr);
}
tablesize = 0;
if (aliasf) {
for (int j = 0; j < (numaliasf); j++) free(aliasf[j]);
free(aliasf);
aliasf = NULL;
if (aliasflen) {
free(aliasflen);
aliasflen = NULL;
}
}
if (aliasm) {
for (int j = 0; j < (numaliasm); j++) free(aliasm[j]);
free(aliasm);
aliasm = NULL;
}
#ifndef OPENOFFICEORG
#ifndef MOZILLA_CLIENT
if (utf8) free_utf_tbl();
#endif
#endif
if (enc) free(enc);
if (lang) free(lang);
if (ignorechars) free(ignorechars);
if (ignorechars_utf16) free(ignorechars_utf16);
#ifdef MOZILLA_CLIENT
delete [] csconv;
#endif
}
// lookup a root word in the hashtable
struct hentry * HashMgr::lookup(const char *word) const
{
struct hentry * dp;
if (tableptr) {
dp = tableptr[hash(word)];
if (!dp) return NULL;
for ( ; dp != NULL; dp = dp->next) {
if (strcmp(word, dp->word) == 0) return dp;
}
}
return NULL;
}
// add a word to the hash table (private)
int HashMgr::add_word(const char * word, int wbl, int wcl, unsigned short * aff,
int al, const char * desc, bool onlyupcase)
{
bool upcasehomonym = false;
int descl = desc ? (aliasm ? sizeof(char *) : strlen(desc) + 1) : 0;
// variable-length hash record with word and optional fields
struct hentry* hp =
(struct hentry *) malloc (sizeof(struct hentry) + wbl + descl);
if (!hp) return 1;
char * hpw = hp->word;
strcpy(hpw, word);
if (ignorechars != NULL) {
if (utf8) {
remove_ignored_chars_utf(hpw, ignorechars_utf16, ignorechars_utf16_len);
} else {
remove_ignored_chars(hpw, ignorechars);
}
}
if (complexprefixes) {
if (utf8) reverseword_utf(hpw); else reverseword(hpw);
}
int i = hash(hpw);
hp->blen = (unsigned char) wbl;
hp->clen = (unsigned char) wcl;
hp->alen = (short) al;
hp->astr = aff;
hp->next = NULL;
hp->next_homonym = NULL;
// store the description string or its pointer
if (desc) {
hp->var = H_OPT;
if (aliasm) {
hp->var += H_OPT_ALIASM;
store_pointer(hpw + wbl + 1, get_aliasm(atoi(desc)));
} else {
strcpy(hpw + wbl + 1, desc);
if (complexprefixes) {
if (utf8) reverseword_utf(HENTRY_DATA(hp));
else reverseword(HENTRY_DATA(hp));
}
}
if (strstr(HENTRY_DATA(hp), MORPH_PHON)) hp->var += H_OPT_PHON;
} else hp->var = 0;
struct hentry * dp = tableptr[i];
if (!dp) {
tableptr[i] = hp;
return 0;
}
while (dp->next != NULL) {
if ((!dp->next_homonym) && (strcmp(hp->word, dp->word) == 0)) {
// remove hidden onlyupcase homonym
if (!onlyupcase) {
if ((dp->astr) && TESTAFF(dp->astr, ONLYUPCASEFLAG, dp->alen)) {
free(dp->astr);
dp->astr = hp->astr;
dp->alen = hp->alen;
free(hp);
return 0;
} else {
dp->next_homonym = hp;
}
} else {
upcasehomonym = true;
}
}
dp=dp->next;
}
if (strcmp(hp->word, dp->word) == 0) {
// remove hidden onlyupcase homonym
if (!onlyupcase) {
if ((dp->astr) && TESTAFF(dp->astr, ONLYUPCASEFLAG, dp->alen)) {
free(dp->astr);
dp->astr = hp->astr;
dp->alen = hp->alen;
free(hp);
return 0;
} else {
dp->next_homonym = hp;
}
} else {
upcasehomonym = true;
}
}
if (!upcasehomonym) {
dp->next = hp;
} else {
// remove hidden onlyupcase homonym
if (hp->astr) free(hp->astr);
free(hp);
}
return 0;
}
int HashMgr::add_hidden_capitalized_word(char * word, int wbl, int wcl,
unsigned short * flags, int flagslen, char * dp, int captype)
{
if (flags == NULL)
flagslen = 0;
// add inner capitalized forms to handle the following allcap forms:
// Mixed caps: OpenOffice.org -> OPENOFFICE.ORG
// Allcaps with suffixes: CIA's -> CIA'S
if (((captype == HUHCAP) || (captype == HUHINITCAP) ||
((captype == ALLCAP) && (flagslen != 0))) &&
!((flagslen != 0) && TESTAFF(flags, forbiddenword, flagslen))) {
unsigned short * flags2 = (unsigned short *) malloc (sizeof(unsigned short) * (flagslen+1));
if (!flags2) return 1;
if (flagslen) memcpy(flags2, flags, flagslen * sizeof(unsigned short));
flags2[flagslen] = ONLYUPCASEFLAG;
if (utf8) {
char st[BUFSIZE];
w_char w[BUFSIZE];
int wlen = u8_u16(w, BUFSIZE, word);
mkallsmall_utf(w, wlen, langnum);
mkallcap_utf(w, 1, langnum);
u16_u8(st, BUFSIZE, w, wlen);
return add_word(st,wbl,wcl,flags2,flagslen+1,dp, true);
} else {
mkallsmall(word, csconv);
mkinitcap(word, csconv);
return add_word(word,wbl,wcl,flags2,flagslen+1,dp, true);
}
}
return 0;
}
// detect captype and modify word length for UTF-8 encoding
int HashMgr::get_clen_and_captype(const char * word, int wbl, int * captype) {
int len;
if (utf8) {
w_char dest_utf[BUFSIZE];
len = u8_u16(dest_utf, BUFSIZE, word);
*captype = get_captype_utf8(dest_utf, len, langnum);
} else {
len = wbl;
*captype = get_captype((char *) word, len, csconv);
}
return len;
}
// remove word (personal dictionary function for standalone applications)
int HashMgr::remove(const char * word)
{
struct hentry * dp = lookup(word);
while (dp) {
if (dp->alen == 0 || !TESTAFF(dp->astr, forbiddenword, dp->alen)) {
unsigned short * flags =
(unsigned short *) malloc(sizeof(short) * (dp->alen + 1));
if (!flags) return 1;
for (int i = 0; i < dp->alen; i++) flags[i] = dp->astr[i];
flags[dp->alen] = forbiddenword;
dp->astr = flags;
dp->alen++;
flag_qsort(flags, 0, dp->alen);
}
dp = dp->next_homonym;
}
return 0;
}
/* remove forbidden flag to add a personal word to the hash */
int HashMgr::remove_forbidden_flag(const char * word) {
struct hentry * dp = lookup(word);
if (!dp) return 1;
while (dp) {
if (dp->astr && TESTAFF(dp->astr, forbiddenword, dp->alen)) {
if (dp->alen == 1) dp->alen = 0; // XXX forbidden words of personal dic.
else {
unsigned short * flags2 =
(unsigned short *) malloc(sizeof(short) * (dp->alen - 1));
if (!flags2) return 1;
int i, j = 0;
for (i = 0; i < dp->alen; i++) {
if (dp->astr[i] != forbiddenword) flags2[j++] = dp->astr[i];
}
dp->alen--;
dp->astr = flags2; // XXX allowed forbidden words
}
}
dp = dp->next_homonym;
}
return 0;
}
// add a custom dic. word to the hash table (public)
int HashMgr::add(const char * word)
{
unsigned short * flags = NULL;
int al = 0;
if (remove_forbidden_flag(word)) {
int captype;
int wbl = strlen(word);
int wcl = get_clen_and_captype(word, wbl, &captype);
add_word(word, wbl, wcl, flags, al, NULL, false);
return add_hidden_capitalized_word((char *) word, wbl, wcl, flags, al, NULL, captype);
}
return 0;
}
int HashMgr::add_with_affix(const char * word, const char * example)
{
// detect captype and modify word length for UTF-8 encoding
struct hentry * dp = lookup(example);
remove_forbidden_flag(word);
if (dp && dp->astr) {
int captype;
int wbl = strlen(word);
int wcl = get_clen_and_captype(word, wbl, &captype);
if (aliasf) {
add_word(word, wbl, wcl, dp->astr, dp->alen, NULL, false);
} else {
unsigned short * flags = (unsigned short *) malloc (dp->alen * sizeof(short));
if (flags) {
memcpy((void *) flags, (void *) dp->astr, dp->alen * sizeof(short));
add_word(word, wbl, wcl, flags, dp->alen, NULL, false);
} else return 1;
}
return add_hidden_capitalized_word((char *) word, wbl, wcl, dp->astr, dp->alen, NULL, captype);
}
return 1;
}
// walk the hash table entry by entry - null at end
// initialize: col=-1; hp = NULL; hp = walk_hashtable(&col, hp);
struct hentry * HashMgr::walk_hashtable(int &col, struct hentry * hp) const
{
if (hp && hp->next != NULL) return hp->next;
for (col++; col < tablesize; col++) {
if (tableptr[col]) return tableptr[col];
}
// null at end and reset to start
col = -1;
return NULL;
}
// load a munched word list and build a hash table on the fly
int HashMgr::load_tables(const char * tpath, const char * key)
{
int al;
char * ap;
char * dp;
char * dp2;
unsigned short * flags;
char * ts;
// open dictionary file
FileMgr * dict = new FileMgr(tpath, key);
if (dict == NULL) return 1;
// first read the first line of file to get hash table size */
if ((ts = dict->getline()) == NULL) {
HUNSPELL_WARNING(stderr, "error: empty dic file %s\n", tpath);
delete dict;
return 2;
}
mychomp(ts);
/* remove byte order mark */
if (strncmp(ts,"\xEF\xBB\xBF",3) == 0) {
memmove(ts, ts+3, strlen(ts+3)+1);
// warning: dic file begins with byte order mark: possible incompatibility with old Hunspell versions
}
tablesize = atoi(ts);
int nExtra = 5 + USERWORD;
if (tablesize <= 0 || (tablesize >= (std::numeric_limits<int>::max() - 1 - nExtra) / int(sizeof(struct hentry *)))) {
HUNSPELL_WARNING(stderr, "error: line 1: missing or bad word count in the dic file\n");
delete dict;
return 4;
}
tablesize += nExtra;
if ((tablesize % 2) == 0) tablesize++;
// allocate the hash table
tableptr = (struct hentry **) calloc(tablesize, sizeof(struct hentry *));
if (! tableptr) {
delete dict;
return 3;
}
// loop through all words on much list and add to hash
// table and create word and affix strings
while ((ts = dict->getline()) != NULL) {
mychomp(ts);
// split each line into word and morphological description
dp = ts;
while ((dp = strchr(dp, ':')) != NULL) {
if ((dp > ts + 3) && (*(dp - 3) == ' ' || *(dp - 3) == '\t')) {
for (dp -= 4; dp >= ts && (*dp == ' ' || *dp == '\t'); dp--);
if (dp < ts) { // missing word
dp = NULL;
} else {
*(dp + 1) = '\0';
dp = dp + 2;
}
break;
}
dp++;
}
// tabulator is the old morphological field separator
dp2 = strchr(ts, '\t');
if (dp2 && (!dp || dp2 < dp)) {
*dp2 = '\0';
dp = dp2 + 1;
}
// split each line into word and affix char strings
// "\/" signs slash in words (not affix separator)
// "/" at beginning of the line is word character (not affix separator)
ap = strchr(ts,'/');
while (ap) {
if (ap == ts) {
ap++;
continue;
} else if (*(ap - 1) != '\\') break;
// replace "\/" with "/"
for (char * sp = ap - 1; *sp; *sp = *(sp + 1), sp++);
ap = strchr(ap,'/');
}
if (ap) {
*ap = '\0';
if (aliasf) {
int index = atoi(ap + 1);
al = get_aliasf(index, &flags, dict);
if (!al) {
HUNSPELL_WARNING(stderr, "error: line %d: bad flag vector alias\n", dict->getlinenum());
*ap = '\0';
}
} else {
al = decode_flags(&flags, ap + 1, dict);
if (al == -1) {
HUNSPELL_WARNING(stderr, "Can't allocate memory.\n");
delete dict;
return 6;
}
flag_qsort(flags, 0, al);
}
} else {
al = 0;
ap = NULL;
flags = NULL;
}
int captype;
int wbl = strlen(ts);
int wcl = get_clen_and_captype(ts, wbl, &captype);
// add the word and its index plus its capitalized form optionally
if (add_word(ts,wbl,wcl,flags,al,dp, false) ||
add_hidden_capitalized_word(ts, wbl, wcl, flags, al, dp, captype)) {
delete dict;
return 5;
}
}
delete dict;
return 0;
}
// the hash function is a simple load and rotate
// algorithm borrowed
int HashMgr::hash(const char * word) const
{
long hv = 0;
for (int i=0; i < 4 && *word != 0; i++)
hv = (hv << 8) | (*word++);
while (*word != 0) {
ROTATE(hv,ROTATE_LEN);
hv ^= (*word++);
}
return (unsigned long) hv % tablesize;
}
int HashMgr::decode_flags(unsigned short ** result, char * flags, FileMgr * af) {
int len;
if (*flags == '\0') {
*result = NULL;
return 0;
}
switch (flag_mode) {
case FLAG_LONG: { // two-character flags (1x2yZz -> 1x 2y Zz)
len = strlen(flags);
if (len%2 == 1) HUNSPELL_WARNING(stderr, "error: line %d: bad flagvector\n", af->getlinenum());
len /= 2;
*result = (unsigned short *) malloc(len * sizeof(short));
if (!*result) return -1;
for (int i = 0; i < len; i++) {
(*result)[i] = (((unsigned short) flags[i * 2]) << 8) + (unsigned short) flags[i * 2 + 1];
}
break;
}
case FLAG_NUM: { // decimal numbers separated by comma (4521,23,233 -> 4521 23 233)
int i;
len = 1;
char * src = flags;
unsigned short * dest;
char * p;
for (p = flags; *p; p++) {
if (*p == ',') len++;
}
*result = (unsigned short *) malloc(len * sizeof(short));
if (!*result) return -1;
dest = *result;
for (p = flags; *p; p++) {
if (*p == ',') {
i = atoi(src);
if (i >= DEFAULTFLAGS) HUNSPELL_WARNING(stderr, "error: line %d: flag id %d is too large (max: %d)\n",
af->getlinenum(), i, DEFAULTFLAGS - 1);
*dest = (unsigned short) i;
if (*dest == 0) HUNSPELL_WARNING(stderr, "error: line %d: 0 is wrong flag id\n", af->getlinenum());
src = p + 1;
dest++;
}
}
i = atoi(src);
if (i >= DEFAULTFLAGS) HUNSPELL_WARNING(stderr, "error: line %d: flag id %d is too large (max: %d)\n",
af->getlinenum(), i, DEFAULTFLAGS - 1);
*dest = (unsigned short) i;
if (*dest == 0) HUNSPELL_WARNING(stderr, "error: line %d: 0 is wrong flag id\n", af->getlinenum());
break;
}
case FLAG_UNI: { // UTF-8 characters
w_char w[BUFSIZE/2];
len = u8_u16(w, BUFSIZE/2, flags);
*result = (unsigned short *) malloc(len * sizeof(short));
if (!*result) return -1;
memcpy(*result, w, len * sizeof(short));
break;
}
default: { // Ispell's one-character flags (erfg -> e r f g)
unsigned short * dest;
len = strlen(flags);
*result = (unsigned short *) malloc(len * sizeof(short));
if (!*result) return -1;
dest = *result;
for (unsigned char * p = (unsigned char *) flags; *p; p++) {
*dest = (unsigned short) *p;
dest++;
}
}
}
return len;
}
unsigned short HashMgr::decode_flag(const char * f) {
unsigned short s = 0;
int i;
switch (flag_mode) {
case FLAG_LONG:
s = ((unsigned short) f[0] << 8) + (unsigned short) f[1];
break;
case FLAG_NUM:
i = atoi(f);
if (i >= DEFAULTFLAGS) HUNSPELL_WARNING(stderr, "error: flag id %d is too large (max: %d)\n", i, DEFAULTFLAGS - 1);
s = (unsigned short) i;
break;
case FLAG_UNI:
u8_u16((w_char *) &s, 1, f);
break;
default:
s = (unsigned short) *((unsigned char *)f);
}
if (s == 0) HUNSPELL_WARNING(stderr, "error: 0 is wrong flag id\n");
return s;
}
char * HashMgr::encode_flag(unsigned short f) {
unsigned char ch[10];
if (f==0) return mystrdup("(NULL)");
if (flag_mode == FLAG_LONG) {
ch[0] = (unsigned char) (f >> 8);
ch[1] = (unsigned char) (f - ((f >> 8) << 8));
ch[2] = '\0';
} else if (flag_mode == FLAG_NUM) {
sprintf((char *) ch, "%d", f);
} else if (flag_mode == FLAG_UNI) {
u16_u8((char *) &ch, 10, (w_char *) &f, 1);
} else {
ch[0] = (unsigned char) (f);
ch[1] = '\0';
}
return mystrdup((char *) ch);
}
// read in aff file and set flag mode
int HashMgr::load_config(const char * affpath, const char * key)
{
char * line; // io buffers
int firstline = 1;
// open the affix file
FileMgr * afflst = new FileMgr(affpath, key);
if (!afflst) {
HUNSPELL_WARNING(stderr, "Error - could not open affix description file %s\n",affpath);
return 1;
}
// read in each line ignoring any that do not
// start with a known line type indicator
while ((line = afflst->getline()) != NULL) {
mychomp(line);
/* remove byte order mark */
if (firstline) {
firstline = 0;
if (strncmp(line,"\xEF\xBB\xBF",3) == 0) memmove(line, line+3, strlen(line+3)+1);
}
/* parse in the try string */
if ((strncmp(line,"FLAG",4) == 0) && isspace(line[4])) {
if (flag_mode != FLAG_CHAR) {
HUNSPELL_WARNING(stderr, "error: line %d: multiple definitions of the FLAG affix file parameter\n", afflst->getlinenum());
}
if (strstr(line, "long")) flag_mode = FLAG_LONG;
if (strstr(line, "num")) flag_mode = FLAG_NUM;
if (strstr(line, "UTF-8")) flag_mode = FLAG_UNI;
if (flag_mode == FLAG_CHAR) {
HUNSPELL_WARNING(stderr, "error: line %d: FLAG needs `num', `long' or `UTF-8' parameter\n", afflst->getlinenum());
}
}
if (strncmp(line,"FORBIDDENWORD",13) == 0) {
char * st = NULL;
if (parse_string(line, &st, afflst->getlinenum())) {
delete afflst;
return 1;
}
forbiddenword = decode_flag(st);
free(st);
}
if (strncmp(line, "SET", 3) == 0) {
if (parse_string(line, &enc, afflst->getlinenum())) {
delete afflst;
return 1;
}
if (strcmp(enc, "UTF-8") == 0) {
utf8 = 1;
#ifndef OPENOFFICEORG
#ifndef MOZILLA_CLIENT
initialize_utf_tbl();
#endif
#endif
} else csconv = get_current_cs(enc);
}
if (strncmp(line, "LANG", 4) == 0) {
if (parse_string(line, &lang, afflst->getlinenum())) {
delete afflst;
return 1;
}
langnum = get_lang_num(lang);
}
/* parse in the ignored characters (for example, Arabic optional diacritics characters */
if (strncmp(line,"IGNORE",6) == 0) {
if (parse_array(line, &ignorechars, &ignorechars_utf16,
&ignorechars_utf16_len, utf8, afflst->getlinenum())) {
delete afflst;
return 1;
}
}
if ((strncmp(line,"AF",2) == 0) && isspace(line[2])) {
if (parse_aliasf(line, afflst)) {
delete afflst;
return 1;
}
}
if ((strncmp(line,"AM",2) == 0) && isspace(line[2])) {
if (parse_aliasm(line, afflst)) {
delete afflst;
return 1;
}
}
if (strncmp(line,"COMPLEXPREFIXES",15) == 0) complexprefixes = 1;
if (((strncmp(line,"SFX",3) == 0) || (strncmp(line,"PFX",3) == 0)) && isspace(line[3])) break;
}
if (csconv == NULL) csconv = get_current_cs(SPELL_ENCODING);
delete afflst;
return 0;
}
/* parse in the ALIAS table */
int HashMgr::parse_aliasf(char * line, FileMgr * af)
{
if (numaliasf != 0) {
HUNSPELL_WARNING(stderr, "error: line %d: multiple table definitions\n", af->getlinenum());
return 1;
}
char * tp = line;
char * piece;
int i = 0;
int np = 0;
piece = mystrsep(&tp, 0);
while (piece) {
if (*piece != '\0') {
switch(i) {
case 0: { np++; break; }
case 1: {
numaliasf = atoi(piece);
if (numaliasf < 1) {
numaliasf = 0;
aliasf = NULL;
aliasflen = NULL;
HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n", af->getlinenum());
return 1;
}
aliasf = (unsigned short **) malloc(numaliasf * sizeof(unsigned short *));
aliasflen = (unsigned short *) malloc(numaliasf * sizeof(short));
if (!aliasf || !aliasflen) {
numaliasf = 0;
if (aliasf) free(aliasf);
if (aliasflen) free(aliasflen);
aliasf = NULL;
aliasflen = NULL;
return 1;
}
np++;
break;
}
default: break;
}
i++;
}
piece = mystrsep(&tp, 0);
}
if (np != 2) {
numaliasf = 0;
free(aliasf);
free(aliasflen);
aliasf = NULL;
aliasflen = NULL;
HUNSPELL_WARNING(stderr, "error: line %d: missing data\n", af->getlinenum());
return 1;
}
/* now parse the numaliasf lines to read in the remainder of the table */
char * nl;
for (int j=0; j < numaliasf; j++) {
if ((nl = af->getline()) == NULL) return 1;
mychomp(nl);
tp = nl;
i = 0;
aliasf[j] = NULL;
aliasflen[j] = 0;
piece = mystrsep(&tp, 0);
while (piece) {
if (*piece != '\0') {
switch(i) {
case 0: {
if (strncmp(piece,"AF",2) != 0) {
numaliasf = 0;
free(aliasf);
free(aliasflen);
aliasf = NULL;
aliasflen = NULL;
HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n", af->getlinenum());
return 1;
}
break;
}
case 1: {
aliasflen[j] = (unsigned short) decode_flags(&(aliasf[j]), piece, af);
flag_qsort(aliasf[j], 0, aliasflen[j]);
break;
}
default: break;
}
i++;
}
piece = mystrsep(&tp, 0);
}
if (!aliasf[j]) {
free(aliasf);
free(aliasflen);
aliasf = NULL;
aliasflen = NULL;
numaliasf = 0;
HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n", af->getlinenum());
return 1;
}
}
return 0;
}
int HashMgr::is_aliasf() {
return (aliasf != NULL);
}
int HashMgr::get_aliasf(int index, unsigned short ** fvec, FileMgr * af) {
if ((index > 0) && (index <= numaliasf)) {
*fvec = aliasf[index - 1];
return aliasflen[index - 1];
}
HUNSPELL_WARNING(stderr, "error: line %d: bad flag alias index: %d\n", af->getlinenum(), index);
*fvec = NULL;
return 0;
}
/* parse morph alias definitions */
int HashMgr::parse_aliasm(char * line, FileMgr * af)
{
if (numaliasm != 0) {
HUNSPELL_WARNING(stderr, "error: line %d: multiple table definitions\n", af->getlinenum());
return 1;
}
char * tp = line;
char * piece;
int i = 0;
int np = 0;
piece = mystrsep(&tp, 0);
while (piece) {
if (*piece != '\0') {
switch(i) {
case 0: { np++; break; }
case 1: {
numaliasm = atoi(piece);
if (numaliasm < 1) {
HUNSPELL_WARNING(stderr, "error: line %d: bad entry number\n", af->getlinenum());
return 1;
}
aliasm = (char **) malloc(numaliasm * sizeof(char *));
if (!aliasm) {
numaliasm = 0;
return 1;
}
np++;
break;
}
default: break;
}
i++;
}
piece = mystrsep(&tp, 0);
}
if (np != 2) {
numaliasm = 0;
free(aliasm);
aliasm = NULL;
HUNSPELL_WARNING(stderr, "error: line %d: missing data\n", af->getlinenum());
return 1;
}
/* now parse the numaliasm lines to read in the remainder of the table */
char * nl = line;
for (int j=0; j < numaliasm; j++) {
if ((nl = af->getline()) == NULL) return 1;
mychomp(nl);
tp = nl;
i = 0;
aliasm[j] = NULL;
piece = mystrsep(&tp, ' ');
while (piece) {
if (*piece != '\0') {
switch(i) {
case 0: {
if (strncmp(piece,"AM",2) != 0) {
HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n", af->getlinenum());
numaliasm = 0;
free(aliasm);
aliasm = NULL;
return 1;
}
break;
}
case 1: {
// add the remaining of the line
if (*tp) {
*(tp - 1) = ' ';
tp = tp + strlen(tp);
}
if (complexprefixes) {
if (utf8) reverseword_utf(piece);
else reverseword(piece);
}
aliasm[j] = mystrdup(piece);
if (!aliasm[j]) {
numaliasm = 0;
free(aliasm);
aliasm = NULL;
return 1;
}
break; }
default: break;
}
i++;
}
piece = mystrsep(&tp, ' ');
}
if (!aliasm[j]) {
numaliasm = 0;
free(aliasm);
aliasm = NULL;
HUNSPELL_WARNING(stderr, "error: line %d: table is corrupt\n", af->getlinenum());
return 1;
}
}
return 0;
}
int HashMgr::is_aliasm() {
return (aliasm != NULL);
}
char * HashMgr::get_aliasm(int index) {
if ((index > 0) && (index <= numaliasm)) return aliasm[index - 1];
HUNSPELL_WARNING(stderr, "error: bad morph. alias index: %d\n", index);
return NULL;
}

View File

@ -1,69 +0,0 @@
#ifndef _HASHMGR_HXX_
#define _HASHMGR_HXX_
#include "hunvisapi.h"
#include <stdio.h>
#include "htypes.hxx"
#include "filemgr.hxx"
enum flag { FLAG_CHAR, FLAG_LONG, FLAG_NUM, FLAG_UNI };
class LIBHUNSPELL_DLL_EXPORTED HashMgr
{
int tablesize;
struct hentry ** tableptr;
int userword;
flag flag_mode;
int complexprefixes;
int utf8;
unsigned short forbiddenword;
int langnum;
char * enc;
char * lang;
struct cs_info * csconv;
char * ignorechars;
unsigned short * ignorechars_utf16;
int ignorechars_utf16_len;
int numaliasf; // flag vector `compression' with aliases
unsigned short ** aliasf;
unsigned short * aliasflen;
int numaliasm; // morphological desciption `compression' with aliases
char ** aliasm;
public:
HashMgr(const char * tpath, const char * apath, const char * key = NULL);
~HashMgr();
struct hentry * lookup(const char *) const;
int hash(const char *) const;
struct hentry * walk_hashtable(int & col, struct hentry * hp) const;
int add(const char * word);
int add_with_affix(const char * word, const char * pattern);
int remove(const char * word);
int decode_flags(unsigned short ** result, char * flags, FileMgr * af);
unsigned short decode_flag(const char * flag);
char * encode_flag(unsigned short flag);
int is_aliasf();
int get_aliasf(int index, unsigned short ** fvec, FileMgr * af);
int is_aliasm();
char * get_aliasm(int index);
private:
int get_clen_and_captype(const char * word, int wbl, int * captype);
int load_tables(const char * tpath, const char * key);
int add_word(const char * word, int wbl, int wcl, unsigned short * ap,
int al, const char * desc, bool onlyupcase);
int load_config(const char * affpath, const char * key);
int parse_aliasf(char * line, FileMgr * af);
int add_hidden_capitalized_word(char * word, int wbl, int wcl,
unsigned short * flags, int al, char * dp, int captype);
int parse_aliasm(char * line, FileMgr * af);
int remove_forbidden_flag(const char * word);
};
#endif

View File

@ -1,32 +0,0 @@
#ifndef _HTYPES_HXX_
#define _HTYPES_HXX_
#define ROTATE_LEN 5
#define ROTATE(v,q) \
(v) = ((v) << (q)) | (((v) >> (32 - q)) & ((1 << (q))-1));
// hentry options
#define H_OPT (1 << 0)
#define H_OPT_ALIASM (1 << 1)
#define H_OPT_PHON (1 << 2)
// see also csutil.hxx
#define HENTRY_WORD(h) &(h->word[0])
// approx. number of user defined words
#define USERWORD 1000
struct hentry
{
unsigned char blen; // word length in bytes
unsigned char clen; // word length in characters (different for UTF-8 enc.)
short alen; // length of affix flag vector
unsigned short * astr; // affix flag vector
struct hentry * next; // next word with same hash code
struct hentry * next_homonym; // next homonym word (with same hash code)
char var; // variable fields (only for special pronounciation yet)
char word[1]; // variable-length word (8-bit or UTF-8 encoding)
};
#endif

File diff suppressed because it is too large Load Diff

View File

@ -1,95 +0,0 @@
#ifndef _MYSPELLMGR_H_
#define _MYSPELLMGR_H_
#include "hunvisapi.h"
#ifdef __cplusplus
extern "C" {
#endif
typedef struct Hunhandle Hunhandle;
LIBHUNSPELL_DLL_EXPORTED Hunhandle *Hunspell_create(const char * affpath, const char * dpath);
LIBHUNSPELL_DLL_EXPORTED Hunhandle *Hunspell_create_key(const char * affpath, const char * dpath,
const char * key);
LIBHUNSPELL_DLL_EXPORTED void Hunspell_destroy(Hunhandle *pHunspell);
/* spell(word) - spellcheck word
* output: 0 = bad word, not 0 = good word
*/
LIBHUNSPELL_DLL_EXPORTED int Hunspell_spell(Hunhandle *pHunspell, const char *);
LIBHUNSPELL_DLL_EXPORTED char *Hunspell_get_dic_encoding(Hunhandle *pHunspell);
/* suggest(suggestions, word) - search suggestions
* input: pointer to an array of strings pointer and the (bad) word
* array of strings pointer (here *slst) may not be initialized
* output: number of suggestions in string array, and suggestions in
* a newly allocated array of strings (*slts will be NULL when number
* of suggestion equals 0.)
*/
LIBHUNSPELL_DLL_EXPORTED int Hunspell_suggest(Hunhandle *pHunspell, char*** slst, const char * word);
/* morphological functions */
/* analyze(result, word) - morphological analysis of the word */
LIBHUNSPELL_DLL_EXPORTED int Hunspell_analyze(Hunhandle *pHunspell, char*** slst, const char * word);
/* stem(result, word) - stemmer function */
LIBHUNSPELL_DLL_EXPORTED int Hunspell_stem(Hunhandle *pHunspell, char*** slst, const char * word);
/* stem(result, analysis, n) - get stems from a morph. analysis
* example:
* char ** result, result2;
* int n1 = Hunspell_analyze(result, "words");
* int n2 = Hunspell_stem2(result2, result, n1);
*/
LIBHUNSPELL_DLL_EXPORTED int Hunspell_stem2(Hunhandle *pHunspell, char*** slst, char** desc, int n);
/* generate(result, word, word2) - morphological generation by example(s) */
LIBHUNSPELL_DLL_EXPORTED int Hunspell_generate(Hunhandle *pHunspell, char*** slst, const char * word,
const char * word2);
/* generate(result, word, desc, n) - generation by morph. description(s)
* example:
* char ** result;
* char * affix = "is:plural"; // description depends from dictionaries, too
* int n = Hunspell_generate2(result, "word", &affix, 1);
* for (int i = 0; i < n; i++) printf("%s\n", result[i]);
*/
LIBHUNSPELL_DLL_EXPORTED int Hunspell_generate2(Hunhandle *pHunspell, char*** slst, const char * word,
char** desc, int n);
/* functions for run-time modification of the dictionary */
/* add word to the run-time dictionary */
LIBHUNSPELL_DLL_EXPORTED int Hunspell_add(Hunhandle *pHunspell, const char * word);
/* add word to the run-time dictionary with affix flags of
* the example (a dictionary word): Hunspell will recognize
* affixed forms of the new word, too.
*/
LIBHUNSPELL_DLL_EXPORTED int Hunspell_add_with_affix(Hunhandle *pHunspell, const char * word, const char * example);
/* remove word from the run-time dictionary */
LIBHUNSPELL_DLL_EXPORTED int Hunspell_remove(Hunhandle *pHunspell, const char * word);
/* free suggestion lists */
LIBHUNSPELL_DLL_EXPORTED void Hunspell_free_list(Hunhandle *pHunspell, char *** slst, int n);
#ifdef __cplusplus
}
#endif
#endif

View File

@ -1,184 +0,0 @@
#include "hunvisapi.h"
#include "hashmgr.hxx"
#include "affixmgr.hxx"
#include "suggestmgr.hxx"
#include "langnum.hxx"
#define SPELL_XML "<?xml?>"
#define MAXDIC 20
#define MAXSUGGESTION 15
#define MAXSHARPS 5
#define HUNSPELL_OK (1 << 0)
#define HUNSPELL_OK_WARN (1 << 1)
#ifndef _MYSPELLMGR_HXX_
#define _MYSPELLMGR_HXX_
class LIBHUNSPELL_DLL_EXPORTED Hunspell
{
private:
Hunspell(const Hunspell&);
Hunspell& operator = (const Hunspell&);
private:
AffixMgr* pAMgr;
HashMgr* pHMgr[MAXDIC];
int maxdic;
SuggestMgr* pSMgr;
char * affixpath;
char * encoding;
struct cs_info * csconv;
int langnum;
int utf8;
int complexprefixes;
char** wordbreak;
public:
/* Hunspell(aff, dic) - constructor of Hunspell class
* input: path of affix file and dictionary file
*
* In WIN32 environment, use UTF-8 encoded paths started with the long path
* prefix \\\\?\\ to handle system-independent character encoding and very
* long path names (without the long path prefix Hunspell will use fopen()
* with system-dependent character encoding instead of _wfopen()).
*/
Hunspell(const char * affpath, const char * dpath, const char * key = NULL);
~Hunspell();
/* load extra dictionaries (only dic files) */
int add_dic(const char * dpath, const char * key = NULL);
/* spell(word) - spellcheck word
* output: 0 = bad word, not 0 = good word
*
* plus output:
* info: information bit array, fields:
* SPELL_COMPOUND = a compound word
* SPELL_FORBIDDEN = an explicit forbidden word
* root: root (stem), when input is a word with affix(es)
*/
int spell(const char * word, int * info = NULL, char ** root = NULL);
/* suggest(suggestions, word) - search suggestions
* input: pointer to an array of strings pointer and the (bad) word
* array of strings pointer (here *slst) may not be initialized
* output: number of suggestions in string array, and suggestions in
* a newly allocated array of strings (*slts will be NULL when number
* of suggestion equals 0.)
*/
int suggest(char*** slst, const char * word);
/* deallocate suggestion lists */
void free_list(char *** slst, int n);
char * get_dic_encoding();
/* morphological functions */
/* analyze(result, word) - morphological analysis of the word */
int analyze(char*** slst, const char * word);
/* stem(result, word) - stemmer function */
int stem(char*** slst, const char * word);
/* stem(result, analysis, n) - get stems from a morph. analysis
* example:
* char ** result, result2;
* int n1 = analyze(&result, "words");
* int n2 = stem(&result2, result, n1);
*/
int stem(char*** slst, char ** morph, int n);
/* generate(result, word, word2) - morphological generation by example(s) */
int generate(char*** slst, const char * word, const char * word2);
/* generate(result, word, desc, n) - generation by morph. description(s)
* example:
* char ** result;
* char * affix = "is:plural"; // description depends from dictionaries, too
* int n = generate(&result, "word", &affix, 1);
* for (int i = 0; i < n; i++) printf("%s\n", result[i]);
*/
int generate(char*** slst, const char * word, char ** desc, int n);
/* functions for run-time modification of the dictionary */
/* add word to the run-time dictionary */
int add(const char * word);
/* add word to the run-time dictionary with affix flags of
* the example (a dictionary word): Hunspell will recognize
* affixed forms of the new word, too.
*/
int add_with_affix(const char * word, const char * example);
/* remove word from the run-time dictionary */
int remove(const char * word);
/* other */
/* get extra word characters definied in affix file for tokenization */
const char * get_wordchars();
unsigned short * get_wordchars_utf16(int * len);
struct cs_info * get_csconv();
const char * get_version();
int get_langnum() const;
/* need for putdic */
int input_conv(const char * word, char * dest);
/* experimental and deprecated functions */
#ifdef HUNSPELL_EXPERIMENTAL
/* suffix is an affix flag string, similarly in dictionary files */
int put_word_suffix(const char * word, const char * suffix);
char * morph_with_correction(const char * word);
/* spec. suggestions */
int suggest_auto(char*** slst, const char * word);
int suggest_pos_stems(char*** slst, const char * word);
#endif
private:
int cleanword(char *, const char *, int * pcaptype, int * pabbrev);
int cleanword2(char *, const char *, w_char *, int * w_len, int * pcaptype, int * pabbrev);
void mkinitcap(char *);
int mkinitcap2(char * p, w_char * u, int nc);
int mkinitsmall2(char * p, w_char * u, int nc);
void mkallcap(char *);
int mkallcap2(char * p, w_char * u, int nc);
void mkallsmall(char *);
int mkallsmall2(char * p, w_char * u, int nc);
struct hentry * checkword(const char *, int * info, char **root);
char * sharps_u8_l1(char * dest, char * source);
hentry * spellsharps(char * base, char *, int, int, char * tmp, int * info, char **root);
int is_keepcase(const hentry * rv);
int insert_sug(char ***slst, char * word, int ns);
void cat_result(char * result, char * st);
char * stem_description(const char * desc);
int spellml(char*** slst, const char * word);
int get_xml_par(char * dest, const char * par, int maxl);
const char * get_xml_pos(const char * s, const char * attr);
int get_xml_list(char ***slst, char * list, const char * tag);
int check_xml_par(const char * q, const char * attr, const char * value);
};
#endif

View File

@ -1,196 +0,0 @@
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "hunzip.hxx"
#include "csutil.hxx"
#define CODELEN 65536
#define BASEBITREC 5000
#define UNCOMPRESSED '\002'
#define MAGIC "hz0"
#define MAGIC_ENCRYPT "hz1"
#define MAGICLEN (sizeof(MAGIC) - 1)
int Hunzip::fail(const char * err, const char * par) {
fprintf(stderr, err, par);
return -1;
}
Hunzip::Hunzip(const char * file, const char * key)
: fin(NULL)
, bufsiz(0)
, lastbit(0)
, inc(0)
, inbits(0)
, outc(0)
, dec(NULL)
{
in[0] = out[0] = line[0] = '\0';
filename = mystrdup(file);
if (getcode(key) == -1) bufsiz = -1;
else bufsiz = getbuf();
}
int Hunzip::getcode(const char * key) {
unsigned char c[2];
int i, j, n, p;
int allocatedbit = BASEBITREC;
const char * enc = key;
if (!filename) return -1;
fin = myfopen(filename, "rb");
if (!fin) return -1;
// read magic number
if ((fread(in, 1, 3, fin) < MAGICLEN)
|| !(strncmp(MAGIC, in, MAGICLEN) == 0 ||
strncmp(MAGIC_ENCRYPT, in, MAGICLEN) == 0)) {
return fail(MSG_FORMAT, filename);
}
// check encryption
if (strncmp(MAGIC_ENCRYPT, in, MAGICLEN) == 0) {
unsigned char cs;
if (!key) return fail(MSG_KEY, filename);
if (fread(&c, 1, 1, fin) < 1) return fail(MSG_FORMAT, filename);
for (cs = 0; *enc; enc++) cs ^= *enc;
if (cs != c[0]) return fail(MSG_KEY, filename);
enc = key;
} else key = NULL;
// read record count
if (fread(&c, 1, 2, fin) < 2) return fail(MSG_FORMAT, filename);
if (key) {
c[0] ^= *enc;
if (*(++enc) == '\0') enc = key;
c[1] ^= *enc;
}
n = ((int) c[0] << 8) + c[1];
dec = (struct bit *) malloc(BASEBITREC * sizeof(struct bit));
if (!dec) return fail(MSG_MEMORY, filename);
dec[0].v[0] = 0;
dec[0].v[1] = 0;
// read codes
for (i = 0; i < n; i++) {
unsigned char l;
if (fread(c, 1, 2, fin) < 2) return fail(MSG_FORMAT, filename);
if (key) {
if (*(++enc) == '\0') enc = key;
c[0] ^= *enc;
if (*(++enc) == '\0') enc = key;
c[1] ^= *enc;
}
if (fread(&l, 1, 1, fin) < 1) return fail(MSG_FORMAT, filename);
if (key) {
if (*(++enc) == '\0') enc = key;
l ^= *enc;
}
if (fread(in, 1, l/8+1, fin) < (size_t) l/8+1) return fail(MSG_FORMAT, filename);
if (key) for (j = 0; j <= l/8; j++) {
if (*(++enc) == '\0') enc = key;
in[j] ^= *enc;
}
p = 0;
for (j = 0; j < l; j++) {
int b = (in[j/8] & (1 << (7 - (j % 8)))) ? 1 : 0;
int oldp = p;
p = dec[p].v[b];
if (p == 0) {
lastbit++;
if (lastbit == allocatedbit) {
allocatedbit += BASEBITREC;
dec = (struct bit *) realloc(dec, allocatedbit * sizeof(struct bit));
}
dec[lastbit].v[0] = 0;
dec[lastbit].v[1] = 0;
dec[oldp].v[b] = lastbit;
p = lastbit;
}
}
dec[p].c[0] = c[0];
dec[p].c[1] = c[1];
}
return 0;
}
Hunzip::~Hunzip()
{
if (dec) free(dec);
if (fin) fclose(fin);
if (filename) free(filename);
}
int Hunzip::getbuf() {
int p = 0;
int o = 0;
do {
if (inc == 0) inbits = fread(in, 1, BUFSIZE, fin) * 8;
for (; inc < inbits; inc++) {
int b = (in[inc / 8] & (1 << (7 - (inc % 8)))) ? 1 : 0;
int oldp = p;
p = dec[p].v[b];
if (p == 0) {
if (oldp == lastbit) {
fclose(fin);
fin = NULL;
// add last odd byte
if (dec[lastbit].c[0]) out[o++] = dec[lastbit].c[1];
return o;
}
out[o++] = dec[oldp].c[0];
out[o++] = dec[oldp].c[1];
if (o == BUFSIZE) return o;
p = dec[p].v[b];
}
}
inc = 0;
} while (inbits == BUFSIZE * 8);
return fail(MSG_FORMAT, filename);
}
const char * Hunzip::getline() {
char linebuf[BUFSIZE];
int l = 0, eol = 0, left = 0, right = 0;
if (bufsiz == -1) return NULL;
while (l < bufsiz && !eol) {
linebuf[l++] = out[outc];
switch (out[outc]) {
case '\t': break;
case 31: { // escape
if (++outc == bufsiz) {
bufsiz = getbuf();
outc = 0;
}
linebuf[l - 1] = out[outc];
break;
}
case ' ': break;
default: if (((unsigned char) out[outc]) < 47) {
if (out[outc] > 32) {
right = out[outc] - 31;
if (++outc == bufsiz) {
bufsiz = getbuf();
outc = 0;
}
}
if (out[outc] == 30) left = 9; else left = out[outc];
linebuf[l-1] = '\n';
eol = 1;
}
}
if (++outc == bufsiz) {
outc = 0;
bufsiz = fin ? getbuf(): -1;
}
}
if (right) strcpy(linebuf + l - 1, line + strlen(line) - right - 1);
else linebuf[l] = '\0';
strcpy(line + left, linebuf);
return line;
}

View File

@ -1,47 +0,0 @@
/* hunzip: file decompression for sorted dictionaries with optional encryption,
* algorithm: prefix-suffix encoding and 16-bit Huffman encoding */
#ifndef _HUNZIP_HXX_
#define _HUNZIP_HXX_
#include "hunvisapi.h"
#include <stdio.h>
#define BUFSIZE 65536
#define HZIP_EXTENSION ".hz"
#define MSG_OPEN "error: %s: cannot open\n"
#define MSG_FORMAT "error: %s: not in hzip format\n"
#define MSG_MEMORY "error: %s: missing memory\n"
#define MSG_KEY "error: %s: missing or bad password\n"
struct bit {
unsigned char c[2];
int v[2];
};
class LIBHUNSPELL_DLL_EXPORTED Hunzip
{
private:
Hunzip(const Hunzip&);
Hunzip& operator = (const Hunzip&);
protected:
char * filename;
FILE * fin;
int bufsiz, lastbit, inc, inbits, outc;
struct bit * dec; // code table
char in[BUFSIZE]; // input buffer
char out[BUFSIZE + 1]; // Huffman-decoded buffer
char line[BUFSIZE + 50]; // decoded line
int getcode(const char * key);
int getbuf();
int fail(const char * err, const char * par);
public:
Hunzip(const char * filename, const char * key = NULL);
~Hunzip();
const char * getline();
};
#endif

View File

@ -1,38 +0,0 @@
#ifndef _LANGNUM_HXX_
#define _LANGNUM_HXX_
/*
language numbers for language specific codes
see http://l10n.openoffice.org/languages.html
*/
enum {
LANG_ar=96,
LANG_az=100, // custom number
LANG_bg=41,
LANG_ca=37,
LANG_cs=42,
LANG_da=45,
LANG_de=49,
LANG_el=30,
LANG_en=01,
LANG_es=34,
LANG_eu=10,
LANG_fr=02,
LANG_gl=38,
LANG_hr=78,
LANG_hu=36,
LANG_it=39,
LANG_la=99, // custom number
LANG_lv=101, // custom number
LANG_nl=31,
LANG_pl=48,
LANG_pt=03,
LANG_ru=07,
LANG_sv=50,
LANG_tr=90,
LANG_uk=80,
LANG_xx=999
};
#endif

View File

@ -1,61 +0,0 @@
/*
* Copyright 2002 Kevin B. Hendricks, Stratford, Ontario, Canada
* And Contributors. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. All modifications to the source code must be clearly marked as
* such. Binary redistributions based on modified source code
* must be clearly marked as modified versions in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
* KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*
*
* NOTE: A special thanks and credit goes to Geoff Kuenning
* the creator of ispell. MySpell's affix algorithms were
* based on those of ispell which should be noted is
* copyright Geoff Kuenning et.al. and now available
* under a BSD style license. For more information on ispell
* and affix compression in general, please see:
* http://www.cs.ucla.edu/ficus-members/geoff/ispell.html
* (the home page for ispell)
*
* An almost complete rewrite of MySpell for use by
* the Mozilla project has been developed by David Einstein
* (Deinst@world.std.com). David and I are now
* working on parallel development tracks to help
* our respective projects (Mozilla and OpenOffice.org
* and we will maintain full affix file and dictionary
* file compatibility and work on merging our versions
* of MySpell back into a single tree. David has been
* a significant help in improving MySpell.
*
* Special thanks also go to La'szlo' Ne'meth
* <nemethl@gyorsposta.hu> who is the author of the
* Hungarian dictionary and who developed and contributed
* the code to support compound words in MySpell
* and fixed numerous problems with the encoding
* case conversion tables.
*
*/

View File

@ -1,65 +0,0 @@
#*************************************************************************
# Version: MPL 1.1/GPL 2.0/LGPL 2.1
#
# The contents of this file are subject to the Mozilla Public License Version
# 1.1 (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
# http://www.mozilla.org/MPL/
#
# Software distributed under the License is distributed on an "AS IS" basis,
# WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
# for the specific language governing rights and limitations under the
# License.
#
# Alternatively, the contents of this file may be used under the terms of
# either the GNU General Public License Version 2 or later (the "GPL"), or
# the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
# in which case the provisions of the GPL or the LGPL are applicable instead
# of those above. If you wish to allow use of your version of this file only
# under the terms of either the GPL or the LGPL, and not to allow others to
# use your version of this file under the terms of the MPL, indicate your
# decision by deleting the provisions above and replace them with the notice
# and other provisions required by the GPL or the LGPL. If you do not delete
# the provisions above, a recipient may use your version of this file under
# the terms of any one of the MPL, the GPL or the LGPL.
#
#*************************************************************************
PRJ = ../../../../../..
PRJNAME = hunspell
TARGET = hunspell
LIBTARGET=YES
EXTERNAL_WARNINGS_NOT_ERRORS := TRUE
UWINAPILIB=
#----- Settings ---------------------------------------------------------
.INCLUDE : settings.mk
# --- Files --------------------------------------------------------
CFLAGS+=-I..$/..$/
CDEFS+=-DOPENOFFICEORG
SLOFILES= \
$(SLO)$/affentry.obj \
$(SLO)$/affixmgr.obj \
$(SLO)$/dictmgr.obj \
$(SLO)$/csutil.obj \
$(SLO)$/hashmgr.obj \
$(SLO)$/suggestmgr.obj \
$(SLO)$/phonet.obj \
$(SLO)$/hunzip.obj \
$(SLO)$/filemgr.obj \
$(SLO)$/replist.obj \
$(SLO)$/hunspell.obj
LIB1TARGET= $(SLB)$/lib$(TARGET).lib
LIB1ARCHIV= $(LB)/lib$(TARGET).a
LIB1OBJFILES= $(SLOFILES)
# --- Targets ------------------------------------------------------
.INCLUDE : target.mk

View File

@ -1,293 +0,0 @@
/* phonetic.c - generic replacement aglogithms for phonetic transformation
Copyright (C) 2000 Bjoern Jacke
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License version 2.1 as published by the Free Software Foundation;
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; If not, see
<http://www.gnu.org/licenses/>.
Changelog:
2000-01-05 Bjoern Jacke <bjoern at j3e.de>
Initial Release insprired by the article about phonetic
transformations out of c't 25/1999
2007-07-26 Bjoern Jacke <bjoern at j3e.de>
Released under MPL/GPL/LGPL tri-license for Hunspell
2007-08-23 Laszlo Nemeth <nemeth at OOo>
Porting from Aspell to Hunspell using C-like structs
*/
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#include "csutil.hxx"
#include "phonet.hxx"
void init_phonet_hash(phonetable & parms)
{
int i, k;
for (i = 0; i < HASHSIZE; i++) {
parms.hash[i] = -1;
}
for (i = 0; parms.rules[i][0] != '\0'; i += 2) {
/** set hash value **/
k = (unsigned char) parms.rules[i][0];
if (parms.hash[k] < 0) {
parms.hash[k] = i;
}
}
}
// like strcpy but safe if the strings overlap
// but only if dest < src
static inline void strmove(char * dest, char * src) {
while (*src)
*dest++ = *src++;
*dest = '\0';
}
static int myisalpha(char ch) {
if ((unsigned char) ch < 128) return isalpha(ch);
return 1;
}
/* phonetic transcription algorithm */
/* see: http://aspell.net/man-html/Phonetic-Code.html */
/* convert string to uppercase before this call */
int phonet (const char * inword, char * target,
int len,
phonetable & parms)
{
/** Do phonetic transformation. **/
/** "len" = length of "inword" incl. '\0'. **/
/** result: >= 0: length of "target" **/
/** otherwise: error **/
int i,j,k=0,n,p,z;
int k0,n0,p0=-333,z0;
char c, c0;
const char * s;
typedef unsigned char uchar;
char word[MAXPHONETUTF8LEN + 1];
if (len == -1) len = strlen(inword);
if (len > MAXPHONETUTF8LEN) return 0;
strncpy(word, inword, MAXPHONETUTF8LEN);
word[MAXPHONETUTF8LEN] = '\0';
/** check word **/
i = j = z = 0;
while ((c = word[i]) != '\0') {
n = parms.hash[(uchar) c];
z0 = 0;
if (n >= 0) {
/** check all rules for the same letter **/
while (parms.rules[n][0] == c) {
/** check whole string **/
k = 1; /** number of found letters **/
p = 5; /** default priority **/
s = parms.rules[n];
s++; /** important for (see below) "*(s-1)" **/
while (*s != '\0' && word[i+k] == *s
&& !isdigit ((unsigned char) *s) && strchr ("(-<^$", *s) == NULL) {
k++;
s++;
}
if (*s == '(') {
/** check letters in "(..)" **/
if (myisalpha(word[i+k]) // ...could be implied?
&& strchr(s+1, word[i+k]) != NULL) {
k++;
while (*s != ')')
s++;
s++;
}
}
p0 = (int) *s;
k0 = k;
while (*s == '-' && k > 1) {
k--;
s++;
}
if (*s == '<')
s++;
if (isdigit ((unsigned char) *s)) {
/** determine priority **/
p = *s - '0';
s++;
}
if (*s == '^' && *(s+1) == '^')
s++;
if (*s == '\0'
|| (*s == '^'
&& (i == 0 || ! myisalpha(word[i-1]))
&& (*(s+1) != '$'
|| (! myisalpha(word[i+k0]) )))
|| (*s == '$' && i > 0
&& myisalpha(word[i-1])
&& (! myisalpha(word[i+k0]) )))
{
/** search for followup rules, if: **/
/** parms.followup and k > 1 and NO '-' in searchstring **/
c0 = word[i+k-1];
n0 = parms.hash[(uchar) c0];
// if (parms.followup && k > 1 && n0 >= 0
if (k > 1 && n0 >= 0
&& p0 != (int) '-' && word[i+k] != '\0') {
/** test follow-up rule for "word[i+k]" **/
while (parms.rules[n0][0] == c0) {
/** check whole string **/
k0 = k;
p0 = 5;
s = parms.rules[n0];
s++;
while (*s != '\0' && word[i+k0] == *s
&& ! isdigit((unsigned char) *s) && strchr("(-<^$",*s) == NULL) {
k0++;
s++;
}
if (*s == '(') {
/** check letters **/
if (myisalpha(word[i+k0])
&& strchr (s+1, word[i+k0]) != NULL) {
k0++;
while (*s != ')' && *s != '\0')
s++;
if (*s == ')')
s++;
}
}
while (*s == '-') {
/** "k0" gets NOT reduced **/
/** because "if (k0 == k)" **/
s++;
}
if (*s == '<')
s++;
if (isdigit ((unsigned char) *s)) {
p0 = *s - '0';
s++;
}
if (*s == '\0'
/** *s == '^' cuts **/
|| (*s == '$' && ! myisalpha(word[i+k0])))
{
if (k0 == k) {
/** this is just a piece of the string **/
n0 += 2;
continue;
}
if (p0 < p) {
/** priority too low **/
n0 += 2;
continue;
}
/** rule fits; stop search **/
break;
}
n0 += 2;
} /** End of "while (parms.rules[n0][0] == c0)" **/
if (p0 >= p && parms.rules[n0][0] == c0) {
n += 2;
continue;
}
} /** end of follow-up stuff **/
/** replace string **/
s = parms.rules[n+1];
p0 = (parms.rules[n][0] != '\0'
&& strchr (parms.rules[n]+1,'<') != NULL) ? 1:0;
if (p0 == 1 && z == 0) {
/** rule with '<' is used **/
if (j > 0 && *s != '\0'
&& (target[j-1] == c || target[j-1] == *s)) {
j--;
}
z0 = 1;
z = 1;
k0 = 0;
while (*s != '\0' && word[i+k0] != '\0') {
word[i+k0] = *s;
k0++;
s++;
}
if (k > k0)
strmove (&word[0]+i+k0, &word[0]+i+k);
/** new "actual letter" **/
c = word[i];
}
else { /** no '<' rule used **/
i += k - 1;
z = 0;
while (*s != '\0'
&& *(s+1) != '\0' && j < len) {
if (j == 0 || target[j-1] != *s) {
target[j] = *s;
j++;
}
s++;
}
/** new "actual letter" **/
c = *s;
if (parms.rules[n][0] != '\0'
&& strstr (parms.rules[n]+1, "^^") != NULL) {
if (c != '\0') {
target[j] = c;
j++;
}
strmove (&word[0], &word[0]+i+1);
i = 0;
z0 = 1;
}
}
break;
} /** end of follow-up stuff **/
n += 2;
} /** end of while (parms.rules[n][0] == c) **/
} /** end of if (n >= 0) **/
if (z0 == 0) {
// if (k && (assert(p0!=-333),!p0) && j < len && c != '\0'
// && (!parms.collapse_result || j == 0 || target[j-1] != c)){
if (k && !p0 && j < len && c != '\0'
&& (1 || j == 0 || target[j-1] != c)){
/** condense only double letters **/
target[j] = c;
///printf("\n setting \n");
j++;
}
i++;
z = 0;
k=0;
}
} /** end of while ((c = word[i]) != '\0') **/
target[j] = '\0';
return (j);
} /** end of function "phonet" **/

View File

@ -1,87 +0,0 @@
#include "license.hunspell"
#include "license.myspell"
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "replist.hxx"
#include "csutil.hxx"
RepList::RepList(int n) {
dat = (replentry **) malloc(sizeof(replentry *) * n);
if (dat == 0) size = 0; else size = n;
pos = 0;
}
RepList::~RepList()
{
for (int i = 0; i < pos; i++) {
free(dat[i]->pattern);
free(dat[i]->pattern2);
free(dat[i]);
}
free(dat);
}
int RepList::get_pos() {
return pos;
}
replentry * RepList::item(int n) {
return dat[n];
}
int RepList::near(const char * word) {
int p1 = 0;
int p2 = pos;
while ((p2 - p1) > 1) {
int m = (p1 + p2) / 2;
int c = strcmp(word, dat[m]->pattern);
if (c <= 0) {
if (c < 0) p2 = m; else p1 = p2 = m;
} else p1 = m;
}
return p1;
}
int RepList::match(const char * word, int n) {
if (strncmp(word, dat[n]->pattern, strlen(dat[n]->pattern)) == 0) return strlen(dat[n]->pattern);
return 0;
}
int RepList::add(char * pat1, char * pat2) {
if (pos >= size || pat1 == NULL || pat2 == NULL) return 1;
replentry * r = (replentry *) malloc(sizeof(replentry));
if (r == NULL) return 1;
r->pattern = mystrrep(pat1, "_", " ");
r->pattern2 = mystrrep(pat2, "_", " ");
r->start = false;
r->end = false;
dat[pos++] = r;
for (int i = pos - 1; i > 0; i--) {
r = dat[i];
if (strcmp(r->pattern, dat[i - 1]->pattern) < 0) {
dat[i] = dat[i - 1];
dat[i - 1] = r;
} else break;
}
return 0;
}
int RepList::conv(const char * word, char * dest) {
int stl = 0;
int change = 0;
for (size_t i = 0; i < strlen(word); i++) {
int n = near(word + i);
int l = match(word + i, n);
if (l) {
strcpy(dest + stl, dat[n]->pattern2);
stl += strlen(dat[n]->pattern2);
i += l - 1;
change = 1;
} else dest[stl++] = word[i];
}
dest[stl] = '\0';
return change;
}

View File

@ -1,30 +0,0 @@
/* string replacement list class */
#ifndef _REPLIST_HXX_
#define _REPLIST_HXX_
#include "hunvisapi.h"
#include "w_char.hxx"
class LIBHUNSPELL_DLL_EXPORTED RepList
{
private:
RepList(const RepList&);
RepList& operator = (const RepList&);
protected:
replentry ** dat;
int size;
int pos;
public:
RepList(int n);
~RepList();
int get_pos();
int add(char * pat1, char * pat2);
replentry * item(int n);
int near(const char * word);
int match(const char * word, int n);
int conv(const char * word, char * dest);
};
#endif

File diff suppressed because it is too large Load Diff

View File

@ -1,115 +0,0 @@
#ifndef _SUGGESTMGR_HXX_
#define _SUGGESTMGR_HXX_
#define MAXSWL 100
#define MAXSWUTF8L (MAXSWL * 4)
#define MAX_ROOTS 100
#define MAX_WORDS 100
#define MAX_GUESS 200
#define MAXNGRAMSUGS 4
#define MAXPHONSUGS 2
#define MAXCOMPOUNDSUGS 3
// timelimit: max ~1/4 sec (process time on Linux) for a time consuming function
#define TIMELIMIT (CLOCKS_PER_SEC >> 2)
#define MINTIMER 100
#define MAXPLUSTIMER 100
#define NGRAM_LONGER_WORSE (1 << 0)
#define NGRAM_ANY_MISMATCH (1 << 1)
#define NGRAM_LOWERING (1 << 2)
#define NGRAM_WEIGHTED (1 << 3)
#include "hunvisapi.h"
#include "atypes.hxx"
#include "affixmgr.hxx"
#include "hashmgr.hxx"
#include "langnum.hxx"
#include <time.h>
enum { LCS_UP, LCS_LEFT, LCS_UPLEFT };
class LIBHUNSPELL_DLL_EXPORTED SuggestMgr
{
private:
SuggestMgr(const SuggestMgr&);
SuggestMgr& operator = (const SuggestMgr&);
private:
char * ckey;
int ckeyl;
w_char * ckey_utf;
char * ctry;
int ctryl;
w_char * ctry_utf;
AffixMgr* pAMgr;
int maxSug;
struct cs_info * csconv;
int utf8;
int langnum;
int nosplitsugs;
int maxngramsugs;
int maxcpdsugs;
int complexprefixes;
public:
SuggestMgr(const char * tryme, int maxn, AffixMgr *aptr);
~SuggestMgr();
int suggest(char*** slst, const char * word, int nsug, int * onlycmpdsug);
int ngsuggest(char ** wlst, char * word, int ns, HashMgr** pHMgr, int md);
int suggest_auto(char*** slst, const char * word, int nsug);
int suggest_stems(char*** slst, const char * word, int nsug);
int suggest_pos_stems(char*** slst, const char * word, int nsug);
char * suggest_morph(const char * word);
char * suggest_gen(char ** pl, int pln, char * pattern);
char * suggest_morph_for_spelling_error(const char * word);
private:
int testsug(char** wlst, const char * candidate, int wl, int ns, int cpdsuggest,
int * timer, clock_t * timelimit);
int checkword(const char *, int, int, int *, clock_t *);
int check_forbidden(const char *, int);
int capchars(char **, const char *, int, int);
int replchars(char**, const char *, int, int);
int doubletwochars(char**, const char *, int, int);
int forgotchar(char **, const char *, int, int);
int swapchar(char **, const char *, int, int);
int longswapchar(char **, const char *, int, int);
int movechar(char **, const char *, int, int);
int extrachar(char **, const char *, int, int);
int badcharkey(char **, const char *, int, int);
int badchar(char **, const char *, int, int);
int twowords(char **, const char *, int, int);
int fixstems(char **, const char *, int);
int capchars_utf(char **, const w_char *, int wl, int, int);
int doubletwochars_utf(char**, const w_char *, int wl, int, int);
int forgotchar_utf(char**, const w_char *, int wl, int, int);
int extrachar_utf(char**, const w_char *, int wl, int, int);
int badcharkey_utf(char **, const w_char *, int wl, int, int);
int badchar_utf(char **, const w_char *, int wl, int, int);
int swapchar_utf(char **, const w_char *, int wl, int, int);
int longswapchar_utf(char **, const w_char *, int, int, int);
int movechar_utf(char **, const w_char *, int, int, int);
int mapchars(char**, const char *, int, int);
int map_related(const char *, char *, int, int, char ** wlst, int, int, const mapentry*, int, int *, clock_t *);
int ngram(int n, char * s1, const char * s2, int opt);
int mystrlen(const char * word);
int leftcommonsubstring(char * s1, const char * s2);
int commoncharacterpositions(char * s1, const char * s2, int * is_swap);
void bubblesort( char ** rwd, char ** rwd2, int * rsc, int n);
void lcs(const char * s, const char * s2, int * l1, int * l2, char ** result);
int lcslen(const char * s, const char* s2);
char * suggest_hentry_gen(hentry * rv, char * pattern);
};
#endif

File diff suppressed because it is too large Load Diff

View File

@ -1,21 +0,0 @@
#ifndef __WCHARHXX__
#define __WCHARHXX__
#ifndef GCC
typedef struct {
#else
typedef struct __attribute__ ((packed)) {
#endif
unsigned char l;
unsigned char h;
} w_char;
// two character arrays
struct replentry {
char * pattern;
char * pattern2;
bool start;
bool end;
};
#endif

View File

@ -1,38 +0,0 @@
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <ctype.h>
#include "../hunspell/csutil.hxx"
#include "firstparser.hxx"
#ifndef W32
using namespace std;
#endif
FirstParser::FirstParser(const char * wordchars)
{
init(wordchars);
}
FirstParser::~FirstParser()
{
}
char * FirstParser::next_token()
{
char * tabpos = strchr(line[actual],'\t');
if ((tabpos) && (tabpos - line[actual]>token))
{
char * t = (char *) malloc(tabpos - line[actual] + 1);
if (!t)
{
fprintf(stderr,"Error - Insufficient Memory\n");
return NULL;
}
t[tabpos - line[actual]] = '\0';
token = tabpos - line[actual] +1;
return strncpy(t, line[actual], tabpos - line[actual]);
}
return NULL;
}

View File

@ -1,34 +0,0 @@
/*
* parser classes of HunTools
*
* implemented: text, HTML, TeX, first word
*
* Copyright (C) 2003, Laszlo Nemeth
*
*/
#ifndef _FIRSTPARSER_HXX_
#define _FIRSTPARSER_HXX_
#include "textparser.hxx"
/*
* Check first word of the input line
*
*/
class FirstParser : public TextParser
{
public:
FirstParser(const char * wc);
virtual ~FirstParser();
virtual char * next_token();
};
#endif

View File

@ -1,56 +0,0 @@
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <ctype.h>
#include "../hunspell/csutil.hxx"
#include "htmlparser.hxx"
#ifndef W32
using namespace std;
#endif
static const char * PATTERN[][2] = {
{ "<script", "</script>" },
{ "<style", "</style>" },
{ "<code", "</code>" },
{ "<samp", "</samp>" },
{ "<kbd", "</kbd>" },
{ "<var", "</var>" },
{ "<listing", "</listing>" },
{ "<address", "</address>" },
{ "<pre", "</pre>" },
{ "<!--", "-->" },
{ "<[cdata[", "]]>" }, // XML comment
{ "<", ">" }
};
#define PATTERN_LEN (sizeof(PATTERN) / (sizeof(char *) * 2))
static const char * PATTERN2[][2] = {
{ "<img", "alt=" }, // ALT and TITLE attrib handled spec.
{ "<img", "title=" },
{ "<a ", "title=" }
};
#define PATTERN_LEN2 (sizeof(PATTERN2) / (sizeof(char *) * 2))
HTMLParser::HTMLParser(const char * wordchars)
{
init(wordchars);
}
HTMLParser::HTMLParser(unsigned short * wordchars, int len)
{
init(wordchars, len);
}
char * HTMLParser::next_token()
{
return XMLParser::next_token(PATTERN, PATTERN_LEN, PATTERN2, PATTERN_LEN2);
}
HTMLParser::~HTMLParser()
{
}

View File

@ -1,34 +0,0 @@
/*
* HTML parser class for MySpell
*
* implemented: text, HTML, TeX
*
* Copyright (C) 2002, Laszlo Nemeth
*
*/
#ifndef _HTMLPARSER_HXX_
#define _HTMLPARSER_HXX_
#include "xmlparser.hxx"
/*
* HTML Parser
*
*/
class HTMLParser : public XMLParser
{
public:
HTMLParser(const char * wc);
HTMLParser(unsigned short * wordchars, int len);
char * next_token();
virtual ~HTMLParser();
};
#endif

View File

@ -1,223 +0,0 @@
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <ctype.h>
#include "../hunspell/csutil.hxx"
#include "latexparser.hxx"
#ifndef W32
using namespace std;
#endif
static struct {
const char * pat[2];
int arg;
} PATTERN[] = {
{ { "\\(", "\\)" } , 0 },
{ { "$$", "$$" } , 0 },
{ { "$", "$" } , 0 },
{ { "\\begin{math}", "\\end{math}" } , 0 },
{ { "\\[", "\\]" } , 0 },
{ { "\\begin{displaymath}", "\\end{displaymath}" } , 0 },
{ { "\\begin{equation}", "\\end{equation}" } , 0 },
{ { "\\begin{equation*}", "\\end{equation*}" } , 0 },
{ { "\\cite", NULL } , 1 },
{ { "\\nocite", NULL } , 1 },
{ { "\\index", NULL } , 1 },
{ { "\\label", NULL } , 1 },
{ { "\\ref", NULL } , 1 },
{ { "\\pageref", NULL } , 1 },
{ { "\\parbox", NULL } , 1 },
{ { "\\begin{verbatim}", "\\end{verbatim}" } , 0 },
{ { "\\verb+", "+" } , 0 },
{ { "\\verb|", "|" } , 0 },
{ { "\\verb#", "#" } , 0 },
{ { "\\verb*", "*" } , 0 },
{ { "\\documentstyle", "\\begin{document}" } , 0 },
{ { "\\documentclass", "\\begin{document}" } , 0 },
// { { "\\documentclass", NULL } , 1 },
{ { "\\usepackage", NULL } , 1 },
{ { "\\includeonly", NULL } , 1 },
{ { "\\include", NULL } , 1 },
{ { "\\input", NULL } , 1 },
{ { "\\vspace", NULL } , 1 },
{ { "\\setlength", NULL } , 2 },
{ { "\\addtolength", NULL } , 2 },
{ { "\\settowidth", NULL } , 2 },
{ { "\\rule", NULL } , 2 },
{ { "\\hspace", NULL } , 1 } ,
{ { "\\vspace", NULL } , 1 } ,
{ { "\\\\[", "]" } , 0 },
{ { "\\pagebreak[", "]" } , 0 } ,
{ { "\\nopagebreak[", "]" } , 0 } ,
{ { "\\enlargethispage", NULL } , 1 } ,
{ { "\\begin{tabular}", NULL } , 1 } ,
{ { "\\addcontentsline", NULL } , 2 } ,
{ { "\\begin{thebibliography}", NULL } , 1 } ,
{ { "\\bibliography", NULL } , 1 } ,
{ { "\\bibliographystyle", NULL } , 1 } ,
{ { "\\bibitem", NULL } , 1 } ,
{ { "\\begin", NULL } , 1 } ,
{ { "\\end", NULL } , 1 } ,
{ { "\\pagestyle", NULL } , 1 } ,
{ { "\\pagenumbering", NULL } , 1 } ,
{ { "\\thispagestyle", NULL } , 1 } ,
{ { "\\newtheorem", NULL } , 2 },
{ { "\\newcommand", NULL } , 2 },
{ { "\\renewcommand", NULL } , 2 },
{ { "\\setcounter", NULL } , 2 },
{ { "\\addtocounter", NULL } , 1 },
{ { "\\stepcounter", NULL } , 1 },
{ { "\\selectlanguage", NULL } , 1 },
{ { "\\inputencoding", NULL } , 1 },
{ { "\\hyphenation", NULL } , 1 },
{ { "\\definecolor", NULL } , 3 },
{ { "\\color", NULL } , 1 },
{ { "\\textcolor", NULL } , 1 },
{ { "\\pagecolor", NULL } , 1 },
{ { "\\colorbox", NULL } , 2 },
{ { "\\fcolorbox", NULL } , 2 },
{ { "\\declaregraphicsextensions", NULL } , 1 },
{ { "\\psfig", NULL } , 1 },
{ { "\\url", NULL } , 1 },
{ { "\\eqref", NULL } , 1 },
{ { "\\vskip", NULL } , 1 },
{ { "\\vglue", NULL } , 1 },
{ { "\'\'", NULL } , 1 }
};
#define PATTERN_LEN (sizeof(PATTERN) / sizeof(PATTERN[0]))
LaTeXParser::LaTeXParser(const char * wordchars)
{
init(wordchars);
}
LaTeXParser::LaTeXParser(unsigned short * wordchars, int len)
{
init(wordchars, len);
}
LaTeXParser::~LaTeXParser()
{
}
int LaTeXParser::look_pattern(int col)
{
for (unsigned int i = 0; i < PATTERN_LEN; i++) {
char * j = line[actual] + head;
const char * k = PATTERN[i].pat[col];
if (! k) continue;
while ((*k != '\0') && (tolower(*j) == *k)) {
j++;
k++;
}
if (*k == '\0') return i;
}
return -1;
}
/*
* LaTeXParser
*
* state 0: not wordchar
* state 1: wordchar
* state 2: comments
* state 3: commands
* state 4: commands with arguments
* state 5: % comment
*
*/
char * LaTeXParser::next_token()
{
int i;
int slash = 0;
int apostrophe;
for (;;) {
// fprintf(stderr,"depth: %d, state: %d, , arg: %d, token: %s\n",depth,state,arg,line[actual]+head);
switch (state)
{
case 0: // non word chars
if ((pattern_num = look_pattern(0)) != -1) {
if (PATTERN[pattern_num].pat[1]) {
state = 2;
} else {
state = 4;
depth = 0;
arg = 0;
opt = 1;
}
head += strlen(PATTERN[pattern_num].pat[0]) - 1;
} else if ((line[actual][head] == '%')) {
state = 5;
} else if (is_wordchar(line[actual] + head)) {
state = 1;
token = head;
} else if (line[actual][head] == '\\') {
if (line[actual][head + 1] == '\\' || // \\ (linebreak)
(line[actual][head + 1] == '$') || // \$ (dollar sign)
(line[actual][head + 1] == '%')) { // \% (percent)
head++;
break;
}
state = 3;
} else if (line[actual][head] == '%') {
if ((head==0) || (line[actual][head - 1] != '\\')) state = 5;
}
break;
case 1: // wordchar
apostrophe = 0;
if (! is_wordchar(line[actual] + head) ||
(line[actual][head] == '\'' && line[actual][head+1] == '\'' && ++apostrophe)) {
state = 0;
char * t = alloc_token(token, &head);
if (apostrophe) head += 2;
if (t) return t;
}
break;
case 2: // comment, labels, etc
if (((i = look_pattern(1)) != -1) &&
(strcmp(PATTERN[i].pat[1],PATTERN[pattern_num].pat[1]) == 0)) {
state = 0;
head += strlen(PATTERN[pattern_num].pat[1]) - 1;
}
break;
case 3: // command
if ((tolower(line[actual][head]) < 'a') || (tolower(line[actual][head]) > 'z')) {
state = 0;
head--;
}
break;
case 4: // command with arguments
if (slash && (line[actual][head] != '\0')) {
slash = 0;
head++;
break;
} else if (line[actual][head]=='\\') {
slash = 1;
} else if ((line[actual][head] == '{') ||
((opt) && (line[actual][head] == '['))) {
depth++;
opt = 0;
} else if (line[actual][head] == '}') {
depth--;
if (depth == 0) {
opt = 1;
arg++;
}
if (((depth == 0) && (arg == PATTERN[pattern_num].arg)) ||
(depth < 0) ) {
state = 0; // XXX not handles the last optional arg.
}
} else if (line[actual][head] == ']') depth--;
} // case
if (next_char(line[actual], &head)) {
if (state == 5) state = 0;
return NULL;
}
}
}

View File

@ -1,44 +0,0 @@
/*
* parser classes for MySpell
*
* implemented: text, HTML, TeX
*
* Copyright (C) 2002, Laszlo Nemeth
*
*/
#ifndef _LATEXPARSER_HXX_
#define _LATEXPARSER_HXX_
#include "textparser.hxx"
/*
* HTML Parser
*
*/
class LaTeXParser : public TextParser
{
int pattern_num; // number of comment
int depth; // depth of blocks
int arg; // arguments's number
int opt; // optional argument attrib.
public:
LaTeXParser(const char * wc);
LaTeXParser(unsigned short * wordchars, int len);
virtual ~LaTeXParser();
virtual char * next_token();
private:
int look_pattern(int col);
};
#endif

View File

@ -1,71 +0,0 @@
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <ctype.h>
#include "../hunspell/csutil.hxx"
#include "manparser.hxx"
#ifndef W32
using namespace std;
#endif
ManParser::ManParser() {
}
ManParser::ManParser(const char * wordchars)
{
init(wordchars);
}
ManParser::ManParser(unsigned short * wordchars, int len)
{
init(wordchars, len);
}
ManParser::~ManParser()
{
}
char * ManParser::next_token()
{
for (;;) {
switch (state)
{
case 1: // command arguments
if (line[actual][head] == ' ') state = 2;
break;
case 0: // dot in begin of line
if (line[actual][0] == '.') {
state = 1;
break;
} else {
state = 2;
}
// no break
case 2: // non word chars
if (is_wordchar(line[actual] + head)) {
state = 3;
token = head;
} else if ((line[actual][head] == '\\') &&
(line[actual][head + 1] == 'f') &&
(line[actual][head + 2] != '\0')) {
head += 2;
}
break;
case 3: // wordchar
if (! is_wordchar(line[actual] + head)) {
state = 2;
char * t = alloc_token(token, &head);
if (t) return t;
}
break;
}
if (next_char(line[actual], &head)) {
state = 0;
return NULL;
}
}
}

View File

@ -1,38 +0,0 @@
/*
* parser classes for MySpell
*
* implemented: text, HTML, TeX
*
* Copyright (C) 2002, Laszlo Nemeth
*
*/
#ifndef _MANPARSER_HXX_
#define _MANPARSER_HXX_
#include "textparser.hxx"
/*
* Manparse Parser
*
*/
class ManParser : public TextParser
{
protected:
public:
ManParser();
ManParser(const char * wc);
ManParser(unsigned short * wordchars, int len);
virtual ~ManParser();
virtual char * next_token();
};
#endif

View File

@ -1,47 +0,0 @@
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <ctype.h>
#include "../hunspell/csutil.hxx"
#include "odfparser.hxx"
#ifndef W32
using namespace std;
#endif
static const char * PATTERN[][2] = {
{ "<office:meta>", "</office:meta>" },
{ "<office:settings>", "</office:settings>" },
{ "<office:binary-data>", "</office:binary-data>" },
{ "<!--", "-->" },
{ "<[cdata[", "]]>" }, // XML comment
{ "<", ">" }
};
#define PATTERN_LEN (sizeof(PATTERN) / (sizeof(char *) * 2))
static const char * PATTERN2[][2] = {
};
#define PATTERN_LEN2 (sizeof(PATTERN2) / (sizeof(char *) * 2))
ODFParser::ODFParser(const char * wordchars)
{
init(wordchars);
}
ODFParser::ODFParser(unsigned short * wordchars, int len)
{
init(wordchars, len);
}
char * ODFParser::next_token()
{
return XMLParser::next_token(PATTERN, PATTERN_LEN, PATTERN2, PATTERN_LEN2);
}
ODFParser::~ODFParser()
{
}

View File

@ -1,31 +0,0 @@
/*
* ODF parser class for MySpell
*
* Copyright (C) 2014, Laszlo Nemeth
*
*/
#ifndef _ODFPARSER_HXX_
#define _ODFPARSER_HXX_
#include "xmlparser.hxx"
/*
* HTML Parser
*
*/
class ODFParser : public XMLParser
{
public:
ODFParser(const char * wc);
ODFParser(unsigned short * wordchars, int len);
char * next_token();
virtual ~ODFParser();
};
#endif

View File

@ -1,52 +0,0 @@
#include <cstring>
#include <cstdlib>
#include <cstdio>
#include "textparser.hxx"
#include "htmlparser.hxx"
#include "latexparser.hxx"
#include "xmlparser.hxx"
#ifndef W32
using namespace std;
#endif
int
main(int argc, char** argv)
{
FILE * f;
/* first parse the command line options */
if (argc < 2) {
fprintf(stderr,"correct syntax is:\n");
fprintf(stderr,"testparser file\n");
fprintf(stderr,"example: testparser /dev/stdin\n");
exit(1);
}
/* open the words to check list */
f = fopen(argv[1],"r");
if (!f) {
fprintf(stderr,"Error - could not open file of words to check\n");
exit(1);
}
TextParser * p = new TextParser("qwertzuiopasdfghjklyxcvbnméáúõûóüöíQWERTZUIOPASDFGHJKLYXCVBNMÍÉÁÕÚÖÜÓÛ");
char buf[MAXLNLEN];
char * next;
while(fgets(buf,MAXLNLEN,f)) {
p->put_line(buf);
p->set_url_checking(1);
while ((next=p->next_token())) {
fprintf(stdout,"token: %s\n",next);
free(next);
}
}
delete p;
fclose(f);
return 0;
}

View File

@ -1,303 +0,0 @@
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <ctype.h>
#include "../hunspell/csutil.hxx"
#include "textparser.hxx"
#ifndef W32
using namespace std;
#endif
// ISO-8859-1 HTML character entities
static const char * LATIN1[] = {
"&Agrave;",
"&Atilde;",
"&Aring;",
"&AElig;",
"&Egrave;",
"&Ecirc;",
"&Igrave;",
"&Iuml;",
"&ETH;",
"&Ntilde;",
"&Ograve;",
"&Oslash;",
"&Ugrave;",
"&THORN;",
"&agrave;",
"&atilde;",
"&aring;",
"&aelig;",
"&egrave;",
"&ecirc;",
"&igrave;",
"&iuml;",
"&eth;",
"&ntilde;",
"&ograve;",
"&oslash;",
"&ugrave;",
"&thorn;",
"&yuml;"
};
#define LATIN1_LEN (sizeof(LATIN1) / sizeof(char *))
#define ENTITY_APOS "&apos;"
#define UTF8_APOS "\xe2\x80\x99"
#define APOSTROPHE "'"
TextParser::TextParser() {
init((char *) NULL);
}
TextParser::TextParser(const char * wordchars)
{
init(wordchars);
}
TextParser::TextParser(unsigned short * wordchars, int len)
{
init(wordchars, len);
}
TextParser::~TextParser()
{
}
int TextParser::is_wordchar(char * w)
{
if (*w == '\0') return 0;
if (utf8) {
w_char wc;
unsigned short idx;
u8_u16(&wc, 1, w);
idx = (wc.h << 8) + wc.l;
return (unicodeisalpha(idx) || (wordchars_utf16 && flag_bsearch(wordchars_utf16, *((unsigned short *) &wc), wclen)));
} else {
return wordcharacters[(*w + 256) % 256];
}
}
const char * TextParser::get_latin1(char * s)
{
if (s[0] == '&') {
unsigned int i = 0;
while ((i < LATIN1_LEN) &&
strncmp(LATIN1[i], s, strlen(LATIN1[i]))) i++;
if (i != LATIN1_LEN) return LATIN1[i];
}
return NULL;
}
void TextParser::init(const char * wordchars)
{
for (int i = 0; i < MAXPREVLINE; i++) {
line[i][0] = '\0';
}
actual = 0;
head = 0;
token = 0;
state = 0;
utf8 = 0;
checkurl = 0;
unsigned int j;
for (j = 0; j < 256; j++) {
wordcharacters[j] = 0;
}
if (!wordchars) wordchars = "qwertzuiopasdfghjklyxcvbnmQWERTZUIOPASDFGHJKLYXCVBNM";
for (j = 0; j < strlen(wordchars); j++) {
wordcharacters[(wordchars[j] + 256) % 256] = 1;
}
}
void TextParser::init(unsigned short * wc, int len)
{
for (int i = 0; i < MAXPREVLINE; i++) {
line[i][0] = '\0';
}
actual = 0;
head = 0;
token = 0;
state = 0;
utf8 = 1;
checkurl = 0;
wordchars_utf16 = wc;
wclen = len;
}
int TextParser::next_char(char * line, int * pos) {
if (*(line + *pos) == '\0') return 1;
if (utf8) {
if (*(line + *pos) >> 7) {
// jump to next UTF-8 character
for((*pos)++; (*(line + *pos) & 0xc0) == 0x80; (*pos)++);
} else {
(*pos)++;
}
} else (*pos)++;
return 0;
}
void TextParser::put_line(char * word)
{
actual = (actual + 1) % MAXPREVLINE;
strcpy(line[actual], word);
token = 0;
head = 0;
check_urls();
}
char * TextParser::get_prevline(int n)
{
return mystrdup(line[(actual + MAXPREVLINE - n) % MAXPREVLINE]);
}
char * TextParser::get_line()
{
return get_prevline(0);
}
char * TextParser::next_token()
{
const char * latin1;
for (;;) {
switch (state)
{
case 0: // non word chars
if (is_wordchar(line[actual] + head)) {
state = 1;
token = head;
} else if ((latin1 = get_latin1(line[actual] + head))) {
state = 1;
token = head;
head += strlen(latin1);
}
break;
case 1: // wordchar
if ((latin1 = get_latin1(line[actual] + head))) {
head += strlen(latin1);
} else if ((is_wordchar((char *) APOSTROPHE) || (is_utf8() && is_wordchar((char *) UTF8_APOS))) && line[actual][head] == '\'' &&
is_wordchar(line[actual] + head + 1)) {
head++;
} else if (is_utf8() && is_wordchar((char *) APOSTROPHE) && // add Unicode apostrophe to the WORDCHARS, if needed
strncmp(line[actual] + head, UTF8_APOS, strlen(UTF8_APOS)) == 0 &&
is_wordchar(line[actual] + head + strlen(UTF8_APOS))) {
head += strlen(UTF8_APOS) - 1;
} else if (! is_wordchar(line[actual] + head)) {
state = 0;
char * t = alloc_token(token, &head);
if (t) return t;
}
break;
}
if (next_char(line[actual], &head)) return NULL;
}
}
int TextParser::get_tokenpos()
{
return token;
}
int TextParser::change_token(const char * word)
{
if (word) {
char * r = mystrdup(line[actual] + head);
strcpy(line[actual] + token, word);
strcat(line[actual], r);
head = token;
free(r);
return 1;
}
return 0;
}
void TextParser::check_urls()
{
int url_state = 0;
int url_head = 0;
int url_token = 0;
int url = 0;
for (;;) {
switch (url_state)
{
case 0: // non word chars
if (is_wordchar(line[actual] + url_head)) {
url_state = 1;
url_token = url_head;
// Unix path
} else if (*(line[actual] + url_head) == '/') {
url_state = 1;
url_token = url_head;
url = 1;
}
break;
case 1: // wordchar
char ch = *(line[actual] + url_head);
// e-mail address
if ((ch == '@') ||
// MS-DOS, Windows path
(strncmp(line[actual] + url_head, ":\\", 2) == 0) ||
// URL
(strncmp(line[actual] + url_head, "://", 3) == 0)) {
url = 1;
} else if (! (is_wordchar(line[actual] + url_head) ||
(ch == '-') || (ch == '_') || (ch == '\\') ||
(ch == '.') || (ch == ':') || (ch == '/') ||
(ch == '~') || (ch == '%') || (ch == '*') ||
(ch == '$') || (ch == '[') || (ch == ']') ||
(ch == '?') || (ch == '!') ||
((ch >= '0') && (ch <= '9')))) {
url_state = 0;
if (url == 1) {
for (int i = url_token; i < url_head; i++) {
*(urlline + i) = 1;
}
}
url = 0;
}
break;
}
*(urlline + url_head) = 0;
if (next_char(line[actual], &url_head)) return;
}
}
int TextParser::get_url(int token_pos, int * head)
{
for (int i = *head; urlline[i] && *(line[actual]+i); i++, (*head)++);
return checkurl ? 0 : urlline[token_pos];
}
void TextParser::set_url_checking(int check)
{
checkurl = check;
}
char * TextParser::alloc_token(int token, int * head)
{
int url_head = *head;
if (get_url(token, &url_head)) return NULL;
char * t = (char *) malloc(*head - token + 1);
if (t) {
t[*head - token] = '\0';
strncpy(t, line[actual] + token, *head - token);
// remove colon for Finnish and Swedish language
if (t[*head - token - 1] == ':') {
t[*head - token - 1] = '\0';
if (!t[0]) {
free(t);
return NULL;
}
}
return t;
}
fprintf(stderr,"Error - Insufficient Memory\n");
return NULL;
}

View File

@ -1,70 +0,0 @@
/*
* parser classes for MySpell
*
* implemented: text, HTML, TeX
*
* Copyright (C) 2002, Laszlo Nemeth
*
*/
#ifndef _TEXTPARSER_HXX_
#define _TEXTPARSER_HXX_
// set sum of actual and previous lines
#define MAXPREVLINE 4
#ifndef MAXLNLEN
#define MAXLNLEN 8192
#endif
/*
* Base Text Parser
*
*/
class TextParser
{
protected:
int wordcharacters[256]; // for detection of the word boundaries
char line[MAXPREVLINE][MAXLNLEN]; // parsed and previous lines
char urlline[MAXLNLEN]; // mask for url detection
int checkurl;
int actual; // actual line
int head; // head position
int token; // begin of token
int state; // state of automata
int utf8; // UTF-8 character encoding
int next_char(char * line, int * pos);
unsigned short * wordchars_utf16;
int wclen;
public:
TextParser();
TextParser(unsigned short * wordchars, int len);
TextParser(const char * wc);
void init(const char *);
void init(unsigned short * wordchars, int len);
virtual ~TextParser();
void put_line(char * line);
char * get_line();
char * get_prevline(int n);
virtual char * next_token();
virtual int change_token(const char * word);
void set_url_checking(int check);
int get_tokenpos();
int is_wordchar(char * w);
inline int is_utf8() { return utf8; }
const char * get_latin1(char * s);
char * next_char();
int tokenize_urls();
void check_urls();
int get_url(int token_pos, int * head);
char * alloc_token(int token, int * head);
};
#endif

View File

@ -1,176 +0,0 @@
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <ctype.h>
#include "../hunspell/csutil.hxx"
#include "xmlparser.hxx"
#ifndef W32
using namespace std;
#endif
enum { ST_NON_WORD, ST_WORD, ST_TAG, ST_CHAR_ENTITY, ST_OTHER_TAG, ST_ATTRIB };
static const char * __PATTERN__[][2] = {
{ "<!--", "-->" },
{ "<[cdata[", "]]>" }, // XML comment
{ "<", ">" }
};
#define __PATTERN_LEN__ (sizeof(__PATTERN__) / (sizeof(char *) * 2))
static const char * __PATTERN2__[][2] = {
};
#define __PATTERN_LEN2__ (sizeof(__PATTERN2__) / (sizeof(char *) * 2))
#define ENTITY_APOS "&apos;"
#define UTF8_APOS "\xe2\x80\x99"
#define APOSTROPHE "'"
XMLParser::XMLParser()
{
}
XMLParser::XMLParser(const char * wordchars)
{
init(wordchars);
}
XMLParser::XMLParser(unsigned short * wordchars, int len)
{
init(wordchars, len);
}
XMLParser::~XMLParser()
{
}
int XMLParser::look_pattern(const char * p[][2], unsigned int len, int column)
{
for (unsigned int i = 0; i < len; i++) {
char * j = line[actual] + head;
const char * k = p[i][column];
while ((*k != '\0') && (tolower(*j) == *k)) {
j++;
k++;
}
if (*k == '\0') return i;
}
return -1;
}
/*
* XML parser
*
*/
char * XMLParser::next_token(const char * PATTERN[][2], unsigned int PATTERN_LEN, const char * PATTERN2[][2], unsigned int PATTERN_LEN2)
{
const char * latin1;
for (;;) {
switch (state)
{
case ST_NON_WORD: // non word chars
prevstate = ST_NON_WORD;
if ((pattern_num = look_pattern(PATTERN, PATTERN_LEN, 0)) != -1) {
checkattr = 0;
if ((pattern2_num = look_pattern(PATTERN2, PATTERN_LEN2, 0)) != -1) {
checkattr = 1;
}
state = ST_TAG;
} else if (is_wordchar(line[actual] + head)) {
state = ST_WORD;
token = head;
} else if ((latin1 = get_latin1(line[actual] + head))) {
state = ST_WORD;
token = head;
head += strlen(latin1);
} else if (line[actual][head] == '&') {
state = ST_CHAR_ENTITY;
}
break;
case ST_WORD: // wordchar
if ((latin1 = get_latin1(line[actual] + head))) {
head += strlen(latin1);
} else if ((is_wordchar((char *) APOSTROPHE) || (is_utf8() && is_wordchar((char *) UTF8_APOS))) &&
strncmp(line[actual] + head, ENTITY_APOS, strlen(ENTITY_APOS)) == 0 &&
is_wordchar(line[actual] + head + strlen(ENTITY_APOS))) {
head += strlen(ENTITY_APOS) - 1;
} else if (is_utf8() && is_wordchar((char *) APOSTROPHE) && // add Unicode apostrophe to the WORDCHARS, if needed
strncmp(line[actual] + head, UTF8_APOS, strlen(UTF8_APOS)) == 0 &&
is_wordchar(line[actual] + head + strlen(UTF8_APOS))) {
head += strlen(UTF8_APOS) - 1;
} else if (! is_wordchar(line[actual] + head)) {
state = prevstate;
char * t = alloc_token(token, &head);
if (t) return t;
}
break;
case ST_TAG: // comment, labels, etc
int i;
if ((checkattr == 1) && ((i = look_pattern(PATTERN2, PATTERN_LEN2, 1)) != -1)
&& (strcmp(PATTERN2[i][0],PATTERN2[pattern2_num][0]) == 0)) {
checkattr = 2;
} else if ((checkattr > 0) && (line[actual][head] == '>')) {
state = ST_NON_WORD;
} else if (((i = look_pattern(PATTERN, PATTERN_LEN, 1)) != -1) &&
(strcmp(PATTERN[i][1],PATTERN[pattern_num][1]) == 0)) {
state = ST_NON_WORD;
head += strlen(PATTERN[pattern_num][1]) - 1;
} else if ( (strcmp(PATTERN[pattern_num][0], "<") == 0) &&
((line[actual][head] == '"') || (line[actual][head] == '\''))) {
quotmark = line[actual][head];
state = ST_ATTRIB;
}
break;
case ST_ATTRIB: // non word chars
prevstate = ST_ATTRIB;
if (line[actual][head] == quotmark) {
state = ST_TAG;
if (checkattr == 2) checkattr = 1;
// for IMG ALT
} else if (is_wordchar(line[actual] + head) && (checkattr == 2)) {
state = ST_WORD;
token = head;
} else if (line[actual][head] == '&') {
state = ST_CHAR_ENTITY;
}
break;
case ST_CHAR_ENTITY: // SGML element
if ((tolower(line[actual][head]) == ';')) {
state = prevstate;
head--;
}
}
if (next_char(line[actual], &head)) return NULL;
}
}
char * XMLParser::next_token()
{
return next_token(__PATTERN__, __PATTERN_LEN__, __PATTERN2__, __PATTERN_LEN2__);
}
int XMLParser::change_token(const char * word)
{
if (strstr(word, APOSTROPHE) != NULL ||
strchr(word, '"') != NULL ||
strchr(word, '&') != NULL ||
strchr(word, '<') != NULL ||
strchr(word, '>') != NULL) {
char r[MAXLNLEN];
strcpy(r, word);
return TextParser::change_token(mystrrep(mystrrep(mystrrep(mystrrep(mystrrep(mystrrep(r,
"&", "__namp;__"),
"__namp;__", "&amp;"),
APOSTROPHE, ENTITY_APOS),
"\"", "&quot;"),
">", "&gt;"),
"<", "&lt;"));
}
return TextParser::change_token(word);
}

View File

@ -1,44 +0,0 @@
/*
* HTML parser class for MySpell
*
* implemented: text, HTML, TeX
*
* Copyright (C) 2014, Laszlo Nemeth
*
*/
#ifndef _XMLPARSER_HXX_
#define _XMLPARSER_HXX_
#include "textparser.hxx"
/*
* XML Parser
*
*/
class XMLParser : public TextParser
{
public:
XMLParser();
XMLParser(const char * wc);
XMLParser(unsigned short * wordchars, int len);
char * next_token(const char * p[][2], unsigned int len, const char * p2[][2], unsigned int len2);
char * next_token();
int change_token(const char * word);
virtual ~XMLParser();
private:
int look_pattern(const char * p[][2], unsigned int len, int column);
int pattern_num;
int pattern2_num;
int prevstate;
int checkattr;
char quotmark;
};
#endif

View File

@ -1,208 +0,0 @@
/* config.h.in. Generated from configure.ac by autoheader. */
/* Define to one of `_getb67', `GETB67', `getb67' for Cray-2 and Cray-YMP
systems. This function is required for `alloca.c' support on those systems.
*/
#define CRAY_STACKSEG_END 1
/* Define to 1 if using `alloca.c'. */
#define C_ALLOCA 1
/* Define to 1 if translation of program messages to the user's native
language is requested. */
#undef ENABLE_NLS
/* Define to 1 if you have `alloca', as a function or macro. */
#define HAVE_ALLOCA 1
/* Define to 1 if you have <alloca.h> and it should be used (not on Ultrix).
*/
#define HAVE_ALLOCA_H 1
/* Define to 1 if you have the <argz.h> header file. */
#define HAVE_ARGZ_H 1
/* "Define if you have the <curses.h> header" */
#undef HAVE_CURSES_H
/* Define if the GNU dcgettext() function is already present or preinstalled.
*/
#define HAVE_DCGETTEXT 1
/* Define to 1 if you have the <dlfcn.h> header file. */
#define HAVE_DLFCN_H 1
/* Define to 1 if you have the <error.h> header file. */
#define HAVE_ERROR_H 1
/* Define to 1 if you have the <fcntl.h> header file. */
#define HAVE_FCNTL_H 1
/* Define to 1 if you have the `feof_unlocked' function. */
#define HAVE_FEOF_UNLOCKED 1
/* Define to 1 if you have the `fgets_unlocked' function. */
#define HAVE_FGETS_UNLOCKED 1
/* Define to 1 if you have the `getcwd' function. */
#define HAVE_GETCWD 1
/* Define to 1 if you have the `getc_unlocked' function. */
#define HAVE_GETC_UNLOCKED 1
/* Define to 1 if you have the `getegid' function. */
#define HAVE_GETEGID 1
/* Define to 1 if you have the `geteuid' function. */
#define HAVE_GETEUID 1
/* Define to 1 if you have the `getgid' function. */
#define HAVE_GETGID 1
/* Define to 1 if you have the `getpagesize' function. */
#define HAVE_GETPAGESIZE 1
/* Define if the GNU gettext() function is already present or preinstalled. */
#define HAVE_GETTEXT 1
/* Define to 1 if you have the `getuid' function. */
#define HAVE_GETUID 1
/* Define if you have the iconv() function. */
#undef HAVE_ICONV
/* Define to 1 if you have the <inttypes.h> header file. */
#define HAVE_INTTYPES_H 1
/* Define if you have <langinfo.h> and nl_langinfo(CODESET). */
#define HAVE_LANGINFO_CODESET 1
/* Define if your <locale.h> file defines LC_MESSAGES. */
#define HAVE_LC_MESSAGES 1
/* Define to 1 if you have the <libintl.h> header file. */
#define HAVE_LIBINTL_H 1
/* Define to 1 if you have the <limits.h> header file. */
#define HAVE_LIMITS_H 1
/* Define to 1 if you have the <locale.h> header file. */
#define HAVE_LOCALE_H 1
/* Define to 1 if you have the `memchr' function. */
#define HAVE_MEMCHR 1
/* Define to 1 if you have the <memory.h> header file. */
#define HAVE_MEMORY_H 1
/* Define to 1 if you have the `mempcpy' function. */
#define HAVE_MEMPCPY 1
/* Define to 1 if you have a working `mmap' system call. */
#define HAVE_MMAP 1
/* Define to 1 if you have the `munmap' function. */
#define HAVE_MUNMAP 1
/* "Define if you have the <ncursesw/curses.h> header" */
#define HAVE_NCURSESW_H 1
/* Define to 1 if you have the <nl_types.h> header file. */
#define HAVE_NL_TYPES_H 1
/* Define to 1 if you have the `putenv' function. */
#define HAVE_PUTENV 1
/* "Define if you have fancy command input editing with Readline" */
#undef HAVE_READLINE
/* Define to 1 if you have the `setenv' function. */
#define HAVE_SETENV 1
/* Define to 1 if you have the `setlocale' function. */
#define HAVE_SETLOCALE 1
/* Define to 1 if you have the <stddef.h> header file. */
#define HAVE_STDDEF_H 1
/* Define to 1 if you have the <stdint.h> header file. */
#define HAVE_STDINT_H 1
/* Define to 1 if you have the <stdlib.h> header file. */
#define HAVE_STDLIB_H 1
/* Define to 1 if you have the `stpcpy' function. */
#define HAVE_STPCPY 1
/* Define to 1 if you have the `strcasecmp' function. */
#define HAVE_STRCASECMP 1
/* Define to 1 if you have the `strchr' function. */
#define HAVE_STRCHR 1
/* Define to 1 if you have the `strdup' function. */
#define HAVE_STRDUP 1
/* Define to 1 if you have the <strings.h> header file. */
#define HAVE_STRINGS_H 1
/* Define to 1 if you have the <string.h> header file. */
#define HAVE_STRING_H 1
/* Define to 1 if you have the `strstr' function. */
#define HAVE_STRSTR 1
/* Define to 1 if you have the `strtoul' function. */
#define HAVE_STRTOUL 1
/* Define to 1 if you have the <sys/param.h> header file. */
#define HAVE_SYS_PARAM_H 1
/* Define to 1 if you have the <sys/stat.h> header file. */
#define HAVE_SYS_STAT_H 1
/* Define to 1 if you have the <sys/types.h> header file. */
#define HAVE_SYS_TYPES_H 1
/* Define to 1 if you have the `tsearch' function. */
#define HAVE_TSEARCH 1
/* Define to 1 if you have the <unistd.h> header file. */
#define HAVE_UNISTD_H 1
/* Define to 1 if you have the `__argz_count' function. */
#define HAVE___ARGZ_COUNT 1
/* Define to 1 if you have the `__argz_next' function. */
#define HAVE___ARGZ_NEXT 1
/* Define to 1 if you have the `__argz_stringify' function. */
#define HAVE___ARGZ_STRINGIFY 1
/* "Define if you use exterimental functions" */
#undef HUNSPELL_EXPERIMENTAL
/* "Define if you need warning messages" */
#define HUNSPELL_WARNING_ON
/* Define as const if the declaration of iconv() needs const. */
#define ICONV_CONST 1
/* Name of package */
#define PACKAGE
/* Define to the address where bug reports for this package should be sent. */
#define PACKAGE_BUGREPORT
/* Define to the full name of this package. */
#define PACKAGE_NAME
/* Define to the full name and version of this package. */
#define PACKAGE_STRING
/* Define to the one symbol short name of this package. */
#define PACKAGE_TARNAME
/* Define to the version of this package. */
#define PACKAGE_VERSION "1.3.3"
#define VERSION "1.3.3"

674
3rdparty/hunspell/1.6.2/COPYING vendored Normal file
View File

@ -0,0 +1,674 @@
GNU GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The GNU General Public License is a free, copyleft license for
software and other kinds of works.
The licenses for most software and other practical works are designed
to take away your freedom to share and change the works. By contrast,
the GNU General Public License is intended to guarantee your freedom to
share and change all versions of a program--to make sure it remains free
software for all its users. We, the Free Software Foundation, use the
GNU General Public License for most of our software; it applies also to
any other work released this way by its authors. You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
them if you wish), that you receive source code or can get it if you
want it, that you can change the software or use pieces of it in new
free programs, and that you know you can do these things.
To protect your rights, we need to prevent others from denying you
these rights or asking you to surrender the rights. Therefore, you have
certain responsibilities if you distribute copies of the software, or if
you modify it: responsibilities to respect the freedom of others.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must pass on to the recipients the same
freedoms that you received. You must make sure that they, too, receive
or can get the source code. And you must show them these terms so they
know their rights.
Developers that use the GNU GPL protect your rights with two steps:
(1) assert copyright on the software, and (2) offer you this License
giving you legal permission to copy, distribute and/or modify it.
For the developers' and authors' protection, the GPL clearly explains
that there is no warranty for this free software. For both users' and
authors' sake, the GPL requires that modified versions be marked as
changed, so that their problems will not be attributed erroneously to
authors of previous versions.
Some devices are designed to deny users access to install or run
modified versions of the software inside them, although the manufacturer
can do so. This is fundamentally incompatible with the aim of
protecting users' freedom to change the software. The systematic
pattern of such abuse occurs in the area of products for individuals to
use, which is precisely where it is most unacceptable. Therefore, we
have designed this version of the GPL to prohibit the practice for those
products. If such problems arise substantially in other domains, we
stand ready to extend this provision to those domains in future versions
of the GPL, as needed to protect the freedom of users.
Finally, every program is threatened constantly by software patents.
States should not allow patents to restrict development and use of
software on general-purpose computers, but in those that do, we wish to
avoid the special danger that patents applied to a free program could
make it effectively proprietary. To prevent this, the GPL assures that
patents cannot be used to render the program non-free.
The precise terms and conditions for copying, distribution and
modification follow.
TERMS AND CONDITIONS
0. Definitions.
"This License" refers to version 3 of the GNU General Public License.
"Copyright" also means copyright-like laws that apply to other kinds of
works, such as semiconductor masks.
"The Program" refers to any copyrightable work licensed under this
License. Each licensee is addressed as "you". "Licensees" and
"recipients" may be individuals or organizations.
To "modify" a work means to copy from or adapt all or part of the work
in a fashion requiring copyright permission, other than the making of an
exact copy. The resulting work is called a "modified version" of the
earlier work or a work "based on" the earlier work.
A "covered work" means either the unmodified Program or a work based
on the Program.
To "propagate" a work means to do anything with it that, without
permission, would make you directly or secondarily liable for
infringement under applicable copyright law, except executing it on a
computer or modifying a private copy. Propagation includes copying,
distribution (with or without modification), making available to the
public, and in some countries other activities as well.
To "convey" a work means any kind of propagation that enables other
parties to make or receive copies. Mere interaction with a user through
a computer network, with no transfer of a copy, is not conveying.
An interactive user interface displays "Appropriate Legal Notices"
to the extent that it includes a convenient and prominently visible
feature that (1) displays an appropriate copyright notice, and (2)
tells the user that there is no warranty for the work (except to the
extent that warranties are provided), that licensees may convey the
work under this License, and how to view a copy of this License. If
the interface presents a list of user commands or options, such as a
menu, a prominent item in the list meets this criterion.
1. Source Code.
The "source code" for a work means the preferred form of the work
for making modifications to it. "Object code" means any non-source
form of a work.
A "Standard Interface" means an interface that either is an official
standard defined by a recognized standards body, or, in the case of
interfaces specified for a particular programming language, one that
is widely used among developers working in that language.
The "System Libraries" of an executable work include anything, other
than the work as a whole, that (a) is included in the normal form of
packaging a Major Component, but which is not part of that Major
Component, and (b) serves only to enable use of the work with that
Major Component, or to implement a Standard Interface for which an
implementation is available to the public in source code form. A
"Major Component", in this context, means a major essential component
(kernel, window system, and so on) of the specific operating system
(if any) on which the executable work runs, or a compiler used to
produce the work, or an object code interpreter used to run it.
The "Corresponding Source" for a work in object code form means all
the source code needed to generate, install, and (for an executable
work) run the object code and to modify the work, including scripts to
control those activities. However, it does not include the work's
System Libraries, or general-purpose tools or generally available free
programs which are used unmodified in performing those activities but
which are not part of the work. For example, Corresponding Source
includes interface definition files associated with source files for
the work, and the source code for shared libraries and dynamically
linked subprograms that the work is specifically designed to require,
such as by intimate data communication or control flow between those
subprograms and other parts of the work.
The Corresponding Source need not include anything that users
can regenerate automatically from other parts of the Corresponding
Source.
The Corresponding Source for a work in source code form is that
same work.
2. Basic Permissions.
All rights granted under this License are granted for the term of
copyright on the Program, and are irrevocable provided the stated
conditions are met. This License explicitly affirms your unlimited
permission to run the unmodified Program. The output from running a
covered work is covered by this License only if the output, given its
content, constitutes a covered work. This License acknowledges your
rights of fair use or other equivalent, as provided by copyright law.
You may make, run and propagate covered works that you do not
convey, without conditions so long as your license otherwise remains
in force. You may convey covered works to others for the sole purpose
of having them make modifications exclusively for you, or provide you
with facilities for running those works, provided that you comply with
the terms of this License in conveying all material for which you do
not control copyright. Those thus making or running the covered works
for you must do so exclusively on your behalf, under your direction
and control, on terms that prohibit them from making any copies of
your copyrighted material outside their relationship with you.
Conveying under any other circumstances is permitted solely under
the conditions stated below. Sublicensing is not allowed; section 10
makes it unnecessary.
3. Protecting Users' Legal Rights From Anti-Circumvention Law.
No covered work shall be deemed part of an effective technological
measure under any applicable law fulfilling obligations under article
11 of the WIPO copyright treaty adopted on 20 December 1996, or
similar laws prohibiting or restricting circumvention of such
measures.
When you convey a covered work, you waive any legal power to forbid
circumvention of technological measures to the extent such circumvention
is effected by exercising rights under this License with respect to
the covered work, and you disclaim any intention to limit operation or
modification of the work as a means of enforcing, against the work's
users, your or third parties' legal rights to forbid circumvention of
technological measures.
4. Conveying Verbatim Copies.
You may convey verbatim copies of the Program's source code as you
receive it, in any medium, provided that you conspicuously and
appropriately publish on each copy an appropriate copyright notice;
keep intact all notices stating that this License and any
non-permissive terms added in accord with section 7 apply to the code;
keep intact all notices of the absence of any warranty; and give all
recipients a copy of this License along with the Program.
You may charge any price or no price for each copy that you convey,
and you may offer support or warranty protection for a fee.
5. Conveying Modified Source Versions.
You may convey a work based on the Program, or the modifications to
produce it from the Program, in the form of source code under the
terms of section 4, provided that you also meet all of these conditions:
a) The work must carry prominent notices stating that you modified
it, and giving a relevant date.
b) The work must carry prominent notices stating that it is
released under this License and any conditions added under section
7. This requirement modifies the requirement in section 4 to
"keep intact all notices".
c) You must license the entire work, as a whole, under this
License to anyone who comes into possession of a copy. This
License will therefore apply, along with any applicable section 7
additional terms, to the whole of the work, and all its parts,
regardless of how they are packaged. This License gives no
permission to license the work in any other way, but it does not
invalidate such permission if you have separately received it.
d) If the work has interactive user interfaces, each must display
Appropriate Legal Notices; however, if the Program has interactive
interfaces that do not display Appropriate Legal Notices, your
work need not make them do so.
A compilation of a covered work with other separate and independent
works, which are not by their nature extensions of the covered work,
and which are not combined with it such as to form a larger program,
in or on a volume of a storage or distribution medium, is called an
"aggregate" if the compilation and its resulting copyright are not
used to limit the access or legal rights of the compilation's users
beyond what the individual works permit. Inclusion of a covered work
in an aggregate does not cause this License to apply to the other
parts of the aggregate.
6. Conveying Non-Source Forms.
You may convey a covered work in object code form under the terms
of sections 4 and 5, provided that you also convey the
machine-readable Corresponding Source under the terms of this License,
in one of these ways:
a) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by the
Corresponding Source fixed on a durable physical medium
customarily used for software interchange.
b) Convey the object code in, or embodied in, a physical product
(including a physical distribution medium), accompanied by a
written offer, valid for at least three years and valid for as
long as you offer spare parts or customer support for that product
model, to give anyone who possesses the object code either (1) a
copy of the Corresponding Source for all the software in the
product that is covered by this License, on a durable physical
medium customarily used for software interchange, for a price no
more than your reasonable cost of physically performing this
conveying of source, or (2) access to copy the
Corresponding Source from a network server at no charge.
c) Convey individual copies of the object code with a copy of the
written offer to provide the Corresponding Source. This
alternative is allowed only occasionally and noncommercially, and
only if you received the object code with such an offer, in accord
with subsection 6b.
d) Convey the object code by offering access from a designated
place (gratis or for a charge), and offer equivalent access to the
Corresponding Source in the same way through the same place at no
further charge. You need not require recipients to copy the
Corresponding Source along with the object code. If the place to
copy the object code is a network server, the Corresponding Source
may be on a different server (operated by you or a third party)
that supports equivalent copying facilities, provided you maintain
clear directions next to the object code saying where to find the
Corresponding Source. Regardless of what server hosts the
Corresponding Source, you remain obligated to ensure that it is
available for as long as needed to satisfy these requirements.
e) Convey the object code using peer-to-peer transmission, provided
you inform other peers where the object code and Corresponding
Source of the work are being offered to the general public at no
charge under subsection 6d.
A separable portion of the object code, whose source code is excluded
from the Corresponding Source as a System Library, need not be
included in conveying the object code work.
A "User Product" is either (1) a "consumer product", which means any
tangible personal property which is normally used for personal, family,
or household purposes, or (2) anything designed or sold for incorporation
into a dwelling. In determining whether a product is a consumer product,
doubtful cases shall be resolved in favor of coverage. For a particular
product received by a particular user, "normally used" refers to a
typical or common use of that class of product, regardless of the status
of the particular user or of the way in which the particular user
actually uses, or expects or is expected to use, the product. A product
is a consumer product regardless of whether the product has substantial
commercial, industrial or non-consumer uses, unless such uses represent
the only significant mode of use of the product.
"Installation Information" for a User Product means any methods,
procedures, authorization keys, or other information required to install
and execute modified versions of a covered work in that User Product from
a modified version of its Corresponding Source. The information must
suffice to ensure that the continued functioning of the modified object
code is in no case prevented or interfered with solely because
modification has been made.
If you convey an object code work under this section in, or with, or
specifically for use in, a User Product, and the conveying occurs as
part of a transaction in which the right of possession and use of the
User Product is transferred to the recipient in perpetuity or for a
fixed term (regardless of how the transaction is characterized), the
Corresponding Source conveyed under this section must be accompanied
by the Installation Information. But this requirement does not apply
if neither you nor any third party retains the ability to install
modified object code on the User Product (for example, the work has
been installed in ROM).
The requirement to provide Installation Information does not include a
requirement to continue to provide support service, warranty, or updates
for a work that has been modified or installed by the recipient, or for
the User Product in which it has been modified or installed. Access to a
network may be denied when the modification itself materially and
adversely affects the operation of the network or violates the rules and
protocols for communication across the network.
Corresponding Source conveyed, and Installation Information provided,
in accord with this section must be in a format that is publicly
documented (and with an implementation available to the public in
source code form), and must require no special password or key for
unpacking, reading or copying.
7. Additional Terms.
"Additional permissions" are terms that supplement the terms of this
License by making exceptions from one or more of its conditions.
Additional permissions that are applicable to the entire Program shall
be treated as though they were included in this License, to the extent
that they are valid under applicable law. If additional permissions
apply only to part of the Program, that part may be used separately
under those permissions, but the entire Program remains governed by
this License without regard to the additional permissions.
When you convey a copy of a covered work, you may at your option
remove any additional permissions from that copy, or from any part of
it. (Additional permissions may be written to require their own
removal in certain cases when you modify the work.) You may place
additional permissions on material, added by you to a covered work,
for which you have or can give appropriate copyright permission.
Notwithstanding any other provision of this License, for material you
add to a covered work, you may (if authorized by the copyright holders of
that material) supplement the terms of this License with terms:
a) Disclaiming warranty or limiting liability differently from the
terms of sections 15 and 16 of this License; or
b) Requiring preservation of specified reasonable legal notices or
author attributions in that material or in the Appropriate Legal
Notices displayed by works containing it; or
c) Prohibiting misrepresentation of the origin of that material, or
requiring that modified versions of such material be marked in
reasonable ways as different from the original version; or
d) Limiting the use for publicity purposes of names of licensors or
authors of the material; or
e) Declining to grant rights under trademark law for use of some
trade names, trademarks, or service marks; or
f) Requiring indemnification of licensors and authors of that
material by anyone who conveys the material (or modified versions of
it) with contractual assumptions of liability to the recipient, for
any liability that these contractual assumptions directly impose on
those licensors and authors.
All other non-permissive additional terms are considered "further
restrictions" within the meaning of section 10. If the Program as you
received it, or any part of it, contains a notice stating that it is
governed by this License along with a term that is a further
restriction, you may remove that term. If a license document contains
a further restriction but permits relicensing or conveying under this
License, you may add to a covered work material governed by the terms
of that license document, provided that the further restriction does
not survive such relicensing or conveying.
If you add terms to a covered work in accord with this section, you
must place, in the relevant source files, a statement of the
additional terms that apply to those files, or a notice indicating
where to find the applicable terms.
Additional terms, permissive or non-permissive, may be stated in the
form of a separately written license, or stated as exceptions;
the above requirements apply either way.
8. Termination.
You may not propagate or modify a covered work except as expressly
provided under this License. Any attempt otherwise to propagate or
modify it is void, and will automatically terminate your rights under
this License (including any patent licenses granted under the third
paragraph of section 11).
However, if you cease all violation of this License, then your
license from a particular copyright holder is reinstated (a)
provisionally, unless and until the copyright holder explicitly and
finally terminates your license, and (b) permanently, if the copyright
holder fails to notify you of the violation by some reasonable means
prior to 60 days after the cessation.
Moreover, your license from a particular copyright holder is
reinstated permanently if the copyright holder notifies you of the
violation by some reasonable means, this is the first time you have
received notice of violation of this License (for any work) from that
copyright holder, and you cure the violation prior to 30 days after
your receipt of the notice.
Termination of your rights under this section does not terminate the
licenses of parties who have received copies or rights from you under
this License. If your rights have been terminated and not permanently
reinstated, you do not qualify to receive new licenses for the same
material under section 10.
9. Acceptance Not Required for Having Copies.
You are not required to accept this License in order to receive or
run a copy of the Program. Ancillary propagation of a covered work
occurring solely as a consequence of using peer-to-peer transmission
to receive a copy likewise does not require acceptance. However,
nothing other than this License grants you permission to propagate or
modify any covered work. These actions infringe copyright if you do
not accept this License. Therefore, by modifying or propagating a
covered work, you indicate your acceptance of this License to do so.
10. Automatic Licensing of Downstream Recipients.
Each time you convey a covered work, the recipient automatically
receives a license from the original licensors, to run, modify and
propagate that work, subject to this License. You are not responsible
for enforcing compliance by third parties with this License.
An "entity transaction" is a transaction transferring control of an
organization, or substantially all assets of one, or subdividing an
organization, or merging organizations. If propagation of a covered
work results from an entity transaction, each party to that
transaction who receives a copy of the work also receives whatever
licenses to the work the party's predecessor in interest had or could
give under the previous paragraph, plus a right to possession of the
Corresponding Source of the work from the predecessor in interest, if
the predecessor has it or can get it with reasonable efforts.
You may not impose any further restrictions on the exercise of the
rights granted or affirmed under this License. For example, you may
not impose a license fee, royalty, or other charge for exercise of
rights granted under this License, and you may not initiate litigation
(including a cross-claim or counterclaim in a lawsuit) alleging that
any patent claim is infringed by making, using, selling, offering for
sale, or importing the Program or any portion of it.
11. Patents.
A "contributor" is a copyright holder who authorizes use under this
License of the Program or a work on which the Program is based. The
work thus licensed is called the contributor's "contributor version".
A contributor's "essential patent claims" are all patent claims
owned or controlled by the contributor, whether already acquired or
hereafter acquired, that would be infringed by some manner, permitted
by this License, of making, using, or selling its contributor version,
but do not include claims that would be infringed only as a
consequence of further modification of the contributor version. For
purposes of this definition, "control" includes the right to grant
patent sublicenses in a manner consistent with the requirements of
this License.
Each contributor grants you a non-exclusive, worldwide, royalty-free
patent license under the contributor's essential patent claims, to
make, use, sell, offer for sale, import and otherwise run, modify and
propagate the contents of its contributor version.
In the following three paragraphs, a "patent license" is any express
agreement or commitment, however denominated, not to enforce a patent
(such as an express permission to practice a patent or covenant not to
sue for patent infringement). To "grant" such a patent license to a
party means to make such an agreement or commitment not to enforce a
patent against the party.
If you convey a covered work, knowingly relying on a patent license,
and the Corresponding Source of the work is not available for anyone
to copy, free of charge and under the terms of this License, through a
publicly available network server or other readily accessible means,
then you must either (1) cause the Corresponding Source to be so
available, or (2) arrange to deprive yourself of the benefit of the
patent license for this particular work, or (3) arrange, in a manner
consistent with the requirements of this License, to extend the patent
license to downstream recipients. "Knowingly relying" means you have
actual knowledge that, but for the patent license, your conveying the
covered work in a country, or your recipient's use of the covered work
in a country, would infringe one or more identifiable patents in that
country that you have reason to believe are valid.
If, pursuant to or in connection with a single transaction or
arrangement, you convey, or propagate by procuring conveyance of, a
covered work, and grant a patent license to some of the parties
receiving the covered work authorizing them to use, propagate, modify
or convey a specific copy of the covered work, then the patent license
you grant is automatically extended to all recipients of the covered
work and works based on it.
A patent license is "discriminatory" if it does not include within
the scope of its coverage, prohibits the exercise of, or is
conditioned on the non-exercise of one or more of the rights that are
specifically granted under this License. You may not convey a covered
work if you are a party to an arrangement with a third party that is
in the business of distributing software, under which you make payment
to the third party based on the extent of your activity of conveying
the work, and under which the third party grants, to any of the
parties who would receive the covered work from you, a discriminatory
patent license (a) in connection with copies of the covered work
conveyed by you (or copies made from those copies), or (b) primarily
for and in connection with specific products or compilations that
contain the covered work, unless you entered into that arrangement,
or that patent license was granted, prior to 28 March 2007.
Nothing in this License shall be construed as excluding or limiting
any implied license or other defenses to infringement that may
otherwise be available to you under applicable patent law.
12. No Surrender of Others' Freedom.
If conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot convey a
covered work so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you may
not convey it at all. For example, if you agree to terms that obligate you
to collect a royalty for further conveying from those to whom you convey
the Program, the only way you could satisfy both those terms and this
License would be to refrain entirely from conveying the Program.
13. Use with the GNU Affero General Public License.
Notwithstanding any other provision of this License, you have
permission to link or combine any covered work with a work licensed
under version 3 of the GNU Affero General Public License into a single
combined work, and to convey the resulting work. The terms of this
License will continue to apply to the part which is the covered work,
but the special requirements of the GNU Affero General Public License,
section 13, concerning interaction through a network will apply to the
combination as such.
14. Revised Versions of this License.
The Free Software Foundation may publish revised and/or new versions of
the GNU General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the
Program specifies that a certain numbered version of the GNU General
Public License "or any later version" applies to it, you have the
option of following the terms and conditions either of that numbered
version or of any later version published by the Free Software
Foundation. If the Program does not specify a version number of the
GNU General Public License, you may choose any version ever published
by the Free Software Foundation.
If the Program specifies that a proxy can decide which future
versions of the GNU General Public License can be used, that proxy's
public statement of acceptance of a version permanently authorizes you
to choose that version for the Program.
Later license versions may give you additional or different
permissions. However, no additional obligations are imposed on any
author or copyright holder as a result of your choosing to follow a
later version.
15. Disclaimer of Warranty.
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
16. Limitation of Liability.
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
SUCH DAMAGES.
17. Interpretation of Sections 15 and 16.
If the disclaimer of warranty and limitation of liability provided
above cannot be given local legal effect according to their terms,
reviewing courts shall apply local law that most closely approximates
an absolute waiver of all civil liability in connection with the
Program, unless a warranty or assumption of liability accompanies a
copy of the Program in return for a fee.
END OF TERMS AND CONDITIONS
How to Apply These Terms to Your New Programs
If you develop a new program, and you want it to be of the greatest
possible use to the public, the best way to achieve this is to make it
free software which everyone can redistribute and change under these terms.
To do so, attach the following notices to the program. It is safest
to attach them to the start of each source file to most effectively
state the exclusion of warranty; and each file should have at least
the "copyright" line and a pointer to where the full notice is found.
<one line to give the program's name and a brief idea of what it does.>
Copyright (C) <year> <name of author>
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program. If not, see <http://www.gnu.org/licenses/>.
Also add information on how to contact you by electronic and paper mail.
If the program does terminal interaction, make it output a short
notice like this when it starts in an interactive mode:
<program> Copyright (C) <year> <name of author>
This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
This is free software, and you are welcome to redistribute it
under certain conditions; type `show c' for details.
The hypothetical commands `show w' and `show c' should show the appropriate
parts of the General Public License. Of course, your program's commands
might be different; for a GUI interface, you would use an "about box".
You should also get your employer (if you work as a programmer) or school,
if any, to sign a "copyright disclaimer" for the program, if necessary.
For more information on this, and how to apply and follow the GNU GPL, see
<http://www.gnu.org/licenses/>.
The GNU General Public License does not permit incorporating your program
into proprietary programs. If your program is a subroutine library, you
may consider it more useful to permit linking proprietary applications with
the library. If this is what you want to do, use the GNU Lesser General
Public License instead of this License. But first, please read
<http://www.gnu.org/philosophy/why-not-lgpl.html>.

165
3rdparty/hunspell/1.6.2/COPYING.LESSER vendored Normal file
View File

@ -0,0 +1,165 @@
GNU LESSER GENERAL PUBLIC LICENSE
Version 3, 29 June 2007
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
This version of the GNU Lesser General Public License incorporates
the terms and conditions of version 3 of the GNU General Public
License, supplemented by the additional permissions listed below.
0. Additional Definitions.
As used herein, "this License" refers to version 3 of the GNU Lesser
General Public License, and the "GNU GPL" refers to version 3 of the GNU
General Public License.
"The Library" refers to a covered work governed by this License,
other than an Application or a Combined Work as defined below.
An "Application" is any work that makes use of an interface provided
by the Library, but which is not otherwise based on the Library.
Defining a subclass of a class defined by the Library is deemed a mode
of using an interface provided by the Library.
A "Combined Work" is a work produced by combining or linking an
Application with the Library. The particular version of the Library
with which the Combined Work was made is also called the "Linked
Version".
The "Minimal Corresponding Source" for a Combined Work means the
Corresponding Source for the Combined Work, excluding any source code
for portions of the Combined Work that, considered in isolation, are
based on the Application, and not on the Linked Version.
The "Corresponding Application Code" for a Combined Work means the
object code and/or source code for the Application, including any data
and utility programs needed for reproducing the Combined Work from the
Application, but excluding the System Libraries of the Combined Work.
1. Exception to Section 3 of the GNU GPL.
You may convey a covered work under sections 3 and 4 of this License
without being bound by section 3 of the GNU GPL.
2. Conveying Modified Versions.
If you modify a copy of the Library, and, in your modifications, a
facility refers to a function or data to be supplied by an Application
that uses the facility (other than as an argument passed when the
facility is invoked), then you may convey a copy of the modified
version:
a) under this License, provided that you make a good faith effort to
ensure that, in the event an Application does not supply the
function or data, the facility still operates, and performs
whatever part of its purpose remains meaningful, or
b) under the GNU GPL, with none of the additional permissions of
this License applicable to that copy.
3. Object Code Incorporating Material from Library Header Files.
The object code form of an Application may incorporate material from
a header file that is part of the Library. You may convey such object
code under terms of your choice, provided that, if the incorporated
material is not limited to numerical parameters, data structure
layouts and accessors, or small macros, inline functions and templates
(ten or fewer lines in length), you do both of the following:
a) Give prominent notice with each copy of the object code that the
Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the object code with a copy of the GNU GPL and this license
document.
4. Combined Works.
You may convey a Combined Work under terms of your choice that,
taken together, effectively do not restrict modification of the
portions of the Library contained in the Combined Work and reverse
engineering for debugging such modifications, if you also do each of
the following:
a) Give prominent notice with each copy of the Combined Work that
the Library is used in it and that the Library and its use are
covered by this License.
b) Accompany the Combined Work with a copy of the GNU GPL and this license
document.
c) For a Combined Work that displays copyright notices during
execution, include the copyright notice for the Library among
these notices, as well as a reference directing the user to the
copies of the GNU GPL and this license document.
d) Do one of the following:
0) Convey the Minimal Corresponding Source under the terms of this
License, and the Corresponding Application Code in a form
suitable for, and under terms that permit, the user to
recombine or relink the Application with a modified version of
the Linked Version to produce a modified Combined Work, in the
manner specified by section 6 of the GNU GPL for conveying
Corresponding Source.
1) Use a suitable shared library mechanism for linking with the
Library. A suitable mechanism is one that (a) uses at run time
a copy of the Library already present on the user's computer
system, and (b) will operate properly with a modified version
of the Library that is interface-compatible with the Linked
Version.
e) Provide Installation Information, but only if you would otherwise
be required to provide such information under section 6 of the
GNU GPL, and only to the extent that such information is
necessary to install and execute a modified version of the
Combined Work produced by recombining or relinking the
Application with a modified version of the Linked Version. (If
you use option 4d0, the Installation Information must accompany
the Minimal Corresponding Source and Corresponding Application
Code. If you use option 4d1, you must provide the Installation
Information in the manner specified by section 6 of the GNU GPL
for conveying Corresponding Source.)
5. Combined Libraries.
You may place library facilities that are a work based on the
Library side by side in a single library together with other library
facilities that are not Applications and are not covered by this
License, and convey such a combined library under terms of your
choice, if you do both of the following:
a) Accompany the combined library with a copy of the same work based
on the Library, uncombined with any other library facilities,
conveyed under the terms of this License.
b) Give prominent notice with the combined library that part of it
is a work based on the Library, and explaining where to find the
accompanying uncombined form of the same work.
6. Revised Versions of the GNU Lesser General Public License.
The Free Software Foundation may publish revised and/or new versions
of the GNU Lesser General Public License from time to time. Such new
versions will be similar in spirit to the present version, but may
differ in detail to address new problems or concerns.
Each version is given a distinguishing version number. If the
Library as you received it specifies that a certain numbered version
of the GNU Lesser General Public License "or any later version"
applies to it, you have the option of following the terms and
conditions either of that published version or of any later version
published by the Free Software Foundation. If the Library as you
received it does not specify a version number of the GNU Lesser
General Public License, you may choose any version of the GNU Lesser
General Public License ever published by the Free Software Foundation.
If the Library as you received it specifies that a proxy can decide
whether future versions of the GNU Lesser General Public License shall
apply, that proxy's public statement of acceptance of any version is
permanent authorization for you to choose that version for the
Library.

1993
3rdparty/hunspell/1.6.2/ChangeLog vendored Normal file

File diff suppressed because it is too large Load Diff

182
3rdparty/hunspell/1.6.2/README.md vendored Normal file
View File

@ -0,0 +1,182 @@
About Hunspell
==============
NOTICE: Version 2 is in the works. For contributing see
[version 2 specification][v2spec] and the folder `src/hunspell2`.
[v2spec]: https://github.com/hunspell/hunspell/wiki/Version-2-Specification
Hunspell is a spell checker and morphological analyzer library and program
designed for languages with rich morphology and complex word compounding or
character encoding. Hunspell interfaces: Ispell-like terminal interface
using Curses library, Ispell pipe interface, C++ class and C functions.
Hunspell's code base comes from the OpenOffice.org MySpell
(http://lingucomponent.openoffice.org/MySpell-3.zip). See README.MYSPELL,
AUTHORS.MYSPELL and license.myspell files.
Hunspell is designed to eventually replace Myspell in OpenOffice.org.
Main features of Hunspell spell checker and morphological analyzer:
- Unicode support (affix rules work only with the first 65535 Unicode
characters)
- Morphological analysis (in custom item and arrangement style) and stemming
- Max. 65535 affix classes and twofold affix stripping (for agglutinative
languages, like Azeri, Basque, Estonian, Finnish, Hungarian, Turkish, etc.)
- Support complex compoundings (for example, Hungarian and German)
- Support language specific features (for example, special casing of
Azeri and Turkish dotted i, or German sharp s)
- Handle conditional affixes, circumfixes, fogemorphemes,
forbidden words, pseudoroots and homonyms.
- Free software. Versions 1.x are licenced under LGPL, GPL, MPL tri-license.
Version 2 is licenced only under GNU LGPL.
Compiling on GNU/Linux and Unixes
=================================
autoreconf -vfi
./configure
make
sudo make install
sudo ldconfig
For dictionary development, use the `--with-warnings` option of configure.
For interactive user interface of Hunspell executable, use the `--with-ui option`.
The developer packages you need to compile Hunspell's interface:
autoconf automake autopoint libtool g++
Optional developer packages:
- ncurses (need for --with-ui), eg. libncursesw5 for UTF-8
- readline (for fancy input line editing,
configure parameter: --with-readline)
- locale and gettext (but you can also use the
--with-included-gettext configure parameter)
Compiling on Windows
====================
## 1. Compiling with Mingw64 and MSYS2
Download Msys2, update everything and install the following packages:
pacman -S base-devel mingw-w64-x86_64-toolchain mingw-w64-x86_64-libtool
Open Mingw-w64 Win64 prompt and compile the same way as on Linux, see above.
## 2. Compiling in Cygwin environment
Download and install Cygwin environment for Windows with the following
extra packages:
- make
- automake
- autoconf
- libtool
- gcc-g++ development package
- ncurses, readline (for user interface)
- iconv (character conversion)
Then compile the same way as on Linux. Cygwin builds depend on Cygwin1.dll.
Debugging
=========
For debugging we need to create a debug build and then we need to start `gdb`.
make clean
make CXXFLAGS='-g -O0'
libtool --mode=execute gdb src/tools/hunspell
Testing
=======
Testing Hunspell (see tests in tests/ subdirectory):
make check
or with Valgrind debugger:
make check
VALGRIND=[Valgrind_tool] make check
For example:
make check
VALGRIND=memcheck make check
Documentation
=============
features and dictionary format:
man 5 hunspell
man hunspell
hunspell -h
http://hunspell.github.io/
Usage
=====
The src/tools directory contains ten executables after compiling:
- affixcompress: dictionary generation from large (millions of words)
vocabularies
- analyze: example of spell checking, stemming and morphological analysis
- chmorph: example of automatic morphological generation and conversion
- example: example of spell checking and suggestion
- hunspell: main program for spell checking and others (see manual)
- hunzip: decompressor of hzip format
- hzip: compressor of hzip format
- makealias: alias compression (Hunspell only, not back compatible with MySpell)
- munch: dictionary generation from vocabularies (it needs an affix file, too).
- unmunch: list all recognized words of a MySpell dictionary
- wordforms: word generation (Hunspell version of unmunch)
After compiling and installing (see INSTALL) you can
run the Hunspell spell checker (compiled with user interface)
with a Hunspell or Myspell dictionary:
hunspell -d en_US text.txt
or without interface:
hunspell
hunspell -d en_UK -l <text.txt
Dictionaries consist of an affix and dictionary file, see tests/
or http://wiki.services.openoffice.org/wiki/Dictionaries.
Using Hunspell library with GCC
===============================
Including in your program:
#include <hunspell.hxx>
Linking with Hunspell static library:
g++ -lhunspell example.cxx
Dictionaries
------------
Myspell & Hunspell dictionaries:
- http://extensions.libreoffice.org
- http://cgit.freedesktop.org/libreoffice/dictionaries
- http://extensions.openoffice.org
- http://wiki.services.openoffice.org/wiki/Dictionaries
Aspell dictionaries (need some conversion):
- ftp://ftp.gnu.org/gnu/aspell/dict
Conversion steps: see relevant feature request at http://hunspell.github.io/ .
László Németh
nemeth at numbertext org

4
3rdparty/hunspell/1.6.2/TODO vendored Normal file
View File

@ -0,0 +1,4 @@
* shared dictionaries for multi-user environment
* improve compound handling
* Unicode unmunch (munch)
* forbiddenword and pseudoword support in unmunch

View File

@ -1,6 +1,8 @@
/* ***** BEGIN LICENSE BLOCK ***** /* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1 * Version: MPL 1.1/GPL 2.0/LGPL 2.1
* *
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version * The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with * 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at * the License. You may obtain a copy of the License at
@ -11,12 +13,7 @@
* for the specific language governing rights and limitations under the * for the specific language governing rights and limitations under the
* License. * License.
* *
* The Original Code is Hunspell, based on MySpell. * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* The Initial Developers of the Original Code are
* Kevin Hendricks (MySpell) and Németh László (Hunspell).
* Portions created by the Initial Developers are Copyright (C) 2002-2005
* the Initial Developers. All Rights Reserved.
* *
* Contributor(s): * Contributor(s):
* David Einstein * David Einstein
@ -39,6 +36,8 @@
* Bram Moolenaar * Bram Moolenaar
* Dafydd Jones * Dafydd Jones
* Harri Pitkänen * Harri Pitkänen
* Andras Timar
* Tor Lillqvist
* *
* Alternatively, the contents of this file may be used under the terms of * Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or * either the GNU General Public License Version 2 or later (the "GPL"), or

View File

@ -0,0 +1,983 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
/*
* Copyright 2002 Kevin B. Hendricks, Stratford, Ontario, Canada
* And Contributors. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. All modifications to the source code must be clearly marked as
* such. Binary redistributions based on modified source code
* must be clearly marked as modified versions in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
* KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#include "affentry.hxx"
#include "csutil.hxx"
AffEntry::~AffEntry() {
if (opts & aeLONGCOND)
free(c.l.conds2);
if (morphcode && !(opts & aeALIASM))
free(morphcode);
if (contclass && !(opts & aeALIASF))
free(contclass);
}
PfxEntry::PfxEntry(AffixMgr* pmgr)
// register affix manager
: pmyMgr(pmgr),
next(NULL),
nexteq(NULL),
nextne(NULL),
flgnxt(NULL) {
}
// add prefix to this word assuming conditions hold
std::string PfxEntry::add(const char* word, size_t len) {
std::string result;
if ((len > strip.size() || (len == 0 && pmyMgr->get_fullstrip())) &&
(len >= numconds) && test_condition(word) &&
(!strip.size() || (strncmp(word, strip.c_str(), strip.size()) == 0))) {
/* we have a match so add prefix */
result.assign(appnd);
result.append(word + strip.size());
}
return result;
}
inline char* PfxEntry::nextchar(char* p) {
if (p) {
p++;
if (opts & aeLONGCOND) {
// jump to the 2nd part of the condition
if (p == c.conds + MAXCONDLEN_1)
return c.l.conds2;
// end of the MAXCONDLEN length condition
} else if (p == c.conds + MAXCONDLEN)
return NULL;
return *p ? p : NULL;
}
return NULL;
}
inline int PfxEntry::test_condition(const char* st) {
const char* pos = NULL; // group with pos input position
bool neg = false; // complementer
bool ingroup = false; // character in the group
if (numconds == 0)
return 1;
char* p = c.conds;
while (1) {
switch (*p) {
case '\0':
return 1;
case '[': {
neg = false;
ingroup = false;
p = nextchar(p);
pos = st;
break;
}
case '^': {
p = nextchar(p);
neg = true;
break;
}
case ']': {
if ((neg && ingroup) || (!neg && !ingroup))
return 0;
pos = NULL;
p = nextchar(p);
// skip the next character
if (!ingroup && *st)
for (st++; (opts & aeUTF8) && (*st & 0xc0) == 0x80; st++)
;
if (*st == '\0' && p)
return 0; // word <= condition
break;
}
case '.':
if (!pos) { // dots are not metacharacters in groups: [.]
p = nextchar(p);
// skip the next character
for (st++; (opts & aeUTF8) && (*st & 0xc0) == 0x80; st++)
;
if (*st == '\0' && p)
return 0; // word <= condition
break;
}
/* FALLTHROUGH */
default: {
if (*st == *p) {
st++;
p = nextchar(p);
if ((opts & aeUTF8) && (*(st - 1) & 0x80)) { // multibyte
while (p && (*p & 0xc0) == 0x80) { // character
if (*p != *st) {
if (!pos)
return 0;
st = pos;
break;
}
p = nextchar(p);
st++;
}
if (pos && st != pos) {
ingroup = true;
while (p && *p != ']' && ((p = nextchar(p)) != NULL)) {
}
}
} else if (pos) {
ingroup = true;
while (p && *p != ']' && ((p = nextchar(p)) != NULL)) {
}
}
} else if (pos) { // group
p = nextchar(p);
} else
return 0;
}
}
if (!p)
return 1;
}
}
// check if this prefix entry matches
struct hentry* PfxEntry::checkword(const char* word,
int len,
char in_compound,
const FLAG needflag) {
struct hentry* he; // hash entry of root word or NULL
// on entry prefix is 0 length or already matches the beginning of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
int tmpl = len - appnd.size(); // length of tmpword
if (tmpl > 0 || (tmpl == 0 && pmyMgr->get_fullstrip())) {
// generate new root word by removing prefix and adding
// back any characters that would have been stripped
std::string tmpword(strip);
tmpword.append(word + appnd.size());
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then check if resulting
// root word in the dictionary
if (test_condition(tmpword.c_str())) {
tmpl += strip.size();
if ((he = pmyMgr->lookup(tmpword.c_str())) != NULL) {
do {
if (TESTAFF(he->astr, aflag, he->alen) &&
// forbid single prefixes with needaffix flag
!TESTAFF(contclass, pmyMgr->get_needaffix(), contclasslen) &&
// needflag
((!needflag) || TESTAFF(he->astr, needflag, he->alen) ||
(contclass && TESTAFF(contclass, needflag, contclasslen))))
return he;
he = he->next_homonym; // check homonyms
} while (he);
}
// prefix matched but no root word was found
// if aeXPRODUCT is allowed, try again but now
// ross checked combined with a suffix
// if ((opts & aeXPRODUCT) && in_compound) {
if ((opts & aeXPRODUCT)) {
he = pmyMgr->suffix_check(tmpword.c_str(), tmpl, aeXPRODUCT, this,
FLAG_NULL, needflag, in_compound);
if (he)
return he;
}
}
}
return NULL;
}
// check if this prefix entry matches
struct hentry* PfxEntry::check_twosfx(const char* word,
int len,
char in_compound,
const FLAG needflag) {
// on entry prefix is 0 length or already matches the beginning of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
int tmpl = len - appnd.size(); // length of tmpword
if ((tmpl > 0 || (tmpl == 0 && pmyMgr->get_fullstrip())) &&
(tmpl + strip.size() >= numconds)) {
// generate new root word by removing prefix and adding
// back any characters that would have been stripped
std::string tmpword(strip);
tmpword.append(word + appnd.size());
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then check if resulting
// root word in the dictionary
if (test_condition(tmpword.c_str())) {
tmpl += strip.size();
// prefix matched but no root word was found
// if aeXPRODUCT is allowed, try again but now
// cross checked combined with a suffix
if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) {
// hash entry of root word or NULL
struct hentry* he = pmyMgr->suffix_check_twosfx(tmpword.c_str(), tmpl, aeXPRODUCT, this,
needflag);
if (he)
return he;
}
}
}
return NULL;
}
// check if this prefix entry matches
std::string PfxEntry::check_twosfx_morph(const char* word,
int len,
char in_compound,
const FLAG needflag) {
std::string result;
// on entry prefix is 0 length or already matches the beginning of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
int tmpl = len - appnd.size(); // length of tmpword
if ((tmpl > 0 || (tmpl == 0 && pmyMgr->get_fullstrip())) &&
(tmpl + strip.size() >= numconds)) {
// generate new root word by removing prefix and adding
// back any characters that would have been stripped
std::string tmpword(strip);
tmpword.append(word + appnd.size());
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then check if resulting
// root word in the dictionary
if (test_condition(tmpword.c_str())) {
tmpl += strip.size();
// prefix matched but no root word was found
// if aeXPRODUCT is allowed, try again but now
// ross checked combined with a suffix
if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) {
result = pmyMgr->suffix_check_twosfx_morph(tmpword.c_str(), tmpl,
aeXPRODUCT,
this, needflag);
}
}
}
return result;
}
// check if this prefix entry matches
std::string PfxEntry::check_morph(const char* word,
int len,
char in_compound,
const FLAG needflag) {
std::string result;
// on entry prefix is 0 length or already matches the beginning of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
int tmpl = len - appnd.size(); // length of tmpword
if ((tmpl > 0 || (tmpl == 0 && pmyMgr->get_fullstrip())) &&
(tmpl + strip.size() >= numconds)) {
// generate new root word by removing prefix and adding
// back any characters that would have been stripped
std::string tmpword(strip);
tmpword.append(word + appnd.size());
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then check if resulting
// root word in the dictionary
if (test_condition(tmpword.c_str())) {
tmpl += strip.size();
struct hentry* he; // hash entry of root word or NULL
if ((he = pmyMgr->lookup(tmpword.c_str())) != NULL) {
do {
if (TESTAFF(he->astr, aflag, he->alen) &&
// forbid single prefixes with needaffix flag
!TESTAFF(contclass, pmyMgr->get_needaffix(), contclasslen) &&
// needflag
((!needflag) || TESTAFF(he->astr, needflag, he->alen) ||
(contclass && TESTAFF(contclass, needflag, contclasslen)))) {
if (morphcode) {
result.append(" ");
result.append(morphcode);
} else
result.append(getKey());
if (!HENTRY_FIND(he, MORPH_STEM)) {
result.append(" ");
result.append(MORPH_STEM);
result.append(HENTRY_WORD(he));
}
// store the pointer of the hash entry
if (HENTRY_DATA(he)) {
result.append(" ");
result.append(HENTRY_DATA2(he));
} else {
// return with debug information
char* flag = pmyMgr->encode_flag(getFlag());
result.append(" ");
result.append(MORPH_FLAG);
result.append(flag);
free(flag);
}
result.append("\n");
}
he = he->next_homonym;
} while (he);
}
// prefix matched but no root word was found
// if aeXPRODUCT is allowed, try again but now
// ross checked combined with a suffix
if ((opts & aeXPRODUCT) && (in_compound != IN_CPD_BEGIN)) {
std::string st = pmyMgr->suffix_check_morph(tmpword.c_str(), tmpl, aeXPRODUCT, this,
FLAG_NULL, needflag);
if (!st.empty()) {
result.append(st);
}
}
}
}
return result;
}
SfxEntry::SfxEntry(AffixMgr* pmgr)
: pmyMgr(pmgr) // register affix manager
,
next(NULL),
nexteq(NULL),
nextne(NULL),
flgnxt(NULL),
l_morph(NULL),
r_morph(NULL),
eq_morph(NULL) {
}
// add suffix to this word assuming conditions hold
std::string SfxEntry::add(const char* word, size_t len) {
std::string result;
/* make sure all conditions match */
if ((len > strip.size() || (len == 0 && pmyMgr->get_fullstrip())) &&
(len >= numconds) && test_condition(word + len, word) &&
(!strip.size() ||
(strcmp(word + len - strip.size(), strip.c_str()) == 0))) {
result.assign(word);
/* we have a match so add suffix */
result.replace(len - strip.size(), std::string::npos, appnd);
}
return result;
}
inline char* SfxEntry::nextchar(char* p) {
if (p) {
p++;
if (opts & aeLONGCOND) {
// jump to the 2nd part of the condition
if (p == c.l.conds1 + MAXCONDLEN_1)
return c.l.conds2;
// end of the MAXCONDLEN length condition
} else if (p == c.conds + MAXCONDLEN)
return NULL;
return *p ? p : NULL;
}
return NULL;
}
inline int SfxEntry::test_condition(const char* st, const char* beg) {
const char* pos = NULL; // group with pos input position
bool neg = false; // complementer
bool ingroup = false; // character in the group
if (numconds == 0)
return 1;
char* p = c.conds;
st--;
int i = 1;
while (1) {
switch (*p) {
case '\0':
return 1;
case '[':
p = nextchar(p);
pos = st;
break;
case '^':
p = nextchar(p);
neg = true;
break;
case ']':
if (!neg && !ingroup)
return 0;
i++;
// skip the next character
if (!ingroup) {
for (; (opts & aeUTF8) && (st >= beg) && (*st & 0xc0) == 0x80; st--)
;
st--;
}
pos = NULL;
neg = false;
ingroup = false;
p = nextchar(p);
if (st < beg && p)
return 0; // word <= condition
break;
case '.':
if (!pos) {
// dots are not metacharacters in groups: [.]
p = nextchar(p);
// skip the next character
for (st--; (opts & aeUTF8) && (st >= beg) && (*st & 0xc0) == 0x80;
st--)
;
if (st < beg) { // word <= condition
if (p)
return 0;
else
return 1;
}
if ((opts & aeUTF8) && (*st & 0x80)) { // head of the UTF-8 character
st--;
if (st < beg) { // word <= condition
if (p)
return 0;
else
return 1;
}
}
break;
}
/* FALLTHROUGH */
default: {
if (*st == *p) {
p = nextchar(p);
if ((opts & aeUTF8) && (*st & 0x80)) {
st--;
while (p && (st >= beg)) {
if (*p != *st) {
if (!pos)
return 0;
st = pos;
break;
}
// first byte of the UTF-8 multibyte character
if ((*p & 0xc0) != 0x80)
break;
p = nextchar(p);
st--;
}
if (pos && st != pos) {
if (neg)
return 0;
else if (i == numconds)
return 1;
ingroup = true;
while (p && *p != ']' && ((p = nextchar(p)) != NULL)) {
}
st--;
}
if (p && *p != ']')
p = nextchar(p);
} else if (pos) {
if (neg)
return 0;
else if (i == numconds)
return 1;
ingroup = true;
while (p && *p != ']' && ((p = nextchar(p)) != NULL)) {
}
// if (p && *p != ']') p = nextchar(p);
st--;
}
if (!pos) {
i++;
st--;
}
if (st < beg && p && *p != ']')
return 0; // word <= condition
} else if (pos) { // group
p = nextchar(p);
} else
return 0;
}
}
if (!p)
return 1;
}
}
// see if this suffix is present in the word
struct hentry* SfxEntry::checkword(const char* word,
int len,
int optflags,
PfxEntry* ppfx,
const FLAG cclass,
const FLAG needflag,
const FLAG badflag) {
struct hentry* he; // hash entry pointer
PfxEntry* ep = ppfx;
// if this suffix is being cross checked with a prefix
// but it does not support cross products skip it
if (((optflags & aeXPRODUCT) != 0) && ((opts & aeXPRODUCT) == 0))
return NULL;
// upon entry suffix is 0 length or already matches the end of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
int tmpl = len - appnd.size(); // length of tmpword
// the second condition is not enough for UTF-8 strings
// it checked in test_condition()
if ((tmpl > 0 || (tmpl == 0 && pmyMgr->get_fullstrip())) &&
(tmpl + strip.size() >= numconds)) {
// generate new root word by removing suffix and adding
// back any characters that would have been stripped or
// or null terminating the shorter string
std::string tmpstring(word, tmpl);
if (strip.size()) {
tmpstring.append(strip);
}
const char* tmpword = tmpstring.c_str();
const char* endword = tmpword + tmpstring.size();
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then check if resulting
// root word in the dictionary
if (test_condition(endword, tmpword)) {
#ifdef SZOSZABLYA_POSSIBLE_ROOTS
fprintf(stdout, "%s %s %c\n", word, tmpword, aflag);
#endif
if ((he = pmyMgr->lookup(tmpword)) != NULL) {
do {
// check conditional suffix (enabled by prefix)
if ((TESTAFF(he->astr, aflag, he->alen) ||
(ep && ep->getCont() &&
TESTAFF(ep->getCont(), aflag, ep->getContLen()))) &&
(((optflags & aeXPRODUCT) == 0) ||
(ep && TESTAFF(he->astr, ep->getFlag(), he->alen)) ||
// enabled by prefix
((contclass) &&
(ep && TESTAFF(contclass, ep->getFlag(), contclasslen)))) &&
// handle cont. class
((!cclass) ||
((contclass) && TESTAFF(contclass, cclass, contclasslen))) &&
// check only in compound homonyms (bad flags)
(!badflag || !TESTAFF(he->astr, badflag, he->alen)) &&
// handle required flag
((!needflag) ||
(TESTAFF(he->astr, needflag, he->alen) ||
((contclass) && TESTAFF(contclass, needflag, contclasslen)))))
return he;
he = he->next_homonym; // check homonyms
} while (he);
}
}
}
return NULL;
}
// see if two-level suffix is present in the word
struct hentry* SfxEntry::check_twosfx(const char* word,
int len,
int optflags,
PfxEntry* ppfx,
const FLAG needflag) {
PfxEntry* ep = ppfx;
// if this suffix is being cross checked with a prefix
// but it does not support cross products skip it
if ((optflags & aeXPRODUCT) != 0 && (opts & aeXPRODUCT) == 0)
return NULL;
// upon entry suffix is 0 length or already matches the end of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
int tmpl = len - appnd.size(); // length of tmpword
if ((tmpl > 0 || (tmpl == 0 && pmyMgr->get_fullstrip())) &&
(tmpl + strip.size() >= numconds)) {
// generate new root word by removing suffix and adding
// back any characters that would have been stripped or
// or null terminating the shorter string
std::string tmpword(word);
tmpword.resize(tmpl);
tmpword.append(strip);
tmpl += strip.size();
const char* beg = tmpword.c_str();
const char* end = beg + tmpl;
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then recall suffix_check
if (test_condition(end, beg)) {
struct hentry* he; // hash entry pointer
if (ppfx) {
// handle conditional suffix
if ((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen))
he = pmyMgr->suffix_check(tmpword.c_str(), tmpl, 0, NULL,
(FLAG)aflag, needflag, IN_CPD_NOT);
else
he = pmyMgr->suffix_check(tmpword.c_str(), tmpl, optflags, ppfx,
(FLAG)aflag, needflag, IN_CPD_NOT);
} else {
he = pmyMgr->suffix_check(tmpword.c_str(), tmpl, 0, NULL,
(FLAG)aflag, needflag, IN_CPD_NOT);
}
if (he)
return he;
}
}
return NULL;
}
// see if two-level suffix is present in the word
std::string SfxEntry::check_twosfx_morph(const char* word,
int len,
int optflags,
PfxEntry* ppfx,
const FLAG needflag) {
PfxEntry* ep = ppfx;
std::string result;
// if this suffix is being cross checked with a prefix
// but it does not support cross products skip it
if ((optflags & aeXPRODUCT) != 0 && (opts & aeXPRODUCT) == 0)
return result;
// upon entry suffix is 0 length or already matches the end of the word.
// So if the remaining root word has positive length
// and if there are enough chars in root word and added back strip chars
// to meet the number of characters conditions, then test it
int tmpl = len - appnd.size(); // length of tmpword
if ((tmpl > 0 || (tmpl == 0 && pmyMgr->get_fullstrip())) &&
(tmpl + strip.size() >= numconds)) {
// generate new root word by removing suffix and adding
// back any characters that would have been stripped or
// or null terminating the shorter string
std::string tmpword(word);
tmpword.resize(tmpl);
tmpword.append(strip);
tmpl += strip.size();
const char* beg = tmpword.c_str();
const char* end = beg + tmpl;
// now make sure all of the conditions on characters
// are met. Please see the appendix at the end of
// this file for more info on exactly what is being
// tested
// if all conditions are met then recall suffix_check
if (test_condition(end, beg)) {
if (ppfx) {
// handle conditional suffix
if ((contclass) && TESTAFF(contclass, ep->getFlag(), contclasslen)) {
std::string st = pmyMgr->suffix_check_morph(tmpword.c_str(), tmpl, 0, NULL, aflag,
needflag);
if (!st.empty()) {
if (ppfx->getMorph()) {
result.append(ppfx->getMorph());
result.append(" ");
}
result.append(st);
mychomp(result);
}
} else {
std::string st = pmyMgr->suffix_check_morph(tmpword.c_str(), tmpl, optflags, ppfx, aflag,
needflag);
if (!st.empty()) {
result.append(st);
mychomp(result);
}
}
} else {
std::string st = pmyMgr->suffix_check_morph(tmpword.c_str(), tmpl, 0, NULL, aflag, needflag);
if (!st.empty()) {
result.append(st);
mychomp(result);
}
}
}
}
return result;
}
// get next homonym with same affix
struct hentry* SfxEntry::get_next_homonym(struct hentry* he,
int optflags,
PfxEntry* ppfx,
const FLAG cclass,
const FLAG needflag) {
PfxEntry* ep = ppfx;
FLAG eFlag = ep ? ep->getFlag() : FLAG_NULL;
while (he->next_homonym) {
he = he->next_homonym;
if ((TESTAFF(he->astr, aflag, he->alen) ||
(ep && ep->getCont() &&
TESTAFF(ep->getCont(), aflag, ep->getContLen()))) &&
((optflags & aeXPRODUCT) == 0 || TESTAFF(he->astr, eFlag, he->alen) ||
// handle conditional suffix
((contclass) && TESTAFF(contclass, eFlag, contclasslen))) &&
// handle cont. class
((!cclass) ||
((contclass) && TESTAFF(contclass, cclass, contclasslen))) &&
// handle required flag
((!needflag) ||
(TESTAFF(he->astr, needflag, he->alen) ||
((contclass) && TESTAFF(contclass, needflag, contclasslen)))))
return he;
}
return NULL;
}
void SfxEntry::initReverseWord() {
rappnd = appnd;
reverseword(rappnd);
}
#if 0
Appendix: Understanding Affix Code
An affix is either a prefix or a suffix attached to root words to make
other words.
Basically a Prefix or a Suffix is set of AffEntry objects
which store information about the prefix or suffix along
with supporting routines to check if a word has a particular
prefix or suffix or a combination.
The structure affentry is defined as follows:
struct affentry
{
unsigned short aflag; // ID used to represent the affix
std::string strip; // string to strip before adding affix
std::string appnd; // the affix string to add
char numconds; // the number of conditions that must be met
char opts; // flag: aeXPRODUCT- combine both prefix and suffix
char conds[SETSIZE]; // array which encodes the conditions to be met
};
Here is a suffix borrowed from the en_US.aff file. This file
is whitespace delimited.
SFX D Y 4
SFX D 0 e d
SFX D y ied [^aeiou]y
SFX D 0 ed [^ey]
SFX D 0 ed [aeiou]y
This information can be interpreted as follows:
In the first line has 4 fields
Field
-----
1 SFX - indicates this is a suffix
2 D - is the name of the character flag which represents this suffix
3 Y - indicates it can be combined with prefixes (cross product)
4 4 - indicates that sequence of 4 affentry structures are needed to
properly store the affix information
The remaining lines describe the unique information for the 4 SfxEntry
objects that make up this affix. Each line can be interpreted
as follows: (note fields 1 and 2 are as a check against line 1 info)
Field
-----
1 SFX - indicates this is a suffix
2 D - is the name of the character flag for this affix
3 y - the string of chars to strip off before adding affix
(a 0 here indicates the NULL string)
4 ied - the string of affix characters to add
5 [^aeiou]y - the conditions which must be met before the affix
can be applied
Field 5 is interesting. Since this is a suffix, field 5 tells us that
there are 2 conditions that must be met. The first condition is that
the next to the last character in the word must *NOT* be any of the
following "a", "e", "i", "o" or "u". The second condition is that
the last character of the word must end in "y".
So how can we encode this information concisely and be able to
test for both conditions in a fast manner? The answer is found
but studying the wonderful ispell code of Geoff Kuenning, et.al.
(now available under a normal BSD license).
If we set up a conds array of 256 bytes indexed (0 to 255) and access it
using a character (cast to an unsigned char) of a string, we have 8 bits
of information we can store about that character. Specifically we
could use each bit to say if that character is allowed in any of the
last (or first for prefixes) 8 characters of the word.
Basically, each character at one end of the word (up to the number
of conditions) is used to index into the conds array and the resulting
value found there says whether the that character is valid for a
specific character position in the word.
For prefixes, it does this by setting bit 0 if that char is valid
in the first position, bit 1 if valid in the second position, and so on.
If a bit is not set, then that char is not valid for that postion in the
word.
If working with suffixes bit 0 is used for the character closest
to the front, bit 1 for the next character towards the end, ...,
with bit numconds-1 representing the last char at the end of the string.
Note: since entries in the conds[] are 8 bits, only 8 conditions
(read that only 8 character positions) can be examined at one
end of a word (the beginning for prefixes and the end for suffixes.
So to make this clearer, lets encode the conds array values for the
first two affentries for the suffix D described earlier.
For the first affentry:
numconds = 1 (only examine the last character)
conds['e'] = (1 << 0) (the word must end in an E)
all others are all 0
For the second affentry:
numconds = 2 (only examine the last two characters)
conds[X] = conds[X] | (1 << 0) (aeiou are not allowed)
where X is all characters *but* a, e, i, o, or u
conds['y'] = (1 << 1) (the last char must be a y)
all other bits for all other entries in the conds array are zero
#endif

View File

@ -0,0 +1,223 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
/*
* Copyright 2002 Kevin B. Hendricks, Stratford, Ontario, Canada
* And Contributors. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. All modifications to the source code must be clearly marked as
* such. Binary redistributions based on modified source code
* must be clearly marked as modified versions in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
* KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef AFFIX_HXX_
#define AFFIX_HXX_
#include "atypes.hxx"
#include "baseaffix.hxx"
#include "affixmgr.hxx"
/* A Prefix Entry */
class PfxEntry : public AffEntry {
private:
PfxEntry(const PfxEntry&);
PfxEntry& operator=(const PfxEntry&);
private:
AffixMgr* pmyMgr;
PfxEntry* next;
PfxEntry* nexteq;
PfxEntry* nextne;
PfxEntry* flgnxt;
public:
explicit PfxEntry(AffixMgr* pmgr);
bool allowCross() const { return ((opts & aeXPRODUCT) != 0); }
struct hentry* checkword(const char* word,
int len,
char in_compound,
const FLAG needflag = FLAG_NULL);
struct hentry* check_twosfx(const char* word,
int len,
char in_compound,
const FLAG needflag = FLAG_NULL);
std::string check_morph(const char* word,
int len,
char in_compound,
const FLAG needflag = FLAG_NULL);
std::string check_twosfx_morph(const char* word,
int len,
char in_compound,
const FLAG needflag = FLAG_NULL);
FLAG getFlag() { return aflag; }
const char* getKey() { return appnd.c_str(); }
std::string add(const char* word, size_t len);
inline short getKeyLen() { return appnd.size(); }
inline const char* getMorph() { return morphcode; }
inline const unsigned short* getCont() { return contclass; }
inline short getContLen() { return contclasslen; }
inline PfxEntry* getNext() { return next; }
inline PfxEntry* getNextNE() { return nextne; }
inline PfxEntry* getNextEQ() { return nexteq; }
inline PfxEntry* getFlgNxt() { return flgnxt; }
inline void setNext(PfxEntry* ptr) { next = ptr; }
inline void setNextNE(PfxEntry* ptr) { nextne = ptr; }
inline void setNextEQ(PfxEntry* ptr) { nexteq = ptr; }
inline void setFlgNxt(PfxEntry* ptr) { flgnxt = ptr; }
inline char* nextchar(char* p);
inline int test_condition(const char* st);
};
/* A Suffix Entry */
class SfxEntry : public AffEntry {
private:
SfxEntry(const SfxEntry&);
SfxEntry& operator=(const SfxEntry&);
private:
AffixMgr* pmyMgr;
std::string rappnd;
SfxEntry* next;
SfxEntry* nexteq;
SfxEntry* nextne;
SfxEntry* flgnxt;
SfxEntry* l_morph;
SfxEntry* r_morph;
SfxEntry* eq_morph;
public:
explicit SfxEntry(AffixMgr* pmgr);
bool allowCross() const { return ((opts & aeXPRODUCT) != 0); }
struct hentry* checkword(const char* word,
int len,
int optflags,
PfxEntry* ppfx,
const FLAG cclass,
const FLAG needflag,
const FLAG badflag);
struct hentry* check_twosfx(const char* word,
int len,
int optflags,
PfxEntry* ppfx,
const FLAG needflag = FLAG_NULL);
std::string check_twosfx_morph(const char* word,
int len,
int optflags,
PfxEntry* ppfx,
const FLAG needflag = FLAG_NULL);
struct hentry* get_next_homonym(struct hentry* he);
struct hentry* get_next_homonym(struct hentry* word,
int optflags,
PfxEntry* ppfx,
const FLAG cclass,
const FLAG needflag);
FLAG getFlag() { return aflag; }
const char* getKey() { return rappnd.c_str(); }
std::string add(const char* word, size_t len);
inline const char* getMorph() { return morphcode; }
inline const unsigned short* getCont() { return contclass; }
inline short getContLen() { return contclasslen; }
inline const char* getAffix() { return appnd.c_str(); }
inline short getKeyLen() { return appnd.size(); }
inline SfxEntry* getNext() { return next; }
inline SfxEntry* getNextNE() { return nextne; }
inline SfxEntry* getNextEQ() { return nexteq; }
inline SfxEntry* getLM() { return l_morph; }
inline SfxEntry* getRM() { return r_morph; }
inline SfxEntry* getEQM() { return eq_morph; }
inline SfxEntry* getFlgNxt() { return flgnxt; }
inline void setNext(SfxEntry* ptr) { next = ptr; }
inline void setNextNE(SfxEntry* ptr) { nextne = ptr; }
inline void setNextEQ(SfxEntry* ptr) { nexteq = ptr; }
inline void setFlgNxt(SfxEntry* ptr) { flgnxt = ptr; }
void initReverseWord();
inline char* nextchar(char* p);
inline int test_condition(const char* st, const char* begin);
};
#endif

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,369 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
/*
* Copyright 2002 Kevin B. Hendricks, Stratford, Ontario, Canada
* And Contributors. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. All modifications to the source code must be clearly marked as
* such. Binary redistributions based on modified source code
* must be clearly marked as modified versions in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
* KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef AFFIXMGR_HXX_
#define AFFIXMGR_HXX_
#include <stdio.h>
#include <string>
#include <vector>
#include "atypes.hxx"
#include "baseaffix.hxx"
#include "hashmgr.hxx"
#include "phonet.hxx"
#include "replist.hxx"
// check flag duplication
#define dupSFX (1 << 0)
#define dupPFX (1 << 1)
class PfxEntry;
class SfxEntry;
class AffixMgr {
PfxEntry* pStart[SETSIZE];
SfxEntry* sStart[SETSIZE];
PfxEntry* pFlag[SETSIZE];
SfxEntry* sFlag[SETSIZE];
const std::vector<HashMgr*>& alldic;
const HashMgr* pHMgr;
std::string keystring;
std::string trystring;
std::string encoding;
struct cs_info* csconv;
int utf8;
int complexprefixes;
FLAG compoundflag;
FLAG compoundbegin;
FLAG compoundmiddle;
FLAG compoundend;
FLAG compoundroot;
FLAG compoundforbidflag;
FLAG compoundpermitflag;
int compoundmoresuffixes;
int checkcompounddup;
int checkcompoundrep;
int checkcompoundcase;
int checkcompoundtriple;
int simplifiedtriple;
FLAG forbiddenword;
FLAG nosuggest;
FLAG nongramsuggest;
FLAG needaffix;
int cpdmin;
bool parsedrep;
std::vector<replentry> reptable;
RepList* iconvtable;
RepList* oconvtable;
bool parsedmaptable;
std::vector<mapentry> maptable;
bool parsedbreaktable;
std::vector<std::string> breaktable;
bool parsedcheckcpd;
std::vector<patentry> checkcpdtable;
int simplifiedcpd;
bool parseddefcpd;
std::vector<flagentry> defcpdtable;
phonetable* phone;
int maxngramsugs;
int maxcpdsugs;
int maxdiff;
int onlymaxdiff;
int nosplitsugs;
int sugswithdots;
int cpdwordmax;
int cpdmaxsyllable;
std::string cpdvowels; // vowels (for calculating of Hungarian compounding limit,
std::vector<w_char> cpdvowels_utf16; //vowels for UTF-8 encoding
std::string cpdsyllablenum; // syllable count incrementing flag
const char* pfxappnd; // BUG: not stateless
const char* sfxappnd; // BUG: not stateless
int sfxextra; // BUG: not stateless
FLAG sfxflag; // BUG: not stateless
char* derived; // BUG: not stateless
SfxEntry* sfx; // BUG: not stateless
PfxEntry* pfx; // BUG: not stateless
int checknum;
std::string wordchars; // letters + spec. word characters
std::vector<w_char> wordchars_utf16;
std::string ignorechars; // letters + spec. word characters
std::vector<w_char> ignorechars_utf16;
std::string version; // affix and dictionary file version string
std::string lang; // language
int langnum;
FLAG lemma_present;
FLAG circumfix;
FLAG onlyincompound;
FLAG keepcase;
FLAG forceucase;
FLAG warn;
int forbidwarn;
FLAG substandard;
int checksharps;
int fullstrip;
int havecontclass; // boolean variable
char contclasses[CONTSIZE]; // flags of possible continuing classes (twofold
// affix)
public:
AffixMgr(const char* affpath, const std::vector<HashMgr*>& ptr, const char* key = NULL);
~AffixMgr();
struct hentry* affix_check(const char* word,
int len,
const unsigned short needflag = (unsigned short)0,
char in_compound = IN_CPD_NOT);
struct hentry* prefix_check(const char* word,
int len,
char in_compound,
const FLAG needflag = FLAG_NULL);
inline int isSubset(const char* s1, const char* s2);
struct hentry* prefix_check_twosfx(const char* word,
int len,
char in_compound,
const FLAG needflag = FLAG_NULL);
inline int isRevSubset(const char* s1, const char* end_of_s2, int len);
struct hentry* suffix_check(const char* word,
int len,
int sfxopts,
PfxEntry* ppfx,
const FLAG cclass = FLAG_NULL,
const FLAG needflag = FLAG_NULL,
char in_compound = IN_CPD_NOT);
struct hentry* suffix_check_twosfx(const char* word,
int len,
int sfxopts,
PfxEntry* ppfx,
const FLAG needflag = FLAG_NULL);
std::string affix_check_morph(const char* word,
int len,
const FLAG needflag = FLAG_NULL,
char in_compound = IN_CPD_NOT);
std::string prefix_check_morph(const char* word,
int len,
char in_compound,
const FLAG needflag = FLAG_NULL);
std::string suffix_check_morph(const char* word,
int len,
int sfxopts,
PfxEntry* ppfx,
const FLAG cclass = FLAG_NULL,
const FLAG needflag = FLAG_NULL,
char in_compound = IN_CPD_NOT);
std::string prefix_check_twosfx_morph(const char* word,
int len,
char in_compound,
const FLAG needflag = FLAG_NULL);
std::string suffix_check_twosfx_morph(const char* word,
int len,
int sfxopts,
PfxEntry* ppfx,
const FLAG needflag = FLAG_NULL);
std::string morphgen(const char* ts,
int wl,
const unsigned short* ap,
unsigned short al,
const char* morph,
const char* targetmorph,
int level);
int expand_rootword(struct guessword* wlst,
int maxn,
const char* ts,
int wl,
const unsigned short* ap,
unsigned short al,
const char* bad,
int,
const char*);
short get_syllable(const std::string& word);
int cpdrep_check(const char* word, int len);
int cpdpat_check(const char* word,
int len,
hentry* r1,
hentry* r2,
const char affixed);
int defcpd_check(hentry*** words,
short wnum,
hentry* rv,
hentry** rwords,
char all);
int cpdcase_check(const char* word, int len);
inline int candidate_check(const char* word, int len);
void setcminmax(int* cmin, int* cmax, const char* word, int len);
struct hentry* compound_check(const std::string& word,
short wordnum,
short numsyllable,
short maxwordnum,
short wnum,
hentry** words,
hentry** rwords,
char hu_mov_rule,
char is_sug,
int* info);
int compound_check_morph(const char* word,
int len,
short wordnum,
short numsyllable,
short maxwordnum,
short wnum,
hentry** words,
hentry** rwords,
char hu_mov_rule,
std::string& result,
const std::string* partresult);
std::vector<std::string> get_suffix_words(short unsigned* suff,
int len,
const char* root_word);
struct hentry* lookup(const char* word);
const std::vector<replentry>& get_reptable() const;
RepList* get_iconvtable() const;
RepList* get_oconvtable() const;
struct phonetable* get_phonetable() const;
const std::vector<mapentry>& get_maptable() const;
const std::vector<std::string>& get_breaktable() const;
const std::string& get_encoding();
int get_langnum() const;
char* get_key_string();
char* get_try_string() const;
const std::string& get_wordchars() const;
const std::vector<w_char>& get_wordchars_utf16() const;
const char* get_ignore() const;
const std::vector<w_char>& get_ignore_utf16() const;
int get_compound() const;
FLAG get_compoundflag() const;
FLAG get_forbiddenword() const;
FLAG get_nosuggest() const;
FLAG get_nongramsuggest() const;
FLAG get_needaffix() const;
FLAG get_onlyincompound() const;
const char* get_derived() const;
const std::string& get_version() const;
int have_contclass() const;
int get_utf8() const;
int get_complexprefixes() const;
char* get_suffixed(char) const;
int get_maxngramsugs() const;
int get_maxcpdsugs() const;
int get_maxdiff() const;
int get_onlymaxdiff() const;
int get_nosplitsugs() const;
int get_sugswithdots(void) const;
FLAG get_keepcase(void) const;
FLAG get_forceucase(void) const;
FLAG get_warn(void) const;
int get_forbidwarn(void) const;
int get_checksharps(void) const;
char* encode_flag(unsigned short aflag) const;
int get_fullstrip() const;
private:
int parse_file(const char* affpath, const char* key);
bool parse_flag(const std::string& line, unsigned short* out, FileMgr* af);
bool parse_num(const std::string& line, int* out, FileMgr* af);
bool parse_cpdsyllable(const std::string& line, FileMgr* af);
bool parse_reptable(const std::string& line, FileMgr* af);
bool parse_convtable(const std::string& line,
FileMgr* af,
RepList** rl,
const std::string& keyword);
bool parse_phonetable(const std::string& line, FileMgr* af);
bool parse_maptable(const std::string& line, FileMgr* af);
bool parse_breaktable(const std::string& line, FileMgr* af);
bool parse_checkcpdtable(const std::string& line, FileMgr* af);
bool parse_defcpdtable(const std::string& line, FileMgr* af);
bool parse_affix(const std::string& line, const char at, FileMgr* af, char* dupflags);
void reverse_condition(std::string&);
std::string& debugflag(std::string& result, unsigned short flag);
int condlen(const char*);
int encodeit(AffEntry& entry, const char* cs);
int build_pfxtree(PfxEntry* pfxptr);
int build_sfxtree(SfxEntry* sfxptr);
int process_pfx_order();
int process_sfx_order();
PfxEntry* process_pfx_in_order(PfxEntry* ptr, PfxEntry* nptr);
SfxEntry* process_sfx_in_order(SfxEntry* ptr, SfxEntry* nptr);
int process_pfx_tree_to_list();
int process_sfx_tree_to_list();
int redundant_condition(char, const char* strip, int stripl, const char* cond, int);
void finishFileMgr(FileMgr* afflst);
};
#endif

View File

@ -0,0 +1,119 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#ifndef ATYPES_HXX_
#define ATYPES_HXX_
#ifndef HUNSPELL_WARNING
#include <stdio.h>
#ifdef HUNSPELL_WARNING_ON
#define HUNSPELL_WARNING fprintf
#else
// empty inline function to switch off warnings (instead of the C99 standard
// variadic macros)
static inline void HUNSPELL_WARNING(FILE*, const char*, ...) {}
#endif
#endif
// HUNSTEM def.
#define HUNSTEM
#include "w_char.hxx"
#include <algorithm>
#include <string>
#include <vector>
#define SETSIZE 256
#define CONTSIZE 65536
// AffEntry options
#define aeXPRODUCT (1 << 0)
#define aeUTF8 (1 << 1)
#define aeALIASF (1 << 2)
#define aeALIASM (1 << 3)
#define aeLONGCOND (1 << 4)
// compound options
#define IN_CPD_NOT 0
#define IN_CPD_BEGIN 1
#define IN_CPD_END 2
#define IN_CPD_OTHER 3
// info options
#define SPELL_COMPOUND (1 << 0)
#define SPELL_FORBIDDEN (1 << 1)
#define SPELL_ALLCAP (1 << 2)
#define SPELL_NOCAP (1 << 3)
#define SPELL_INITCAP (1 << 4)
#define SPELL_ORIGCAP (1 << 5)
#define SPELL_WARN (1 << 6)
#define MINCPDLEN 3
#define MAXCOMPOUND 10
#define MAXCONDLEN 20
#define MAXCONDLEN_1 (MAXCONDLEN - sizeof(char*))
#define MAXACC 1000
#define FLAG unsigned short
#define FLAG_NULL 0x00
#define FREE_FLAG(a) a = 0
#define TESTAFF(a, b, c) (std::binary_search(a, a + c, b))
struct guessword {
char* word;
bool allow;
char* orig;
};
typedef std::vector<std::string> mapentry;
typedef std::vector<FLAG> flagentry;
struct patentry {
std::string pattern;
std::string pattern2;
std::string pattern3;
FLAG cond;
FLAG cond2;
patentry()
: cond(FLAG_NULL)
, cond2(FLAG_NULL) {
}
};
#endif

View File

@ -0,0 +1,74 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#ifndef BASEAFF_HXX_
#define BASEAFF_HXX_
#include <string>
class AffEntry {
private:
AffEntry(const AffEntry&);
AffEntry& operator=(const AffEntry&);
public:
AffEntry()
: numconds(0),
opts(0),
aflag(0),
morphcode(0),
contclass(NULL),
contclasslen(0) {}
virtual ~AffEntry();
std::string appnd;
std::string strip;
unsigned char numconds;
char opts;
unsigned short aflag;
union {
char conds[MAXCONDLEN];
struct {
char conds1[MAXCONDLEN_1];
char* conds2;
} l;
} c;
char* morphcode;
unsigned short* contclass;
short contclasslen;
};
#endif

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,314 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
/*
* Copyright 2002 Kevin B. Hendricks, Stratford, Ontario, Canada
* And Contributors. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. All modifications to the source code must be clearly marked as
* such. Binary redistributions based on modified source code
* must be clearly marked as modified versions in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
* KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef CSUTIL_HXX_
#define CSUTIL_HXX_
#include "hunvisapi.h"
// First some base level utility routines
#include <fstream>
#include <string>
#include <vector>
#include <string.h>
#include "w_char.hxx"
#include "htypes.hxx"
#ifdef MOZILLA_CLIENT
#include "nscore.h" // for mozalloc headers
#endif
// casing
#define NOCAP 0
#define INITCAP 1
#define ALLCAP 2
#define HUHCAP 3
#define HUHINITCAP 4
// default encoding and keystring
#define SPELL_ENCODING "ISO8859-1"
#define SPELL_KEYSTRING "qwertyuiop|asdfghjkl|zxcvbnm"
// default morphological fields
#define MORPH_STEM "st:"
#define MORPH_ALLOMORPH "al:"
#define MORPH_POS "po:"
#define MORPH_DERI_PFX "dp:"
#define MORPH_INFL_PFX "ip:"
#define MORPH_TERM_PFX "tp:"
#define MORPH_DERI_SFX "ds:"
#define MORPH_INFL_SFX "is:"
#define MORPH_TERM_SFX "ts:"
#define MORPH_SURF_PFX "sp:"
#define MORPH_FREQ "fr:"
#define MORPH_PHON "ph:"
#define MORPH_HYPH "hy:"
#define MORPH_PART "pa:"
#define MORPH_FLAG "fl:"
#define MORPH_HENTRY "_H:"
#define MORPH_TAG_LEN strlen(MORPH_STEM)
#define MSEP_FLD ' '
#define MSEP_REC '\n'
#define MSEP_ALT '\v'
// default flags
#define DEFAULTFLAGS 65510
#define FORBIDDENWORD 65510
#define ONLYUPCASEFLAG 65511
// fix long pathname problem of WIN32 by using w_char std::fstream::open override
LIBHUNSPELL_DLL_EXPORTED void myopen(std::ifstream& stream, const char* path,
std::ios_base::openmode mode);
// convert UTF-16 characters to UTF-8
LIBHUNSPELL_DLL_EXPORTED std::string& u16_u8(std::string& dest,
const std::vector<w_char>& src);
// convert UTF-8 characters to UTF-16
LIBHUNSPELL_DLL_EXPORTED int u8_u16(std::vector<w_char>& dest,
const std::string& src);
// remove end of line char(s)
LIBHUNSPELL_DLL_EXPORTED void mychomp(std::string& s);
// duplicate string
LIBHUNSPELL_DLL_EXPORTED char* mystrdup(const char* s);
// parse into tokens with char delimiter
LIBHUNSPELL_DLL_EXPORTED std::string::const_iterator mystrsep(const std::string &str,
std::string::const_iterator& start);
// replace pat by rep in word and return word
LIBHUNSPELL_DLL_EXPORTED std::string& mystrrep(std::string& str,
const std::string& search,
const std::string& replace);
// append s to ends of every lines in text
LIBHUNSPELL_DLL_EXPORTED std::string& strlinecat(std::string& str,
const std::string& apd);
// tokenize into lines with new line
LIBHUNSPELL_DLL_EXPORTED std::vector<std::string> line_tok(const std::string& text,
char breakchar);
// tokenize into lines with new line and uniq in place
LIBHUNSPELL_DLL_EXPORTED void line_uniq(std::string& text, char breakchar);
LIBHUNSPELL_DLL_EXPORTED void line_uniq_app(std::string& text, char breakchar);
// reverse word
LIBHUNSPELL_DLL_EXPORTED size_t reverseword(std::string& word);
// reverse word
LIBHUNSPELL_DLL_EXPORTED size_t reverseword_utf(std::string&);
// remove duplicates
LIBHUNSPELL_DLL_EXPORTED void uniqlist(std::vector<std::string>& list);
// character encoding information
struct cs_info {
unsigned char ccase;
unsigned char clower;
unsigned char cupper;
};
LIBHUNSPELL_DLL_EXPORTED void initialize_utf_tbl();
LIBHUNSPELL_DLL_EXPORTED void free_utf_tbl();
LIBHUNSPELL_DLL_EXPORTED unsigned short unicodetoupper(unsigned short c,
int langnum);
LIBHUNSPELL_DLL_EXPORTED w_char upper_utf(w_char u, int langnum);
LIBHUNSPELL_DLL_EXPORTED w_char lower_utf(w_char u, int langnum);
LIBHUNSPELL_DLL_EXPORTED unsigned short unicodetolower(unsigned short c,
int langnum);
LIBHUNSPELL_DLL_EXPORTED int unicodeisalpha(unsigned short c);
LIBHUNSPELL_DLL_EXPORTED struct cs_info* get_current_cs(const std::string& es);
// get language identifiers of language codes
LIBHUNSPELL_DLL_EXPORTED int get_lang_num(const std::string& lang);
// get characters of the given 8bit encoding with lower- and uppercase forms
LIBHUNSPELL_DLL_EXPORTED std::string get_casechars(const char* enc);
// convert std::string to all caps
LIBHUNSPELL_DLL_EXPORTED std::string& mkallcap(std::string& s,
const struct cs_info* csconv);
// convert null terminated string to all little
LIBHUNSPELL_DLL_EXPORTED std::string& mkallsmall(std::string& s,
const struct cs_info* csconv);
// convert first letter of string to little
LIBHUNSPELL_DLL_EXPORTED std::string& mkinitsmall(std::string& s,
const struct cs_info* csconv);
// convert first letter of string to capital
LIBHUNSPELL_DLL_EXPORTED std::string& mkinitcap(std::string& s,
const struct cs_info* csconv);
// convert first letter of UTF-8 string to capital
LIBHUNSPELL_DLL_EXPORTED std::vector<w_char>&
mkinitcap_utf(std::vector<w_char>& u, int langnum);
// convert UTF-8 string to little
LIBHUNSPELL_DLL_EXPORTED std::vector<w_char>&
mkallsmall_utf(std::vector<w_char>& u, int langnum);
// convert first letter of UTF-8 string to little
LIBHUNSPELL_DLL_EXPORTED std::vector<w_char>&
mkinitsmall_utf(std::vector<w_char>& u, int langnum);
// convert UTF-8 string to capital
LIBHUNSPELL_DLL_EXPORTED std::vector<w_char>&
mkallcap_utf(std::vector<w_char>& u, int langnum);
// get type of capitalization
LIBHUNSPELL_DLL_EXPORTED int get_captype(const std::string& q, cs_info*);
// get type of capitalization (UTF-8)
LIBHUNSPELL_DLL_EXPORTED int get_captype_utf8(const std::vector<w_char>& q, int langnum);
// strip all ignored characters in the string
LIBHUNSPELL_DLL_EXPORTED size_t remove_ignored_chars_utf(
std::string& word,
const std::vector<w_char>& ignored_chars);
// strip all ignored characters in the string
LIBHUNSPELL_DLL_EXPORTED size_t remove_ignored_chars(
std::string& word,
const std::string& ignored_chars);
LIBHUNSPELL_DLL_EXPORTED bool parse_string(const std::string& line,
std::string& out,
int ln);
LIBHUNSPELL_DLL_EXPORTED bool parse_array(const std::string& line,
std::string& out,
std::vector<w_char>& out_utf16,
int utf8,
int ln);
LIBHUNSPELL_DLL_EXPORTED int fieldlen(const char* r);
LIBHUNSPELL_DLL_EXPORTED bool copy_field(std::string& dest,
const std::string& morph,
const std::string& var);
// conversion function for protected memory
LIBHUNSPELL_DLL_EXPORTED void store_pointer(char* dest, char* source);
// conversion function for protected memory
LIBHUNSPELL_DLL_EXPORTED char* get_stored_pointer(const char* s);
// hash entry macros
LIBHUNSPELL_DLL_EXPORTED inline char* HENTRY_DATA(struct hentry* h) {
char* ret;
if (!h->var)
ret = NULL;
else if (h->var & H_OPT_ALIASM)
ret = get_stored_pointer(HENTRY_WORD(h) + h->blen + 1);
else
ret = HENTRY_WORD(h) + h->blen + 1;
return ret;
}
LIBHUNSPELL_DLL_EXPORTED inline const char* HENTRY_DATA(
const struct hentry* h) {
const char* ret;
if (!h->var)
ret = NULL;
else if (h->var & H_OPT_ALIASM)
ret = get_stored_pointer(HENTRY_WORD(h) + h->blen + 1);
else
ret = HENTRY_WORD(h) + h->blen + 1;
return ret;
}
// NULL-free version for warning-free OOo build
LIBHUNSPELL_DLL_EXPORTED inline const char* HENTRY_DATA2(
const struct hentry* h) {
const char* ret;
if (!h->var)
ret = "";
else if (h->var & H_OPT_ALIASM)
ret = get_stored_pointer(HENTRY_WORD(h) + h->blen + 1);
else
ret = HENTRY_WORD(h) + h->blen + 1;
return ret;
}
LIBHUNSPELL_DLL_EXPORTED inline char* HENTRY_FIND(struct hentry* h,
const char* p) {
return (HENTRY_DATA(h) ? strstr(HENTRY_DATA(h), p) : NULL);
}
#endif

View File

@ -0,0 +1,117 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
/*
* Copyright 2002 Kevin B. Hendricks, Stratford, Ontario, Canada
* And Contributors. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. All modifications to the source code must be clearly marked as
* such. Binary redistributions based on modified source code
* must be clearly marked as modified versions in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
* KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "filemgr.hxx"
#include "csutil.hxx"
int FileMgr::fail(const char* err, const char* par) {
fprintf(stderr, err, par);
return -1;
}
FileMgr::FileMgr(const char* file, const char* key) : hin(NULL), linenum(0) {
in[0] = '\0';
myopen(fin, file, std::ios_base::in);
if (!fin.is_open()) {
// check hzipped file
std::string st(file);
st.append(HZIP_EXTENSION);
hin = new Hunzip(st.c_str(), key);
}
if (!fin.is_open() && !hin->is_open())
fail(MSG_OPEN, file);
}
FileMgr::~FileMgr() {
delete hin;
}
bool FileMgr::getline(std::string& dest) {
bool ret = false;
++linenum;
if (fin.is_open()) {
ret = static_cast<bool>(std::getline(fin, dest));
} else if (hin->is_open()) {
ret = hin->getline(dest);
}
if (!ret) {
--linenum;
}
return ret;
}
int FileMgr::getlinenum() {
return linenum;
}

View File

@ -0,0 +1,98 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
/*
* Copyright 2002 Kevin B. Hendricks, Stratford, Ontario, Canada
* And Contributors. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. All modifications to the source code must be clearly marked as
* such. Binary redistributions based on modified source code
* must be clearly marked as modified versions in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
* KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
/* file manager class - read lines of files [filename] OR [filename.hz] */
#ifndef FILEMGR_HXX_
#define FILEMGR_HXX_
#include "hunzip.hxx"
#include <stdio.h>
#include <string>
#include <fstream>
class FileMgr {
private:
FileMgr(const FileMgr&);
FileMgr& operator=(const FileMgr&);
protected:
std::ifstream fin;
Hunzip* hin;
char in[BUFSIZE + 50]; // input buffer
int fail(const char* err, const char* par);
int linenum;
public:
FileMgr(const char* filename, const char* key = NULL);
~FileMgr();
bool getline(std::string&);
int getlinenum();
};
#endif

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,145 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
/*
* Copyright 2002 Kevin B. Hendricks, Stratford, Ontario, Canada
* And Contributors. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. All modifications to the source code must be clearly marked as
* such. Binary redistributions based on modified source code
* must be clearly marked as modified versions in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
* KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef HASHMGR_HXX_
#define HASHMGR_HXX_
#include <stdio.h>
#include <string>
#include <vector>
#include "htypes.hxx"
#include "filemgr.hxx"
#include "w_char.hxx"
enum flag { FLAG_CHAR, FLAG_LONG, FLAG_NUM, FLAG_UNI };
class HashMgr {
int tablesize;
struct hentry** tableptr;
flag flag_mode;
int complexprefixes;
int utf8;
unsigned short forbiddenword;
int langnum;
std::string enc;
std::string lang;
struct cs_info* csconv;
std::string ignorechars;
std::vector<w_char> ignorechars_utf16;
int numaliasf; // flag vector `compression' with aliases
unsigned short** aliasf;
unsigned short* aliasflen;
int numaliasm; // morphological desciption `compression' with aliases
char** aliasm;
public:
HashMgr(const char* tpath, const char* apath, const char* key = NULL);
~HashMgr();
struct hentry* lookup(const char*) const;
int hash(const char*) const;
struct hentry* walk_hashtable(int& col, struct hentry* hp) const;
int add(const std::string& word);
int add_with_affix(const std::string& word, const std::string& pattern);
int remove(const std::string& word);
int decode_flags(unsigned short** result, const std::string& flags, FileMgr* af) const;
bool decode_flags(std::vector<unsigned short>& result, const std::string& flags, FileMgr* af) const;
unsigned short decode_flag(const char* flag) const;
char* encode_flag(unsigned short flag) const;
int is_aliasf() const;
int get_aliasf(int index, unsigned short** fvec, FileMgr* af) const;
int is_aliasm() const;
char* get_aliasm(int index) const;
private:
int get_clen_and_captype(const std::string& word, int* captype);
int get_clen_and_captype(const std::string& word, int* captype, std::vector<w_char> &workbuf);
int load_tables(const char* tpath, const char* key);
int add_word(const std::string& word,
int wcl,
unsigned short* ap,
int al,
const std::string* desc,
bool onlyupcase);
int load_config(const char* affpath, const char* key);
bool parse_aliasf(const std::string& line, FileMgr* af);
int add_hidden_capitalized_word(const std::string& word,
int wcl,
unsigned short* flags,
int al,
const std::string* dp,
int captype);
bool parse_aliasm(const std::string& line, FileMgr* af);
int remove_forbidden_flag(const std::string& word);
};
#endif

View File

@ -0,0 +1,68 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#ifndef HTYPES_HXX_
#define HTYPES_HXX_
#define ROTATE_LEN 5
#define ROTATE(v, q) \
(v) = ((v) << (q)) | (((v) >> (32 - q)) & ((1 << (q)) - 1));
// hentry options
#define H_OPT (1 << 0)
#define H_OPT_ALIASM (1 << 1)
#define H_OPT_PHON (1 << 2)
// see also csutil.hxx
#define HENTRY_WORD(h) &(h->word[0])
// approx. number of user defined words
#define USERWORD 1000
struct hentry {
unsigned char blen; // word length in bytes
unsigned char clen; // word length in characters (different for UTF-8 enc.)
short alen; // length of affix flag vector
unsigned short* astr; // affix flag vector
struct hentry* next; // next word with same hash code
struct hentry* next_homonym; // next homonym word (with same hash code)
char var; // variable fields (only for special pronounciation yet)
char word[1]; // variable-length word (8-bit or UTF-8 encoding)
};
#endif

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,162 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* The Original Code is Hunspell, based on MySpell.
*
* The Initial Developers of the Original Code are
* Kevin Hendricks (MySpell) and Németh László (Hunspell).
* Portions created by the Initial Developers are Copyright (C) 2002-2005
* the Initial Developers. All Rights Reserved.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#ifndef MYSPELLMGR_H_
#define MYSPELLMGR_H_
#include "hunvisapi.h"
#ifdef __cplusplus
extern "C" {
#endif
typedef struct Hunhandle Hunhandle;
LIBHUNSPELL_DLL_EXPORTED Hunhandle* Hunspell_create(const char* affpath,
const char* dpath);
LIBHUNSPELL_DLL_EXPORTED Hunhandle* Hunspell_create_key(const char* affpath,
const char* dpath,
const char* key);
LIBHUNSPELL_DLL_EXPORTED void Hunspell_destroy(Hunhandle* pHunspell);
/* load extra dictionaries (only dic files)
* output: 0 = additional dictionary slots available, 1 = slots are now full*/
LIBHUNSPELL_DLL_EXPORTED int Hunspell_add_dic(Hunhandle* pHunspell,
const char* dpath);
/* spell(word) - spellcheck word
* output: 0 = bad word, not 0 = good word
*/
LIBHUNSPELL_DLL_EXPORTED int Hunspell_spell(Hunhandle* pHunspell, const char*);
LIBHUNSPELL_DLL_EXPORTED char* Hunspell_get_dic_encoding(Hunhandle* pHunspell);
/* suggest(suggestions, word) - search suggestions
* input: pointer to an array of strings pointer and the (bad) word
* array of strings pointer (here *slst) may not be initialized
* output: number of suggestions in string array, and suggestions in
* a newly allocated array of strings (*slts will be NULL when number
* of suggestion equals 0.)
*/
LIBHUNSPELL_DLL_EXPORTED int Hunspell_suggest(Hunhandle* pHunspell,
char*** slst,
const char* word);
/* morphological functions */
/* analyze(result, word) - morphological analysis of the word */
LIBHUNSPELL_DLL_EXPORTED int Hunspell_analyze(Hunhandle* pHunspell,
char*** slst,
const char* word);
/* stem(result, word) - stemmer function */
LIBHUNSPELL_DLL_EXPORTED int Hunspell_stem(Hunhandle* pHunspell,
char*** slst,
const char* word);
/* stem(result, analysis, n) - get stems from a morph. analysis
* example:
* char ** result, result2;
* int n1 = Hunspell_analyze(result, "words");
* int n2 = Hunspell_stem2(result2, result, n1);
*/
LIBHUNSPELL_DLL_EXPORTED int Hunspell_stem2(Hunhandle* pHunspell,
char*** slst,
char** desc,
int n);
/* generate(result, word, word2) - morphological generation by example(s) */
LIBHUNSPELL_DLL_EXPORTED int Hunspell_generate(Hunhandle* pHunspell,
char*** slst,
const char* word,
const char* word2);
/* generate(result, word, desc, n) - generation by morph. description(s)
* example:
* char ** result;
* char * affix = "is:plural"; // description depends from dictionaries, too
* int n = Hunspell_generate2(result, "word", &affix, 1);
* for (int i = 0; i < n; i++) printf("%s\n", result[i]);
*/
LIBHUNSPELL_DLL_EXPORTED int Hunspell_generate2(Hunhandle* pHunspell,
char*** slst,
const char* word,
char** desc,
int n);
/* functions for run-time modification of the dictionary */
/* add word to the run-time dictionary */
LIBHUNSPELL_DLL_EXPORTED int Hunspell_add(Hunhandle* pHunspell,
const char* word);
/* add word to the run-time dictionary with affix flags of
* the example (a dictionary word): Hunspell will recognize
* affixed forms of the new word, too.
*/
LIBHUNSPELL_DLL_EXPORTED int Hunspell_add_with_affix(Hunhandle* pHunspell,
const char* word,
const char* example);
/* remove word from the run-time dictionary */
LIBHUNSPELL_DLL_EXPORTED int Hunspell_remove(Hunhandle* pHunspell,
const char* word);
/* free suggestion lists */
LIBHUNSPELL_DLL_EXPORTED void Hunspell_free_list(Hunhandle* pHunspell,
char*** slst,
int n);
#ifdef __cplusplus
}
#endif
#endif

View File

@ -0,0 +1,229 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
/*
* Copyright 2002 Kevin B. Hendricks, Stratford, Ontario, Canada
* And Contributors. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. All modifications to the source code must be clearly marked as
* such. Binary redistributions based on modified source code
* must be clearly marked as modified versions in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
* KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef MYSPELLMGR_HXX_
#define MYSPELLMGR_HXX_
#include "hunvisapi.h"
#include "w_char.hxx"
#include "atypes.hxx"
#include <string>
#include <vector>
#define SPELL_XML "<?xml?>"
#define MAXSUGGESTION 15
#define MAXSHARPS 5
#ifndef MAXWORDLEN
#define MAXWORDLEN 100
#endif
#if defined __GNUC__ && (__GNUC__ > 3 || (__GNUC__ == 3 && __GNUC_MINOR__ >= 1))
# define H_DEPRECATED __attribute__((__deprecated__))
#elif defined(_MSC_VER) && (_MSC_VER >= 1300)
# define H_DEPRECATED __declspec(deprecated)
#else
# define H_DEPRECATED
#endif
class HunspellImpl;
class LIBHUNSPELL_DLL_EXPORTED Hunspell {
private:
Hunspell(const Hunspell&);
Hunspell& operator=(const Hunspell&);
private:
HunspellImpl* m_Impl;
public:
/* Hunspell(aff, dic) - constructor of Hunspell class
* input: path of affix file and dictionary file
*
* In WIN32 environment, use UTF-8 encoded paths started with the long path
* prefix \\\\?\\ to handle system-independent character encoding and very
* long path names (without the long path prefix Hunspell will use fopen()
* with system-dependent character encoding instead of _wfopen()).
*/
Hunspell(const char* affpath, const char* dpath, const char* key = NULL);
~Hunspell();
/* load extra dictionaries (only dic files) */
int add_dic(const char* dpath, const char* key = NULL);
/* spell(word) - spellcheck word
* output: false = bad word, true = good word
*
* plus output:
* info: information bit array, fields:
* SPELL_COMPOUND = a compound word
* SPELL_FORBIDDEN = an explicit forbidden word
* root: root (stem), when input is a word with affix(es)
*/
bool spell(const std::string& word, int* info = NULL, std::string* root = NULL);
H_DEPRECATED int spell(const char* word, int* info = NULL, char** root = NULL);
/* suggest(suggestions, word) - search suggestions
* input: pointer to an array of strings pointer and the (bad) word
* array of strings pointer (here *slst) may not be initialized
* output: number of suggestions in string array, and suggestions in
* a newly allocated array of strings (*slts will be NULL when number
* of suggestion equals 0.)
*/
std::vector<std::string> suggest(const std::string& word);
H_DEPRECATED int suggest(char*** slst, const char* word);
/* Suggest words from suffix rules
* suffix_suggest(suggestions, root_word)
* input: pointer to an array of strings pointer and the word
* array of strings pointer (here *slst) may not be initialized
* output: number of suggestions in string array, and suggestions in
* a newly allocated array of strings (*slts will be NULL when number
* of suggestion equals 0.)
*/
std::vector<std::string> suffix_suggest(const std::string& root_word);
H_DEPRECATED int suffix_suggest(char*** slst, const char* root_word);
/* deallocate suggestion lists */
H_DEPRECATED void free_list(char*** slst, int n);
const std::string& get_dict_encoding() const;
char* get_dic_encoding();
/* morphological functions */
/* analyze(result, word) - morphological analysis of the word */
std::vector<std::string> analyze(const std::string& word);
H_DEPRECATED int analyze(char*** slst, const char* word);
/* stem(word) - stemmer function */
std::vector<std::string> stem(const std::string& word);
H_DEPRECATED int stem(char*** slst, const char* word);
/* stem(analysis, n) - get stems from a morph. analysis
* example:
* char ** result, result2;
* int n1 = analyze(&result, "words");
* int n2 = stem(&result2, result, n1);
*/
std::vector<std::string> stem(const std::vector<std::string>& morph);
H_DEPRECATED int stem(char*** slst, char** morph, int n);
/* generate(result, word, word2) - morphological generation by example(s) */
std::vector<std::string> generate(const std::string& word, const std::string& word2);
H_DEPRECATED int generate(char*** slst, const char* word, const char* word2);
/* generate(result, word, desc, n) - generation by morph. description(s)
* example:
* char ** result;
* char * affix = "is:plural"; // description depends from dictionaries, too
* int n = generate(&result, "word", &affix, 1);
* for (int i = 0; i < n; i++) printf("%s\n", result[i]);
*/
std::vector<std::string> generate(const std::string& word, const std::vector<std::string>& pl);
H_DEPRECATED int generate(char*** slst, const char* word, char** desc, int n);
/* functions for run-time modification of the dictionary */
/* add word to the run-time dictionary */
int add(const std::string& word);
/* add word to the run-time dictionary with affix flags of
* the example (a dictionary word): Hunspell will recognize
* affixed forms of the new word, too.
*/
int add_with_affix(const std::string& word, const std::string& example);
/* remove word from the run-time dictionary */
int remove(const std::string& word);
/* other */
/* get extra word characters definied in affix file for tokenization */
const char* get_wordchars() const;
const std::string& get_wordchars_cpp() const;
const std::vector<w_char>& get_wordchars_utf16() const;
struct cs_info* get_csconv();
const char* get_version() const;
const std::string& get_version_cpp() const;
int get_langnum() const;
/* need for putdic */
bool input_conv(const std::string& word, std::string& dest);
H_DEPRECATED int input_conv(const char* word, char* dest, size_t destsize);
};
#endif

View File

@ -1,5 +1,5 @@
#ifndef _HUNSPELL_VISIBILITY_H_ #ifndef HUNSPELL_VISIBILITY_H_
#define _HUNSPELL_VISIBILITY_H_ #define HUNSPELL_VISIBILITY_H_
#if defined(HUNSPELL_STATIC) #if defined(HUNSPELL_STATIC)
# define LIBHUNSPELL_DLL_EXPORTED # define LIBHUNSPELL_DLL_EXPORTED

View File

@ -1,5 +1,5 @@
#ifndef _HUNSPELL_VISIBILITY_H_ #ifndef HUNSPELL_VISIBILITY_H_
#define _HUNSPELL_VISIBILITY_H_ #define HUNSPELL_VISIBILITY_H_
#if defined(HUNSPELL_STATIC) #if defined(HUNSPELL_STATIC)
# define LIBHUNSPELL_DLL_EXPORTED # define LIBHUNSPELL_DLL_EXPORTED

View File

@ -0,0 +1,256 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include "hunzip.hxx"
#include "csutil.hxx"
#define CODELEN 65536
#define BASEBITREC 5000
#define UNCOMPRESSED '\002'
#define MAGIC "hz0"
#define MAGIC_ENCRYPT "hz1"
#define MAGICLEN (sizeof(MAGIC) - 1)
int Hunzip::fail(const char* err, const char* par) {
fprintf(stderr, err, par);
return -1;
}
Hunzip::Hunzip(const char* file, const char* key)
: bufsiz(0), lastbit(0), inc(0), inbits(0), outc(0) {
in[0] = out[0] = line[0] = '\0';
filename = mystrdup(file);
if (getcode(key) == -1)
bufsiz = -1;
else
bufsiz = getbuf();
}
int Hunzip::getcode(const char* key) {
unsigned char c[2];
int i, j, n;
int allocatedbit = BASEBITREC;
const char* enc = key;
if (!filename)
return -1;
myopen(fin, filename, std::ios_base::in | std::ios_base::binary);
if (!fin.is_open())
return -1;
// read magic number
if (!fin.read(in, 3) ||
!(strncmp(MAGIC, in, MAGICLEN) == 0 ||
strncmp(MAGIC_ENCRYPT, in, MAGICLEN) == 0)) {
return fail(MSG_FORMAT, filename);
}
// check encryption
if (strncmp(MAGIC_ENCRYPT, in, MAGICLEN) == 0) {
unsigned char cs;
if (!key)
return fail(MSG_KEY, filename);
if (!fin.read(reinterpret_cast<char*>(c), 1))
return fail(MSG_FORMAT, filename);
for (cs = 0; *enc; enc++)
cs ^= *enc;
if (cs != c[0])
return fail(MSG_KEY, filename);
enc = key;
} else
key = NULL;
// read record count
if (!fin.read(reinterpret_cast<char*>(c), 2))
return fail(MSG_FORMAT, filename);
if (key) {
c[0] ^= *enc;
if (*(++enc) == '\0')
enc = key;
c[1] ^= *enc;
}
n = ((int)c[0] << 8) + c[1];
dec.resize(BASEBITREC);
dec[0].v[0] = 0;
dec[0].v[1] = 0;
// read codes
for (i = 0; i < n; i++) {
unsigned char l;
if (!fin.read(reinterpret_cast<char*>(c), 2))
return fail(MSG_FORMAT, filename);
if (key) {
if (*(++enc) == '\0')
enc = key;
c[0] ^= *enc;
if (*(++enc) == '\0')
enc = key;
c[1] ^= *enc;
}
if (!fin.read(reinterpret_cast<char*>(&l), 1))
return fail(MSG_FORMAT, filename);
if (key) {
if (*(++enc) == '\0')
enc = key;
l ^= *enc;
}
if (!fin.read(in, l / 8 + 1))
return fail(MSG_FORMAT, filename);
if (key)
for (j = 0; j <= l / 8; j++) {
if (*(++enc) == '\0')
enc = key;
in[j] ^= *enc;
}
int p = 0;
for (j = 0; j < l; j++) {
int b = (in[j / 8] & (1 << (7 - (j % 8)))) ? 1 : 0;
int oldp = p;
p = dec[p].v[b];
if (p == 0) {
lastbit++;
if (lastbit == allocatedbit) {
allocatedbit += BASEBITREC;
dec.resize(allocatedbit);
}
dec[lastbit].v[0] = 0;
dec[lastbit].v[1] = 0;
dec[oldp].v[b] = lastbit;
p = lastbit;
}
}
dec[p].c[0] = c[0];
dec[p].c[1] = c[1];
}
return 0;
}
Hunzip::~Hunzip() {
if (filename)
free(filename);
}
int Hunzip::getbuf() {
int p = 0;
int o = 0;
do {
if (inc == 0) {
fin.read(in, BUFSIZE);
inbits = fin.gcount() * 8;
}
for (; inc < inbits; inc++) {
int b = (in[inc / 8] & (1 << (7 - (inc % 8)))) ? 1 : 0;
int oldp = p;
p = dec[p].v[b];
if (p == 0) {
if (oldp == lastbit) {
fin.close();
// add last odd byte
if (dec[lastbit].c[0])
out[o++] = dec[lastbit].c[1];
return o;
}
out[o++] = dec[oldp].c[0];
out[o++] = dec[oldp].c[1];
if (o == BUFSIZE)
return o;
p = dec[p].v[b];
}
}
inc = 0;
} while (inbits == BUFSIZE * 8);
return fail(MSG_FORMAT, filename);
}
bool Hunzip::getline(std::string& dest) {
char linebuf[BUFSIZE];
int l = 0, eol = 0, left = 0, right = 0;
if (bufsiz == -1)
return false;
while (l < bufsiz && !eol) {
linebuf[l++] = out[outc];
switch (out[outc]) {
case '\t':
break;
case 31: { // escape
if (++outc == bufsiz) {
bufsiz = getbuf();
outc = 0;
}
linebuf[l - 1] = out[outc];
break;
}
case ' ':
break;
default:
if (((unsigned char)out[outc]) < 47) {
if (out[outc] > 32) {
right = out[outc] - 31;
if (++outc == bufsiz) {
bufsiz = getbuf();
outc = 0;
}
}
if (out[outc] == 30)
left = 9;
else
left = out[outc];
linebuf[l - 1] = '\n';
eol = 1;
}
}
if (++outc == bufsiz) {
outc = 0;
bufsiz = fin.is_open() ? getbuf() : -1;
}
}
if (right)
strcpy(linebuf + l - 1, line + strlen(line) - right - 1);
else
linebuf[l] = '\0';
strcpy(line + left, linebuf);
dest.assign(line);
return true;
}

View File

@ -0,0 +1,87 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
/* hunzip: file decompression for sorted dictionaries with optional encryption,
* algorithm: prefix-suffix encoding and 16-bit Huffman encoding */
#ifndef HUNZIP_HXX_
#define HUNZIP_HXX_
#include "hunvisapi.h"
#include <stdio.h>
#include <fstream>
#include <vector>
#define BUFSIZE 65536
#define HZIP_EXTENSION ".hz"
#define MSG_OPEN "error: %s: cannot open\n"
#define MSG_FORMAT "error: %s: not in hzip format\n"
#define MSG_MEMORY "error: %s: missing memory\n"
#define MSG_KEY "error: %s: missing or bad password\n"
struct bit {
unsigned char c[2];
int v[2];
};
class LIBHUNSPELL_DLL_EXPORTED Hunzip {
private:
Hunzip(const Hunzip&);
Hunzip& operator=(const Hunzip&);
protected:
char* filename;
std::ifstream fin;
int bufsiz, lastbit, inc, inbits, outc;
std::vector<bit> dec; // code table
char in[BUFSIZE]; // input buffer
char out[BUFSIZE + 1]; // Huffman-decoded buffer
char line[BUFSIZE + 50]; // decoded line
int getcode(const char* key);
int getbuf();
int fail(const char* err, const char* par);
public:
Hunzip(const char* filename, const char* key = NULL);
~Hunzip();
bool is_open() { return fin.is_open(); }
bool getline(std::string& dest);
};
#endif

View File

@ -0,0 +1,75 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#ifndef LANGNUM_HXX_
#define LANGNUM_HXX_
/*
language numbers for language specific codes
see https://wiki.openoffice.org/w/index.php?title=Languages&oldid=230199
*/
enum {
LANG_ar = 96,
LANG_az = 100, // custom number
LANG_bg = 41,
LANG_ca = 37,
LANG_cs = 42,
LANG_da = 45,
LANG_de = 49,
LANG_el = 30,
LANG_en = 01,
LANG_es = 34,
LANG_eu = 10,
LANG_fr = 02,
LANG_gl = 38,
LANG_hr = 78,
LANG_hu = 36,
LANG_it = 39,
LANG_la = 99, // custom number
LANG_lv = 101, // custom number
LANG_nl = 31,
LANG_pl = 48,
LANG_pt = 03,
LANG_ru = 07,
LANG_sv = 50,
LANG_tr = 90,
LANG_uk = 80,
LANG_xx = 999
};
#endif

View File

@ -0,0 +1,270 @@
/* phonetic.c - generic replacement aglogithms for phonetic transformation
Copyright (C) 2000 Bjoern Jacke
This library is free software; you can redistribute it and/or
modify it under the terms of the GNU Lesser General Public
License version 2.1 as published by the Free Software Foundation;
This library is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
Lesser General Public License for more details.
You should have received a copy of the GNU Lesser General Public
License along with this library; If not, see
<http://www.gnu.org/licenses/>.
Changelog:
2000-01-05 Bjoern Jacke <bjoern at j3e.de>
Initial Release insprired by the article about phonetic
transformations out of c't 25/1999
2007-07-26 Bjoern Jacke <bjoern at j3e.de>
Released under MPL/GPL/LGPL tri-license for Hunspell
2007-08-23 Laszlo Nemeth <nemeth at OOo>
Porting from Aspell to Hunspell using C-like structs
*/
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <ctype.h>
#include "csutil.hxx"
#include "phonet.hxx"
void init_phonet_hash(phonetable& parms) {
for (int i = 0; i < HASHSIZE; i++) {
parms.hash[i] = -1;
}
for (int i = 0; parms.rules[i][0] != '\0'; i += 2) {
/** set hash value **/
int k = (unsigned char)parms.rules[i][0];
if (parms.hash[k] < 0) {
parms.hash[k] = i;
}
}
}
// like strcpy but safe if the strings overlap
// but only if dest < src
static inline void strmove(char* dest, char* src) {
while (*src)
*dest++ = *src++;
*dest = '\0';
}
static int myisalpha(char ch) {
if ((unsigned char)ch < 128)
return isalpha(ch);
return 1;
}
/* Do phonetic transformation. */
/* phonetic transcription algorithm */
/* see: http://aspell.net/man-html/Phonetic-Code.html */
/* convert string to uppercase before this call */
std::string phonet(const std::string& inword, phonetable& parms) {
int i, k = 0, p, z;
int k0, n0, p0 = -333;
char c;
typedef unsigned char uchar;
size_t len = inword.size();
if (len > MAXPHONETUTF8LEN)
return std::string();
char word[MAXPHONETUTF8LEN + 1];
strncpy(word, inword.c_str(), MAXPHONETUTF8LEN);
word[MAXPHONETUTF8LEN] = '\0';
std::string target;
/** check word **/
i = z = 0;
while ((c = word[i]) != '\0') {
int n = parms.hash[(uchar)c];
int z0 = 0;
if (n >= 0 && !parms.rules[n].empty()) {
/** check all rules for the same letter **/
while (parms.rules[n][0] == c) {
/** check whole string **/
k = 1; /** number of found letters **/
p = 5; /** default priority **/
const char*s = parms.rules[n].c_str();
s++; /** important for (see below) "*(s-1)" **/
while (*s != '\0' && word[i + k] == *s && !isdigit((unsigned char)*s) &&
strchr("(-<^$", *s) == NULL) {
k++;
s++;
}
if (*s == '(') {
/** check letters in "(..)" **/
if (myisalpha(word[i + k]) // ...could be implied?
&& strchr(s + 1, word[i + k]) != NULL) {
k++;
while (*s != ')')
s++;
s++;
}
}
p0 = (int)*s;
k0 = k;
while (*s == '-' && k > 1) {
k--;
s++;
}
if (*s == '<')
s++;
if (isdigit((unsigned char)*s)) {
/** determine priority **/
p = *s - '0';
s++;
}
if (*s == '^' && *(s + 1) == '^')
s++;
if (*s == '\0' || (*s == '^' && (i == 0 || !myisalpha(word[i - 1])) &&
(*(s + 1) != '$' || (!myisalpha(word[i + k0])))) ||
(*s == '$' && i > 0 && myisalpha(word[i - 1]) &&
(!myisalpha(word[i + k0])))) {
/** search for followup rules, if: **/
/** parms.followup and k > 1 and NO '-' in searchstring **/
char c0 = word[i + k - 1];
n0 = parms.hash[(uchar)c0];
// if (parms.followup && k > 1 && n0 >= 0
if (k > 1 && n0 >= 0 && p0 != (int)'-' && word[i + k] != '\0' && !parms.rules[n0].empty()) {
/** test follow-up rule for "word[i+k]" **/
while (parms.rules[n0][0] == c0) {
/** check whole string **/
k0 = k;
p0 = 5;
s = parms.rules[n0].c_str();
s++;
while (*s != '\0' && word[i + k0] == *s &&
!isdigit((unsigned char)*s) &&
strchr("(-<^$", *s) == NULL) {
k0++;
s++;
}
if (*s == '(') {
/** check letters **/
if (myisalpha(word[i + k0]) &&
strchr(s + 1, word[i + k0]) != NULL) {
k0++;
while (*s != ')' && *s != '\0')
s++;
if (*s == ')')
s++;
}
}
while (*s == '-') {
/** "k0" gets NOT reduced **/
/** because "if (k0 == k)" **/
s++;
}
if (*s == '<')
s++;
if (isdigit((unsigned char)*s)) {
p0 = *s - '0';
s++;
}
if (*s == '\0'
/** *s == '^' cuts **/
|| (*s == '$' && !myisalpha(word[i + k0]))) {
if (k0 == k) {
/** this is just a piece of the string **/
n0 += 2;
continue;
}
if (p0 < p) {
/** priority too low **/
n0 += 2;
continue;
}
/** rule fits; stop search **/
break;
}
n0 += 2;
} /** End of "while (parms.rules[n0][0] == c0)" **/
if (p0 >= p && parms.rules[n0][0] == c0) {
n += 2;
continue;
}
} /** end of follow-up stuff **/
/** replace string **/
s = parms.rules[n + 1].c_str();
p0 = (!parms.rules[n].empty() &&
strchr(parms.rules[n].c_str() + 1, '<') != NULL)
? 1
: 0;
if (p0 == 1 && z == 0) {
/** rule with '<' is used **/
if (!target.empty() && *s != '\0' &&
(target[target.size()-1] == c || target[target.size()-1] == *s)) {
target.erase(target.size() - 1);
}
z0 = 1;
z = 1;
k0 = 0;
while (*s != '\0' && word[i + k0] != '\0') {
word[i + k0] = *s;
k0++;
s++;
}
if (k > k0)
strmove(&word[0] + i + k0, &word[0] + i + k);
/** new "actual letter" **/
c = word[i];
} else { /** no '<' rule used **/
i += k - 1;
z = 0;
while (*s != '\0' && *(s + 1) != '\0' && target.size() < len) {
if (target.empty() || target[target.size()-1] != *s) {
target.push_back(*s);
}
s++;
}
/** new "actual letter" **/
c = *s;
if (!parms.rules[n].empty() &&
strstr(parms.rules[n].c_str() + 1, "^^") != NULL) {
if (c != '\0') {
target.push_back(c);
}
strmove(&word[0], &word[0] + i + 1);
i = 0;
z0 = 1;
}
}
break;
} /** end of follow-up stuff **/
n += 2;
} /** end of while (parms.rules[n][0] == c) **/
} /** end of if (n >= 0) **/
if (z0 == 0) {
if (k && !p0 && target.size() < len && c != '\0') {
/** condense only double letters **/
target.push_back(c);
/// printf("\n setting \n");
}
i++;
z = 0;
k = 0;
}
} /** end of while ((c = word[i]) != '\0') **/
return target;
} /** end of function "phonet" **/

View File

@ -27,8 +27,8 @@
Porting from Aspell to Hunspell using C-like structs Porting from Aspell to Hunspell using C-like structs
*/ */
#ifndef __PHONETHXX__ #ifndef PHONET_HXX_
#define __PHONETHXX__ #define PHONET_HXX_
#define HASHSIZE 256 #define HASHSIZE 256
#define MAXPHONETLEN 256 #define MAXPHONETLEN 256
@ -38,15 +38,13 @@
struct phonetable { struct phonetable {
char utf8; char utf8;
cs_info * lang; std::vector<std::string> rules;
int num;
char * * rules;
int hash[HASHSIZE]; int hash[HASHSIZE];
}; };
LIBHUNSPELL_DLL_EXPORTED void init_phonet_hash(phonetable& parms); LIBHUNSPELL_DLL_EXPORTED void init_phonet_hash(phonetable& parms);
LIBHUNSPELL_DLL_EXPORTED int phonet (const char * inword, char * target, LIBHUNSPELL_DLL_EXPORTED std::string phonet(const std::string& inword,
int len, phonetable & phone); phonetable& phone);
#endif #endif

View File

@ -0,0 +1,196 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
/*
* Copyright 2002 Kevin B. Hendricks, Stratford, Ontario, Canada
* And Contributors. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. All modifications to the source code must be clearly marked as
* such. Binary redistributions based on modified source code
* must be clearly marked as modified versions in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
* KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#include <stdlib.h>
#include <string.h>
#include <stdio.h>
#include <limits>
#include "replist.hxx"
#include "csutil.hxx"
RepList::RepList(int n) {
dat = (replentry**)malloc(sizeof(replentry*) * n);
if (dat == 0)
size = 0;
else
size = n;
pos = 0;
}
RepList::~RepList() {
for (int i = 0; i < pos; i++) {
delete dat[i];
}
free(dat);
}
replentry* RepList::item(int n) {
return dat[n];
}
int RepList::find(const char* word) {
int p1 = 0;
int p2 = pos - 1;
int ret = -1;
while (p1 <= p2) {
int m = ((unsigned)p1 + (unsigned)p2) >> 1;
int c = strncmp(word, dat[m]->pattern.c_str(), dat[m]->pattern.size());
if (c < 0)
p2 = m - 1;
else if (c > 0)
p1 = m + 1;
else { // scan in the right half for a longer match
ret = m;
p1 = m + 1;
}
}
return ret;
}
std::string RepList::replace(const char* word, int ind, bool atstart) {
int type = atstart ? 1 : 0;
if (ind < 0)
return std::string();
if (strlen(word) == dat[ind]->pattern.size())
type = atstart ? 3 : 2;
while (type && dat[ind]->outstrings[type].empty())
type = (type == 2 && !atstart) ? 0 : type - 1;
return dat[ind]->outstrings[type];
}
int RepList::add(const std::string& in_pat1, const std::string& pat2) {
if (pos >= size || in_pat1.empty() || pat2.empty()) {
return 1;
}
// analyse word context
int type = 0;
std::string pat1(in_pat1);
if (pat1[0] == '_') {
pat1.erase(0, 1);
type = 1;
}
if (!pat1.empty() && pat1[pat1.size() - 1] == '_') {
type = type + 2;
pat1.erase(pat1.size() - 1);
}
mystrrep(pat1, "_", " ");
// find existing entry
int m = find(pat1.c_str());
if (m >= 0 && dat[m]->pattern == pat1) {
// since already used
dat[m]->outstrings[type] = pat2;
mystrrep(dat[m]->outstrings[type], "_", " ");
return 0;
}
// make a new entry if none exists
replentry* r = new replentry;
if (r == NULL)
return 1;
r->pattern = pat1;
r->outstrings[type] = pat2;
mystrrep(r->outstrings[type], "_", " ");
dat[pos++] = r;
// sort to the right place in the list
int i;
for (i = pos - 1; i > 0; i--) {
if (strcmp(r->pattern.c_str(), dat[i - 1]->pattern.c_str()) < 0) {
dat[i] = dat[i - 1];
} else
break;
}
dat[i] = r;
return 0;
}
bool RepList::conv(const std::string& in_word, std::string& dest) {
dest.clear();
size_t wordlen = in_word.size();
const char* word = in_word.c_str();
bool change = false;
for (size_t i = 0; i < wordlen; ++i) {
int n = find(word + i);
std::string l = replace(word + i, n, i == 0);
if (!l.empty()) {
dest.append(l);
i += dat[n]->pattern.size() - 1;
change = true;
} else {
dest.push_back(word[i]);
}
}
return change;
}

View File

@ -0,0 +1,100 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
/*
* Copyright 2002 Kevin B. Hendricks, Stratford, Ontario, Canada
* And Contributors. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. All modifications to the source code must be clearly marked as
* such. Binary redistributions based on modified source code
* must be clearly marked as modified versions in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
* KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
/* string replacement list class */
#ifndef REPLIST_HXX_
#define REPLIST_HXX_
#include "w_char.hxx"
#include <string>
#include <vector>
class RepList {
private:
RepList(const RepList&);
RepList& operator=(const RepList&);
protected:
replentry** dat;
int size;
int pos;
public:
explicit RepList(int n);
~RepList();
int add(const std::string& pat1, const std::string& pat2);
replentry* item(int n);
int find(const char* word);
std::string replace(const char* word, int n, bool atstart);
bool conv(const std::string& word, std::string& dest);
};
#endif

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,188 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
/*
* Copyright 2002 Kevin B. Hendricks, Stratford, Ontario, Canada
* And Contributors. All rights reserved.
*
* Redistribution and use in source and binary forms, with or without
* modification, are permitted provided that the following conditions
* are met:
*
* 1. Redistributions of source code must retain the above copyright
* notice, this list of conditions and the following disclaimer.
*
* 2. Redistributions in binary form must reproduce the above copyright
* notice, this list of conditions and the following disclaimer in the
* documentation and/or other materials provided with the distribution.
*
* 3. All modifications to the source code must be clearly marked as
* such. Binary redistributions based on modified source code
* must be clearly marked as modified versions in the documentation
* and/or other materials provided with the distribution.
*
* THIS SOFTWARE IS PROVIDED BY KEVIN B. HENDRICKS AND CONTRIBUTORS
* ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
* LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS
* FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL
* KEVIN B. HENDRICKS OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT,
* INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING,
* BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
* LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
* HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
* LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
* OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
* SUCH DAMAGE.
*/
#ifndef SUGGESTMGR_HXX_
#define SUGGESTMGR_HXX_
#define MAX_ROOTS 100
#define MAX_WORDS 100
#define MAX_GUESS 200
#define MAXNGRAMSUGS 4
#define MAXPHONSUGS 2
#define MAXCOMPOUNDSUGS 3
// timelimit: max ~1/4 sec (process time on Linux) for a time consuming function
#define TIMELIMIT (CLOCKS_PER_SEC >> 2)
#define MINTIMER 100
#define MAXPLUSTIMER 100
#define NGRAM_LONGER_WORSE (1 << 0)
#define NGRAM_ANY_MISMATCH (1 << 1)
#define NGRAM_LOWERING (1 << 2)
#define NGRAM_WEIGHTED (1 << 3)
#include "atypes.hxx"
#include "affixmgr.hxx"
#include "hashmgr.hxx"
#include "langnum.hxx"
#include <time.h>
enum { LCS_UP, LCS_LEFT, LCS_UPLEFT };
class SuggestMgr {
private:
SuggestMgr(const SuggestMgr&);
SuggestMgr& operator=(const SuggestMgr&);
private:
char* ckey;
size_t ckeyl;
std::vector<w_char> ckey_utf;
char* ctry;
size_t ctryl;
std::vector<w_char> ctry_utf;
AffixMgr* pAMgr;
unsigned int maxSug;
struct cs_info* csconv;
int utf8;
int langnum;
int nosplitsugs;
int maxngramsugs;
int maxcpdsugs;
int complexprefixes;
public:
SuggestMgr(const char* tryme, unsigned int maxn, AffixMgr* aptr);
~SuggestMgr();
void suggest(std::vector<std::string>& slst, const char* word, int* onlycmpdsug);
void ngsuggest(std::vector<std::string>& slst, const char* word, const std::vector<HashMgr*>& rHMgr);
std::string suggest_morph(const std::string& word);
std::string suggest_gen(const std::vector<std::string>& pl, const std::string& pattern);
private:
void testsug(std::vector<std::string>& wlst,
const std::string& candidate,
int cpdsuggest,
int* timer,
clock_t* timelimit);
int checkword(const std::string& word, int, int*, clock_t*);
int check_forbidden(const char*, int);
void capchars(std::vector<std::string>&, const char*, int);
int replchars(std::vector<std::string>&, const char*, int);
int doubletwochars(std::vector<std::string>&, const char*, int);
int forgotchar(std::vector<std::string>&, const char*, int);
int swapchar(std::vector<std::string>&, const char*, int);
int longswapchar(std::vector<std::string>&, const char*, int);
int movechar(std::vector<std::string>&, const char*, int);
int extrachar(std::vector<std::string>&, const char*, int);
int badcharkey(std::vector<std::string>&, const char*, int);
int badchar(std::vector<std::string>&, const char*, int);
int twowords(std::vector<std::string>&, const char*, int);
void capchars_utf(std::vector<std::string>&, const w_char*, int wl, int);
int doubletwochars_utf(std::vector<std::string>&, const w_char*, int wl, int);
int forgotchar_utf(std::vector<std::string>&, const w_char*, int wl, int);
int extrachar_utf(std::vector<std::string>&, const w_char*, int wl, int);
int badcharkey_utf(std::vector<std::string>&, const w_char*, int wl, int);
int badchar_utf(std::vector<std::string>&, const w_char*, int wl, int);
int swapchar_utf(std::vector<std::string>&, const w_char*, int wl, int);
int longswapchar_utf(std::vector<std::string>&, const w_char*, int, int);
int movechar_utf(std::vector<std::string>&, const w_char*, int, int);
int mapchars(std::vector<std::string>&, const char*, int);
int map_related(const char*,
std::string&,
int,
std::vector<std::string>& wlst,
int,
const std::vector<mapentry>&,
int*,
clock_t*);
int ngram(int n, const std::vector<w_char>& su1,
const std::vector<w_char>& su2, int opt);
int ngram(int n, const std::string& s1, const std::string& s2, int opt);
int mystrlen(const char* word);
int leftcommonsubstring(const std::vector<w_char>& su1,
const std::vector<w_char>& su2);
int leftcommonsubstring(const char* s1, const char* s2);
int commoncharacterpositions(const char* s1, const char* s2, int* is_swap);
void bubblesort(char** rwd, char** rwd2, int* rsc, int n);
void lcs(const char* s, const char* s2, int* l1, int* l2, char** result);
int lcslen(const char* s, const char* s2);
int lcslen(const std::string& s, const std::string& s2);
std::string suggest_hentry_gen(hentry* rv, const char* pattern);
};
#endif

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,72 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#ifndef W_CHAR_HXX_
#define W_CHAR_HXX_
#include <string>
#ifndef GCC
struct w_char {
#else
struct __attribute__((packed)) w_char {
#endif
unsigned char l;
unsigned char h;
friend bool operator<(const w_char a, const w_char b) {
unsigned short a_idx = (a.h << 8) + a.l;
unsigned short b_idx = (b.h << 8) + b.l;
return a_idx < b_idx;
}
friend bool operator==(const w_char a, const w_char b) {
return (((a).l == (b).l) && ((a).h == (b).h));
}
friend bool operator!=(const w_char a, const w_char b) {
return !(a == b);;
}
};
// two character arrays
struct replentry {
std::string pattern;
std::string outstrings[4]; // med, ini, fin, isol
};
#endif

View File

@ -0,0 +1 @@
testparser

View File

@ -0,0 +1,18 @@
AM_CPPFLAGS=-I${top_builddir}/src/hunspell
noinst_LIBRARIES=libparsers.a
libparsers_a_SOURCES=firstparser.cxx xmlparser.cxx \
latexparser.cxx manparser.cxx \
textparser.cxx htmlparser.cxx \
odfparser.cxx
noinst_PROGRAMS=testparser
testparser_SOURCES=firstparser.cxx firstparser.hxx xmlparser.cxx \
xmlparser.hxx latexparser.cxx latexparser.hxx \
manparser.cxx manparser.hxx testparser.cxx \
textparser.cxx textparser.hxx htmlparser.cxx \
htmlparser.hxx odfparser.hxx odfparser.cxx
# need mystrdup()
LDADD = ../hunspell/libhunspell-1.6.la

View File

@ -0,0 +1,65 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <ctype.h>
#include "../hunspell/csutil.hxx"
#include "firstparser.hxx"
#ifndef W32
using namespace std;
#endif
FirstParser::FirstParser(const char* wordchars)
: TextParser(wordchars) {
}
FirstParser::~FirstParser() {}
bool FirstParser::next_token(std::string& t) {
t.clear();
const size_t tabpos = line[actual].find('\t');
if (tabpos != std::string::npos && tabpos > token) {
token = tabpos;
t = line[actual].substr(0, tabpos);
return true;
}
return false;
}

View File

@ -1,6 +1,8 @@
/* ***** BEGIN LICENSE BLOCK ***** /* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1 * Version: MPL 1.1/GPL 2.0/LGPL 2.1
* *
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version * The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with * 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at * the License. You may obtain a copy of the License at
@ -11,36 +13,13 @@
* for the specific language governing rights and limitations under the * for the specific language governing rights and limitations under the
* License. * License.
* *
* The Original Code is Hunspell, based on MySpell. * Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
* *
* The Initial Developers of the Original Code are * Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Kevin Hendricks (MySpell) and Laszlo Nemeth (Hunspell). * Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Portions created by the Initial Developers are Copyright (C) 2002-2005 * Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* the Initial Developers. All Rights Reserved. * Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* * Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
* Contributor(s):
* David Einstein
* Davide Prina
* Giuseppe Modugno
* Gianluca Turconi
* Simon Brouwer
* Noll Janos
* Biro Arpad
* Goldman Eleonora
* Sarlos Tamas
* Bencsath Boldizsar
* Halacsy Peter
* Dvornik Laszlo
* Gefferth Andras
* Nagy Viktor
* Varga Daniel
* Chris Halls
* Rene Engelhard
* Bram Moolenaar
* Dafydd Jones
* Harri Pitkanen
* Andras Timar
* Tor Lillqvist
* *
* Alternatively, the contents of this file may be used under the terms of * Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or * either the GNU General Public License Version 2 or later (the "GPL"), or
@ -56,4 +35,22 @@
* *
* ***** END LICENSE BLOCK ***** */ * ***** END LICENSE BLOCK ***** */
#include "config.h" #ifndef FIRSTPARSER_HXX_
#define FIRSTPARSER_HXX_
#include "textparser.hxx"
/*
* Check first word of the input line
*
*/
class FirstParser : public TextParser {
public:
explicit FirstParser(const char* wc);
virtual ~FirstParser();
virtual bool next_token(std::string&);
};
#endif

View File

@ -0,0 +1,84 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <ctype.h>
#include "../hunspell/csutil.hxx"
#include "htmlparser.hxx"
#ifndef W32
using namespace std;
#endif
static const char* PATTERN[][2] = {{"<script", "</script>"},
{"<style", "</style>"},
{"<code", "</code>"},
{"<samp", "</samp>"},
{"<kbd", "</kbd>"},
{"<var", "</var>"},
{"<listing", "</listing>"},
{"<address", "</address>"},
{"<pre", "</pre>"},
{"<!--", "-->"},
{"<[cdata[", "]]>"}, // XML comment
{"<", ">"}};
#define PATTERN_LEN (sizeof(PATTERN) / (sizeof(char*) * 2))
static const char* PATTERN2[][2] = {
{"<img", "alt="}, // ALT and TITLE attrib handled spec.
{"<img", "title="},
{"<a ", "title="}};
#define PATTERN_LEN2 (sizeof(PATTERN2) / (sizeof(char*) * 2))
HTMLParser::HTMLParser(const char* wordchars)
: XMLParser(wordchars) {
}
HTMLParser::HTMLParser(const w_char* wordchars, int len)
: XMLParser(wordchars, len) {
}
bool HTMLParser::next_token(std::string& t) {
return XMLParser::next_token(PATTERN, PATTERN_LEN, PATTERN2, PATTERN_LEN2, t);
}
HTMLParser::~HTMLParser() {}

View File

@ -0,0 +1,56 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#ifndef HTMLPARSER_HXX_
#define HTMLPARSER_HXX_
#include "xmlparser.hxx"
/*
* HTML Parser
*
*/
class HTMLParser : public XMLParser {
public:
explicit HTMLParser(const char* wc);
HTMLParser(const w_char* wordchars, int len);
virtual bool next_token(std::string&);
virtual ~HTMLParser();
};
#endif

View File

@ -0,0 +1,261 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <ctype.h>
#include "../hunspell/csutil.hxx"
#include "latexparser.hxx"
#ifndef W32
using namespace std;
#endif
static struct {
const char* pat[2];
int arg;
} PATTERN[] = {{{"\\(", "\\)"}, 0},
{{"$$", "$$"}, 0},
{{"$", "$"}, 0},
{{"\\begin{math}", "\\end{math}"}, 0},
{{"\\[", "\\]"}, 0},
{{"\\begin{displaymath}", "\\end{displaymath}"}, 0},
{{"\\begin{equation}", "\\end{equation}"}, 0},
{{"\\begin{equation*}", "\\end{equation*}"}, 0},
{{"\\cite", NULL}, 1},
{{"\\nocite", NULL}, 1},
{{"\\index", NULL}, 1},
{{"\\label", NULL}, 1},
{{"\\ref", NULL}, 1},
{{"\\pageref", NULL}, 1},
{{"\\autoref", NULL}, 1},
{{"\\parbox", NULL}, 1},
{{"\\begin{verbatim}", "\\end{verbatim}"}, 0},
{{"\\verb+", "+"}, 0},
{{"\\verb|", "|"}, 0},
{{"\\verb#", "#"}, 0},
{{"\\verb*", "*"}, 0},
{{"\\documentstyle", "\\begin{document}"}, 0},
{{"\\documentclass", "\\begin{document}"}, 0},
// { { "\\documentclass", NULL } , 1 },
{{"\\usepackage", NULL}, 1},
{{"\\includeonly", NULL}, 1},
{{"\\include", NULL}, 1},
{{"\\input", NULL}, 1},
{{"\\vspace", NULL}, 1},
{{"\\setlength", NULL}, 2},
{{"\\addtolength", NULL}, 2},
{{"\\settowidth", NULL}, 2},
{{"\\rule", NULL}, 2},
{{"\\hspace", NULL}, 1},
{{"\\vspace", NULL}, 1},
{{"\\\\[", "]"}, 0},
{{"\\pagebreak[", "]"}, 0},
{{"\\nopagebreak[", "]"}, 0},
{{"\\enlargethispage", NULL}, 1},
{{"\\begin{tabular}", NULL}, 1},
{{"\\addcontentsline", NULL}, 2},
{{"\\begin{thebibliography}", NULL}, 1},
{{"\\bibliography", NULL}, 1},
{{"\\bibliographystyle", NULL}, 1},
{{"\\bibitem", NULL}, 1},
{{"\\begin", NULL}, 1},
{{"\\end", NULL}, 1},
{{"\\pagestyle", NULL}, 1},
{{"\\pagenumbering", NULL}, 1},
{{"\\thispagestyle", NULL}, 1},
{{"\\newtheorem", NULL}, 2},
{{"\\newcommand", NULL}, 2},
{{"\\renewcommand", NULL}, 2},
{{"\\setcounter", NULL}, 2},
{{"\\addtocounter", NULL}, 1},
{{"\\stepcounter", NULL}, 1},
{{"\\selectlanguage", NULL}, 1},
{{"\\inputencoding", NULL}, 1},
{{"\\hyphenation", NULL}, 1},
{{"\\definecolor", NULL}, 3},
{{"\\color", NULL}, 1},
{{"\\textcolor", NULL}, 1},
{{"\\pagecolor", NULL}, 1},
{{"\\colorbox", NULL}, 2},
{{"\\fcolorbox", NULL}, 2},
{{"\\declaregraphicsextensions", NULL}, 1},
{{"\\psfig", NULL}, 1},
{{"\\url", NULL}, 1},
{{"\\eqref", NULL}, 1},
{{"\\vskip", NULL}, 1},
{{"\\vglue", NULL}, 1},
{{"\'\'", NULL}, 1}};
#define PATTERN_LEN (sizeof(PATTERN) / sizeof(PATTERN[0]))
LaTeXParser::LaTeXParser(const char* wordchars)
: TextParser(wordchars)
, pattern_num(0), depth(0), arg(0), opt(0) {
}
LaTeXParser::LaTeXParser(const w_char* wordchars, int len)
: TextParser(wordchars, len)
, pattern_num(0), depth(0), arg(0), opt(0) {
}
LaTeXParser::~LaTeXParser() {}
int LaTeXParser::look_pattern(int col) {
for (unsigned int i = 0; i < PATTERN_LEN; i++) {
const char* j = line[actual].c_str() + head;
const char* k = PATTERN[i].pat[col];
if (!k)
continue;
while ((*k != '\0') && (tolower(*j) == *k)) {
j++;
k++;
}
if (*k == '\0')
return i;
}
return -1;
}
/*
* LaTeXParser
*
* state 0: not wordchar
* state 1: wordchar
* state 2: comments
* state 3: commands
* state 4: commands with arguments
* state 5: % comment
*
*/
bool LaTeXParser::next_token(std::string& t) {
t.clear();
int i;
int slash = 0;
int apostrophe;
for (;;) {
// fprintf(stderr,"depth: %d, state: %d, , arg: %d, token:
// %s\n",depth,state,arg,line[actual]+head);
switch (state) {
case 0: // non word chars
if ((pattern_num = look_pattern(0)) != -1) {
if (PATTERN[pattern_num].pat[1]) {
state = 2;
} else {
state = 4;
depth = 0;
arg = 0;
opt = 1;
}
head += strlen(PATTERN[pattern_num].pat[0]) - 1;
} else if (line[actual][head] == '%') {
state = 5;
} else if (is_wordchar(line[actual].c_str() + head)) {
state = 1;
token = head;
} else if (line[actual][head] == '\\') {
if (line[actual][head + 1] == '\\' || // \\ (linebreak)
(line[actual][head + 1] == '$') || // \$ (dollar sign)
(line[actual][head + 1] == '%')) { // \% (percent)
head++;
break;
}
state = 3;
}
break;
case 1: // wordchar
apostrophe = 0;
if (!is_wordchar(line[actual].c_str() + head) ||
(line[actual][head] == '\'' && line[actual][head + 1] == '\'' &&
++apostrophe)) {
state = 0;
bool ok = alloc_token(token, &head, t);
if (apostrophe)
head += 2;
if (ok)
return true;
}
break;
case 2: // comment, labels, etc
if (((i = look_pattern(1)) != -1) &&
(strcmp(PATTERN[i].pat[1], PATTERN[pattern_num].pat[1]) == 0)) {
state = 0;
head += strlen(PATTERN[pattern_num].pat[1]) - 1;
}
break;
case 3: // command
if ((tolower(line[actual][head]) < 'a') ||
(tolower(line[actual][head]) > 'z')) {
state = 0;
head--;
}
break;
case 4: // command with arguments
if (slash && (line[actual][head] != '\0')) {
slash = 0;
head++;
break;
} else if (line[actual][head] == '\\') {
slash = 1;
} else if ((line[actual][head] == '{') ||
((opt) && (line[actual][head] == '['))) {
depth++;
opt = 0;
} else if (line[actual][head] == '}') {
depth--;
if (depth == 0) {
opt = 1;
arg++;
}
if (((depth == 0) && (arg == PATTERN[pattern_num].arg)) ||
(depth < 0)) {
state = 0; // XXX not handles the last optional arg.
}
} else if (line[actual][head] == ']')
depth--;
} // case
if (next_char(line[actual].c_str(), &head)) {
if (state == 5)
state = 0;
return false;
}
}
}

View File

@ -0,0 +1,65 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#ifndef LATEXPARSER_HXX_
#define LATEXPARSER_HXX_
#include "textparser.hxx"
/*
* HTML Parser
*
*/
class LaTeXParser : public TextParser {
int pattern_num; // number of comment
int depth; // depth of blocks
int arg; // arguments's number
int opt; // optional argument attrib.
public:
explicit LaTeXParser(const char* wc);
LaTeXParser(const w_char* wordchars, int len);
virtual ~LaTeXParser();
virtual bool next_token(std::string&);
private:
int look_pattern(int col);
};
#endif

View File

@ -0,0 +1,98 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <ctype.h>
#include "../hunspell/csutil.hxx"
#include "manparser.hxx"
#ifndef W32
using namespace std;
#endif
ManParser::ManParser(const char* wordchars)
: TextParser(wordchars) {
}
ManParser::ManParser(const w_char* wordchars, int len)
: TextParser(wordchars, len) {
}
ManParser::~ManParser() {}
bool ManParser::next_token(std::string& t) {
for (;;) {
switch (state) {
case 1: // command arguments
if (line[actual][head] == ' ')
state = 2;
break;
case 0: // dot in begin of line
if (line[actual][0] == '.') {
state = 1;
break;
} else {
state = 2;
}
// no break
case 2: // non word chars
if (is_wordchar(line[actual].c_str() + head)) {
state = 3;
token = head;
} else if ((line[actual][head] == '\\') &&
(line[actual][head + 1] == 'f') &&
(line[actual][head + 2] != '\0')) {
head += 2;
}
break;
case 3: // wordchar
if (!is_wordchar(line[actual].c_str() + head)) {
state = 2;
if (alloc_token(token, &head, t))
return true;
}
break;
}
if (next_char(line[actual].c_str(), &head)) {
state = 0;
return false;
}
}
}

View File

@ -0,0 +1,58 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#ifndef MANPARSER_HXX_
#define MANPARSER_HXX_
#include "textparser.hxx"
/*
* Manparse Parser
*
*/
class ManParser : public TextParser {
protected:
public:
explicit ManParser(const char* wc);
ManParser(const w_char* wordchars, int len);
virtual ~ManParser();
virtual bool next_token(std::string&);
};
#endif

View File

@ -0,0 +1,76 @@
/* ***** BEGIN LICENSE BLOCK *****
* Version: MPL 1.1/GPL 2.0/LGPL 2.1
*
* Copyright (C) 2002-2017 Németh László
*
* The contents of this file are subject to the Mozilla Public License Version
* 1.1 (the "License"); you may not use this file except in compliance with
* the License. You may obtain a copy of the License at
* http://www.mozilla.org/MPL/
*
* Software distributed under the License is distributed on an "AS IS" basis,
* WITHOUT WARRANTY OF ANY KIND, either express or implied. See the License
* for the specific language governing rights and limitations under the
* License.
*
* Hunspell is based on MySpell which is Copyright (C) 2002 Kevin Hendricks.
*
* Contributor(s): David Einstein, Davide Prina, Giuseppe Modugno,
* Gianluca Turconi, Simon Brouwer, Noll János, Bíró Árpád,
* Goldman Eleonóra, Sarlós Tamás, Bencsáth Boldizsár, Halácsy Péter,
* Dvornik László, Gefferth András, Nagy Viktor, Varga Dániel, Chris Halls,
* Rene Engelhard, Bram Moolenaar, Dafydd Jones, Harri Pitkänen
*
* Alternatively, the contents of this file may be used under the terms of
* either the GNU General Public License Version 2 or later (the "GPL"), or
* the GNU Lesser General Public License Version 2.1 or later (the "LGPL"),
* in which case the provisions of the GPL or the LGPL are applicable instead
* of those above. If you wish to allow use of your version of this file only
* under the terms of either the GPL or the LGPL, and not to allow others to
* use your version of this file under the terms of the MPL, indicate your
* decision by deleting the provisions above and replace them with the notice
* and other provisions required by the GPL or the LGPL. If you do not delete
* the provisions above, a recipient may use your version of this file under
* the terms of any one of the MPL, the GPL or the LGPL.
*
* ***** END LICENSE BLOCK ***** */
#include <cstdlib>
#include <cstring>
#include <cstdio>
#include <ctype.h>
#include "../hunspell/csutil.hxx"
#include "odfparser.hxx"
#ifndef W32
using namespace std;
#endif
static const char* PATTERN[][2] = {
{"<office:meta>", "</office:meta>"},
{"<office:settings>", "</office:settings>"},
{"<office:binary-data>", "</office:binary-data>"},
{"<!--", "-->"},
{"<[cdata[", "]]>"}, // XML comment
{"<", ">"}};
#define PATTERN_LEN (sizeof(PATTERN) / (sizeof(char*) * 2))
static const char* (*PATTERN2)[2] = NULL;
#define PATTERN_LEN2 0
ODFParser::ODFParser(const char* wordchars)
: XMLParser(wordchars) {
}
ODFParser::ODFParser(const w_char* wordchars, int len)
: XMLParser(wordchars, len) {
}
bool ODFParser::next_token(std::string& t) {
return XMLParser::next_token(PATTERN, PATTERN_LEN, PATTERN2, PATTERN_LEN2, t);
}
ODFParser::~ODFParser() {}

Some files were not shown because too many files have changed in this diff Show More