Test unicodesymbols for most supported input encodings with Kornel's addition to ctests.
Add required "forces" to unicodesymbols:
* utf8x does not support all characters supported by LyX
* several 8-bit encodings map characters to math-mode commands - force replacement in text-mode so that LyX can wrap them in \\ensuremath.
Fix a misalignment (wrong replacements) in the Cyrillic Unicode block.
Use \\mathscr for Mathematical Script characters in Mathematical Alphanumeric Characters (in line with the characters in other unicode blocks.
First run of Kornels patch for tests with all input encodings in lib/encodings.
Remove redundant sample files - keep only one sample and change the input encoding in the test script.
Put remaining failing test in "unreliableTests" for later sorting...
Do not use REVERSE SOLIDUS OPERATOR for backwards conversion of
\\\\textbackslash in LyX and tex2lyx.
Both, \\\\ (005C REVERSE SOLIDUS = backslash) and 0x29f5 map to
\\\\textbackslash but 005c is the preferred back-transformation.
Otherwise, using \\\\ in "mathematical text" leads to literal 0x29f5 in the LyX
source which leads to "missing character" errors with non-TeX fonts.
force=utf8 is required for most characters provided by add-on packgages
and (almost) all mathematical characters, because these are not
set up for inputencs utf8
unicodesymbols.py failed here (python 2.7 under Linux) before the simple fix
included in this commit.
Fix problems revealed by hand-compiling an examining the test samples in autotests/export/Unicode-characters/:
* new definitions
* fixed definitions
* "force=utf8" when required
* some IPA symbols fail without the "extraipa" package
* fix direction of "textcommaaboveright"
There are still many math symbols in lib/symbols that lack a corresponding
entry in lib/unicodesymbols, although a clear mapping exists. This commit
adds some of them (not all yet). In the future we should probably move the
information from both files into one database.
These are all in lib/symbols, but we did not yet know the corresponding unicode
numbers. unicodesymbols does still not contain all symbols from lib/symbols.
The parser that reads unicodesymbols uses backslashes to escape quotes, so
every backslash that is part of a LaTeX command needs to be escaped as well.
There are more candidates in the greek and cyrillic sections, but I don't
know those commands, so I did not touch them.
The lib/unicodesymbols part is based on work by Günter Milde:
Both, \r{A} and \AA (rsp. \r{a} and \aa) are equivalent standard LICR macros
for Aring/aring as well as the deprecated "angstrom sign" character (212B).
However, with \AA for 212B and \r{A} for 00C5, tex2lyx converts \AA to the
deprecated "angstrom sign" which is missing in many fonts including the
Unicode version of Latin Modern.
I added the normalize_c() calls so that tex2lyx prefers the precomposed forms
(these are better editable in LyX) and the deprecated flag.
LaTeXFeatures defines \textcommabelow and \textcommaabove based on a
generic \LyXTextAccent and declares TextCompositeCommands for the Baltic
letters in the T1 font encoding, using \textcommaabove for the small letter g
and \textcommabelow else.
This allows overwriting of the composite definition for other font encodings.
Especially, it does not interfere with the polish/baltic font encoding L7x
(supported by LatinModern and TeXGyre fonts) that provides pre-composed
glyphs.
Greek characters with perispomeni (tilde) accent were not properly shown
in the output document, because the "textgreek" feature re-defined \~ in
a way incompatible with lgrenc.def since version 0.8 (2013-05-13)
(package greek-fontenc).
The compatibility-definition is required for older versions of the font setup
(before the move of "lgrenc.def" from "babel" to "greek-fontenc").
It is now done with "ProvideTextCommand" to not overwrite the more complete
implementation in lgrenc.def.
With the compatibility definition, combined diacritics with tilde
must be input with the tilde first (e.g. \~>, not \>~).
"unicodesymbols" is changed accordingly.
Also, some LICRs for combining Greek diacritical characters were added to
Unicodesymbols.
This fixes bug #9615.
The "notermination" flag tells LyX, that terminating an LICR macro with {} is
not necessary. This is normally the case for all macros with non-alphabetical
name (e.g. \{).
However, combining diacritical characters are converted to *accent macros*,
which expect an argument (the base character).
In Unicode, the base character precedes the combining character,
in LaTeX the combining character precedes the base character.
LyX changes the order of the two characters to get this right,
e.g. "x" + "combining tilde" becomes "\~{x}".
In the special case there is no preceding character (e.g. at the start of the
document or a paragraph), Unicode shows the combining diacritical character
without base character.
The replacement is currently not "terminated" (e.g. "\~"), because of the
"notermination=text" flags in "unicodesymbols".
The accent macros take the *following* character as base character, which is
clearly not intended.
In case of a paragraph consisting of just one combining diacritical character,
LaTeX compilation fails with an error.
With the patch, LyX writes the accent macros with an empty argument,
e.g. "\~{}", the output is similar to the view in the GUI with the diacritical
character on its own, not on the follwoing character.
This is bug #9612. The patch is from Günter Milde. He wrote:
The patch uses the "long" macro names (\llless and \gggtr) to minimize
name-clash probability. (There is, e.g., a name clash of \lll with Babel's
polish.ldf (cf. bug #6197))
and fix wrong ones. This fixes the safe part of bug #8888. The symbols
provided by mdsymbol.sty have to wait, since mdsymbol.sty provides a huge
number of symbols, I don't have the time right now to process them all, and
a partial file format update does not make sense.
The macro is identical to \ldots in texted, but this way, tex2lyx can import both \ldots (as InsetSpecialChar) and \dots (as unicode glyph), while retaining the original distinction (which might get relevant with some special packages or via user redefinition of one of these macros).
The main part of the fix (unicodesymbols) is from Jürgen. This commit fixes
tree problems:
- \; etc. were also used in text mode, but are math only
- all of those glyphs need to be forced with utf8
- actually, \; etc. are not the correct macros, since the encoded spaces are
breakable, but the math spaces are all protected. The sapce symbols are not
defined in the utf8 encodings.
\textsubtilde is a combining character (0x0330), but 0x02f7 is not.
Apart from the wrong LaTeX output, having the same command for two symbols
confuses texc2lyx.
There were found with -dbg mathed ans entering a math inset.
I kept the AMS versions, except leadsto, which is only an approximation in AMS.
hbar was simply defined twice with identical definitions.
The stmaryrd package adds support for lots of math symbols, using a font
designed to accompany the computer modern fonts. The changes in detail:
- Fix generate_symbols_list.py to work with stmaryrd.sty. It loooks like it
was automatically translated from a perl version and never used.
- Generate the new symbols in lib/symbols using generate_symbols_list.py and
add some manual adjustments
- Generate stmary10.ttf by a simple ttf export from stmary10.sfd with fontforge
- Add license info for stmary10.ttf
- Create a test file with all symbols from stmaryrd.sty. Actually it would be
nice to have this for the other fonts as well.
- The mechanics: lyx2lyx, tex2lyx, font machinery etc.
Math commands need it as well as text commands. At the same time, this
further unifies the checking for termination and fixes cases of wrong
output (e.g. for 0x2005).
- 0x200c is equivalent to \textcompwordmark
- 0x2011 is equivalent to \nobreakdash-
- 0x2009 is a breakable thin space (the old definition was non-breakable)
- 0x202f is a non-breakable thin space
- add spaces 0x2007, 0x2008, 0x200a and 0x200b
git-svn-id: svn://svn.lyx.org/lyx/lyx-devel/trunk@40694 a592a061-630c-0410-9148-cb99ea01b6c8
The clash can however still happen if those packages are both loaded for other symbols.
git-svn-id: svn://svn.lyx.org/lyx/lyx-devel/trunk@39885 a592a061-630c-0410-9148-cb99ea01b6c8
Now you can also require a|b|c, and if any of the features is already used,
no other one will be loaded. The first feature wins if none is already used.
This is required for a part of bug #7811.
git-svn-id: svn://svn.lyx.org/lyx/lyx-devel/trunk@39884 a592a061-630c-0410-9148-cb99ea01b6c8
Essentially, it is not true that Cyrillic characters in math are to be typeset
in italic by default, and LyX already permits to style them as desired.
git-svn-id: svn://svn.lyx.org/lyx/lyx-devel/trunk@28981 a592a061-630c-0410-9148-cb99ea01b6c8
Cyrillic formatted text in mathmode
If some other letters from some script need to be typeset in italic,
it will suffice adding the flag "mathalpha" to the unicodesymbols file.
git-svn-id: svn://svn.lyx.org/lyx/lyx-devel/trunk@28731 a592a061-630c-0410-9148-cb99ea01b6c8