If Document>Settings>Language>Encoding is set to any value except "auto" or "default", we
expect the whole document to use this encoding. With encodings from the CJK package, this means
one big "CJK" environment and no encoding switches.
Characters that are not handled by the CJK package need to be "forced" in lib/unicodesymbols.
This is done for "euc-cn"; the others will follow.
The textcomp Unicode support file "ts1enc.dfu" defines 0x204E Low Asterisk
as \textasteriskcentered. LyX should follow suit.
The ASTERISK OPERATOR (correctly) maps to the same macro;
the "deprecated" tag marks the upstream mapping as the preferred choice.
Added "force=iso8859-7" for some characters:
The iso8859-7.def file for the Greek 8-bit input encoding
used \textbullet as placeholder for non-defined characters.
This is fixed in v1.7 2019/01/08.
Once the fixed version is in common use, all "force=iso8859-7"
tags can be removed.
A feature can now be required only for specific input or font encodings:
- <feature>=enc1;enc2... Require the feature <feature> only if the
character is used in one of the specified font
or input encodings.
- <feature>!=enc1;enc2... Require the feature <feature> only if the
character is used in a font or input encoding
that is not among those specified.
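
The two forms above can be sketched in Python as follows. This is only an
illustration of the described semantics, not LyX's actual C++ parser; the
function name and the feature name "somepkg" are hypothetical.

```python
def feature_required(spec, encoding):
    """Return (feature, required) for one feature spec and one encoding.

    Hypothetical helper: `spec` is "<feature>", "<feature>=enc1;enc2"
    or "<feature>!=enc1;enc2", as described in the commit message.
    """
    # Check "!=" first, because "=" is a substring of "!=".
    if "!=" in spec:
        feature, encodings = spec.split("!=", 1)
        return feature, encoding not in encodings.split(";")
    if "=" in spec:
        feature, encodings = spec.split("=", 1)
        return feature, encoding in encodings.split(";")
    # No encoding list: the feature is always required.
    return spec, True


print(feature_required("somepkg=iso8859-7;utf8", "utf8"))
print(feature_required("somepkg!=iso8859-7;utf8", "utf8"))
```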
Use the command as defined by Babel. This allows us to use the (more
advanced) Babel command if provided instead of rolling our own.
I add a dummy file format change in case it turns out we need to
do something here for old documents (e.g. with user preamble definitions)
Use the LaTeX internal character representation (LICR) macros
provided by lgrenc.def (since version 0.8 from 2013-05-13)
in lib/unicodesymbols. This fixes the PDF bookmarks (except for the
legacy input encoding iso-8859-7) and solves the problem of a missing
"v" character in Libertine LGR fonts (see lyx-users from 2018-01-29).
The ctest "unicodesymbols/008-greek-and-coptic_iso8859-7_pdf2" now fails
(due to #9681). This is not a regression, as it is already
"unreliable" (wrong output, Latin character instead of Greek).
Drop compatibility definition of \~ as perispomeni accent
(that was required with lgrenc.def < 0.8).
The xfrac package is the "state of the art" for "split-level" (nice) fractions.
Character replacements look consistent, scale properly and fit in the line.
Fixes #5220.
Test unicodesymbols for most supported input encodings with Kornel's addition to ctests.
Add required "forces" to unicodesymbols:
* utf8x does not support all characters supported by LyX
* several 8-bit encodings map characters to math-mode commands - force replacement in text-mode so that LyX can wrap them in \ensuremath.
Fix a misalignment (wrong replacements) in the Cyrillic Unicode block.
Use \mathscr for Mathematical Script characters in Mathematical Alphanumeric Characters (in line with the characters in other Unicode blocks).
First run of Kornel's patch for tests with all input encodings in lib/encodings.
Remove redundant sample files - keep only one sample and change the input encoding in the test script.
Put remaining failing test in "unreliableTests" for later sorting...
Do not use REVERSE SOLIDUS OPERATOR for backwards conversion of
\textbackslash in LyX and tex2lyx.
Both \ (005C REVERSE SOLIDUS = backslash) and 0x29f5 map to
\textbackslash, but 005c is the preferred back-transformation.
Otherwise, using \ in "mathematical text" leads to a literal 0x29f5 in the LyX
source, which leads to "missing character" errors with non-TeX fonts.
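
A minimal sketch of such a preferred back-transformation, written in Python for
illustration. It assumes that the non-preferred code point carries a flag like
the "deprecated" tag used for other entries; the entry layout and the function
name are hypothetical, not LyX's data structures.

```python
def build_reverse_map(entries):
    """Map each LaTeX command back to one preferred code point.

    `entries` is an iterable of (codepoint, command, flags) tuples.
    An entry without the "deprecated" flag wins over one that has it;
    otherwise the first entry seen is kept.
    """
    best = {}
    for codepoint, command, flags in entries:
        deprecated = "deprecated" in flags
        if command not in best:
            best[command] = (codepoint, deprecated)
        elif best[command][1] and not deprecated:
            # Replace a deprecated mapping by a non-deprecated one.
            best[command] = (codepoint, deprecated)
    return {command: cp for command, (cp, _) in best.items()}


entries = [
    (0x29F5, "\\textbackslash", {"deprecated"}),  # REVERSE SOLIDUS OPERATOR
    (0x005C, "\\textbackslash", set()),           # REVERSE SOLIDUS
]
print(hex(build_reverse_map(entries)["\\textbackslash"]))
```

With this rule the result does not depend on the order of the two entries.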
force=utf8 is required for most characters provided by add-on packages
and (almost) all mathematical characters, because these are not
set up for inputenc's utf8.
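
The effect of a "force" tag can be sketched as follows. This is an illustration
only: Python's codecs stand in for what the LaTeX input encoding can carry
(the two do not match exactly), and the function and parameter names are
hypothetical.

```python
def export_char(char, macro, encoding, forced_encodings=()):
    """Return the raw character or its LaTeX macro for export.

    The raw character is used only if the encoding can represent it
    and the entry is not forced for this encoding.
    """
    try:
        char.encode(encoding)
        encodable = True
    except UnicodeEncodeError:
        encodable = False
    if encodable and encoding not in forced_encodings:
        return char
    return macro


# NOT EQUAL TO is encodable in utf8, but forced to the macro here.
print(export_char("\u2260", "\\neq", "utf8", forced_encodings=("utf8",)))
```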
unicodesymbols.py failed here (python 2.7 under Linux) before the simple fix
included in this commit.
Fix problems revealed by hand-compiling and examining the test samples in autotests/export/Unicode-characters/:
* new definitions
* fixed definitions
* "force=utf8" when required
* some IPA symbols fail without the "extraipa" package
* fix direction of "textcommaaboveright"
There are still many math symbols in lib/symbols that lack a corresponding
entry in lib/unicodesymbols, although a clear mapping exists. This commit
adds some of them (not all yet). In the future we should probably move the
information from both files into one database.
These are all in lib/symbols, but we did not yet know the corresponding Unicode
numbers. unicodesymbols still does not contain all symbols from lib/symbols.
The parser that reads unicodesymbols uses backslashes to escape quotes, so
every backslash that is part of a LaTeX command needs to be escaped as well.
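
The escaping rule can be illustrated with a small Python reader for one quoted
field. This is a sketch of the rule as described here, not the actual LyX
parser; the function name is hypothetical and malformed input is not handled.

```python
def read_quoted(line, start=0):
    """Read one double-quoted field; return (value, index after the field).

    A backslash escapes the following character, so a literal backslash
    of a LaTeX command has to be written as two backslashes.
    """
    i = line.index('"', start) + 1
    chars = []
    while line[i] != '"':
        if line[i] == "\\":
            i += 1  # skip the escape, keep the escaped character
        chars.append(line[i])
        i += 1
    return "".join(chars), i + 1


# The field "\\textbackslash" in the file yields the command \textbackslash.
value, _ = read_quoted(r'"\\textbackslash"')
print(value)
```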
There are more candidates in the greek and cyrillic sections, but I don't
know those commands, so I did not touch them.
The lib/unicodesymbols part is based on work by Günter Milde:
Both \r{A} and \AA (resp. \r{a} and \aa) are equivalent standard LICR macros
for Aring/aring as well as the deprecated "angstrom sign" character (212B).
However, with \AA for 212B and \r{A} for 00C5, tex2lyx converts \AA to the
deprecated "angstrom sign" which is missing in many fonts including the
Unicode version of Latin Modern.
I added the normalize_c() calls so that tex2lyx prefers the precomposed forms
(these are easier to edit in LyX) and the deprecated flag.
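
What normalization to the precomposed form does for these characters can be
shown with Python's unicodedata module. The Python function below is only a
stand-in named after the normalize_c() call mentioned above, assuming that call
performs Unicode NFC normalization.

```python
import unicodedata


def normalize_c(text):
    """Return `text` in Unicode Normalization Form C (precomposed)."""
    return unicodedata.normalize("NFC", text)


# "A" + COMBINING RING ABOVE becomes the precomposed U+00C5.
print(hex(ord(normalize_c("A\u030a"))))
# ANGSTROM SIGN (U+212B) also normalizes to U+00C5.
print(hex(ord(normalize_c("\u212b"))))
```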