* Force unicodesymbols conversion for all *-platex input encodings,
except some characters that work well in utf8.
* Use platex if document language is "japanese" and input encoding is "utf8".
* Fix macro termination if \textcyrillic or \textgreek is not required
for a Greek or Cyrillic letter.
* Replace "writeScriptChars" with conditionals in the character-output loop in
"Paragraph::latex" (solves "FIXME: modifying i here is not very nice...").
The font changing commands \textcyrillic and \textgreek are no longer
part of the textcommand in "lib/unicodesymbols" but added when required
in Paragraph::Private::latexSpecialChar.
A feature can now be required only for specific input or font encodings:
- <feature>=enc1;enc2... Require the feature <feature> only if the
character is used in one of the specified font
or input encodings.
- <feature>!=enc1;enc2... Require the feature <feature> only if the
character is used in a font or input encoding
that is not among the specified.
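The two forms above can be read as a small predicate over the current encoding. The following is a minimal Python sketch of that reading, not LyX's actual parser: the function name `feature_required` and the feature and encoding names used with it are assumptions made for illustration.

```python
def feature_required(spec: str, encoding: str) -> tuple[str, bool]:
    """Return (feature, required) for one requirement spec and one encoding.

    spec is "<feature>", "<feature>=enc1;enc2" or "<feature>!=enc1;enc2".
    The name and return shape are hypothetical, chosen for this sketch.
    """
    # Check "!=" first, because "=" is also a substring of "!=".
    if "!=" in spec:
        feature, encs = spec.split("!=", 1)
        return feature, encoding not in encs.split(";")
    if "=" in spec:
        feature, encs = spec.split("=", 1)
        return feature, encoding in encs.split(";")
    # No encoding list: the feature is required unconditionally.
    return spec, True
```

For example, `feature_required("textgreek!=iso8859-7", "iso8859-7")` reports that the (hypothetical) feature is not needed for that encoding, while a bare `"amsmath"` is always required.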
Use the command as defined by Babel. This allows us to use the (more
advanced) Babel command if provided instead of rolling our own.
I add a dummy file format change in case it turns out we need to
do something here for old documents (e.g. with user preamble definitions).
This commit does a bulk fix of incorrect annotations (comments) at the
end of namespaces.
The commit was generated by initially running clang-format, and then
from the diff of the result extracting the hunks corresponding to
fixes of namespace comments. The changes being applied and all the
results have been manually reviewed. The source code successfully
builds on macOS.
Further details on the steps below, in case they're of interest to
someone else in the future.
1. Check out a fresh and up-to-date version of src/
git pull && git checkout -- src && git status src
2. Ensure there's a suitable .clang-format in place, i.e. with options
to fix the comment at the end of namespaces, including:
FixNamespaceComments: true
SpacesBeforeTrailingComments: 1
and that clang-format is >= 5.0.0, by doing e.g.:
clang-format -dump-config | grep Comments:
clang-format --version
3. Apply clang-format to the source:
clang-format -i $(find src -name "*.cpp" -or -name "*.h")
4. Create and filter out hunks related to fixing the namespace comments:
git diff -U0 src > tmp.patch
grepdiff '^} // namespace' --output-matching=hunk tmp.patch > fix_namespace.patch
5. Filter out hunks corresponding to simple fixes into a separate patch:
pcregrep -M -e '^diff[^\n]+\nindex[^\n]+\n--- [^\n]+\n\+\+\+ [^\n]+\n' \
-e '^@@ -[0-9]+ \+[0-9]+ @@[^\n]*\n-\}[^\n]*\n\+\}[^\n]*\n' \
fix_namespace.patch > fix_namespace_simple.patch
6. Manually review the simple patch and then apply it, after first
restoring the source.
git checkout -- src
patch -p1 < fix_namespace_simple.patch
7. Manually review the (simple) changes and then stage the changes
git diff src
git add src
8. Again apply clang-format and filter out hunks related to any
remaining fixes to the namespace, this time filter with more
context. There will be fewer hunks as all the simple cases have
already been handled:
clang-format -i $(find src -name "*.cpp" -or -name "*.h")
git diff src > tmp.patch
grepdiff '^} // namespace' --output-matching=hunk tmp.patch > fix_namespace2.patch
9. Manually review/edit the resulting patch file to remove hunks for files
which need to be dealt with manually, noting the file names and
line numbers. Then restore files to as before applying clang-format
and apply the patch:
git checkout -- src
patch -p1 < fix_namespace2.patch
10. Manually fix the files noted in the previous step. Stage files,
review changes and commit.
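For readers without pcregrep, the "simple hunk" test from step 5 can be sketched in Python. This is only an illustration of the second pattern used there (a zero-context hunk that replaces one line starting with `}` by another); the name `is_simple_hunk` is an assumption, and the sketch does not reproduce the file-header matching or the patch splitting.

```python
import re

# Same shape as the second pcregrep expression in step 5: a hunk header
# without line counts (i.e. exactly one line on each side), followed by one
# removed line and one added line that both start with "}".
SIMPLE_HUNK = re.compile(
    r"@@ -[0-9]+ \+[0-9]+ @@[^\n]*\n"
    r"-\}[^\n]*\n"
    r"\+\}[^\n]*\n"
)


def is_simple_hunk(hunk: str) -> bool:
    """Return True if the whole hunk text is a one-line '}' replacement."""
    return SIMPLE_HUNK.fullmatch(hunk) is not None
```

A hunk such as `@@ -10 +10 @@` followed by `-}` and `+} // namespace lyx` is accepted; anything touching more than one line is left for the manual review in steps 8 to 10.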
The lib/unicodesymbols part is based on work by Günter Milde:
Both \r{A} and \AA (resp. \r{a} and \aa) are equivalent standard LICR macros
for Aring/aring as well as the deprecated "angstrom sign" character (212B).
However, with \AA for 212B and \r{A} for 00C5, tex2lyx converts \AA to the
deprecated "angstrom sign" which is missing in many fonts including the
Unicode version of Latin Modern.
I added the normalize_c() calls, so that tex2lyx prefers the precomposed forms
(these are easier to edit in LyX), and the deprecated flag.
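The effect of the normalization can be checked outside LyX. The sketch below assumes that normalize_c() performs Unicode normalization form C (NFC) and uses Python's `unicodedata` as a stand-in; the helper name `prefer_precomposed` is made up for this example.

```python
import unicodedata

ANGSTROM_SIGN = "\u212b"  # deprecated "angstrom sign"
A_RING = "\u00c5"         # LATIN CAPITAL LETTER A WITH RING ABOVE (precomposed)
A_PLUS_RING = "A\u030a"   # "A" followed by COMBINING RING ABOVE


def prefer_precomposed(text: str) -> str:
    """Stand-in for normalize_c(), assuming it means NFC normalization."""
    return unicodedata.normalize("NFC", text)
```

Under NFC both the angstrom sign (212B) and the decomposed sequence map to the precomposed 00C5, which is the form the text above says tex2lyx should prefer.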
Now all const methods may be called without additional locking.
This is assumed by the threaded LaTeX export, which always uses a globally
unique instance for each encoding.
This branch implements string-wise metrics computation. The goal is to
have both good metrics computation (and fonts with proper kerning and
ligatures) and better performance than what we have with
force_paint_single_char. Moreover there has been some code
factorization in TextMetrics, where the same row-breaking algorithm
was basically implemented 3 times.
Globally, the new code is a bit shorter than the existing one, and it
is much cleaner. There is still a lot of potential for code removal,
especially in the RowPainter, which should be rewritten to use the new
Row information.
The bugs fixed and caused by this branch are tracked at ticket #9003:
http://www.lyx.org/trac/ticket/9003
What is done:
* Make TextMetrics methods operate on Row objects: breakRow and
setRowHeight instead of rowBreakPoint and rowHeight.
* Change breakRow operation to operate at strings level to compute
metrics. The list of elements is stored in the row object in visual
ordering, not logical. This will eventually allow us to get rid of the
Bidi class.
* Rename getColumnNearX to getPosNearX (and change code accordingly).
It does not make sense to return a position relative to the start of
row, since nobody needs this.
* Re-implement cursorX and getPosNearX using row elements.
* Get rid of lyxrc.force_paint_single_char. This was a workaround that
is not necessary anymore.
* Implement proper string metrics computation (with cache). Remove
useless workarounds which disable kerning and ligatures.
* Draw also RtL text string-wise. This speeds up drawing.
* Do not cut strings at selection boundary in RowPainter. This avoids
ligature/kerning breaking in latin text, and bad rendering problems
in Arabic.
* Remove homebrew Arabic and Hebrew support from Encoding.cpp. We now
rely on Qt to handle complex scripts.
* Get rid of LyXRC::rtl_support, which does not have a real use case.
* Fix display of [] and {} delimiters in Arabic scripts.
This is handled by Qt now.
Note that a small optimization (do not draw text that is to the left
of WorkArea) is removed because it cannot be guaranteed to be exact
anymore. It was probably not very useful anyway, and would become
useless once the RowPainter is rewritten to use Row information.
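To illustrate why string-wise metrics with a cache matter for kerning, here is a toy Python sketch. It is not the LyX or Qt implementation: the width and kerning tables are invented numbers, and `string_width` and `charwise_width` are hypothetical names.

```python
from functools import lru_cache

# Invented advance widths and kerning adjustments, in arbitrary units.
ADVANCE = {"A": 7, "V": 7, "o": 5}
KERNING = {("A", "V"): -2, ("V", "A"): -2}


@lru_cache(maxsize=1024)
def string_width(text: str) -> int:
    """Width of the whole string, including kerning between neighbours."""
    width = sum(ADVANCE[c] for c in text)
    width += sum(KERNING.get(pair, 0) for pair in zip(text, text[1:]))
    return width


def charwise_width(text: str) -> int:
    """Width obtained by measuring one character at a time (no kerning)."""
    return sum(string_width(c) for c in text)
```

With these made-up tables, `string_width("AVA")` is 17 while `charwise_width("AVA")` is 21: measuring character by character cannot see the kerning pairs, and the cache keeps repeated measurements of the same string cheap.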
Update 00README_STR_METRICS_BRANCH.
Instead of relying on character range (Hebrew or Arabic) or character
direction, use RLO unicode character (Right-to-Left override) to force
painting in the direction indicated by the current font. This should
be as close as we can get to the old LyX behavior (and requires less
code).
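The idea can be sketched in a few lines of Python. The text above only mentions RLO; the use of LRO for left-to-right text and of a closing PDF character are assumptions of this sketch, as is the function name `force_direction`.

```python
RLO = "\u202e"  # RIGHT-TO-LEFT OVERRIDE
LRO = "\u202d"  # LEFT-TO-RIGHT OVERRIDE (assumed counterpart, not from the text)
PDF = "\u202c"  # POP DIRECTIONAL FORMATTING (assumed terminator)


def force_direction(text: str, rtl: bool) -> str:
    """Wrap text so that a bidi-aware renderer paints it in one direction."""
    return (RLO if rtl else LRO) + text + PDF
```

The string handed to the text renderer then carries its own direction, so no per-character range checks (Hebrew or Arabic) are needed.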
If this code works as intended, it will be possible to remove a lot of
code from Encodings.cpp.
We rely on Qt's built-in Unicode support for handling Arabic and Hebrew
compose characters. This lets us avoid using our homegrown
machinery.
This should provide a nice speedup at a low cost and
will eventually allow us to get rid of:
* most of our Arabic/Hebrew machinery in Encodings.cpp,
* Paragraph::transformChar,
* and probably more.
This fixes bug #8554 and some recently introduced bugs:
- Encodings::fromLaTeXCommand() can now handle all combining characters,
not only the one letter ones
- The remainder returned from Encodings::fromLaTeXCommand() must never be
thrown away in tex2lyx, but output as ERT
- No special case for combining diacritical marks needed anymore in parse_text()
- No special cases for accents and IPA combining diacritical marks needed
anymore in parse_text()
- Special tipa shortcuts may only be recognized if the tipa package is loaded
- Use requirements returned by Encodings::fromLaTeXCommand() instead of
hardcoded registering of tipa and tipax
- Get rid of the name2 variable in parse_text(): We must use name, otherwise
the extra stuff that might have been put into name vanishes
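The rule about the remainder can be illustrated with a simplified Python model. This is not the real Encodings::fromLaTeXCommand(): the table is a hypothetical excerpt, the function names and the (kind, text) chunks are invented, and plain prefix matching ignores TeX tokenization.

```python
# Hypothetical excerpt of a command table; the real data is in lib/unicodesymbols.
COMMANDS = {
    r"\alpha": "\u03b1",
    r"\beta": "\u03b2",
    r'\"{a}': "\u00e4",
}


def from_latex_command(cmd: str) -> tuple[str, str]:
    """Convert the longest convertible prefix; return (symbols, remainder)."""
    symbols = []
    rest = cmd
    while rest:
        known = [k for k in COMMANDS if rest.startswith(k)]
        if not known:
            break  # nothing matches here: everything left is the remainder
        best = max(known, key=len)
        symbols.append(COMMANDS[best])
        rest = rest[len(best):]
    return "".join(symbols), rest


def convert(cmd: str) -> list[tuple[str, str]]:
    """Keep the remainder as an ("ert", ...) chunk instead of dropping it."""
    symbols, rest = from_latex_command(cmd)
    chunks = []
    if symbols:
        chunks.append(("text", symbols))
    if rest:
        chunks.append(("ert", rest))
    return chunks
```

The point of `convert` is only that the second element of the result is never discarded: whatever could not be translated is passed on verbatim, which is the role ERT plays in tex2lyx.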
Provide functions for translating to the LyX name
of an encoding from either a LaTeX name or an Iconv
name, with the possibility to specify the package.
This is in anticipation of changing to use the LyX
name of the encoding in the .lyx file format and
allowing multiple lib/encodings entries to have
the same LaTeX name (but different packages!).
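A lookup of this kind might look as follows in Python. The entries are illustrative only and are not a copy of lib/encodings; the record layout and the function names are assumptions of this sketch, not the functions added by the commit.

```python
from typing import NamedTuple, Optional


class Encoding(NamedTuple):
    lyx_name: str
    latex_name: str
    iconv_name: str
    package: str


# Illustrative entries: two of them share the LaTeX name "utf8"
# and differ only in the package.
ENCODINGS = [
    Encoding("utf8", "utf8", "UTF-8", "inputenc"),
    Encoding("utf8-platex", "utf8", "UTF-8", "japanese"),
    Encoding("utf8-cjk", "UTF8", "UTF-8", "CJK"),
]


def lyx_name_from_latex(latex_name: str,
                        package: Optional[str] = None) -> Optional[str]:
    """First LyX name matching the LaTeX name (and the package, if given)."""
    for enc in ENCODINGS:
        if enc.latex_name == latex_name and package in (None, enc.package):
            return enc.lyx_name
    return None


def lyx_name_from_iconv(iconv_name: str,
                        package: Optional[str] = None) -> Optional[str]:
    """First LyX name matching the iconv name (and the package, if given)."""
    for enc in ENCODINGS:
        if enc.iconv_name == iconv_name and package in (None, enc.package):
            return enc.lyx_name
    return None
```

Without a package the first match wins, which is why the package argument becomes necessary once several entries share one LaTeX name.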
The tex2lyx parser needs to worry about the iconv
name of the input encoding, so store that instead
of the latex name.
These encodings were not defined, since they must not be used as document
encodings (the characters {, } and \ may appear in high bytes, and latex
would be confused). However, they are supported by CJK.sty (which uses a
preprocessor to circumvent the limitations of the latex executable). These
encodings are now defined, but used for import in tex2lyx only.
The test case CJK.tex contained fake tests for shift-jis and big5 (the
Japanese and Chinese characters were entered using the utf8 encoding), and
therefore the wrong interpretation of these encodings looked as if it worked.
The comments about missing iconv support of shift-jis and big5 were wrong as
well (otherwise shift-jis-plain would not work either).
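The high-byte problem mentioned above is easy to reproduce with Python's built-in codecs. The helper name below is invented for this sketch; it reports the characters whose multi-byte encoding contains one of the bytes for {, } or \.

```python
TEX_SPECIAL = b"{}\\"  # the bytes for '{', '}' and '\'


def chars_with_tex_special_bytes(text: str, codec: str) -> list[str]:
    """Characters whose multi-byte encoding contains a TeX-special byte."""
    hits = []
    for ch in text:
        encoded = ch.encode(codec)
        # Single-byte (ASCII) characters are not the problem case here.
        if len(encoded) > 1 and any(b in TEX_SPECIAL for b in encoded):
            hits.append(ch)
    return hits
```

For instance, the katakana ソ is encoded in Shift-JIS as the bytes 0x83 0x5C, and 0x5C is the backslash, so a byte-oriented latex would see a stray control sequence. In utf8 the same character never contains such a byte.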