Warning was as follows:
src/tex2lyx/Parser.cpp:898:39: error: adding 'uint32_t' (aka 'unsigned int') to a string does not append to the string [-Werror,-Wstring-plus-int]
warning_message("ignoring a char: " + static_cast<uint32_t>(c));
~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~~
src/tex2lyx/Parser.cpp:898:39: note: use array indexing to silence this warning
warning_message("ignoring a char: " + static_cast<uint32_t>(c));
^
& [ ]
1 error generated.
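The fix is the usual one for -Wstring-plus-int: build a real string from the
integer instead of adding the integer to the string literal (which only
offsets the pointer). A minimal sketch with a hypothetical call site; the
actual commit may differ:

    #include <cstdint>
    #include <string>

    void warning_message(std::string const & msg);

    void report_ignored(char32_t c)  // char32_t stands in for LyX's char_type
    {
        // std::to_string produces e.g. "ignoring a char: 65" instead of
        // silently indexing into the string literal.
        warning_message("ignoring a char: "
                        + std::to_string(static_cast<uint32_t>(c)));
    }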
This adds native macros for subindexes (!level), |see and |seealso,
as well as native support for ranges |( |) and pagination format
-- e.g., |textbf -- via the index dialog.
Resolves #12478, #7232 and #5014.
The feature is complete (incl. tex2lyx) except for
* file format change and lyx2lyx
* docbook/xhtml
* documentation
This commit does a bulk fix of incorrect annotations (comments) at the
end of namespaces.
The commit was generated by first running clang-format, and then
extracting from the resulting diff the hunks corresponding to
fixes of namespace comments. The applied changes and all the
results have been manually reviewed. The source code successfully
builds on macOS.
Further details on the steps below, in case they're of interest to
someone else in the future.
1. Check out a fresh and up-to-date version of src/:
git pull && git checkout -- src && git status src
2. Ensure there's a suitable .clang-format in place, i.e. with options
to fix the comment at the end of namespaces, including:
FixNamespaceComments: true
SpacesBeforeTrailingComments: 1
and that clang-format is >= 5.0.0, by doing e.g.:
clang-format -dump-config | grep Comments:
clang-format --version
3. Apply clang-format to the source:
clang-format -i $(find src -name "*.cpp" -or -name "*.h")
4. Create a diff and filter out the hunks related to fixing the namespace comments:
git diff -U0 src > tmp.patch
grepdiff '^} // namespace' --output-matching=hunk tmp.patch > fix_namespace.patch
5. Filter out hunks corresponding to simple fixes into a separate patch:
pcregrep -M -e '^diff[^\n]+\nindex[^\n]+\n--- [^\n]+\n\+\+\+ [^\n]+\n' \
-e '^@@ -[0-9]+ \+[0-9]+ @@[^\n]*\n-\}[^\n]*\n\+\}[^\n]*\n' \
fix_namespace.patch > fix_namespace_simple.patch
6. Manually review the simple patch and then apply it, after first
restoring the source:
git checkout -- src
patch -p1 < fix_namespace_simple.patch
7. Manually review the (simple) changes and then stage them:
git diff src
git add src
8. Again apply clang-format and filter out hunks related to any
remaining fixes to the namespace comments, this time filtering
with more context. There will be fewer hunks, as all the simple
cases have already been handled:
clang-format -i $(find src -name "*.cpp" -or -name "*.h")
git diff src > tmp.patch
grepdiff '^} // namespace' --output-matching=hunk tmp.patch > fix_namespace2.patch
9. Manually review/edit the resulting patch file to remove hunks for files
that need to be dealt with manually, noting the file names and
line numbers. Then restore the files to their state before clang-format
was applied, and apply the patch:
git checkout src
patch -p1 < fix_namespace2.patch
10. Manually fix the files noted in the previous step. Stage files,
review changes and commit.
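For illustration, this is the kind of change FixNamespaceComments makes (a
made-up example, not a hunk from the actual commit):

    // Before: annotation missing or stale
    namespace lyx {
    namespace frontend {
    void foo();
    } // end frontend
    }

    // After clang-format with FixNamespaceComments: true
    namespace lyx {
    namespace frontend {
    void foo();
    } // namespace frontend
    } // namespace lyx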
writer2latex surrounds quotes with braces, which we skip to avoid useless ERT.
I broke this in 25fe87e5, which made Parser::next_next_token() not recognize
that it needed to parse one more token. If we had had a unit test for the
Parser class, it would probably have detected that. Now we have at least a
test for the special quote.
If we call tex2lyx on a temporary file created from the clipboard, the
file is always in utf8 encoding, and that encoding never changes, even if
the file contains encoding-changing LaTeX commands. Therefore, we must tell
tex2lyx to use a fixed utf8 encoding for the whole file, and this is done
using the new latexclipboard format. Previously, tex2lyx assumed the
encoding was latin1.
As a side effect, the -e option is now also documented in the man page.
I added it because of a misunderstanding. If the encoding is really set too
late, to_utf8() in the Token constructor would probably fail, but a non-empty
putback buffer is no problem, since it was always created via to_utf8() ->
from_utf8().
This is achieved by not calling Parser::tokenize_one() anymore in
Parser::good(): the status of the input can be tested without performing the
actual tokenizing. Now there are only two methods that may prevent an encoding
change: next_token() and next_next_token().
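The idea in a toy sketch (member names are illustrative, not the exact
implementation):

    #include <iostream>
    #include <string>
    #include <vector>

    class Parser {
    public:
        explicit Parser(std::istream & is) : is_(is), pos_(0) {}

        // good() must not tokenize: tokenizing consumes bytes and thereby
        // fixes their encoding before a pending encoding change is seen.
        bool good()
        {
            if (pos_ < tokens_.size())
                return true;  // buffered tokens remain
            return is_.good()
                && is_.peek() != std::char_traits<char>::eof();
        }

    private:
        std::istream & is_;
        std::vector<std::string> tokens_;  // already tokenized input
        size_t pos_;                       // index of the next token
    };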
Jean-Marc discovered a possible data loss caused by Parser::getChar().
This is now fixed, and Parser::getChar() is removed, since it is no longer
needed and was easy to use in the wrong way.
When switching catcodes (for example to verbatim), there are in general some
tokens that have already been parsed according to the old catcodes. It is
therefore necessary to "forget" those tokens and reparse them.
The basic idea is to use idocstream::putback() to feed the characters back to
the stream, but this does not work (probably because we disable buffering).
Therefore, we implement a simple stream-like class that implements putback().
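A minimal sketch of such a class, using plain std::istream and char for
simplicity (the real class works on idocstream and char_type):

    #include <istream>
    #include <string>

    // Stream wrapper with an unlimited putback buffer; get() drains the
    // buffer before reading from the underlying stream.
    class PutbackStream {
    public:
        explicit PutbackStream(std::istream & is) : is_(is) {}

        bool get(char & c)
        {
            if (!buffer_.empty()) {
                c = buffer_.back();
                buffer_.pop_back();
                return true;
            }
            return static_cast<bool>(is_.get(c));
        }

        // LIFO: the last character put back is the first returned by get().
        void putback(char c) { buffer_.push_back(c); }

        // Push a whole string reversed, so get() returns the characters
        // in their original order.
        void putback(std::string const & s)
        {
            buffer_.append(s.rbegin(), s.rend());
        }

    private:
        std::istream & is_;
        std::string buffer_;  // used as a stack
    };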
This is a refactoring commit, which may introduce minor regressions, but
should go in the right direction.
* introduce Parser::verbatimEnvironment, which is a wrapper around verbatimStuff
* introduce output_ert, which inserts a string as ERT; this allows us to factor
  out several pieces of duplicated code (see the sketch below, after the TODO list)
* rename handle_ert to output_ert_inset: this creates an inset and then calls output_ert
* remove handle_comment (use output_ert_inset instead)
* remove handle_backslash (unused now)
* use the new methods in handle_listing
TODO:
* use for other verbatim-like cases (Sweave chunk, url...)
* maybe implement Parser::unparse to reparse tokens already parsed with the wrong catcodes (if needed)
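A sketch of the output_ert / output_ert_inset split referred to above
(simplified: the real functions also take a Context and handle line breaks
and special characters):

    #include <iostream>
    #include <string>

    // Write a string as ERT contents.
    void output_ert(std::ostream & os, std::string const & s)
    {
        os << s << '\n';
    }

    // Create an ERT inset and delegate its contents to output_ert.
    void output_ert_inset(std::ostream & os, std::string const & s)
    {
        os << "\\begin_inset ERT\nstatus collapsed\n\n";
        output_ert(os, s);
        os << "\\end_inset\n";
    }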
- Implement catcode setting in Parser
- add a new Parser::verbatimStuff method that reads verbatim contents (see
  the sketch after the TODO list below)
- use this method to parse the "verbatim" environment.
- use it to parse \verb too.
- rename Parser::verbatimEnvironment to ertEnvironment.
TODO:
- use for other verbatim-like cases (Sweave chunk, lstlisting...)
- factor out the function that outputs ERT (including line breaks)
- maybe implement Parser::unparse (if needed)
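What verbatimStuff does, as a standalone sketch: read raw characters with no
tokenization until a terminating string is seen (for the "verbatim"
environment the terminator would be "\end{verbatim}", for \verb the single
delimiter character):

    #include <istream>
    #include <string>

    // Returns everything up to (and excluding) the terminator.
    std::string verbatimStuff(std::istream & is, std::string const & end)
    {
        std::string out;
        char c;
        while (is.get(c)) {
            out += c;
            if (out.size() >= end.size()
                && out.compare(out.size() - end.size(), end.size(), end) == 0) {
                out.erase(out.size() - end.size());
                break;
            }
        }
        return out;
    }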
Provide functions for translating to the LyX name
of an encoding from either a LaTeX name or an Iconv
name, with the possibility to specify the package.
This is in anticipation of changing to use the LyX
name of the encoding in the .lyx file format and
allowing multiple lib/encodings entries to have
the same LaTeX name (but different packages!).
The tex2lyx parser needs to worry about the iconv
name of the input encoding, so store that instead
of the latex name.
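A toy sketch of such lookups (the table entries and function names are
invented here; the real data comes from lib/encodings):

    #include <string>

    enum class Package { none, CJK, japanese };

    struct EncodingInfo {
        std::string lyxName;    // name used in the .lyx file format
        std::string latexName;  // name used by inputenc/CJK.sty
        std::string iconvName;  // name understood by iconv
        Package package;
    };

    // LaTeX names may collide across packages, so the package takes
    // part in the LaTeX-name lookup.
    EncodingInfo const encodings[] = {
        {"utf8",      "utf8", "UTF-8",  Package::none},
        {"euc-jp",    "euc",  "EUC-JP", Package::japanese},
        {"shift-jis", "sjis", "SJIS",   Package::japanese},
    };

    std::string lyxNameFromLaTeX(std::string const & latexName, Package p)
    {
        for (EncodingInfo const & e : encodings)
            if (e.latexName == latexName && e.package == p)
                return e.lyxName;
        return std::string();
    }

    std::string lyxNameFromIconv(std::string const & iconvName)
    {
        for (EncodingInfo const & e : encodings)
            if (e.iconvName == iconvName)
                return e.lyxName;
        return std::string();
    }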
These encodings were not defined, since they must not be used as document
encodings (the characters {, } and \ may appear in high bytes, and latex
would be confused). However, they are supported by CJK.sty (which uses a
preprocessor to circumvent the limitations of the latex executable). These
encodings are now defined, but used for import in tex2lyx only.
The test case CJK.tex contained fake tests for shift-jis and big5 (the
Japanese and Chinese characters were entered using the utf8 encoding), and
therefore the wrong interpretation of these encodings looked as if it worked.
The comments about missing iconv support of shift-jis and big5 were wrong as
well (otherwise shift-jis-plain would not work either).
Commit 7cfac95 got rid of empty lines that were created by removing \usepackage
statements. However, it added an additional newline in case the \usepackage
was not at the end of the line. This is now fixed.
The old fix was incomplete (\verb~\~ was translated to \verb~~ in roundtrip).
The real cause of this bug (and also of the mistranslation of \href{...}{\}})
was the misbehaviour of Token::character() (see the comment in Parser.h): this
method returns a character even if the category is catEscape, and this is not
wanted in most (all?) cases.
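The gist of the fix, sketched with simplified members: character() must not
report a character for escape tokens, so callers cannot mistake the backslash
introducing a command for ordinary text:

    #include <string>

    enum CatCode { catEscape, catOther /* ... */ };

    class Token {
    public:
        Token(std::string const & cs, CatCode cat) : cs_(cs), cat_(cat) {}

        // An escape token represents a command, not a printable
        // character, so return 0 for it.
        char character() const
        {
            return cat_ == catEscape || cs_.empty() ? 0 : cs_[0];
        }

    private:
        std::string cs_;
        CatCode cat_;
    };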
- Parser.cpp: \verb can have any character as delimiter (except ASCII letters), not only '+'; therefore partly revert [3943b887/lyxgit] and fix it for all cases
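A standalone sketch of delimiter handling for \verb under that rule (any
non-letter character opens the argument and the same character closes it):

    #include <cctype>
    #include <istream>
    #include <string>

    // Read the argument of \verb; returns false on an invalid delimiter
    // or an unterminated argument.
    bool parseVerb(std::istream & is, std::string & contents)
    {
        char delim;
        if (!is.get(delim) || std::isalpha(static_cast<unsigned char>(delim)))
            return false;  // ASCII letters are not valid delimiters
        contents.clear();
        char c;
        while (is.get(c) && c != delim)
            contents += c;
        return static_cast<bool>(is);  // false if EOF hit before delim
    }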