Different behaviour in regexp{..} for 'İ' and 'ß':
1.) The lowercase routine for 'İ' gives 'İ', so if we are searching
while ignoring case, the string '\dot{I}' is converted to '\dot{i}'.
In this case we have to change it to 'İ' (instead of 'i', as one would expect).
2.) If 'ß' is typed on the keyboard into a freshly created regexp box, it appears as \lyxmathsym{ß};
if pasted from the LyX screen, it appears as \text{ß}.
1.) Added handling for 'breve' and 'grave' accents
2.) Corrected handling for 'i'-accents (allowed \hat{i} _and_ \hat{\imath})
because of problems with ignoring case
3.) Spaces: Changed some indents in source
The problem is that regex is handled as if in math mode. That is,
any accented character is converted to a math macro.
For instance "ä" --> "\\ddot{a}".
Outside of math or regex it is not converted (if the xetex flavour is used),
but there are other chars which are converted in math and in text (but differently)
For instance "ů"
in math --> "\\mathring{u}"
in text --> "\\r{u}"
TODO: determine the still not handled conversions.
It would be nice, if we could persuade math factory to not convert
these characters, but I was unable to find the place where the
conversion actually takes place.
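The two conversions described above (math versus text) can be pictured with a small lookup sketch. This is not how LyX performs the conversion; the tables and the helper `to_macro` are hypothetical, and the text table contains only the single case named in this message.

```python
import unicodedata

# Combining mark -> macro name. Hypothetical tables for illustration only.
MATH_ACCENTS = {
    "\u0308": "ddot",      # diaeresis, as in "ä"
    "\u030a": "mathring",  # ring above, as in "ů"
    "\u0306": "breve",
    "\u0300": "grave",
    "\u0302": "hat",
}
TEXT_ACCENTS = {"\u030a": "r"}  # only the text-mode case mentioned above


def to_macro(ch, table):
    """Return the macro form of an accented char, or ch itself if unknown."""
    base, *marks = unicodedata.normalize("NFD", ch)
    if len(marks) == 1 and marks[0] in table:
        return "\\" + table[marks[0]] + "{" + base + "}"
    return ch


print(to_macro("\u00e4", MATH_ACCENTS))  # \ddot{a}
print(to_macro("\u016f", MATH_ACCENTS))  # \mathring{u}
print(to_macro("\u016f", TEXT_ACCENTS))  # \r{u}
```

A search pattern and the searched text only compare equal if both went through the same table, which is why the math-mode handling of regex matters here.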
1.) Fill the 'head' member to make the macro easier to recognize. May be discarded
later, although it does not take too much run-time
2.) Add some comment
3.) Ignore any macro inside the regex.
1.) Make sure the environment is mentioned in the string for search
(Added the keyword \latexenvironment{...})
2.) Handle it similar to \textcolor{}
That way we can also search for 'conclusion*' or 'summary' etc
in Additional.lyx.
Some macros need:
1.) Take care of case sensitivity
2.) Better handling of used argument values
3.) Cleaner list-environment search
4.) Remove superfluous '~' if searching for description or labeling env
Sometimes the selection contains an area which was skipped
in splitOnKnownMacros().
So we check whether a shorter selection would give the same match size.
Also
a.) try to speed up regex search using non-greedy mode (.*?)
b.) remove '\n' completely in searched strings if it is not surrounded by
alphanumeric chars
The problem here is, that selecting any subset of a \lettrine{}
line always creates an initials header. That makes it impossible
for our search engine to find strings, because the regex does not
contain that info. So we have to discard the leading \lettrine part
completely.
We now place a marker (\endarguments) to determine that removable
part.
This is not possible for '$', because of the latex-meaning to
start/end math inset.
Therefore, if not ignoring format, we still have to use
[\\][\$] in regex in order to find '$' in text.
Given the regex 'r.*r\b' and a string
"abc regular something cursor currently"
we expect to find "regular something cursor".
But while searching we may be confronted with input
"regular something cursor curr"
and so the searched string would be seen longer.
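The effect can be reproduced with any regex engine that treats the end of input as a word boundary. The sketch below uses Python's `re` module as a stand-in; it is not the engine LyX uses, but `\b` behaves the same way at the end of a truncated buffer.

```python
import re

pattern = r"r.*r\b"
full = "abc regular something cursor currently"
truncated = "regular something cursor curr"  # buffer cut off inside "currently"

# With the full text, the last 'r' followed by a word boundary ends "cursor".
in_full = re.search(pattern, full).group()

# In the truncated buffer the end of input counts as a word boundary,
# so the final 'r' of "curr" also qualifies and the match grows.
in_truncated = re.search(pattern, truncated).group()

print(in_full)       # regular something cursor
print(in_truncated)  # regular something cursor curr
```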
Testcase without this patch:
1.) open de/Additional.lyx
2.) goto 6.1 Astronomy & Astrophysics
3.) open the index
4.) find advanced
a.) not ignoring format
b.) regex = .+
c.) language of regex: English
5.) search next
The search finds the next break (which is outside of the index).
The subsequent attempt to display the selection leads to a crash.
This change is valid for findadv too.
Patterns like '.*' now are greedy, like it is normal in regex
Searching for whole words is corrected, but can be slow.
One can speed up the search with adapted pattern.
So for instance searching for words starting and ending with 'r'
the normal pattern is 'r.*r'. The speed-up pattern could be
'\br[^\s]*r\b'. This halves the search time.
Search results are now different to that of lyx2.3, because the greedy
'.*' is now really greedy.
To achieve the same results, we have to use '.*?' instead.
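The difference between the three patterns mentioned above can be checked on a short sample. The sketch uses Python's `re` module purely for illustration; the sample string is an assumption, and nothing is claimed here about timing.

```python
import re

text = "abc regular something cursor currently"

greedy = re.search(r"r.*r", text).group()        # runs to the last 'r' in the text
lazy = re.search(r"r.*?r", text).group()         # stops at the first possible 'r'
words = re.findall(r"\br[^\s]*r\b", text)        # whole words starting and ending with 'r'

print(greedy)  # regular something cursor curr
print(lazy)    # regular
print(words)   # ['regular']
```

The adapted pattern cannot cross whitespace, so the engine has far fewer candidate end positions to try than with '.*'.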
An attempt to reduce the number of tests for a match.
Also an attempt to handle Hebrew documents. Unfortunately,
the latex output is missing the language specification
(only the change of encoding is available there).
I failed to find a proper place to add the lang.
That means searching for e.g. English text in Hebrew documents
is not satisfying.
The time needed to find a simple string, as a function of the
paragraph length n, was O(n^2).
Now it is down to O(n).
Before:
To determine if the pattern matches, we compared the
paragraph from the current position to its end.
Increment the current position if there is no match.
Now:
Check if the character at the current position has at least
the needed features (text, color, language etc).
If not, increment the current position;
else proceed as before.
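The before/after behaviour can be sketched as follows. This is a toy model, not the code in lyxfind.cpp: a paragraph is a list of (char, features) pairs, and `starts_here` is a hypothetical stand-in for the expensive comparison that runs from the current position to the paragraph end.

```python
def starts_here(par, pos, word, needed):
    """Stand-in for the expensive match test at position pos."""
    tail = par[pos:pos + len(word)]
    return (len(tail) == len(word)
            and all(ch == w and needed <= feats
                    for (ch, feats), w in zip(tail, word)))


def find(par, word, needed, prefilter):
    """Return (position, number of expensive tests performed)."""
    tries = 0
    for pos, (_, feats) in enumerate(par):
        if prefilter and not needed <= feats:
            continue  # cheap rejection: this char lacks a needed feature
        tries += 1
        if starts_here(par, pos, word, needed):
            return pos, tries
    return -1, tries


plain, emph = frozenset(), frozenset({"emph"})
par = [(c, plain) for c in "the quick "] + [(c, emph) for c in "fox"]

print(find(par, "fox", emph, prefilter=False))  # (10, 11)
print(find(par, "fox", emph, prefilter=True))   # (10, 1)
```

The saving comes from skipping the expensive test at positions that can never start a match, so it is largest when the wanted format is rare in the paragraph.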
Also added missing math env alignat
Modified handling of longtable/tabular
Added a routine to count valid chars. This is needed
for detection of word boundaries.
Due to detection conflicts
regex '.*' vs match of word-boundaries in MatchStringAdv::operator()
we need to use '\b' in regex explicitly. E.g. '\b.*\b'
The backward search works, but
1.) only in current paragraph (this is the same as before)
2.) only in the same language environment.
1.) Added \textmd to be ignored (sometimes it is used and sometimes not)
2.) Typo: multiline --> multline. Searching in 'multline' caused a crash
because processing all of the '{' and '}' in the content of this math
exceeded the size of the interval field.
1.) Handle some unclosed parentheses
Sometimes \shortcut is not correctly closed
2.) Added \ldots as known char
3.) Discard some shapes (circlepar, droppar, ...)
4.) Omit resulting empty string and use some value
which cannot be matched instead
That way we do not match the whole table but only the cell contents.
The problem I had was
1.) Document language Spanish
2.) Table (copied from English doc) => language English
3.) All cell contents Spanish
Now search for English text led to a selection of the whole
table, although there was no English content in any cell.
The problem was that the different list environments
did not look different in the latex output used for
search.
So the input of "\item ..." did not give information
whether it is description, lyxlist, enumeration or labeling.
In search mode we now use "\item{enumeration}" etc.
Exception: findadv-21, but it is not a regression,
because this one never passed.
The problem here is, that we cannot differentiate
between enumeration, itemize, description and labeling
environment here.
Now tests findadv-01 ... findadv-20 pass too.
keytest.py: Expanded time for control keys (like \[Return])
findadv*: expanded time for normal keys
lyxfind.cpp: Handle math equations
As it is now, searching with format needs ALL the features set
in order to match the pattern.
What needs to be done is a GUI specifying which of the features are
important.
1.) language
2.) font (series, shape)
3.) markup, underline, strikeout
4.) color
Having this info, the implementation is easy. Set
some variables and be done
Further normalize the latex input in case of enabled format search.
It was not enough to split the latex input on \foreignlanguage and \textcolor
macros only.
Instead, macros like \texttt or \noun etc. also had to be accounted for.
This patch uses therefore a different algorithm.
In the latexified text:
* Check and handle contained regex properly
* Discard superfluous '{' preventing our search engine
to match with the search pattern
Our findadv expects something like
prefix + 'search'
so that the regex (which is latexified too)
can work on 'search'
(In the source, the prefix is denoted by lead_as_string)
The latex output contains structs like
\foreignlanguage{abc}{xx\textbf{boldxx\textcolor{blue}{blue 1 blue 2} XX}}
which would never match the simple prefix.
Now the above is converted to
\foreignlanguage{abc}{xx}\\
\foreignlanguage{abc}{\textbf{boldxx}}
\foreignlanguage{abc}{\textbf{\textcolor{blue}{blue 1 blue 2}}}\\
\foreignlanguage{abc}{\textbf{ XX}}
Of course, more than one language or color in an inset can be searched for now.
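The conversion shown above can be sketched as a small flattening pass. This is an illustration only, not the algorithm in lyxfind.cpp: the function `flatten` and the `PARAM_MACROS` table are hypothetical, and the sketch assumes well-formed input in which every macro has exactly one text argument, optionally preceded by one parameter argument.

```python
import re

# Macros whose first braced argument is a parameter (language, colour)
# rather than text. Hypothetical table for this sketch.
PARAM_MACROS = {"foreignlanguage", "textcolor"}
MACRO_RE = re.compile(r"\\([A-Za-z]+)")


def flatten(latex):
    """Return one fully wrapped string per run of uniformly formatted text."""
    out, stack, text = [], [], []

    def flush():
        if text:
            out.append("".join(stack) + "".join(text) + "}" * len(stack))
            text.clear()

    i = 0
    while i < len(latex):
        m = MACRO_RE.match(latex, i)
        if m and latex.startswith("{", m.end()):
            flush()                       # format changes: close current run
            prefix = m.group(0) + "{"
            i = m.end() + 1
            if m.group(1) in PARAM_MACROS:
                close = latex.index("}", i)
                prefix += latex[i:close] + "}{"
                i = close + 2             # skip the "}{" between the arguments
            stack.append(prefix)
        elif latex[i] == "}" and stack:
            flush()
            stack.pop()
            i += 1
        else:
            text.append(latex[i])
            i += 1
    flush()
    return out


src = r"\foreignlanguage{abc}{xx\textbf{boldxx\textcolor{blue}{blue 1 blue 2} XX}}"
for line in flatten(src):
    print(line)
```

Each output line carries its complete format prefix, so a simple "prefix + 'search'" comparison can succeed on it.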
Modified language handling
Still, there are problems, because sometimes the search pattern
does not contain the requested info. So the 'find' often fails
for strings inside a list environment.
The change is significant if the search format is not disabled.
We try to analyze the pattern string first to get needed features
for the search.
We try to analyse the searched string and if it does not
contain all expected features (color, language, char style, char decoration)
Still some problems though
* Added textsl, texttt, uline, uuline, sout, xout to the list of possible
leading strings.
* Account for correct number of open braces in regex.
Now the search works for enabled format too.
This is hopefully the last amend
Adapt the positional references in regex supplied by user
so that for instance '([a-z]+)\s\1' to find identical words in sequence
is changed to '([a-z]+)\s\2'.
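A renumbering of this kind is needed whenever the user's pattern ends up behind additional capture groups. The sketch below assumes exactly one extra enclosing group, which is an assumption of this example and not a statement about how lyxfind.cpp builds its final regex; `shift_backrefs` is a hypothetical helper and does not handle escaped backslashes such as '\\1'.

```python
import re


def shift_backrefs(pattern, offset):
    """Add offset to every numeric back-reference (\\1, \\2, ...) in pattern."""
    return re.sub(
        r"\\([1-9][0-9]*)",
        lambda m: "\\" + str(int(m.group(1)) + offset),
        pattern,
    )


user = r"([a-z]+)\s\1"
shifted = shift_backrefs(user, 1)
print(shifted)  # ([a-z]+)\s\2

# Wrapped in one extra group, the shifted reference points at the user's group again.
wrapped = "(" + shifted + ")"
print(re.search(wrapped, "a test test here").group(1))  # test test
```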
This is slightly better, but still not satisfying.
Enable format search
Given the latexified string
\emph{Fox jUMps}
and using emphasized regex '\w*', we find 'Fox'. That is OK.
But the next find finds ' ', which is not OK.
In contrast, searching with '\w+', we find the correct string 'jUMps'.
If searching for instance '.+' , the found string expanded
to the end of search buffer. So we have to replace
'.' with '[^\}]'.
Also all constructs like '[^abc]' had to be changed to '[^abc\}]'
to not go behind the actual format.
There is still a problem using '*', but constructs using '+' seem to work now.
('.*' finds everything from the first char in the correct format
up to and including the end of the next format change,
while '.+' finds _only_ characters in the correct format)
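Why '.' has to become '[^\}]' can be seen on the latexified string itself. The sketch uses Python's `re` module as a stand-in engine, and the trailing " plain text" in the buffer is an assumption added to make the overrun visible.

```python
import re

buf = r"\emph{Fox jUMps} plain text"

# '.' is free to run past the closing brace of \emph{...}
overrun = re.search(r"\\emph\{(.+)", buf).group(1)

# Excluding '}' keeps the match inside the emphasized run
inside = re.search(r"\\emph\{([^\}]+)", buf).group(1)

# The same applies to negated classes: '[^abc]' must become '[^abc\}]'
negated = re.search(r"\\emph\{([^abc\}]+)", buf).group(1)

print(overrun)  # Fox jUMps} plain text
print(inside)   # Fox jUMps
print(negated)  # Fox jUMps
```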
The part of the code that removed space at the start of a paragraph has been
there forever, but its intent is unclear. For example, cutting text at
the end of a paragraph would remove the space at the start of this
same paragraph.
The removal of this functionality is offset by a rewrite of DEPM that
makes it more thorough.
Fixes bug #10503.
This commit does a bulk fix of incorrect annotations (comments) at the
end of namespaces.
The commit was generated by initially running clang-format, and then
from the diff of the result extracting the hunks corresponding to
fixes of namespace comments. The changes being applied and all the
results have been manually reviewed. The source code successfully
builds on macOS.
Further details on the steps below, in case they're of interest to
someone else in the future.
1. Checkout a fresh and up to date version of src/
git pull && git checkout -- src && git status src
2. Ensure there's a suitable .clang-format in place, i.e. with options
to fix the comment at the end of namespaces, including:
FixNamespaceComments: true
SpacesBeforeTrailingComments: 1
and that clang-format is >= 5.0.0, by doing e.g.:
clang-format -dump-config | grep Comments:
clang-format --version
3. Apply clang-format to the source:
clang-format -i $(find src -name "*.cpp" -or -name "*.h")
4. Create and filter out hunks related to fixing the namespace
git diff -U0 src > tmp.patch
grepdiff '^} // namespace' --output-matching=hunk tmp.patch > fix_namespace.patch
5. Filter out hunks corresponding to simple fixes into a separate patch:
pcregrep -M -e '^diff[^\n]+\nindex[^\n]+\n--- [^\n]+\n\+\+\+ [^\n]+\n' \
-e '^@@ -[0-9]+ \+[0-9]+ @@[^\n]*\n-\}[^\n]*\n\+\}[^\n]*\n' \
fix_namespace.patch > fix_namespace_simple.patch
6. Manually review the simple patch and then apply it, after first
restoring the source.
git checkout -- src
patch -p1 < fix_namespace_simple.patch
7. Manually review the (simple) changes and then stage the changes
git diff src
git add src
8. Again apply clang-format and filter out hunks related to any
remaining fixes to the namespace, this time filter with more
context. There will be fewer hunks as all the simple cases have
already been handled:
clang-format -i $(find src -name "*.cpp" -or -name "*.h")
git diff src > tmp.patch
grepdiff '^} // namespace' --output-matching=hunk tmp.patch > fix_namespace2.patch
9. Manually review/edit the resulting patch file to remove hunks for files
which need to be dealt with manually, noting the file names and
line numbers. Then restore files to as before applying clang-format
and apply the patch:
git checkout src
patch -p1 < fix_namespace2.patch
10. Manually fix the files noted in the previous step. Stage files,
review changes and commit.
Updating all previews (even if only one has changed) is more costly
than I thought. Thanks to Guillaume for tracking down this
performance issue.
This reversion is related to the reversions at 358745d0 and
a7a14395. See also #7242 and #9855.
This reverts commit 29948eec26.
Updating all previews (even if only one has changed) is more costly
than I thought. Thanks to Guillaume for tracking down this
performance issue.
This reversion is related to the reversion at 358745d0.
See also #7242 and #9855.
This reverts commit 66f527e417.
WriteStream is now built from an otexstream instead of an odocstream, and
therefore counts lines in a TexRow. Calls to TexRow are added in relevant places
in math insets.
This finishes adding line tracking for math in the source panel and for forward
search.
* TexRow now computes rows from a DocIterator. In practice, the cursor
highlighting is now correct inside insets, it is no longer restricted to the
topmost level. It certainly also makes forward-search more precise.
* Added the option to disable a texrow when not needed, for perf.
* Fixed a bug where the last paragraph was not properly highlighted.
Limitations:
* TexRow still does not handle: math (e.g. multi-cell), sub-captions, inset
arguments.
These were all flagged by "(style) The scope of the variable 'x' can be reduced."
Narrowing the scope improves readability, and if it is in a loop then the
compiler will be clever enough to produce efficient code, we do not need
manual optimization for POD types.
Newer boost versions use complicated type traits for boost::next and
boost::prior, which do not work with the RandomAccessList iterators.
The long term solution is to use std::next and std::prev, for now supply
simple replacements for compilers that do not support C++11 yet.
lyxfind.cpp(findNextChange, findPreviousChange, findChange, selectChange): factor the change-selection part out of the change-finding part
Text.cpp (acceptOrRejectChanges): call only selectChange
The problem is the use of cursor movement methods to update cursor.
Cursor::forwardPos() steps into insets, which is not always what we
want. The problem here is that there is a math inset just after the
accepted change, and that the cursor steps into it for some reason.
This code is a nightmare anyway.
Fixes: bug #9145
The two fixes here are obviously right, although it is not clear why they are sufficient to fix the bug. Anyway, I cannot reproduce any crash with them.
* the first part just conditions a whole if/else to change_next_pos.changed(). Originally, only the if branch was concerned.
* the second part is to avoid calling CursorSlice::backwardPos() when position is 0. Doing this leads to an assertion.
each failure.
There are several places I was not sure what to do. These are marked
by comments beginning "LASSERT:" so they can be found easily. At the
moment, they are at:
Author.cpp:105: // LASSERT: What should we do here?
Author.cpp:121: // LASSERT: What should we do here?
Buffer.cpp:4525: // LASSERT: Is it safe to continue here, or should we just return?
Cursor.cpp:345: // LASSERT: Is it safe to continue here, or should we return?
Cursor.cpp:403: // LASSERT: Is it safe to continue here, or should we return?
Cursor.cpp:1143: // LASSERT: There have been several bugs around this code, that seem
CursorSlice.cpp:83: // LASSERT: This should only ever be called from an InsetMath.
CursorSlice.cpp:92: // LASSERT: This should only ever be called from an InsetMath.
LayoutFile.cpp:303: // LASSERT: Why would this fail?
Text.cpp:995: // LASSERT: Is it safe to continue here?
Using Cursor::setCursor or even BufferView::setCursor is often a bad
idea since it does not run DEPM. In this case (and other cases in
f&replace code) it is better to use BufferView::mouseSetCursor (which
should maybe be renamed...).
When using 'find' and a string is not found, this is not an error or a
surprising event. It is often expected (e.g. after searching through
the whole document for a certain string eventually you will get this
message). The exclamation mark should be reserved for messages that
are unexpected or that need extra attention, such as errors.