This is not possible for '$', because of the latex-meaning to
start/end math inset.
Therefore, if not ignoring format, we still have to use
[\\][\$] in regex in order to find '$' in text.
Given the regex 'r.*r\b' and a string
"abc regular something cursor currently"
we expect to find "regular something cursor".
But while searching we may be confronted with input
"regular something cursor curr"
and so the searched string would be seen longer.
Testcase without this patch:
1.) open de/Additional.lyx
2.) goto 6.1 Astronomy & Astrophysics
3.) open the index
4.) find advaced
a.) not ignoring format
b.) regex = .+
c.) language of regex: English
4.) search next
The seach finds the next break (which is outside of the index)
The following try to display the selection leads to crash
This change is valid for findadv too.
Patterns like '.*' now are greedy, like it is normal in regex
Searching for whole words is corrected, but can be slow.
One can speed up the search with adapted pattern.
So for instance searching for words starting and ending with 'r'
the normal pattern is 'r.*r'. The speed-up pattern could be
'\br[^\s]*r\b'. This halves the search time.
Search results are now different to that of lyx2.3, because the greedy
'.*' is now really greedy.
To achive the same results, we have to use '.*?' instead.
A try to decrement the number of tests for a match.
Also a try to handle Hebrew documents. Unfortunatelly
the latex output is missing the language specification
(only the change of encoding is available there).
I failed to find a proper place to add the lang.
That means, searching for e.g. English text in Hebrew documents
is not satisfying.
The needed time to find a simple string dependes on the
paragraph length was O(n^2)
Now it is down to O(n).
Before:
To determine if the pattern matches we compared the
paragraph from current position to the its end.
Increment current position if no match
Now:
Check if the character at current position has at least
the needed features (text, color, language etc)
If not, Increment current position
else proceed as before
Also added missing math env alignat
Modified handling of longtable/tabular
Added a routine to count for valid chars. This is needed
for detection of word boundaries.
Due to detection conflicts
regex '.*' vs match of word-boundaries in MatchStringAdv::operator()
we need to use '\b' in regex explicitly. E.g. '\b.*\b'
The backward search works, but
1.) only in current paragraph (this is the same as before)
2.) only in the same language environment.
1.) Added \textmd to be ignored (sometimes it is used and sometimes not)
2.) Typo: multiline --> multline. Searching in 'multline' caused a crash
because processing all of the '{' and '}' in the content of this math
exceeded the size of the interval field.
1.) Handle some unclosed parentheses
Sometimes \shortcut is not correctly closed
2.) Added \ldots as known char
3.) Discard some shapes (circlepar, droppar, ...)
4.) Omit resulting empty string and use some value
which cannot be matched instead
That way we do not match the whole table but only the cell contents.
The problem I had was
1.) Document language Spanish
2.) Table (copied from English doc) => language English
3.) All cell contents Spanish
Now search for English text led to a selection of the whole
table, although there was no English content in any cell.
The problem was, that the different list ennvironments
did not look different in tha latex output used for
search.
So the input of "\item ..." did not give information
if it is description, lyxlist, enumeration or labeling.
In search modus we use now "\item{enumeration}" etc.
Exception: findadv-21, but it is not a regression,
because this one never passed.
The problem here is, that we cannot differentiate
between enumeration, itemize, description and labeling
environment here.
Now tests findadv-01 ... findadv-20 pass too.
keytest.py: Expanded time for controll keys (like \[Return])
findadv*: expanded time for normal keys
lyxfind.cpp: Handle math equations
As it is now, searching with format needs ALL the features set
in order to match the pattern.
What needs to be done is a GUI specifying which of the features are
important.
1.) language
2.) font (series, shape)
3.) markup, underline, strikeout
4.) color
Having this info, the implementation is easy. Set
some variables and be done
Further normalize the latex input in case of enabled format search.
It was not enough to split the latex input on \foreignlanguage and \textcolor
macros only.
Instead also macros like \textt, or \noun etc had to be accounted for.
This patch uses therefore a different algorithm.
In the latexified text:
* Check and handle contained regex properly
* Discard superfluos '{' preventing our search engine
to match with the search pattern
Our findadv expects something like
prefix + 'search'
so that the regex (which is latexified too)
can work on 'search'
(In the source, the prefix is denoted by lead_as_string)
The latex output contains structs like
\foreignlaguage(abc}{xx\textbf{boldxx\textcolor{blue}{blue 1 blue 2} XX}}
which would never match the simple prefix.
Now the above is converted to
\foreignlaguage(abc}{xx}\\
\foreignlaguage(abc}{\textbf{boldxx}}
\foreignlaguage(abc}{\textbf{\textcolor{blue}{blue 1 blue 2}}}\\
\foreignlaguage(abc}{\textbf{ XX}}
Of course, more than one language or color in an inset can be searched for now.
Modified language handling
Still, there are problems, because sometimes the search pattern
does not contain the the requested info. So the 'find' often fails
for strings inside a list environment.
The change is significant if the search format is not disabled.
We try to analyze the pattern string first to get needed features
for the search.
We try to analyse the searched string and if it does not
contain all expected featers (color, language, char style, char decoration)
Still some problems though
* Added textsl, texttt, uline, uuline, sout, xout to the list of possible
leading strings.
* Account for correct number of open braces in regex.
Now the search works for enbled format too.
This is hopefully the last amend
Adapt the positional references in regex supplied by user
so that for instance '([a-z]+)\s\1' to find identical words in sequence
is changed to '([a-z]+)\s\2'.