1.) Use vector for borders, because any value may be too small
if there are plenty of accented characters in a paragraph
2.) use '[\S]' instead of '.' in regex for 'accre'. The regex would
otherwise find also patterns like '\ {some text}'
Different behaviour in regexp{..} for 'İ' and 'ß':
1.) lowercase routine for 'İ' gives 'İ', so that if we are searching
while ignoring case, the string '\dot{I}' is converted to '\dot{i}'.
In this case we have to change it to 'İ' (instead of 'i', as one would expect).
2.) If 'ß' is inserted via keybord on fresh created regexp box it appears as \lyxmathsym{ß},
if pasted from the lyx-screen it appears as \text{ß}
1.) Added for 'breve' and 'grave' accents
2.) Corrected handling for 'i'-accents (allowed \hat{i} _and_ \hat{\imath})
because of problems with ignoring case
3.) Spaces: Changed some indents in source
The problem is the handling of regex as using math-mode. That is
any accented character is converted to a math macro.
For instance "ä" --> "\\ddot{a}".
Outside of math or regex it is not converted (if used xetex flavour),
but there are other chars which are converted in math and in text (but differently)
For instance "ů"
in math --> "\\mathring{u}"
in text --> "\\r{u}"
TODO: determine the still not handled conversions.
It would be nice, if we could persuade math factory to not convert
these characters, but I was unable to find the place where the
conversion actually takes place.
1.) Fill the 'head'-member to easier recognize the macro. May be discarded
later, although it does not take too much run-time
2.) Add some comment
3.) Ignore any macro inside the regex.
1.) Make sure the environment is mentioned in the string for search
(Added the keyword \latexenvironment{...})
2.) Handle it similar to \textcolor{}
That way we can also search for 'conclusion*' or 'summary' etc
in Additional.lyx.
Some macros need:
1.) Take care of case sensitivity
2.) Better handling of used argument values
3.) Cleaner list-environment search
4.) Remove superfluous '~' if searching for description or labeling env
Sometime it happen that the selection contains area which was skipped
in splitOnKnownMacros().
So we check, if a shorter selection would give the same mach size.
Also
a.) try to speed up regex search using non-greedy mode (.*?)
b.) remove '\n' completely in searched strings if it is not surrounded with
aplanumerical chars
The problem here is, that selecting any subset of a \lettrine{}
line always creates an initials header. That makes it impossible
to our search engine to find strings, because the regex does not
contain that info. So we have to discard the leading \lettrine part
completely.
We place now a marker (\endarguments) to determine that removable
part.
This is not possible for '$', because of the latex-meaning to
start/end math inset.
Therefore, if not ignoring format, we still have to use
[\\][\$] in regex in order to find '$' in text.
Given the regex 'r.*r\b' and a string
"abc regular something cursor currently"
we expect to find "regular something cursor".
But while searching we may be confronted with input
"regular something cursor curr"
and so the searched string would be seen longer.
Testcase without this patch:
1.) open de/Additional.lyx
2.) goto 6.1 Astronomy & Astrophysics
3.) open the index
4.) find advaced
a.) not ignoring format
b.) regex = .+
c.) language of regex: English
4.) search next
The seach finds the next break (which is outside of the index)
The following try to display the selection leads to crash
This change is valid for findadv too.
Patterns like '.*' now are greedy, like it is normal in regex
Searching for whole words is corrected, but can be slow.
One can speed up the search with adapted pattern.
So for instance searching for words starting and ending with 'r'
the normal pattern is 'r.*r'. The speed-up pattern could be
'\br[^\s]*r\b'. This halves the search time.
Search results are now different to that of lyx2.3, because the greedy
'.*' is now really greedy.
To achive the same results, we have to use '.*?' instead.
A try to decrement the number of tests for a match.
Also a try to handle Hebrew documents. Unfortunatelly
the latex output is missing the language specification
(only the change of encoding is available there).
I failed to find a proper place to add the lang.
That means, searching for e.g. English text in Hebrew documents
is not satisfying.
The needed time to find a simple string dependes on the
paragraph length was O(n^2)
Now it is down to O(n).
Before:
To determine if the pattern matches we compared the
paragraph from current position to the its end.
Increment current position if no match
Now:
Check if the character at current position has at least
the needed features (text, color, language etc)
If not, Increment current position
else proceed as before
Also added missing math env alignat
Modified handling of longtable/tabular
Added a routine to count for valid chars. This is needed
for detection of word boundaries.
Due to detection conflicts
regex '.*' vs match of word-boundaries in MatchStringAdv::operator()
we need to use '\b' in regex explicitly. E.g. '\b.*\b'
The backward search works, but
1.) only in current paragraph (this is the same as before)
2.) only in the same language environment.
1.) Added \textmd to be ignored (sometimes it is used and sometimes not)
2.) Typo: multiline --> multline. Searching in 'multline' caused a crash
because processing all of the '{' and '}' in the content of this math
exceeded the size of the interval field.
1.) Handle some unclosed parentheses
Sometimes \shortcut is not correctly closed
2.) Added \ldots as known char
3.) Discard some shapes (circlepar, droppar, ...)
4.) Omit resulting empty string and use some value
which cannot be matched instead
That way we do not match the whole table but only the cell contents.
The problem I had was
1.) Document language Spanish
2.) Table (copied from English doc) => language English
3.) All cell contents Spanish
Now search for English text led to a selection of the whole
table, although there was no English content in any cell.
The problem was, that the different list ennvironments
did not look different in tha latex output used for
search.
So the input of "\item ..." did not give information
if it is description, lyxlist, enumeration or labeling.
In search modus we use now "\item{enumeration}" etc.
Exception: findadv-21, but it is not a regression,
because this one never passed.
The problem here is, that we cannot differentiate
between enumeration, itemize, description and labeling
environment here.
Now tests findadv-01 ... findadv-20 pass too.
keytest.py: Expanded time for controll keys (like \[Return])
findadv*: expanded time for normal keys
lyxfind.cpp: Handle math equations
As it is now, searching with format needs ALL the features set
in order to match the pattern.
What needs to be done is a GUI specifying which of the features are
important.
1.) language
2.) font (series, shape)
3.) markup, underline, strikeout
4.) color
Having this info, the implementation is easy. Set
some variables and be done
Further normalize the latex input in case of enabled format search.
It was not enough to split the latex input on \foreignlanguage and \textcolor
macros only.
Instead also macros like \textt, or \noun etc had to be accounted for.
This patch uses therefore a different algorithm.
In the latexified text:
* Check and handle contained regex properly
* Discard superfluos '{' preventing our search engine
to match with the search pattern
Our findadv expects something like
prefix + 'search'
so that the regex (which is latexified too)
can work on 'search'
(In the source, the prefix is denoted by lead_as_string)
The latex output contains structs like
\foreignlaguage(abc}{xx\textbf{boldxx\textcolor{blue}{blue 1 blue 2} XX}}
which would never match the simple prefix.
Now the above is converted to
\foreignlaguage(abc}{xx}\\
\foreignlaguage(abc}{\textbf{boldxx}}
\foreignlaguage(abc}{\textbf{\textcolor{blue}{blue 1 blue 2}}}\\
\foreignlaguage(abc}{\textbf{ XX}}
Of course, more than one language or color in an inset can be searched for now.
Modified language handling
Still, there are problems, because sometimes the search pattern
does not contain the the requested info. So the 'find' often fails
for strings inside a list environment.