1.) Fill the 'head' member to make the macro easier to recognize. This may be
discarded later, although it does not cost much run time
2.) Add some comments
3.) Ignore any macro inside the regex.
1.) Make sure the environment is mentioned in the string for search
(Added the keyword \latexenvironment{...})
2.) Handle it similar to \textcolor{}
That way we can also search for 'conclusion*' or 'summary' etc
in Additional.lyx.
Some macros need:
1.) Take care of case sensitivity
2.) Better handling of used argument values
3.) Cleaner list-environment search
4.) Remove superfluous '~' if searching for description or labeling env
Sometimes the selection contains an area which was skipped
in splitOnKnownMacros().
So we check whether a shorter selection would give the same match size.
Also
a.) try to speed up regex search using non-greedy mode (.*?)
b.) remove '\n' completely in searched strings if it is not surrounded by
alphanumeric chars
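
A minimal sketch of item b.), written in Python rather than taken from
lyxfind.cpp. The helper name strip_newlines is hypothetical, and it assumes
"alphanumeric" means ASCII letters and digits only:

```python
import re

def strip_newlines(text):
    # Drop '\n' unless it sits directly between two alphanumeric characters.
    # Assumption: only ASCII letters and digits count as alphanumeric.
    return re.sub(r"(?<![A-Za-z0-9])\n|\n(?![A-Za-z0-9])", "", text)

samples = ["foo\nbar", "foo.\nbar", "foo\n bar"]
cleaned = [strip_newlines(s) for s in samples]
```

Only the first sample keeps its newline, because both neighbours are
alphanumeric there.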
The problem here is that selecting any subset of a \lettrine{}
line always creates an initials header. That makes it impossible
for our search engine to find strings, because the regex does not
contain that info. So we have to discard the leading \lettrine part
completely.
We now place a marker (\endarguments) to delimit that removable
part.
This is not possible for '$', because of the latex-meaning to
start/end math inset.
Therefore, if not ignoring format, we still have to use
[\\][\$] in regex in order to find '$' in text.
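
As an illustration only (Python, not the LyX code), and assuming the
latexified text represents a literal dollar sign as '\$', the pattern
from above matches like this:

```python
import re

# Assumption: the latexified text escapes a literal dollar sign as \$.
latexified = r"The price is 5\$ today"

# [\\] matches the backslash, [\$] matches the dollar sign itself.
dollar = re.compile(r"[\\][\$]")
m = dollar.search(latexified)
found = m.group(0)
```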
Given the regex 'r.*r\b' and a string
"abc regular something cursor currently"
we expect to find "regular something cursor".
But while searching we may be confronted with input
"regular something cursor curr"
and so the matched string would come out longer.
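
The effect can be reproduced outside LyX. The following Python sketch only
illustrates the regex behaviour on the two strings quoted above, it is not
the search code itself:

```python
import re

pattern = re.compile(r"r.*r\b")

full_text = "abc regular something cursor currently"
truncated = "regular something cursor curr"

# On the full text, the last 'r' followed by a word boundary ends "cursor".
expected = pattern.search(full_text).group(0)

# On the truncated input, the end of the buffer acts as a word boundary,
# so the final 'r' of "curr" is accepted and the match grows.
too_long = pattern.search(truncated).group(0)
```

Here expected is "regular something cursor", while too_long covers the
whole truncated input.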
Testcase without this patch:
1.) open de/Additional.lyx
2.) goto 6.1 Astronomy & Astrophysics
3.) open the index
4.) find advanced
a.) not ignoring format
b.) regex = .+
c.) language of regex: English
5.) search next
The search finds the next break (which is outside of the index)
The subsequent attempt to display the selection leads to a crash
This change is valid for findadv too.
Patterns like '.*' are now greedy, as is normal in regex
Searching for whole words is corrected, but can be slow.
One can speed up the search with an adapted pattern.
So for instance, when searching for words starting and ending with 'r',
the normal pattern is 'r.*r'. The speed-up pattern could be
'\br[^\s]*r\b'. This halves the search time.
Search results now differ from those of lyx2.3, because the greedy
'.*' is now really greedy.
To achieve the same results, we have to use '.*?' instead.
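
The difference between the three patterns can be seen in a small Python
sketch. The sample string is an assumption chosen for illustration, and
the sketch says nothing about the timing inside LyX:

```python
import re

text = "abc regular something cursor currently"

# Greedy: runs from the first 'r' to the last 'r' in the string.
greedy = re.search(r"r.*r", text).group(0)

# Non-greedy: stops at the first 'r' that completes the match.
lazy = re.search(r"r.*?r", text).group(0)

# Bounded: whole words only, with no whitespace inside the match.
words = re.findall(r"\br[^\s]*r\b", text)
```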
An attempt to reduce the number of tests for a match.
Also an attempt to handle Hebrew documents. Unfortunately
the latex output is missing the language specification
(only the change of encoding is available there).
I failed to find a proper place to add the lang.
That means, searching for e.g. English text in Hebrew documents
is not satisfactory.
The time needed to find a simple string was O(n^2) in the
paragraph length.
Now it is down to O(n).
Before:
To determine if the pattern matches, we compared the
paragraph from the current position to its end.
Increment the current position if there is no match.
Now:
Check if the character at the current position has at least
the needed features (text, color, language etc).
If not, increment the current position,
else proceed as before.
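
The idea can be sketched as follows. This is a simplified Python model, not
the lyxfind.cpp implementation: the function name, the per-character
feature sets and the counter are all hypothetical and only serve to show
how the cheap check avoids the expensive comparison.

```python
import re

def find_with_precheck(text, features, pattern, required):
    """text: paragraph string; features: one set of features per character.

    Returns (position, matched text, number of expensive tests).
    """
    regex = re.compile(pattern)
    expensive_tests = 0
    for pos in range(len(text)):
        # Cheap check: skip positions lacking the required features.
        if not required <= features[pos]:
            continue
        # Expensive check: try the pattern from this position onwards.
        expensive_tests += 1
        m = regex.match(text, pos)
        if m:
            return pos, m.group(0), expensive_tests
    return None, None, expensive_tests

text = "plain bold"
features = [set()] * 6 + [{"bold"}] * 4
result = find_with_precheck(text, features, "bold", {"bold"})
```

In this example the six plain characters are skipped by the feature check,
so the pattern is tested only once.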
Also added missing math env alignat
Modified handling of longtable/tabular
Added a routine to count valid chars. This is needed
for detection of word boundaries.
Due to detection conflicts between
regex '.*' and the matching of word boundaries in MatchStringAdv::operator(),
we need to use '\b' in regex explicitly. E.g. '\b.*\b'
The backward search works, but
1.) only in current paragraph (this is the same as before)
2.) only in the same language environment.
1.) Added \textmd to be ignored (sometimes it is used and sometimes not)
2.) Typo: multiline --> multline. Searching in 'multline' caused a crash
because processing all of the '{' and '}' in the content of this math
exceeded the size of the interval field.
1.) Handle some unclosed parentheses
Sometimes \shortcut is not correctly closed
2.) Added \ldots as known char
3.) Discard some shapes (circlepar, droppar, ...)
4.) Omit resulting empty string and use some value
which cannot be matched instead
That way we do not match the whole table but only the cell contents.
The problem I had was
1.) Document language Spanish
2.) Table (copied from English doc) => language English
3.) All cell contents Spanish
A search for English text then led to a selection of the whole
table, although there was no English content in any cell.
The problem was that the different list environments
did not look different in the latex output used for
search.
So the input of "\item ..." did not give information on
whether it is description, lyxlist, enumeration or labeling.
In search mode we now use "\item{enumeration}" etc.
Exception: findadv-21, but it is not a regression,
because this one never passed.
The problem here is, that we cannot differentiate
between enumeration, itemize, description and labeling
environment here.
Now tests findadv-01 ... findadv-20 pass too.
keytest.py: Expanded time for control keys (like \[Return])
findadv*: expanded time for normal keys
lyxfind.cpp: Handle math equations
As it is now, searching with format needs ALL the features set
in order to match the pattern.
What needs to be done is a GUI specifying which of the features are
important.
1.) language
2.) font (series, shape)
3.) markup, underline, strikeout
4.) color
Having this info, the implementation is easy. Set
some variables and be done
Further normalize the latex input in case of enabled format search.
It was not enough to split the latex input on \foreignlanguage and \textcolor
macros only.
Instead, macros like \texttt or \noun etc. also had to be accounted for.
This patch therefore uses a different algorithm.
In the latexified text:
* Check and handle contained regex properly
* Discard superfluous '{' that prevent our search engine
from matching the search pattern
Our findadv expects something like
prefix + 'search'
so that the regex (which is latexified too)
can work on 'search'
(In the source, the prefix is denoted by lead_as_string)
The latex output contains constructs like
\foreignlanguage{abc}{xx\textbf{boldxx\textcolor{blue}{blue 1 blue 2} XX}}
which would never match the simple prefix.
Now the above is converted to
\foreignlanguage{abc}{xx}\\
\foreignlanguage{abc}{\textbf{boldxx}}
\foreignlanguage{abc}{\textbf{\textcolor{blue}{blue 1 blue 2}}}\\
\foreignlanguage{abc}{\textbf{ XX}}
Of course, more than one language or color in an inset can be searched for now.
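
The conversion described above can be modelled with a short Python sketch.
It is not the algorithm used in lyxfind.cpp: the table PARAM_ARGS and the
helpers read_group, flatten and render are hypothetical, only three macros
are known to it, and well-formed braces are assumed.

```python
import re

# Hypothetical table: number of parameter arguments before the content argument.
PARAM_ARGS = {"foreignlanguage": 1, "textcolor": 1, "textbf": 0}
MACRO = re.compile(r"\\([A-Za-z]+)(?=\{)")

def read_group(s, i):
    """Return (content, index after '}') for the brace group starting at s[i]."""
    if s[i] != "{":
        raise ValueError("expected '{'")
    depth = 0
    for j in range(i, len(s)):
        if s[j] == "{":
            depth += 1
        elif s[j] == "}":
            depth -= 1
            if depth == 0:
                return s[i + 1:j], j + 1
    raise ValueError("unbalanced braces")

def flatten(s, stack=()):
    """Yield (macro_stack, text) for every run of plain text in s."""
    i, text = 0, ""
    while i < len(s):
        m = MACRO.match(s, i)
        if m and m.group(1) in PARAM_ARGS:
            if text:
                yield stack, text
                text = ""
            i = m.end()
            prefix = "\\" + m.group(1)
            for _ in range(PARAM_ARGS[m.group(1)]):
                arg, i = read_group(s, i)
                prefix += "{" + arg + "}"
            body, i = read_group(s, i)
            yield from flatten(body, stack + (prefix,))
        else:
            text += s[i]
            i += 1
    if text:
        yield stack, text

def render(stack, text):
    # Re-wrap one text run in all macros that were active around it.
    return "".join(p + "{" for p in stack) + text + "}" * len(stack)

source = r"\foreignlanguage{abc}{xx\textbf{boldxx\textcolor{blue}{blue 1 blue 2} XX}}"
lines = [render(stack, text) for stack, text in flatten(source)]
```

Each entry of lines carries its complete macro prefix, so a simple
prefix + 'search' comparison can be applied to every entry separately.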
Modified language handling
Still, there are problems, because sometimes the search pattern
does not contain the requested info. So the 'find' often fails
for strings inside a list environment.
The change is significant if the search format is not disabled.
We try to analyze the pattern string first to get needed features
for the search.
We try to analyse the searched string and if it does not
contain all expected features (color, language, char style, char decoration)
Still some problems though
* Added textsl, texttt, uline, uuline, sout, xout to the list of possible
leading strings.
* Account for correct number of open braces in regex.
Now the search works for enabled format too.
This is hopefully the last amend
Adapt the positional references in regex supplied by user
so that for instance '([a-z]+)\s\1' to find identical words in sequence
is changed to '([a-z]+)\s\2'.
This is slightly better, but still not satisfying.
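
A possible way to shift the references is sketched below in Python. The
function shift_backrefs and the sample prefix group are hypothetical, and
the sketch assumes that exactly one capturing group is placed in front of
the user's pattern.

```python
import re

def shift_backrefs(pattern, offset):
    """Add offset to every backreference (\\1, \\2, ...) in pattern."""
    def repl(m):
        token = m.group(1)
        if token[0] in "123456789":
            return "\\" + str(int(token) + offset)
        # Any other escape (\s, \\, \b, ...) is left untouched.
        return m.group(0)
    # Consume escapes pairwise so that '\\1' (escaped backslash + '1') is safe.
    return re.sub(r"\\([1-9][0-9]*|.)", repl, pattern, flags=re.DOTALL)

user_pattern = r"([a-z]+)\s\1"
shifted = shift_backrefs(user_pattern, 1)

# One leading group is prepended, so the user's group 1 is now group 2.
wrapped = "(prefix: )" + shifted
repeated = re.search(wrapped, "prefix: hello hello world").group(2)
```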
Enable format search
Given the latexified string
\emph{Fox jUMps}
and using emphasized regex '\w*', we find 'Fox'. That is OK.
But the next find finds ' ', which is not OK.
In contrast, searching with '\w+', we find the correct string 'jUMps'.
If searching for instance '.+' , the found string expanded
to the end of search buffer. So we have to replace
'.' with '[^\}]'.
Also all constructs like '[^abc]' had to be changed to '[^abc\}]'
to not go behind the actual format.
There is still a problem using '*', but constructs using '+' seem to work now.
('.*' finds everything from the first char in correct format
to (including) the end of the next format change,
while '.+' finds _only_ characters in correct format)
The part of the code that removed space at the start of a paragraph has been
there forever, but its intent is unclear. For example, cutting text at
the end of a paragraph removes the space at the start of this
same paragraph.
The removal of this functionality is offset by a rewrite of DEPM that
makes it more thorough.
Fixes bug #10503.
This commit does a bulk fix of incorrect annotations (comments) at the
end of namespaces.
The commit was generated by initially running clang-format, and then
from the diff of the result extracting the hunks corresponding to
fixes of namespace comments. The changes being applied and all the
results have been manually reviewed. The source code successfully
builds on macOS.
Further details on the steps below, in case they're of interest to
someone else in the future.
1. Checkout a fresh and up to date version of src/
git pull && git checkout -- src && git status src
2. Ensure there's a suitable .clang-format in place, i.e. with options
to fix the comment at the end of namespaces, including:
FixNamespaceComments: true
SpacesBeforeTrailingComments: 1
and that clang-format is >= 5.0.0, by doing e.g.:
clang-format -dump-config | grep Comments:
clang-format --version
3. Apply clang-format to the source:
clang-format -i $(find src -name "*.cpp" -or -name "*.h")
4. Create and filter out hunks related to fixing the namespace
git diff -U0 src > tmp.patch
grepdiff '^} // namespace' --output-matching=hunk tmp.patch > fix_namespace.patch
5. Filter out hunks corresponding to simple fixes into a separate patch:
pcregrep -M -e '^diff[^\n]+\nindex[^\n]+\n--- [^\n]+\n\+\+\+ [^\n]+\n' \
-e '^@@ -[0-9]+ \+[0-9]+ @@[^\n]*\n-\}[^\n]*\n\+\}[^\n]*\n' \
fix_namespace.patch > fix_namespace_simple.patch
6. Manually review the simple patch and then apply it, after first
restoring the source.
git checkout -- src
patch -p1 < fix_namespace_simple.patch
7. Manually review the (simple) changes and then stage the changes
git diff src
git add src
8. Again apply clang-format and filter out hunks related to any
remaining fixes to the namespace, this time filter with more
context. There will be fewer hunks as all the simple cases have
already been handled:
clang-format -i $(find src -name "*.cpp" -or -name "*.h")
git diff src > tmp.patch
grepdiff '^} // namespace' --output-matching=hunk tmp.patch > fix_namespace2.patch
9. Manually review/edit the resulting patch file to remove hunks for files
which need to be dealt with manually, noting the file names and
line numbers. Then restore files to as before applying clang-format
and apply the patch:
git checkout src
patch -p1 < fix_namespace2.patch
10. Manually fix the files noted in the previous step. Stage files,
review changes and commit.