Commit Graph

391 Commits

Author SHA1 Message Date
Kornel Benko
8c67cb8c3a Amend f500a287 (FindAdv: Try to make regex search with format enabled somehow faster)
Remove 1 out of range access,
Estimate the search result of regular expression for further processing
2021-01-02 18:44:40 +01:00
Kornel Benko
f500a287d4 FindAdv: Try to make regex search with format enabled somehow faster 2021-01-01 21:53:07 +01:00
Kornel Benko
4e9dc856e4 FindAdv: Added handling for '\w' in regex using non-ASCII chars
Also fix some 'out of range accesses' (causing crash in debug-glibc-mode)
2020-12-31 17:00:49 +01:00
Kornel Benko
f7772849b9 FindAdv: Let lyx use QRegularExpression if available
This regex handling is part of Qt5. For LyX builds that use Qt4,
findadv will still work, but case-insensitive matching
of non-ASCII characters does not work well.
2020-12-30 13:00:03 +01:00
Kornel Benko
5a192d28f0 FindAdv: fix converting unicode-point to utf-8
I misinterpreted the unicode creation 'u8"\uF00xx"'.
The C++-compiler saw 'u8"\uF00x" "x"', but this was not intended.

The routine that mimics this conversion now does the right job.
2020-12-29 09:59:44 +01:00
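The pitfall above comes from '\u' taking exactly four hex digits in a C++ literal, so a five-digit code point needs '\U' with eight digits or an explicit encoder. A minimal sketch of such an encoder follows; the name codepointToUtf8 is hypothetical and is not taken from the LyX sources.

```cpp
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-8 bytes.
// Returns an empty string for surrogates and values above U+10FFFF.
std::string codepointToUtf8(char32_t cp)
{
    std::string out;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return out; // surrogate halves are not valid code points
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x110000) {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}
```

For example, codepointToUtf8(0xF0010) yields a four-byte sequence for a code point in the supplementary private use area.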
Kornel Benko
ab7ac800dc FindAdv: Allow compilation with c++20 2020-12-28 16:45:02 +01:00
Kornel Benko
c7bc46d707 Amend 3736bee4: Forgot to set the case-sensitivity flag in regex statement 2020-12-27 12:16:37 +01:00
Kornel Benko
3736bee4b7 FindAdv: Use stdregex to handle case-insensitivity (if regex is used)
For search we used to lowercase everything, but since the regex itself
should be left unchanged, this change was needed.
Works nicely with ASCII, but fails miserably on other UTF-8 code points (like Cyrillic chars)
2020-12-27 12:01:23 +01:00
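A minimal sketch of the idea, assuming std::regex with the ECMAScript grammar: the pattern is left untouched and case-insensitivity is requested through the icase flag instead of lowercasing the input. The helper name regexFound is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: search `text` for `pattern`, optionally ignoring case.
// The pattern is passed through unchanged; only the syntax flag differs.
bool regexFound(const std::string& text, const std::string& pattern,
                bool ignoreCase)
{
    std::regex_constants::syntax_option_type flags =
        std::regex_constants::ECMAScript;
    if (ignoreCase)
        flags |= std::regex_constants::icase;
    const std::regex re(pattern, flags);
    return std::regex_search(text, re);
}
```

With a byte-oriented std::regex, the flag can only be expected to fold single-byte (ASCII) letters, which would be consistent with the limitation noted in the commit message.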
Kornel Benko
b3d4271e78 Adv-Find: Try to use some more unicode chars (instead of latex macros) 2020-12-15 18:08:02 +01:00
Kornel Benko
d384136ff9 Find-Adv: A try to handle cyrillic chars also in regexp-mode 2020-12-14 20:43:39 +01:00
Kornel Benko
2d2e2f1c6d Adv-Search: Use some free unicodes as replacement for searched spaces
Without this, it is difficult to find backslashed macros if in regexp-mode.
2020-12-10 11:36:09 +01:00
Yuriy Skalko
c0a5987181 Better naming for enums 2020-12-01 00:46:21 +02:00
Yuriy Skalko
ecf62a8f21 Refactor OutputParams
Now all Inset hierarchy uses OutputParams.h included in Inset.h.
Forward declare some enums to reduce header dependencies.
2020-11-30 13:05:03 +02:00
Yuriy Skalko
7779316e6c Include standard <regex> 2020-11-29 15:27:28 +02:00
Yuriy Skalko
8cb728c2d7 Constify 2020-11-27 12:16:45 +02:00
Kornel Benko
be50eb507f Adv-Find: Add handling for \cdot (at least if using format-search) 2020-11-23 21:55:13 +01:00
Yuriy Skalko
81a5e7927b Use default member initialization 2020-11-01 22:25:03 +02:00
Yuriy Skalko
196d9caeb0 Clean includes using the output of iwyu tool 2020-10-20 11:38:55 +03:00
Yuriy Skalko
d25c10ed81 Remove duplicate and unused header includes in .cpp files 2020-10-19 18:01:11 +03:00
Yuriy Skalko
919a06718a Constify 2020-10-12 15:06:16 +02:00
Yuriy Skalko
7d38a4d126 Loop refactoring 2020-10-09 09:04:20 +03:00
Yuriy Skalko
fe85162a29 Refactoring 2020-10-05 14:55:00 +02:00
Yuriy Skalko
715b8cda54 Refactoring based on cppcheck suggestions 2020-10-03 13:39:51 +02:00
Richard Kimberly Heck
17f9d6b192 Fix warnings 2020-08-23 05:48:15 -04:00
Kornel Benko
9da4390a9b FindAdv: Correct next test (keytest/findadv-16)
Provided that the LASSERT in src/mathed/InsetMathGrid.cpp:1824
is removed.
2020-05-29 20:04:57 +02:00
Kornel Benko
8028dce129 FindAdv: Correct some testcases 2020-05-29 14:22:34 +02:00
Kornel Benko
03ad03320e Findadv: Convert some messages from LYXERR0(...) to LYXERR(Debug::INFO, ...)
Due to a hint from Scott:
> I don't think it should be printed to LYXERR0. That goes to the terminal
> for the user. Even our lyx2lyx tests fail if they detect any information
> printed to the terminal (although we escape some warnings). If "find" is
> not a good debug level for it, maybe we can put it in "info" (the first
> debug level)?
2020-05-29 08:44:56 +02:00
Kornel Benko
b7cac34d96 Findadv: 2 more possible out of range access cases corrected 2020-05-26 18:49:50 +02:00
Kornel Benko
8dd2ac7171 Findadv: Do not use out of range index into a string
Thanks Scott. Crashing if using _GLIBCXX_DEBUG preprocessor setting
2020-05-26 15:58:23 +02:00
Richard Kimberly Heck
c506f304bc Fix a number of issues that were stopping compilation with MSVC 19.
Patch from Thibaut Cuvelier, modified slightly by me (mostly for style).
2020-05-04 19:45:58 -04:00
Kornel Benko
80cd116805 Fix indentation 2020-04-07 11:47:08 +02:00
Stephan Witt
225de7830e Remove useless assignments to local variables that are never read later. 2020-02-18 08:55:00 +01:00
Kornel Benko
ae7a7fa882 Adv search: fix handling of multiple params of a latex command
Fix the case of possibly nested parentheses
2020-01-03 13:11:47 +01:00
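Handling nested delimiters usually means counting depth rather than looking for the next closing character. A sketch under the assumption that the command parameters are delimited by '{' and '}'; findClosingBrace is a hypothetical name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: given the index of an opening '{', return the index
// of its matching '}' while respecting nesting, or std::string::npos.
std::size_t findClosingBrace(const std::string& s, std::size_t open)
{
    if (open >= s.size() || s[open] != '{')
        return std::string::npos;
    int depth = 0;
    for (std::size_t i = open; i < s.size(); ++i) {
        if (s[i] == '\\') {   // skip the escaped character, e.g. \{ or \}
            ++i;
            continue;
        }
        if (s[i] == '{') {
            ++depth;
        } else if (s[i] == '}' && --depth == 0) {
            return i;
        }
    }
    return std::string::npos; // unbalanced input
}
```

Called on "\frac{a{b}}{c}" with the index of the first '{', it returns the index of the brace that closes the first parameter, not the inner one.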
Kornel Benko
49aaf95894 Fix handling of doRemove in advanced search
Amend 11c47ddf
2020-01-01 14:03:21 +01:00
Kornel Benko
48c7d9b028 Do not search in deleted text in change tracking mode 2019-12-29 17:42:18 +01:00
Jean-Marc Lasgouttes
c73d397d32 Do not use same name for members and arguments
Spotted by cppcheck.
2019-10-27 00:06:54 +02:00
Jean-Marc Lasgouttes
714113655a Follow some of the performance advice from cppcheck
Most of that is changing string to string const &.
2019-09-13 16:23:49 +02:00
Kornel Benko
8acbcebf11 Findadv: Add some missing accents.
They are defined in lib/unicodesymbols, but were not handled yet.
2019-07-30 15:21:56 +02:00
Kornel Benko
ebc7105c36 FindAdv: Cosmetics
Remove parentheses from return statements,
add '_' to private members
2019-03-21 12:58:16 +01:00
Kornel Benko
e55244ccd8 FindAdv: Added remaining accents(2) dgrave, textdoublegrave, rcap, textroundcap 2019-03-20 23:22:29 +01:00
Jean-Marc Lasgouttes
1c755fefa5 Initialize hasTitle in Intervall constructor
I also moved around some things while I was at it.

Spotted by coverity scan.
2019-03-20 17:26:56 +01:00
Kornel Benko
d7354a1a09 FindAdv: Polishing
1.) Use vector for borders, because any value may be too small
  if there are plenty of accented characters in a paragraph
2.) use '[\S]' instead of '.' in regex for 'accre'. The regex would
  otherwise find also patterns like '\ {some text}'
2019-03-18 18:28:49 +01:00
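The difference described in point 2 can be reproduced with two simplified patterns. These are illustrative stand-ins, not the actual 'accre' expression, and looksLikeAccent is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Sketch: detect a one-character accent macro followed by '{'.
// With strict == false the '.' also accepts a space, so '\ {some text}'
// is reported; with strict == true only non-space characters qualify.
bool looksLikeAccent(const std::string& s, bool strict)
{
    static const std::regex loose(R"(\\.\{)");
    static const std::regex tight(R"(\\[\S]\{)");
    return std::regex_search(s, strict ? tight : loose);
}
```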
Kornel Benko
9e825d5035 FindAdv: Added remaining accents cedilla, subring, subhat subtilde 2019-03-18 12:59:40 +01:00
Kornel Benko
13b3808aa0 FindAdv: Casting to satisfy Windows compiler
Thanks to Jean-Marc Lasgouttes
2019-03-18 09:38:34 +01:00
Kornel Benko
bf4394e282 FindAdv: Expand the list of handled chars for ogonek 2019-03-17 13:06:56 +01:00
Kornel Benko
9a1a806b60 FindAdv: Correct start of search if not using regex
Do not try to find pattern inside the leading string.
2019-03-16 11:26:20 +01:00
Kornel Benko
8e7c427c7c Amend 7ac04a2b: Count and display number of replaced strings in FindAdv
We have to know if the previous call to search was a single replace or not,
so that we can correctly initialize the number of replaced strings.
2019-03-16 08:17:09 +01:00
Kornel Benko
4eacc492a3 Typo 2019-03-13 14:14:35 +01:00
Kornel Benko
7ac04a2b75 Fix #11505. Count and display number of replaced strings in FindAdv 2019-03-13 14:06:18 +01:00
Kornel Benko
c041439c51 FindAdv: Special handling for \dot{i} and 'ß'
Different behaviour in regexp{..} for 'İ' and 'ß':
1.) lowercase routine for 'İ' gives 'İ', so that if we are searching
  while ignoring case, the string '\dot{I}' is converted to '\dot{i}'.
  In this case we have to change it to 'İ' (instead of 'i', as one would expect).

2.) If 'ß' is inserted via keyboard into a freshly created regexp box, it appears as \lyxmathsym{ß},
  if pasted from the lyx-screen it appears as \text{ß}
2019-03-10 00:29:56 +01:00
Kornel Benko
f848183fa8 FindAdv: Expand the list of handled chars for dot below and ring above 2019-03-08 22:44:00 +01:00
Kornel Benko
b702eda4ed FindAdv: Amend cd4ae51f
Prevent matching only part of a macro.
For instance, we want to find '\imath' but not '\imathxxxx'
while checking for accents.
2019-03-04 14:37:10 +01:00
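A sketch of such a whole-macro check, assuming that a macro name ends at the first non-letter character; isWholeMacroAt is a hypothetical helper.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical helper: true if `macro` (e.g. "\\imath") occurs at `pos`
// and is not merely the prefix of a longer macro name such as \imathxxxx.
bool isWholeMacroAt(const std::string& text, std::size_t pos,
                    const std::string& macro)
{
    if (pos > text.size() || text.compare(pos, macro.size(), macro) != 0)
        return false;
    const std::size_t end = pos + macro.size();
    if (end == text.size())
        return true; // macro runs to the end of the string
    // A following letter means we only matched a prefix of another macro.
    return !std::isalpha(static_cast<unsigned char>(text[end]));
}
```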
Kornel Benko
cd4ae51f77 FindAdv: Amend b21c8b21: Expand the list for handled latin characters
1.) Added for 'breve' and 'grave' accents
2.) Corrected handling for 'i'-accents (allowed \hat{i} _and_ \hat{\imath})
	because of problems with ignoring case
3.) Spaces: Changed some indents in source
2019-03-04 14:05:44 +01:00
Kornel Benko
99bacf006e FindAdv: Handle some more accented latin characters.
Also try to use UTF8 encoded chars instead of their
latex equivalent if possible.
2019-03-03 14:08:27 +01:00
Kornel Benko
b21c8b214d FindAdv: Expand the list for handled latin characters 2019-03-02 22:00:20 +01:00
Kornel Benko
3541a49db4 FindAdv: Try to add the possibility to search for accented characters in regex
The problem is that a regex is handled as math mode. That is,
any accented character is converted to a math macro.
For instance "ä" --> "\\ddot{a}".
Outside of math or regex it is not converted (if the xetex flavour is used),
but there are other chars which are converted both in math and in text (but differently).
For instance "ů"
	in math --> "\\mathring{u}"
	in text --> "\\r{u}"

TODO: determine the still not handled conversions.
It would be nice, if we could persuade math factory to not convert
these characters, but I was unable to find the place where the
conversion actually takes place.
2019-03-02 15:42:38 +01:00
Kornel Benko
9d6b71c6b3 FindAdv: Use isAlnumASCII() instead of std::isalnum()
Thanks Jean-Marc
2019-02-28 13:00:12 +01:00
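std::isalnum() depends on the current locale and has undefined behaviour when given a negative char value, which bytes of UTF-8 encoded text can be. A locale-independent check can be sketched as below; this is a hypothetical stand-in, and the real isAlnumASCII() in LyX may be implemented differently.

```cpp
// Hypothetical stand-in for a locale-independent ASCII check.
// Only the ranges 0-9, a-z and A-Z are accepted; every other byte,
// including the bytes of multi-byte UTF-8 sequences, is rejected.
bool isAlnumAsciiSketch(char c)
{
    return (c >= '0' && c <= '9')
        || (c >= 'a' && c <= 'z')
        || (c >= 'A' && c <= 'Z');
}
```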
Kornel Benko
2c5c397afa Amend aaffcd0b: Remove some remnants ... 2019-02-27 10:33:25 +01:00
Kornel Benko
aaffcd0b39 FindAdv: Do not use data from included listing if in search mode
Fixes #11496 	"Find and replace (advanced)" is too slow
2019-02-27 10:17:56 +01:00
Kornel Benko
b1f93e0982 FindAdv: Try to use a better algorithm to find begin of a searched string 2019-02-26 23:00:31 +01:00
Kornel Benko
babb291ef3 FindAdv: Partially revert e69f7022
The slowness returns, but the search works again
2019-02-26 13:24:36 +01:00
Kornel Benko
e69f702275 FindAdv: Fix #11496 -- too slow find
Also added some more macros to handle
2019-02-25 12:12:19 +01:00
Kornel Benko
6a6b670bbd FindAdv: Correctly match '\[' and '\]' in regular expressions with format enabled
We have to check for instances of '{[}' and '{]}' and
omit removing the enclosing parentheses
2019-02-23 13:11:34 +01:00
Kornel Benko
44a06adb6c FindAdv: debug info
1.) Fill the 'head'-member to easier recognize the macro. May be discarded
 later, although it does not take too much run-time
2.) Add some comment
3.) Ignore any macro inside the regex.
2019-02-22 13:21:23 +01:00
Kornel Benko
01fd1f7679 FindAdv: Discard \parbox, \input macros
The language of these macros does not matter. What's more,
without removing them we may obtain wrong matching.
2019-02-21 20:32:08 +01:00
Kornel Benko
8a0db92523 FindAdv: Handle \shortcut
Essentially remove the header and handle the language inside
\shortcut{...} appropriately
2019-02-21 14:45:41 +01:00
Kornel Benko
a298fc55d9 FindAdv: Added handling for latex environments
1.) Make sure the environment is mentioned in the string for search
  (Added the keyword \latexenvironment{...})
2.) Handle it similar to \textcolor{}

That way we can also search for 'conclusion*' or 'summary' etc
in Additional.lyx.
2019-02-20 14:14:50 +01:00
Kornel Benko
0a2dda0904 Findadv: Added handling for frontmatter macros
title, subtitle, author etc.
2019-02-19 23:11:09 +01:00
Kornel Benko
96ca66d664 FindAdv: Handle more cases
Some macros need:
1.) Take care of case sensitivity
2.) Better handling of used argument values
3.) Cleaner list-environment search
4.) Remove superfluous '~' if searching for description or labeling env
2019-02-18 00:40:55 +01:00
Kornel Benko
dfbe29317d FindAdv: Even more fine tuning 2019-02-16 18:39:10 +01:00
Kornel Benko
faf7f0666f FindAdv: More fine tuning 2019-02-13 13:41:57 +01:00
Kornel Benko
8d752b68e9 FindAdv: Fine tuning 2019-02-12 14:21:14 +01:00
Kornel Benko
a47dbed6bd FindAdv: Try to find real start of found match
Sometimes the selection contains an area which was skipped
in splitOnKnownMacros().
So we check whether a shorter selection would give the same match size.
2019-02-11 13:13:28 +01:00
Kornel Benko
f19bd163de FindAdv: Add handling of begin{multicols}[...][...]{...}
Also
a.) try to speed up regex search using non-greedy mode (.*?)
b.) remove '\n' completely in searched strings if it is not surrounded by
	alphanumerical chars
2019-02-10 18:00:55 +01:00
Kornel Benko
7fa1244dd8 Findadv: Add handling for powerdot macros
With format enabled, these macros were hard to process:
	lyxslide, twocolumn
2019-02-07 13:35:47 +01:00
Kornel Benko
50550a215f Findadv: Handle \lettrine{} in initials.module
The problem here is that selecting any subset of a \lettrine{}
line always creates an initials header. That makes it impossible
for our search engine to find strings, because the regex does not
contain that info. So we have to discard the leading \lettrine part
completely.
We now place a marker (\endarguments) to determine that removable
part.
2019-02-05 08:04:47 +01:00
Kornel Benko
187b518648 FindAdv: to please cppcheck ...
Initialize class elements
Removed unused method
Added 'explicit' keyword
Optimize handling for sizes ( \tiny, \small, etc)
2018-12-18 06:53:58 +01:00
Kornel Benko
a1a7c21871 FindAdv: Amend 4276e1b0 2018-12-17 10:33:23 +01:00
Kornel Benko
4276e1b01e FindAdv: Handle also sizes of characters 2018-12-16 14:50:38 +01:00
Kornel Benko
2682170556 FindAdv: Comments 2018-12-14 19:51:24 +01:00
Kornel Benko
358626b735 FindAdv: Add handling spaces, dots, quotes ...
Treat spaces, dots and quotes as ordinary characters
Also discard length values for hspace,vspace and mspace
2018-12-13 17:12:57 +01:00
Kornel Benko
8a29bdb3d1 FindAdv: Added code, href, url and footnote to handled search formats
Remark: Inside code{} and footnote{} the language settings are ignored.
2018-12-11 17:27:50 +01:00
Kornel Benko
cd94180492 FindAdv: Simplify search for chars '&', '%', '#' and '_'
This is not possible for '$', because of the latex-meaning to
start/end math inset.
Therefore, if not ignoring format, we still have to use
[\\][\$] in regex in order to find '$' in text.
2018-12-05 13:36:43 +01:00
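The bracketed pattern quoted above can be tried in isolation. The sketch assumes that the searched string is LaTeX-like text in which a literal dollar sign appears as '\$'; containsEscapedDollar is a hypothetical name.

```cpp
#include <regex>
#include <string>

// Sketch: look for an escaped dollar sign, i.e. a backslash followed by '$',
// using the bracketed form [\\][\$] from the commit message.
bool containsEscapedDollar(const std::string& latex)
{
    static const std::regex re(R"([\\][\$])");
    return std::regex_search(latex, re);
}
```

A bare '$' that opens or closes a math inset is not matched, because no backslash precedes it.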
Kornel Benko
1cd80ff6c8 FindAdv: Eliminate a corner case in the binary search
Given the regex 'r.*r\b' and a string
"abc regular something cursor currently"
we expect to find "regular something cursor".
But while searching we may be confronted with input
"regular something cursor curr"
and so the searched string would be seen longer.
2018-11-27 19:10:27 +01:00
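The corner case can be reproduced with std::regex directly. This is only a demonstration of the regex behaviour, not the binary search itself; firstMatch is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Return the first match of `pattern` in `text`, or "" if there is none.
std::string firstMatch(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

On the full string, 'r.*r\b' stops at "cursor". On the truncated input the end of the string counts as a word boundary, so the match extends to the final "curr".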
Kornel Benko
8549fbb326 FindAdv: Avoid crash finding char at end of inset
Testcase without this patch:
1.) open de/Additional.lyx
2.) goto 6.1 Astronomy & Astrophysics
3.) open the index
4.) find advanced
	a.) not ignoring format
	b.) regex = .+
	c.) language of regex: English
	d.) search next
The search finds the next break (which is outside of the index).
The subsequent attempt to display the selection leads to a crash.
2018-11-26 12:37:18 +01:00
Kornel Benko
578a4b6fb0 find: This change was not intended, amend e96a9d6329 2018-11-25 18:25:14 +01:00
Kornel Benko
e96a9d6329 Find: Use greedy behaviour
This change is valid for findadv too.
Patterns like '.*' now are greedy, like it is normal in regex
Searching for whole words is corrected, but can be slow.
One can speed up the search with adapted pattern.
So for instance searching for words starting and ending with 'r'
the normal pattern is 'r.*r'. The speed-up pattern could be
'\br[^\s]*r\b'. This halves the search time.

Search results are now different to that of lyx2.3, because the greedy
'.*' is now really greedy.
To achieve the same results, we have to use '.*?' instead.
2018-11-25 17:51:20 +01:00
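The three pattern styles mentioned above behave as follows with std::regex and the default ECMAScript grammar. The sketch only illustrates the matching semantics, not the search engine; matchOf is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// Return the first match of `pattern` in `text`, or "" if there is none.
std::string matchOf(const std::string& text, const std::string& pattern)
{
    const std::regex re(pattern);
    std::smatch m;
    return std::regex_search(text, m, re) ? m.str(0) : std::string();
}
```

On "abc regular something cursor currently", the greedy 'r.*r' runs up to the last 'r' in the string, the lazy 'r.*?r' stops at "regular", and '\br[^\s]*r\b' restricts the match to a single word.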
Jean-Marc Lasgouttes
e2a3dd1167 Fix compilation with msvc 2015
Without this, the compiler does not know whether 0 is a size_t or char
const *.
2018-11-24 19:17:31 +01:00
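The ambiguity can be illustrated with two overloads. The function names are hypothetical and unrelated to the code that was actually changed.

```cpp
#include <cstddef>
#include <string>

// Two hypothetical overloads mirroring the size_t / char const * pair.
std::string describe(std::size_t) { return "size_t"; }
std::string describe(char const*) { return "char const *"; }

// describe(0) would be ambiguous: the literal 0 converts equally well to
// std::size_t (integral conversion) and to char const * (null pointer).
// Spelling out the intended type removes the ambiguity.
std::string describeZero()
{
    return describe(std::size_t(0));
}
```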
Kornel Benko
e9e3c50c65 FindAdv: Optimization
An attempt to reduce the number of tests for a match.

Also an attempt to handle Hebrew documents. Unfortunately,
the latex output is missing the language specification
(only the change of encoding is available there).
I failed to find a proper place to add the lang.
That means searching for e.g. English text in Hebrew documents
is not satisfactory.
2018-11-20 14:36:11 +01:00
Kornel Benko
17ee4cafb1 FindAdv: Enable search for different languages in Korean documents too
The problem here was that for European languages only the encoding
was visible in latex output. Now also the language is provided.
2018-11-18 10:40:42 +01:00
Kornel Benko
0964ffb266 FindAdv: Remove left over comment character
Sometimes language spec starts with "% ". This happens in Japanese documents
containing English text at start of paragraph.
2018-11-16 12:12:06 +01:00
Kornel Benko
06c05430d9 FindAdv: Added lyx-function search-ignore
Enable/disable ignoring the specified type
	language: e.g. british, slovak, latin, ...
	color:	blue, red, ...
	sectioning: part, chapter, ..
	font:
		series: bold, ...
		shape: upright, italic, slanted
		family: serif, monospace ...
	markup: emphasize, noun
	underline:
	strike:

Examples:
	search-ignore language true
	search-ignore shape true
2018-11-15 14:20:50 +01:00
Kornel Benko
702c495e98 FindAdv: Significantly increase the search speed
The time needed to find a simple string, as a function of the
paragraph length, was O(n^2).
Now it is down to O(n).
Before:
	To determine if the pattern matches we compared the
	paragraph from the current position to its end.
	Increment the current position if there is no match.
Now:
	Check if the character at the current position has at least
	the needed features (text, color, language etc).
	If not, increment the current position,
	else proceed as before.
2018-11-13 12:11:33 +01:00
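The "cheap check first" idea can be sketched on a toy data model. Everything here is an assumption made for illustration: TaggedChar, findTagged and kNotFound are hypothetical, and only one feature (a language tag) is tracked.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical per-character record: the character plus one feature.
struct TaggedChar {
    char ch;
    std::string lang;
};

constexpr std::size_t kNotFound = static_cast<std::size_t>(-1);

// Sketch: return the first position where `word` occurs entirely in `lang`.
// Positions failing the cheap feature test are skipped before the more
// expensive character-by-character comparison is attempted.
std::size_t findTagged(const std::vector<TaggedChar>& para,
                       const std::string& word, const std::string& lang)
{
    if (word.empty() || word.size() > para.size())
        return kNotFound;
    for (std::size_t pos = 0; pos + word.size() <= para.size(); ++pos) {
        // Cheap test: wrong language or wrong first character.
        if (para[pos].lang != lang || para[pos].ch != word[0])
            continue;
        std::size_t k = 1;
        while (k < word.size() && para[pos + k].lang == lang
               && para[pos + k].ch == word[k])
            ++k;
        if (k == word.size())
            return pos;
    }
    return kNotFound;
}
```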
Kornel Benko
636bb6c2d9 FindAdv: Polishing search with regex containing '.'
Also added missing math env alignat
Modified handling of longtable/tabular
Added a routine to count for valid chars. This is needed
for detection of word boundaries.

Due to detection conflicts
	regex '.*' vs match of word-boundaries in MatchStringAdv::operator()
we need to use '\b' in regex explicitly. E.g. '\b.*\b'

The backward search works, but
1.) only in current paragraph (this is the same as before)
2.) only in the same language environment.
2018-11-12 12:28:31 +01:00
Jean-Marc Lasgouttes
7055bb0098 Change IgnoreFormats to a proper class
Instantiate a global variable holding the formats and allow modifying
it using the helper function setIgnoreFormat.
2018-11-09 16:05:09 +00:00
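The shape of such a class might look like the sketch below. The member layout, the isIgnored() query and the use of a map are assumptions; only the name setIgnoreFormat comes from the commit message.

```cpp
#include <map>
#include <string>

// Hypothetical sketch of a class recording which format types the
// search should ignore (e.g. "language", "shape", "color").
class IgnoreFormatsSketch {
public:
    void setIgnoreFormat(const std::string& type, bool value)
    {
        flags_[type] = value;
    }
    bool isIgnored(const std::string& type) const
    {
        const auto it = flags_.find(type);
        return it != flags_.end() && it->second;
    }
private:
    std::map<std::string, bool> flags_;
};

// Global instance plus a free helper function, as the commit describes.
IgnoreFormatsSketch ignoreFormats;

void setIgnoreFormat(const std::string& type, bool value)
{
    ignoreFormats.setIgnoreFormat(type, value);
}
```

A call such as setIgnoreFormat("language", true) would then correspond to a request to ignore the language when matching.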
Kornel Benko
f5d5777a86 FindAdv: Polishing
1.) Added \textmd to be ignored (sometimes it is used and sometimes not)
2.) Typo: multiline --> multline. Searching in 'multline' caused a crash
	because processing all of the '{' and '}' in the content of this math
	exceeded the size of the interval field.
2018-11-09 13:49:05 +01:00
Kornel Benko
0c05432284 FindAdv: Polishing
1.) Handle some unclosed parentheses
	Sometimes \shortcut is not correctly closed
2.) Added \ldots as known char
3.) Discard some shapes (circlepar, droppar, ...)
4.) Omit resulting empty string and use some value
	which cannot be matched instead
2018-11-08 09:59:51 +01:00
Kornel Benko
88428123ea FindAdv: Optimize for long matches
Still, if the matched string is at a rear part of a very long
paragraph, the search is way too slow.
2018-11-07 13:14:50 +01:00
Kornel Benko
9d304c0a1d FindAdv: Discard table decorations
That way we do not match the whole table but only the cell contents.
The problem I had was
1.) Document language Spanish
2.) Table (copied from English doc) => language English
3.) All cell contents Spanish

Now search for English text led to a selection of the whole
table, although there was no English content in any cell.
2018-11-07 09:35:16 +01:00
Kornel Benko
4f1cd00b02 Findadv: Initialize the position of first unprocessed open parentheses
Not initializing caused some wrong matches.
2018-11-06 15:28:43 +01:00