Fix a few edge-cases in the lyx2lyx conversion to format 249

(multi-encoding -> utf8); specifically, the language was being
incorrectly identified in certain insets, which of course led to
encoding problems.
This fixes part of bug 3613 (http://bugzilla.lyx.org/show_bug.cgi?id=3613).


git-svn-id: svn://svn.lyx.org/lyx/lyx-devel/trunk@19151 a592a061-630c-0410-9148-cb99ea01b6c8
Dov Feldstern 2007-07-20 02:10:28 +00:00
parent 6a8b25ba51
commit 11441c8560
3 changed files with 42 additions and 1 deletion
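
The lyx2lyx change below keeps a stack of open insets next to the existing encoding stack. A layout opened directly inside a Foot or Note inset then starts from the document language's encoding instead of inheriting the encoding of the surrounding text. The following is a minimal, self-contained Python sketch of that bookkeeping under simplifying assumptions, not the lyx2lyx code itself. LANG_ENCODING and its encoding names are a hypothetical stand-in for lyx2lyx_lang.lang[language][3]. line_encodings and its inset_aware switch are illustrative names. str.startswith replaces lyx2lyx's find_token. A single document encoding is used both for "\lang default" and for footnote text.

```python
import re
from typing import Dict, List

# Hypothetical stand-in for lyx2lyx_lang.lang[language][3]; the encoding
# names here are assumptions, not copied from lyx2lyx.
LANG_ENCODING: Dict[str, str] = {
    "english": "latin1",
    "hebrew": "cp1255",
}

# Insets whose paragraphs start in the document language (as in the patch).
INSET_TYPES = ["Foot", "Note"]

LANG_RE = re.compile(r"^\\lang\s(\S+)")
INSET_RE = re.compile(r"^\\begin_inset\s(\S+)")


def line_encodings(body: List[str], doc_language: str,
                   inset_aware: bool = True) -> List[str]:
    """Return the encoding assumed for each line of a LyX document body.

    inset_aware=False mimics the behaviour before the fix, where a new
    layout always inherits the encoding of the enclosing text.
    """
    doc_encoding = LANG_ENCODING[doc_language]
    encoding_stack = [doc_encoding]   # one entry per open layout
    inset_stack: List[str] = []       # type of each open inset
    encodings = []
    for line in body:
        lang_match = LANG_RE.match(line)
        if lang_match:
            language = lang_match.group(1)
            # "\lang default" switches back to the document language.
            encoding_stack[-1] = (doc_encoding if language == "default"
                                  else LANG_ENCODING[language])
        elif line.startswith("\\begin_layout"):
            if inset_aware and inset_stack and inset_stack[-1] in INSET_TYPES:
                # Footnote/note text starts in the document language.
                encoding_stack.append(doc_encoding)
            else:
                encoding_stack.append(encoding_stack[-1])
        elif line.startswith("\\end_layout"):
            if len(encoding_stack) > 1:  # never drop the document encoding
                encoding_stack.pop()
        elif line.startswith("\\begin_inset"):
            inset_match = INSET_RE.match(line)
            inset_stack.append(inset_match.group(1) if inset_match else "")
        elif line.startswith("\\end_inset"):
            if inset_stack:  # tolerate a stray \end_inset
                inset_stack.pop()
        encodings.append(encoding_stack[-1])
    return encodings
```

Consider a Hebrew document whose English paragraph contains a footnote. The sketch assigns cp1255 to the footnote text. With inset_aware=False it assigns latin1, the encoding of the surrounding English text. That is the kind of misidentification the commit message describes.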

@@ -51,6 +51,26 @@ developers mailing list, we do have some possible solutions for this.
 The effects of this will be more pronounced for RTL (Hebrew, Arabic, Farsi)
 users --- though they affect users of other languages as well.
+- Inset encodings and Conversion from earlier LyX versions
+One of the main new features in version 1.5.0 is Unicode. As part of the
+transition, lyx2lyx (the scripts used for converting back and forth between
+different versions of the lyx files) converts old .lyx files, which may use
+a number of different encodings, to UTF-8. This conversion depends on
+correctly identifying the language of the text. There were previously some
+edge-cases (insets embedded in different-language text type scenarios) in
+which the language was incorrectly identified, which caused some text to
+appear incorrectly after having upgraded from older versions. This has now been
+fixed. Unfortunately, however, the fix cannot be applied to files which have
+already been converted past format 249. So if you have already converted
+your old files (using a development version or release candidate), this fix
+won't help, unless you still have the originals lying around (and haven't
+yet made too many changes to the newer versions ;) ).
+Generally, it is probably wise to keep a backup of the old version of your
+files, at least until you are sure that the upgrade went smoothly (which it
+almost always will).
 Note: There may later be an updated list of known issues online at
 http://wiki.lyx.org/LyX/ReleaseNotes
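
The corrected conversion only runs when a file is upgraded to format 249, so it can help to check which of your files still predate that format. The sketch below reads the \lyxformat header line that .lyx files carry. The function names and the MULTIENCODING_FORMAT constant are hypothetical. A file at format 249 or later may also have been created by a newer LyX rather than converted, so a False result does not by itself mean the file was affected.

```python
import re
from typing import Iterable, Optional

# The header of a .lyx file contains a line such as "\lyxformat 276".
LYXFORMAT_RE = re.compile(r"^\\lyxformat\s+(\d+)")

# Format produced by the multi-encoding -> UTF-8 conversion step.
MULTIENCODING_FORMAT = 249


def lyx_format(lines: Iterable[str]) -> Optional[int]:
    """Return the file-format number from the header, or None if not found."""
    for line in lines:
        match = LYXFORMAT_RE.match(line)
        if match:
            return int(match.group(1))
    return None


def predates_utf8_conversion(lines: Iterable[str]) -> bool:
    """True if the file is still below format 249, so the (fixed)
    multi-encoding conversion will be applied when it is upgraded."""
    fmt = lyx_format(lines)
    return fmt is not None and fmt < MULTIENCODING_FORMAT
```

For example, open a file with encoding="latin-1" (which decodes any byte and is enough for the ASCII header line) and pass the file object to predates_utf8_conversion. A True result means the file will still go through the fixed conversion when opened in a LyX that includes this commit.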

@@ -1,6 +1,11 @@
 LyX file-format changes
 -----------------------
+2007-07-20 Dov Feldstern <dov@lyx.org>
+* format *not* incremented; fixed format 249 conversion, so that it now
+correctly deals with encodings in footnotes (part of bug 3613)
 2007-06-26 Uwe Stöhr <uwestoehr@web.de> and Dov Feldstern <dov@lyx.org>
 * format incremented to 276: switching exsting language 'arabic' to

@@ -246,10 +246,13 @@ document.encoding must be set to the old value (format 248) in both cases.
 We do this here and not in LyX.py because it is far easier to do the
 necessary parsing in modern formats than in ancient ones.
 """
+    inset_types = ["Foot", "Note"]
     if document.cjk_encoding != '':
         return
     encoding_stack = [document.encoding]
+    inset_stack = []
     lang_re = re.compile(r"^\\lang\s(\S+)")
+    inset_re = re.compile(r"^\\begin_inset\s(\S+)")
     if document.inputencoding == "auto" or document.inputencoding == "default":
         for i in range(len(document.body)):
             result = lang_re.match(document.body[i])
@@ -264,6 +267,10 @@ necessary parsing in modern formats than in ancient ones.
                     encoding_stack[-1] = lang[language][3]
             elif find_token(document.body, "\\begin_layout", i, i + 1) == i:
                 document.warning("Adding nested encoding %s." % encoding_stack[-1], 3)
-                encoding_stack.append(encoding_stack[-1])
+                if len(inset_stack) > 0 and inset_stack[-1] in inset_types:
+                    from lyx2lyx_lang import lang
+                    encoding_stack.append(lang[document.language][3])
+                else:
+                    encoding_stack.append(encoding_stack[-1])
             elif find_token(document.body, "\\end_layout", i, i + 1) == i:
                 document.warning("Removing nested encoding %s." % encoding_stack[-1], 3)
@@ -272,6 +279,15 @@ necessary parsing in modern formats than in ancient ones.
                     document.warning("Malformed LyX document: Unexpected `\\end_layout'.")
                 else:
                     del encoding_stack[-1]
+            elif find_token(document.body, "\\begin_inset", i, i + 1) == i:
+                inset_result = inset_re.match(document.body[i])
+                if inset_result:
+                    inset_type = inset_result.group(1)
+                    inset_stack.append(inset_type)
+                else:
+                    inset_stack.append("")
+            elif find_token(document.body, "\\end_inset", i, i + 1) == i:
+                del inset_stack[-1]
             if encoding_stack[-1] != document.encoding:
                 if forward:
                     # This line has been incorrectly interpreted as if it was