Fix a few edge-cases in the lyx2lyx conversion to format 249 (multi-encoding -> utf8)

Specifically, the language was being incorrectly identified in certain insets, which led to encoding problems. This fixes part of bug 3613 (http://bugzilla.lyx.org/show_bug.cgi?id=3613).

git-svn-id: svn://svn.lyx.org/lyx/lyx-devel/trunk@19151 a592a061-630c-0410-9148-cb99ea01b6c8
2024-11-25 02:49:46 +00:00 · 2007-07-20 02:10:28 +00:00
commit 11441c8560
parent 6a8b25ba51
3 changed files with 42 additions and 1 deletions
--- a/20
+++ b/20
@@ -51,6 +51,26 @@ developers mailing list, we do have some possible solutions for this.
 The effects of this will be more pronounced for RTL (Hebrew, Arabic, Farsi) 
 users --- though they affect users of other languages as well.
+- Inset encodings and Conversion from earlier LyX versions
+One of the main new features in version 1.5.0 is Unicode. As part of the
+transition, lyx2lyx (the scripts used for converting back and forth between
+different versions of the lyx files) converts old .lyx files, which may use 
+a number of different encodings, to UTF-8. This conversion depends on
+correctly identifying the language of the text. There were previously some
+edge-cases (insets embedded in different-language text type scenarios) in 
+which the language was incorrectly identified, which caused some text to 
+appear incorrectly after having upgraded from older versions. This has now been
+fixed. Unfortunately, however, the fix cannot be applied to files which have
+already been converted past format 249.  So if you have already converted 
+your old files (using a development version or release candidate), this fix
+won't help, unless you still have the originals lying around (and haven't 
+yet made too many changes to the newer versions ;) ).
+Generally, it is probably wise to keep a backup of the old version of your 
+files, at least until you are sure that the upgrade went smoothly (which it 
+almost always will).
 Note: There may later be an updated list of known issues online at
 	http://wiki.lyx.org/LyX/ReleaseNotes
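
The release note above says the conversion depends on correctly identifying the language of the text. As a minimal sketch of why that matters, the snippet below decodes the same bytes once with the matching legacy encoding and once with a wrong one. The sample word and the choice of cp1255 (Hebrew) and latin1 are assumptions made for illustration; they are not taken from the lyx2lyx language table.

```python
# Hebrew text as it might be stored in an old, non-Unicode file.
# cp1255 is assumed here purely for illustration.
original = "\u05e9\u05dc\u05d5\u05dd"  # the word "shalom"
raw = original.encode("cp1255")

# Decoding with the encoding that matches the text's language round-trips.
right = raw.decode("cp1255")

# Decoding with the wrong encoding does not fail; it silently yields
# different characters, which is how misidentified text ends up garbled.
wrong = raw.decode("latin1")

# Only the correctly decoded text should be written out as UTF-8.
utf8_bytes = right.encode("utf-8")

print(right == original)  # the matching encoding recovers the text
print(wrong == original)  # the wrong encoding does not
```

Because the wrong decode raises no error, a per-line language mistake during conversion goes unnoticed until the text is displayed.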
--- a/development/FORMAT
+++ b/development/FORMAT
@@ -1,6 +1,11 @@
 LyX file-format changes
 -----------------------
+2007-07-20 Dov Feldstern <dov@lyx.org>
+	* format *not* incremented; fixed format 249 conversion, so that it now
+		correctly deals with encodings in footnotes (part of bug 3613)
 2007-06-26 Uwe Stöhr <uwestoehr@web.de> and Dov Feldstern <dov@lyx.org>
 	* format incremented to 276: switching exsting language 'arabic' to 
--- a/lib/lyx2lyx/lyx_1_5.py
+++ b/lib/lyx2lyx/lyx_1_5.py
@@ -246,10 +246,13 @@ document.encoding must be set to the old value (format 248) in both cases.
 We do this here and not in LyX.py because it is far easier to do the
 necessary parsing in modern formats than in ancient ones.
 """
+    inset_types = ["Foot", "Note"]
    if document.cjk_encoding != '':
        return
    encoding_stack = [document.encoding]
+    inset_stack = []
    lang_re = re.compile(r"^\\lang\s(\S+)")
+    inset_re = re.compile(r"^\\begin_inset\s(\S+)")
    if document.inputencoding == "auto" or document.inputencoding == "default":
        for i in range(len(document.body)):
            result = lang_re.match(document.body[i])
@@ -264,7 +267,11 @@ necessary parsing in modern formats than in ancient ones.
                    encoding_stack[-1] = lang[language][3]
            elif find_token(document.body, "\\begin_layout", i, i + 1) == i:
                document.warning("Adding nested encoding %s." % encoding_stack[-1], 3)
-                encoding_stack.append(encoding_stack[-1])
+                if len(inset_stack) > 0 and inset_stack[-1] in inset_types:
+                    from lyx2lyx_lang import lang
+                    encoding_stack.append(lang[document.language][3])
+                else:
+                    encoding_stack.append(encoding_stack[-1])
            elif find_token(document.body, "\\end_layout", i, i + 1) == i:
                document.warning("Removing nested encoding %s." % encoding_stack[-1], 3)
                if len(encoding_stack) == 1:
@@ -272,6 +279,15 @@ necessary parsing in modern formats than in ancient ones.
                    document.warning("Malformed LyX document: Unexpected `\\end_layout'.")
                else:
                    del encoding_stack[-1]
+            elif find_token(document.body, "\\begin_inset", i, i + 1) == i:
+                inset_result = inset_re.match(document.body[i])
+                if inset_result:
+                    inset_type = inset_result.group(1)
+                    inset_stack.append(inset_type)
+                else: 
+                    inset_stack.append("")
+            elif find_token(document.body, "\\end_inset", i, i + 1) == i:
+                del inset_stack[-1]
            if encoding_stack[-1] != document.encoding:
                if forward:
                    # This line has been incorrectly interpreted as if it was
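
The hunks above track two stacks while walking the document body: one for encodings and one for inset types. The following standalone sketch shows that idea in isolation; it is a simplified illustration, not the lyx2lyx code. The function name `encodings_per_line`, the `LANG_ENCODING` table and the sample body are hypothetical; only the two regular expressions and the `["Foot", "Note"]` list are copied from the diff.

```python
import re

LANG_RE = re.compile(r"^\\lang\s(\S+)")
INSET_RE = re.compile(r"^\\begin_inset\s(\S+)")
INSET_TYPES = ["Foot", "Note"]

# Hypothetical stand-in for the language table in lyx2lyx_lang.
LANG_ENCODING = {"english": "latin1", "hebrew": "cp1255"}


def encodings_per_line(body, doc_language):
    """Return the encoding assumed for each line of `body`."""
    doc_encoding = LANG_ENCODING[doc_language]
    encoding_stack = [doc_encoding]
    inset_stack = []
    result = []
    for line in body:
        match = LANG_RE.match(line)
        if match:
            language = match.group(1)
            if language == "default":
                encoding_stack[-1] = doc_encoding
            else:
                encoding_stack[-1] = LANG_ENCODING[language]
        elif line.startswith("\\begin_layout"):
            if inset_stack and inset_stack[-1] in INSET_TYPES:
                # A layout directly inside a footnote or note starts in
                # the document language, not the surrounding language.
                encoding_stack.append(doc_encoding)
            else:
                encoding_stack.append(encoding_stack[-1])
        elif line.startswith("\\end_layout"):
            if len(encoding_stack) > 1:
                encoding_stack.pop()
        elif line.startswith("\\begin_inset"):
            inset_match = INSET_RE.match(line)
            inset_stack.append(inset_match.group(1) if inset_match else "")
        elif line.startswith("\\end_inset"):
            if inset_stack:
                inset_stack.pop()
        result.append(encoding_stack[-1])
    return result


# Made-up body: a footnote embedded in Hebrew text in an English document.
body = [
    "\\begin_layout Standard",
    "\\lang hebrew",
    "hebrew text",
    "\\begin_inset Foot",
    "\\begin_layout Standard",
    "footnote text",
    "\\end_layout",
    "\\end_inset",
    "more hebrew text",
    "\\end_layout",
]
result = encodings_per_line(body, "english")
```

In this sketch the footnote text is assigned the document encoding even though it sits inside Hebrew text, while the lines before and after the inset keep the Hebrew encoding. Pushing a copy of the current top of the stack in that case would instead treat the footnote as Hebrew, which is the kind of edge-case the commit message describes.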