Fix a few edge-cases in the lyx2lyx conversion to format 249 (multi-encoding -> utf8)

Specifically, the language was being incorrectly identified in certain insets, which led to encoding problems. This fixes part of bug 3613 (http://bugzilla.lyx.org/show_bug.cgi?id=3613).

git-svn-id: svn://svn.lyx.org/lyx/lyx-devel/trunk@19151 a592a061-630c-0410-9148-cb99ea01b6c8
2024-11-28 20:45:47 +00:00 · 2007-07-20 02:10:28 +00:00 · 2007-07-20 02:10:28 +00:00 · 11441c8560
commit 11441c8560
parent 6a8b25ba51
3 changed files with 42 additions and 1 deletions
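
For readers who want to see the idea in isolation, here is a minimal, self-contained sketch of the approach this commit takes: track the enclosing inset alongside the encoding stack, and restart from the document language when a layout opens directly inside a footnote or note. It is not the lyx2lyx code itself. The function name `line_encodings` and the two-entry `LANG_ENCODING` table are hypothetical stand-ins for illustration (the real script looks encodings up in `lyx2lyx_lang.lang`), and the sample encodings are assumptions.

```python
import re

# Hypothetical stand-in for the language table; only the language -> encoding
# mapping matters for this sketch, and the values are illustrative.
LANG_ENCODING = {"english": "iso8859-1", "hebrew": "cp1255"}

# Insets whose contents start over in the document language.
INSET_TYPES = ("Foot", "Note")

LANG_RE = re.compile(r"^\\lang\s(\S+)")
INSET_RE = re.compile(r"^\\begin_inset\s(\S+)")


def line_encodings(body, doc_language):
    """Return the encoding assumed for each line of an old-format body."""
    doc_encoding = LANG_ENCODING[doc_language]
    encoding_stack = [doc_encoding]
    inset_stack = []
    result = []
    for line in body:
        lang_match = LANG_RE.match(line)
        if lang_match:
            language = lang_match.group(1)
            if language == "default":
                encoding_stack[-1] = doc_encoding
            else:
                encoding_stack[-1] = LANG_ENCODING[language]
        elif line.startswith("\\begin_layout"):
            if inset_stack and inset_stack[-1] in INSET_TYPES:
                # A layout directly inside a footnote/note does not inherit
                # the surrounding language.
                encoding_stack.append(doc_encoding)
            else:
                encoding_stack.append(encoding_stack[-1])
        elif line.startswith("\\end_layout"):
            if len(encoding_stack) > 1:
                encoding_stack.pop()
        elif line.startswith("\\begin_inset"):
            inset_match = INSET_RE.match(line)
            inset_stack.append(inset_match.group(1) if inset_match else "")
        elif line.startswith("\\end_inset"):
            if inset_stack:
                inset_stack.pop()
        result.append(encoding_stack[-1])
    return result


sample_body = [
    "\\begin_layout Standard",
    "\\lang hebrew",
    "hebrew text",
    "\\begin_inset Foot",
    "\\begin_layout Standard",
    "footnote text",
    "\\end_layout",
    "\\end_inset",
    "more hebrew text",
    "\\end_layout",
]
sample_encodings = line_encodings(sample_body, "english")
```

In this sample, the footnote text is attributed to the document-language encoding even though the surrounding paragraph has switched to Hebrew, and the Hebrew encoding is restored once the footnote's layout closes.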
--- a/20
+++ b/20
@@ -51,6 +51,26 @@ developers mailing list, we do have some possible solutions for this.
 The effects of this will be more pronounced for RTL (Hebrew, Arabic, Farsi) 
 users --- though they affect users of other languages as well.

+- Inset encodings and Conversion from earlier LyX versions
+
+One of the main new features in version 1.5.0 is Unicode. As part of the
+transition, lyx2lyx (the scripts used for converting back and forth between
+different versions of the lyx files) converts old .lyx files, which may use 
+a number of different encodings, to UTF-8. This conversion depends on
+correctly identifying the language of the text. There were previously some
+edge-cases (insets embedded in different-language text type scenarios) in 
+which the language was incorrectly identified, which caused some text to 
+appear incorrectly after having upgraded from older versions. This has now been
+fixed. Unfortunately, however, the fix cannot be applied to files which have
+already been converted past format 249.  So if you have already converted 
+your old files (using a development version or release candidate), this fix
+won't help, unless you still have the originals lying around (and haven't 
+yet made too many changes to the newer versions ;) ).
+
+Generally, it is probably wise to keep a backup of the old version of your 
+files, at least until you are sure that the upgrade went smoothly (which it 
+almost always will).
+

 Note: There may later be an updated list of known issues online at
 	http://wiki.lyx.org/LyX/ReleaseNotes
--- a/development/FORMAT
+++ b/development/FORMAT
@@ -1,6 +1,11 @@
 LyX file-format changes
 -----------------------

+2007-07-20 Dov Feldstern <dov@lyx.org>
+
+	* format *not* incremented; fixed format 249 conversion, so that it now
+		correctly deals with encodings in footnotes (part of bug 3613)
+
 2007-06-26 Uwe Stöhr <uwestoehr@web.de> and Dov Feldstern <dov@lyx.org>

 	* format incremented to 276: switching exsting language 'arabic' to 
--- a/lib/lyx2lyx/lyx_1_5.py
+++ b/lib/lyx2lyx/lyx_1_5.py
@@ -246,10 +246,13 @@ document.encoding must be set to the old value (format 248) in both cases.
 We do this here and not in LyX.py because it is far easier to do the
 necessary parsing in modern formats than in ancient ones.
 """
+    inset_types = ["Foot", "Note"]
    if document.cjk_encoding != '':
        return
    encoding_stack = [document.encoding]
+    inset_stack = []
    lang_re = re.compile(r"^\\lang\s(\S+)")
+    inset_re = re.compile(r"^\\begin_inset\s(\S+)")
    if document.inputencoding == "auto" or document.inputencoding == "default":
        for i in range(len(document.body)):
            result = lang_re.match(document.body[i])
@@ -264,6 +267,10 @@ necessary parsing in modern formats than in ancient ones.
                    encoding_stack[-1] = lang[language][3]
            elif find_token(document.body, "\\begin_layout", i, i + 1) == i:
                document.warning("Adding nested encoding %s." % encoding_stack[-1], 3)
+                if len(inset_stack) > 0 and inset_stack[-1] in inset_types:
+                    from lyx2lyx_lang import lang
+                    encoding_stack.append(lang[document.language][3])
+                else:
-                    encoding_stack.append(encoding_stack[-1])
+                        encoding_stack.append(encoding_stack[-1])
            elif find_token(document.body, "\\end_layout", i, i + 1) == i:
                document.warning("Removing nested encoding %s." % encoding_stack[-1], 3)
@@ -272,6 +279,15 @@ necessary parsing in modern formats than in ancient ones.
                    document.warning("Malformed LyX document: Unexpected `\\end_layout'.")
                else:
                    del encoding_stack[-1]
+            elif find_token(document.body, "\\begin_inset", i, i + 1) == i:
+                inset_result = inset_re.match(document.body[i])
+                if inset_result:
+                    inset_type = inset_result.group(1)
+                    inset_stack.append(inset_type)
+                else: 
+                    inset_stack.append("")
+            elif find_token(document.body, "\\end_inset", i, i + 1) == i:
+                del inset_stack[-1]
            if encoding_stack[-1] != document.encoding:
                if forward:
                    # This line has been incorrectly interpreted as if it was