Translation Difficulties (Topic)

Jan 4 2020, 4:28 am
By: Kolokol


Jan 4 2020, 4:28 am Kolokol Post #1

Currently, I am translating Legacy of the Confederation into Russian. I have encountered one problem with doing so, however. When playing the maps in SC BW, the Cyrillic text appears as just a bunch of unintelligible symbols. When viewing the map in SCMDraft, the Cyrillic text appears just fine. What is causing this problem, and how can I fix it? The attached file is a screenshot of the problem manifesting during the mission briefing. I am using SC BW version 1.16.1, by the way.

Attachments:

TranslationTrouble.png

None.

Nov 16 2020, 7:05 am martosss Post #2

I can confirm I am experiencing the same issue and I've managed to solve it, but it's a rather ugly solution - in SCM Draft you have to enter all your strings in ANSI format. How you do it:

1) Open notepad++ (that's what I use to edit text; any other text editor with encoding options should work).
2) Open the "Encoding" menu and make sure that "utf-8" is highlighted. That means the text you enter will be stored in utf-8 format.
3) Enter the Russian text you want to display. Note that this text is stored in UTF-8 format and is displayed to you using UTF-8 rules.
4) Go to the Encoding menu (UTF-8 should be highlighted) and click on ANSI. This reinterprets the utf-8 text as ANSI. In other words, it keeps the bits 010101 the same but displays them using different rules (a different encoding scheme), showing each byte (8 bits, a sequence of 8 zeros or ones) as a separate character. For example, the text "тест кирилица utf" (A) (utf-8 Cyrillic, 17 characters, stored as 29 bytes) should turn into "С‚РµСЃС‚ РєРёСЂРёР»РёС†Р° utf" (B) (ANSI, 29 characters, stored as 29 bytes, garbage text). (Note that I'm not sure whether or how these two strings will be displayed on the staredit forum, but for me they look VERY different.)
5) Now copy that text (B) and paste it into SCM Draft. I renamed a firebat to the (B) text. In game the firebat name is displayed with proper Cyrillic characters - magic!
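
The Notepad++ steps above can be approximated in a few lines of Python. This is only a sketch: the function name `utf8_as_ansi` is made up for illustration, and it assumes that "ANSI" on the machine means code page 1251, as it does for a Russian or Bulgarian system locale.

```python
def utf8_as_ansi(text: str, codepage: str = "cp1251") -> str:
    """Keep the UTF-8 bytes of `text` unchanged, but read each byte
    as one character of the single-byte `codepage`."""
    return text.encode("utf-8").decode(codepage)


original = "тест кирилица utf"       # string (A): 17 characters
garbled = utf8_as_ansi(original)     # string (B): one character per UTF-8 byte

print(len(original), len(garbled))   # 17 29
print(garbled)                       # garbage text to paste into the editor

# Reversing the reinterpretation gives the original text back,
# which shows that no bytes were changed along the way.
assert garbled.encode("cp1251").decode("utf-8") == original
```

One caveat with this sketch: byte 0x98 is unassigned in code page 1251, so Python raises `UnicodeDecodeError` for text whose UTF-8 form contains it (capital "И" is encoded as D0 98, for example). How Notepad++ and the clipboard treat that byte is not covered here, so strings with such letters are worth checking in game.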

Do this for all strings that contain cyrillic. I imagine it's gonna be a long night for you.

The reason that works:
- SCM Draft doesn't understand utf-8 => if you type Cyrillic characters directly into it, they get stored in the map file not with utf-8 rules (2 bytes per Cyrillic character) but using your local codepage, in other words your system locale (Russian is 1251, 1 byte per character). You can find it in Start menu => Control panel => Language => Change date, time, or number formats => Administrative => Change system locale. Look at the language, then google "locale language" and check the list to find your codepage. I'm using Bulgarian, so it's codepage 1251. Russian is also 1251, so it uses the same rules and will store Russian characters that you type the same way as the Bulgarian characters that I type. The problem is that each character is stored as 1 byte, which differs from the 2 bytes that UTF-8 uses for Cyrillic. Some characters are the same (Latin characters like "utf 8" are stored identically in utf-8 and in the 1251 codepage), but the Russian part comes out garbled and unreadable in game. In the end you store that string using only 17 bytes, because you have a string of 17 characters, each of which is translated into 1 byte using your local codepage. This happens because SCM Draft is stupid and can't read utf-8 text.
- SCM Draft understands ANSI - if you convert your text (17 characters of utf-8) into ANSI, each "Russian" character is broken down into 2 bytes (because the utf-8 representation of Russian has 2 bytes per character), and since there are 12 Russian characters you end up with 29 bytes:
-- the 17 utf-8 characters are 12 Russian and 5 non-Russian characters.
-- the 29 ANSI characters are 12 x 2 (Russian) + 5 x 1 (non-Russian) characters.
So you store 29 characters, exactly how the bytes should be stored.
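
The byte counts above can be checked with a short Python sketch. It assumes that the Cyrillic letters in the sample all fall in the basic Cyrillic Unicode block (U+0400 to U+04FF), which holds for Russian and Bulgarian text.

```python
text = "тест кирилица utf"

# Count the Cyrillic letters and everything else (Latin letters and spaces).
cyrillic = sum(1 for ch in text if "\u0400" <= ch <= "\u04ff")
other = len(text) - cyrillic

utf8_len = len(text.encode("utf-8"))      # 2 bytes per Cyrillic letter
cp1251_len = len(text.encode("cp1251"))   # 1 byte per character

print(len(text), cyrillic, other)   # 17 12 5
print(utf8_len, cp1251_len)         # 29 17
assert utf8_len == cyrillic * 2 + other * 1
```

So the same 17 characters take 17 bytes when typed through code page 1251 and 29 bytes when stored as utf-8, which is why the two forms can't be mixed up without producing garbage.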

Starcraft can read utf-8 and will read those 29 bytes as utf-8. Thus, it displays the proper text. However, SCM Draft doesn't and this is a problem if you want to edit text in SCM Draft(to improve formatting), since after copying it's an unreadable mess - you can only read it in game.

I've had trouble with translating maps from Korean due to the same issue - text is saved in Korean, but when I open the map in SCM Draft it shows that text not in its original encoding but in my local one (1251, which doesn't have Korean and treats characters differently), and everything is garbled (well, not everything, but all the Korean text is). Farty1billion helped me understand the issue and gave me a quick solution - this website he made to convert text.

I hope I've answered your question and shed light on the issue. It's tricky and annoying that SCM Draft doesn't support utf-8 while SCR does.

None.

Nov 16 2020, 7:55 am IlyaSnopchenko Post #3

The Curious

'bout time something was done about Unicode-compatible SCMDraft... I still can't get this feature to work. Not that I need it now, but it's rather unpleasant to have something broken.

Also, just add the font files with Cyrillic characters to the mod (LotC has one anyway). I ripped them long, long ago from a pirate translated version and they served me well over the years.

Trial and error... mostly error.

Nov 16 2020, 4:58 pm martosss Post #4

Ilya, we're talking about normal SC, not mods, but the situation should be identical for mods. Up to now I've been able to solve all my issues:
1) Extracting Korean text from SCM Draft (encoding EUC-KR) - this is a bit more tricky. You need to:
1.1) Open notepad++ and set the encoding to your own scheme (in my case Bulgarian 1251).
1.2) Copy the foreign text (in my case Korean) from SCM Draft - since it's not using my computer's codepage, what you'll be copying is garbage text, but don't worry, we'll decrypt it soon.
1.3) Paste the foreign text from SCM Draft into notepad++ - the 0-s and 1-s are transferred exactly as you copied them and displayed the same way you see them in SCM Draft - as complete garbage. The important part is that the underlying bits (010010101) stay the same. Now all that remains is to decrypt them into something that we can understand (and translate with google).
1.4) In notepad++, go to the Encoding menu and select the codepage that the text originates from. In my case I choose EUC-KR (949 if I remember correctly), because the text I'm trying to copy is Korean. And voila - the seemingly garbage text should appear as perfectly normal Korean.
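
Steps 1.1 to 1.4 amount to turning the garbage back into bytes with the local codepage and then decoding those bytes with the encoding the map author used. Here is a minimal Python sketch of that idea. The function name `recover_text` and the sample greeting are made up for illustration, and it assumes the map text is EUC-KR and the local codepage is 1251.

```python
def recover_text(garbled: str, shown_as: str = "cp1251",
                 real_encoding: str = "euc_kr") -> str:
    """Turn the garbage back into its raw bytes using the codepage it was
    displayed with, then decode those bytes with the real encoding."""
    return garbled.encode(shown_as).decode(real_encoding)


# Simulate what the editor shows: Korean bytes displayed with cp1251 rules.
korean = "안녕하세요"
garbled = korean.encode("euc_kr").decode("cp1251")

print(garbled)                 # unreadable Cyrillic-looking garbage
print(recover_text(garbled))   # the Korean text again
```

This only models the reinterpretation itself. It assumes every byte survives the copy and paste between the two programs, which is also what the manual procedure relies on.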

2) Importing foreign text into scm draft:
2.1) in notepad++ choose Encoding utf-8(that's the default)
2.2) type your text.
2.3) in notepad++, go to Encoding and change it from utf-8 to your operating system's locale, in my case it's Bulgarian(1251). This will produce garbage, but it's the kind of garbage that Starcraft understands.
2.4) Copy this garbage and paste it into SCM Draft. SCM Draft views each character using the same encoding scheme that your computer has (in my case Bulgarian 1251) and stores them, and since they're coming from the same encoding scheme, the information is stored in the same way in SCM Draft as it was in notepad++. That means when you save your map, the file that is created will have the same 0-s and 1-s that notepad++ stores in the txt file. Those 0-s and 1-s are displayed as Russian/Bulgarian if you use utf-8, but are displayed as garbage if you instead treat them as coming from the Bulgarian/Russian encoding (1251).
In the end, you transferred utf-8 text from notepad++ into SCM Draft, keeping the bits unchanged. SCM Draft will save it properly to the map and Starcraft will display it, as Starcraft understands utf-8. The only problem is that SCM Draft can't understand utf-8, so it will display that text as garbage, because it's not using utf-8 to display it; it uses your local encoding page. Only Latin characters/numerals will be displayed correctly, since they're stored in the same way in all codepages (can't go wrong there).
So as an alternative what you can do is type Russian/Bulgarian using the latin characters - eto taka HanpuMep

the text will seem(more or less) to be Russian/Bulgarian, but will in fact be utf-8 latin. The problem is when there are characters that latin doesn't have... in which case you have to find a smart way to display them.
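
For the Latin-letters alternative, a small lookup table is enough to convert a whole list of strings at once. The table below is a made-up, ad hoc mix of Bulgarian and Russian conventions, not any official transliteration standard, and the function name `transliterate` is hypothetical too; adjust the mapping to taste.

```python
# Ad hoc Cyrillic-to-Latin table (an assumption, not an official standard).
TRANSLIT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e", "ё": "yo",
    "ж": "zh", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l", "м": "m",
    "н": "n", "о": "o", "п": "p", "р": "r", "с": "s", "т": "t", "у": "u",
    "ф": "f", "х": "h", "ц": "ts", "ч": "ch", "ш": "sh", "щ": "sht",
    "ъ": "a", "ы": "y", "ь": "", "э": "e", "ю": "yu", "я": "ya",
}


def transliterate(text: str) -> str:
    """Replace each Cyrillic letter with its Latin stand-in and leave
    every other character (Latin letters, digits, spaces) untouched."""
    out = []
    for ch in text:
        latin = TRANSLIT.get(ch.lower())
        if latin is None:
            out.append(ch)                  # not in the table: keep as-is
        elif ch.isupper():
            out.append(latin.capitalize())  # preserve a leading capital
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("ето така например"))   # eto taka naprimer
```

The result is plain ASCII, so it is stored the same way under every codepage and shows up the same in the editor and in game.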
Good luck and I hope you can fix your problem, I'm open for questions if you need help understanding/solving some problem with encoding.

None.

Nov 16 2020, 7:54 pm IlyaSnopchenko Post #5

The Curious

I had no idea there was a way of using Cyrillic text in Classic without modifications. All I was ever getting was buckwheat unless I had the appropriate font files, huh.

Live and learn, they say.

Trial and error... mostly error.

Nov 16 2020, 8:46 pm martosss Post #6

Oh, you might be right, this approach works for Remastered since it supports utf-8, but I'm not sure if 1.16.1 supports utf-8. If it doesn't support it(which is my suspicion), then that approach won't work, you need to test it for 1.16.1. I only have SCR.

None.
