Hard Light Productions Forums

Modding, Mission Design, and Coding => FS2 Open Coding - The Source Code Project (SCP) => Test Builds => Topic started by: m!m on November 24, 2017, 01:03:50 pm

Title: Unicode text support
Post by: m!m on November 24, 2017, 01:03:50 pm

The text encoding FSO is using has always been a bit weird. In order to actually display special characters like the German umlauts, FSO needed a special font file which had support for these characters. These font files also weren't the same for all localizations, which led to a lot of problems if you wanted to play the game in German but only had the English font files. The TrueType font rendering feature improved that situation a bit by adding the ability to render UTF-8 encoded Unicode strings, but FSO still assumed that every string was in the special encoding FSO used, so no one could use that ability.

That is what these test builds are about. They introduce the ability to let FSO process UTF-8 encoded files and handle them properly in the entire engine.

Test builds for all platforms: http://swc.fs2downloads.com/builds/test/unicodeSupport/
Pull request: https://github.com/scp-fs2open/fs2open.github.com/pull/1416

Since this feature changes the way the engine handles text pretty extensively, I chose to introduce a new "Unicode mode" for FSO. In this mode FSO expects all text data it handles to be UTF-8 encoded Unicode strings. It also completely disables support for the old VFNT bitmap fonts.

You can enable Unicode mode by using the $Unicode mode option in the mod table. It has to appear after the location where the engine expects $Window title.

Now that you have enabled Unicode mode you will probably encounter some issues. The standard retail files are all Latin-1 encoded and FSO can't read that encoding directly anymore (it expects UTF-8). Since FSO is a nice program, it will automagically detect if a file uses Latin-1 encoding and then convert that data to UTF-8. Since that is not the desired data format, FSO will show a warning to let you know that you should really convert the encoding.

Converting the encoding can be done either with a command line utility like iconv or with a text editor like Notepad++, which has support for reencoding a file. The files you must convert from a retail install are strings.tbl (this one requires a few other small changes), tstrings.tbl and weapons.tbl. The weapons table is only an issue because it uses some special characters in a comment, but FSO still reads those characters and doesn't understand them.
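For anyone who prefers a script over iconv or Notepad++, here is a minimal Python sketch of the conversion described above. It is not FSO's own detection code; the helper names `to_utf8` and `convert_file` are made up for this example, and it assumes that any file which is not valid UTF-8 is Latin-1, as the post says the retail files are.

```python
from pathlib import Path


def to_utf8(raw: bytes) -> bytes:
    """Return UTF-8 bytes; input that is not valid UTF-8 is assumed to be Latin-1."""
    try:
        raw.decode("utf-8")
        return raw  # already valid UTF-8 (plain ASCII also lands here)
    except UnicodeDecodeError:
        # Latin-1 maps every byte to a code point, so this step cannot fail.
        return raw.decode("latin-1").encode("utf-8")


def convert_file(path: Path) -> bool:
    """Re-encode one file in place. Returns True if the file was rewritten."""
    original = path.read_bytes()
    converted = to_utf8(original)
    if converted == original:
        return False
    path.write_bytes(converted)
    return True
```

Keep a backup of the tables before running something like `convert_file(Path("strings.tbl"))`, since the file is overwritten in place.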

The string table contains one entry that previously used a special character. This is the entry with the number 385. Replace the %c with the © symbol to restore the old behavior. Another issue with this table is that it contains syntax errors that were not recognized by the retail parsing code. These parser errors were corrected in FSO, but since we never break retail compatibility there is a workaround which handles them for retail data. That workaround does not work with UTF-8 encoded files since it assumes that the text data is Latin-1 encoded. I already fixed these issues for my tests and uploaded those files in the test mod below.

While I was testing these changes I needed a test mod which should be a good starting point for your tests: http://www.mediafire.com/file/pw5788071fljm1m/unicodeTest.7z

Theoretically, this should also allow translations into languages with radically different characters like Japanese.

Please test these changes and let me know if you find something that breaks the new code.

Title: Re: Unicode text support
Post by: Novachen on November 24, 2017, 02:14:59 pm

Oh yes... that is very interesting for my German translations, especially for modifications that use TTFs. I will take a look into this tomorrow.

Title: Re: Unicode text support
Post by: Novachen on November 25, 2017, 08:29:50 am

So, my first report.

With the supplied mod everything seems to work with the German language setting in my playtests. I did not encounter an error in a playthrough of one mission.

After that, I started some further tests with a fully fleshed-out modification that uses some additional FSO features. So I have adapted this to my Between the Ashes: Slaves of Chaos translation.

The good news:
- The special characters are shown correctly in the menus, even in the fiction viewer, news terminal and the sector map. It seems that the fiction viewer texts have to be converted into UTF-8 without BOM instead of UTF-8 with BOM, otherwise there is a visible rectangle at the start of the text.

The bad news:
- After I start a mission, my game crashes after a few seconds. So I cannot check whether this would also work with custom and mission-specific HUD gauges. The error message does not help me see where the actual problem is.

Quote

Error: Caught std::exception in main(): 'Not enough space'!
File: freespace.cpp
Line: 7947

ntdll.dll! ZwWaitForSingleObject + 12 bytes
KERNELBASE.dll! WaitForSingleObject + 18 bytes
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
KERNEL32.DLL! BaseThreadInitThunk + 36 bytes
ntdll.dll! RtlGetAppContainerNamedObjectPath + 253 bytes
ntdll.dll! RtlGetAppContainerNamedObjectPath + 205 bytes

In the debug build I also got this one:

Quote

Assert: "(size_t)handle < GL_buffer_objects.size()"
File: gropengltnl.cpp
Line: 210

ntdll.dll! ZwWaitForSingleObject + 12 bytes
KERNELBASE.dll! WaitForSingleObject + 18 bytes
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>

[ This info is in the clipboard so you can paste it somewhere now ]

Use Debug to break into Debugger, Exit will close the application.

ntdll.dll! ZwWaitForSingleObject + 12 bytes
KERNELBASE.dll! WaitForSingleObject + 18 bytes
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>

Title: Re: Unicode text support
Post by: m!m on November 25, 2017, 08:43:20 am

Quote from: Novachen on November 25, 2017, 08:29:50 am

The good news:
- The special characters are shown correctly in the menus. Even in the fiction viewer, news terminal and the sector map. It seems that the fiction viewer texts have to be converted into UTF-8 without BOM instead of pure UTF-8, otherwise there is a visible rectangle at the start of the text.

FSO should handle the BOM properly. Well, I added code for handling it but apparently that doesn't work.

Quote from: Novachen on November 25, 2017, 08:29:50 am

The bad news:
- After i start a mission, my game crashes after a few seconds. So i can not check if this one would work with custom and mission specific hud gauges also. The error message is not helpful in my case to see, where the actual problem is.

Yeah, that error message isn't very useful (although I know where the exception is generated which is a start).

What changes did you make to your BtA mod for this? I have downloaded your BtA translation mod so I only need the files you changed.

Title: Re: Unicode text support
Post by: Novachen on November 25, 2017, 09:21:29 am

Quote

What changes did you make to your BtA mod for this? I have downloaded your BtA translation mod so I only need the files you changed.

I created a special character version of this translation back before the 3.8 release.

Extract that into the "FreiRaum_BtA SoC" folder.

It includes a sector map with special characters (only the sector view itself, not the individual planet views), one headline in the news room with special characters, as well as all the HUD gauges and the mission-specific gauges in the mission files themselves with special characters.

I think the best missions to test should be "Murmeltiertag" (bta1_m3_03) and "Mit einem Knall erlöschen" (bta1_m3_04) because these show special character gauges right after the beginning of the mission. However, every mission I have tried crashed.

[attachment stolen by Russian hackers]

Title: Re: Unicode text support
Post by: m!m on November 25, 2017, 09:46:10 am

That was very helpful, thank you. I found the cause of the error but I have to think about how to fix it properly.

The technical reason is that Axem's message box script had an effect where the text is displayed character by character to simulate a scroll-in effect. The problem with that is that sometimes it slices a multi-byte UTF-8 sequence in half, which results in an invalid UTF-8 encoding sequence. I think I will add some code to validate the strings passed to the Lua API to make the error message easier to understand. Since the Lua version we are using does not have support for handling UTF-8 encoded strings, I will also add a library with functions for iterating over these UTF-8 codepoints.
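The failure mode described here is easy to reproduce outside the engine. The following Python sketch is only an illustration, not the message box script or the new scripting API: it treats the text as raw bytes (the way a Lua string behaves), shows that a cut in the middle of a multi-byte sequence is invalid UTF-8, and contrasts that with cutting on code point boundaries. The function names are hypothetical.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def byte_prefixes(text: str):
    """Scroll-in effect done per byte: some prefixes end inside a character."""
    data = text.encode("utf-8")
    for end in range(1, len(data) + 1):
        yield data[:end]


def codepoint_prefixes(text: str):
    """Scroll-in effect done per code point: every prefix is valid UTF-8."""
    for end in range(1, len(text) + 1):
        yield text[:end].encode("utf-8")


if __name__ == "__main__":
    word = "erlöschen"  # the ö takes two bytes in UTF-8
    broken = [p for p in byte_prefixes(word) if not is_valid_utf8(p)]
    print("invalid byte-wise prefixes:", broken)
    print("all code point prefixes valid:",
          all(is_valid_utf8(p) for p in codepoint_prefixes(word)))
```

A validation step like `is_valid_utf8` is roughly what a check on strings entering an API boundary would do: it cannot repair the text, but it can say what is wrong with it.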

Title: Re: Unicode text support
Post by: m!m on November 25, 2017, 03:35:21 pm

The issue you found in BtA should be fixed now. The builds in the first post have been updated.

Without any changes to the mod data the new builds will still show an error message, but that should be a little more helpful for mod developers since it actually tells them what is wrong. I fixed Axem's message box script by using some new functions that I added to the FSO scripting API. I attached the new file to this post.

Which mission had the fiction viewer issue you reported earlier? FSO should be able to handle UTF-8 files with a BOM and if that is not working then I would like to fix that.

[attachment stolen by Russian hackers]

Title: Re: Unicode text support
Post by: Novachen on November 25, 2017, 05:08:01 pm

Quote from: m!m on November 25, 2017, 03:35:21 pm

What mission did have the fiction viewer issue you reported earlier? FSO should be able to handle UTF-8 files with a BOM and if that is not working then I would like to fix that.

Convert h01.txt (news room headline) and/or m03.txt (pre-mission text) in the fiction folder into UTF-8 with BOM and you can see a square at the beginning of the text in the news room (available in the lower left of the main menu) or alternatively in the mission "Stierkampf" (bta1_m1_03.fs2). It is not there if the file is converted into UTF-8 without BOM.

Quote

The issue you found in BtA should be fixed now. The builds in the first post have been updated.

I will check that out, especially since there are some features in the missions themselves I want to test with it.

Title: Re: Unicode text support
Post by: Novachen on November 25, 2017, 07:05:05 pm

So another report.

I have tested several missions and I think that I tested every mission that uses special features.
The system console in "Texas Seven" worked flawlessly, as did the artillery HUD in "Murmeltiertag" and the tower defense section in "Angel".

Some missions with special HUD gauges needed only small changes in terms of entry lengths, because the special characters need more bytes than in the original FS2. It would be interesting to see whether proper translations of this kind of content are even possible in totally different languages like Russian or Chinese, where every character needs more space by default. For languages like German this does not seem to be a big deal in general.

But I noticed a small, strange error in the fiction viewer. In the first line of a text, things like the dots over characters like ä are not visible, or turn out to be only half visible if you zoom in, but that is not the case in any of the other lines.

I marked two examples in the screenshots. They should look the same, but they do not.

[attachment stolen by Russian hackers]

Title: Re: Unicode text support
Post by: m!m on November 26, 2017, 04:10:06 am

I fixed the BOM issue in the fiction viewer, the builds in the first post have been updated. The cause was that the fiction viewer reads the file text without going through the parse system which handled the BOM skipping for all other files.
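For readers unfamiliar with the BOM: a UTF-8 file saved "with BOM" starts with the three bytes EF BB BF, which decode to the invisible character U+FEFF. If a reader does not skip them, that character reaches the renderer and shows up as the rectangle reported above. The Python sketch below only illustrates the idea of skipping it once at load time; it is not the code used in the fiction viewer, and `strip_utf8_bom` is a name chosen for this example.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark (EF BB BF) if one is present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


if __name__ == "__main__":
    with_bom = codecs.BOM_UTF8 + "Schöpfer".encode("utf-8")
    # Without stripping, the decoded text starts with U+FEFF.
    print(repr(with_bom.decode("utf-8")))
    # With stripping, only the visible text remains.
    print(repr(strip_utf8_bom(with_bom).decode("utf-8")))
```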

I cannot reproduce the issue you are showing in your screenshots, but the text in my fiction viewer is much larger than in your screenshot so there may be a configuration difference. The font configuration specifies a top and bottom offset, so it could be that you need to adjust those values by a pixel or two since the ä points are pretty high above the text in the font BtA is using.

Title: Re: Unicode text support
Post by: karajorma on November 26, 2017, 07:27:54 am

I did attempt to try this but I got bogged down in the fact that Diaspora 1.1.2 on Knossos (or 1.1.1 via the installer + patch) both crash out as soon as you set the language to German or French for no reason I could figure out.

Title: Re: Unicode text support
Post by: Novachen on November 26, 2017, 09:17:58 am

Quote from: m!m on November 26, 2017, 04:10:06 am

I cannot reproduce the issue you are showing in your screenshots but the text in my fiction viewer is much larger than in your screenshot so there may be a configuration difference.

How can that be? Do we not use the same fonts.tbl? I did not change mine and use the same as in the attachment file.

I tried it in several resolutions, 1280x720 up to 2560x1440 (my default resolution)... the text got smaller with a higher resolution, but the issue was always the same.

But so you do not have the same issue? The first line is correct in the news article on your configuration? So you can actually read the "Schöpfer" with ö points?

Quote from: karajorma on November 26, 2017, 07:27:54 am

I did attempt to try this but I got bogged down in the fact that Diaspora 1.1.2 on Knossos (or 1.1.1 via the installer + patch) both crash out as soon as you set the language to German or French for no reason I could figure out.

I was never aware that Diaspora had multi-language files in the first place. I thought it was always an English-only game that I would have to translate someday in the future :D
I will take a look into this. Maybe I can find something out.

EDIT:
Checked Diaspora out. Something is wrong with the strings.tbl in the other languages. I used my translated FSPort strings.tbl in its place for testing purposes only.
But it does not work, because Diaspora uses bitmap fonts in .vf files rather than font files in .ttf, and so it is not supported by this build.

The Diaspora fonts are needed in .ttf for that.
Actually, I do not know the visual difference between the font01.vf from Diaspora and the font01.vf from FreeSpace 2.
But if you use the German font01.vf you can change the game to German and it shows all corresponding special characters as it should.

I still hope there will be a .ttf version of all fonts used by FreeSpace and its mods in the future, because that would make translation much easier in all languages without any setbacks (in the fiction viewer etc.)

Title: Re: Unicode text support
Post by: karajorma on November 26, 2017, 05:34:18 pm

Thanks to m!m I got the problem sorted. I'll post the files in a bit. I'm mostly interested in getting Chinese language supported at the moment but it would be really cool to have someone check over our existing French and German translations to see if they are correct.

Title: Re: Unicode text support
Post by: m!m on November 27, 2017, 04:17:39 am

Quote from: Novachen on November 26, 2017, 09:17:58 am

Quote from: m!m on November 26, 2017, 04:10:06 am
I cannot reproduce the issue you are showing in your screenshots but the text in my fiction viewer is much larger than in your screenshot so there may be a configuration difference.

How can that be? Do we not use the same fonts.tbl? I did not change mine and use the same as in the attachment file.

I tried it in several resolutions, 1280x720 up to 2560x1440 (my default resolution)... the text got smaller with a higher resolution, but the issue was always the same.

But so you do not have the same issue? The first line is correct in the news article on your configuration? So you can actually read the "Schöpfer" with ö points?

Actually, the word "Schöpfer" was in the second line for me so it was impossible for me to see your issue. I reduced the size of the newsroom font to 16, at which point I could reproduce the issue. I then changed the top offset to -3, which resolved the issue again, so this seems to be an issue with the font table.

EDIT: I updated the builds again. The only major change is that the parser code now checks that a special character with the byte value -128 does not appear in the file, since that character is used by FSO to detect the end of the file. This probably limits the characters that can be used but there is nothing I can do about that now.

Title: Re: Unicode text support
Post by: jr2 on November 27, 2017, 09:19:26 am

Is the plan to change the way FSO checks for the end of a file in the future? I'm assuming OSes or programming languages have ways of signalling EOF without using some sort of special character in the file contents?

Title: Re: Unicode text support
Post by: m!m on November 27, 2017, 09:47:08 am

The usual C method of detecting the end of a string is the null byte, but for some reason that is not done in the parse system of FSO. Volition probably had a good reason for designing the system this way but I couldn't find any information about what that reason might have been.

Title: Re: Unicode text support
Post by: karajorma on November 28, 2017, 12:57:08 am

I spent ages trying to get a working Diaspora strings.tbl but they keep on corrupting themselves. Can you take a look at the one in Diaspora SVN for me? Just commit something I can simply open in Notepad++ and work on without it constantly breaking.

Title: Re: Unicode text support
Post by: m!m on November 28, 2017, 02:49:49 am

I fixed all the issues FSO reported. There were some lines in there that were completely unreadable so I replaced those with the strings from retail FSO.

Title: Re: Unicode text support
Post by: karajorma on November 28, 2017, 06:59:35 am

Cool. I can compare it against the old one to see what they should have been.

EDIT: It's still complaining that French is corrupt when I run it with the newest build posted.

Title: Re: Unicode text support
Post by: m!m on November 29, 2017, 04:06:43 am

I cannot reproduce your issue with my current development status. I pushed those changes to the test branch so the builds should be updated in ~1 hour.

Title: Re: Unicode text support
Post by: karajorma on November 29, 2017, 04:13:58 am

Okay, I'll give that a try and if it doesn't work I'll have to bite the bullet and build it myself. It would be kinda nice if we could improve the message from simply saying "The table is ****ed and so are you" though.

Title: Re: Unicode text support
Post by: m!m on November 29, 2017, 04:18:43 am

What is the error FSO is displaying? I made all the error messages I introduced as informative as possible so maybe it is one of the preexisting error messages.

Title: Re: Unicode text support
Post by: karajorma on November 29, 2017, 04:29:10 am

If I change the language in fs2_open.ini to French I get

Error: strings.tbl is corrupt
File: localize.cpp
Line: 336

And the option to debug.

As you can see that's not that informative about where the file is corrupt. And it works perfectly well with all the other languages I tried.

Title: Re: Unicode text support
Post by: m!m on November 29, 2017, 04:47:54 am

The error in the French table was caused by some misplaced quotes so it was not a problem specific to the Unicode builds. To make the error message a bit more informative I changed the error dialog so that it includes the filename and line number.

I think these changes have been tested enough to merge them into master.

Title: Re: Unicode text support
Post by: m!m on December 04, 2017, 04:23:56 am

The changes have now been merged and are available in the newest nightly.

Title: Re: Unicode text support
Post by: Nikogori on December 10, 2017, 04:19:37 am

I tried to add Japanese text into ships.tbl and encountered an encoding error.
I'm using fs2_open_3_8_1_20171206_8168099_x64_SSE2-FASTDBG.exe and fs2_open_3_8_1_unicodeSupport_x64_SSE2-FASTDBG.exe.
(https://i.imgur.com/aLxyB97.png)

Actually most characters are fine. Only some of them cause problems.
(https://i.imgur.com/3coZHAE.jpg)

At first it seemed completely random. However, I found a pattern when I made this list:

Code:

%E3%80%81
%E3%80%82
%E3%80%8C
%E3%80%8D
%E6%8A%80
%E3%80%80
%E8%80%90
%E6%80%A7
%E6%9C%80
%E9%80%9F
%E3%83%80

Every UTF-8 character code in this list (three bytes, written as six hex digits) contains the byte "80", and each of them gives an encoding error.
I picked some arbitrary characters for testing. My hypothesis seems to hold true.

I guess other languages (such as Chinese and Korean) will have the same problem.
I hope this information helps. Keep up the good work.

Title: Re: Unicode text support
Post by: m!m on December 10, 2017, 05:23:16 am

80 is the unsigned hexadecimal representation of -128 which unfortunately is the exact byte value FSO uses for marking the end of the file in the internal representation of the file contents. I found this issue while developing these changes (which is also why you are seeing that error) and have since fixed the issue and submitted the changes for code review: https://github.com/scp-fs2open/fs2open.github.com/pull/1529
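To make the collision concrete, here is a small Python sketch. It is only an illustration of the arithmetic, not FSO's parser: it shows that the byte 0x80 reads as -128 when interpreted as a signed 8-bit value, and that 0x80 is a perfectly ordinary continuation byte in the UTF-8 encoding of some characters from the list above. `EOF_SENTINEL` and the helper names are assumptions made up for this example.

```python
import struct
from urllib.parse import unquote_to_bytes

# Assumption for illustration: the parser reserves this single byte value as
# its end-of-file marker, as described in the post above.
EOF_SENTINEL = 0x80


def as_signed_byte(value: int) -> int:
    """Reinterpret an unsigned byte (0..255) as a signed 8-bit integer."""
    return struct.unpack("b", bytes([value]))[0]


def contains_sentinel(text: str) -> bool:
    """Report whether the UTF-8 encoding of text contains the sentinel byte."""
    return EOF_SENTINEL in text.encode("utf-8")


def decode_percent_form(entry: str) -> str:
    """Turn a list entry such as '%E3%80%81' back into the character it encodes."""
    return unquote_to_bytes(entry).decode("utf-8")


if __name__ == "__main__":
    print(as_signed_byte(0x80))
    for entry in ("%E3%80%81", "%E6%8A%80", "%E3%83%80"):
        char = decode_percent_form(entry)
        print(entry, char, contains_sentinel(char))
```

Any scheme that reserves a byte value above 0x7F as an in-band marker runs into this, because every such value can occur inside a multi-byte UTF-8 sequence.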

Once that is merged you should be able to use all Unicode characters you want but until then not all Unicode characters are supported.

EDIT: The changes have been merged and should appear in the next nightly.

Title: Re: Unicode text support
Post by: Nikogori on December 10, 2017, 06:48:18 am

Quote

Once that is merged you should be able to use all Unicode characters you want but until then not all Unicode characters are supported.

EDIT: The changes have been merged and should appear in the next nightly.

Thank you. I'll have a look when nightly arrives.

Title: Re: Unicode text support
Post by: chief1983 on December 11, 2017, 12:06:12 pm

The nightly was built and uploaded, but failed to post to the forums. In the meantime you can grab it here in its release folder (http://swc.fs2downloads.com/builds/nightly/20171211_91c9417/).

Title: Re: Unicode text support
Post by: Nikogori on December 12, 2017, 08:30:49 am

Yay it works! Thank you, thank you very much. This will be great news for the Japanese FreeSpace2 fandom.

(https://i.imgur.com/F8X1UyO.jpg)

There are one or two minor problems but I'll report them later.

Title: Re: Unicode text support
Post by: jr2 on December 12, 2017, 09:15:26 am

*prepares for sudden influx of Japanese anime modders* ;) This is awesome (even though I don't read Japanese, I imagine if I did, it would be much preferable to have FreeSpace in my own language).

Title: Re: Unicode text support
Post by: Nikogori on December 13, 2017, 07:46:49 am

I'd like to report a couple of problems I could find so far. I understand there's a workaround for each of these but it would be great if FS2 Open can handle them.

The “Token too long” error is a little too unforgiving. There are only 14 letters in the line but the error message says “Length = 42”. I believe this is a UTF-8 related problem... Some Japanese characters (including hiragana, katakana and kanji) take 3 bytes in UTF-8.

Code:

Warning: Training-2.fs2(line 832):
Warning: Token too long: [プライマリウェポンを選択しろ].  Length = 42.  Max is 31.

File: parselo.cpp
Line: 301

I don't know what “MAX_BRIEF_LINE_LEN” means but it seems the Japanese briefing text is too long to display (actually the lines are not that long). Cutting them in half fixes this error.

(https://i.imgur.com/KdsRcOJ.png)

FS2 automatically starts a new line when it encounters a space (the one which appears when you hit the “space” key) in Japanese text. Obviously you can avoid this by not using spaces at all.

(https://i.imgur.com/QriaBrv.jpg)

Title: Re: Unicode text support
Post by: m!m on December 13, 2017, 08:52:59 am

Quote from: Nikogori on December 13, 2017, 07:46:49 am

I'd like to report a couple of problems I could find so far. I understand there's a workaround for each of these but it would be great if FS2 Open can handle them.

“Token too long” error is a little too unforgiving. There are only 14 letters in the line but the error message says “Length = 42”. I believe this is UTF-8 related problem... Some Japanese characters (including hiragana, katakana and kanji) take 3 bytes in UTF-8.

Code:

Warning: Training-2.fs2(line 832):
Warning: Token too long: [プライマリウェポンを選択しろ].  Length = 42.  Max is 31.

File: parselo.cpp
Line: 301

As you correctly stated, the issue here is that the number of characters you see is less than the number of bytes it takes to store the characters. Unfortunately, there is no general solution for this apart from reducing the length of the string. There are a lot of places in the parsing code where a fixed size array is used for storing a string and there simply isn't enough space to store the entire string. That was no problem before since ASCII text is stored pretty efficiently, but if you need characters that are encoded using 3 or 4 bytes then you run out of space pretty fast.
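The numbers in the warning can be checked with a few lines of Python. This is just the arithmetic, not the parser: the limit of 31 is taken from the "Max is 31" in the warning, and `utf8_length`, `fits` and `truncate_utf8` are hypothetical helpers for this example.

```python
MAX_TOKEN_BYTES = 31  # taken from "Max is 31" in the warning above


def utf8_length(text: str) -> int:
    """Number of bytes the text occupies once encoded as UTF-8."""
    return len(text.encode("utf-8"))


def fits(text: str, limit: int = MAX_TOKEN_BYTES) -> bool:
    """True if the UTF-8 encoded text fits into a buffer of `limit` bytes."""
    return utf8_length(text) <= limit


def truncate_utf8(text: str, limit: int = MAX_TOKEN_BYTES) -> str:
    """Longest prefix that fits in `limit` bytes without splitting a character."""
    # Cutting the byte string may leave an incomplete trailing sequence;
    # errors="ignore" drops those leftover bytes when decoding.
    return text.encode("utf-8")[:limit].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    token = "プライマリウェポンを選択しろ"
    print(len(token), utf8_length(token), fits(token))
    print(truncate_utf8(token))
```

For this token the character count is 14 and the byte count is 42, which matches the warning. With three bytes per character, a 31-byte limit leaves room for only 10 such characters.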

There is a way of fixing this by using a variable size string but that type is sufficiently different from the current code that it doesn't work everywhere without major changes.

Quote from: Nikogori on December 13, 2017, 07:46:49 am

I don't know what “MAX_BRIEF_LINE_LEN” means but it seems Japanese briefing text is too long to display (actually they are not that long). Cutting them in half can fix this error.

<snip>

FS2 automatically starts a new line when it encountered space (the one which appears when you hit “space” key) in Japanese text. Obviously you can avoid this by not using space at all.

<snip>

FSO probably doesn't know how to properly find word separators in Japanese text since that code is only written with Western languages in mind. It is possible to fix this by using an external library (http://site.icu-project.org/) which knows how to split non-English text properly but that requires adding that library to FSO and integrating it into the code which is additional work. I hope that I can get around to that eventually but for now you probably need to help FSO a bit by adding spaces at the right places even if that makes the text look weird.

At least the first issue should be possible to fix without extensive changes to FSO (I hope) so could you upload your translation mod here so I can take a look at it?

Title: Re: Unicode text support
Post by: Nikogori on December 13, 2017, 09:56:47 am

Thank you for your quick reply.

Quote

FSO probably doesn't know how to properly find word separators in Japanese text since that code is only written with Western languages in mind. It is possible to fix this by using an external library which knows how to split non-English text properly but that requires adding that library to FSO and integrating it into the code which is additional work. I hope that I can get around to that eventually but for now you probably need to help FSO a bit by adding spaces at the right places even if that makes the text look weird.

I assume FSO recognizes Japanese text as a single huge word (because usually it contains no spaces). I forgot to say you can avoid this by using a full-width space or a non-breaking space (Alt+0160).

I'll upload my translation mod tomorrow as it is already late in Japan. Sorry.

Title: Re: Unicode text support
Post by: Nikogori on December 14, 2017, 08:24:16 am

This is my Japanese localization mod. Actually most of these files are extracted from an old Japanese patch (http://stardogs.netgamers.jp/index.php?FreeSpace%20Series%2FFreeSpace%202%2F%C6%FC%CB%DC%B8%EC%B2%BD%A5%D7%A5%ED%A5%B8%A5%A7%A5%AF%A5%C8).

Download link (https://www.dropbox.com/s/5os98cc30kxih1j/FS2_JP.zip?dl=1)

Title: Re: Unicode text support
Post by: m!m on December 14, 2017, 08:34:11 am

Thanks. I will take a look at the warnings to see if I can make some of those arrays dynamic.

Title: Re: Unicode text support
Post by: Arkblade on December 22, 2017, 09:03:19 am

In the Japanese mod, speech is broken even when using a Japanese TTS voice.
It seems the Japanese words are recognized as symbols or something else that is not proper Japanese.
However, this may or may not be a bug in the Japanese TTS voices...

Title: Re: Unicode text support
Post by: AdmiralRalwood on December 22, 2017, 06:40:29 pm

It probably needs to be converted from UTF-8 to Windows's wide character format before being passed to TTS.

Title: Re: Unicode text support
Post by: m!m on December 23, 2017, 02:58:02 am

The code does "convert" the UTF-8 data to a wchar_t array by assigning the byte values of each character to a wide character :nono:

The correct way to convert the data is to use MultiByteToWideChar.
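The difference between the two conversions can be shown without the Windows API. The Python sketch below is an analogy only: `widen_bytewise` mimics copying each UTF-8 byte into its own wide character, while `decode_properly` and `to_utf16_units` produce what a real UTF-8 to UTF-16 conversion (such as MultiByteToWideChar with CP_UTF8 on Windows) is expected to yield. All three function names are made up for this example.

```python
def widen_bytewise(utf8: bytes) -> str:
    """Mimic the faulty conversion: every byte becomes its own character."""
    return "".join(chr(byte) for byte in utf8)


def decode_properly(utf8: bytes) -> str:
    """Decode the byte sequence as UTF-8, combining multi-byte sequences."""
    return utf8.decode("utf-8")


def to_utf16_units(text: str) -> list:
    """List the 16-bit code units a wchar_t array on Windows would hold."""
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


if __name__ == "__main__":
    data = "あ".encode("utf-8")  # three bytes: E3 81 82
    print([hex(ord(c)) for c in widen_bytewise(data)])
    print([hex(u) for u in to_utf16_units(decode_properly(data))])
```

The byte-wise version turns one Japanese character into three unrelated Latin-range characters, which would explain a speech engine reading out symbols instead of words.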

Title: Re: Unicode text support
Post by: m!m on December 25, 2017, 03:11:35 am

The newest nightly now contains a fix to how FSO passes the speech text to the Windows API which should fix the issue with Japanese text.

Title: Re: Unicode text support
Post by: Arkblade on December 25, 2017, 08:20:08 pm

Quote from: m!m on December 25, 2017, 03:11:35 am

The newest nightly now contains a fix to how FSO passes the speech text to the Windows API which should fix the issue with Japanese text.

thank you!

Title: Re: Unicode text support
Post by: Arkblade on January 28, 2018, 12:41:40 pm

Have you made progress? The major problems seem to be the ones he presented.

Quote from: Nikogori on December 13, 2017, 07:46:49 am

I'd like to report a couple of problems I could find so far. I understand there's a workaround for each of these but it would be great if FS2 Open can handle them.

“Token too long” error is a little too unforgiving. There are only 14 letters in the line but the error message says “Length = 42”. I believe this is UTF-8 related problem... Some Japanese characters (including hiragana, katakana and kanji) take 3 bytes in UTF-8.

Code:

Warning: Training-2.fs2(line 832):
Warning: Token too long: [プライマリウェポンを選択しろ].  Length = 42.  Max is 31.

File: parselo.cpp
Line: 301

I don't know what “MAX_BRIEF_LINE_LEN” means but it seems Japanese briefing text is too long to display (actually they are not that long). Cutting them in half can fix this error.

(https://i.imgur.com/KdsRcOJ.png)

FS2 automatically starts a new line when it encountered space (the one which appears when you hit “space” key) in Japanese text. Obviously you can avoid this by not using space at all.

(https://i.imgur.com/QriaBrv.jpg)

Title: Re: Unicode text support
Post by: m!m on January 29, 2018, 03:07:05 am

Unfortunately I did not have time to work on resolving this issue.

Title: Re: Unicode text support
Post by: AdmiralRalwood on January 29, 2018, 12:25:16 pm

Quote from: Nikogori on December 13, 2017, 07:46:49 am

“Token too long” error is a little too unforgiving. There are only 14 letters in the line but the error message says “Length = 42”. I believe this is UTF-8 related problem... Some Japanese characters (including hiragana, katakana and kanji) take 3 bytes in UTF-8.

Code:

Warning: Training-2.fs2(line 832):
Warning: Token too long: [プライマリウェポンを選択しろ].  Length = 42.  Max is 31.

File: parselo.cpp
Line: 301

In this particular instance (assuming line 832 is the same as in the original mission file, and/or that I've correctly retranslated your too-long token back into English), there's no real reason for objective text to be limited to NAME_LENGTH since the strings are dynamically allocated anyway (and the final buffer they get put into for display purposes is 256 bytes long). So, in this specific situation, the maximum allowed length could theoretically be increased without really affecting anything else. The one potential problem is that the code doesn't work right for more than two lines of text, but given that we're dealing with text that takes up more bytes without taking up significantly more width, some sort of allowance could probably be made.

Quote from: Nikogori on December 13, 2017, 07:46:49 am

FS2 automatically starts a new line when it encounters a space (the one that appears when you hit the “space” key) in Japanese text. Obviously you can avoid this by not using spaces at all.

Well, yes, if there's only one space in a long string of text, wordwrapping will generally force a linebreak there. Now, granted, I'm no Japanese expert, but surely you can just... add more spaces, so there are other, more natural places for wordwrapping to break lines? If not, well, can you maybe use a non-breaking space (U+00A0)?
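As a rough sketch of the non-breaking space idea: the function below swaps every ASCII space (U+0020) for U+00A0, which is the two bytes 0xC2 0xA0 in UTF-8. The name `spaces_to_nbsp` is hypothetical, and whether FSO's word wrapping actually treats U+00A0 as non-breaking is an assumption that would need testing.

```cpp
#include <string>

// Hypothetical helper, not an FSO function.
// Returns a copy of `text` with every ASCII space (U+0020) replaced by a
// no-break space (U+00A0, encoded as 0xC2 0xA0 in UTF-8).
// This is safe on valid UTF-8 because the byte 0x20 never appears inside
// a multi-byte sequence.
std::string spaces_to_nbsp(const std::string& text) {
    std::string out;
    out.reserve(text.size());
    for (char c : text) {
        if (c == ' ') {
            out += "\xC2\xA0";
        } else {
            out += c;
        }
    }
    return out;
}
```

Note that each replaced space grows the string by one byte, which matters wherever a byte-length limit applies.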

Title: Re: Unicode text support
Post by: Nikogori on February 02, 2018, 05:06:12 am

Quote from: Arkblade on January 28, 2018, 12:41:40 pm

Have you made any progress? The major problems seem to be the ones he presented.

As I said before, these “problems” can be easily avoided. There is a workaround for each of them.

The “Token too long” error is a bit annoying, though. Sometimes it is very difficult to explain mission objectives in a 10-character text...

Quote from: AdmiralRalwood on January 29, 2018, 12:25:16 pm

Well, yes, if there's only one space in a long string of text, wordwrapping will generally force a linebreak there. Now, granted, I'm no Japanese expert, but surely you can just... add more spaces, so there are other, more natural places for wordwrapping to break lines? If not, well, can you maybe use a non-breaking space (U+00A0)?

I'd like to use the existing old Japanese localization patch. Sadly, it contains hundreds of “spaces” and it will take some time to remove or replace all of them.
However, if you guys believe it is not easy to make FSO handle them properly, of course I can do this task manually. I guess there are other Japanese players who would want to join.

I've been busy working on my own project (I'm writing a blog about the Orbiter simulator). I'll be back on this localization project as soon as my current project is finished (hopefully by April).

Title: Re: Unicode text support
Post by: jr2 on February 03, 2018, 08:28:14 pm

Quote from: Nikogori on February 02, 2018, 05:06:12 am

Quote from: Arkblade on January 28, 2018, 12:41:40 pm
Have you made any progress? The major problems seem to be the ones he presented.

As I said before, these “problems” can be easily avoided. There is a workaround for each of them.

The “Token too long” error is a bit annoying, though. Sometimes it is very difficult to explain mission objectives in a 10-character text...

Quote from: AdmiralRalwood on January 29, 2018, 12:25:16 pm
Well, yes, if there's only one space in a long string of text, wordwrapping will generally force a linebreak there. Now, granted, I'm no Japanese expert, but surely you can just... add more spaces, so there are other, more natural places for wordwrapping to break lines? If not, well, can you maybe use a non-breaking space (U+00A0)?

I'd like to use the existing old Japanese localization patch. Sadly, it contains hundreds of “spaces” and it will take some time to remove or replace all of them.
However, if you guys believe it is not easy to make FSO handle them properly, of course I can do this task manually. I guess there are other Japanese players who would want to join.

I've been busy working on my own project (I'm writing a blog about the Orbiter simulator). I'll be back on this localization project as soon as my current project is finished (hopefully by April).

You should be able to do a Find and Replace All: use Notepad++ (https://notepad-plus-plus.org/) and go to Search > Replace > Replace All. With a bit of fiddling you should be able to find a pattern that does what you want.

Notepad++ should have a localization for your language (and most others).

Title: Re: Unicode text support
Post by: Nikogori on May 31, 2018, 11:42:45 am

Unicode text doesn't show up when I use a recent Nightly. Is there anything I should add to the mod file?
(https://i.imgur.com/YhU05zo.jpg)

I'm using the Japanese localization mod (https://www.dropbox.com/s/5os98cc30kxih1j/FS2_JP.zip?dl=1) and the unicodeTest mod.

I have tried some of the Nightly builds and this is the result:

fs2_open_3_8_1_20180113_830fd70_x64_SSE2.exe or older – OK
fs2_open_3_8_1_20180116_2779629_x64_SSE2.exe or newer – doesn't work

Title: Re: Unicode text support
Post by: Nikogori on May 31, 2018, 11:52:43 am

Quote from: Nikogori on December 13, 2017, 07:46:49 am

I don't know what “MAX_BRIEF_LINE_LEN” means, but it seems FSO considers the Japanese briefing text too long to display (the lines are actually not that long). Cutting them in half fixes this error.

(https://i.imgur.com/KdsRcOJ.png)

I found that I can avoid this error by starting a new line.

It seems FSO doesn't like a lengthy sentence that contains no spaces (spaces are not common in most Asian languages; usually we don't separate the words in a sentence). All I have to do is break such sentences up by adding a space or CR/LF.
If you don't do this, the briefing text doesn't appear properly or the game crashes immediately.
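A minimal sketch of how this workaround could be automated, assuming valid UTF-8 input: the function below inserts a line feed whenever a line reaches a given number of code points, and only ever breaks between code points. The name `hard_wrap_utf8` and the idea of a fixed per-line character limit are assumptions for illustration; FSO itself wraps by rendered pixel width, so a suitable limit would have to be found by experiment.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not an FSO function.
// Inserts '\n' so that no line holds more than `max_chars` code points.
// Existing line feeds reset the counter. Assumes valid UTF-8 and max_chars > 0.
std::string hard_wrap_utf8(const std::string& text, std::size_t max_chars) {
    std::string out;
    std::size_t chars_on_line = 0;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(text[i]);
        // Continuation bytes look like 10xxxxxx; everything else starts a code point.
        const bool starts_code_point = (c & 0xC0) != 0x80;
        if (c == '\n') {
            chars_on_line = 0;
        } else if (starts_code_point) {
            if (chars_on_line == max_chars) {
                out += '\n';  // break before this code point, never inside one
                chars_on_line = 0;
            }
            ++chars_on_line;
        }
        out += text[i];
    }
    return out;
}
```

This counts code points rather than display width, so it is only an approximation for text that mixes full-width Japanese characters with narrow Latin ones.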

Title: Re: Unicode text support
Post by: m!m on May 31, 2018, 11:58:23 am

Quote from: Nikogori on May 31, 2018, 11:42:45 am

Unicode text doesn't show up when I use a recent Nightly. Is there anything I should add to the mod file?
<snip>

I'm using the Japanese localization mod (https://www.dropbox.com/s/5os98cc30kxih1j/FS2_JP.zip?dl=1) and the unicodeTest mod.

I have tried some of the Nightly builds and this is the result:

fs2_open_3_8_1_20180113_830fd70_x64_SSE2.exe or older – OK
fs2_open_3_8_1_20180116_2779629_x64_SSE2.exe or newer – doesn't work

I cannot reproduce this issue with the latest master version (which should also be the newest nightly). Please post your fs2_open.log file. Instructions on how to do this can be found in this post.

Title: Re: Unicode text support
Post by: Nikogori on May 31, 2018, 03:29:22 pm

The attached fs2_open.log was created when I encountered the problem above.

That being said, I believe I found a solution... Because everything is working fine now (with the latest Nightly). :confused:

This is all I did:

Delete the cache folder from the mod folder.
Run the game with the "Run in window" option.
Unicode text is back!

I guess I did something wrong. Sorry.

[attachment stolen by Russian hackers]

Title: Re: Unicode text support
Post by: m!m on May 31, 2018, 03:41:27 pm

That is very odd. Maybe there was something wrong with the shaders in the cache folder but "Run in window" should have no effect on that. Does the text disappear if you remove that option again?

Title: Re: Unicode text support
Post by: Nikogori on June 01, 2018, 02:24:52 pm

Delete the cache folder.
Run the game with a debug build ("Run in window" is on).

This is what seems to be needed to use the Japanese text mod. Maybe "Run in window" is irrelevant?

Once Unicode text is back, everything works fine without the debug build or the "Run in window" option.
However, if I delete the cache folder one more time, Unicode text disappears again...