Author Topic: Unicode text support

Offline m!m

  • 210
The text encoding FSO uses has always been a bit odd. To display special characters such as the German umlauts, FSO needed a special font file that supported those characters. These font files also differed between localizations, which led to a lot of problems if you wanted to play the game in German but only had the English font files. The TrueType font rendering feature improved the situation somewhat by adding the ability to render UTF-8 encoded Unicode strings, but FSO still assumed that every string used its own special encoding, so no one could take advantage of that ability.

That is what these test builds are about. They introduce the ability to let FSO process UTF-8 encoded files and handle them properly in the entire engine.

Test builds for all platforms: http://swc.fs2downloads.com/builds/test/unicodeSupport/
Pull request: https://github.com/scp-fs2open/fs2open.github.com/pull/1416

Since this feature changes the way the engine handles text pretty extensively, I chose to introduce a new "Unicode mode" for FSO. In this mode FSO expects all text data it handles to be UTF-8 encoded. It also completely disables support for the old VFNT bitmap fonts.

You can enable Unicode mode by using the $Unicode mode option in the mod table. It has to appear after the location where the engine expects $Window title.

Now that you have enabled Unicode mode you will probably encounter some issues. The standard retail files are all Latin1 encoded, and in Unicode mode FSO expects UTF-8 instead. Since FSO is a nice program it will automagically detect if a file uses Latin1 encoding and convert that data to UTF-8. Because Latin1 is not the desired format, FSO will also show a warning to let you know that you should really convert the file.
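
To illustrate the fallback described above, here is a minimal Python sketch of "try UTF-8 first, otherwise treat the data as Latin1". It is not FSO's actual detection code; the function name decode_table_bytes is made up for this example.

```python
def decode_table_bytes(raw: bytes):
    """Decode raw file bytes and return (text, was_latin1).

    Hypothetical helper for illustration only, not part of FSO.
    """
    try:
        # Valid UTF-8 data decodes without errors.
        return raw.decode("utf-8"), False
    except UnicodeDecodeError:
        # Latin-1 maps every possible byte to a code point,
        # so this fallback cannot fail.
        return raw.decode("latin-1"), True


text, was_latin1 = decode_table_bytes("Schöpfer".encode("latin-1"))
if was_latin1:
    print("Warning: file is Latin1 encoded, please convert it to UTF-8")
```

The was_latin1 flag plays the role of the warning: the data is still usable, but the file should be converted.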

Converting the encoding can be done either with a command line utility like iconv or with a text editor like Notepad++, which supports re-encoding a file. The files you must convert from a retail install are strings.tbl (this one requires a few other small changes), tstrings.tbl and weapons.tbl. The weapons table is only an issue because it uses some special characters in a comment; FSO still reads those characters and doesn't understand them.
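
If you would rather script the conversion, the following Python sketch re-encodes one file from Latin1 to UTF-8. It assumes the input really is Latin1 (it does not check), and the function name and file paths are placeholders for this example.

```python
from pathlib import Path


def convert_latin1_to_utf8(src: Path, dst: Path) -> None:
    """Re-encode a Latin-1 text file as UTF-8, written without a BOM.

    Working on bytes keeps the original line endings untouched.
    """
    raw = src.read_bytes()
    text = raw.decode("latin-1")  # assumption: the source is Latin-1
    dst.write_bytes(text.encode("utf-8"))


# Example usage with placeholder paths:
# convert_latin1_to_utf8(Path("tstrings.tbl"), Path("converted/tstrings.tbl"))
```

Writing to a separate output path keeps the original file around in case the source turns out not to be Latin1 after all.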

The string table contains one entry that previously used a special character: the entry with the number 385. Replace the %c with the © symbol to restore the old behavior. Another issue with this table is that it contains syntax errors that the retail parsing code did not recognize. FSO's parser detects these errors, but since we never break retail compatibility there is a workaround that handles them for retail data. That workaround does not work with UTF-8 encoded files because it assumes the text data is Latin-1 encoded. I already fixed these issues for my tests and uploaded the corrected files in the test mod below.


While I was testing these changes I needed a test mod which should be a good starting point for your tests: http://www.mediafire.com/file/pw5788071fljm1m/unicodeTest.7z

Theoretically, this should also allow translations into languages with radically different characters like Japanese.


Please test these changes and let me know if you find something that breaks the new code.

 

Offline Novachen

  • 27
  • The one and only Capella supernova.
Oh yes... that is very interesting for my German translations, especially for modifications that use TTFs. I will take a look at this tomorrow.
Female FS2 pilot since 1999.

German Translations created by me:
Between the Ashes: Slaves of Chaos, FreeSpace Port, The Destiny of Peace, Silent Threat: Reborn, Awakenings (in development),

Other projects:
FSPort Mission Upgrade, Out of the Shadow: Nova Safiya Edition (Remake of the Out of the Shadow campaign, in development)

If you want to know what my nickname means: -chen is a German diminutive suffix, so you can translate Novachen as something like Little Nova or Novalet.
Though my original meaning for the name is more like "Sweet pretty deadly (Super)Nova" ;).

 

Offline Novachen

  • 27
  • The one and only Capella supernova.
So, my first report.

With the supplied mod everything seems to work with the German language setting in my playtests. I did not encounter an error in a playthrough of one mission.


After that, I started some further tests with a fully fleshed-out modification that uses some additional FSO features, so I adapted this to my Between the Ashes: Slaves of Chaos translation.

The good news:
- The special characters are shown correctly in the menus, even in the fiction viewer, news terminal and the sector map. It seems that the fiction viewer texts have to be converted to UTF-8 without BOM rather than UTF-8 with BOM; otherwise there is a visible rectangle at the start of the text.


The bad news:
- After I start a mission, my game crashes after a few seconds, so I cannot check whether this would also work with custom and mission-specific HUD gauges. The error message does not help me see where the actual problem is.


Quote
Error: Caught std::exception in main(): 'Not enough space'!
File: freespace.cpp
Line: 7947

ntdll.dll! ZwWaitForSingleObject + 12 bytes
KERNELBASE.dll! WaitForSingleObject + 18 bytes
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2.exe! <no symbol>
KERNEL32.DLL! BaseThreadInitThunk + 36 bytes
ntdll.dll! RtlGetAppContainerNamedObjectPath + 253 bytes
ntdll.dll! RtlGetAppContainerNamedObjectPath + 205 bytes

In the debug build I also got this one:
Quote
Assert: "(size_t)handle < GL_buffer_objects.size()"
File: gropengltnl.cpp
Line: 210

ntdll.dll! ZwWaitForSingleObject + 12 bytes
KERNELBASE.dll! WaitForSingleObject + 18 bytes
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>

[ This info is in the clipboard so you can paste it somewhere now ]


Use Debug to break into Debugger, Exit will close the application.

ntdll.dll! ZwWaitForSingleObject + 12 bytes
KERNELBASE.dll! WaitForSingleObject + 18 bytes
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>
fs2_open_3_8_1_unicodeSupport_SSE2-FASTDBG.exe! <no symbol>
« Last Edit: November 25, 2017, 09:44:30 am by Novachen »

 

Offline m!m

  • 210
Quote
The good news:
- The special characters are shown correctly in the menus, even in the fiction viewer, news terminal and the sector map. It seems that the fiction viewer texts have to be converted to UTF-8 without BOM rather than UTF-8 with BOM; otherwise there is a visible rectangle at the start of the text.

FSO should handle the BOM properly. Well, I added code for handling it but apparently that doesn't work.

Quote
The bad news:
- After I start a mission, my game crashes after a few seconds, so I cannot check whether this would also work with custom and mission-specific HUD gauges. The error message does not help me see where the actual problem is.

Yeah, that error message isn't very useful (although I know where the exception is generated, which is a start).

What changes did you make to your BtA mod for this? I have downloaded your BtA translation mod so I only need the files you changed.

 

Offline Novachen

  • 27
  • The one and only Capella supernova.
Quote
What changes did you make to your BtA mod for this? I have downloaded your BtA translation mod so I only need the files you changed.

I created a special character version of this translation back before the 3.8 release.

Extract that into the "FreiRaum_BtA SoC" folder.

It includes a sector map with special characters (only the sector view itself, not the individual planet views), one headline in the news room with special characters, as well as all the HUD gauges and mission-specific gauges in the mission files themselves with special characters.

I think the best missions to test should be "Murmeltiertag" (bta1_m3_03) and "Mit einem Knall erlöschen" (bta1_m3_04) because these show special character gauges right after the beginning of the mission. However, every mission I have tried crashed.

 

Offline m!m

  • 210
That was very helpful, thank you. I found the cause of the error but I have to think about how to fix it properly.

The technical reason is that Axem's message box script has an effect where the text is displayed character by character to simulate a scroll-in effect. The problem is that it sometimes slices a multi-byte UTF-8 sequence in half, which results in an invalid UTF-8 sequence. I think I will add some code to validate the strings passed to the Lua API to make the error message easier to understand. Since the Lua version we are using has no support for handling UTF-8 encoded strings, I will also add a library with functions for iterating over UTF-8 code points.
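
The slicing problem can be sketched outside of FSO. The Python example below works on raw bytes, the way a byte-oriented string API would, and only yields prefixes that end on a code point boundary. It is an illustration under the assumption that the input is valid UTF-8; it is not the script fix or the new scripting API, and utf8_prefixes is a made-up name.

```python
def utf8_prefixes(data: bytes):
    """Yield every prefix of `data` that ends on a code point boundary.

    Assumes `data` is valid UTF-8. Continuation bytes have the bit
    pattern 10xxxxxx, so a cut is only safe when the next byte is not
    a continuation byte (or when the end of the data is reached).
    """
    for i in range(1, len(data) + 1):
        if i == len(data) or (data[i] & 0xC0) != 0x80:
            yield data[:i]


data = "Jä".encode("utf-8")  # 3 bytes: 'J' plus a 2-byte sequence for 'ä'

# Naive byte-wise slicing cuts the 'ä' in half:
try:
    data[:2].decode("utf-8")
except UnicodeDecodeError:
    print("data[:2] is not valid UTF-8")

# Boundary-aware slicing only produces decodable prefixes:
for prefix in utf8_prefixes(data):
    print(prefix.decode("utf-8"))
```

A character-by-character reveal effect would step through these prefixes instead of incrementing a byte index.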

 

Offline m!m

  • 210
The issue you found in BtA should be fixed now. The builds in the first post have been updated.

Without any changes to the mod data the new builds will still show an error message, but it should be a little more helpful for mod developers since it actually tells them what is wrong. I fixed Axem's message box script by using some new functions that I added to the FSO scripting API. I attached the new file to this post.

Which mission had the fiction viewer issue you reported earlier? FSO should be able to handle UTF-8 files with a BOM and if that is not working then I would like to fix that.

 

Offline Novachen

  • 27
  • The one and only Capella supernova.
Quote
Which mission had the fiction viewer issue you reported earlier? FSO should be able to handle UTF-8 files with a BOM and if that is not working then I would like to fix that.

Convert h01.txt (news room headline) and/or m03.txt (pre-mission text) in the fiction folder to UTF-8 with BOM and you will see a square at the beginning of the text in the News Room (available in the lower left of the main menu) or alternatively in the mission "Stierkampf" (bta1_m1_03.fs2). It is not there if the file is converted to UTF-8 without BOM.

Quote
The issue you found in BtA should be fixed now. The builds in the first post have been updated.

I will check that out, especially since there are some features in the missions themselves that I want to test with it.

 

Offline Novachen

  • 27
  • The one and only Capella supernova.
So another report.

I have tested several missions, and I think I tested every mission that uses special features.
The system console in "Texas Seven" worked flawlessly, as did the Artillery HUD in "Murmeltiertag" and the tower defense section in "Angel".

Some missions with special HUD gauges needed only small changes to entry lengths, because the special characters need more bytes than in the original FS2. It would be interesting to know whether proper translations of this kind of content are even possible in completely different languages like Russian or Chinese, where every character needs more space by default. For languages like German this does not seem to be a big deal in general.
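
As a rough illustration of the size difference (plain Python, nothing FSO-specific; the sample words are arbitrary), this compares the number of characters with the number of UTF-8 bytes:

```python
def encoded_sizes(text: str):
    """Return (number of code points, number of UTF-8 bytes) for `text`."""
    return len(text), len(text.encode("utf-8"))


for sample in ("Jäger", "Охотник", "猎人"):
    chars, utf8_bytes = encoded_sizes(sample)
    print(f"{sample}: {chars} characters, {utf8_bytes} bytes in UTF-8")
```

In UTF-8 a German umlaut takes two bytes instead of one, Cyrillic letters take two bytes each, and common Chinese characters take three, so any length limit counted in bytes is reached sooner for those languages.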

But I noticed a small, strange error in the fiction viewer. In the first line of a text, the dots above characters like ä are not visible, or are only half visible if you zoom in; this is not the case in any of the other lines.

I marked two examples in the screenshots. They should look the same, but they do not.
« Last Edit: November 25, 2017, 08:14:06 pm by Novachen »

 

Offline m!m

  • 210
I fixed the BOM issue in the fiction viewer, the builds in the first post have been updated. The cause was that the fiction viewer reads the file text without going through the parse system which handled the BOM skipping for all other files.
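
For reference, skipping a UTF-8 BOM amounts to dropping three fixed bytes at the start of the data. This is a minimal Python sketch of that step, not the code from the fix; strip_utf8_bom is a made-up name.

```python
import codecs


def strip_utf8_bom(raw: bytes) -> bytes:
    """Remove a leading UTF-8 byte order mark (EF BB BF) if present."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw[len(codecs.BOM_UTF8):]
    return raw


with_bom = codecs.BOM_UTF8 + "Schöpfer".encode("utf-8")
print(strip_utf8_bom(with_bom).decode("utf-8"))
```

If the BOM is not removed it decodes to the invisible character U+FEFF, which a font without that glyph will typically draw as a placeholder box, matching the rectangle reported above. Python's "utf-8-sig" codec performs the same stripping while decoding.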

I cannot reproduce the issue you are showing in your screenshots, but the text in my fiction viewer is much larger than in your screenshot, so there may be a configuration difference. The font configuration specifies a top and bottom offset, so you may need to adjust those values by a pixel or two, since the dots of the ä sit pretty high above the text in the font BtA is using.

 

Offline karajorma

  • King Louie - Jungle VIP
  • Administrator
  • 214
    • Karajorma's Freespace FAQ
I did try this, but I got bogged down by the fact that Diaspora 1.1.2 on Knossos and 1.1.1 via the installer + patch both crash as soon as you set the language to German or French, for no reason I could figure out.
Karajorma's Freespace FAQ. It's almost like asking me yourself.

[ Diaspora ] - [ Seeds Of Rebellion ] - [ Mind Games ]

 

Offline Novachen

  • 27
  • The one and only Capella supernova.
Quote
I cannot reproduce the issue you are showing in your screenshots but the text in my fiction viewer is much larger than in your screenshot so there may be a configuration difference.

How can that be? Do we not use the same fonts.tbl? I did not change mine and use the same as in the attachment file.

I tried it in several resolutions, 1280x720 up to 2560x1440 (my default resolution)... the text got smaller with a higher resolution, but the issue was always the same.

So you do not have the same issue? Is the first line of the news article correct on your configuration? Can you actually read "Schöpfer" with the dots on the ö?

Quote
I did attempt to try this but I got bogged down in the fact that Diaspora 1.1.2 on Knossos (or 1.1.1 via the installer + patch) both crash out as soon as you set the language to German or French for no reason I could figure out.

I was never aware that Diaspora had multi-language files in the first place. I thought it was always an English-only game that I would have to translate someday in the future  :D
I will take a look at this. Maybe I can find something out.


EDIT:
Checked Diaspora out. Something is wrong with the strings.tbl in the other languages. I used my translated FSPort strings.tbl for testing purposes only.
But it does not work, because Diaspora uses bitmap fonts in .vf files rather than .ttf font files, and those are not supported by this build.

The Diaspora fonts are needed in .ttf format for that.
I do not actually know the visual difference between the font01.vf from Diaspora and the font01.vf from FreeSpace2.
But if you use the German font01.vf you can switch the game to German and it shows all the corresponding special characters as it should.

I also hope there will be a .ttf version of all fonts used in FreeSpace and its mods in the future, because that would make translation much easier in all languages without any setbacks (in the fiction viewer etc.).
« Last Edit: November 26, 2017, 12:05:06 pm by Novachen »

 

Offline karajorma

  • King Louie - Jungle VIP
  • Administrator
  • 214
    • Karajorma's Freespace FAQ
Thanks to m!m I got the problem sorted. I'll post the files in a bit. I'm mostly interested in getting Chinese language supported at the moment but it would be really cool to have someone check over our existing French and German translations to see if they are correct.

 

Offline m!m

  • 210
Quote
I cannot reproduce the issue you are showing in your screenshots but the text in my fiction viewer is much larger than in your screenshot so there may be a configuration difference.

How can that be? Do we not use the same fonts.tbl? I did not change mine and use the same as in the attachment file.

I tried it in several resolutions, 1280x720 up to 2560x1440 (my default resolution)... the text got smaller with a higher resolution, but the issue was always the same.

But so you do not have the same issue? The first line is correct in the news article on your configuration? So you can actually read the "Schöpfer" with ö points?

Actually, the word "Schöpfer" was in the second line for me, so it was impossible for me to see your issue. I reduced the size of the newsroom font to 16, at which point I could reproduce the issue. I then changed the top offset to -3, which resolved the issue again, so this seems to be an issue with the font table.

EDIT: I updated the builds again. The only major change is that the parser code now checks that the byte value -128 does not appear in the file, since FSO uses that value to detect the end of the file. This probably limits which characters can be used, but there is nothing I can do about that now.
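
To get a feeling for which characters are affected: a signed char value of -128 is the byte 0x80, which in UTF-8 can only occur as a continuation byte. The Python sketch below checks whether the encoding of a string contains that byte. It only illustrates the byte pattern; how exactly the build reacts to such a file is not shown here, and uses_reserved_byte is a made-up name.

```python
RESERVED_BYTE = 0x80  # same bit pattern as the signed char value -128


def uses_reserved_byte(text: str) -> bool:
    """Return True if the UTF-8 encoding of `text` contains the byte 0x80."""
    return RESERVED_BYTE in text.encode("utf-8")


for ch in ("ä", "À", "Ā", "€"):
    encoded = ch.encode("utf-8")
    print(ch, encoded.hex(), "affected" if uses_reserved_byte(ch) else "ok")
```

Only code points whose encoding happens to include a 0x80 continuation byte are affected, so for example À (C3 80) is hit while ä (C3 A4) is not.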
« Last Edit: November 27, 2017, 08:22:52 am by m!m »

 

Online jr2

  • The Mail Man
  • 212
  • It's pronounced jayartoo 0x6A7232
    • Steam
Is the plan to change the way FSO checks for the end of a file in the future?  I'm assuming OSes or programming languages have ways of signalling EOF without using some sort of special character in the file contents?

 

Offline m!m

  • 210
The usual C method of detecting the end of a string is the null byte, but for some reason that is not done in the parse system of FSO. Volition probably had a good reason for designing the system this way, but I couldn't find any information about what that reason might have been.

 

Offline karajorma

  • King Louie - Jungle VIP
  • Administrator
  • 214
    • Karajorma's Freespace FAQ
I spent ages trying to get a working Diaspora strings.tbl but it keeps corrupting itself. Can you take a look at the one in Diaspora SVN for me? Just commit something I can simply open in Notepad++ and work on without it constantly breaking.

 

Offline m!m

  • 210
I fixed all the issues FSO reported. There were some lines in there that were completely unreadable so I replaced those with the strings from retail FSO.

 

Offline karajorma

  • King Louie - Jungle VIP
  • Administrator
  • 214
    • Karajorma's Freespace FAQ
Cool. I can compare it against the old one to see what they should have been.

EDIT: It's still complaining that French is corrupt when I run it with the newest build posted.
« Last Edit: November 28, 2017, 11:05:05 pm by karajorma »

 

Offline m!m

  • 210
I cannot reproduce your issue with my current development state. I pushed those changes to the test branch so the builds should be updated in ~1 hour.