Topic: Unicode text support

Offline karajorma

  • King Louie - Jungle VIP
  • Administrator
  • 214
    • Karajorma's Freespace FAQ
Okay, I'll give that a try and if it doesn't work I'll have to bite the bullet and build it myself. It would be kinda nice if we could improve the message from simply saying "The table is ****ed and so are you" though.
Karajorma's Freespace FAQ. It's almost like asking me yourself.

[ Diaspora ] - [ Seeds Of Rebellion ] - [ Mind Games ]

 

Offline m!m

  • 211
What is the error FSO is displaying? I made all the error messages I introduced as informative as possible, so maybe it is one of the preexisting error messages.

 

Offline karajorma

  • King Louie - Jungle VIP
  • Administrator
  • 214
    • Karajorma's Freespace FAQ
If I change the language in fs2_open.ini to French I get

Error: strings.tbl is corrupt
File: localize.cpp
Line: 336

And the option to debug.

As you can see that's not that informative about where the file is corrupt. And it works perfectly well with all the other languages I tried. 

 

Offline m!m

  • 211
The error in the French table was caused by some misplaced quotes, so it was not a problem specific to the Unicode builds. To make the error message a bit more informative, I changed the error dialog so that it includes the filename and line number.

I think these changes have been tested enough to merge them into master.

 

Offline m!m

  • 211
The changes have now been merged and are available in the newest nightly.

 
I tried to add Japanese text to ships.tbl and encountered an encoding error.
I'm using fs2_open_3_8_1_20171206_8168099_x64_SSE2-FASTDBG.exe and fs2_open_3_8_1_unicodeSupport_x64_SSE2-FASTDBG.exe.


Actually, most characters are fine. Only some of them cause problems.


At first it seemed completely random. However, I found a pattern when I made this list:

Code:
%E3%80%81
%E3%80%82
%E3%80%8C
%E3%80%8D
%E6%8A%80
%E3%80%80
%E8%80%90
%E6%80%A7
%E6%9C%80
%E9%80%9F
%E3%83%80

Every character whose UTF-8 encoding (three bytes, written as six hex digits) contains the byte "80" gives an encoding error.
I picked some arbitrary characters for testing, and the hypothesis seems to hold.

I guess other languages (such as Chinese and Korean) will have the same problem.
I hope this information helps. Keep up the good work.

 

Offline m!m

  • 211
0x80 is the byte that a signed 8-bit char reads as -128, which unfortunately is the exact byte value FSO uses for marking the end of the file in its internal representation of the file contents. I found this issue while developing these changes (which is also why you are seeing that error), have since fixed it, and have submitted the changes for code review: https://github.com/scp-fs2open/fs2open.github.com/pull/1529
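A minimal sketch of that collision, not FSO's actual parsing code: `kEndOfFile`, `bytes_until_sentinel` and `contains_0x80` are hypothetical names made up for illustration. It assumes a parser that scans a buffer until it meets an in-band end marker with the value -128. Any UTF-8 text containing the byte 0x80 then appears to end early.

```cpp
#include <cstddef>
#include <string>

// Hypothetical end-of-file marker: a char holding -128, i.e. the byte 0x80.
constexpr char kEndOfFile = static_cast<char>(-128);

// Hypothetical scanner: counts bytes until the marker (or the real end of
// the buffer). A parser relying on an in-band marker would stop here.
std::size_t bytes_until_sentinel(const std::string& buffer) {
    std::size_t n = 0;
    while (n < buffer.size() && buffer[n] != kEndOfFile) {
        ++n;
    }
    return n;
}

// True if the byte 0x80 appears anywhere in the UTF-8 text, which is the
// pattern shared by every entry in the list of failing characters.
bool contains_0x80(const std::string& utf8) {
    return utf8.find(static_cast<char>(0x80)) != std::string::npos;
}
```

For example, the ideographic comma from the list (`%E3%80%81`) is the byte string `"\xE3\x80\x81"`: the scan stops after one byte, in the middle of the character. A character such as `"\xE3\x81\x82"` (hiragana "a") contains no 0x80 byte and is scanned in full.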

Once that is merged you should be able to use all Unicode characters you want but until then not all Unicode characters are supported.

EDIT: The changes have been merged and should appear in the next nightly.
« Last Edit: December 10, 2017, 05:40:50 am by m!m »

 
Quote
Once that is merged you should be able to use all Unicode characters you want but until then not all Unicode characters are supported.

EDIT: The changes have been merged and should appear in the next nightly.

Thank you. I'll have a look when the nightly arrives.

 

Offline chief1983

  • Still lacks a custom title
  • Moderator
  • 212
  • ⬇️⬆️⬅️⬅️🅰➡️⬇️
The nightly built and uploaded, but failed to post to the forums.  In the meantime you can grab it here in its release folder.

 
Yay, it works! Thank you, thank you very much. This will be great news for the Japanese FreeSpace 2 fandom.



There are one or two minor problems but I'll report them later.

 

Offline jr2

  • The Mail Man
  • 212
  • It's prounounced jayartoo 0x6A7232
    • Steam
*prepares for sudden influx of Japanese anime modders*   ;)  This is awesome (even though I don't read Japanese, I imagine if I did, it would be much preferable to have FreeSpace in my own language).

 
I'd like to report a couple of problems I have found so far. I understand there's a workaround for each of these, but it would be great if FS2 Open could handle them.

The “Token too long” error is a little too unforgiving. There are only 14 characters in the line, but the error message says “Length = 42”. I believe this is a UTF-8 related problem: Japanese characters (including hiragana, katakana and kanji) typically take 3 bytes each in UTF-8.

Code:
Warning: Training-2.fs2(line 832):
Warning: Token too long: [プライマリウェポンを選択しろ].  Length = 42.  Max is 31.

File: parselo.cpp
Line: 301

I don't know what “MAX_BRIEF_LINE_LEN” means, but it seems Japanese briefing lines are treated as too long to display (actually they are not that long). Cutting them in half avoids this error.



FS2 automatically starts a new line when it encounters a space (the one you get by hitting the space key) in Japanese text. Obviously you can avoid this by not using spaces at all.


 

Offline m!m

  • 211
I'd like to report a couple of problems I have found so far. I understand there's a workaround for each of these, but it would be great if FS2 Open could handle them.

The “Token too long” error is a little too unforgiving. There are only 14 characters in the line, but the error message says “Length = 42”. I believe this is a UTF-8 related problem: Japanese characters (including hiragana, katakana and kanji) typically take 3 bytes each in UTF-8.

Code:
Warning: Training-2.fs2(line 832):
Warning: Token too long: [プライマリウェポンを選択しろ].  Length = 42.  Max is 31.

File: parselo.cpp
Line: 301
As you correctly stated, the issue here is that the number of characters you see is smaller than the number of bytes it takes to store them. Unfortunately, there is no general solution for this apart from reducing the length of the string. There are a lot of places in the parsing code where a fixed-size array is used for storing a string, and there simply isn't enough space to store the entire string. That was no problem before since ASCII text takes one byte per character, but if you need characters that are encoded using 3 or 4 bytes then you run out of space pretty fast.
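To make the mismatch concrete, here is a small sketch (the helper name `utf8_code_points` is made up, and it assumes the input is well-formed UTF-8). It counts code points by skipping continuation bytes, which always have the bit pattern `10xxxxxx`, and compares that with the byte length a fixed-size buffer has to hold.

```cpp
#include <cstddef>
#include <string>

// Counts code points in well-formed UTF-8 by counting every byte that is
// NOT a continuation byte (continuation bytes look like 10xxxxxx).
std::size_t utf8_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the token from the warning, `utf8_code_points("プライマリウェポンを選択しろ")` gives 14 while `std::string::size()` gives 42, which is the "Length = 42" the parser compares against its limit of 31.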

There is a way of fixing this by using a variable size string but that type is sufficiently different from the current code that it doesn't work everywhere without major changes.

I don't know what “MAX_BRIEF_LINE_LEN” means, but it seems Japanese briefing lines are treated as too long to display (actually they are not that long). Cutting them in half avoids this error.

<snip>

FS2 automatically starts a new line when it encounters a space (the one you get by hitting the space key) in Japanese text. Obviously you can avoid this by not using spaces at all.

<snip>
FSO probably doesn't know how to properly find word separators in Japanese text since that code is only written with Western languages in mind. It is possible to fix this by using an external library which knows how to split non-English text properly but that requires adding that library to FSO and integrating it into the code which is additional work. I hope that I can get around to that eventually but for now you probably need to help FSO a bit by adding spaces at the right places even if that makes the text look weird.
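As a rough illustration of the problem (this is not FSO's wrapping code, and not what a proper text-segmentation library would do), the sketch below wraps at the last ASCII space when one exists and otherwise falls back to breaking at any code point boundary. `wrap_utf8` and `count_code_points` are hypothetical names, the width is measured in code points rather than pixels, and Japanese line-breaking rules are ignored entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts code points in text[begin, end), assuming well-formed UTF-8.
std::size_t count_code_points(const std::string& text,
                              std::size_t begin, std::size_t end) {
    std::size_t n = 0;
    for (std::size_t i = begin; i < end; ++i) {
        if ((static_cast<unsigned char>(text[i]) & 0xC0) != 0x80) ++n;
    }
    return n;
}

// Greedy wrap: at most max_chars code points per line. Breaks at the last
// ASCII space on the line if there is one, else between two code points.
std::vector<std::string> wrap_utf8(const std::string& text,
                                   std::size_t max_chars) {
    assert(max_chars > 0);
    std::vector<std::string> lines;
    std::size_t line_start = 0;                  // byte offset of current line
    std::size_t chars = 0;                       // code points on current line
    std::size_t last_space = std::string::npos;  // last space on current line
    std::size_t i = 0;
    while (i < text.size()) {
        // Find the byte offset of the next code point.
        std::size_t next = i + 1;
        while (next < text.size() &&
               (static_cast<unsigned char>(text[next]) & 0xC0) == 0x80) {
            ++next;
        }
        if (chars == max_chars) {  // line is full before adding this one
            if (last_space != std::string::npos) {
                lines.push_back(text.substr(line_start, last_space - line_start));
                line_start = last_space + 1;
                chars = count_code_points(text, line_start, i);
            } else {
                lines.push_back(text.substr(line_start, i - line_start));
                line_start = i;
                chars = 0;
            }
            last_space = std::string::npos;
        }
        if (text[i] == ' ') last_space = i;
        ++chars;
        i = next;
    }
    if (line_start < text.size()) lines.push_back(text.substr(line_start));
    return lines;
}
```

With a width of 5, the 14-character token from the warning above comes out as three lines of 5, 5 and 4 characters, without ever splitting a multi-byte character. English text such as `"ab cd ef"` still breaks at the space.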

At least the first issue should be possible to fix without extensive changes to FSO (I hope) so could you upload your translation mod here so I can take a look at it?

 
Thank you for your quick reply.

Quote
FSO probably doesn't know how to properly find word separators in Japanese text since that code is only written with Western languages in mind. It is possible to fix this by using an external library which knows how to split non-English text properly but that requires adding that library to FSO and integrating it into the code which is additional work. I hope that I can get around to that eventually but for now you probably need to help FSO a bit by adding spaces at the right places even if that makes the text look weird.

I assume FSO treats Japanese text as a single huge word (because it usually contains no spaces). I forgot to say that you can avoid this by using a full-width space or a non-breaking space (Alt+0160).

I'll upload my translation mod tomorrow as it is already late in Japan. Sorry.

 
This is my Japanese localization mod. Actually, most of these files are extracted from an old Japanese patch.

Download link

 

Offline m!m

  • 211
Thanks. I will take a look at the warnings to see if I can make some of those arrays dynamic.

 
In the Japanese mod, speech is broken even when a Japanese TTS voice is used.
It seems the Japanese words are being read as symbols or something else that is not proper Japanese.
However, this may or may not be a bug in the Japanese TTS voice itself...
« Last Edit: December 22, 2017, 09:22:04 am by Arkblade »

 

Offline AdmiralRalwood

  • 211
  • The Cthulhu programmer himself!
It probably needs to be converted from UTF-8 to Windows's wide character format before being passed to TTS.

 

Offline m!m

  • 211
The code does "convert" the UTF-8 data to a wchar_t array, but only by assigning each byte value to its own wide character, so a 3-byte character turns into three unrelated wide characters.

The correct way to convert the data is to use MultiByteToWideChar.
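A hedged sketch of the difference, not the code that went into FSO: `naive_widen` mimics the byte-by-byte assignment, `utf8_to_utf16` is a hand-written portable decoder included only so the result can be checked on any platform (it assumes well-formed input and does not validate continuation bytes), and `utf8_to_wide` shows the usual two-call `MultiByteToWideChar` pattern on Windows. All three function names are made up for this example.

```cpp
#include <cstddef>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// The buggy approach: every UTF-8 byte becomes its own 16-bit unit.
std::u16string naive_widen(const std::string& utf8) {
    std::u16string out;
    for (unsigned char byte : utf8) {
        out.push_back(static_cast<char16_t>(byte));
    }
    return out;
}

// Portable illustration of a real UTF-8 to UTF-16 conversion. A bad or
// truncated lead byte yields U+FFFD; continuation bytes are not validated.
std::u16string utf8_to_utf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i]);
        char32_t cp = 0xFFFD;
        std::size_t len = 1;
        if (lead < 0x80)                { cp = lead; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; len = 2; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; len = 3; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; len = 4; }
        if (i + len > utf8.size()) { out.push_back(u'\uFFFD'); break; }
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i + k]) & 0x3F);
        }
        i += len;
        if (cp >= 0x10000) {  // outside the BMP: emit a surrogate pair
            cp -= 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        } else {
            out.push_back(static_cast<char16_t>(cp));
        }
    }
    return out;
}

#ifdef _WIN32
// Windows: ask for the required length first, then convert into the buffer.
std::wstring utf8_to_wide(const std::string& utf8) {
    if (utf8.empty()) return std::wstring();
    const int bytes = static_cast<int>(utf8.size());
    const int units = MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, nullptr, 0);
    if (units <= 0) return std::wstring();
    std::wstring out(static_cast<std::size_t>(units), L'\0');
    MultiByteToWideChar(CP_UTF8, 0, utf8.data(), bytes, &out[0], units);
    return out;
}
#endif
```

For the three bytes `"\xE3\x80\x81"`, `naive_widen` produces three units (0x00E3, 0x0080, 0x0081) that a speech engine would read as unrelated symbols, while a real conversion produces the single unit 0x3001.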

 

Offline m!m

  • 211
The newest nightly now contains a fix to how FSO passes the speech text to the Windows API, which should fix the issue with Japanese text.