Hard Light Productions Forums

Modding, Mission Design, and Coding => FS2 Open Coding - The Source Code Project (SCP) => Topic started by: Evergreen on December 02, 2010, 07:51:03 am

Title: Inquiry: Unicode support?
Post by: Evergreen on December 02, 2010, 07:51:03 am
Hi there,

Just a quick inquiry: is anyone still working on Unicode support?
Title: Re: Inquiry: Unicode support?
Post by: The E on December 02, 2010, 12:08:04 pm
No.
Title: Re: Inquiry: Unicode support?
Post by: chief1983 on December 02, 2010, 12:36:00 pm
Don't we already have a patch that added it, though? From the Japanese localization.
Title: Re: Inquiry: Unicode support?
Post by: The E on December 02, 2010, 12:40:13 pm
No one did any work to integrate that stuff, though.

At any rate, given the limitations of the font files we're using, I am sort of dubious whether or not it would actually work well.
Title: Re: Inquiry: Unicode support?
Post by: Nuke on December 02, 2010, 06:30:40 pm
I wouldn't mind having support for a more common font format. The raster format we use seems somewhat archaic. I've used SDL_ttf for font rendering, and it seems pretty easy to work with.
Title: Re: Inquiry: Unicode support?
Post by: rsaxvc on December 04, 2010, 02:30:47 am
It shouldn't be too hard to bring in UTF8 support. Font rendering is sort of a different problem.
Title: Re: Inquiry: Unicode support?
Post by: Nuke on December 04, 2010, 01:29:53 pm
Yeah, I guess, but I'd bring in both changes at once. I mean, what good is Unicode if you're stuck with a limit of 256 characters in the existing raster font format (not really sure about the format's limits)? The point would be to support non-Western languages like Japanese and Chinese, where 8-bit characters won't work and rendering Unicode would require a better font format.
Title: Re: Inquiry: Unicode support?
Post by: chief1983 on December 04, 2010, 03:32:38 pm
If they can be done separately, do them separately just to make sure Part A doesn't break anything itself.
Title: Re: Inquiry: Unicode support?
Post by: rsaxvc on December 04, 2010, 06:13:32 pm
Potentially, you could render anything over the current limit (256? 128?) with a known symbol.

I think the bigger problem is going to be ustrlen() vs strlen(). For UTF8, when allocating/copying a string, you need the length in bytes, but when rendering, you need the number of characters, which is often different. I would recommend UTF8, as it will likely be the least amount of work to support.
Title: Re: Inquiry: Unicode support?
Post by: Goober5000 on December 04, 2010, 09:15:35 pm
The zeroth step would be to convert most of the strings in FSO to use std::string (via SCP_string), which itself is non-trivial.
Title: Re: Inquiry: Unicode support?
Post by: karajorma on December 05, 2010, 01:27:20 am
Well we only actually need to convert the strings that would get seen, which is a smaller subset.

But yeah, it's still a big task.
Title: Re: Inquiry: Unicode support?
Post by: rsaxvc on December 05, 2010, 01:42:55 am
Quote
The zeroth step would be to convert most of the strings in FSO to use std::string (via SCP_string), which itself is non-trivial.

Why? I mean, if we're moving toward std::string anyhow, this is a good time to do it. But I'm not sure it is required.
Title: Re: Inquiry: Unicode support?
Post by: Goober5000 on December 05, 2010, 02:16:01 am
We'll have to convert the strings to either std::string or wchar_t. wchar_t is more confusing, and thus more liable to cause mistakes. Furthermore, we've already made a bit of progress on converting some functions to use std::string.
Title: Re: Inquiry: Unicode support?
Post by: rsaxvc on December 05, 2010, 02:41:04 am
If we use UTF8, then the char[]s will still work fine.

If we use UTF16 or UTF32, then the char[]s will need to be changed over.
Title: Re: Inquiry: Unicode support?
Post by: Goober5000 on December 05, 2010, 03:31:35 am
No, because UTF-8 is not the same as char.  Go look up how UTF-8 works.  If a character is not in the ASCII character set, it will require multiple consecutive bytes to encode one character.  And that will screw up the char[] functions which expect characters and bytes to be equivalent.

If that's not clear to you, then think of how strlen() would work on a character array that includes multibyte characters.
Title: Re: Inquiry: Unicode support?
Post by: karajorma on December 05, 2010, 03:55:07 am
This would definitely break multiplayer compatibility as anything that sends strings would now be sending twice the number of bytes it used to.

We might even have to rewrite some of the packets if they now go over MAX_PACKET_SIZE.
Title: Re: Inquiry: Unicode support?
Post by: rsaxvc on December 05, 2010, 04:34:09 am
I understand how UTF-8 works. I actually pointed out the issues with strlen/ustrlen on utf8 earlier in this thread.

Probably the first order of business would be to pick encodings. Each one has its own issues, so it may be useful to pick different encodings for different tasks (internal representation, network I/O, file I/O). Some encodings will break more or less compatibility with the way things are now, unless we want to try to support multiple encodings for things like tbl files, which is always a mess and which I'd recommend against.

The good:
UTF8 - Most ASCII-like. Often the smallest. Best text editor/tool support, at least on Unix.
UTF32 - A character always fits in 1 UTF32 word.

The bad:
UTF8 - Multiword encoding. This can be a source of bugs and extra code-complexity.
UTF16 - Multiword encoding. This can be a source of bugs and extra code-complexity.
UTF32 - Usually 2-4 times larger than it needs to be (not great for network I/O).

*Multiword (I may have made up this phrase) encoding means that it may take multiple Unicode words (UTF8/UTF16) to make up one full Unicode character. For example, Korean takes more than 2 bytes per character to store in UTF8. In the same way, there are some languages whose characters take 4 bytes in UTF16. This discrepancy between the number of UTF words and the number of characters can be an issue.
Title: Re: Inquiry: Unicode support?
Post by: Goober5000 on December 05, 2010, 01:23:35 pm
Quote
I understand how UTF-8 works. I actually pointed out the issues with strlen/ustrlen on utf8 earlier in this thread.
So you did.  I apologize for the mis-assumption; while replying to several threads, I lost track of the post history of this one.

Quote
Probably the first order of business would be to pick encodings.
I think we can pretty safely choose UTF-8.  The only way we'd avoid multiword encoding is if we went with UTF-32, but that wastes far too much space for our purposes.  So given that we'll have to deal with multiword encoding anyway, UTF-8 is the best choice.

It occurred to me that since we're investigating wxWidgets for wxFRED, we should think seriously about using wxString for FSO too. wxString supports Unicode (http://docs.wxwidgets.org/trunk/overview_unicode.html) and is itself based on std::string, so it may be useful to redefine SCP_string in terms of wxString rather than std::string.
Title: Re: Inquiry: Unicode support?
Post by: The E on December 05, 2010, 01:30:39 pm
Yes. One thing we should do anyway is to base the standalone server UI on wx. Just to start integrating wx into the codebase.
Title: Re: Inquiry: Unicode support?
Post by: Evergreen on December 07, 2010, 03:54:47 pm
The game was localized for several languages. Until Unicode support is underway, maybe switchable languages based on the localized character sets could be offered as a workaround?
This would be a real treat for SCP users building missions in non-English languages.
Title: Re: Inquiry: Unicode support?
Post by: rsaxvc on December 07, 2010, 08:41:46 pm
Does a standalone server need a UI? You could make it a little text console, and also open a local port for a UI to hook into.
Title: Re: Inquiry: Unicode support?
Post by: karajorma on December 07, 2010, 08:56:27 pm
Didn't someone already port the standalone to wx?
Title: Re: Inquiry: Unicode support?
Post by: chief1983 on December 08, 2010, 12:45:13 am
Yes, but I think there were some issues pointed out with the code and it was never integrated.  I believe I still have it somewhere though.
Title: Re: Inquiry: Unicode support?
Post by: karajorma on December 08, 2010, 01:09:28 am
Which issues? Might be simpler to fix them than to start again from nothing.
Title: Re: Inquiry: Unicode support?
Post by: chief1983 on December 08, 2010, 09:41:28 am
I don't know the specific issues.  But I can try to find the code again if it's not on the forums anymore.