Non-Latin in the "name" table

Simon Cozens · October 2016

I have a bunch of stupid questions about OpenType tables. Let's kick off with one about localisation in the name table: What character encoding should be used for non-Latin entries? The specification says:

Note that OS/2 and Windows both require that all name strings be defined in Unicode. Thus all 'name' table strings for platform ID = 3 (Windows) will require two bytes per character. Macintosh fonts require single byte strings.

"Unicode" isn't an encoding, but "requires two bytes per character" suggests to me that this means UTF-16 when platformID=3. Could (or should) that be made more explicit?
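A quick check with Python's standard library (the sample strings are arbitrary) of what platformID=3 strings look like when stored as UTF-16 big-endian. "Two bytes per character" holds only inside the Basic Multilingual Plane; a character beyond it takes a four-byte surrogate pair.

```python
# Windows (platformID=3) name strings are stored as UTF-16 big-endian.
def utf16be_name(text: str) -> bytes:
    """Encode a name string the way platformID=3 entries are stored."""
    return text.encode("utf-16-be")

latin = utf16be_name("Regular")   # 7 characters
japanese = utf16be_name("標準")    # 2 characters, both in the BMP
# U+1D400 MATHEMATICAL BOLD CAPITAL A lies outside the BMP,
# so UTF-16 needs a surrogate pair (four bytes) for this one character.
astral = utf16be_name("\U0001D400")

print(len(latin), len(japanese), len(astral))  # 14 4 4
print(astral.hex())                            # d835dc00
```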

But the one about the Macintosh completely confuses me. Let's say I have a Japanese string. The spec suggests I could have a string which is for the Mac platform (platformID=1) and then I choose the appropriate language ID for the platform (languageID=11). Now I have to choose the encoding ID, which on this platform is a "script manager code." Looking in the table, there's a script manager code for Japanese which means I set encodingID=1. Now I need to encode the string itself in a "single byte string" encoding of Japanese, which as far as I'm aware doesn't exist.

What's going on here? Why do the Macintosh script manager codes even exist? Perennial question: does anything actually use them in practice, or should all non-Latin stuff be restricted to platformID=3 entries?

Denis A Serikov · October 2016

Platform ID=3 only. Photoshop for Windows does not use these names.

Illustrator:

Image: https://us.v-cdn.net/5019405/uploads/editor/h9/545oy4p80zxk.png

Photoshop:

Image: https://us.v-cdn.net/5019405/uploads/editor/4j/iunc9nqzd0f7.png

(Win 8.1)

Khaled Hosny · October 2016

I tend to just forget about Macintosh names, since they are mostly pre-OS X stuff. However, I've been told that some obsolete applications on the Mac (so far, old versions of Word) require Macintosh names. In that case I'd include only MacRoman names and forget about anything else; localized names for the Macintosh platform are just broken.

Khaled Hosny · October 2016

You can also check FontTools; IIRC it has code to decode (most of?) the non-Unicode name entries.
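For a feel of what such decoding involves, here is a minimal stdlib-only sketch. It is not fontTools' code: the table and the `decode_name` helper are hypothetical and partial, and the CJK codecs only approximate Apple's legacy encodings (MacJapanese, for instance, has Apple extensions that Python's `shift_jis` lacks). In fontTools itself, `NameRecord.toUnicode()` does the real work, using its own more complete mapping.

```python
from typing import Optional

# Hypothetical, partial map from Mac encodingID to a Python codec.
# The CJK entries only approximate Apple's legacy encodings; they are not exact.
MAC_CODECS = {
    0: "mac_roman",     # Roman
    1: "shift_jis",     # Japanese (approximates MacJapanese)
    2: "big5",          # Traditional Chinese (approximates MacChineseTrad)
    3: "euc_kr",        # Korean (approximates MacKorean)
    6: "mac_greek",     # Greek
    7: "mac_cyrillic",  # Russian
    25: "gb2312",       # Simplified Chinese (approximates MacChineseSimp)
}

def decode_name(platform_id: int, encoding_id: int, data: bytes) -> Optional[str]:
    """Decode a raw 'name' table string; return None if the encoding is not handled."""
    if platform_id == 0:
        return data.decode("utf_16_be")  # Unicode platform
    if platform_id == 3 and encoding_id in (0, 1, 10):
        return data.decode("utf_16_be")  # Windows Symbol / Unicode BMP / full repertoire
    if platform_id == 1 and encoding_id in MAC_CODECS:
        return data.decode(MAC_CODECS[encoding_id])
    return None  # e.g. Mac Devanagari or Mongolian: no stdlib codec

print(decode_name(1, 0, b"Caf\x8e"))  # Café
```

Returning `None` for the unhandled Mac scripts mirrors the practical situation discussed here: for many of them there is simply nothing ready-made to decode with.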

Simon Cozens · October 2016

What I don't understand is how this could have ever made sense. Why specify that you can write Mac platform strings in Mongolian or Devanagari if there wasn't even a way to correctly encode those strings?

Georg Seifert · October 2016

It is possible to encode all those scripts. They just aren't 8-bit.
MacOS provides a function to convert a string (NSString or CFString) to different encodings:

CFStringCreateExternalRepresentation

external_string_encodings

Simon Cozens · October 2016

Georg Seifert said:

It is possible to encode all those scripts. They just aren't 8-bit.

In which case "Macintosh fonts require single byte strings" is completely wrong.
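To put numbers on this, a small Python sketch that encodes sample strings with codecs corresponding to a few Mac script manager codes. The sample strings are arbitrary, and the Japanese and Korean codecs are stand-ins that only approximate Apple's MacJapanese and MacKorean.

```python
# (Mac encodingID, script, Python codec, sample text)
SAMPLES = [
    (0, "Roman", "mac_roman", "Café"),
    (7, "Russian", "mac_cyrillic", "Привет"),
    (1, "Japanese", "shift_jis", "日本語"),  # approximates MacJapanese
    (3, "Korean", "euc_kr", "한국어"),        # approximates MacKorean
]

results = {}
for enc_id, script, codec, text in SAMPLES:
    encoded = text.encode(codec)
    results[script] = (len(text), len(encoded))
    print(f"encodingID={enc_id} {script}: {len(text)} chars -> {len(encoded)} bytes")
```

Roman and Russian come out at one byte per character, while the Japanese and Korean samples take two bytes for each of their three characters.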

Georg Seifert · October 2016

Yes. I'll check that to be sure.

Denis Moyogo Jacquerye · October 2016

Simon Cozens said:

What I don't understand is how this could have ever made sense. Why specify that you can write Mac platform strings in Mongolian or Devanagari if there wasn't even a way to correctly encode those strings?

@Simon Cozens Have a look at http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/
These are the legacy Macintosh encodings from before OS X. The ones listed in the OpenType spec can be found there, at least 0 to 11 and 21 to 29. Unless you're making a font for System 7, Mac OS 8 or Mac OS 9, don't use them.
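The files in that directory are plain text, one mapping per line: a byte value, a Unicode value, and a `#` comment. The parser below is a rough sketch assuming that layout (as in ROMAN.TXT), plus two features some of the other files appear to use: direction hints like `<LR>` and `+`-joined code point sequences. It has not been checked against every file, so read each file's header comments first. The `parse_apple_mapping` name and the third sample line are made up for illustration.

```python
import re
from typing import Dict

def parse_apple_mapping(text: str) -> Dict[bytes, str]:
    """Parse 'byte<TAB>Unicode<TAB># comment' lines into {raw bytes: string}."""
    table: Dict[bytes, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        byte_field, uni_field = fields[0], fields[1]
        raw = bytes.fromhex(byte_field[2:])  # "0x8140" -> b"\x81\x40"
        # Drop direction hints such as <LR>/<RL>, then join '+' sequences.
        uni_field = re.sub(r"<[^>]*>", "", uni_field)
        table[raw] = "".join(chr(int(cp, 16)) for cp in uni_field.split("+") if cp)
    return table

SAMPLE = (
    "0x20\t0x0020\t# SPACE\n"
    "0x80\t0x00C4\t# LATIN CAPITAL LETTER A WITH DIAERESIS\n"
    "0xA0\t<LR>0x0041+0x0301\t# hypothetical line exercising hints and sequences\n"
)

mapping = parse_apple_mapping(SAMPLE)
print(mapping[b"\x80"])  # Ä
```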

That said, you may need a couple of names with platformID=1, encodingID=0 to keep some Mac applications happy, as mentioned before, but the other encodingIDs are useless.
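Put concretely, the advice in this thread amounts to: always write the Windows Unicode record, and add a Mac Roman record only when the string fits in Mac Roman. A minimal sketch, with records as plain tuples and a hypothetical `name_entries` helper (not a fontTools API):

```python
from typing import List, Tuple

# (platformID, encodingID, languageID, raw bytes)
NameEntry = Tuple[int, int, int, bytes]

def name_entries(text: str) -> List[NameEntry]:
    """Always emit a Windows Unicode entry; add a Mac Roman entry only when it fits."""
    entries: List[NameEntry] = [
        (3, 1, 0x0409, text.encode("utf_16_be")),  # Windows, Unicode BMP, en-US
    ]
    try:
        # Macintosh, Roman script, English (Mac languageID 0)
        entries.append((1, 0, 0, text.encode("mac_roman")))
    except UnicodeEncodeError:
        pass  # skip the Mac entry for text Mac Roman cannot represent
    return entries

print(name_entries("Café")[1])  # (1, 0, 0, b'Caf\x8e')
```

A Japanese name such as "標準" would get only the platformID=3 record, which matches the recommendation above to leave non-Latin names to the Windows platform.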

Remember that this part of the spec was written when Unicode was still a new thing.
