Non-Latin in the "name" table
Simon Cozens
I have a bunch of stupid questions about OpenType tables. Let's kick off with one about localisation in the "name" table: what character encoding should be used for non-Latin entries? The specification says:
"Note that OS/2 and Windows both require that all name strings be defined in Unicode. Thus all 'name' table strings for platform ID = 3 (Windows) will require two bytes per character. Macintosh fonts require single byte strings."
"Unicode" isn't an encoding, but "requires two bytes per character" suggests to me this is UTF-16 when platformID=3. Could or should that be made more explicit?
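For reference, here is a minimal Python sketch of what "two bytes per character" amounts to, assuming platform 3 strings are stored as UTF-16 big-endian with no byte-order mark (the usual practice). The helper name `encode_windows_name` is hypothetical. It also shows where the spec's wording falls short: a character outside the Basic Multilingual Plane needs a surrogate pair, i.e. four bytes.

```python
# Sketch only: assumes platformID=3 name strings are UTF-16 big-endian, no BOM.
def encode_windows_name(s: str) -> bytes:
    """Encode a name string for a hypothetical platformID=3 record."""
    return s.encode("utf-16-be")  # "utf-16-be" never writes a BOM


bmp = encode_windows_name("日本語")          # three BMP characters
astral = encode_windows_name("\U00020B9F")  # one CJK Ext. B character

print(len(bmp))      # 6: two bytes per BMP character
print(len(astral))   # 4: a surrogate pair, so "two bytes per character" isn't literal
print(astral.hex())  # d842df9f
```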
But the one about the Macintosh completely confuses me. Let's say I have a Japanese string. The spec suggests I could have a string which is for the Mac platform (platformID=1) and then I choose the appropriate language ID for the platform (languageID=11). Now I have to choose the encoding ID, which on this platform is a "script manager code." Looking in the table, there's a script manager code for Japanese which means I set encodingID=1. Now I need to encode the string itself in a "single byte string" encoding of Japanese, which as far as I'm aware doesn't exist.
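A minimal sketch of the byte counts involved, assuming Apple's MacJapanese encoding behaves like Shift-JIS for these characters. Python's standard library has no MacJapanese codec, so `shift_jis` stands in, and `encode_mac_japanese` is a hypothetical helper. ASCII stays at one byte per character while kanji take two, so the result is a byte string but not one byte per character.

```python
# Assumption: no MacJapanese codec in the stdlib, so shift_jis approximates it.
MAC_JAPANESE_APPROX = "shift_jis"


def encode_mac_japanese(s: str) -> bytes:
    """Encode text for a hypothetical (platformID=1, encodingID=1, languageID=11) record."""
    return s.encode(MAC_JAPANESE_APPROX)


for text in ("ABC", "日本語"):
    data = encode_mac_japanese(text)
    # ASCII stays one byte per character; each kanji becomes two bytes.
    print(f"{text}: {len(text)} characters -> {len(data)} bytes ({data.hex()})")
```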
What's going on here? Why do the Macintosh script manager codes even exist? Perennial question: does anything actually use them in practice, or should all non-Latin stuff be restricted to platformID=3 entries?
Comments
Platform ID=3 only. Photoshop for Windows does not use these names.
[Screenshots: Illustrator and Photoshop (Win 8.1)]
I tend to just forget about Macintosh names, since this is mostly pre-OS X stuff. However, I've been told that some obsolete applications on the Mac (so far, old versions of Word) require Macintosh names. In that case I'd include only MacRoman names and forget about anything else; localized names for the Macintosh platform are just broken.
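A minimal sketch of that approach in Python: try to encode the name in MacRoman and emit a Mac-platform record only if that succeeds. The function name and the choice of languageID=0 (English) for the resulting (1, 0, 0) record are assumptions for illustration, not something stated above.

```python
from typing import Optional


def mac_roman_name(s: str) -> Optional[bytes]:
    """Bytes for a hypothetical (platformID=1, encodingID=0, languageID=0) record,
    or None when the string cannot be represented in MacRoman."""
    try:
        return s.encode("mac_roman")
    except UnicodeEncodeError:
        return None  # skip the Mac record; rely on the platformID=3 entry


print(mac_roman_name("Café Sans"))  # b'Caf\x8e Sans'
print(mac_roman_name("日本語"))      # None
```

The design choice mirrors the advice: anything MacRoman can't hold is simply left to the Windows (platformID=3) records rather than squeezed into a legacy Mac encoding.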
You can also check FontTools; IIRC it has code to decode (most of?) the non-Unicode name entries.
What I don't understand is how this could have ever made sense. Why specify that you can write Mac platform strings in Mongolian or Devanagari if there wasn't even a way to correctly encode those strings?
It is possible to encode all those scripts. It is just not 8-bit.
macOS provides a function to convert a string (NSString or CFString) to different encodings:
Georg Seifert said: It is possible to encode all those scripts. It is just not 8-bit.
In which case "Macintosh fonts require single byte strings" is completely wrong.
Yes. I'll check that to be sure.
Simon Cozens said: What I don't understand is how this could have ever made sense. Why specify that you can write Mac platform strings in Mongolian or Devanagari if there wasn't even a way to correctly encode those strings?
@Simon Cozens Have a look at http://unicode.org/Public/MAPPINGS/VENDORS/APPLE/
These are the legacy Macintosh encodings from before OS X. The ones listed in the OT specs can be found there, at least 0 to 11 and 21 to 29. Unless you're making a font for System 7, Mac OS 8 or Mac OS 9, don't use these.
That said, as mentioned before, you may need a couple of platformID=1, encodingID=0 names to keep some Mac applications happy, but the other encodingIDs are useless.
Remember that this part of the specs was written when Unicode was still a new thing.
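To make the unicode.org pointer concrete, here is a rough Python sketch of a decoder that maps (platformID, encodingID) to a codec. The table and the `decode_name` helper are hypothetical and deliberately incomplete: Windows entries use UTF-16BE, Mac Roman, Greek, Russian and Slavic (Central European) map to Python's `mac_*` codecs, and the CJK Mac encodings are approximated with `shift_jis`, `big5`, `euc_kr` and `gb2312` because the standard library lacks Apple's exact variants. Anything else, such as Mac Devanagari (encodingID 9), just returns None here. FontTools ships its own, more careful mapping; this is not its API.

```python
from typing import Dict, Optional, Tuple

# Hypothetical (platformID, encodingID) -> Python codec table.
# Mac CJK entries are approximations of Apple's legacy encodings.
NAME_CODECS: Dict[Tuple[int, int], str] = {
    (3, 1): "utf-16-be",     # Windows, Unicode BMP
    (3, 10): "utf-16-be",    # Windows, Unicode full repertoire
    (1, 0): "mac_roman",     # Mac, Roman
    (1, 1): "shift_jis",     # Mac, Japanese (approximation)
    (1, 2): "big5",          # Mac, Traditional Chinese (approximation)
    (1, 3): "euc_kr",        # Mac, Korean (approximation)
    (1, 6): "mac_greek",     # Mac, Greek
    (1, 7): "mac_cyrillic",  # Mac, Russian
    (1, 25): "gb2312",       # Mac, Simplified Chinese (approximation)
    (1, 29): "mac_latin2",   # Mac, Slavic / Central European
}


def decode_name(platform_id: int, encoding_id: int, data: bytes) -> Optional[str]:
    """Decode a name record's bytes, or return None if no codec is known.
    May raise UnicodeDecodeError on malformed data."""
    codec = NAME_CODECS.get((platform_id, encoding_id))
    if codec is None:
        return None  # e.g. Mac Devanagari (1, 9): no usable codec in this sketch
    return data.decode(codec)
```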