Unicode 1.0 Semantics

The OpenType spec defines encoding ID 0 for the Unicode Platform (PID=0) as "Unicode 1.0 semantics". If I'm writing a 'name' table with an entry for PID=0 / EID=0, how would I encode (or, more to the point, restrict) the characters?

I've found this "reconstructed" UnicodeData file that was reverse-engineered by Ken Whistler in 2004, but that's all I have to go on ...

Would simply disallowing characters that are not in Ken's reconstruction be sufficient to abide by "Unicode 1.0 semantics", or is there something more that I need to do?
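To make the "disallow characters" idea concrete, here is a minimal Python sketch of the repertoire check. It assumes the reconstructed file uses the usual semicolon-separated UnicodeData layout (hex code point in the first field, with ranges marked by `<..., First>` / `<..., Last>` name pairs), which I have not verified for the 1.0 reconstruction. The function names and the `SAMPLE` lines are made up for illustration. It also assumes the 'name' string for the Unicode platform is stored as UTF-16BE. It only covers the repertoire restriction; whether that alone satisfies "Unicode 1.0 semantics" is still the open question.

```python
# Made-up sample in the usual UnicodeData layout; a real run would read
# the reconstructed Unicode 1.0 file instead.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
0042;LATIN CAPITAL LETTER B;Lu;0;L;;;;;N;;;;0062;
4E00;<CJK Ideograph, First>;Lo;0;L;;;;;N;;;;;
4E02;<CJK Ideograph, Last>;Lo;0;L;;;;;N;;;;;
"""


def load_repertoire(unicode_data_text):
    """Collect the code points listed in UnicodeData-style text."""
    allowed = set()
    range_start = None
    for line in unicode_data_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split(";")
        code_point = int(fields[0], 16)
        name = fields[1] if len(fields) > 1 else ""
        if name.endswith(", First>"):
            # Remember the start; the range is added when "Last" is seen.
            range_start = code_point
        elif name.endswith(", Last>") and range_start is not None:
            allowed.update(range(range_start, code_point + 1))
            range_start = None
        else:
            allowed.add(code_point)
    return allowed


def disallowed_characters(text, allowed):
    """Return the characters of `text` outside the repertoire, in order."""
    return [ch for ch in text if ord(ch) not in allowed]


def encode_name_string(text, allowed):
    """Reject out-of-repertoire characters, then encode as UTF-16BE."""
    bad = disallowed_characters(text, allowed)
    if bad:
        listing = ", ".join(f"U+{ord(ch):04X}" for ch in bad)
        raise ValueError(f"outside the repertoire: {listing}")
    return text.encode("utf-16-be")
```

With the sample data, `encode_name_string("AB", load_repertoire(SAMPLE))` succeeds, while a string containing "C" raises a `ValueError` naming U+0043.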

Another issue: 

If all that is called for is to disallow characters, then what is the difference between encoding IDs 4 and 6 for Platform 0?

* EID=4 is "Unicode 2.0 and onwards semantics, Unicode full repertoire ('cmap' subtable formats 0, 4, 6, 10, 12)" and

* EID=6 is "Unicode full repertoire ('cmap' subtable formats 0, 4, 6, 10, 12, 13)"

Aside from allowing cmap subtable 13 for EID=6, is there any difference between these two? Do I need to (groan) plow through an ancient Unicode standard?

BTW, I've found a significant number of open-source fonts that still use EID 0 and 4 for the Unicode platform. However, they are far outnumbered by 'cmap' tables for the Windows platform. Windows/Unicode BMP (PID=3, EID=1) is by far the most prevalent (96.7% of the fonts have that for the 'cmap' table), while only 16.7% carry a Windows/Unicode Full 'cmap' table.


  • I don't know why you would want to write a 'name' table with an entry for PID=0/EID=0, but note that EIDs 0, 1, and 2 are all considered deprecated "for all practical purposes in current fonts": https://docs.microsoft.com/en-us/typography/opentype/spec/name#platform-specific-encoding-and-language-ids-unicode-platform-platform-id--0

    Regarding the difference between EIDs 4 and 6 under PID 0: from the same portion of the OpenType specification linked to above: "A new encoding ID for the Unicode platform is also sometimes assigned when new 'cmap' subtable formats are added to the specification, so as to allow for compatibility with existing parsers. For example, when 'cmap' subtable formats 10 and 12 were added to the specification, encoding ID 4 was added as well, and when 'cmap' subtable format 13 was added to the specification, encoding ID 6 was added."

    So PID=0/EID=6 can be used as an indicator that the font was made some time after cmap subtable format 13 was added to the specification.
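For anyone writing a checker, the points above can be summarized in a small Python sketch. The table below contains only what is quoted in this thread (EIDs 0, 1, and 2 deprecated; the 'cmap' format lists for EIDs 4 and 6); other encoding IDs are deliberately left out rather than guessed. The names `DEPRECATED_EIDS`, `CMAP_FORMATS_BY_EID`, and `check_unicode_platform_subtable` are hypothetical, not part of any library.

```python
# Unicode-platform (PID=0) encoding IDs, limited to what this thread quotes.
DEPRECATED_EIDS = {0, 1, 2}

# 'cmap' subtable formats named in the quoted descriptions of EID 4 and EID 6.
CMAP_FORMATS_BY_EID = {
    4: {0, 4, 6, 10, 12},
    6: {0, 4, 6, 10, 12, 13},
}


def check_unicode_platform_subtable(encoding_id, cmap_format):
    """Classify a PID=0 'cmap' subtable by encoding ID and subtable format."""
    if encoding_id in DEPRECATED_EIDS:
        return "deprecated"
    allowed_formats = CMAP_FORMATS_BY_EID.get(encoding_id)
    if allowed_formats is None:
        # Encoding ID not covered by this sketch.
        return "unknown"
    if cmap_format in allowed_formats:
        return "ok"
    return "format not listed"
```

The only difference the table records between EID 4 and EID 6 is format 13, which matches the quoted explanation that the new encoding ID was added alongside the new subtable format.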
  • ClintGoss
    Thank you so much Joshua! ... you've saved me from an extensive session of hair-pulling.