A client is requesting that I add the glyphs schwagrave and Schwagrave to a custom font I made for them. Neither glyph has a dedicated Unicode codepoint. What would be best practice for providing an encoding?
An encoding is required; otherwise the unencoded glyph will generate an error message when making an Accessible PDF.
I suppose I could use the Private Use Area of Unicode, but was wondering if there was a better alternative.
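To make the gap concrete: Unicode has precomposed letters like è, but no precomposed schwa-plus-grave, so NFC normalization can't fold the sequence into one codepoint. A quick check with Python's `unicodedata` module (used here purely as a reference tool, not font code) shows this, along with the BMP Private Use Area range under discussion:

```python
import unicodedata

def has_precomposed(base: str, mark: str) -> bool:
    """Return True if NFC folds base+mark into a single codepoint."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

# e + combining grave (U+0300) has a canonical precomposed form: è
print(has_precomposed("e", "\u0300"))       # True
# schwa (U+0259) + combining grave has none: NFC leaves two codepoints
print(has_precomposed("\u0259", "\u0300"))  # False

# The Basic Multilingual Plane Private Use Area, if you go that route:
PUA_FIRST, PUA_LAST = 0xE000, 0xF8FF
```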
1. Using dynamic mark positioning with anchors in the GPOS mark feature. This has the benefit of allowing arbitrary combinations of base glyphs and marks, e.g. schwa+dieresiscomb, schwa+brevecomb, etc., as well as the desired schwa+gravecomb.
2. Using precomposed diacritic glyphs mapped from the input sequence in the GSUB ccmp feature. This has the benefit of more easily integrating into kerning with appropriate distance of the mark from preceding letters such as T V W Y.
For either solution, the font will need to support the combining mark characters (U+0300 for the combining grave), and you will likely want .cap variants of the combining marks for use above uppercase letters.
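Either way, the key character is U+0300, which Unicode classifies as a nonspacing mark that renders above its base. Checking its properties with Python's `unicodedata` (again just as a reference lookup; the `.cap` variants mentioned above are font-internal glyph names that need no codepoints of their own, only GSUB/GPOS rules to select them):

```python
import unicodedata

grave = "\u0300"  # the combining mark the font's cmap must cover
print(unicodedata.name(grave))       # COMBINING GRAVE ACCENT
print(unicodedata.category(grave))   # Mn: nonspacing mark
print(unicodedata.combining(grave))  # 230: canonical class "above"
```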
Unicode People Should Do Something 😆
Some folks with affected languages have taken the Unicode approach as an offense against their language and culture, and attempted workarounds that seem doomed to failure (e.g. https://en.wikipedia.org/wiki/Tamil_All_Character_Encoding)
For example, I don't think it's true that Accessible PDFs require atomic encodings for characters if you put the source "words" into ActualText attributes. If your PDF creation library isn't doing that, that's where the problem is; not with Unicode.
So I'd be interested in which situations these African font users are seeing where having an atomic encoding would make a difference - because those situations are software bugs.
My client wants a precomposed glyph that they can select from the Glyph palette. The font will only be used in a tight universe of users, so I will proceed with using a PUA codepoint so that the glyph behaves when making an Accessible PDF document.
If the purpose of an Accessible PDF is to e.g. support screen readers for vision impaired users, then using PUA encoding might only technically overcome the PDF-creation hurdle: it won’t make the resulting PDF actually accessible, since a screen reader will have no way to know how to interpret the non-standard codepoint.
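That limitation is visible in the character properties themselves: PUA codepoints belong to category Co (private use) and have no standard name, so assistive technology has nothing to announce for them. A small illustration with Python's `unicodedata` (the specific codepoint U+E000 is an arbitrary choice for the example):

```python
import unicodedata

def standard_name(ch: str):
    """Return the Unicode character name, or None if it has none."""
    try:
        return unicodedata.name(ch)
    except ValueError:
        return None

pua = "\uE000"  # first codepoint of the BMP Private Use Area
print(unicodedata.category(pua))  # Co: private use, semantics undefined
print(standard_name(pua))         # None: nothing for a screen reader to say
print(standard_name("e"))         # LATIN SMALL LETTER E
```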
feature ccmp {
    sub Schwa grave by Schwagrave;
    sub schwa grave by schwagrave;
} ccmp;
Thomas is right.
Madness is doing the same thing over and over again and expecting a different result. Hopefully Unicode can evolve.
It may be tempting to double-encode the /grave/ glyph as both U+0060 and U+0300, but combining marks are usually zero-width, so the two are better handled as separate glyphs.
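The character properties back this up: U+0060 is a spacing symbol with a normal advance width, while U+0300 is a zero-advance nonspacing mark, so one outline can't sensibly serve both. Comparing them with Python's `unicodedata`:

```python
import unicodedata

spacing, combining = "\u0060", "\u0300"  # GRAVE ACCENT vs COMBINING GRAVE
print(unicodedata.category(spacing))    # Sk: spacing modifier symbol
print(unicodedata.category(combining))  # Mn: zero-advance nonspacing mark
print(unicodedata.combining(spacing))   # 0: does not combine
print(unicodedata.combining(combining)) # 230: combines above its base
```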
This year, CLDR also formed a new subcommittee for Digitally Disadvantaged Languages. Unfortunately, their meeting schedule regularly conflicts with other things for me, so I have not been able to be actively involved.
This is Fred Smeijers’ Quadraat typeface, which the LRB has been using since soon after it was first released by FontFont in the 1990s, in PS Type 1 format. It’s nice to see that as well as being updated to OT format it has been extended with additional diacritics such as ẹ (U+1EB9) and ọ (U+1ECD), but note that the combining acute accent is misplaced after the latter in the sequence ọ́. While precomposed ccmp mappings can be convenient for known targets such as the ones James needs to support, dynamic GPOS positioning is far more flexible, since it can handle arbitrary and unanticipated diacritics such as this.
I suppose this is because these combinations are more difficult to handle using base+mark sequences, but it's only a guess. Maybe John Hudson or Denis Jacquerye could provide the exact information.
One example of a diacritic over a base letter is uniA7CB and uniA7CC, approved for inclusion in the next Unicode version and used in the Luiseño and Cupeño languages:
Regarding the ccmp feature, FontLab 8 is able to automatically build the code based on combinations it finds in the font. You may need to expand the code in order to catch additional composites, but it's a very good start.
Again, if it matters that one has a multi-character representation and the other doesn't, something's gone wrong with your client software. File a bug there.
Ironically, the font displaying Ɛ́ here doesn’t handle it, and the combining acute is mispositioned. Users understandably confuse this, or the lack of input methods, with Unicode not supporting their language.
I say that because (1) the screen shots above seem to show the same base glyph and diacritic, working in one browser/platform but not the other, yet (2) on my system with Chrome on Mac, I get the same Noto font as the screenshots for regular text, but a completely different base glyph and diacritic glyph for that accented combo. Huh.
Damn, if we are not 100% sure what is happening, what hope does the average user have?!
Moreover, on a cultural level the current situation also reveals a problematic inherited ‘colonialism’ aspect. Need French accented characters? Here they are. Need Spanish or Portuguese ones? Here they are. Need German, Turkish, Polish, Vietnamese? Here they are. “But these were pre-Unicode legacy encodings, hence…” is merely a cheap excuse. Now comes in some African guy: “hey, where are the ones I need?” “Help yourself,” replies the (mainly English-speaking) tech community. (Anglophone natives don’t feel a need for accents at all; that has some merits of its own, but it doesn’t seem to help the rest of the world.)
I use a custom keyboard map that implements a bit of Andreas’s idea:
Or am I missing something here?
Which keyboard layout has a schwa key but no combining grave?