Hi everyone,
just for the sake of understanding OpenType:
is it at all possible to create a font that has an a and a dieresis and have applications automatically compose the adieresis, although it wouldn’t be in the font as a precomposed glyph?
I understand that it’s possible to have characters composed through mark positioning, but that would require the explicit use of encoded combining marks, and the encoded string would contain the base letter and the mark as two separate characters. While this would work in cases such as phonetic writing with modifiers, it wouldn’t work for normally encoded characters such as the ä.
Or would it?
[Some layout engines do the reverse operation, though. If text is encoded with combining mark sequences, the layout engine will check the font cmap table to find precomposed encodings.]
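A rough Python sketch of that reverse lookup, assuming the font's cmap is given as a plain dict from code point to glyph name (the glyph names below are made up for illustration). Each base letter and its trailing combining marks are recomposed with NFC, and the precomposed glyph is used only if the font actually maps it; otherwise the decomposed characters are mapped one by one.

```python
import unicodedata

def clusters(text: str):
    """Split text (after NFD) into base letter + trailing combining marks.

    Simplification: only marks with a nonzero canonical combining class
    are attached to the preceding base.
    """
    current = ""
    for ch in unicodedata.normalize("NFD", text):
        if current and unicodedata.combining(ch) == 0:
            yield current
            current = ""
        current += ch
    if current:
        yield current

def to_glyphs(text: str, cmap: dict[int, str]) -> list[str]:
    """Prefer a precomposed glyph when the cmap has one, else keep the sequence."""
    glyphs: list[str] = []
    for cluster in clusters(text):
        composed = unicodedata.normalize("NFC", cluster)
        if len(composed) == 1 and ord(composed) in cmap:
            glyphs.append(cmap[ord(composed)])
        else:
            glyphs.extend(cmap.get(ord(c), ".notdef") for c in cluster)
    return glyphs
```

With a cmap containing `a`, `dieresiscomb` and `adieresis`, the sequence a + U+0308 comes out as the single `adieresis` glyph; drop the U+00E4 entry and it falls back to `a` plus the combining mark.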
What we're missing is a (buffered) character level mechanism to map from precomposed diacritics to decomposed base+mark cmap entries, i.e. before one gets to the glyph processing level. This would enable us to make fully decomposed fonts, which would not need to include any precomposed glyphs.
For characters with Unicode normalisation compositions/decompositions, this could in fact be handled entirely at the layout engine level, with no changes to font formats. In the same way that layout engines currently check the cmap for a precomposed mapping when they meet a decomposed text sequence, they could, when a precomposed diacritic character has no cmap entry of its own, check the cmap for the characters of its decomposed sequence.
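To make the idea concrete, here is a minimal Python sketch of that fallback at the character level, assuming a cmap represented as a dict from code point to glyph name (hypothetical glyph names). It is not how any particular layout engine does it; it only shows the lookup order: direct cmap entry first, then the canonical decomposition, and .notdef only if neither works.

```python
import unicodedata

def map_char(ch: str, cmap: dict[int, str]) -> list[str]:
    """Map one character, decomposing it only if the font lacks a direct entry."""
    if ord(ch) in cmap:
        return [cmap[ord(ch)]]
    decomposed = unicodedata.normalize("NFD", ch)
    if len(decomposed) > 1 and all(ord(c) in cmap for c in decomposed):
        return [cmap[ord(c)] for c in decomposed]
    return [".notdef"]

def shape(text: str, cmap: dict[int, str]) -> list[str]:
    """Character-to-glyph mapping only; mark positioning would follow as usual."""
    return [glyph for ch in text for glyph in map_char(ch, cmap)]
```

A font with only `a` and `dieresiscomb` would then render U+00E4 as those two glyphs, and the mark would be placed by ordinary mark-to-base positioning, exactly as if the text had been encoded decomposed.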
For me, the more interesting possibilities exist beyond the fixed set of characters with normalised decompositions, and what I proposed to the OpenType list was a new cmap format in which Unicode characters could be mapped to arbitrary glyph sequences. This would enable one to not only handle canonical decompositions but also things like stroke decompositions. [This is, by the way, one of the functionalities of DecoType's Arabic layout model: the mapping from Unicode characters to decomposed strokes without the need to go first to a precomposed glyph entry.]
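Purely as an illustration of what such a mapping would express (this is not an existing OpenType cmap format, and every glyph name here is invented), a one-to-many cmap can be modelled in Python as a dict from code point to a tuple of glyph names, covering both a canonical decomposition and a stroke-style split of an Arabic letter:

```python
# Hypothetical "sequence cmap": NOT a real OpenType cmap format, just the
# idea of mapping one Unicode code point to an arbitrary glyph sequence.
SequenceCmap = dict[int, tuple[str, ...]]

def map_text(text: str, seq_cmap: SequenceCmap) -> list[str]:
    """Expand each character into its glyph sequence (.notdef if unmapped)."""
    glyphs: list[str] = []
    for ch in text:
        glyphs.extend(seq_cmap.get(ord(ch), (".notdef",)))
    return glyphs

# Illustrative entries; all glyph names are made up for this sketch.
seq_cmap: SequenceCmap = {
    0x0061: ("a",),
    0x00E4: ("a", "dieresiscomb"),       # canonical decomposition of U+00E4
    0x0628: ("behDotless", "dotBelow"),  # stroke-style split of ARABIC LETTER BEH
}
```

Note that nothing here depends on Unicode's decomposition data: the second entry happens to match a canonical decomposition, the third does not, and the mapping treats them identically.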
But I fully understand why Adobe and others might think it too late in the day to implement such a mechanism in the context of OpenType, a format with heavy legacy inheritance. The number and diversity of operating systems, layout engines, and applications that would need to implement support for the new mechanism is such that there would be long-term pressure on font makers to avoid making fonts in this way or, at least, to make hybrid fonts with precomposed fallback, thereby diminishing the whole point of the exercise. And as Adobe pointed out at the time, anything that can be handled as decomposed glyphs can also be handled with composites or subroutines, meaning that precomposed glyphs for the cmap entries of precomposed diacritics can be generated by scripting and have minimal impact on font size.
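The scripting side of that argument can be sketched in a few lines of Python. This is only a planning step under stated assumptions: the font is described by a dict of existing code point to glyph name, new glyphs get `uniXXXX` names, and component offsets (which a real build would take from anchors) are left out. It lists which precomposed characters could be added as composites of glyphs the font already has.

```python
import unicodedata
from collections.abc import Iterable

def plan_composites(
    glyph_for: dict[int, str], candidates: Iterable[int]
) -> dict[str, tuple[int, list[str]]]:
    """Return {new glyph name: (code point, component glyph names)}."""
    plan: dict[str, tuple[int, list[str]]] = {}
    for cp in candidates:
        if cp in glyph_for:
            continue  # the font already has a glyph for this character
        parts = unicodedata.normalize("NFD", chr(cp))
        if len(parts) < 2 or not all(ord(p) in glyph_for for p in parts):
            continue  # no decomposition, or a component glyph is missing
        plan[f"uni{cp:04X}"] = (cp, [glyph_for[ord(p)] for p in parts])
    return plan
```

For a font containing only `a`, `o` and `dieresiscomb`, scanning U+00C0 to U+00FF yields composites for ä and ö and nothing else.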
I think the best solution is to have a general set of composing rules, but to allow each font to override them by providing code or precomposed glyphs.
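One way to picture that layering, as a Python sketch with a hypothetical `overrides` table (not part of any font format): a font-specific rule wins, then a precomposed glyph the font supplies, then the general rule of canonical decomposition. The override example uses ï, where a font may prefer a dotless i under the dieresis rather than the plain decomposition to `i`.

```python
import unicodedata

def resolve(ch: str, cmap: dict[int, str],
            overrides: dict[int, tuple[str, ...]]) -> list[str]:
    """Pick glyphs for one character: override, then precomposed, then general rule."""
    cp = ord(ch)
    if cp in overrides:  # font-specific rule takes precedence
        return list(overrides[cp])
    if cp in cmap:       # precomposed glyph supplied by the font
        return [cmap[cp]]
    parts = unicodedata.normalize("NFD", ch)  # general rule: canonical decomposition
    if len(parts) > 1 and all(ord(p) in cmap for p in parts):
        return [cmap[ord(p)] for p in parts]
    return [".notdef"]
```

The general rule alone would give `i` + `dieresiscomb` for U+00EF; the override lets this one font substitute `dotlessi` without touching the shared rules.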
"If the text is Unicode, that shouldn't be an issue if the search function is doing what it should do. The precomposed and decomposed strings are canonically equivalent, and a good search function should normalise to capture both."
That would indeed be ideal, but Unicode is still a foreign language to most of the TeX world, alas. Seven bits is not enough.
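For what the quoted point means in practice, a minimal Python sketch: normalise both strings to the same form (NFD here) before comparing, so that a precomposed ä and a + U+0308 match each other. The function name is just for illustration.

```python
import unicodedata

def contains(haystack: str, needle: str) -> bool:
    """Substring search that treats canonically equivalent strings as equal."""
    return unicodedata.normalize("NFD", needle) in unicodedata.normalize("NFD", haystack)
```

One caveat of this naive version: searching for a plain "a" also matches the first half of a decomposed ä, so a real search function would additionally check that a match does not end in the middle of a base + mark cluster.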