Some common fonts are equipped with countless anchors, up to 30 per glyph. It seems every combining mark is positioned manually. While that certainly makes sense for high-end or general-purpose text fonts...
I've tried to minimize the number of anchors by having two exclusive sets of anchors for positioning above and below:
[above] and [acute grave dotabove macronabove ...],
[below] and [ogonek cedilla dotbelow macronbelow ...].
Each base glyph then contains either the single generic anchor or the full extended set of the same group, and each mark glyph contains the single generic anchor as well as one of the anchors from the extended set.
This way, a large share of the base glyphs get away with just "above" and "below", and another large share with just one of the extended sets. I tested fonts produced with this approach and didn't find any faults as long as the exclusivity of the anchor sets is preserved. I guess some validation tools might raise errors when a glyph doesn't contain all anchors (or is it just the validator built into FontForge?). Do you know of any dangers with such an approach?
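A minimal sketch of how the exclusivity rule could be checked before export, assuming anchors are compared by class name only (base and mark anchors of one class share a name, as FontForge anchor classes do). The extended sets contain only the anchors named above; the "..." stays open, so extend them to match the font. `check_base` and `check_mark` are hypothetical helpers, not part of any editor's validator.

```python
# Hypothetical anchor groups from the post; the extended sets are incomplete
# on purpose ("..." in the post) and must be extended for a real font.
ABOVE_EXTENDED = {"acute", "grave", "dotabove", "macronabove"}
BELOW_EXTENDED = {"ogonek", "cedilla", "dotbelow", "macronbelow"}
GROUPS = [("above", ABOVE_EXTENDED), ("below", BELOW_EXTENDED)]


def check_base(name, anchors):
    """Problems for a base glyph: a group's generic anchor must not be mixed
    with its extended anchors, and an extended set, if used, must be complete
    (otherwise some marks would not attach at all)."""
    problems = []
    for generic, extended in GROUPS:
        present = anchors & extended
        if generic in anchors and present:
            problems.append(f"{name}: '{generic}' mixed with {sorted(present)}")
        elif present and present != extended:
            problems.append(f"{name}: incomplete set, missing {sorted(extended - present)}")
    return problems


def check_mark(name, anchors):
    """Problems for a mark glyph: exactly one generic anchor, plus exactly one
    extended anchor from that generic anchor's group."""
    groups = [(g, ext) for g, ext in GROUPS if g in anchors]
    if len(groups) != 1:
        return [f"{name}: expected exactly one generic anchor"]
    generic, extended = groups[0]
    if len(anchors & extended) != 1:
        return [f"{name}: needs exactly one of {sorted(extended)}"]
    return []
```

A base like "a" with {"above", "below"} passes, as does "i" with the full above set plus "below"; a base carrying both "above" and "acute" is reported, because both lookups could then fire for the same mark.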
The commercial editors probably come with tools that streamline inserting anchors, such as placing multiple anchors at the same spot or synchronizing anchor positions between glyphs. I wonder what their output looks like. For use with FontForge, I developed a Python script that copies mkmk anchors as mark anchors into derived glyphs. Do the big editors do that automatically?
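Which glyphs count as "derived" isn't spelled out, so this sketch rests on one assumption: a derived glyph is a precomposed base + combining mark, and the mark's mkmk anchor, shifted to where the mark component sits, becomes the anchor further marks attach to on the composite (typed "base" here). The `Anchor` tuple is loosely modelled on the (class, type, x, y) tuples that FontForge's `glyph.anchorPoints` returns; `compose` and all coordinates are hypothetical.

```python
from typing import List, NamedTuple, Tuple


class Anchor(NamedTuple):
    cls: str   # anchor class name, e.g. "above"
    kind: str  # "base", "mark" or "basemark" (base side of an mkmk attachment)
    x: int
    y: int


def compose(base: List[Anchor], mark: List[Anchor], cls: str) -> Tuple[Tuple[int, int], List[Anchor]]:
    """Place `mark` on `base` through anchor class `cls` and derive the composite's anchors.

    Returns the mark component's offset and the anchors of the derived glyph:
    the base's anchors, except that each class for which the mark has a
    "basemark" anchor is replaced by that anchor, shifted by the offset and
    retyped as "base", so the next mark stacks on top of the first one.
    """
    b = next(a for a in base if a.cls == cls and a.kind == "base")
    m = next(a for a in mark if a.cls == cls and a.kind == "mark")
    dx, dy = b.x - m.x, b.y - m.y  # where the mark component ends up
    stacked = {
        a.cls: Anchor(a.cls, "base", a.x + dx, a.y + dy)
        for a in mark
        if a.kind == "basemark"
    }
    anchors = [a for a in base if a.cls not in stacked] + list(stacked.values())
    return (dx, dy), anchors
```

With a hypothetical "a" (above at 250,500) and "acutecomb" (mark anchor at 100,450, mkmk anchor at 100,700), the acute sits at offset (150, 50) and the composite's "above" anchor moves up to (250, 750).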
Btw: what about some more exotic anchors like comma-right-above? Do you think all base glyphs should contain one?
Comments
It is possible to create GPOS lookups that use, e.g., a generic 'below-centre' anchor for all below-base marks, and then secondary lookups that reposition specific marks that are exceptions to the generic positioning on some bases. During layout, the first anchor position is applied, and then the mark is repositioned on the specific bases.
In the case of mark repositioning as I described above, you wouldn't want to use subtables, because the whole point is that you want the repositioning to be applied, not skipped. [Hmm. I suppose you could use subtables and put the exceptions before the generic mark positioning. I've not tried it that way.]
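A toy model of the behaviour described above, contrasting separate lookups with subtables inside one lookup. It assumes a single below mark, represents each subtable as a dict from base name to an attachment position ("*" meaning any base), and uses made-up coordinates; real GPOS uses coverage tables and per-glyph anchors, so treat this as an illustration only.

```python
# Hypothetical subtables for one below mark: base name -> attachment position.
GENERIC = {"*": (250, -20)}                        # generic below-centre
EXCEPTIONS = {"g": (290, -200), "j": (220, -200)}  # per-base repositioning


def first_hit(subtable, base):
    """Position a subtable assigns to this base, or None if it doesn't cover it."""
    return subtable.get(base, subtable.get("*"))


def attach(lookups, base):
    """Apply lookups in order. Mark attachment sets an absolute position, so a
    later lookup overrides an earlier one; inside one lookup, the first
    subtable that applies ends the search for this glyph."""
    position = None
    for lookup in lookups:
        for subtable in lookup:
            hit = first_hit(subtable, base)
            if hit is not None:
                position = hit
                break
    return position
```

In this model, separate lookups (`[[GENERIC], [EXCEPTIONS]]`) and a single lookup with the exceptions first (`[[EXCEPTIONS, GENERIC]]`) both put the mark at (290, -200) under "g", whereas `[[GENERIC, EXCEPTIONS]]` never reaches the exception.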
_____
I'm not sure about AFDKO, but I think VOLT will give a warning and won't compile if a single kern lookup applies to the same glyphs more than once. Of course, it is possible to kern the same glyphs in different lookups, but unlike mark positioning, OTL kerning is additive, so unless you use subtables (in reverse order, exceptions before classes), you'll end up with each lookup readjusting the results of the previous lookup. I have to do this for some Indic scripts, and it's a bit of a mind-bender, because none of the individual lookups visually provide the correct result, only the sum of their adjustments.
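The same kind of toy model, this time for pair kerning, shows why separate lookups become a mind-bender: their values add up, so a later "exception" lookup has to store the difference from the class value rather than the intended value. The pairs, values and the `kern` helper are hypothetical; real PairPos subtables also match by glyph class.

```python
# Hypothetical subtables: (left, right) glyph pair -> x-advance adjustment.
CLASS_KERN = {("T", "o"): -60, ("T", "a"): -60}  # values coming from class kerning
EXCEPTION = {("T", "o"): -40}                    # intended final value for T/o


def kern(lookups, pair):
    """Sum a pair's adjustments across lookups. Unlike mark attachment, pair
    adjustments accumulate; inside one lookup, only the first matching
    subtable contributes."""
    total = 0
    for lookup in lookups:
        for subtable in lookup:
            if pair in subtable:
                total += subtable[pair]
                break
    return total


print(kern([[CLASS_KERN], [EXCEPTION]], ("T", "o")))            # -100: both lookups add up
print(kern([[EXCEPTION, CLASS_KERN]], ("T", "o")))              # -40: exception subtable first
print(kern([[CLASS_KERN], [{("T", "o"): 20}]], ("T", "o")))     # -40: second lookup holds the delta
```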
🤪
... ignore statement, then getting a match means finding a marked glyph in a given context. From this I conclude that with pair adjustment positioning, once the input matches the first glyph, subsequent subtables are skipped for this glyph?
From that AFDKO docs example, it looks like subtables are automatically created if a single lookup tries to kern the same glyphs more than once. That makes sense, and is the sort of thing our higher level script (courtesy of @karstenluecke) does when converting from our kerning sources to VOLT lookups/subtables.
By 'get a string match for your input', I just mean the lookup does something. The OT Layout engine will run through a glyph run, glyph-by-glyph and lookup-by-lookup until it finds an input that triggers a GSUB or GPOS action in a lookup, and then perform that action. If the lookup is composed of subtables, as soon as that action is performed, the rest of the lookup will be skipped for that particular glyph or glyphs, and the engine will proceed to the next lookup. So...
...that is correct.
In my context of font reconstruction and font repair, I can narrow it down to the available base and combining characters, and narrow it further by language.
In theory, the Unicode Canonical_Combining_Class property (ccc) allows 255 values (0 to 254), though only some of them are assigned. You can filter them down to the ones needed for the combining characters in the font.
E. g.
COMBINING OGONEK ccc=202 (=Attached_Below)
COMBINING DOT BELOW ccc=220 (=Below)
COMBINING DIAERESIS ccc=230 (=Above)
So Unicode still assigns different classes to ogonek and dot below. I would follow Unicode and use its class names 1:1. Names like "below" can then be derived directly, provided your programming language can read ccc.
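Python's standard `unicodedata.combining()` returns a character's ccc value, so grouping the combining characters a font actually covers takes only a few lines. `CCC_NAMES` is a hand-copied subset of the long ccc value aliases from the Unicode Character Database (PropertyValueAliases.txt), the codepoint list stands in for the font's cmap, and lowercasing the class name to get an anchor name such as "below" is a naming assumption, not a Unicode rule.

```python
import unicodedata
from collections import defaultdict

# Subset of ccc value aliases relevant to Latin marks; extend as needed.
CCC_NAMES = {
    1: "Overlay",
    202: "Attached_Below",
    214: "Attached_Above",
    216: "Attached_Above_Right",
    218: "Below_Left",
    220: "Below",
    222: "Below_Right",
    228: "Above_Left",
    230: "Above",
    232: "Above_Right",
    233: "Double_Below",
    234: "Double_Above",
}


def anchor_name(ccc):
    """Hypothetical anchor name for a ccc value, e.g. 220 -> "below"."""
    return CCC_NAMES.get(ccc, f"ccc{ccc}").lower()


def group_marks_by_ccc(codepoints):
    """Group the combining characters among `codepoints` by anchor name."""
    groups = defaultdict(list)
    for cp in codepoints:
        ccc = unicodedata.combining(chr(cp))
        if ccc:  # 0 = Not_Reordered: bases and other non-combining characters
            groups[anchor_name(ccc)].append(cp)
    return dict(groups)
```

For a cmap containing "a", acute, diaeresis, dot below, cedilla and ogonek, this yields three groups: "above" (U+0301, U+0308), "below" (U+0323) and "attached_below" (U+0327, U+0328).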
It seems only 18 positions make sense for Latin (Latn).
Thanks for the question. Helped me a lot to sort my ideas.