GPOS: How many different anchors are necessary?

Adam Jagosz · January 2019

Some common fonts are equipped with countless anchors, up to 30 per glyph. Seems like every combining mark is positioned manually. While that certainly makes sense for high-end or general-purpose text fonts...

I've tried to minimize the number of anchors by having two exclusive sets of anchors for positioning above and below:

[above] and [acute grave dotabove macronabove ...],

[below] and [ogonek cedilla dotbelow macronbelow ...].

Each base glyph then either contains the single anchor or a full set of anchors from the other class, and each mark glyph contains the single anchor as well as one of the anchors in the extended set.

This way a large share of the base glyphs get away with just "above" and "below", and another large share with just one of the extended sets. I tested fonts produced with this approach and didn't find any faults as long as the exclusivity of the anchor sets is preserved. I guess some validation tools might raise errors when a glyph doesn't contain all anchors (or is it just the validator built into FontForge)? Do you know of any dangers with such an approach?
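A minimal sketch of how that exclusivity could be checked, assuming the anchors have already been exported into a plain dict of glyph name to anchor class names. The class names, the sample glyphs and the `check_exclusive` helper are illustrative only; they are not part of FontForge or any other tool.

```python
# Hypothetical anchor classes: one generic anchor plus its extended set.
GENERIC_ABOVE = "above"
EXTENDED_ABOVE = {"acute", "grave", "dotabove", "macronabove"}


def check_exclusive(base_anchors, generic=GENERIC_ABOVE, extended=EXTENDED_ABOVE):
    """Return the base glyphs that break the either/or rule.

    A base glyph passes if it has only the generic anchor, or the complete
    extended set without the generic anchor, or none of these anchors.
    """
    problems = []
    for glyph, anchors in sorted(base_anchors.items()):
        has_generic = generic in anchors
        found = extended & set(anchors)
        if has_generic and found:
            problems.append((glyph, "mixes generic and extended anchors"))
        elif found and found != extended:
            problems.append((glyph, "extended set is incomplete"))
    return problems


bases = {
    "a": {"above", "below"},
    "i": {"acute", "grave", "dotabove", "macronabove", "below"},
    "l": {"above", "acute"},   # invalid: both sets present
    "j": {"acute", "grave"},   # invalid: partial extended set
}
print(check_exclusive(bases))
```

The same check can be repeated for the "below" sets by passing different `generic` and `extended` arguments.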

The commercial editors probably come with tools that streamline inserting anchors, like inserting multiple anchors in the same spot or synchronizing anchor positions between glyphs. I wonder what their output looks like. For use with FontForge I developed a Python script that copies mkmk anchors as mark anchors into derived glyphs; do the big editors do that automatically?

Btw: what about some more exotic anchors like comma-right-above? Do you think all base glyphs should contain one?

John Hudson · January 2019

For a Latin face, most marks can be positioned with an optically centred above and an optically centred below anchor. There are a few marks that require specific anchors, though, such as the right-side above dot U+0358. It can also help to have separate anchors for the ogonek and cedilla since these sometimes attach to left or right of centre.

Adam Jagosz · January 2019

Exactly, so then would it be completely valid to have a general "bottom" anchor and specific "bottom-center", "ogonek" and "cedilla" anchors, and use either one or the other in a given base glyph? Normally I guess I would just have "bottom", "cedilla" and "ogonek" in every base glyph: one anchor class fewer, but more anchor points throughout all glyphs.

I mentioned anchor classes like "grave", "acute" for cases like /i/j/l where it might be an option to shift the anchor sideways instead of having a specialized compact form of the accent. "Dotabove" might be positioned differently than centre for old orthography of Irish/Gaelic.

John Hudson · January 2019

To confirm, we're talking about GPOS mark positioning, not anchors for building composites within a font tool, right? In this case, we should be clear that we're talking about the combining mark characters, not the legacy spacing accents, so should be talking about e.g. /acutecomb/ not /acute/. [I strongly encourage composites to also be built using the combining mark glyphs, since this means the same anchors relate to both composites and GPOS.]

It is possible to create GPOS that uses e.g. a generic 'below-centre' anchor for all below-base marks, and then secondary lookups that provide positioning for specific marks that are exceptions to the generic positioning on some bases. What happens during layout is that the first anchor position is applied, and then the mark is repositioned on specific bases.
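As a rough model of that two-step behaviour, the sketch below applies a generic below-centre anchor first and then lets a second table of exceptions reposition specific marks on specific bases. The coordinates, glyph names and the `attach_below` helper are made up for illustration; this is not how a layout engine stores anchors.

```python
# Hypothetical anchor data: (x, y) attachment points on each base glyph.
GENERIC_BELOW = {"a": (250, 0), "e": (260, 0), "c": (240, 0)}

# Second lookup: only the (base, mark) pairs that need a different spot.
EXCEPTIONS = {
    ("a", "ogonekcomb"): (430, 0),   # ogonek attaches right of centre
    ("e", "ogonekcomb"): (400, 0),
}

BELOW_MARKS = {"ogonekcomb", "cedillacomb", "dotbelowcomb"}


def attach_below(base, mark):
    """Apply the generic lookup, then let the exception lookup override it."""
    position = None
    if mark in BELOW_MARKS and base in GENERIC_BELOW:
        position = GENERIC_BELOW[base]         # lookup 1: generic anchor
    if (base, mark) in EXCEPTIONS:
        position = EXCEPTIONS[(base, mark)]    # lookup 2: repositioning
    return position
```

In this model `attach_below("a", "dotbelowcomb")` keeps the generic anchor, while `attach_below("a", "ogonekcomb")` ends up at the exception position because the later lookup replaces the earlier attachment.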

Adam Jagosz · January 2019

Yes, I build composites from the combining mark glyphs, so I am speaking of anchors used for both GPOS and building composites. That feels right because it provides consistency regardless of Unicode normalization form and its layout handling. I don't see a reason to use different anchors for the two, but maybe there is one in some situations? I name the anchors without the "combining" keyword, though; that would feel like overkill.

This has been my mantra: Doesn't work? Put it in a separate lookup! I didn't apply a .FEA mindset when trying to figure out anchors, so I missed that obvious solution. Works like a charm. Thank you so much!

By the way, what is the purpose of subtables? Do I understand correctly that the normal use of a lookup is to:

activate a list of substitutions or positioning,
perform a substitution on a glyph already substituted (in another lookup), which is impossible in the same lookup*,
perform subs of a given type (all subs in a lookup must be of the same type)?

So what is actually the application of a subtable? It doesn't seem to solve any problem. (Length...?)

*A separate question: What about positioning? Clearly repositioning marks is not permitted/doesn't work within the same lookup; but I think it is possible to kern a glyph repeatedly against several mutually non-exclusive classes, with the last entry overwriting the previous ones, like this:

 pos @A [ a b c ] -40;
 pos @A [ a ] -30;
 pos @A [ c ] -20;

I noticed, though, that trying to reset the kerning to 0 with this method doesn't work. When the classes grow big, it's tempting to reuse them instead of manually creating new ones as set operation products. Is that bad practice?

John Hudson · January 2019

If you use lookup subtables, as soon as you get a string match for your input in one of the subtables, all subsequent subtables are skipped and layout proceeds to the next lookup. So they're a useful way to make your font more efficient. I use them in a variety of ways in both GSUB and GPOS, e.g. for different contexts.

In the case of mark repositioning as I described above, you wouldn't want to use subtables, because the whole point is that you want the repositioning to be applied, not skipped. [Hmm. I suppose you could use subtables and put the exceptions before the generic mark positioning. I've not tried it that way.]
_____

A separate question: What about positioning? Clearly repositioning marks is not permitted/doesn't work within the same lookup; but I think it is possible to kern a glyph repeatedly against several mutually non-exclusive classes, with the last entry overwriting the previous ones...

I'm not sure about AFDKO, but I think VOLT will give a warning and won't compile if a single kern lookup applies to the same glyphs more than once. Of course, it is possible to kern the same glyphs in different lookups, but unlike mark positioning, OTL kerning is additive, so unless you use subtables (in reverse order, exceptions before classes), you'll end up with each lookup readjusting the results of the previous lookup. I have to do this for some Indic scripts, and it's a bit of a mind-bender, because none of the individual lookups visually provide the correct result, only the sum of their adjustments.
🤪
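The additive behaviour can be illustrated with a small sketch, assuming each kerning lookup is reduced to a dict of glyph pairs to x-advance adjustments. The pairs, the values and the `total_kern` helper are invented for the example.

```python
# Hypothetical kerning lookups, each mapping a glyph pair to an adjustment.
lookups = [
    {("T", "a"): -80, ("T", "o"): -80},   # broad class-level adjustment
    {("T", "a"): 20},                     # later lookup readjusts one pair
]


def total_kern(pair, lookups):
    """Sum the adjustments of every lookup that matches the pair."""
    return sum(lookup.get(pair, 0) for lookup in lookups)
```

Here neither lookup on its own gives the intended value for ("T", "a"); only the sum of both adjustments does.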

Adam Jagosz · January 2019

John Hudson said:

If you use lookup subtables, as soon as you get a string match for your input in one of the subtables, all subsequent subtables are skipped and layout proceeds to the next lookup. So they're a useful way to make your font more efficient. I use them in a variety of ways in both GSUB and GPOS, e.g. for different contexts.

Thanks for the explanation. So what do you mean by "get a string match for your input"? I understand subtables can be created using the ignore statement; getting a match would then mean finding a marked glyph in a given context.

In the AFDKO docs I found a note that attempting to rekern results in the code being split into subtables:

 pos [Ygrave] [colon semicolon] -55;   # [line 99]   In first subtable
 pos [Y Yacute] period -50;            # [line 100]  In first subtable
 pos [Y Yacute Ygrave] period -60;     # [line 101]  In second subtable

And it is stated as follows:

The pair (Ygrave, period) will have a value of 0 if the above example comprised the entire lookup, since Ygrave is in the coverage (i.e. union of the first glyphs) of the first subtable.

From this I conclude that with pair adjustment positioning, when input matches the first glyph, subsequent subtables are skipped for this glyph?

John Hudson · January 2019

I can't really comment on AFDKO, because I don't use it. I had thought, though, that the ignore statement would be like the EXCEPT statement in a VOLT lookup context, which is different from the mechanism to trigger a subtable (in VOLT, that's done by the syntax of the lookup names).

From that AFDKO docs example, it looks like subtables are automatically created if a single lookup tries to kern the same glyphs more than once. That makes sense, and is the sort of thing our higher level script (courtesy of @karstenluecke) does when converting from our kerning sources to VOLT lookups/subtables.

By 'get a string match for your input', I just mean the lookup does something. The OT Layout engine will run through a glyph run, glyph-by-glyph and lookup-by-lookup until it finds an input that triggers a GSUB or GPOS action in a lookup, and then perform that action. If the lookup is composed of subtables, as soon as that action is performed, the rest of the lookup will be skipped for that particular glyph or glyphs, and the engine will proceed to the next lookup. So...

From this I conclude that with pair adjustment positioning, when input matches the first glyph, subsequent subtables are skipped for this glyph?

...that is correct.
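A simplified model of this, using the pairs from the quoted AFDKO example: each subtable is keyed by first glyph, and the first subtable whose coverage contains the first glyph decides the result. The data layout and the `kern` helper are assumptions made for illustration, based only on the class-based behaviour the AFDKO docs describe; other subtable formats and real engines may differ in detail.

```python
# Subtables of one kerning lookup, following the quoted AFDKO example.
# Each subtable maps a first glyph to its {second glyph: value} pairs.
subtables = [
    {   # first subtable (lines 99 and 100 of the example)
        "Ygrave": {"colon": -55, "semicolon": -55},
        "Y": {"period": -50},
        "Yacute": {"period": -50},
    },
    {   # second subtable (line 101 of the example)
        "Y": {"period": -60},
        "Yacute": {"period": -60},
        "Ygrave": {"period": -60},
    },
]


def kern(first, second, subtables):
    """Use the first subtable whose coverage contains the first glyph."""
    for subtable in subtables:
        if first in subtable:                       # coverage match
            return subtable[first].get(second, 0)   # later subtables are skipped
    return 0
```

With this model `kern("Ygrave", "period", subtables)` is 0, matching the statement in the docs, because Ygrave is already covered by the first subtable and the second one is never consulted.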

Adam Jagosz · January 2019

Hmm. Until now I've been just using the feature interpreter/compiler built into FontForge, while relying on the Adobe docs, since they were easily available and extensive. And I did notice some deviations from AFDKO, I think.

I tried to give VOLT a shot a while ago, but I found the UI quite unintuitive. I retried just now and I see it's a whole nother thing? Is there a way to convert a .fea file into a .vtp project?

Right now I'm trying out AFDKO. It does raise warnings that FontForge doesn't, and it seems to produce much smaller (better optimized?) output files than FF. For instance, it removes repeated pairs from kerning, which FF probably does not. However, the font compiled with FDK is garbled in ways unrelated to the raised warnings. For instance, some mark anchors are shifted in unexpected ways for glyphs other than those mentioned in the error logs. The sources that compile all right with FF don't work with FDK out of the box.

Adam Jagosz · January 2019

Okay, I remembered why I developed that "two exclusive anchor class sets" approach. FontForge takes whichever anchor is first defined in the base glyph for building composites, regardless of lookup order. So by having those two separate sets and using only one in a given base, I was able to have the same anchors for building composites in FF and for exported fonts.

If the normal approach is used (one lookup with general classes, another with specific anchors like cedilla, ogonek), fixing the order of anchors for the affected base glyphs in the project file (in a text editor) helps, but really I think this could be fixed in the program itself. I filed an issue.

John Hudson · January 2019

Is there a way to convert a .fea file into a .vtp project?

They're both text formats, so yes, it's possible. I'm currently looking at going the other way, and eventually hope to be able to roundtrip (although the .vtp includes information that .fea doesn't store, such as glyph encoding and GDEF classification).

Adam Jagosz · January 2019

I've found out how FontForge deals with kerning of overlapping classes. In the FontForge flavor of .fea, when several kerning statements within one lookup apply to the same glyph (e.g. because it is found in two classes), the last non-zero positioning statement is taken and the rest are discarded (roughly the opposite of what happens when subtables are created). Statements in different lookups are treated normally (not optimized away), and the kerning is additive.

Inconsistent with FDK, and worse still, undocumented. I'm taking advantage of this behavior for now, but it's going to hurt when I decide to switch to a different editor or start using FDK.

FWIW, for overlapping classes within one lookup, AFDKO creates subtables, and effectively uses the first positioning statement for a given glyph, and not the last like FF. However when kerning is repeated in another lookup, it replaces the previous value (it's not additive).
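The two resolution rules, as reported in this thread rather than taken from either tool's documentation, can be compared with a short sketch. It reuses the overlapping-class example from earlier in the thread; the data layout and both function names are hypothetical.

```python
# Kerning statements from one lookup, in source order, for one first-glyph class.
# Each entry is (second glyphs, value), mirroring the overlapping-class example.
statements = [
    ({"a", "b", "c"}, -40),
    ({"a"}, -30),
    ({"c"}, -20),
]


def resolve_last_nonzero(second, statements):
    """FontForge behaviour as reported here: the last non-zero value wins."""
    value = 0
    for glyphs, adjustment in statements:
        if second in glyphs and adjustment != 0:
            value = adjustment
    return value


def resolve_first(second, statements):
    """AFDKO behaviour as reported here: the first matching statement wins."""
    for glyphs, adjustment in statements:
        if second in glyphs:
            return adjustment
    return 0
```

Under the first rule "a" resolves to -30 and "c" to -20, while under the second rule every glyph in the broad class stays at -40. The first rule also shows why a later statement with value 0 cannot reset an earlier adjustment.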

Helmut Wollmersdorfer · April 2021

How many different anchors are needed is a good question.

In my context of font reconstruction and font repair I can narrow it down to the available base chars and combining chars, and further narrow it to Languages.

Theoretically Unicode has 255 Canonical_Combining_Classes (property ccc). You can filter them down to the ones needed for the combining chars in the font.

E. g.
COMBINING OGONEK ccc=202 (=Attached_Below)
COMBINING DOT BELOW ccc=220 (=Below)
COMBINING DIAERESIS ccc=230 (=Above)

This means Unicode still provides different classes for ogonek and dotbelow. I would follow Unicode and use their names 1:1. The names like "below" are supported, if your programming language can read ccc.
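For instance, Python's standard `unicodedata` module exposes the numeric ccc value through `unicodedata.combining()`. The sketch below groups a few combining characters by class; the `CCC_NAMES` mapping is written by hand for the three values quoted above (the module itself returns only numbers), and `group_marks_by_ccc` is a hypothetical helper.

```python
import unicodedata
from collections import defaultdict

# Names for the three classes quoted above; other ccc values stay numeric.
CCC_NAMES = {202: "Attached_Below", 220: "Below", 230: "Above"}


def group_marks_by_ccc(chars):
    """Group combining characters by Canonical_Combining_Class."""
    groups = defaultdict(list)
    for ch in chars:
        ccc = unicodedata.combining(ch)
        if ccc:  # ccc=0 (Not_Reordered) is skipped
            groups[CCC_NAMES.get(ccc, str(ccc))].append(unicodedata.name(ch))
    return dict(groups)


marks = "\u0328\u0323\u0308\u0301"  # ogonek, dot below, diaeresis, acute
print(group_marks_by_ccc(marks))
```

Feeding it the combining characters actually present in a font gives a first candidate list of anchor classes to consider.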

Seems only 18 positions make sense for Latn.

Thanks for the question. Helped me a lot to sort my ideas.

RichardW · April 2021

While combining classes may be a good first approximation, there are a number of complications. Hebrew has almost a different combining class for every diacritic, which seems to be overkill. Many Indic scripts assign almost every mark to ccc=0, and indic_positional_class isn't fine enough, though it's a good starting point. Multiple diacritics combine differently, so beyond yes/no the canonical combining classes are a poor guide for mark to mark positioning. You may need a judicious mix of horizontal and vertical stacking for marks above.

Additionally, 'non-interacting' marks may interact. Below and Attached_Below may interact, and I am not aware of any guarantee as to which orders a font will have to handle.

Florian Pircher · April 2021

John Hudson said:

(although the .vtp includes information that .fea doesn't store, such as glyph encoding and GDEF classification).

FEA can specify GDEF: http://adobe-type-tools.github.io/afdko/OpenTypeFeatureFileSpecification.html#9.b
