Order of execution of OpenType features

Simon Cozens · December 2019

Just checking something here. The OpenType Cookbook (by Tal Leming) suggests that it's the designer's responsibility to order features:

The order in which you list your features is very important. This is the order in which they will be processed.

The Standard, though, suggests that the shaper has the responsibility of determining the order of features (at least for the Devanagari-related features):

The application is expected to process this feature and certain other features in an appropriate order to obtain the correct set of basic forms: 'nukt', 'akhn', 'rphf', 'rkrf', 'pref', 'blwf', 'half', 'pstf', 'cjct'.

The AFDKO documentation is ambiguous, just saying that the shaper will "assemble" the list of features:

Do the following first for the GSUB and then for the GPOS:
Assemble all features (including any required feature) for the glyph run’s language system.
Assemble all lookups in these features, in LookupList order, removing any duplicates. (All features and thus all lookups needn’t be applied to every glyph in the run.)

Is there a canonical understanding of the order in which features are processed?

Khaled Hosny · December 2019

The order of features does not matter, but the order of lookups does. However, the lookups of certain features are always processed in a certain order (for example `ccmp` lookups are processed first), and layout engines don’t usually agree on this (for HarfBuzz you can find this somewhere in the source code, for others you will have to either experiment with fonts or ask).

Simon Cozens · December 2019

OK, that’s confusing - surely the order of features has to matter!

Consider something like smcp and rlig: if you do the rlig first, you’ll replace f i by f_i, and then you may not have a substitution to f_i.sc. But if you do the smcp first, the ligature doesn’t apply.
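The two outcomes can be sketched with a toy model in Python. This is only an illustration, not how any real shaper is implemented: the helper names `apply_single` and `apply_ligature` are made up, and the glyph names `f.sc` and `i.sc` are assumptions for a font that has no `f_i.sc`.

```python
def apply_single(run, mapping):
    """One-to-one substitution: swap every glyph that has an entry in mapping."""
    return [mapping.get(glyph, glyph) for glyph in run]


def apply_ligature(run, components, ligature):
    """Many-to-one substitution: replace each occurrence of components."""
    out, i, n = [], 0, len(components)
    while i < len(run):
        if run[i:i + n] == components:
            out.append(ligature)
            i += n
        else:
            out.append(run[i])
            i += 1
    return out


# Hypothetical small-caps mapping; note there is no entry for f_i.
SMALL_CAPS = {"f": "f.sc", "i": "i.sc"}

run = ["f", "i"]

# Ligature first: f i becomes f_i, which has no small-cap form.
lig_first = apply_single(apply_ligature(run, ["f", "i"], "f_i"), SMALL_CAPS)

# Small caps first: the ligature rule no longer sees "f i".
sc_first = apply_ligature(apply_single(run, SMALL_CAPS), ["f", "i"], "f_i")

print(lig_first)  # ['f_i']
print(sc_first)   # ['f.sc', 'i.sc']
```

Either way the result depends on which substitution runs first, which is the ordering question being asked here.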

Jens Kutilek · December 2019

Simon Cozens said:

OK, that’s confusing - surely the order of features has to matter!

Consider something like smcp and rlig: if you do the rlig first, you’ll replace f i by f_i, and then you may not have a substitution to f_i.sc. But if you do the smcp first, the ligature doesn’t apply.

That is because each feature creates an implicit lookup. If you put the lookup definitions before the feature definitions, and just reference the lookups in the features, you can see that it is really the lookup order, not the feature order that matters (for "normal" features as Khaled explained).

Simon Cozens · December 2019

So it sounds like there are two lists of lookup orderings: the shaper first pulls out lookups related to ccmp and possibly the Devanagari features (and possibly other stuff too) and executes them first (in some shaper-defined order), and then the remaining lookups are processed in the (designer-specified) order they appear in the table.

Do any shapers also have a list of features they pull out to execute at the end of processing?

Thomas Phinney · December 2019

I haven’t ever heard of a shaper that pulls out stuff to execute AFTER the stuff they don’t explicitly worry about. If they do something other than “order of lookups in the font,” they all do it in “their own order” and then do anything else, not previously specified, afterwards—in the order those lookups are in the font.

Georg Seifert · December 2019

Some features have a specific order. The biggest group is the features connected to Indic scripts; they are executed in the order written in the spec.
Then there is 'rvrn'. It should always be executed first (that makes it unusable for general purpose feature variation substitution).
I just did a quick test with 'ccmp': InDesign respects the lookup order, but Safari doesn't.

Simon Cozens · December 2019

OK, digging around in the HarfBuzz source I found references to "the spec", which I assumed would be the OpenType spec, but is actually the Microsoft Script Development Spec. This does define an expected order of processing for features for different scripts:

Regardless of the model an application chooses for supporting layout of standard scripts, Uniscribe requires a fixed order for executing features within a run of text to consistently obtain the proper basic form. This is achieved by calling features one-by-one in the standard order listed below.

Uniscribe by default processes features in the order ccmp, liga, clig, dist, kern, mark, mkmk. HarfBuzz does it in the order rvrn, (ltra,ltrm)/(rtla,rtlm), frac, numr, dnom, rand, trak, HARF, BUZZ, abvm, blwm, ccmp, locl, mark, mkmk, rlig, and then either (calt, clig, curs, dist, kern, liga, rclt) or vert. Then user-specified features come after that.

The Script Development Spec specifies additional feature orderings for USE scripts, Arabic, Buginese, Hangul, Hebrew, different Indic scripts, Javanese, Khmer, Lao, Myanmar, Sinhala, Syriac, Thaana, Thai and Tibetan.

What's fascinating is that nobody seemed to know that. :-)

Adam Twardoch · December 2019

Generally, it works like this:

First, the layout engine reads the GSUB features for the current languagesystem of the text run, and determines which features it should apply. It classifies the features into groups:

- Pre-shaping (ccmp, rvrn, locl)
- Shaping (script-specific)
- User-controllable

For each group, shapers have their own rules for how to apply them: either the lookups associated with all features enabled for a given group are pulled and executed in lookup order, or the lookups associated with each feature are pulled and executed feature by feature, in a predefined order of features.

In a way, you could say that it's always a list of enabled lookups that is pulled, in the order defined in the font, but then some lookup groups are re-sorted (moved to the top of the list).

Then it does GPOS, analogously.

The FEA syntax uses the ordering of feature definitions to implicitly control the order of lookups. In FEA you can create lookups explicitly, but if you don't, lookups are created implicitly inside feature definitions.

Theunis de Jong · December 2019

Suppose you have a ligature f_i and a small caps feature. A font might need a smallcap FI if it also has a hardcoded Unicode glyph fi, but if it hasn't, surely you don't need to include an explicit small cap f_i.smcap too? (And for every other ligature as well...) So there must be some rule "scap goes first, then liga", right?

Using FreeType I wrote myself a small feature tester, and since I don't know the "official" order, I just apply them in the order given on the command line. The results vary wildly with different ordering for some feature combos.

It should totally be possible to make any program expose its internal workings with a specially crafted feature file; that's been on my Nothing Else To Do list for some time now.

Mark Simonson · December 2019

So there must be some rule "scap goes first, then liga", right?

I don’t know if there is a rule per se, but that is a good way to do it, and I think most do it this way. (Also, it’s “smcp” not “scap”.)

Also, you don’t need both /f_i and /fi in a font. Using just /fi is common practice, rather than /f_i. (And using /fi as the sub for f i is okay in liga.)

Thomas Phinney · December 2019

I put them both in, but mostly out of habit. The only reason to do it is to get a better underlying text representation for PDFs created from print streams without access to the original font ... surely a real trivia point these days.

John Hudson · December 2019

Some background:

As initially spec'd, OpenType Layout would have been processed purely according to lookup order — i.e. as if all features were applied simultaneously —, with the font maker being entirely responsible for this and for compatibility with the expectations of layout engines regarding feature ordering. In other words, the lookups for certain GSUB features would need to be ordered in a particular way for complex scripts to ensure that they would be processed at the right time relative to glyph reordering and tracking operations needed for those scripts.

However, when Microsoft received initial OpenType complex script fonts from some developers, they found that the lookups were not correctly ordered. Instead of sending the fonts back and asking the developers to fix them, Microsoft decided that font makers couldn't be trusted to get this right, and so they changed the Uniscribe shaping engines to apply some key features in fixed order, overriding the font lookup ordering.

This set a precedent for some features to be applied consecutively rather than simultaneously, and for the division of features into processing blocks. Subsequently when new features were proposed and registered, these needed to be considered in terms of processing blocks and recommendations made regarding when and how the features should be applied. So, for example, when I proposed the 'locl' feature, this seemed to be something that should be applied early, immediately as the script and language itemisation of the text run is completed, as the feature is intended to set up fresh input glyphs for subsequent lookups in downstream features.

A few years ago, I set out to document* what I thought the feature processing blocks are — or should be —, since up until then these were mostly presumed and/or specific to individual shaping engines. The categorisations I came up with turned out to be very similar to those that Andrew Glass used when he was inventing the Universal Shaping Engine.**

_____

* Enabling Typography: towards a general model of OpenType Layout

** The Universal Shaping Engine goes some way to restoring the original intent of OpenType Layout, in that only a few features have strict rules about when they are processed and what the expected input and output is for reordering and tracking purposes. Far more aspects of layout in the USE model are back in the responsibility of the font maker, including even processing of control characters such as U+200C ZERO WIDTH NON-JOINER and U+200D ZERO WIDTH JOINER within lookups, which requires font makers to be familiar with use of these characters as specified by the Unicode Standard.

Adam Twardoch · December 2019

To reiterate what John has said:

The layout engine processes first GSUB, and then GPOS. It splits features in each table into groups. For some complex scripts, it forms many “groups of one” for the script-specific features. For European scripts, it only forms 2 groups or so. The last group typically contains all user-controllable features.

Each feature group is processed as a step. Within each step, the layout engine determines which lookups are associated, in the given font, with the features in the group. It builds the list of lookups in the font order, and applies them. Each lookup goes through the text run and performs the actions, then the result is passed to the next lookup.

I believe that for European scripts, for GSUB, the layout engine first applies the lookups associated with rvrn, locl and ccmp (in the order of lookups), and then applies the lookups associated with all the other enabled features, in the order of lookups.
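A minimal Python sketch of that grouping, under stated assumptions: the feature-to-lookup table `FEATURE_LOOKUPS` is invented for illustration, and the two-group split (`rvrn`, `locl`, `ccmp` first, everything else after) simply mirrors the belief stated above rather than any particular engine's documented behaviour.

```python
# Hypothetical font data: feature tag -> indices into the GSUB LookupList.
FEATURE_LOOKUPS = {
    "liga": [0],
    "smcp": [1, 2],
    "calt": [2],  # shares lookup 2 with smcp
    "ccmp": [3],
    "locl": [4],
}

# Assumed early group for a simple European-script run.
EARLY_STAGE = ("rvrn", "locl", "ccmp")


def lookup_plan(enabled):
    """Return one sorted, de-duplicated list of lookup indices per group."""
    early = [tag for tag in enabled if tag in EARLY_STAGE]
    late = [tag for tag in enabled if tag not in EARLY_STAGE]
    plan = []
    for stage in (early, late):
        indices = set()
        for tag in stage:
            indices.update(FEATURE_LOOKUPS.get(tag, []))
        plan.append(sorted(indices))  # font (LookupList) order within a group
    return plan


print(lookup_plan(["liga", "smcp", "calt", "ccmp", "locl"]))
# [[3, 4], [0, 1, 2]]
```

In this toy font, lookups 3 and 4 run before lookups 0 to 2 even though they sit later in the lookup list, because their features belong to the early group. Within each group the order in which the features were requested makes no difference.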

https://docs.microsoft.com/en-us/typography/ (the Script development specs) documents this behavior — but unfortunately, these specs are not always very clearly written.

For example https://docs.microsoft.com/en-us/typography/script-development/standard makes no mention of "rvrn" and it also does not mention any user-controlled (discretionary) features, so it’s not really easy to understand how the process works.

And there is an important point:

The GSUB and GPOS tables store script/languagesystems, features and lookups. Script/languagesystems and features only point to lookups; the lookups contain the actual processing rules. The lookups have a specific order, while the order of the feature entries in those tables does not matter.

But in the FEA language (the Adobe FDK for OpenType syntax) — which is a popular, but not the only, way to define OpenType features in source form — the order of the feature blocks does matter, because the order of the feature blocks implicitly controls the order of lookups.
Basically, if you see:

feature smcp {
    something;
} smcp;

in reality, it’s:

feature smcp {
    lookup smcp {
        something;
    } smcp;
} smcp;

A feature block implicitly creates a lookup block. And if you change the type of the rule (e.g. ligatures, then simple substitutions), it may create multiple lookups. But you can also define lookups explicitly inside the feature blocks, and also outside the feature blocks. In the end, there is a list of lookups in a particular order, and that is the order you’ll get in the font.

For example, if you write:

lookup gsub1 {
    sub f by f.smcp;
    sub i by i.smcp;
} gsub1;

lookup gsub2 {
    sub f i by f_i.liga;
} gsub2;

feature liga {
    lookup gsub2;
} liga;

feature smcp {
    lookup gsub1;
} smcp;

then in the compiled font, the gsub1 lookup will be written before the gsub2 lookup, even though the smcp feature block comes after the liga feature block.
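The practical effect can be modelled with a small Python sketch. It is a toy, not a real shaper or compiler: `LOOKUP_LIST`, `FEATURES`, `apply_lookup` and `shape` are hypothetical names, and it assumes that liga and smcp fall into the same processing group, so that only lookup order decides the result.

```python
# Lookups in definition order (gsub1 first, gsub2 second), as in the FEA above.
# Each rule maps an input glyph sequence to a single output glyph.
LOOKUP_LIST = [
    ("gsub1", [(("f",), "f.smcp"), (("i",), "i.smcp")]),
    ("gsub2", [(("f", "i"), "f_i.liga")]),
]

# Feature blocks in the order written in the source: liga, then smcp.
FEATURES = {"liga": ["gsub2"], "smcp": ["gsub1"]}


def apply_lookup(run, rules):
    """Walk the run; at each position apply the first rule that matches."""
    out, i = [], 0
    while i < len(run):
        for seq, replacement in rules:
            if tuple(run[i:i + len(seq)]) == seq:
                out.append(replacement)
                i += len(seq)
                break
        else:  # no rule matched at this position
            out.append(run[i])
            i += 1
    return out


def shape(run, enabled):
    """Apply the lookups of all enabled features in lookup-list order."""
    wanted = {name for tag in enabled for name in FEATURES[tag]}
    for name, rules in LOOKUP_LIST:  # lookup order, not feature order
        if name in wanted:
            run = apply_lookup(run, rules)
    return run


print(shape(["f", "i"], ["liga", "smcp"]))  # ['f.smcp', 'i.smcp']
print(shape(["f", "i"], ["liga"]))          # ['f_i.liga']
```

With both features on, gsub1 runs first and the ligature rule in gsub2 never sees "f i", even though the liga feature block was written first.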
