Order of execution of OpenType features

Just checking something here. The OpenType Cookbook (by Tal Leming) suggests that it's the designer's responsibility to order features:

The order in which you list your features is very important. This is the order in which they will be processed.

The Standard, though, suggests that the shaper has the responsibility of determining the order of features (at least for the Devanagari-related features):

The application is expected to process this feature and certain other features in an appropriate order to obtain the correct set of basic forms: 'nukt', 'akhn', 'rphf', 'rkrf', 'pref', 'blwf', 'half', 'pstf', 'cjct'

The AFDKO documentation is ambiguous, just saying that the shaper will "assemble" the list of features:

Do the following first for the GSUB and then for the GPOS:
Assemble all features (including any required feature) for the glyph run’s language system.
Assemble all lookups in these features, in LookupList order, removing any duplicates. (All features and thus all lookups needn’t be applied to every glyph in the run.)
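
As a rough illustration of the quoted steps, the sketch below models a GSUB or GPOS table as a plain mapping from feature tags to lookup indices. The table contents and the function name `assemble_lookups` are made up for this example, and script and language-system records are left out.

```python
def assemble_lookups(feature_lookups, enabled_features):
    """Return the lookup indices for the enabled features, in LookupList order."""
    indices = set()  # a set drops lookups shared by several features
    for tag in enabled_features:
        indices.update(feature_lookups.get(tag, []))
    return sorted(indices)  # LookupList order is ascending lookup index


# Hypothetical GSUB: feature tag -> indices into the LookupList.
gsub = {"liga": [2, 5], "smcp": [0, 5], "ccmp": [1]}

order_a = assemble_lookups(gsub, ["liga", "smcp", "ccmp"])
order_b = assemble_lookups(gsub, ["ccmp", "smcp", "liga"])
print(order_a)
print(order_b)
```

Read this way, the order in which the features are listed has no effect on the result; only the lookup indices do.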

Is there a canonical understanding of the order in which features are processed?


  • The order of features does not matter, but the order of lookups does. However, the lookups of certain features are always processed in a certain order (for example `ccmp` lookups are processed first), and layout engines don’t usually agree on this (for HarfBuzz you can find this somewhere in the source code, for others you will have to either experiment with fonts or ask).
  • OK, that’s confusing - surely the order of features has to matter!

    Consider something like smcp and rlig: if you do the rlig first, you’ll replace f i by f_i, and then you may not have a substitution to f_i.sc. But if you do the smcp first, the ligature doesn’t apply.

  • Jens Kutilek (edited December 2019)
    OK, that’s confusing - surely the order of features has to matter!

    Consider something like smcp and rlig: if you do the rlig first, you’ll replace f i by f_i, and then you may not have a substitution to f_i.sc. But if you do the smcp first, the ligature doesn’t apply.

    That is because each feature creates an implicit lookup. If you put the lookup definitions before the feature definitions, and just reference the lookups in the features, you can see that it is really the lookup order, not the feature order that matters (for "normal" features as Khaled explained).
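
The smcp/ligature case quoted above can be tried out with a toy model. This is only a sketch: a lookup is reduced to a dictionary of hypothetical substitution rules, and `apply_lookup` and `shape` are illustrative names, not any shaper's real API.

```python
def apply_lookup(glyphs, rules):
    """Apply one lookup to the whole run; `rules` maps input tuples to one output glyph."""
    out, i = [], 0
    while i < len(glyphs):
        for seq, replacement in rules.items():
            if tuple(glyphs[i:i + len(seq)]) == seq:
                out.append(replacement)
                i += len(seq)
                break
        else:
            # No rule matched at this position: keep the glyph unchanged.
            out.append(glyphs[i])
            i += 1
    return out


def shape(glyphs, lookup_list):
    """Run every lookup over the run, each one seeing the previous lookup's output."""
    for rules in lookup_list:
        glyphs = apply_lookup(glyphs, rules)
    return glyphs


# Hypothetical lookups for the example from the thread.
liga = {("f", "i"): "f_i"}
smcp = {("f",): "f.sc", ("i",): "i.sc"}

liga_first = shape(["f", "i"], [liga, smcp])
smcp_first = shape(["f", "i"], [smcp, liga])
print(liga_first)
print(smcp_first)
```

With the ligature lookup first, the run ends up as a single f_i with no small-cap form; with the small-cap lookup first, the ligature rule never matches.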
  • So it sounds like there are two lists of lookup orderings: the shaper first pulls out lookups related to ccmp and possibly the Devanagari features (and possibly other stuff too) and executes them first (in some shaper-defined order), and then the remaining lookups are processed in the (designer-specified) order they appear in the table.

    Do any shapers also have a list of features they pull out to execute at the end of processing?
  • I haven’t ever heard of a shaper that pulls out stuff to execute AFTER the stuff they don’t explicitly worry about. If they do something other than “order of lookups in the font,” they all do it in “their own order” and then do anything else, not previously specified, afterwards—in the order those lookups are in the font.
  • Some features have a specific order. The biggest group is the set of features connected to Indic scripts, which are executed in the order given in the spec.
    Then there is 'rvrn'. It should always be executed first (that makes it unusable for general-purpose feature variation substitution).
    I just ran a quick test with 'ccmp': InDesign respects the lookup order, but Safari doesn't.

  • OK, digging around in the HarfBuzz source I found references to "the spec", which I assumed would be the OpenType spec, but is actually the Microsoft Script Development Spec. This does define an expected order of processing for features for different scripts:

    Regardless of the model an application chooses for supporting layout of standard scripts, Uniscribe requires a fixed order for executing features within a run of text to consistently obtain the proper basic form. This is achieved by calling features one-by-one in the standard order listed below.

    Uniscribe by default processes features in the order ccmp, liga, clig, dist, kern, mark, mkmk. HarfBuzz does it in the order rvrn, (ltra, ltrm)/(rtla, rtlm), frac, numr, dnom, rand, trak, HARF, BUZZ, abvm, blwm, ccmp, locl, mark, mkmk, rlig, and then either (calt, clig, curs, dist, kern, liga, rclt) or vert. Then user-specified features come after that.

    The Script Development Spec specifies additional feature orderings for USE scripts, Arabic, Buginese, Hangul, Hebrew, different Indic scripts, Javanese, Khmer, Lao, Myanmar, Sinhala, Syriac, Thaana, Thai and Tibetan.

    What's fascinating is that nobody seemed to know that. :-)
  • Generally, it works like this: 

    First, the layout engine reads the GSUB features for the current languagesystem of the text run, and determines which features it should apply. It classifies the features into groups: 

    - Pre-shaping (ccmp, rvrn, locl)
    - Shaping (script-specific)
    - User-controllable

    For each group, shapers have their own rules for applying the features: either the lookups associated with all features enabled for the group are pulled and executed in lookup order, or the lookups associated with each feature are pulled and executed feature by feature, in a predefined order of features.

    In a way, you could say that it's always a list of enabled lookups that is pulled, in the order defined in the font, but then some lookup groups are re-sorted (moved to the top of the list).

    Then it does GPOS, analogously.

    The FEA syntax uses the ordering of feature definitions to implicitly control the order of lookups. In FEA you can create lookups explicitly, but if you don't, lookups are created implicitly inside feature definitions.
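
The two per-group strategies described in this post can be contrasted in a small sketch. The font data, the group contents and both function names are assumptions made for illustration; no particular shaper is being reproduced.

```python
def by_lookup_order(feature_lookups, group):
    """Strategy 1: pool the lookups of every feature in the group, run them in font order."""
    pooled = set()
    for tag in group:
        pooled.update(feature_lookups.get(tag, []))
    return sorted(pooled)


def by_feature_order(feature_lookups, group):
    """Strategy 2: go feature by feature in the group's predefined order.

    Within a single feature the lookups still run in font order.
    """
    plan = []
    for tag in group:
        plan.extend(sorted(feature_lookups.get(tag, [])))
    return plan


# Hypothetical font: feature tag -> lookup indices.
gsub = {"ccmp": [3], "locl": [0], "liga": [1], "smcp": [2]}
pre_shaping = ["ccmp", "locl"]  # group order chosen by the (imaginary) shaper
user = ["liga", "smcp"]

pooled_plan = by_lookup_order(gsub, pre_shaping) + by_lookup_order(gsub, user)
per_feature_plan = by_feature_order(gsub, pre_shaping) + by_feature_order(gsub, user)
print(pooled_plan)
print(per_feature_plan)
```

In both plans the pre-shaping lookups run before the user-controllable ones, whatever their indices; the plans differ only in whether lookup 3 (ccmp) or lookup 0 (locl) goes first inside the pre-shaping group.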

  • Theunis de Jong (edited December 2019)
    Suppose you have a ligature f_i and a small caps feature. A font might need a smallcap FI if it also has a hardcoded Unicode glyph fi, but if it hasn't, surely you don't need to include an explicit small cap f_i.smcap too? (And for every other ligature as well...) So there must be some rule "scap goes first, then liga", right?

    Using FreeType I wrote myself a small feature tester, and since I don't know the "official" order, I just apply them in the order given on the command line. The results vary wildly with different ordering for some feature combos.

    It should totally be possible to make any program expose its internal workings with a specially crafted feature file; that's been on my Nothing Else To Do list for some time now.
  • So there must be some rule "scap goes first, then liga", right?

    I don’t know if there is a rule per se, but that is a good way to do it, and I think most do it this way. (Also, it’s “smcp” not “scap”.)

    Also, you don’t need both /f_i and /fi in a font. Using just /fi is common practice, rather than /f_i. (And using /fi as the sub for f i is okay in liga.)
  • I put them both in, but mostly out of habit. The only reason to do it, is to get better underlying text representation for PDFs created from print streams without access to the original font ... surely a real trivia point these days.
  • Adam Twardoch (edited December 2019)
    To reiterate what John has said: 

    The layout engine processes first GSUB, and then GPOS. It splits features in each table into groups. For some complex scripts, it forms many “groups of one” for the script-specific features. For European scripts, it only forms 2 groups or so. The last group typically contains all user-controllable features. 

    Each feature group is processed as a step. Within each step, the layout engine determines which lookups are associated, in the given font, with the features in the group. It builds the list of lookups in the font order, and applies them. Each lookup goes through the text run and performs the actions, then the result is passed to the next lookup. 

    I believe that for European scripts, for GSUB, the layout engine first applies the lookups associated with rvrn, locl and ccmp (in the order of lookups), and then applies the lookups associated with all the other enabled features, in the order of lookups.

    https://docs.microsoft.com/en-us/typography/ (the Script development specs) documents this behavior — but unfortunately, these specs are not always very clearly written. 

    For example https://docs.microsoft.com/en-us/typography/script-development/standard makes no mention of "rvrn" and it also does not mention any user-controlled (discretionary) features, so it’s not really easy to understand how the process works. 

    And there is an important point: 

    The GSUB and GPOS tables store script/languagesystems, features and lookups. Script/languagesystems and features only point to lookups, the lookups contain the actual processing rules. The lookups have a specific order, the order of the feature entries in those tables does not matter.

    But in the FEA language (the Adobe FDK for OpenType syntax) — which is a popular, but not the only, way to define OpenType features in source form — the order of the feature blocks does matter, because the order of the feature blocks implicitly controls the order of lookups. 
    Basically, if you see: 

    feature smcp { 
    } smcp;

    in reality, it’s:

    feature smcp { 
      lookup smcp { 
      } smcp;
    } smcp;

    A feature block implicitly creates a lookup block. And if you change the type of the rule (e.g. ligatures, then simple substitutions), it may create multiple lookups. But you can also define lookups explicitly inside the feature blocks, and also outside the feature blocks. In the end, there is a list of lookups in a particular order, and that is the order you’ll get in the font.

    For example, if you write: 

    lookup gsub1 { 
      sub f by f.smcp;
      sub i by i.smcp;
    } gsub1;

    lookup gsub2 { 
      sub f i by f_i.liga;
    } gsub2;

    feature liga { 
      lookup gsub2;
    } liga;

    feature smcp { 
      lookup gsub1; 
    } smcp;

    then in the compiled font, the gsub1 lookup will be written before the gsub2 lookup, even though the smcp feature block comes after the liga feature block.
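
The numbering described here can be mimicked with a toy model of a feature-file compiler. This follows only the description in the post; it is not how makeotf or any other real compiler is implemented, and `compile_fea` and the statement tuples are invented for the example.

```python
def compile_fea(statements):
    """Toy model: lookups are numbered in the order they are first defined.

    `statements` is a list of ("lookup", name) or ("feature", tag, [lookup names]).
    """
    lookup_index = {}
    features = {}
    for stmt in statements:
        if stmt[0] == "lookup":
            # The first definition fixes the lookup's position in the LookupList.
            lookup_index.setdefault(stmt[1], len(lookup_index))
        else:
            _, tag, refs = stmt
            features[tag] = [lookup_index[name] for name in refs]
    return lookup_index, features


# Same layout as the feature code above: lookups first, then the features.
source = [
    ("lookup", "gsub1"),             # small caps
    ("lookup", "gsub2"),             # ligature
    ("feature", "liga", ["gsub2"]),
    ("feature", "smcp", ["gsub1"]),
]

lookup_index, features = compile_fea(source)
print(lookup_index)
print(features)
```

Here liga points at lookup 1 and smcp at lookup 0, so a shaper working in lookup order would apply the small-cap lookup first even though the liga block is written first.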