Dreaming of a new feature syntax

Simon Cozens
Simon Cozens Posts: 752
edited March 2020 in Font Technology
I have recently been looking both at some complex fonts and at the possibility of programmatically manipulating features at a high level, and it has left me feeling like there could be some ways that the venerable AFDKO feature language could be improved. The .fea format feels comprehensive, perhaps because we're all so used to it, but looking at fonts generated other ways I am finding features which are either very hard or very ugly to express in the language (class-based substitutions being one of them). It isn't a very ergonomic language.

One reason why it isn't ergonomic is because it's pretty low level - it forces you to represent things in a way that makes it easy to represent the features in the OTF binary, rather than in terms of expressing what you want to do. To give a concrete example, you can do this to swap two glyphs around (example 1):

lookup SwapAB { sub A by B; sub B by A; } SwapAB;
sub A' lookup SwapAB B' lookup SwapAB;
but you "can't" do this (example 2):

sub A B by B A;
I say "can't" in quotes because the only thing that's stopping you is the feature file syntax. A more sophisticated compiler would be able to transform (2) into (1), generating the auxiliary lookup for you automatically.
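As a rough illustration of that transformation, here is a minimal Python sketch that rewrites an equal-length many-to-many rule into auxiliary single-substitution lookups plus a contextual rule. The function name `expand_many_to_many` and the numbered lookup names are made up for this example; a real compiler would also have to handle unequal lengths, classes and lookup flags.

```python
def expand_many_to_many(inputs, outputs, name="Aux"):
    """Rewrite `sub <inputs> by <outputs>;` (equal lengths) as a contextual
    rule that calls auxiliary single-substitution lookups."""
    if len(inputs) != len(outputs):
        raise ValueError("this sketch only handles equal-length sequences")

    # Pack the per-position mappings into as few lookups as possible.
    # Two positions can share a lookup unless they would map the same
    # input glyph to different outputs.
    lookups = []   # list of dicts: input glyph -> output glyph
    assigned = []  # index of the lookup used at each position
    for src, dst in zip(inputs, outputs):
        for index, mapping in enumerate(lookups):
            if mapping.get(src, dst) == dst:
                mapping[src] = dst
                assigned.append(index)
                break
        else:
            lookups.append({src: dst})
            assigned.append(len(lookups) - 1)

    lines = []
    for index, mapping in enumerate(lookups):
        lookup_name = f"{name}{index}"
        rules = " ".join(f"sub {s} by {d};" for s, d in mapping.items())
        lines.append(f"lookup {lookup_name} {{ {rules} }} {lookup_name};")
    context = " ".join(
        f"{glyph}' lookup {name}{index}" for glyph, index in zip(inputs, assigned)
    )
    lines.append(f"sub {context};")
    return "\n".join(lines)


print(expand_many_to_many(["A", "B"], ["B", "A"], name="SwapAB"))
```

For the swap, both positions fit into one auxiliary lookup, so the output has the same shape as example 1 (with the lookup named `SwapAB0`). A rule such as `sub A A by B C;` would need two auxiliary lookups, because `A` maps to different glyphs at the two positions.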

So now I am thinking about what else I would like in my dream feature language. So far I have come up with:
  • Automatic many-to-many substitutions as above.
  • Distributive dot-suffixing. What this means is that given @alpha = [a b c d e f...]; then @alpha.sc automatically expands to [a.sc b.sc c.sc d.sc e.sc f.sc...];. Hence:
    feature pnum { sub @digits by @digits.prop; } pnum;
    feature numr { sub @digits by @digits.numr; } numr;
    feature dnom { sub @digits by @digits.dnom; } dnom;
    
  • Dot-suffixing would also apply to ranges, so you could say
    feature smcp { sub [a-z] by [a-z].sc; } smcp;
  • Placeholder variables. Each "captured" glyph in an input substitution would also be expressible using a numbered variable $1 $2 $3... (similar to regular expressions). To insert a tatweel between two glyphs:
    sub @kasra ayin.fina by $1 tatweel $2;
    
  • Placeholders would also dot-suffix:
    sub uni17D2 uni1780 by $2.coeng;
  • Hierarchical matching. This one needs a bit of thought to establish how it interacts with prefix/input/suffix marking, but something like
        kaf.init {
            mem.medi -> $1.KafMemInit $2.KafMemMedi;
            baa.medi alif.fina -> $1.KafBaaInit $2.KafBaaInit $3;
            heh.fina -> $1.KafHeh $2.KafHeh;
            lam.medi {
                mem.medi -> $1.KafLam $2.KafLamMemMedi $3.LamMemMedi;
                alif.fina -> $1.KafLam $2.KafLamAlif $3;
                _ -> $1.KafLam $2.KafLamMedi $3;
            }
        }
    
    is a darned sight easier to reason about than writing it all out in full. (In my head I'm starting to prefer the arrow syntax for substitution instead of "sub ... by ...", but that's just me.)
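To make the hierarchical idea concrete, here is a small Python sketch that flattens such a tree into plain (input, output) sequences, resolving `$n` placeholders and their dot-suffixes along the way. The nested-dict representation and the names `flatten` and `resolve` are assumptions made for this example, and the `_` wildcard branch is left out because its exact semantics are not pinned down above.

```python
import re

# Part of the kaf.init tree from the post, as nested dicts (assumed encoding).
TREE = {
    "kaf.init": {
        "mem.medi": "$1.KafMemInit $2.KafMemMedi",
        "baa.medi alif.fina": "$1.KafBaaInit $2.KafBaaInit $3",
        "lam.medi": {
            "mem.medi": "$1.KafLam $2.KafLamMemMedi $3.LamMemMedi",
            "alif.fina": "$1.KafLam $2.KafLamAlif $3",
        },
    }
}


def resolve(token, inputs):
    """Replace `$n` or `$n.suffix` with the n-th matched glyph (1-based)."""
    match = re.fullmatch(r"\$(\d+)(\..+)?", token)
    if match is None:
        return token  # a literal glyph name
    return inputs[int(match.group(1)) - 1] + (match.group(2) or "")


def flatten(tree, prefix=()):
    """Yield (inputs, outputs) pairs for every leaf of a nested rule tree."""
    for key, value in tree.items():
        inputs = prefix + tuple(key.split())
        if isinstance(value, dict):
            yield from flatten(value, inputs)
        else:
            yield inputs, tuple(resolve(tok, inputs) for tok in value.split())


for inputs, outputs in flatten(TREE):
    print(" ".join(inputs), "->", " ".join(outputs))
```

Each flattened pair is still a many-to-many substitution, so it would need the auxiliary-lookup treatment described earlier before it could be written in today's .fea syntax.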
Some of these are obviously backwards-compatible extensions with the existing language; others would need a brand new format.
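Distributive dot-suffixing is one of the extensions that a preprocessor could expand into today's syntax. A minimal Python sketch, assuming single-letter ranges and that every suffixed glyph actually exists in the font (the helper names are invented for this example):

```python
def glyph_range(first, last):
    """Expand a single-letter range such as a-z into a list of glyph names."""
    return [chr(code) for code in range(ord(first), ord(last) + 1)]


def dot_suffix(glyphs, suffix):
    """Distribute a suffix over a class: [a b] + "sc" -> [a.sc b.sc]."""
    return [f"{glyph}.{suffix}" for glyph in glyphs]


def single_sub_feature(tag, glyphs, suffix):
    """Emit a standard .fea feature substituting each glyph by its suffixed form."""
    source = " ".join(glyphs)
    target = " ".join(dot_suffix(glyphs, suffix))
    return f"feature {tag} {{ sub [{source}] by [{target}]; }} {tag};"


print(single_sub_feature("smcp", glyph_range("a", "e"), "sc"))
```

A real implementation would probably want to check the expanded names against the font's glyph set and decide what to do about glyphs with no suffixed counterpart.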

I wonder what other niggles readers have about feature files that would be dealt with by being able to express things more ergonomically.

Comments

  • To give a concrete example, you can do this to swap two glyphs around (example 1):

    lookup SwapAB { sub A by B; sub B by A; } SwapAB;
    sub A' lookup SwapAB B' lookup SwapAB;
    
    but you "can't" do this (example 2):

    sub A B by B A;
    
    I say "can't" in quotes because the only thing that's stopping you is the feature file syntax. A more sophisticated compiler would be able to transform (2) into (1), generating the auxiliary lookup for you automatically.

    It would be great to see more powerful ways to code OpenType features.

    I'm certainly not used to FEA syntax (we use our own syntax), but I think this would fail because the first example also changes BA to AB while example 2 won't, so I think they are not the same.
  • I agree it seems like there's a world to be gained here. Now that people are beginning to see the use for variable fonts, perhaps improving fonts' feature syntax should be the next step in font technology, although personally I would also love to see non-linear interpolation in variable fonts.

    Beginner question: What does the .fea format do, if fonts can also be generated in other ways? And then what is the underlying syntax of those other ways?
  • Simon Cozens
    Simon Cozens Posts: 752
    edited March 2020
    I think this would fail because the first example also changes BA to AB while example 2 won't, so I think they are not the same.

    Nope: the "sub" rule means that the lookup is only called if "A' B'" matches; it won't be called for "B' A'". That was a good demonstration of how these manually generated auxiliary lookups for contextual substitutions are really unclear!
    I agree it seems like there's a world to be gained here. Now that people are beginning to see the use for variable fonts, perhaps improving fonts' feature syntax should be the next step in font technology, although personally I would also love to see non-linear interpolation in variable fonts.

    Yes, I had completely forgotten to mention variable fonts, although they are on my list. There are already discussions about how to adapt the .fea format to VF, but no conclusions. The best idea at the moment is just to write separate feature files for each master and have the compiler interpolate them, but I think we can do better.
    Beginner question: What does the .fea format do, if fonts can also be generated in other ways? And then what is the underlying syntax of those other ways?
    Adobe feature files are just one way of specifying what lookups should be placed in GPOS/GSUB tables. Theoretically a font editing app could write out the tables directly from its internal representation to the OpenType binary, but I don't know any that do that; most use a separate representation: VOLT uses its own format (.vtp files), Monotype's FontDame uses its own format, and so on. They then go through a compilation step to turn that representation into binary tables.

    I am working on a Python library which manipulates these features at a higher level, and translates between the different feature file formats and also the binary format, but it's only at a very early stage. If I get that done, though, adding another feature syntax shouldn't be difficult.
  • I am working on a Python library which manipulates these features at a higher level, and translates between the different feature file formats and also the binary format, but it's only at a very early stage. If I get that done, though, adding another feature syntax shouldn't be difficult.
    Cool! I was thinking a simple python syntax would work nicely, and personally I don't even think it's necessary to add a new syntax on top of that. Python is simple enough, no?

    But would this allow for substituting, say, ABC for ACB?
  • Well, I had considered writing the features themselves in a Python-like minilanguage (where lookups are functions) but I think it would actually prove more frustrating than useful - people will want to start doing if statements and other things which can't actually be represented inside OpenType GSUB/GPOS tables. (Not until we have a new font format and we embed Python interpreters in our shaping engines, but that's not happening any year soon...)

    No, what I meant is that I'm working on a Python library to be used as part of the font production process, which can arbitrate between text and binary formats for representing features. But the point of that would be for features and lookups to be able to be manipulated programmatically; for example, currently if you want to find all your Urdu language-specific code and duplicate it for Sindhi, you have to do it manually, because features are specified as blobs of text. Having a library which can reason about your feature code at a high level would enable you to do that kind of thing.
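To illustrate the Urdu-to-Sindhi case, here is a toy Python sketch of what "reasoning about feature code at a high level" could look like once rules are objects rather than text. The `Rule` class and `duplicate_language` function are hypothetical and are not the API of the library mentioned above; `URD ` and `SND ` are the OpenType language system tags for Urdu and Sindhi.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Rule:
    """A hypothetical, heavily simplified representation of one rule."""
    feature: str   # e.g. "locl"
    script: str    # OpenType script tag
    language: str  # OpenType language system tag
    text: str      # the rule body, kept as text for brevity


def duplicate_language(rules, source, target):
    """Return the rules plus a copy of every `source`-language rule for `target`."""
    copies = [replace(rule, language=target) for rule in rules if rule.language == source]
    return list(rules) + copies


rules = [
    Rule("locl", "arab", "URD ", "sub uni06F4 by uni06F4.urd;"),
    Rule("liga", "arab", "dflt", "sub lam alef by lam_alef;"),
]
for rule in duplicate_language(rules, "URD ", "SND "):
    print(rule.feature, rule.script, repr(rule.language), rule.text)
```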
  • I think this would fail because the first example also changes BA to AB while example 2 won't, so I think they are not the same.

    Nope: the "sub" rule means that the lookup is only called if "A' B'" matches; it won't be called for "B' A'". That was a good demonstration of how these manually generated auxiliary lookups for contextual substitutions are really unclear!

    You are right. That is one of the reasons why I prefer to use a visual OpenType feature designer. Our feature code isn't any better:

  • For some reason the code block isn't working, so here is the code in a quote:
    lookup SwapAB {
      sub A -> B;
      sub B -> A;
    }

    lookup CCSwapAB {
      context A B;
      sub 0 SwapAB;
      sub 1 SwapAB;
    }
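The behaviour discussed here (the nested lookup only fires where the context `A B` matches) can be checked with a toy simulation. This Python sketch is a deliberate simplification of how a shaping engine processes a contextual lookup, with names invented for the example:

```python
SWAP_AB = {"A": "B", "B": "A"}  # the single-substitution lookup SwapAB


def apply_contextual(glyphs, context, lookups):
    """Apply one contextual rule: wherever `context` matches, run lookups[i]
    on the glyph at offset i, then continue after the matched sequence."""
    out = list(glyphs)
    i = 0
    while i < len(out):
        if out[i:i + len(context)] == list(context):
            for offset, mapping in enumerate(lookups):
                glyph = out[i + offset]
                out[i + offset] = mapping.get(glyph, glyph)
            i += len(context)
        else:
            i += 1
    return out


print(apply_contextual(["A", "B"], ["A", "B"], [SWAP_AB, SWAP_AB]))  # swapped
print(apply_contextual(["B", "A"], ["A", "B"], [SWAP_AB, SWAP_AB]))  # untouched
```

The second call leaves `B A` alone: although SwapAB itself would map both glyphs, it is never invoked because the context does not match.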

  • Adam Jagosz
    Adam Jagosz Posts: 689
    @Jasper de Waard I would really love to see in variable fonts better ways of implementing rotation than either linear approximation with enough intermediate masters to achieve seamless circular motion (if that's even possible) or seamless behavior of two masters at the cost of absolute shape distortion.
  • @Jasper de Waard I would really love to see in variable fonts better ways of implementing rotation than either linear approximation with enough intermediate masters to achieve seamless circular motion (if that's even possible) or seamless behavior of two masters at the cost of absolute shape distortion.

    Based on what Underware is doing with HOI (Higher Order Interpolation), I think variable fonts already support that, however, I don't think many tools are built for it at the moment.
  • Jasper de Waard
    Jasper de Waard Posts: 641
    edited March 2020
    @Jasper de Waard I would really love to see in variable fonts better ways of implementing rotation than either linear approximation with enough intermediate masters to achieve seamless circular motion (if that's even possible) or seamless behavior of two masters at the cost of absolute shape distortion.
    I think rotation is just one element of the endless opportunities that non-linear (or higher order) interpolation could bring.

     Matthew Smith said:
    Based on what Underware is doing with HOI (Higher Order Interpolation), I think variable fonts already support that, however, I don't think many tools are built for it at the moment.
    AFAIK, the variable font spec does not support that. (https://forum.glyphsapp.com/t/non-linear-interpolation/8719) Underware must have created their own tools, but this is not something that can be implemented straightforwardly in design apps.

    Sorry for derailing the thread. Maybe we should start a new one!

    Edit: I went ahead and created a new thread: https://typedrawers.com/discussion/3540/non-linear-or-higher-order-interpolation-hoi/p1?new=1
  • You may want to look at what we've done with extending the fea syntax to make it more powerful and efficient:

    https://github.com/silnrsi/pysilfont/blob/master/docs/feaextensions.md

    The new 'feax' is expanded into standard fea using a preprocessor - psfmakefea - then compiled normally (using fonttools). It's all open source and can be integrated into other software.

    One of the most powerful extensions is the ability to write subroutines in python.

    We'd love to see feax supported in various apps, and we'd very much welcome suggestions and improvements. Pull requests to our pysilfont package welcomed! 



  • @Victor Gaultney:  That looks great - the positioning stuff (baseclass) and the integrated Python especially. Would it be worth adding the ability to define classes using python code?
  • I always have to read mark positioning classes and rules two or three times to work out what they're actually saying. Maybe it's just me, but perhaps something like this would be clearer:

        anchors a {
            top <163 460>;
            bottom <162 0>;
            ogonek <262 -12>;
        }
        anchors e {
            top <173 422>;
            bottom <171 0>;
            ogonek <269 22>;
        }
        anchors acutecomb {
            _top <127 427>;
            top <124 604>;
        }
        anchors ogonekcomb {
            _ogonek <162 115>;
        }
    
        attach &ogonek &_ogonek;
        attach &top &_top;
    
    An advantage would be that the anchor definitions don't need to be specified if you're working with UFO3, since they could be pulled directly out of the file.
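As a sketch of how an `attach` statement could be lowered to standard feature code, the Python below turns per-glyph anchor definitions into `markClass` and mark-to-base `pos` statements. The dictionary layout, the `attach` function and the generated class names are assumptions for this example; treating any glyph that carries an underscore-prefixed anchor as a mark is a simplification.

```python
ANCHORS = {
    "a": {"top": (163, 460), "bottom": (162, 0), "ogonek": (262, -12)},
    "e": {"top": (173, 422), "bottom": (171, 0), "ogonek": (269, 22)},
    "acutecomb": {"_top": (127, 427), "top": (124, 604)},
    "ogonekcomb": {"_ogonek": (162, 115)},
}


def attach(anchors, base_anchor, mark_anchor):
    """Emit markClass and mark-to-base statements for one anchor pair."""
    mark_class = f"@{base_anchor}_marks"
    lines = []
    # Every glyph carrying the mark-side anchor joins the mark class.
    for glyph, points in anchors.items():
        if mark_anchor in points:
            x, y = points[mark_anchor]
            lines.append(f"markClass {glyph} <anchor {x} {y}> {mark_class};")
    # Every non-mark glyph carrying the base-side anchor gets an attachment rule.
    for glyph, points in anchors.items():
        is_mark = any(name.startswith("_") for name in points)
        if base_anchor in points and not is_mark:
            x, y = points[base_anchor]
            lines.append(f"pos base {glyph} <anchor {x} {y}> mark {mark_class};")
    return lines


print("\n".join(attach(ANCHORS, "top", "_top")))
```

In a real feature file the `pos base` statements would sit inside a lookup in the `mark` feature; the `top` anchor on `acutecomb` is skipped here because it belongs to mark-to-mark attachment.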
  • John Hudson
    John Hudson Posts: 3,268
    The way AFDKO, and hence most font tools, handles anchor attachment is glyph-centric (it works by defining corresponding anchors on bases and marks and generates lookups from these), so in that respect your proposal, Simon, looks like an improvement: it is clearer, and presents the glyph-centric anchor definitions in a glyph-centric way.

    I'm used to VOLT's model, which is lookup-centric rather than glyph-centric. The anchor attachments for each base and mark glyph end up stored in the GDEF table Attachment Points List, of course, but they are derived from the lookups, not the other way round. So the whole experience of working with anchor attachment in VOLT is very different. I think the lookup-centric model lends itself better to a visual UI, because it means I can see which anchors are at play in each lookup, and what their effect is. The glyph-centric model, as implemented in current tools, seems to me to be lacking a layer of UI that would enable one to look at a glyph and filter anchors by lookup or by context, so one would only see the anchors used in a given context and what their effect is.
  • Some more ideas:
    • Specify classes using regular expressions. This allows you to use your glyph names to set up your classes: @MARKS = /\.mark/;
    • I'm moving from the idea of a "feature file" (which in principle just deals with GSUB and GPOS features, but in reality never did, because it also lets you play with GDEF and a bunch of other tables too) to a more general language explicitly for font engineering. Something like:
    DuplicateGlyphs @BASES, @BASES.spacing; # Creates new glyphs with .spacing suffix
    SetAsMark @BASES.spacing; # Changes GDEF category for the new glyphs to mark
    SetWidth @BASES.spacing, 0;
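Regular-expression classes are straightforward to prototype once the glyph names are available as a list. A minimal Python sketch, with a made-up glyph list and hypothetical helper names:

```python
import re


def define_class(glyph_names, pattern):
    """Return the glyph names matching `pattern`, in glyph order."""
    regex = re.compile(pattern)
    return [name for name in glyph_names if regex.search(name)]


def as_fea_class(class_name, members):
    """Render a class as a standard .fea class definition."""
    return f"@{class_name} = [{' '.join(members)}];"


# Invented glyph names; in practice they would come from the font.
GLYPHS = ["a", "b.sc", "acutecomb.mark", "gravecomb.mark", "markus"]

print(as_fea_class("MARKS", define_class(GLYPHS, r"\.mark")))
```

With fontTools, the glyph list could be read from a compiled font via `TTFont(path).getGlyphOrder()`.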
    
  • Belleve Invis
    Belleve Invis Posts: 269
    edited March 2020
    Well maybe you should consider leveraging existing programming languages and create an EDSL to express OT features...

    import { sub, match, contextual, lookAhead, gid } from "GsubEdsl";
    
    function WithSuffix(resolver, suffix) {
        return ......
    }
    
    const pnum = gsub.createFeature(`pnum`)
        .add(sub(digits).to(WithSuffix(digits, '.prop')));
    
    const liga = gsub.createFeature(`liga`)
        .add(sub(gid`f`, gid`f`, gid`i`).to(gid`f_f_i`))
        .add(sub(gid`f`, gid`i`).to(gid`f_i`));
    
    const swap = gsub.createLookups()
        .add(contextual(
            match(gid`A`).to(gid`B`),
            match(gid`B`).to(gid`A`)));
    
    const insert = gsub.createLookups()
        .add(contextual(
            match(kasra).to(copy, gid`tatweel`),
            lookAhead(gid`ayin.fina`)));

  • Thomas Phinney
    Thomas Phinney Posts: 2,920
    GID already means something quite different in fonts, so I would be inclined not to use it to mean glyph name.
  • GID already means something quite different in fonts, so I would be inclined not to use it to mean glyph name.
    s/gid/gn/ (means "glyph name")?
  • Another attempt:
    import { sub, match, contextual, lookAhead, gn } from "GsubEdsl";
    
    function WithSuffix(resolver, suffix) {
        return ......
    }
    
    const pnum = gsub.createFeature(`pnum`)
        .add(sub(digits).to(AddSuffix(`.prop`)));
    
    const liga = gsub.createFeature(`liga`)
        .add(sub(gn`f`, gn`f`, gn`i`).to(gn`f_f_i`))
        .add(sub(gn`f`, gn`i`).to(gn`f_i`));
    
    const swap = gsub.createLookups()
        .add(contextual(
            match(gn`A`).to(gn`B`),
            match(gn`B`).to(gn`A`)));
    
    const insert = gsub.createLookups()
        .add(contextual(
            match(kasra).to(Sequence(Identity, gn`tatweel`)),
            lookAhead(gn`ayin.fina`)));
    The “to” parameter will now become a function that transforms the match (one or more glyphs) into something else — maybe one glyph or multiple glyphs...

    Looks doable, but it still needs more API design.
  • I see the benefit, and something like that underneath might make sense, but I'm currently preferring to keep this a domain-specific language, because that's easier for non-programmers to get their heads around. I'm thinking of doing it in an extensible way, with plugins written in Python providing additional "verbs".
  • Just a note that I am playing with this in this repository, and am currently having a lot of fun. Here is an example of combining extensible Python plugins with regexp-based classes. Input feature file:
    
    LoadPlugin Arabic;
    DefineClass @inits = /^uni\w+.init$/;
    DefineClass @medis = /^uni\w+.medi$/;
    DefineClass @finas = /^uni\w+.fina$/;
    InitMediFina;
    

    Processed into Adobe feature format in the context of a font:
    
    $ ./fee2fea Amiri-Regular.ttf test.fee
    feature init {
        sub [uni06FC uni063A uni075E uni075D ...] by [uni06FC.init uni063A.init uni075E.init uni075D.init ...];
    } init;
    
    feature medi {
        sub [uni06FC uni063A uni075E uni075D ...] by [uni06FC.medi uni063A.medi uni075E.medi uni075D.medi ...];
    } medi;
    
    feature fina {
        sub [uni0625 uni0627 uni0774 uni0773 ...] by [uni0625.fina uni0627.fina uni0774.fina uni0773.fina ...];
    } fina;
    
  • Yes, I had completely forgotten to mention variable fonts, although they are on my list. There are already discussions about how to adapt the .fea format to VF, but no conclusions. The best idea at the moment is just to write separate feature files for each master and have the compiler interpolate them, but I think we can do better.
    In general, I'm not sure having feature files for design masters and then generating the variable font is the best way to think about this. (I'll qualify this statement later.)

    Feature files generate lookup tables that can be triggered by feature tags, and lookups operate on glyph IDs within that font. When you create a variable font, by whatever means, there is one set of glyph outline data with corresponding glyph IDs. And the lookups operate in terms of those glyph IDs, not glyph IDs that may have existed in some data earlier in the development workflow.

    So, if you have multiple design masters, each with their own outlines, that get used to generate a variable font, you still end up with one set of outlines in the variable font. In principle, the glyph IDs in the original design masters aren't directly relevant to the lookups in the final variable font; the only thing that matters to the lookups in the final variable font is the glyph IDs in that variable font.

    Of course, in a workflow that generates a variable font from a set of design masters, there must be a correspondence of glyph IDs and outlines in each of the design masters, and probably the tools to generate the variable font from the masters will maintain that same correspondence in the variable font.

    Even so, I think logically it makes sense to think of design masters as providing outlines that will be used to generate the final outlines + deltas in the variable font, and then having one set of feature data for the final variable font.

    There is one qualification I'd make to that regarding positioning: If there is some detail of positioning that needs to be handled differently in different design masters, then maybe that is data that needs to be authored for the original masters and then combined into lookups w/ deltas when the variable font is generated.

    Given that, I think what would make most sense is one set of global feature data for the variable font, supplemented by some design-master-specific feature data for certain positioning operations.
  • John Hudson
    John Hudson Posts: 3,268
    Hi Peter. If I understand correctly, I think I disagree. I'd say that having an outline source and having an OTL source, from which a VF corresponding to any part of the source design space can be generated, is directly parallel, and it makes sense to have both the design outlines and the OTL stored in the same source format if possible. And I say that as someone who for 20+ years has worked with outlines and OTL in separate tools and sources. VF development pushes us towards integrated sources in ways that static fonts did not.

    GSUB: we need to be able to see shaping during outline development, and then also to be able to easily specify feature variations within the same tools and also see those applied as appropriate in the tool UI.

    GPOS: unless making a monospace font, presume that all GPOS needs to be interpolated across the design space and between axes in exactly the same way that outline coordinates are; we need to be able to see this during multiple phases of the design development, not just as something added on at the end.
  • John: What I mainly had in mind is that you end up with duplication of data. E.g., you'd have the same fi ligature substitution data in each of the design-master fonts. Also, with that duplication, there's the possibility of those copies getting out of sync.

    The need you mentioned of seeing shaping effects while outlines are being designed makes complete sense to me. I'm just thinking that what's ultimately needed is an integrated process that doesn't have duplication of source data.
  • John Hudson
    John Hudson Posts: 3,268
    edited March 2020
    Ah. Yes, I agree the data shouldn't be duplicated, but it should be possible to fork it anywhere in the design space.

    What we have in recent font tools is pretty close to this: unified GSUB and separate GPOS (the latter compiled from in-tool anchor and kerning functionality). What we're lacking is ways to do and see complex contextual stuff, and convenient ways to do feature variations.
  • @Simon Cozens what's the latest on this? :)
  • John Hudson
    John Hudson Posts: 3,268
    edited November 2020
    Back in March, I wrote:
    And I say that as someone who for 20+ years has worked with outlines and OTL in separate tools and sources. VF development pushes us towards integrated sources in ways that static fonts did not.
    In the meantime, @Khaled Hosny and I worked out new build processes for Tiro projects that, among other things, enable us to make variable fonts that incorporate OTL from separate VOLT projects. So let me rephrase that earlier statement to say that VF development pushes us towards integrating sources in new and interesting ways.

  • @Simon Cozens what's the latest on this? :)
    It's going really well. I decided to split the problem into two parts, which has definitely paid off. The first part, the fontFeatures library, is a set of Python classes for representing OpenType features, lookups and rules, converting to and from AFDKO syntax, MTI syntax and OpenType binary (eventually - currently it reads but does not emit OpenType binary).

    The second part is the FEE language which is a development of the ideas above. It's at the point now where I've used it to successfully engineer three very tricky fonts. Here are a few samples of working FEE code.

    Anchor attachment is just so much easier than Adobe:
    Include anchors.fee; # Generated from source.
    
    Feature mark {
      Routine DoMarkBase {
        Attach &top &_top bases;
        Attach &centre &_centre bases;
        Attach &bottom &_bottom bases;
        Attach &comma &_comma bases;
      };
    };
    
    Feature curs {
      Routine CursiveAttachment { Attach &entry &exit cursive; } IgnoreMarks RightToLeft;
    };
    
    Include fee/pre-mkmk-repositioning.fee;
    
    Feature mkmk {
      Routine DoMarkAttachment {
        Attach &top &_top marks;
        Attach &bottom &_bottom marks;
        Attach &bottom.yb &_bottom.yb marks;
      };
    };
    
    

    Here's an example of using a predicate-based glyph class to help with a spacing issue; when you have a dotted beh at the beginning of a word where the beh glyph is narrower than the dots (such as پطرس), you need to add a bit more spacing to stop the dots from crashing into the previous word. So we find all initial glyphs which are narrower than the widest dot, and add a positioning rule for that case:
    DefineClass @narrow_inits = @inits and (width < xMax(tdb));
    
    Routine OpenSpaceAroundSmallDottedInits {
      Position { @narrow_inits  @nuktas  } /.*[mf]\d+$/;
    } UseMarkFilteringSet @nuktas;
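The predicate in `@narrow_inits` can be read as a filter over glyph metrics. The Python sketch below shows that reading; the metric values are invented for illustration and the function is a guess at the intent, not how FEE evaluates predicates internally.

```python
# Invented metrics: advance width and bounding-box xMax per glyph.
METRICS = {
    "tdb": {"width": 0, "xMax": 420},         # the widest dot glyph
    "beh.init": {"width": 380, "xMax": 360},
    "peh.init": {"width": 400, "xMax": 380},
    "seen.init": {"width": 900, "xMax": 880},
}
INITS = ["beh.init", "peh.init", "seen.init"]


def narrow_inits(inits, metrics, reference="tdb"):
    """Select initial glyphs whose advance width is less than the
    reference glyph's xMax, i.e. `@inits and (width < xMax(tdb))`."""
    limit = metrics[reference]["xMax"]
    return [glyph for glyph in inits if metrics[glyph]["width"] < limit]


print(narrow_inits(INITS, METRICS))
```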

    The plugin idea has worked particularly well. I love the fact that I can write reusable rules which interrogate the font for its glyph set, metrics and even contours, in order to enumerate possibilities for rules. For example, I'm working on a plugin right now for Nastaliq spacing, where it computes the total rise of a sequence by "binning" the glyphs by rise, and then works out the correct kern value for the space such that in e.g. "تک سجص", the kaf tucks neatly underneath the seen.

    The other part of the puzzle is a visual layout editor based on top of fontFeatures. This is still in the early stages but it's somewhat usable already. It also uses the same plugin idea. I recorded a brief video about how to use it; since then it also supports loading and editing UFO, Fontlab, and OTF files.
  • Michael Rafailyk
    Michael Rafailyk Posts: 151
    edited October 2022
    @Peter Constable
    If there is some detail of positioning that needs to be handled differently in different design masters, then maybe that is data that needs to be authored for the original masters and then combined into lookups w/ deltas when the variable font is generated.
    Just ran into the fact that in a variable font (with X-Height axis), GPOS lookups should be different for different masters with different x-height. It's not a problem when exporting static fonts, but what to do with a variable one?
    @Simon Cozens
    for more complex rules, you in theory would have to write a per-master feature file, but most font editors don't support doing that.
    So technically it is possible, but with Python and some additional command-line tools?
  • John Hudson
    John Hudson Posts: 3,268
    Just ran into the fact that in a variable font (with X-Height axis), GPOS lookups should be different for different masters with different x-height. It's not a problem when exporting static fonts, but what to do with a variable one?
    Do you have corresponding kerning and anchor data in each master? If so, a variable font should be no problem: intermediate instances in the design space will have interpolated GPOS. The challenge in current tools is if you wanted to explicitly vary the GPOS data within an area of the design space rather than letting it interpolate.
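As a simplified picture of "interpolated GPOS", the Python sketch below linearly interpolates kerning values between two masters at a normalized axis position. The master names and values are invented, missing pairs are assumed to be zero, and a real variable font stores default values plus deltas rather than interpolating at run time like this.

```python
def interpolate_kerning(master_a, master_b, t):
    """Linearly interpolate kerning pairs between two masters.

    `t` is the normalized position between master_a (0.0) and master_b (1.0).
    A pair missing from one master is treated as 0 there (an assumption).
    """
    pairs = master_a.keys() | master_b.keys()
    return {
        pair: (1 - t) * master_a.get(pair, 0) + t * master_b.get(pair, 0)
        for pair in pairs
    }


low_xheight = {("A", "V"): -80, ("T", "o"): -60}
high_xheight = {("A", "V"): -40, ("T", "o"): -90}

midpoint = interpolate_kerning(low_xheight, high_xheight, 0.5)
print(midpoint[("A", "V")], midpoint[("T", "o")])
```

Explicitly varying GPOS within one region of the design space, as opposed to letting it interpolate, would need extra masters or feature variations and is not covered by this sketch.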