Can the zero width joiner be omitted from GSUB rules?

It looks like Chrome ignores zero width joiners (u200D) in GSUB rules, while Edge needs them to be present.

If the sequence is A u200D B then the following works in Chrome:
sub A' B by C;
But not in Edge. It needs the actual sequence (which makes sense):
sub A' u200D B by C;
I was wondering if anybody ran into this, or if there's documentation for it? I've been on a wild goose chase to debug this for almost the entire day, so I might be missing something obvious.

Comments

  • John Hudson
    edited September 2019
    Quite a few years ago, Unicode specified that the ZWJ character could be used to signal a kind of intent-to-ligate in Latin and other non-complex scripts. The trouble was, of course, that inserting ZWJ in a sequence such as f+ZWJ+i would break the simple f+i liga feature in existing OpenType implementations. As far as I know, this was never formally resolved at the OpenType Layout level (unsurprisingly, given that there's no implementation spec for OTL), and whether lookups need to include control character glyphs or not has become quite inconsistent across different scripts and shaping engines.

    It's possible that Chrome's shaping engine is using the ZWJ as a trigger for the substitution, following Unicode's intent-to-ligate concept, rather than ignoring it per se. In what feature are you testing the substitutions? Do you get different results if you put the lookups in a different feature?

    Paging @Behdad Esfahbod
  • I'm testing this for the ccmp feature. I'm testing a sequence that is supposed to be chained together with ZWJs, but I forgot one in one of the substitutions. This was no problem in Chrome: the sequence still worked. In Edge, it broke. As I was primarily testing in Chrome, it took me a long time to check all possible causes before I ended up on the missing ZWJ.

    When I removed all ZWJs, which should break all these sequences, I was surprised to see it all still work in Chrome.

    Thanks John for the thorough answer, this helped a lot!
  • HarfBuzz skips ZWJ and a few other control code points (like soft hyphen) when applying GSUB lookups, to better match the behavior Unicode expects. AFAIK no other commonly used layout engine does that.
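
For anyone who finds this thread later, the difference described above can be sketched with a rough toy model in Python. This is not HarfBuzz's or Edge's actual code. The function `apply_rule` and its parameters are made-up names for illustration, and the "step over a ZWJ unless the rule asks for one" logic is only an assumption based on what was reported in this thread. The rule modelled is the contextual one from the original post: `A` becomes `C` when followed by the listed context.

```python
ZWJ = "u200D"  # glyph name for ZERO WIDTH JOINER, as written in the thread


def apply_rule(glyphs, target, following, replacement, skip_zwj):
    """Toy model of `sub target' following... by replacement;`.

    Hypothetical helper, not a real shaping-engine API.
    skip_zwj=False: the glyph run must match the rule exactly.
    skip_zwj=True:  a ZWJ in the run that the rule does not list is
                    stepped over (assumed lenient behavior).
    """
    out = list(glyphs)
    for i, glyph in enumerate(glyphs):
        if glyph != target:
            continue
        pos = i + 1
        matched = True
        for wanted in following:
            # Lenient mode only: step over ZWJs the rule did not ask for.
            while (skip_zwj and wanted != ZWJ
                   and pos < len(glyphs) and glyphs[pos] == ZWJ):
                pos += 1
            if pos >= len(glyphs) or glyphs[pos] != wanted:
                matched = False
                break
            pos += 1
        if matched:
            out[i] = replacement  # only the marked glyph is replaced
    return out


run = ["A", ZWJ, "B"]

# Rule without the ZWJ: sub A' B by C;
print(apply_rule(run, "A", ["B"], "C", skip_zwj=False))  # ['A', 'u200D', 'B']
print(apply_rule(run, "A", ["B"], "C", skip_zwj=True))   # ['C', 'u200D', 'B']

# Rule with the ZWJ: sub A' u200D B by C;
print(apply_rule(run, "A", [ZWJ, "B"], "C", skip_zwj=False))  # ['C', 'u200D', 'B']
print(apply_rule(run, "A", [ZWJ, "B"], "C", skip_zwj=True))   # ['C', 'u200D', 'B']
```

In this model the rule that spells out the ZWJ matches under both strategies, which fits the observation that the explicit `sub A' u200D B by C;` form worked in both browsers. Writing the ZWJ into the rule looks like the safer choice if the font has to work across engines.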