Unexplained phrase in GPOS explanation

Theunis de Jong
edited April 2020 in Font Technology
In Microsoft's official OpenType spec page on GPOS Pair Positioning, PosFormat 1, there is a small side-remark which is not explained any further:

A PairPos subtable also defines an offset to a Coverage table (coverageOffset) that lists the indices of the first glyphs in each pair. More than one pair can have the same first glyph, but the Coverage table will list that glyph only once.
(my emph.)
The problem lies here:
When investigating a font, you can write out the PairPos code for a font such as Unicode.org's TestGPOSTwo.otf (https://github.com/unicode-org/text-rendering-tests/tree/master/fonts) like this:

  subtable 1:
    posFormat: 1
    coverage: uni25EF
    valueFormat1: X_ADVANCE
    valueFormat2: (None)
    pairSetCount: 2
      pairValueCount: 1
        uni25EF -> sun x1=-800 
      pairValueCount: 1
...
now what character comes at the '...'? PairSetCount is 2; and
A PairSet table enumerates all the glyph pairs that begin with a covered glyph.

so this must indeed be a case where originally uni25EF appeared twice in the coverage list – and thus included only once.

For other fonts, the value of pairSetCount is equal to the length of Coverage; each covered glyph only gets used once as the "first glyph". The specification acknowledges this may not be the case but does not offer an alternative.

A number of other tools and libraries agree with my interpretation so far (opentype.js, otl_dump.php). The otherwise reliable DTL OTMaster Light reports the second occurrence as '.notdef'.

I am unable to write a valid .fea file that exactly mimics TestGPOSTwo's behaviour. makeotf complains that it sees two identical kerning pairs and only uses the larger value – it must be a common mistake – and my Feature-Fu is not strong enough to fool it otherwise.

It is very likely TestGPOSTwo.otf was artificially constructed, but the accompanying document states
The second subtable has two PairSets, both kerning ◯ U+25EF LARGE CIRCLE and ☼ U+263C WHITE SUN WITH RAYS. The first PairSet applies kerning so that the two symbols will exactly overlap. If the second PairSet was applied, it would add spacing to move the two symbols away from each other. But a correct text rendering engine should walk the PairSets in the order given by the font, and stop processing after finding the first match.
which indicates that first character ought to be valid for both pairs, even though a bad renderer might use the second value instead of the first.

Any insights on this?

Comments

  • It's not clear to me what you're asking. The subject seems to suggest you're looking for clarification on the OT spec. But your first specific question is in this context:

    When investigating a font, you can write out the PairPos code for a font such as Unicode.org's TestGPOSTwo.otf (https://github.com/unicode-org/text-rendering-tests/tree/master/fonts) like this:

      subtable 1:
        posFormat: 1
        coverage: uni25EF
        valueFormat1: X_ADVANCE
        valueFormat2: (None)
        pairSetCount: 2
          pairValueCount: 1
            uni25EF -> sun x1=-800 
          pairValueCount: 1
    ...
    now what character comes at the '...'?
    The question is not about the OT spec, since what is shown is not a direct reflection of the binary format, which is what the spec documents. Rather, you've shown a higher-level interpretation of the format, which approximates the actual binary format with certain abstractions.

    Now, guessing at how this higher-level interpretation is formatted, I would expect what is presented at the '...' would be the glyph ID for the second glyph in the first glyph pair.

    PairSetCount is 2;
    Again, guessing at this higher-level format, it appears that the coverage table lists exactly one glyph, uni25EF. But if PairSetCount is 2, there need to be at least two glyphs in the coverage table for the font to be valid: for each PairSet table, there needs to be a separately-listed glyph in the coverage table. If there is positioning data for two glyph pairs and both pairs have uni25EF as the first glyph, then there must be one PairSet table, not two, and that one PairSet table must include two PairValueRecords.

    and
    A PairSet table enumerates all the glyph pairs that begin with a covered glyph.

    so this must indeed be a case where originally uni25EF appeared twice in the coverage list – and thus included only once.

    Something seems amiss. It's not clear to me if the font itself is invalid (I'd guess not), or (more likely) the dump you're showing is not an accurate reflection of the font.

    For other fonts, the value of pairSetCount is equal to the length of Coverage; each covered glyph only gets used once as the "first glyph".

    Just so: that's how the format is spec'd. To clarify: the sequence of PairSet tables corresponds respectively to the sequence of glyphs in the coverage table, and each PairSet table provides positioning data for all pairs that begin with a given glyph in the coverage table (hence the sequence of glyphs in the coverage table must be at least as long as the array of PairSet table offsets).

    The specification acknowledges this may not be the case but does not offer an alternative.

    Eh?? If you're saying that the spec allows for a covered glyph to be used twice—i.e., to pertain to two different PairSet tables—then that is incorrect: the spec does not allow for that.
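
The correspondence described in this comment (Coverage index selects the PairSet) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `lookup_pair_adjustment`, the use of plain Python lists for the Coverage table and PairSets, and the linear search are all hypothetical simplifications, not how any particular renderer is implemented. The data mirrors the dump quoted above.

```python
from typing import List, Optional, Tuple

# A PairSet is modelled as a list of (second glyph, XAdvance) records.
PairSet = List[Tuple[str, int]]


def lookup_pair_adjustment(
    coverage: List[str], pair_sets: List[PairSet], first: str, second: str
) -> Optional[int]:
    """Return the XAdvance adjustment for (first, second), or None if no pair matches."""
    if first not in coverage:
        return None
    # The Coverage index of the first glyph is the index of its PairSet.
    coverage_index = coverage.index(first)
    if coverage_index >= len(pair_sets):
        return None
    for second_glyph, x_advance in pair_sets[coverage_index]:
        if second_glyph == second:
            return x_advance
    return None


# Data as shown in the dump: one covered glyph, but two PairSets.
coverage = ["uni25EF"]
pair_sets = [[("sun", -800)], [("sun", 200)]]

adjustment = lookup_pair_adjustment(coverage, pair_sets, "uni25EF", "sun")

# Any PairSet whose index has no Coverage entry can never be selected.
unreachable = [i for i in range(len(pair_sets)) if i >= len(coverage)]

print(adjustment, unreachable)
```

With a lookup of this shape, the second PairSet is never consulted, because no Coverage index points at it.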

  • Theunis de Jong
    edited April 2020
    Apologies for providing an interpreted version of the original binary information. Here it is:
    00 01 00 26 00 04 00 00 00 02 00 0E 00 14 00 01 00 02 FC E0 00 01 00 02 00 C8
    (where the first 00 01 is the posFormat value). TTX shows it as

    <PairPos index="1" Format="1">
      <Coverage Format="1">
        <Glyph value="uni25EF"/>
      </Coverage>
      <ValueFormat1 value="4"/>
      <ValueFormat2 value="0"/>
      <!-- PairSetCount=2 -->
      <PairSet index="0">
        <!-- PairValueCount=1 -->
        <PairValueRecord index="0">
          <SecondGlyph value="sun"/>
          <Value1 XAdvance="-800"/>
        </PairValueRecord>
      </PairSet>
      <PairSet index="1">
        <!-- PairValueCount=1 -->
        <PairValueRecord index="0">
          <SecondGlyph value="sun"/>
          <Value1 XAdvance="200"/>
        </PairValueRecord>
      </PairSet>
    </PairPos>
    (one you may be more familiar with :) )

    It's not clear to me if the font itself is invalid (I'd guess not) ...

    The font itself must be considered invalid. Only now I found this (apologies, again), the exact phrase I was overlooking earlier on:
    The PairSet array contains one offset for each glyph listed in the Coverage table and uses the same order as the Coverage Index.

    and the suspect behavior tested in https://github.com/unicode-org/text-rendering-tests under GPOS-2 is not due to a specific renderer fault (which is what the test is intended to detect), but arises because the scenario it offers – a font where length(Coverage) != PairSetCount – is itself invalid.

    I am going to ignore the results from this test, then. I should probably post it as an issue on the Unicode Github page with a request to withdraw this particular test.
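
For readers who want to check the bytes quoted in this comment by hand, here is a minimal Python sketch that decodes them with the standard `struct` module. It assumes exactly what the dump shows: posFormat 1, valueFormat1 = 0x0004 (X_ADVANCE only) and valueFormat2 = 0, so each PairValueRecord is a uint16 second glyph ID followed by one int16 advance. The function name `parse_pairpos_format1` is hypothetical, and the Coverage glyph count of 1 is taken from the TTX dump, since the Coverage table at offset 0x26 lies outside the quoted bytes.

```python
import struct

# The 26 bytes quoted above, grouped as header, PairSet 0, PairSet 1.
SUBTABLE = bytes.fromhex(
    "0001 0026 0004 0000 0002 000E 0014"
    "0001 0002 FCE0"
    "0001 0002 00C8"
)


def parse_pairpos_format1(data: bytes):
    """Decode a PairPos format 1 subtable whose value records hold only XAdvance."""
    pos_format, coverage_offset, vf1, vf2, pair_set_count = struct.unpack_from(
        ">5H", data, 0
    )
    if (pos_format, vf1, vf2) != (1, 0x0004, 0):
        raise ValueError("this sketch only handles the layout shown in the dump")
    # The PairSet offsets follow the 10-byte header, one uint16 per PairSet.
    offsets = struct.unpack_from(f">{pair_set_count}H", data, 10)
    pair_sets = []
    for offset in offsets:
        (pair_value_count,) = struct.unpack_from(">H", data, offset)
        # Each record: uint16 second glyph ID, int16 XAdvance (4 bytes in total).
        records = [
            struct.unpack_from(">Hh", data, offset + 2 + 4 * i)
            for i in range(pair_value_count)
        ]
        pair_sets.append(records)
    return coverage_offset, pair_sets


coverage_offset, pair_sets = parse_pairpos_format1(SUBTABLE)

# Assumption: one covered glyph (uni25EF), as reported by TTX.
coverage_glyph_count = 1
is_consistent = len(pair_sets) == coverage_glyph_count

print(coverage_offset, pair_sets, is_consistent)
```

The two PairSets decode to the same second glyph ID with advances of -800 and 200, matching the TTX output, and the count check fails because there are two PairSets for a single covered glyph.
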
  • Yes, that’s an invalid font and you should probably raise an issue with Unicode. Additionally, I’d argue that there is a bug in fontTools if you can generate that font in the first place. If you can recompile that ttx dump back into a font without any errors or warnings, I’d suggest raising an issue with fontTools too.
  • I've posted the issue on Unicode's Github for this.

    ttx indeed recompiles it back to the exact same font, byte for byte. Does it show errors or warnings for similar mis-counts in the other tables? In that case this one got overlooked.
  • Perhaps the font was crafted explicitly to be invalid? (It is a test case, after all.)