In Microsoft's official OpenType spec page on GPOS Pair Positioning, PosFormat 1, there is a small side remark that is not explained any further:
A PairPos subtable also defines an offset to a Coverage table
(coverageOffset) that lists the indices of the first glyphs in each
pair. More than one pair can have the same first glyph, but the Coverage
table will list that glyph only once.
(my emph.)
The problem lies here:
When investigating a font such as Unicode.org's TestGPOSTwo.otf (https://github.com/unicode-org/text-rendering-tests/tree/master/fonts), you can write out its PairPos data like this:
subtable 1:
posFormat: 1
coverage: uni25EF
valueFormat1: X_ADVANCE
valueFormat2: (None)
pairSetCount: 2
pairValueCount: 1
uni25EF -> sun x1=-800
pairValueCount: 1
...
Now, which glyph belongs at the '...'? pairSetCount is 2, and
A PairSet table enumerates all the glyph pairs that begin with a covered glyph.
so this must be a case where uni25EF would have appeared twice in the coverage list and was therefore listed only once.
For other fonts, the value of pairSetCount is equal to the length of Coverage; each covered glyph only gets used once as the "first glyph". The specification acknowledges this may not be the case but does not offer an alternative.
A number of other tools and libraries agree with my interpretation so far (opentype.js, otl_dump.php). The otherwise reliable DTL OTMaster Light reports the second occurrence as '.notdef'.
I am unable to write a valid .fea file that exactly mimics TestGPOSTwo's behaviour. makeotf complains that it sees two identical kerning pairs and only uses the larger value (so it must be a common mistake), and my Feature-Fu is not strong enough to fool it otherwise.
It is very likely that TestGPOSTwo.otf was artificially constructed, but the accompanying document states:
The second subtable has two PairSets, both kerning
◯ U+25EF LARGE CIRCLE and ☼ U+263C WHITE SUN WITH RAYS. The first
PairSet applies kerning so that the two symbols will exactly
overlap. If the second PairSet was applied, it would add spacing
to move the two symbols away from each other. But a correct text
rendering engine should walk the PairSets in the order given by
the font, and stop processing after finding the first
match.
which indicates that the first glyph ought to be valid for both pairs, even though a bad renderer might use the second value instead of the first.
Any insights on this?
Comments
The question is not about the OT spec, since what is shown is not a direct reflection of the binary format, which is what the spec documents. Rather, you've shown a higher-level interpretation of the format, which approximates the actual binary format with certain abstractions.
Now, guessing at how this higher-level interpretation is formatted, I would expect what is presented at the '...' would be the glyph ID for the second glyph in the first glyph pair.
Again, guessing at this higher-level format, it appears that the coverage table lists exactly one glyph, uni25EF. But if PairSetCount is 2, there need to be at least two glyphs in the coverage table for the font to be valid: for each PairSet table, there needs to be a separately listed glyph in the coverage table. If there is positioning data for two glyph pairs and both pairs have uni25EF as the first glyph, then there must be one PairSet table, not two, and that one PairSet table must include two PairValueRecords.
Something seems amiss. It's not clear to me if the font itself is invalid (I'd guess not), or (more likely) the dump you're showing is not an accurate reflection of the font.
Just so: that's how the format is spec'd. To clarify: the sequence of PairSet tables corresponds respectively to the sequence of glyphs in the coverage table, and each PairSet table provides positioning data for all pairs that begin with a given glyph in the coverage table (hence the sequence of glyphs in the coverage table must be at least as long as the array of PairSet table offsets).
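A minimal sketch of that lookup, assuming a simplified in-memory model rather than the binary format: `coverage` is the ordered list of first glyphs and `pair_sets[i]` holds the records for `coverage[i]`. The function name `lookup_pair` and the tuple layout are made up for illustration. The -800 comes from the dump in the question and the 200 from the last two bytes (00 C8) of the raw data quoted in this thread.

```python
from typing import List, Optional, Tuple

PairRecord = Tuple[str, int]  # (second glyph name, x-advance adjustment)


def lookup_pair(coverage: List[str],
                pair_sets: List[List[PairRecord]],
                first: str,
                second: str) -> Optional[int]:
    """Return the adjustment for (first, second), or None if the pair is not kerned."""
    if first not in coverage:
        return None
    index = coverage.index(first)       # the coverage index selects the PairSet
    if index >= len(pair_sets):
        return None
    for glyph, x_advance in pair_sets[index]:
        if glyph == second:
            return x_advance            # stop at the first matching record
    return None


# Layout as shown in the dump: one covered glyph, but two PairSets.
coverage = ["uni25EF"]
pair_sets = [[("sun", -800)], [("sun", 200)]]

result = lookup_pair(coverage, pair_sets, "uni25EF", "sun")

# PairSet indices that no coverage index can ever select.
unreachable = list(range(len(coverage), len(pair_sets)))
```

With only one covered glyph, coverage index 1 is never produced, so under this reading the second PairSet cannot be reached through the coverage table at all.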
Eh?? If you're saying that the spec allows for a covered glyph to be used twice—i.e., to pertain to two different PairSet tables—then that is incorrect: the spec does not allow for that.
00 01 00 26 00 04 00 00 00 02 00 0E 00 14 00 01 00 02 FC E0 00 01 00 02 00 C8
(where the first 00 01 is the posFormat value). TTX shows it as
(one you may be more familiar with)
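For reference, the quoted bytes can also be decoded by hand with a few lines of Python. This is only a sketch: it assumes the 26 bytes are the start of the PairPos Format 1 subtable, it handles only the value formats seen here (valueFormat1 = 0x0004, i.e. X_ADVANCE, and valueFormat2 = 0), and the function name `parse_pairpos1` is hypothetical.

```python
import struct

# The bytes quoted above, split by header / first PairSet / second PairSet.
RAW = bytes.fromhex(
    "00 01 00 26 00 04 00 00 00 02 00 0E 00 14"
    " 00 01 00 02 FC E0"
    " 00 01 00 02 00 C8"
)


def parse_pairpos1(data: bytes) -> dict:
    """Decode a PairPos Format 1 subtable whose only value is X_ADVANCE on glyph 1."""
    (pos_format, coverage_offset,
     value_format1, value_format2,
     pair_set_count) = struct.unpack_from(">5H", data, 0)
    if pos_format != 1:
        raise ValueError("not a PairPos Format 1 subtable")
    if (value_format1, value_format2) != (0x0004, 0x0000):
        raise NotImplementedError("sketch only handles X_ADVANCE / none")

    # Array of offsets to PairSet tables, relative to the subtable start.
    offsets = struct.unpack_from(">%dH" % pair_set_count, data, 10)

    pair_sets = []
    for offset in offsets:
        (pair_value_count,) = struct.unpack_from(">H", data, offset)
        records = []
        for i in range(pair_value_count):
            # Each record: uint16 secondGlyph, int16 xAdvance (4 bytes in total).
            second_glyph, x_advance = struct.unpack_from(">Hh", data, offset + 2 + 4 * i)
            records.append((second_glyph, x_advance))
        pair_sets.append(records)

    return {
        "coverage_offset": coverage_offset,
        "pair_set_count": pair_set_count,
        "pair_sets": pair_sets,
    }


parsed = parse_pairpos1(RAW)
print(parsed)
```

Both PairSets contain a single record for second glyph ID 2, with advances of -800 and 200. coverageOffset is 0x26 = 38, which lies past the 26 bytes quoted, so the Coverage table itself is not part of this excerpt.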
The font itself must be considered invalid. Only now did I find this (apologies, again), the exact phrase I was overlooking earlier on:
and the suspect behavior tested in https://github.com/unicode-org/text-rendering-tests under GPOS-2 is not due to a specific renderer fault (which is what the test is intended to catch) but to the scenario itself: a font where length(Coverage) != PairSetCount is simply a bad font.
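The condition is easy to test for once the coverage list and pairSetCount have been extracted. A small sketch, assuming both are already available as plain Python values; the function name `pairpos1_problems` and the message wording are hypothetical, not taken from any existing tool.

```python
from typing import List


def pairpos1_problems(coverage: List[str], pair_set_count: int) -> List[str]:
    """Return human-readable problems for a PairPos Format 1 subtable, if any."""
    problems = []
    if len(set(coverage)) != len(coverage):
        problems.append("Coverage lists a glyph more than once")
    if len(coverage) != pair_set_count:
        problems.append(
            "Coverage has %d glyph(s) but pairSetCount is %d"
            % (len(coverage), pair_set_count)
        )
    return problems


# Values from the dump in the question: one covered glyph, pairSetCount 2.
print(pairpos1_problems(["uni25EF"], 2))
```

For the values in the dump this reports the count mismatch; a subtable with one covered glyph and one PairSet would pass.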
I am going to ignore the results from this test, then. I should probably post it as an issue on the Unicode GitHub page with a request to withdraw this particular test.
ttx indeed recompiles it back to the exact same font, byte for byte. Does it show errors or warnings for similar mis-counts in the other tables? In that case this one got overlooked.