GPOS lookup with both Latin and PUA codepoints?

wiresandbooks · March 16

I tried writing a mark feature which would adjust positions for "mark" glyphs that I coded in the Private Use Area (U+F102-U+F11D). This seems to be supported by some OpenType engines, but not others; in particular, TextEdit.app (and other apps using Apple's engine, like Pages) will use the mark feature I defined, but LibreOffice does not seem to want to do it, and neither did Word (when I booted up my Windows machine).

I was aware that segmentation would usually prevent a feature from working on text that mixed two different scripts, but I was surprised this would extend to mixing Latin and PUA codepoints.

I wonder if anyone has a workaround to suggest?

# Mark classes for the PUA trope glyphs.
# MC_trope_ltr_004 attaches at the upper base anchors, MC_trope_ltr_010 at the lower ones.
markClass [uniF102] <anchor -276 1348> @MC_trope_ltr_004;
markClass [uniF103 uniF108 uniF111 uniF118 uniF11C] <anchor 0 1348> @MC_trope_ltr_004;
markClass [uniF10A] <anchor 0 53> @MC_trope_ltr_010;
markClass [uniF11D] <anchor 0 28> @MC_trope_ltr_010;

# Mark-to-base attachment: each base gets one anchor per mark class.
lookup LtrMarkPos {
    pos base uni25CC
        <anchor 617 1330> mark @MC_trope_ltr_004
        <anchor 217 0> mark @MC_trope_ltr_010;
    pos base a
        <anchor 483 1176> mark @MC_trope_ltr_004
        <anchor 166 -102> mark @MC_trope_ltr_010;
    pos base Z
        <anchor 627 1635> mark @MC_trope_ltr_004
        <anchor 219 -113> mark @MC_trope_ltr_010;
} LtrMarkPos;

feature mark {
    lookup LtrMarkPos;
} mark;

The reason behind this madness:

As you are probably aware, Hebrew has cantillation/trope marks, which are used to indicate the melody used when reading Torah in a religious context.

There is a growing community which uses the trope marks for English text; see an example on page 5 of this PDF, or at this category page of opensiddur. However, when used for English, the trope marks are generally handwritten, or replicated with vector symbols, because they need to be horizontally flipped (to match the script direction) and attach properly to the corresponding letters.

I'm aware of one existing font that supports trope mark glyphs for English text, but it does so using the old pre-Unicode approach of mapping the glyphs into Latin codespace. I was hoping I could create something a little bit more proper, and in fact I started by creating mirrored glyphs of the trope marks, with an ltrm feature to map them to their mirrored forms when used in an LTR context. This actually works as desired in LuaLaTeX (since I have better control over the segmentation), but it didn't work in any other context, which is why I tried the Private Use Area approach.

Khaled Hosny · March 16

I’m afraid segmentation issues are out of the control of font developers, and with the lack of a well-defined standard for script segmentation, it is up to each implementation to do whatever it thinks right.

Khaled Hosny · March 16

This is most likely a text segmentation issue. Some applications will split PUA characters into their own script run, and lookups don’t usually get applied across run breaks (Apple’s CoreText might be an exception here, but I don’t know exactly how it does it). Some applications might treat PUA essentially as “Common” script (Firefox does that, for instance), so they will be grouped with neighboring characters in the same script run.

But I don’t think you need PUA at all. You can try replacing the Hebrew marks with mirrored glyphs in the “ltrm” or “ltra” feature, and then use them in LTR text. The layout engine would then apply these substitutions only when the text is LTR. This is likely to work better than PUA, though there is still the issue that some applications will insist on breaking the Hebrew marks into their own script run, because they are assigned the Hebrew script in Unicode, not “Inherited” as is usual for combining marks. I had this issue when using Hebrew marks with Arabic letters a few years ago and had to submit patches to several applications to fix it.

Edit: I missed the part about “ltrm” at the end of your post, so I’d be interested to know what applications it didn’t work in and how exactly it didn’t work.
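
As a rough illustration of the segmentation behavior described in this post, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the function names `toy_script` and `script_runs`, the two policy flags, and the hard-coded codepoint ranges are hypothetical simplifications, not real Unicode Script data and not how any particular layout engine is implemented. It only shows why a mark that ends up in a different script run from its base letter is out of reach of a mark lookup.

```python
# Toy script-run segmentation. The ranges and labels below are simplified
# assumptions for illustration, not real Unicode Script property data.

def toy_script(ch, pua_as_common=False, hebrew_marks_inherited=False):
    """Return a toy script label for a single character."""
    cp = ord(ch)
    if 0xE000 <= cp <= 0xF8FF:
        # BMP Private Use Area: either its own run or treated as neutral.
        return "Common" if pua_as_common else "Private"
    if 0x0591 <= cp <= 0x05AF:
        # Hebrew cantillation marks: Hebrew by default, or treated as
        # neutral combining marks under the hypothetical policy flag.
        return "Inherited" if hebrew_marks_inherited else "Hebrew"
    if 0x05B0 <= cp <= 0x05FF:
        return "Hebrew"
    if ch.isascii() and ch.isalpha():
        return "Latin"
    return "Common"


def script_runs(text, **policy):
    """Split text into (script, substring) runs under the given policy."""
    runs = []
    for ch in text:
        script = toy_script(ch, **policy)
        if runs and script in ("Common", "Inherited"):
            # Neutral characters join whatever run is currently open.
            runs[-1][1] += ch
        elif runs and runs[-1][0] == script:
            runs[-1][1] += ch
        elif runs and runs[-1][0] in ("Common", "Inherited"):
            # A leading neutral run takes the first real script it meets.
            runs[-1][0] = script
            runs[-1][1] += ch
        else:
            runs.append([script, ch])
    return [(script, chunk) for script, chunk in runs]


# A Latin base, a PUA mark (U+F102), another Latin base.
pua_text = "a\uF102Z"
print(len(script_runs(pua_text)))                      # 3 runs: mark is isolated
print(len(script_runs(pua_text, pua_as_common=True)))  # 1 run: mark stays with "a"

# The same bases with a real Hebrew cantillation mark (U+0596).
heb_text = "a\u0596Z"
print(len(script_runs(heb_text)))                               # 3 runs
print(len(script_runs(heb_text, hebrew_marks_inherited=True)))  # 1 run
```

Under the first policy the mark lands in a run of its own, so a shaper working run by run never sees the base and the mark together. Under the second policy they share a run, which is the precondition for the mark feature to attach them.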

wiresandbooks · March 16

Sorry, I was trying to be concise, and ended up being imprecise. I mean that the ltrm feature worked correctly to swap out the marks for the mirrored glyphs, but the mark feature to properly position the glyphs (using the original Hebrew codepoints) didn't work in anything I tried except LuaLaTeX (and FontGoggles).

I've pushed the source, built fonts, and a sample text to my git repository here. I'll see about making a proper list of what works and what doesn't in various apps.

Khaled Hosny · March 16

The font and the sample text show mirrored and correctly positioned marks for me in Pango, Firefox, and Chrome:

Image: https://us.v-cdn.net/5019405/uploads/editor/cr/o5hk7lanuumu.png

The positioning is broken in LibreOffice, which is surprising, as I expected it to work. I fixed a similar issue with Hebrew marks and Arabic letters a while ago, and that still works. Maybe it does not work with Latin letters because LibreOffice has this Western/CTL/Asian text split, and Latin is Western while Hebrew is CTL in this (rather misguided) classification. Might be worth reporting an issue.

The mirroring works with CoreText but not the positioning. Neither works with DirectWrite.

wiresandbooks · March 16

I put together a collection of samples. (This also has a zipfile with the fonts, which is handy because apparently my git repository host is having downtime today.)

As you can see, CoreText will correctly mirror the glyphs, and it will also correctly position glyphs in the PUA codepoints. On the other hand, Chrome will correctly mirror the glyphs and correctly position the Hebrew codepoints, but not the PUA codepoints. LibreOffice won't correctly position either set, and MS Word (which I presume uses DirectWrite) not only doesn't position the glyphs, it doesn't mirror the glyphs, and it even drops the PUA glyphs from its PDF export.

So my original question stands, I think: Is there any technique I should try using to achieve better compatibility with different platforms, when using a mark feature with Latin base glyphs but Hebrew (or PUA) mark glyphs?

wiresandbooks · March 16

Khaled Hosny said:

I’m afraid segmentation issues are out of the control of font developers, and with the lack of a well-defined standard for script segmentation, it is up to each implementation to do whatever it thinks right.

I was afraid that'd be where we end up on this. Ah well.

Khaled Hosny · March 16

I’d still report issues, though. LibreOffice might fix it. CoreText issues I report eventually get fixed as well. I have no experience with reporting DirectWrite issues, though.
