IPA ligature requirements

A user who works with IPA symbols asked me the following:
they need to type

t U035C s

and get

U02A6 U035C

where U02A6 is the ts ligature.

Of course, if one types t s U035C, the solution is trivial. But the input has to be in this order: t U035C s
because a lot of linguistic material has already been typed that way. Users were "trained" to type in this order because older fonts placed U035C under the t so that it extended to the right, and the s fell into the right place. But those older fonts did not provide a ts ligature (U02A6).

Comments

  • I don’t understand what the problem actually is. Missing glyphs in some fonts? An old composing hack that doesn’t work as expected with more recent fonts?

    to type
    t U035C s
    and get
    U02A6 U035C
    where U02A6 is the ts ligature.

    I don’t understand this, sorry. It seems completely illogical.


  • We have a font that has ALL the characters: t, s, U035C and U02A6, where the latter is the ts ligature. So no glyph is missing. The font is NewComputerModern, which is freely available and mostly used with TeX.

    Linguists type t U035C s (in this order) and get ts with an undertie in fonts such as CharisSIL or DoulosSIL. This functionality has been added to NewComputerModern (although it is not released yet). BUT the linguists ask to get U02A6 with an undertie rather than ts with an undertie.

    Is it clear now?




  • How does the input text differentiate between a ts that should be a ligature (U02A6) and one that shouldn't? Is a ligature feature applied selectively?

    In principle, the OpenType lookup flag "IgnoreMarks" should do what you want if the U035C glyph is classified as a mark.

    lookup TS_LIGATURE {
      lookupflag IgnoreMarks;
      sub t s by uni02A6;
    } TS_LIGATURE;
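    IgnoreMarks only skips glyphs that GDEF classifies as marks, so the lookup above also depends on the glyph classes. A minimal sketch of the surrounding declarations, assuming the production glyph names t, s, uni02A6 and uni035C and registering the lookup under dlig as discussed below; in a real font GDEF would list every glyph (FontForge usually sets these classes per glyph).

    ```fea
    languagesystem DFLT dflt;
    languagesystem latn dflt;

    # GDEF glyph classes, in the order base, ligature, mark, component.
    # IgnoreMarks skips only glyphs that appear in the mark class.
    table GDEF {
      GlyphClassDef [t s], [uni02A6], [uni035C], ;
    } GDEF;

    # Standalone lookup: it runs only where a feature references it.
    lookup TS_LIGATURE {
      lookupflag IgnoreMarks;
      sub t s by uni02A6;
    } TS_LIGATURE;

    # Register it in an optional feature so users have to opt in.
    feature dlig {
      lookup TS_LIGATURE;
    } dlig;
    ```

    With this active, every t s sequence, with or without the tie, becomes U+02A6, which is the concern raised further down the thread.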
  • Antonis Tsolomitis
    edited September 2022
    Ah, thank you. This is what I was missing, because in FontForge this is metadata for the whole table.
    Which table would the ts ligature naturally belong to?
    I think this is a rare ligature, so dlig looks like a reasonable choice.
    IgnoreMarks affects the whole dlig table, since FontForge treats this option as table metadata. Enabling it is no problem, but your code above (with the lookupflag) seems to affect only the ts ligature.
    Of course I can make a new table, say ipal, just for the IPA ligatures. I wonder though if this is correct practice.

  • Of course I can make a new table, say ipal, just for the IPA ligatures. I wonder though if this is correct practice.
    It is, although your terminology is wrong: what you are doing is creating a new lookup within the dlig feature.
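    A sketch of what "a new lookup within the dlig feature" looks like in feature-file syntax: one feature, two lookups, each with its own flag. The lookup names and the c_t glyph are hypothetical.

    ```fea
    languagesystem DFLT dflt;
    languagesystem latn dflt;

    feature dlig {
      # Lookup 1: IPA ligature; this flag belongs to this lookup only.
      lookup IPA_LIGATURES {
        lookupflag IgnoreMarks;
        sub t s by uni02A6;
      } IPA_LIGATURES;

      # Lookup 2: other discretionary ligatures (hypothetical glyph c_t).
      # Reset the flag explicitly, so marks block matching here.
      lookup OTHER_DLIG {
        lookupflag 0;
        sub c t by c_t;
      } OTHER_DLIG;
    } dlig;
    ```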
  • You should not encode text with the sequence < 02A6, 035C > but rather should use < 02A6, 032E >. The intended display for 035C is to overhang the previous and following base characters.
  • You should not encode text with the sequence < 02A6, 035C > but rather should use < 02A6, 032E >. The intended display for 035C is to overhang the previous and following base characters.
    But linguists type neither < 02A6, 035C > nor < 02A6, 032E >, so I am not encoding those. Linguists type < t 035C s >, and that is the sequence I need to handle.





  • You should not encode text with the sequence < 02A6, 035C > but rather should use < 02A6, 032E >. The intended display for 035C is to overhang the previous and following base characters.
    The original post describes the input
    t U+035C s
    but wants the t+s letter sequence to be ligated. So the text is not being encoded as <02A6, 035C>: that’s just a possible glyph outcome:
    /uni02A6 /uni035C
    It could also be something like 
    /t_s /uni032E.wide





  • André G. Isaak
    edited September 2022
    There's something a bit peculiar about this request: the ligation in U02A6 (to the extent that it is used at all) takes the place of the undertie, so using both seems redundant. Most linguists would expect either an unligated ts with a tie below or would use a superscript s to make it clear that an affricate is being represented.

    If someone really wants this, they can manually kern the tie under the ligature, but I don't think a font should reasonably be expected to handle such nonstandard cases.
  • Denis Moyogo Jacquerye
    edited September 2022
    In 1989, the ligatures ʦ, ʣ, ʧ, ʤ, etc. were withdrawn from the IPA because the alternatives with an under tie (t͜s, d͜z, t͜ʃ, d͜ʒ, etc.) or an over tie (t͡s, d͡z, t͡ʃ, d͡ʒ) were preferred.
    Combining the ligature with an under tie or over tie is redundant and non-standard IPA, even historically. At least a few linguists in Poland seem to use the non-standard IPA ʦ̑, etc., similar to what was mentioned.

    Nevertheless, you need two lookups:
    • One ligature substitution lookup, not in any feature, that ignores marks, with a subtable that replaces the source glyphs "t s" by the ligature glyph "uni0361".
    • A second contextual substitution lookup, in the "dlig" feature in "DFLT{dflt} latn{dflt}", that applies the first ligature sub lookup on the context "t uni0361 s". I’d do that one with the By Glyphs and Complex dialog.
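    The two lookups described above might look like this in AFDKO feature-file syntax instead of FontForge's dialogs. A sketch only: the lookup name is made up, glyph names are assumed to be production names, and a class covers both ties because the original request types U+035C while this reply uses uni0361. It also assumes GDEF classifies both tie glyphs as marks.

    ```fea
    languagesystem DFLT dflt;
    languagesystem latn dflt;

    # Tie marks: U+035C (below, as in the original request) and U+0361 (above).
    @TIE = [uni035C uni0361];

    # Lookup 1: not referenced by any feature, so it never runs on its own.
    # IgnoreMarks lets it see "t s" across the tie.
    lookup TS_LIG_SKIP_MARKS {
      lookupflag IgnoreMarks;
      sub t s by uni02A6;
    } TS_LIG_SKIP_MARKS;

    # Lookup 2: chaining contextual substitution in dlig. It has no
    # IgnoreMarks flag, so it matches the literal sequence t + tie + s
    # and calls lookup 1 at the t.
    feature dlig {
      sub t' lookup TS_LIG_SKIP_MARKS @TIE' s';
    } dlig;
    ```

    A plain "ts" without a tie is not matched by lookup 2, so it stays unligated. In HarfBuzz the skipped tie should remain in the buffer after the ligature, giving /uni02A6 followed by the tie glyph.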
  • If this usage does occur in some places as Denis indicates, and you want to support it, then I would suggest the simplest approach would be to use a precomposed t_uni035C_s ligature which is handled by the 'ccmp' feature.

    Having a t s -> t_s rule which skips marks is problematic since you only want this sort of ligation to occur in contexts where ts represents an affricate, and that's not something that OT features can determine.
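    The precomposed approach can be sketched in a few lines, assuming a designer-drawn glyph named t_uni035C_s as in the reply above. Because the rule has no IgnoreMarks flag, it fires only on the literal sequence t + U+035C + s, which sidesteps the problem of deciding when a bare ts is an affricate.

    ```fea
    languagesystem DFLT dflt;
    languagesystem latn dflt;

    # ccmp is applied by default, so no user action is needed.
    # No lookupflag: a plain "ts" without the tie is left alone.
    feature ccmp {
      sub t uni035C s by t_uni035C_s;
    } ccmp;
    ```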
  • OK. Thank you all for all this wealth of information. I will talk to the users who asked this (I am not a linguist) and decide what is best.
  • Denis Moyogo Jacquerye
    edited September 2022
    From what I understand, this is an unfortunate interpretation of IPA notation. For example, http://www.grzegorj.ugu.pl/gram/en/ipa.html wrongly assumes ʦ̑, ʣ̑, ʧ̑, ʤ̑ are IPA. The actual IPA would be t͡s, d͡z, t͡ʃ, d͡ʒ, as our linguists would input.
    The page cites http://hctv.humnet.ucla.edu/departments/linguistics/VowelsandConsonants/appendix/languages/polish/Polish.html as a reference but that one uses images. In the images, things like t̠͡ṣ (or ṯ͡ṣ) are shown (dot below is not standard IPA but that’s acknowledged). Obviously the minus (or macron) below the t and the dot below the s are problematic if one wants to use ʦ̑ with them.

    For the lookups, the t s -> uni02A6 substitution (not uni0361 as in my previous comment) that ignores marks can only be applied when the context is t uni0361 s, so it doesn’t happen for every "t s". Obviously, you’d need something more complex to handle additional diacritics. There are several issues here.

    The confusion probably comes from earlier IPA usage, current non-standard usage of those ligatures, and the fact that U+0361 behaves terribly in many fonts.
  • John Hudson
    edited September 2022
    fact that U+0361 behaves terribly in many fonts
    All the double-letter combining marks in Unicode are difficult to implement well. You need to plan for base letters of differing widths as well as differing heights, so you need multiple contextual variant widths and contextual GPOS to adjust the height.

    This is the contextual implementation in the upcoming v2.20 release of STIX Two Text: [image not preserved]

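    A hypothetical sketch of the general approach John describes (contextual width variants plus contextual GPOS for height); it is not the STIX Two implementation. Every class, the .wide/.narrow variant glyphs, the choice of the mark feature and the 150-unit offset are assumptions for illustration.

    ```fea
    languagesystem DFLT dflt;
    languagesystem latn dflt;

    # Hypothetical groupings; a real font needs finer classes.
    @WIDE   = [m w M W];
    @NARROW = [f i j l r t];
    @TALL   = [b d f h k l t A B D E];
    @TIES   = [uni0361 uni0361.wide uni0361.narrow];

    # Width variants of the tie, drawn separately by the designer.
    lookup TIE_WIDE {
      sub uni0361 by uni0361.wide;
    } TIE_WIDE;

    lookup TIE_NARROW {
      sub uni0361 by uni0361.narrow;
    } TIE_NARROW;

    # Choose a width from the letters on both sides of the tie.
    feature calt {
      sub @WIDE uni0361' lookup TIE_WIDE @WIDE;
      sub @NARROW uni0361' lookup TIE_NARROW @NARROW;
    } calt;

    # Contextual GPOS: raise the tie when either neighbour has an ascender.
    # The first matching rule wins, so the tie is raised only once.
    feature mark {
      pos @TALL @TIES' <0 150 0 0>;
      pos @TIES' <0 150 0 0> @TALL;
    } mark;
    ```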
  • Florian Pircher
    edited September 2022
    fact that U+0361 behaves terribly in many fonts.
    I observed that Core Text tries to “help” by moving the glyph for U+0361 down based on some heuristic, and then ignores GPOS code for that glyph. For example, I have a glyph for o and one for U+0361: [image not preserved]

    In the following graphic, the red version is how Core Text renders it and the blue version is what my GPOS code specifies (and what HarfBuzz does): [image not preserved]

    I managed to get the blue (correct) output with Core Text by substituting my normal glyph for U+0361 (as specified by ‘cmap’) with a different but identical glyph, using the following ‘calt’ feature code:

    sub breveinverteddoublecomb by breveinverteddoublecomb.copy;

    Now Core Text no longer interferes, and I can freely move /breveinverteddoublecomb.copy using GPOS. Testing two fonts, Keep.ttf (without the above feature code) and Sub.ttf (with it), yields:

    # testing HarfBuzz
    $ hb-shape --shapers=ot Keep.ttf "o͡o"
    [o=0+581|uni0361=0@0,-240+0|o=2+581]
    $ hb-shape --shapers=ot Sub.ttf "o͡o"
    [o=0+581|uni0361.copy=0@0,-240+0|o=2+581]
    # testing Core Text
    $ hb-shape --shapers=coretext Keep.ttf "o͡o"
    [o=0+537|uni0361=0@0,-224+44|o=2+581]
    $ hb-shape --shapers=coretext Sub.ttf "o͡o"
    [o=0+581|uni0361.copy=0@0,-240+0|o=2+581]
    

    So, with HarfBuzz (and with Core Text when using the ‘sub’ trick), the mark is moved down 240 units as specified by my ‘kern’ code. But without the ‘sub’ trick, Core Text moves the mark down by 224 units on its own and also gives the mark 44 units of advance width, taking them from the preceding o (537 instead of 581) (!?)
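    For reference, a sketch of what the combined workaround might look like in a feature file. The ‘calt’ rule is the one quoted above; the ‘kern’ rule is a hypothetical reconstruction that produces the 0,-240 offset seen in the HarfBuzz output, not Florian's actual code. The source glyph names are used here; the compiled fonts appear to report them as uni0361 and uni0361.copy.

    ```fea
    languagesystem DFLT dflt;
    languagesystem latn dflt;

    # Swap the cmap glyph for an identical copy so Core Text stops
    # applying its own heuristic to U+0361.
    feature calt {
      sub breveinverteddoublecomb by breveinverteddoublecomb.copy;
    } calt;

    # Lower the tie by 240 units between two o's.
    # Value record: <xPlacement yPlacement xAdvance yAdvance>.
    feature kern {
      pos o breveinverteddoublecomb.copy' <0 -240 0 0> o;
    } kern;
    ```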

    I think older versions of Apple's Pages caused even more damage when double marks were moved by GPOS, and back then Core Text alone did nothing, but I’m not sure anymore. The moral: few fonts support the character, so the shaper tries to help, and in doing so it messes up the few fonts that do support it.
  • OK, I discussed this with my users, and they do not insist on the ts ligature with an undertie. They want ts with an undertie AND the ts ligature (without an undertie). They told me that tipa indeed removed the ts ligature, as noted above for 1989, but many linguists still use it, and since it exists in Unicode, why not support it? So this is what I will do.