IPA: Best practice?

Comments

  • Christian Thalmann
    Christian Thalmann Posts: 1,988
    edited November 2024
    Aha, the creator of ipa.type.it, Tomasz, informs me he's not aware of any active replacement with precomposed glyphs on his side. I'm wondering whether it might be a browser thing, or a MacOS thing...
    In any case, I even ran into that problem with ḳ, which seems an oddly exotic precomposed character to expect in a font. I'm daunted by the prospect of hunting down and adding all those precomposed combinations for those dozens of accents... :/ Maybe I should just accept that I'm not fully compatible with ipa.type.it.
  • John Hudson
    John Hudson Posts: 3,264
    Unless you are very specifically making a font for IPA and similar transcription, yes, it is a good idea to include casing pair support.
  • The Glyphs.app IPA glyphs list seems to be missing a few standard IPA characters like ˤᵊᶹˣᶿʱ, common ones like ᵐᶯᶮᵑᶰ, or the inverted breve below as in ə̯.

    The 2015 IPA chart shows several superscript modifier letters: ʷ for labialization, ʲ for palatalization, ˠ for velarization, ˤ for pharyngealization, ⁿ for nasal release, ˡ for lateral release. The 1999 IPA handbook also mentions a few others, like ᶹ for labialization strictly without velarization, ᵊ for mid central vowel release, ᶿ for voiceless dental fricative release, ˣ for voiceless velar fricative release, an obsolete ˢ, and a non-standard ʸ. It also uses ʱ for voiced aspiration in the Hindi illustration.
    The section on Extensions to the IPA also shows ʶ, ꟸ, ꟹ which may be out of scope.
    While not explicitly mentioned or used in the IPA chart or IPA handbook, many superscript modifier letters are used nonetheless. For example, ᵐ, ᶯ, ᶮ, ᵑ, ᶰ are common for prenasalized consonants like ᵐb, ᶯʈ, ᶮɟ, ᵑɡ, ᶰɢ. Not all superscript forms of base symbols are common, so the ones that are not explicitly standard may be out of scope for you.

    Those modifier letters should have anchors as well. For example ⁿ̪d̪ is sometimes used for the strictly dental ⁿd, or z̙ᵊ̙ or, if in the font, ᶜ̧. Arguably, smaller variants of the combining marks could be useful, but you may get away without.
    All base glyphs should have anchors really, so β, χ, θ should have anchors as well.

    As for the ligature ties, t͡s can also be represented as t͜s, and the ties are or can be used with many possible pairs of base symbols, like b͡β d͡ð ɠ͜ɓ m͡b ɴ͡q and many more. Having ligatures for all of those is impractical. You can horizontally center the tie on zero, place the above ligature tie high enough by default to not clash with ascenders, and similarly place the below ligature tie low enough to not clash with descenders. A more refined approach would do more, but that’s functional for most cases.

    The ˊ and ˋ should match but don’t in the current font. They’re legacy IPA and are not used anymore but are used in other systems.

    The inverted breve below for "non-syllabic" vowels, for example ə̯ as in the IPA chart (or k̯ as in your sample), is missing.

    Mind the soft dotted glyphs, for example ɨ̀ or j̊ should not have a dot. See https://googlefonts.github.io/gf-guide/diacritics.html#soft-dotted-glyphs or use Glyphs.app’s automatic code (which expects istroke.dotless and jdotless in these cases).

    Regarding the precomposed characters, there’s no way around it. Not supporting the precomposed forms renders the font useless in many cases. Applications, or the Unicode and font technology implementations they use, will normalize decomposed character sequences to precomposed ones as they see fit, since doing so is often recommended. Unicode defines both forms as equivalent. As you’re using anchors, it should be low-hanging fruit.
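
    To see that normalization at work, here is a minimal Python sketch using only the standard library's unicodedata module. It is an illustration added for this thread, not something any particular application is known to run; the helper name to_codepoints is made up for the example.

```python
import unicodedata

def to_codepoints(text):
    """Format a string as space-separated hexadecimal code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)

# k followed by U+0323 COMBINING DOT BELOW, as a user might type it
decomposed = "k\u0323"
# NFC composes the pair into the precomposed character when one exists
composed = unicodedata.normalize("NFC", decomposed)
# NFD takes it apart again
round_trip = unicodedata.normalize("NFD", composed)

print(to_codepoints(decomposed))  # 006B 0323
print(to_codepoints(composed))    # 1E33

# A sequence with no precomposed counterpart is left alone by NFC,
# so the font still needs working anchors for it.
schwa_nonsyllabic = "\u0259\u032f"  # schwa + inverted breve below
unchanged = unicodedata.normalize("NFC", schwa_nonsyllabic)
print(to_codepoints(unchanged))   # 0259 032F
```

    If the font has no glyph for U+1E33, the composed form falls back or shows a missing glyph even though the decomposed form would have rendered through anchors.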

  • Does anyone have a list of all precomposed letters belonging to those dozens of diacritics…? It sounds like a lot of manual work to figure out what’s needed. For example, I was surprised to find k with underdot as a precomposed glyph given that it’s in none of the lists in Glyphs…
  • Christian Thalmann
    Christian Thalmann Posts: 1,988
    edited November 2024
    You could try my Glyphs script "Report All Composeable Glyphs"; it analyses Glyphs' Glyph Info data and the current font to determine which glyphs can be composed from existing parts.
    That sounds perfect, thanks! 👍
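
    Independently of Glyphs, a rough list of precomposed letters for a given diacritic can be pulled from the Unicode character database. The following Python sketch is only an assumption about how one might do that; it is not the script mentioned above, and the function name precomposed_with is hypothetical. It only looks at one decomposition level, so characters carrying two marks are found under their outermost mark only.

```python
import sys
import unicodedata

def precomposed_with(mark, bases=None):
    """Return characters whose canonical decomposition is base + mark."""
    found = []
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        decomp = unicodedata.decomposition(ch)
        if not decomp or decomp.startswith("<"):
            continue  # no decomposition, or only a compatibility one
        parts = [chr(int(p, 16)) for p in decomp.split()]
        if len(parts) == 2 and parts[1] == mark:
            if bases is None or parts[0] in bases:
                found.append(ch)
    return found

# All lowercase a-z letters that have a precomposed form with U+0323 dot below
dot_below = precomposed_with("\u0323", bases=set("abcdefghijklmnopqrstuvwxyz"))
print("".join(dot_below))
```

    Running it once per combining mark in the font gives a checklist of precomposed code points to add as components.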
  • @Denis Moyogo Jacquerye
    Do the superscript modifiers also need anchors above them? And the preferred position for the diacritic is right below the modifier or below the regular baseline?
  • You can horizontally center the tie on zero

    Oh, is that the reason why some use cases are misaligned for me? I assumed it should look like this:

    So should it have a negative LSB and an equally positive RSB?

  • Alright, I applied Denis' trick and picked out all low-hanging fruit and IPA-related-looking things. Hope that helps. I can at least confirm it solves the problem for ḳ.


  • Yes.

    As a broad matter, all combining marks should have zero advance width, and then it is up to you how to position them within their advance width.

    Glyphs App, as I understand it, will zero the advance widths for you automatically at font generation time. This is an interesting convenience feature that I have mixed feelings about.

    FontLab, last I checked, expected you to set the width yourself. Same for TypeTool and I think FontForge as well. Not sure about other font editors.
  • John Hudson
    John Hudson Posts: 3,264
    All combining mark glyphs should be zero-width, which is not to say that all combining mark characters need to be. Some of the things that Unicode encodes as combining marks, e.g. U+0315, can be more easily implemented as spacing glyphs and not as marks.

    The double-width marks like the tie are oddities, in that they are intended to sit between two base characters and extend over or under both. So these are usually zero-width but centered, and not positioned with anchors.

    When I get serious about supporting these, I include multiple widths and offset some of them left or right depending whether they are preceded/followed by narrow/medium/wide bases, and then I contextually adjust the heights to clear ascenders, descenders or other marks. Oh, and we’ve worked with some authors who want to be able to apply a second mark above a tie, so needed to include an anchor for that.
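
    For the simple "zero-width but centered" setup, the arithmetic can be sketched in a few lines of Python. This assumes the common convention that the left sidebearing is the outline's xMin and the right sidebearing is the advance minus xMax; the function name and the sample numbers are made up for illustration.

```python
def centered_zero_width(x_min, x_max):
    """Center an outline on x=0 inside a zero-advance glyph."""
    advance = 0
    # Horizontal shift that moves the outline's midpoint onto x=0
    shift = -(x_min + x_max) / 2
    new_min, new_max = x_min + shift, x_max + shift
    lsb = new_min            # distance from origin to left edge
    rsb = advance - new_max  # distance from right edge to advance width
    return {"advance": advance, "shift": shift, "lsb": lsb, "rsb": rsb}

# Hypothetical tie outline drawn from x=40 to x=840
metrics = centered_zero_width(x_min=40, x_max=840)
print(metrics)  # lsb and rsb both come out as -400.0
```

    Under that convention both sidebearings end up negative and equal, each half the outline width; an editor that reports sidebearings differently may show other signs for the same geometry.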
  • Denis Moyogo Jacquerye
    edited November 2024
    Do the superscript modifiers also need anchors above them?
    It seems rare but does happen, so yes.
    You can find things like ⁿ̊t  ᵑ̊k ᵊ̃ ʷ̃.
    And the preferred position for the diacritic is right below the modifier or below the regular baseline?
    For ᶜ̧ it should be right below or on the modifier letter. Some get away with below the regular baseline but it seems unconventional.
    Oh, and we’ve worked with some authors who want to be able to apply a second mark above a tie, so needed to include an anchor for that.
    The ALA-LC transliteration, used in many library catalogues, uses t͡͏̇s (0074 0361 034F 0307 0073) for ҵ, not to be confused with the same sequence without 034F COMBINING GRAPHEME JOINER which gets 0307 reordered before 0361: ṫ͡s, or not to be confused with the similar sequence t︠̇s︡  with ligature half left and ligature half right which might be used for legacy reasons. Fun times!

    Another thing about ligature ties: the soft-dotted glyphs should be without dots in both first and second position, for example both u͡i and i͡u, or both ʉ͡ɨ and ɨ͡ʉ if that makes the reason clearer. /dotaccentcomb 0307 can be used explicitly if the dot needs to remain: u͡i̇ and i̇͡u, ʉ͡ɨ and ɨ̇͡ʉ. IPA doesn’t use ligature ties with vowels, but other systems do.
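
    The soft-dotted rule for ties can be written down as a small Python check. This is a sketch under stated assumptions: the SOFT_DOTTED set is a tiny hand-picked sample rather than the full Unicode Soft_Dotted property, and treating a tie on the preceding letter as dot-removing follows the description above rather than any specification text.

```python
import unicodedata

SOFT_DOTTED = {"i", "j", "\u0268"}  # sample only: i, j, i with stroke
ABOVE, DOUBLE_ABOVE = 230, 234      # canonical combining classes

def is_mark(ch):
    """True for combining marks (general categories Mn, Mc, Me)."""
    return unicodedata.category(ch).startswith("M")

def loses_dot(text, index):
    """Decide whether the soft-dotted letter at text[index] should be dotless."""
    if text[index] not in SOFT_DOTTED:
        return False
    # Marks after the letter: an above mark or an above tie removes the dot.
    j = index + 1
    while j < len(text) and is_mark(text[j]):
        if unicodedata.combining(text[j]) in (ABOVE, DOUBLE_ABOVE):
            return True
        j += 1
    # Marks on the preceding letter: a tie there extends over this letter too.
    j = index - 1
    while j >= 0 and is_mark(text[j]):
        if unicodedata.combining(text[j]) == DOUBLE_ABOVE:
            return True
        j -= 1
    return False

print(loses_dot("u\u0361i", 2))  # tie before i
print(loses_dot("i\u0361u", 0))  # tie after i
print(loses_dot("i\u0323", 0))   # dot below only, the dot above stays
```

    With an explicit U+0307 after the letter, the letter itself still goes dotless and the combining dot glyph supplies the dot.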
  • Thank you very much, Denis. Is there a place where we can get texts with complex phonetic notation to test the fonts? And what fonts handle all these situations correctly? Probably only SIL's Gentium and maybe John's Brill, I guess. Full phonetic support is really challenging.
  • John Hudson
    John Hudson Posts: 3,264
    maybe John's Brill
    The upcoming v5.00 is getting there, but there are still some small holes around complex bits of the system.

  • Christian Thalmann
    Christian Thalmann Posts: 1,988
    edited November 2024
    Centering the tie bars on x=0 didn't help. The top tie bar comes out nicely but the bottom one does weird things. For instance, it latches onto the /t/ in t͜ś (and it would do the same in t͜s if I didn't have a ligature for it). Maybe it's because the bottom tiebar comes with a _bottom anchor and the top one doesn't?
    Edit: That was actually the reason! Without the _bottom anchor, the bottom tiebar works like a charm as well:

  • John Hudson
    John Hudson Posts: 3,264
    Yes, if you want the tie to simply sit on the sidebearing between the two letters, then you have to avoid giving it any _anchor. Anchor positions are always applied relative to the 0,0 coordinate of the preceding glyph, so even if there were no corresponding anchor on that preceding glyph you could end up with a mispositioned tie.

    Some of your tie positions still look a bit odd. Why is the tie below tś further left than the tie below ts? Why are the ties over and under mn so far left?
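
    The difference between an anchored and an unanchored tie can be illustrated with simple coordinates. The Python sketch below is a simplified model of mark attachment with hypothetical numbers, not a description of any shaping engine's internals.

```python
def anchored_x(base_origin_x, base_anchor_x, mark_anchor_x):
    """Mark origin when attached: the mark's _anchor lands on the base anchor."""
    return base_origin_x + base_anchor_x - mark_anchor_x

def unanchored_x(base_origin_x, base_advance):
    """Mark origin without attachment: it sits at the pen position after the base."""
    return base_origin_x + base_advance

# Hypothetical /t/: origin at 0, advance 300, bottom anchor at x=150.
# Hypothetical tie: zero-width, drawn centered on x=0, _bottom anchor at x=0.
with_anchor = anchored_x(base_origin_x=0, base_anchor_x=150, mark_anchor_x=0)
without_anchor = unanchored_x(base_origin_x=0, base_advance=300)

print(with_anchor)     # 150: tie centered under the t alone
print(without_anchor)  # 300: tie centered on the boundary between t and s
```

    In this model the anchored tie "latches onto" the first letter, while the unanchored one straddles the sidebearing between the two letters.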
  • Christian Thalmann
    Christian Thalmann Posts: 1,988
    edited November 2024
    Good question, I'm not sure where the asymmetries come from...
    BTW, I asked @Georg Seifert from Glyphs App to remove the _bottom and _top anchors from the tiebars but keep the bottom and top ones for attaching marks to the tiebars themselves. However, he says that would fail because marks following the tiebar get shuffled to before the tiebar and are applied to the preceding letter instead...?
    Georg wrote:
    I just tested it in FontGoggles and it seems that if I type “a/breveinverteddoublecomb/dieresiscomb/a”, it is reordered to “a/dieresiscomb/breveinverteddoublecomb/a”.
  • Denis Moyogo Jacquerye
    edited November 2024

    @Christian Thalmann: @Georg Seifert is almost right but actually mistaken; see my previous comment:
    The ALA-LC transliteration, used in many library catalogues, uses t͡͏̇s (0074 0361 034F 0307 0073) for ҵ, not to be confused with the same sequence without 034F COMBINING GRAPHEME JOINER which gets 0307 reordered before 0361: ṫ͡s, or not to be confused with the similar sequence t︠̇s︡  with ligature half left and ligature half right which might be used for legacy reasons.
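
    The reordering comes from canonical combining classes, which can be inspected with Python's unicodedata module. This is an illustrative sketch only; the helper name describe is invented for the example.

```python
import unicodedata

def describe(text):
    """List each code point with its canonical combining class."""
    return [(f"{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]

without_cgj = "t\u0361\u0307s"     # t, tie, dot above, s
with_cgj = "t\u0361\u034f\u0307s"  # same, with U+034F before the dot

print(describe(without_cgj))
# [('0074', 0), ('0361', 234), ('0307', 230), ('0073', 0)]

# Class 230 sorts before class 234, so the dot moves next to the t
# and then composes with it into U+1E6B.
nfc_plain = unicodedata.normalize("NFC", without_cgj)
print(describe(nfc_plain))
# [('1E6B', 0), ('0361', 234), ('0073', 0)]

# U+034F has class 0, which blocks the reordering.
nfc_cgj = unicodedata.normalize("NFC", with_cgj)
print(describe(nfc_cgj))
# [('0074', 0), ('0361', 234), ('034F', 0), ('0307', 230), ('0073', 0)]
```

    The same mechanism explains the FontGoggles result: U+0308 has class 230 and is sorted in front of U+0361 unless a class-0 character such as U+034F stands between them.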
  • Oh, that was above my paygrade I'm afraid 😜
  • This is very interesting. Thanks for explaining this. 
    I tried to make a simple test font to get the dot above the tiebar. Adding an anchor in the tiebar itself didn’t work (it is still attached to the top anchor in the t). And where exactly should I put the top anchor in the tiebar? Or does it need a contextual anchor in the "t"?
  • John Hudson
    John Hudson Posts: 3,264
    edited November 2024
    If you’re classing the tie as a mark, then put a mkmk anchor on top of the above tie bar (in the middle).

    If you’re still not distinguishing between mark and mkmk anchors, well, I can’t help you. :#

    If you’re not classing the tie as a mark (and why would you, if it is not being positioned with an anchor and shouldn’t be skipped in mark-to-base processing?), then you could use any anchor above it, but I still make a distinction between marks applied to letter-like things and marks applied to mark-like things, to enable the kind of control I want over different kinds of distance relationships.
  • Yup, if 0361 /breveinverteddoublecomb or other non-spacing marks have a top anchor but no _top anchor, they should still be in the GPOS mkmk lookup for that anchor.

    You can enforce it by having a _top anchor in /breveinverteddoublecomb but then you need another set of anchors, say topdouble and _topdouble in /breveinverteddoublecomb and topdouble on all base glyphs, to reposition /breveinverteddoublecomb as the anchors are applied in that order given those names.

    Alternatively, you can have a different anchor for mark-to-mark like John does, for example topMark and _topMark. /dotaccentcomb would then have _top, topMark and _topMark, and no "top" anchor which would only be in base glyphs. /breveinverteddoublecomb would have topMark and _topMark.

    The first option adds an anchor to all base glyphs, which can be used to shift double marks, at least when that base glyph is tall; the second option adds an anchor to all marks, which can also add more control for stacking marks.

    You’ll likely need a blank glyph for 034F graphemejoinercomb for things to work.

    That said, IPA typically doesn’t use marks above the top ligature tie or below the bottom ligature tie; other systems do.
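
    The second anchor scheme (separate topMark and _topMark anchors) can be modeled as plain data to check which attachments it produces. The Python sketch below is a toy model under that assumption: the anchor sets follow the description above, while the function name attachments and the restriction to four glyphs are invented for the example. It does not generate any GPOS code.

```python
# Anchor inventory for the second option; base glyphs carry only "top".
anchors = {
    "t": {"top"},
    "s": {"top"},
    "dotaccentcomb": {"_top", "topMark", "_topMark"},
    "breveinverteddoublecomb": {"topMark", "_topMark"},
}
marks = {"dotaccentcomb", "breveinverteddoublecomb"}

def attachments(anchors, marks):
    """Split anchor pairs into mark-to-base and mark-to-mark candidates."""
    mark_to_base, mark_to_mark = [], []
    for mark in sorted(marks):
        # "_name" on a mark attaches to "name" on some other glyph
        names = sorted(a[1:] for a in anchors[mark] if a.startswith("_"))
        for name in names:
            for target, target_anchors in sorted(anchors.items()):
                if name not in target_anchors:
                    continue
                pair = (target, name, mark)
                if target in marks:
                    mark_to_mark.append(pair)
                else:
                    mark_to_base.append(pair)
    return mark_to_base, mark_to_mark

mark_to_base, mark_to_mark = attachments(anchors, marks)
print(mark_to_base)
print(mark_to_mark)
```

    In this model the tie never appears as the attaching glyph in mark_to_base, so it stays unanchored relative to the letters, yet the dot accent can still attach to the tie's topMark anchor through mark-to-mark.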
  • I hate the way the canonical shape of /zretroflexhook/ truncates the tail of /z/ and leaves a gap open. This is particularly painful in italics, where the wavy tail gets lost. Can I get away with attaching the hook to the bottom of an intact /z/?


  • John Hudson
    John Hudson Posts: 3,264
    You don’t necessarily need to lose the wavy stroke.
  • Those are nice! I'm a bit worried that the tail will cause unnecessary collisions if it protrudes that much. But I suppose the same goes for ɖ etc...

  • John Hudson
    John Hudson Posts: 3,264
    As I recall, I kern these positively to avoid collisions with following descenders, but didn’t go so far as to kern to below-base marks on following letters. If I were aware of specific instances where this was a problem, I would target those.
  • Igor Freiberger
    Igor Freiberger Posts: 282
    edited November 2024
    If you add phonetics to your font, you need to take care of many kerning issues. z with retroflex hook is just one among them. (The sample below has no kerning and a +50 tracking just to illustrate the characters).


  • Thanks, John and Igor... I will leave the kerning as an exercise for the user. :wink:
    And wow, some outrageously rude glyphs in that list! The way that second /x/ is manspreading is ridiculous. :(

  • In unrelated news, I absolutely love my Hairline Italic schwa with rhotic hook, it looks so Sütterlin. :grimace:
  • John Hudson
    John Hudson Posts: 3,264
    What character is that manspreading x, Igor? Is it a variant of ᶍ?