IPA: Best practice?

Comments

  • Christian Thalmann
    Christian Thalmann Posts: 1,978
    edited November 5
    Aha, the creator of ipa.type.it, Tomasz, informs me he's not aware of any active replacement with precomposed glyphs on his side. I'm wondering whether it might be a browser thing, or a macOS thing...
    In any case, I even ran into that problem with ḳ, which seems an oddly exotic precomposed character to expect in a font. I'm daunted by the prospect of hunting down and adding all those precomposed combinations for those dozens of accents... :/ Maybe I should just accept that I'm not fully compatible with ipa.type.it.
  • John Hudson
    John Hudson Posts: 3,170
    Unless you are very specifically making a font for IPA and similar transcription, yes, it is a good idea to include casing pair support.
  • The Glyphs.app IPA glyphs list seems to be missing a few standard IPA characters like ˤᵊᶹˣᶿʱ, common ones like ᵐᶯᶮᵑᶰ, or the inverted breve below as in ə̯.

    The 2015 IPA chart shows several superscript modifier letters: ʷ for labialization, ʲ for palatalization, ˠ for velarization, ˤ for pharyngealization, ⁿ for nasal release, ˡ for lateral release. The 1999 IPA handbook also mentions a few others, like ᶹ for labialization strictly without velarization, ᵊ for mid central vowel release, ᶿ for voiceless dental fricative release, ˣ for voiceless velar fricative release, an obsolete ˢ and a non-standard ʸ. It also uses ʱ for voiced aspiration in the Hindi illustration.
    The section on Extensions to the IPA also shows ʶ, ꟸ, ꟹ which may be out of scope.
    While not explicitly mentioned or used in the IPA chart or IPA handbook, a lot of superscript modifier letters are used nonetheless; for example, ᵐ, ᶯ, ᶮ, ᵑ, ᶰ are common for prenasalized consonants like ᵐb, ᶯʈ, ᶮɟ, ᵑɡ, ᶰɢ. Not all superscript forms of base symbols are common, so those that are not explicitly standard may be out of scope for you.

    Those modifier letters should have anchors as well. For example ⁿ̪d̪ is sometimes used for the strictly dental ⁿd, or z̙ᵊ̙ or, if in the font, ᶜ̧. Arguably, smaller variants of the combining marks could be useful, but you may get away without.
    All base glyphs should have anchors really, so β, χ, θ should have anchors as well.

    For the ligature ties, t͡s can also be represented with t͜s. There are many possible pairs of base symbols they are or can be used with, like b͡β d͡ð ɠ͜ɓ m͡b ɴ͡q and many more, so having ligatures for all of them is impractical. You can horizontally center the tie on zero and set the above ligature tie high enough by default to not clash with ascenders; similarly, the below ligature tie can be low enough to not clash with descenders. A more refined approach would do more, but that’s functional for most cases.

    The ˊ and ˋ should match but don’t in the current font. They’re legacy IPA and are not used anymore but are used in other systems.

    The inverted breve below for "non-syllabic" vowels, for example ə̯ as in the IPA chart (or k̯ as in your sample), is missing.

    Mind the soft dotted glyphs, for example ɨ̀ or j̊ should not have a dot. See https://googlefonts.github.io/gf-guide/diacritics.html#soft-dotted-glyphs or use Glyphs.app’s automatic code (which expects istroke.dotless and jdotless in these cases).
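    As a rough illustration of the soft-dotted rule above, here is a minimal Python sketch that flags soft-dotted letters followed by an above mark, i.e. the places where a dotless base form would be wanted. The `SOFT_DOTTED` set is a hand-picked subset chosen for this example (Python's standard library does not expose the Unicode Soft_Dotted property), and `needs_dotless` is a hypothetical helper name, not part of any font tool.

```python
import unicodedata

# Assumption: a small hand-picked subset of Unicode's Soft_Dotted characters
# (i, j, i with stroke, j with crossed tail, modifier small j).
SOFT_DOTTED = {"i", "j", "\u0268", "\u029D", "\u02B2"}

def needs_dotless(text):
    """Return indices of soft-dotted letters followed by an above mark.

    A letter is flagged when the run of combining marks right after it
    contains a mark with canonical combining class 230 (Above).
    """
    hits = []
    for i, ch in enumerate(text):
        if ch not in SOFT_DOTTED:
            continue
        j = i + 1
        # Walk the combining marks attached to this base letter.
        while j < len(text) and unicodedata.combining(text[j]) != 0:
            if unicodedata.combining(text[j]) == 230:
                hits.append(i)
                break
            j += 1
    return hits

print(needs_dotless("\u0268\u0300"))  # i with stroke + grave above
print(needs_dotless("j\u030A"))       # j + ring above
print(needs_dotless("i\u0323"))       # i + dot below: dot stays
```

    This only mirrors the basic rule for marks placed above; anything a font does beyond that is a design decision the sketch does not model.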

    Regarding the precomposed characters, there’s no way around it. Not supporting the precomposed forms renders the font useless in many cases. Applications, or the Unicode and font technology implementations they use, will normalize decomposed character sequences to precomposed characters as they see fit, since doing so is often recommended. Unicode defines both forms as equivalent. As you’re using anchors, it should be low-hanging fruit.
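    To see what such normalization does to a typed sequence, here is a small Python sketch using the standard `unicodedata` module. It applies NFC to k + combining dot below and, for contrast, to ə + combining inverted breve below, which has no precomposed form. The helper names `to_nfc` and `codepoints` are made up for this example.

```python
import unicodedata

def to_nfc(text):
    """Normalize text to NFC (canonical composition)."""
    return unicodedata.normalize("NFC", text)

def codepoints(text):
    """List the code points of text as U+XXXX strings."""
    return [f"U+{ord(c):04X}" for c in text]

# k + COMBINING DOT BELOW composes to a single precomposed character.
print(codepoints(to_nfc("k\u0323")))       # ['U+1E33']

# schwa + COMBINING INVERTED BREVE BELOW has no precomposed form,
# so it stays as two code points and relies on mark positioning.
print(codepoints(to_nfc("\u0259\u032F")))  # ['U+0259', 'U+032F']
```

    Whether a given application actually normalizes its input is outside the font's control, so a font that only handles one of the two forms will fail for some users.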

  • Does anyone have a list of all precomposed letters belonging to those dozens of diacritics…? It sounds like a lot of manual work to figure out what’s needed. For example, I was surprised to find k with underdot as a precomposed glyph given that it’s in none of the lists in Glyphs…
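    One way to build such a list without a font editor, sketched here under the assumption that the Unicode data shipped with Python is recent enough for your needs: scan every code point and keep those whose full canonical decomposition is one of your base letters followed only by marks you support. `precomposed_for` is a hypothetical helper name, and the base and mark sets below are just examples.

```python
import unicodedata

def precomposed_for(bases, marks):
    """Return precomposed characters whose NFD form is one base from
    `bases` followed only by combining marks from `marks`."""
    found = []
    for cp in range(0x110000):
        if 0xD800 <= cp <= 0xDFFF:
            continue  # skip surrogate code points
        ch = chr(cp)
        nfd = unicodedata.normalize("NFD", ch)
        if len(nfd) < 2:
            continue  # no canonical decomposition
        if nfd[0] in bases and all(m in marks for m in nfd[1:]):
            found.append(ch)
    return found

# Example: lowercase k and t with dot below (U+0323) or dot above (U+0307).
found = precomposed_for(set("kt"), {"\u0323", "\u0307"})
for ch in found:
    print(f"U+{ord(ch):04X} {ch} {unicodedata.name(ch)}")
```

    Feeding in the full set of bases and marks already in the font gives the list of precomposed characters that can be built from existing parts.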
  • Christian Thalmann
    Christian Thalmann Posts: 1,978
    edited November 6
    You could try my Glyphs script "Report All Composeable Glyphs"; it analyses Glyphs' Glyph Info data and the current font to determine which glyphs can be composed from existing parts.
    That sounds perfect, thanks! 👍
  • @Denis Moyogo Jacquerye
    Do the superscript modifiers also need anchors above them? And is the preferred position for the diacritic right below the modifier or below the regular baseline?
  • You can horizontally center the tie on zero

    Oh, is that the reason why some use cases are misaligned for me? I assumed it should look like this:

    So should it have a negative LSB and an equally positive RSB?

  • Alright, I applied Denis' trick and picked out all low-hanging fruit and IPA-related-looking things. Hope that helps. I can at least confirm it solves the problem for ḳ.


  • Thomas Phinney
    Thomas Phinney Posts: 2,867
    Yes.

    As a broad matter, all combining marks should have zero advance width, and then it is up to you how to position them within their advance width.

    Glyphs App, as I understand it, will zero the advance widths for you automatically at font generation time. This is an interesting convenience feature that I have mixed feelings about.

    FontLab, last I checked, expected you to set the width yourself. Same for TypeTool and I think FontForge as well. Not sure about other font editors.
  • John Hudson
    John Hudson Posts: 3,170
    All combining mark glyphs should be zero-width, which is not to say that all combining mark characters need to be. Some of the things that Unicode encodes as combining marks, e.g. U+0315, can be more easily implemented as spacing glyphs and not as marks.

    The double-width marks like the tie are oddities, in that they are intended to sit between two base characters and extend over or under both. So these are usually zero-width but centered, and not positioned with anchors.

    When I get serious about supporting these, I include multiple widths and offset some of them left or right depending whether they are preceded/followed by narrow/medium/wide bases, and then I contextually adjust the heights to clear ascenders, descenders or other marks. Oh, and we’ve worked with some authors who want to be able to apply a second mark above a tie, so needed to include an anchor for that.
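    A quick way to audit the zero-width rule, sketched in Python under stated assumptions: `advances` is a hypothetical mapping from code point to advance width that you would fill from your own font data, and `nonzero_width_marks` is a made-up helper. Marks you deliberately implement as spacing glyphs, such as U+0315, can be exempted.

```python
import unicodedata

def nonzero_width_marks(advances, allow_spacing=()):
    """Return {code point: advance} for nonspacing marks (category Mn)
    whose advance width is not zero, skipping any exempted code points."""
    problems = {}
    for cp, width in advances.items():
        if cp in allow_spacing:
            continue  # intentionally implemented as a spacing glyph
        if unicodedata.category(chr(cp)) == "Mn" and width != 0:
            problems[cp] = width
    return problems

# Hypothetical advance widths, for illustration only.
advances = {
    0x0061: 520,  # a
    0x0301: 0,    # combining acute
    0x0323: 180,  # combining dot below (left non-zero by mistake)
    0x0315: 250,  # combining comma above right, kept as spacing
    0x0361: 0,    # combining double inverted breve (tie)
}

report = nonzero_width_marks(advances, allow_spacing={0x0315})
print({f"U+{cp:04X}": w for cp, w in report.items()})
```

    If your editor zeroes mark widths at export, run a check like this on the exported values rather than on the source widths.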
  • Denis Moyogo Jacquerye
    edited November 7
    Do the superscript modifiers also need anchors above them?
    It seems rare but does happen, so yes.
    You can find things like ⁿ̊t  ᵑ̊k ᵊ̃ ʷ̃.
    And is the preferred position for the diacritic right below the modifier or below the regular baseline?
    For ᶜ̧ it should be right below or on the modifier letter. Some get away with below the regular baseline but it seems unconventional.
    Oh, and we’ve worked with some authors who want to be able to apply a second mark above a tie, so needed to include an anchor for that.
    The ALA-LC transliteration, used in many library catalogues, uses t͡͏̇s (0074 0361 034F 0307 0073) for ҵ. This is not to be confused with the same sequence without 034F COMBINING GRAPHEME JOINER, in which 0307 gets reordered before 0361: ṫ͡s. Nor should it be confused with the similar sequence t︠̇s︡, with ligature left half and ligature right half, which might be used for legacy reasons. Fun times!
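    The reordering can be reproduced with Python's standard `unicodedata` module. This is only a sketch of what normalization does to the two sequences above; `nfc_codepoints` is a helper name invented for the example.

```python
import unicodedata

def nfc_codepoints(text):
    """Normalize to NFC and return the code points as hex strings."""
    return [f"{ord(c):04X}" for c in unicodedata.normalize("NFC", text)]

with_cgj = "t\u0361\u034F\u0307s"   # tie, grapheme joiner, dot above
without_cgj = "t\u0361\u0307s"      # tie, dot above

# Combining classes: the tie sorts after the dot above, the joiner is 0.
print(unicodedata.combining("\u0361"))  # 234
print(unicodedata.combining("\u0307"))  # 230
print(unicodedata.combining("\u034F"))  # 0

# The joiner blocks reordering, so the sequence is left unchanged.
print(nfc_codepoints(with_cgj))     # ['0074', '0361', '034F', '0307', '0073']

# Without it, the dot moves before the tie and composes with t.
print(nfc_codepoints(without_cgj))  # ['1E6B', '0361', '0073']
```

    So the font sees a precomposed t with dot above followed by the tie in one case, and a dot following the tie in the other, which is why an anchor on the tie itself is needed for the first.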

    Another thing about ligature ties: the soft-dotted glyphs should be without dots in both first and second position, for example both u͡i and i͡u, or, if this makes the reason clearer, both ʉ͡ɨ and ɨ͡ʉ. /dotaccentcomb 0307 can be used explicitly if the dot needs to remain: u͡i̇ and i̇͡u, ʉ͡ɨ and ɨ̇͡ʉ. IPA doesn’t use ligature ties with vowels but other systems do.
  • Thank you very much, Denis. Is there a place where we can get texts with complex phonetic notation to test the fonts? And what fonts handle all these situations correctly? Probably only SIL's Gentium and maybe John's Brill, I guess. Full phonetic support is really challenging.
  • John Hudson
    John Hudson Posts: 3,170
    maybe John's Brill
    The upcoming v5.00 is getting there, but there are still some small holes around complex bits of the system.

  • Christian Thalmann
    Christian Thalmann Posts: 1,978
    edited 8:42PM
    Centering the tie bars on x=0 didn't help. The top tie bar comes out nicely but the bottom one does weird things. For instance, it latches onto the /t/ in t͜ś (and it would do the same in t͜s if I didn't have a ligature for it). Maybe it's because the bottom tiebar comes with a _bottom anchor and the top one doesn't?
    Edit: That was actually the reason! Without the _bottom anchor, the bottom tiebar works like a charm as well: