ccmp and decomposition

I'm wondering if anyone here has designed a font in which accented glyphs are decomposed into base glyphs and combining marks in the ccmp feature.

e.g.

sub aacute by a acutecomb;

I'm thinking here of, e.g., a script face in which each base glyph has numerous different contextual forms where it might be simpler to ignore most marks when constructing one's contextual rules.
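
For instance, something like this (a rough sketch only; a.beforenarrow is a made-up contextual alternate, and I'm assuming the combining marks are classed as marks in GDEF):

feature ccmp {
    sub aacute by a acutecomb;
    sub agrave by a gravecomb;
    # ... one such rule per precomposed glyph
} ccmp;

feature calt {
    # with marks filtered out, the contextual rules only ever have to mention
    # base glyphs, so "a acutecomb n" still matches the rule below
    lookupflag IgnoreMarks;
    sub a' [m n] by a.beforenarrow;
} calt;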

Will most software be able to deal with this, or would this approach create problems in some applications?

Comments

  • Adam Jagosz Posts: 689
    edited June 2019
    This is a reasonable idea, not only for the sake of feature code length, but also font size — alternates for all the diacritics blow the final file up a decent bit. It's sad that InDesign trips on this.
    I thought what was happening was that the decomposed pair got composed again before the substitution rule had a chance to run. So I tried to prevent the re-composition by inserting an extra character between the base and the mark, a combining grapheme joiner or a custom zero-width character. To no avail, so I suppose my hypothesis was wrong and the mechanics of this issue are different.
    But I think the issue only concerns the default Adobe composers. After switching to one of the World-Ready composers, the features seem to work. (Btw, are there any pitfalls to using a World-Ready composer, apart from having to select it manually?)
    Decomposing like this also avoids having to provide mark-to-diacritic GPOS in addition to mark-to-base and mark-to-mark.
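    In other words, once everything is decomposed the GPOS only ever needs anchors on plain bases and marks; a minimal sketch, with made-up anchor values:

    markClass [acutecomb gravecomb] <anchor 0 0> @TOP_MARKS;

    feature mark {
        pos base a <anchor 230 630> mark @TOP_MARKS;
    } mark;

    feature mkmk {
        # stacked marks attach to the mark below them, so no per-diacritic
        # anchors are ever needed
        pos mark acutecomb <anchor 0 320> mark @TOP_MARKS;
    } mkmk;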
    If anyone's interested, I wrote a script to (kind of “smartly” ;)) copy anchors from the bases and marks into the diacritic glyphs... if you happen to be using FontForge.
  • Mateusz Karpow Posts: 6
    edited June 2019
    First of all, it’s my first post here, so hello and thanks to everybody for such a great resource!
    I decided to hop onto this post not because I have any practical experience with this approach, but because I've long seen it as a way to optimise font file size for the web, e.g. to be able to provide wider language coverage without compromising site performance (pt. 14 here).
    https://github.com/twardoch/ttfdiet/ was the first place that got me thinking. Since then, I have discovered that for web browsers, blanking the glyphs doesn't seem to be needed, and that Chrome performs composition and decomposition on the fly without needing a ccmp feature. No other browsers are that clever, unfortunately.
    When U+0119 is requested but not available in the font, Chrome will render ę using U+0065+U+0328 (provided they are available). Conversely, when U+0065+U+0328 is requested but U+0328 is unavailable, Chrome will render ę using U+0119 (provided it's available). I haven't tested all scenarios, especially mixing multiple precomposed glyphs, combining marks and spacing marks, but I suppose Chrome just applies whatever composition rules Unicode provides to cover all of it (?). All other browsers require a ccmp feature to provide the above flexibility.
    My current thinking would be to try this approach: serve fonts without ccmp to Chrome and with ccmp to all other browsers (unless the ccmp rules turn out to add negligible size). The fonts wouldn't have any precomposed glyphs, unless mark positioning didn't satisfy the design requirements (yes, often in the ę scenario :)).
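    As I understand it, the ttfdiet idea for the non-Chrome case is roughly a ccmp along these lines (glyph names are just illustrative); uni0119 here is a blank, zero-width "ghost" glyph that exists only so U+0119 is covered in cmap:

    feature ccmp {
        sub uni0119 by e ogonekcomb;
        sub uni0118 by E ogonekcomb;
        # ... and so on for every precomposed codepoint the font claims to support
    } ccmp;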
    After this much-too-long introduction, I wanted to ask those of you who have practical knowledge about some of the things I worry about:
    1. What if one needs to kern a composed glyph differently from its base glyph? Is that possible, or would it require an exception and a precomposed form?
    2. What if one needs alternative forms of the diacritics (e.g. for lowercase and uppercase)?
    3. Are there any pitfalls when mixing this approach with variable fonts? Is mark positioning flexible enough?
    Since I'm no more than an amateur, there are definitely problems I haven't thought about. But this, along with variable fonts, is my biggest hope (because font streaming seems a long way off).
    Although I'm only concerned with the web (there is no need to slim down desktop fonts like this), I did some tests just for fun in InDesign CC 2018 with a stub font without ccmp, and the results weren't promising:


    Hopefully Adobe will use the opportunity of having to rework their engine for variable fonts to do better here. And hopefully the other browsers will one day join Chrome…

  • My current thinking would be to try this approach: serve fonts without ccmp to Chrome and with ccmp to all other browsers (unless the ccmp rules turn out to add negligible size). The fonts wouldn't have any precomposed glyphs, unless mark positioning didn't satisfy the design requirements (yes, often in the ę scenario :)).
    In general if the font lacks precomposed characters, a fallback font or the missing glyph is used. So it remains important to include precomposed characters, as I mentioned in this article:

  • Mateusz Karpow Posts: 6
    edited June 2019
    Thank you for your article, Erwin. It helped me feel more at ease regarding pt. 2.
    In general if the font lacks precomposed characters, a fallback font or the missing glyph is used.
    My experience is that Chrome can compose/decompose correctly (as in the example above) in the absence of precomposed/decomposed forms when they are called for, even without ccmp or ghost glyphs. Other browsers require ccmp, but when it's present, they do not require precomposed forms. If I'm wrong, then the whole idea is doomed and I have just wasted a lot of your time!
    Other applications require precomposed forms, no disagreement here. I’m only talking about specialised font files created to be used exclusively in the web browser context.
  • Khaled Hosny Posts: 289
    What Chrome does is first send the text to HarfBuzz (the OpenType layout engine it uses) without checking whether the font supports the characters or not. HarfBuzz in turn will decompose the text, then recompose it, and use whatever form of a given character the font supports. Chrome will then check the HarfBuzz output, and for any character that is not supported (after HarfBuzz did its magic) it uses the next fallback font and repeats the process (more or less; there are some optimizations).
  • Khaled Hosny Posts: 289
    Other browsers (and almost any other application except LibreOffice, AFAIK) will check the font's cmap table for supported characters and switch to fallback fonts before shaping, without doing any composition/decomposition magic.
  • Khaled Hosny Posts: 289
    1. What if one needs to kern a composed glyph differently from its base glyph? Is that possible, or would it require an exception and a precomposed form?
    2. What if one needs alternative forms of the diacritics (e.g. for lowercase and uppercase)?
    3. Are there any pitfalls when mixing this approach with variable fonts? Is mark positioning flexible enough?
    1. You can do contextual pair positioning that uses the combining marks in the context (see the sketch below).
    2. You can also do contextual substitution of the mark glyph that depends on the base glyph (also sketched below).
    3. AFAIK no.
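    Rough sketches of 1 and 2, with made-up glyph names and values (an illustration rather than tested code):

    # 1. kerning that can see the combining mark (assuming the marks are
    #    classed as marks in GDEF)
    feature kern {
        # regular pairs ignore marks, so "A ogonekcomb V" gets the same -80 as "A V"
        lookupflag IgnoreMarks;
        pos A V -80;
        # a second, contextual lookup that does look at the mark and gives
        # some of that kern back when the A carries an ogonek
        lookupflag 0;
        pos A' 40 ogonekcomb V;
    } kern;

    # 2. swap in a cap-height variant of the mark after an uppercase base
    feature ccmp {
        sub [A E I O U] acutecomb' by acutecomb.case;
    } ccmp;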

  • Thomas Phinney Posts: 2,887
    1. Although one could do contextual kerning, it is complicated. Also: which type design apps support it today?

    3. I imagine Khaled is saying “no” to the pitfalls, rather than to the flexibility.

    The main pitfall is one that applies to kerning with variable fonts in general, not to your decomposed approach in particular. The problem is that interpolation is linear, and sometimes the interaction of two shapes would call for kerning values that behave quite unlike linear interpolation.

    For example, consider the “To” combination in a sans serif, one in which at the heaviest weights the “o” becomes unable to tuck under the T, and this transition occurs (as one would expect) rather abruptly.

    If you kern the bold and the light “To” correctly, the bold with its minimal kerning will have undue influence on in-between situations, which do not have the problematic shape interaction. So they end up under-kerned.

    There is no easy solution for this, that I know of. :(
  • Mark Simonson Posts: 1,734
    You could have an alternate /o (with different kerning) that gets swapped for the normal one at a certain weight value, similar to the way dollar signs are handled. (Debatable whether that could be called easy...)
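    With the newer conditionset/variation extension to the feature syntax (supported by recent fontTools feaLib; I'm writing this from memory, so treat the exact syntax as an assumption), that might look something like:

    conditionset TooBoldToTuck {
        wght 700 900;   # the part of the weight axis where the o no longer tucks
    } TooBoldToTuck;

    variation rvrn TooBoldToTuck {
        # o.notuck has the same outlines as o, only different kerning
        sub o by o.notuck;
    } rvrn;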
  • Thank you! And thanks to HarfBuzz or, rather, to all its contributors! (I even wanted it to be more aggressive in its magic).
    When/if this approach ever gets implemented/tried, I hope it's going to be possible to work around the other browsers' deficiencies using ccmp and ghost glyphs à la ttfdiet. If not, users of these browsers will have to suffer unnecessarily bloated fonts. I'm not aware of this having been tried anywhere in a production environment, though (?).
    I do hope I haven't derailed the discussion too much already, @André G. Isaak, but the issue of kerning in variable fonts sounds like a deficiency of the spec, one that maybe non-linear interpolation could help solve, if I understand correctly?
    I really wouldn't know, but vaguely remembering the Font Wars, we seem to be in a much better situation now when it comes to spec evolution. The big players have all got together behind variable fonts and are introducing support relatively quickly, mainly because of perceived performance opportunities. If the spec is still too constraining on the design/production side, and this slows down the performance dream, maybe it isn't too naïve to be optimistic that it can evolve quickly? I, for one, am happy to be naïve.
  • Thomas Phinney Posts: 2,887
    edited June 2019
    Given the nature of various storage optimizations available for OpenType, I do not expect that getting rid of precomposed accented characters would save very much file size for the finished font, if it is intelligently constructed/compiled.

    I would be curious to hear just how much space you save by using your approach, in absolute and percentage terms.
  • Thomas Phinney Posts: 2,887
    You could have an alternate /o (with different kerning) that gets swapped for the normal one at a certain weight value, similar to the way dollar signs are handled. (Debatable whether that could be called easy...)
    Excellent thought. The alternate “o” could have identical outlines, just different kerning. (Whether it is “easy” is another question, but “best available option” is good enough for me!)
  • Mark Simonson Posts: 1,734
    The alternate “o” could have identical outlines, just different kerning. 
    Right, that's what I meant. :)
  • Kent Lew Posts: 937
    The alternate “o” could have identical outlines, just different kerning.
    I believe that’s exactly what DJR did with his Bild variable font. There’s a brief explanation at the end of this blog post.
  • Thomas Phinney said:
    I would be curious to hear just how much space you save by using your approach, in absolute and percentage terms.
    These are all just tests… One of the authors of the idea cites 10% savings. Playing with it myself, that drops to 6% in woff2. But when I remove ccmp and the ghost glyphs, the woff2 savings are back to 10% for the same example font. That's very substantial. On a site like the Guardian, that's one or two whole new styles (fonts) for "free". One less faux style. Or, possibly, Vietnamese support without sacrificing site performance.
    (…) if it is intelligently constructed/compiled.
    Absolutely! Every little counts!
  • Thomas Phinney Posts: 2,887
    If there are a lot of font styles involved, I imagine you could see pretty major savings by moving to variable fonts—at least for those browser versions that support them.