ccmp and decomposition
André G. Isaak
I'm wondering if anyone here has designed a font in which accented glyphs are decomposed into base glyphs and combining marks in the ccmp feature.
e.g.
sub aacute by a acutecomb;
I'm thinking here of, e.g., a script face in which each base glyph has numerous different contextual forms where it might be simpler to ignore most marks when constructing one's contextual rules.
Will most software be able to deal with this, or would this approach create problems in some applications?
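A minimal sketch of what I have in mind, with made-up glyph names; the decomposition happens in ccmp, and the contextual rules then skip the marks:

feature ccmp {
    # Decompose precomposed glyphs up front, so the contextual
    # rules below only ever have to match base glyphs.
    sub aacute by a acutecomb;
    sub eacute by e acutecomb;
} ccmp;

feature calt {
    # With the combining marks classed as marks in GDEF,
    # IgnoreMarks makes the shaper skip them when matching the
    # context, so "a acutecomb n" matches the same rule as "a n".
    lookupflag IgnoreMarks;
    sub a' n by a.beforen;
} calt;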
Comments
-
I did this contextually in the ccmp feature for the Cambria and Brill fonts, so diacritic glyphs decompose when followed by a combining mark. This is to avoid having to provide mark-to-diacritic GPOS in addition to mark-to-base and mark-to-mark. It works beautifully in most places, but there's a persistent bug in Adobe's layout engine that causes it to fail in InDesign.
Your script face example is closer to what we do in complex Arabic fonts, in which we decompose archigraphemic letter shapes and their differentiating dots so that letter shape contextual substitutions can be handled on a small subset of glyphs.
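For the curious, a minimal sketch of the contextual decomposition (glyph names illustrative): the precomposed glyph decomposes only when another combining mark follows, and the mark anchors take over from there.

feature ccmp {
    @PRECOMPOSED = [aacute agrave];
    @COMBINING_MARKS = [acutecomb gravecomb macroncomb];

    lookup DECOMPOSE {
        sub aacute by a acutecomb;
        sub agrave by a gravecomb;
    } DECOMPOSE;

    # Decompose only before a combining mark, so mark-to-base and
    # mark-to-mark positioning can build the whole stack.
    sub @PRECOMPOSED' lookup DECOMPOSE @COMBINING_MARKS;
} ccmp;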
-
John Hudson said: This is to avoid having to provide mark-to-diacritic GPOS in addition to mark-to-base and mark-to-mark.

This is a reasonable idea, not only for the sake of feature code length but also font size: alternates for all the diacritics blow the final file up quite a bit. It's sad that InDesign trips on this.

I thought what was happening was that the decomposed pair got composed again before the substitution rule had a chance to run, so I tried to prevent the re-composition by inserting an extra character between the base and the mark (a combining grapheme joiner or a custom zero-width character). To no avail, so I suppose my hypothesis was wrong and the mechanics of this issue are different.

But I think the issue only concerns the default Adobe composers. After switching to one of the World-Ready composers, the features seem to work. (By the way, are there any pitfalls to using a World-Ready composer, apart from having to select it manually?)
-
First of all, it's my first post here, so hello and thanks to everybody for such a great resource!

I decided to hop on this post not because I have any practical experience with this approach, but because I've long seen it as a way to optimise font file size for the web, e.g. to be able to provide wider language coverage without compromising site performance (pt. 14 here). https://github.com/twardoch/ttfdiet/ was the first place that got me thinking. Since then, I've discovered that for web browsers, blanking glyphs doesn't seem to be needed, and that Chrome provides composition and decomposition on the fly without needing ccmp. No other browsers are that clever, unfortunately.

When U+0119 is requested but not available in the font, Chrome will render ę using U+0065 + U+0328 (provided they are available). Conversely, when U+0065 + U+0328 is requested but U+0328 is unavailable, Chrome will render ę using U+0119 (provided it's available). I haven't tested all scenarios, especially mixing multiple precomposed glyphs, combining marks and spacing marks, but I suppose Chrome just applies whatever rules Unicode provides that cover all of it (?). All other browsers require ccmp to provide the above flexibility.

My current thinking would be to try this approach: serving fonts without ccmp to Chrome and with ccmp to all other browsers (unless the ccmp data size proves negligible). The fonts wouldn't have any precomposed glyphs, unless mark positioning didn't satisfy the design requirements (yes, as often happens in the ę scenario).

After this much too long introduction, I wanted to ask you, who have practical knowledge, about some of the things that worry me:
- What if one needed to kern a composed glyph differently from its base glyph? Is that possible, or would it require an exception and a precomposed form?
- What if one needed alternative forms of the diacritics (e.g. for lowercase and uppercase)?
- Are there any pitfalls when mixing this approach with variable fonts? Is mark positioning flexible enough?
Since I'm no more than an amateur, there are definitely problems I haven't thought about. But this, along with variable fonts, is my biggest hope (because font streaming seems a long way off).

Although I'm only concerned with the web (there's no need to slim down desktop fonts like this), I did some tests in InDesign CC 2018 just for fun, with a stub font without ccmp, and the results weren't promising.

Hopefully Adobe will use the opportunity of having to rework their engine for variable fonts to do better. And may the other browsers one day join Chrome…
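To make the idea concrete, here's my understanding of the ttfdiet trick in feature code; just a sketch, with made-up glyph names. The cmap still maps U+0119, but to a blank, zero-width "ghost" glyph, which ccmp immediately replaces with the real base-plus-mark sequence:

feature ccmp {
    # uni0119 is a blank, zero-width "ghost" glyph kept only so the
    # cmap still claims support for U+0119; browsers that check the
    # cmap before shaping then won't switch to a fallback font.
    sub uni0119 by e ogonekcomb;
} ccmp;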
-
Mateusz Karpow said: My current thinking would be to try this approach: serving fonts without ccmp to Chrome and with ccmp to all other browsers (unless the ccmp data size proves negligible). The fonts wouldn't have any precomposed glyphs, unless mark positioning didn't satisfy the design requirements (yes, as often happens in the ę scenario).

In general, if the font lacks precomposed characters, a fallback font or the missing glyph is used. So it remains important to include precomposed characters, as I mentioned in this article.
-
Thank you for your article, Erwin. It helped me feel more at ease regarding pt. 2.

Erwin Denissen said: In general, if the font lacks precomposed characters, a fallback font or the missing glyph is used.

Other applications require precomposed forms, no disagreement here. I'm only talking about specialised font files created to be used exclusively in the web browser context.
-
What Chrome does is first send the text to HarfBuzz (the OpenType layout engine it uses) without checking whether the font supports the characters or not. HarfBuzz in turn will decompose the text, then recompose it, and use whatever form of a given character the font supports. Chrome then checks the HarfBuzz output, and for any character that is still not supported (after HarfBuzz did its magic) it uses the next fallback font and repeats the process (more or less; there are some optimizations).
-
Other browsers (and almost any other application except LibreOffice, AFAIK) will check the font's cmap table for supported characters and pick fallback fonts before shaping, without doing any composition/decomposition magic.
-
Mateusz Karpow said:
- What if one needed to kern a composed glyph differently from its base glyph? Is that possible, or would it require an exception and a precomposed form?
- What if one needed alternative forms of the diacritics (e.g. for lowercase and uppercase)?
- Are there any pitfalls when mixing this approach with variable fonts? Is mark positioning flexible enough?
- You can do contextual pair positioning that uses the combining marks in the context (see the first sketch below).
- You can also do contextual substitution of the mark glyph that depends on the base glyph (second sketch below).
- AFAIK, no.
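Rough sketches of 1 and 2, with hypothetical glyph names and values:

# 1. Contextual positioning: the combining mark is part of the
#    match, so the decomposed pair gets its own kern value. The
#    more specific rule comes first, because within a lookup the
#    first matching rule wins.
feature kern {
    pos T' -10 o ogonekcomb;  # T + o + ogonek: gentler kern
    pos T' -40 o;             # plain T + o: full kern
} kern;

# 2. Contextual substitution of the mark, driven by the base.
lookup CAP_MARKS {
    sub acutecomb by acutecomb.case;
    sub gravecomb by gravecomb.case;
} CAP_MARKS;

feature ccmp {
    @UPPERCASE = [A-Z];
    # Swap in flatter, cap-height marks after an uppercase base.
    sub @UPPERCASE [acutecomb gravecomb]' lookup CAP_MARKS;
} ccmp;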
-
1. Although one could do contextual kerning, it is complicated. Also: which type design apps support it today?
3. I imagine Khaled is saying “no” to the pitfalls, rather than to the flexibility.
The main pitfall is one that applies to kerning with variable fonts in general, not to your decomposed approach in particular. The problem is that interpolation is linear, and sometimes the interaction of two shapes would benefit from something quite unlike linear interpolation of the kerning values.
For example, consider the “To” combination in a sans serif, one in which at the heaviest weights the “o” becomes unable to tuck under the T, and this transition occurs (as one would expect) rather abruptly.
If you kern the bold and the light “To” correctly, the bold, with its minimal kerning, will have undue influence on the in-between weights, which do not have the problematic shape interaction, so they end up under-kerned. (Say the light kern is -80 and the bold is -10: a middle weight gets about -45 by linear interpolation, even though its “o” can still tuck under the T and wants something closer to -70.)
There is no easy solution for this that I know of.
-
You could have an alternate /o (with different kerning) that gets swapped for the normal one at a certain weight value, similar to the way dollar signs are handled. (Debatable whether that could be called easy...)
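In feature code, that swap could look something like this, using the proposed conditionset/variation syntax for variable fonts (implemented in feaLib, as far as I know; names and axis values made up):

conditionset Heavy {
    wght 600 900;
} Heavy;

variation rvrn Heavy {
    # o.tight has outlines identical to o, only different kerning;
    # above wght 600 it replaces o before kerning is applied.
    sub o by o.tight;
} rvrn;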
-
Thank you! And thanks to HarfBuzz, or rather to all its contributors! (I even wanted it to be more aggressive in its magic.)

When/if this approach ever gets implemented/tried, I hope it will be possible to work around the other browsers' deficiencies using ccmp and ghost glyphs à la ttfdiet. If not, users of those browsers will have to suffer unnecessarily bloated fonts. I'm not aware of this being tried anywhere in a production environment, though (?).
I do hope I haven't derailed the discussion too much already, @André G. Isaak, but the issue of kerning in variable fonts sounds like a deficiency of the spec; one that maybe non-linear interpolation could help solve, if I understand correctly?

I really wouldn't know, but having (vaguely) remembered the Font Wars, we seem to be in a much better situation now when it comes to spec evolution. The big players all got together behind it and are introducing support relatively quickly, mainly because of perceived performance opportunities. If the spec is still too constraining on the design/production side, and this slows the performance dream, maybe it isn't too naïve to be optimistic that it can evolve quickly? I, for one, am happy to be naïve.
-
Given the nature of various storage optimizations available for OpenType, I do not expect that getting rid of precomposed accented characters would save very much file size for the finished font, if it is intelligently constructed/compiled.
I would be curious to hear just how much space you save by using your approach, in absolute and percentage terms.
-
Mark Simonson said: You could have an alternate /o (with different kerning) that gets swapped for the normal one at a certain weight value, similar to the way dollar signs are handled. (Debatable whether that could be called easy...)

The alternate “o” could have identical outlines, just different kerning.
-
The alternate “o” could have identical outlines, just different kerning.

Right, that's what I meant.
-
The alternate “o” could have identical outlines, just different kerning.

I believe that's exactly what DJR did with his Bild variable font. There's a brief explanation at the end of this blog post.
-
Thomas Phinney said:
I would be curious to hear just how much space you save by using your approach, in absolute and percentage terms.

These are all just tests… One of the authors of the idea cites 10% savings. Playing with it, the savings drop to 6% in woff2. But when I remove ccmp and the ghost glyphs, the woff2 savings are back to 10% for the same example font. That's very substantial. On a site like the Guardian, that's one or two whole new styles (fonts) for “free”. One less faux style. Or, possibly, Vietnamese support without sacrificing site performance.

Thomas Phinney said: (…) if it is intelligently constructed/compiled.

Absolutely! Every little counts!
-
If there are a lot of font styles involved, I imagine you could see pretty major savings by moving to variable fonts—at least for those browser versions that support them.