Font production frustrations and solutions

Nick Shinn · March 2019

I’m in the habit of using the full-width accents for composites. After all, they are the default for accents.

Thomas Phinney · March 2019

Nick Shinn said:

I’m in the habit of using the full-width accents for composites. After all, they are the default for accents.

Not sure what you mean by “the default.”

They are characters that are more commonly in old legacy encodings and codepages, sure. Like you, I have long been in the habit of using them in my pre-built composite characters. The (only?) reason being that if we go back 25 years, those were the characters we had in our fonts, typically, rather than the (zero-width) combining accents. So I started out using those.

But it is quite clear from Unicode which characters are combining and which are not. If you make a new-fangled font that relies on dynamically combining accents with base characters, it surely makes sense to use... the combining accents, and not the stand-alone spacing accents.
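As a minimal sketch of that distinction, assuming only Python's standard `unicodedata` module, the spacing and combining accents can be told apart by their general category and canonical combining class. The helper name `describe` is made up for this illustration.

```python
import unicodedata

def describe(ch):
    """Return (name, general category, canonical combining class) for one character."""
    return (unicodedata.name(ch), unicodedata.category(ch), unicodedata.combining(ch))

# Spacing accents (legacy encodings) next to their combining counterparts.
for ch in ("\u00B4", "\u0301", "\u02D9", "\u0307"):
    name, category, ccc = describe(ch)
    # Categories starting with "M" are marks; "Sk" is a spacing modifier symbol.
    kind = "combining" if category.startswith("M") else "spacing"
    print(f"U+{ord(ch):04X} {name}: {category}, ccc={ccc} -> {kind}")
```

Here U+00B4 and U+02D9 report category `Sk` with combining class 0, while U+0301 and U+0307 report `Mn` with class 230.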

Andreas Stötzner · March 2019

Good heaven’s grief, Word, Word and always Word…

I always dissolve all components prior to generating the font.

Nick Shinn · March 2019

Thomas Phinney said:
Not sure what you mean by “the default.”

They are the ones that appear in the standard encodings in FontLab 5.1.4, which is the latest version I'm using.

AbrahamLee · March 2019

Thanks everyone for the comments. It’s been very insightful.

Any others you’ve learned as operating systems and apps have evolved?

LeMo aka PatternMan aka Frank E Blokland · March 2019

It is good to read that OTM is considered quite useful for post-production processing. We made a start with posting testimonials from OTM users here. Any additional testimonials are very welcome, of course.

Thomas Phinney · March 2019

Nick Shinn said:

Thomas Phinney said:
Not sure what you mean by “the default.”

They are the ones that appear in the standard encodings in FontLab 5.1.4, which is the latest version I'm using.

Sure, and you should quite possibly include them because they are in those old encodings, and they still have some relevance for that reason. FontLab VI still has those old encodings. But if you are going to make a font to more modern specs, and use dynamically combined accents, using the diacritic slots intended for that purpose makes sense.

Mark Simonson · March 2019

Does FLS 5.1.4 even support combining accents and mark-to-base/mark-to-mark?

Thomas Phinney · April 2019

Not directly. You need this: https://github.com/adobe-type-tools/fontlab-scripts/blob/master/Anchors/MarkFeatureGenerator.py

Hat tip to @Adam Twardoch for pointing us at that.

John Savard · April 2019

AbrahamLee said:

Thanks, John. When you say “make fonts to spec”, are you referring to industry font specs (e.g., the OpenType spec) or customer specs/requirements?

I thought it was obvious from the context that he meant the former. Of course, while he should make fonts to spec for general sale, the customers do not benefit if the font doesn't work on the tool they need it to work on.

I've noticed that a few free fonts don't work in Microsoft's WordPad. The reason isn't that the fonts have done something nonstandard, but simply that WordPad supports only a limited part of the standard.

So Goudy Bookletter 1911 doesn't work because it doesn't include the default Windows encoding (as I learned from Microsoft Write, on which it also didn't work without some workaround effort). STIX apparently doesn't work because its outlines are Adobe outlines.

If a font complies strictly with the OpenType spec, is made to the lowest common denominator using only basic features supported by pretty much every application that can use fonts at all, and still doesn't work in a particular application, then in general there's not much that can reasonably be done about it. Making it work for that specific application is a legitimate extra-cost service. I don't know how many applications out there are unable to make use of fonts that work on WordPad, but that is definitely the application's fault.

I'm not saying people shouldn't make fonts designed to use advanced features. Obviously such fonts will usually be used on well-known design programs which support those features. And I presume that most of those programs aren't restricted to nonstandard fonts. I wonder what the offenders are. And how the giant font companies like Monotype Imaging, with their larger resources, manage to deal with this.

Nick Shinn said:
I thought I would be really clever and make the /i character as a composite from dotless i and dot accent. But of course, Word doesn’t like that!

Since the character "i" is a part of the normal ASCII character set, it will be used frequently. This would cause printing to consume extra machine cycles! (That is, even in the case where it worked perfectly.)

Kent Lew said:

Given that the dot accent outline is being referenced as a component, I don’t understand how its advance width would come into play during rendering. But it does seem that this could be the source of the bug. FWIW, I’ve only seen examples with the zero-width dotaccentcmb U+0307 used for this type of construction. (And I’ve never tested it in Word on Windows.)

What, you mean the rule for composite characters isn't that the advance width of the result is the greatest of the advance widths of its components? (Plus, for each component, any displacement applied to its position in the compound character.)

Kent Lew · April 2019

In the discussion of “make the /i character as a composite from dotless i and dot accent” I think perhaps we need to distinguish between

a) making the /i glyph from components of /dotlessi and /dotaccentcmb, and

b) generating an /i character on-the-fly via {ccmp} & {mark} or some other mechanism to combine /dotlessi and /dotaccentcmb using GPOS.

In the latter case, yes, advance widths could conceivably come into play in some rendering environment due to different levels of support or interpretation for {mark} and GPOS.

But I took Nick to mean the first case, given his use of the term “component”, which I take to be a reference within the glyph outline description, and which is what I have been discussing.

Thomas Phinney · April 2019

Ah, I took Nick to mean the second case (GPOS). The first case, of components, is so very commonplace that I would be quite surprised if this problem existed and we were unaware of it. I have certainly built many fonts that way.

It will be enlightening to learn/confirm which he meant!

Erwin Denissen · April 2019

John Savard said:

So Goudy Bookletter 1911 doesn't work because it doesn't include the default Windows encoding (as I learned from Microsoft Write, on which it also didn't work without some workaround effort). STIX apparently doesn't work because its outlines are Adobe outlines.

Some problems can be easily avoided by using a more recent version of your font editor. I'm not saying that solves all technical issues, and sometimes a font editor does more harm than good, so always keep testing your fonts.

I only took a quick look at Goudy Bookletter 1911, and it seems to have valid mappings, so I'm not sure what causes the problem. Still, just opening it with FontCreator and then exporting it again fixed the issue.

STIX has a corrupt table. Again, just opening it with FontCreator and then exporting it fixed the problem.

John Savard said:

Nick Shinn said:
I thought I would be really clever and make the /i character as a composite from dotless i and dot accent. But of course, Word doesn’t like that!

Since the character "i" is a part of the normal ASCII character set, it will be used frequently. This would cause printing to consume extra machine cycles! (That is, even in the case where it worked perfectly.)

Using a simple or a composite glyph shouldn't make a difference, and the extra CPU cycles are negligible. In an earlier post I mentioned that I made such a font and it worked perfectly in Word. I think it is too easy to blame Word. I would really like to see a font with such a composite that fails to work with Word...

Thomas Phinney · April 2019

I suspect that one issue with WordPad is that it uses the GDI+ API, which IIRC does not support OpenType CFF (.otf) fonts. TrueType (.ttf) only for GDI+, I think.
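When testing a suspicion like this, it helps to confirm which outline flavor a font file actually carries. The sketch below reads the four-byte sfnt version tag at the start of the file: `OTTO` marks CFF-flavored OpenType and `0x00010000` (or Apple's `true`) marks TrueType outlines. The function names are hypothetical, and the check says nothing about what GDI+ or WordPad actually accept.

```python
def outline_flavor(header: bytes) -> str:
    """Classify a font by the first four bytes of its sfnt header."""
    tag = header[:4]
    if tag == b"OTTO":
        return "CFF"          # OpenType with CFF (or CFF2) outlines
    if tag in (b"\x00\x01\x00\x00", b"true"):
        return "TrueType"     # TrueType outlines (glyf table)
    if tag == b"ttcf":
        return "collection"   # TTC/OTC: each member font has its own header
    return "unknown"

def flavor_of_file(path: str) -> str:
    """Convenience wrapper: read only the header bytes from a font file."""
    with open(path, "rb") as handle:
        return outline_flavor(handle.read(4))
```

The file extension alone is not a reliable indicator, so a check on the header is the safer way to know which kind of font was tested.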

Erwin Denissen · April 2019

Thomas Phinney said:

I suspect that one issue with WordPad is that it uses the GDI+ API, which IIRC does not support OpenType CFF (.otf) fonts. TrueType (.ttf) only for GDI+, I think.

WordPad has no issues with the CFF based font I generated out of Goudy Bookletter 1911 with FontCreator.

Kent Lew · April 2019

Thomas Phinney said:
Ah, I took Nick to mean the second case (GPOS).

I wondered if that might be so in several of the responses here.

That seems like a very cumbersome approach to me. First of all, it’s very unlikely that any user (or keyboard) is inputting u+0131 plus u+0307 rather than just typing an “i” u+0069.

So, you still need to have some glyph present in the font to map 0x0069 to in the <cmap> table. (Otherwise, the font won’t respond when an “i” is typed, right?)

Why not have that be an /i glyph? You could certainly construct that glyph from components, i.e., my first case above.

But, in order to implement the second case GPOS solution, you’d then need to also have a GSUB Type 2 one-to-many decomposition to go from the /i glyph to /dotlessi plus /dotaccentcmb, presumably registered to {ccmp}. And then you’d need a {mark} GPOS to position the combining dot accent over the base.

Seems too clever by half.
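To make the moving parts of that second case concrete, here is a toy model in plain Python: a one-to-many substitution standing in for the {ccmp} decomposition, followed by anchor-based attachment standing in for {mark}. All glyph names, advance widths, and anchor coordinates are invented for illustration, and a real shaping engine does considerably more than this.

```python
# Hypothetical font data, chosen only for illustration.
CCMP = {"i": ["dotlessi", "dotaccentcmb"]}            # GSUB Type 2: one-to-many
ADVANCE = {"n": 550, "dotlessi": 250, "dotaccentcmb": 0}
BASE_ANCHOR = {"dotlessi": (125, 480)}                # "top" anchor on the base
MARK_ANCHOR = {"dotaccentcmb": (0, 480)}              # attachment point on the mark

def decompose(glyphs):
    """Apply the one-to-many substitutions (the {ccmp} step)."""
    out = []
    for glyph in glyphs:
        out.extend(CCMP.get(glyph, [glyph]))
    return out

def position(glyphs):
    """Place glyphs left to right, attaching marks to the preceding base."""
    placed = []
    pen_x = 0
    base = None  # (glyph name, x origin) of the most recent base glyph
    for glyph in glyphs:
        if glyph in MARK_ANCHOR and base is not None and base[0] in BASE_ANCHOR:
            bx, by = BASE_ANCHOR[base[0]]
            mx, my = MARK_ANCHOR[glyph]
            # Align the mark's anchor with the base's anchor.
            placed.append((glyph, base[1] + bx - mx, by - my))
        else:
            placed.append((glyph, pen_x, 0))
            base = (glyph, pen_x)
        pen_x += ADVANCE[glyph]
    return placed, pen_x

print(position(decompose(["n", "i"])))
```

With a zero-width mark the total advance is unaffected by the dot. If the mark had a nonzero advance, or the renderer skipped the attachment step, the result would differ, which is where advance widths could come into play.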

Thomas Phinney · April 2019

Kent Lew said:

Ah, I took Nick to mean the second case (GPOS).
I wondered if that might be so in several of the responses here.
That seems like a very cumbersome approach to me. First of all, it’s very unlikely that any user (or keyboard) is inputting u+0131 plus u+0307 rather than just typing an “i” u+0069.
So, you still need to have some glyph present in the font to map 0x0069 to in the <cmap> table. (Otherwise, the font won’t respond when an “i” is typed, right?)

Nope. It’s not “the font” responding when you type something. It’s an app, and it is using either its own or system-level services for Unicode support.

Unicode specifies that for those precomposed characters which correspond to certain sequences of combining characters, they are “canonically equivalent” and should be treated the same way. In the absence of the precomposed characters, apps/systems should look for the combining characters. (In theory, if they are smart enough. Which I say because Unicode compliance in the real world is not just a binary switch you flip.)

In heavily Unicode-savvy environments, this means that if the font doesn’t have a precomposed, say, eacute, it will then instead be asked “how about e + combining acute?” because that is canonically equivalent. If it has that, it falls back to it. If not, it *might* (or might not) try e + spacing acute as a next fallback.

(There are some extra complexities here with i vs dotless i as a component, so I am using e so as to simplify the example.)
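The fallback described above can be sketched with Python's standard `unicodedata` module. The `resolve` function and the idea of passing the font's mapped code points as a plain set are assumptions made for this illustration; real apps and system services implement this in their own ways, and not all of them do it.

```python
import unicodedata

def resolve(char, cmap):
    """Return the code points to request from a font for one character.

    cmap is a set of code points the font maps (a stand-in for its cmap table).
    Tries the precomposed character first, then its canonical decomposition.
    Returns None if neither is fully covered.
    """
    if ord(char) in cmap:
        return [ord(char)]
    decomposed = [ord(c) for c in unicodedata.normalize("NFD", char)]
    if all(cp in cmap for cp in decomposed):
        return decomposed
    return None

E_ACUTE = "\u00E9"
print(resolve(E_ACUTE, {0x00E9}))          # precomposed glyph is available
print(resolve(E_ACUTE, {0x0065, 0x0301}))  # falls back to e + combining acute
print(resolve(E_ACUTE, {0x0065}))          # no usable fallback in this sketch
```

The further fallback to e plus the spacing acute is left out, since it is not a canonical equivalence and, as noted, may or may not be attempted.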

Ray Larabie · April 2019

I've been making my i and j using components for dotlessi and dotlessj with dot accents in my fonts for a while. They appear normally in Wordpad for Windows 10. Here's Bitcrusher at 1001fonts.com if anyone wants to test it.

Cory Maylett · April 2019

Thomas Phinney said:

Sure, and you should quite possibly include them because they are in those old encodings, and they still have some relevance for that reason. FontLab VI still has those old encodings. But if you are going to make a font to more modern specs, and use dynamically combined accents, using the diacritic slots intended for that purpose makes sense.

Just to clarify, are you suggesting it might be a good idea to use both in the same font, as in upper and lowercase diacritics plus combining versions of each? That's four separate slots for every accent character.

notdef · April 2019

@Cory Maylett The spacing diacritics are used for
a) talking about diacritical marks, such as the dieresis: ¨ (though space/dotted circle+combining diacritics is a good alternative)
b) displayed as “dead key” in some circumstances (On my Mac I type ¨+e to get ë, and the computer briefly displays the spacing dieresis before I hit the e)
c) punctuation symbols/letters, either by default or as a fallback option, in certain languages (Tongan use of the ´, sometimes rendered after the vowel, sometimes with modified fonts shifting the acute to the right comes to mind.)

There are also reasons for including multiple versions of the combining diacritical marks:
a) they may require or benefit from different shapes above capitals and lowercase – also above/below small caps, superiors, etc.
b) TrueType instructions controlling their position and rendering may require separate glyphs for each.
c) broad, and well executed, language support benefits from narrow and wide variations above certain base letterforms.

Kent Lew · April 2019

Thomas Phinney said:
Nope. It’s not “the font” responding when you type something. It’s an app, and it is using either its own or system-level services for Unicode support.

Okay, fair enough. As the app is processing the Unicode via its own or system-level services, it’s going to be checking the <cmap> table in the font to see if there is a matching key to map the codepoint to a glyph (which I characterized as the font “responding”).

And yes, there are Unicode-defined canonically equivalent options or fallbacks that can be tried if certain accented characters are not present in the <cmap>. So a font can support “é” even if the <cmap> doesn’t have an entry for 0x00E9, if instead it has /e and a combining acute (u0301).

That’s a good clarification for the general understanding. Agreed.

But I was talking specifically about “i” here. Is that really canonically equivalent to u0131+u0307?

In heavily Unicode-savvy environments, if the font doesn’t have a precomposed “i”, will it then instead be asked “how about dotlessi + combining dotaccent?”

I’m asking seriously, not just rhetorically (albeit perhaps a bit skeptically).

I’ve just assumed that since “i” is such a core element of the Latin orthography that it is not treated by Unicode as a precomposed form of dotlessi plus dotaccent. (Except perhaps within the context of Turkish language — but even then, I believe u0069 is canonically equivalent to u0069 + u0307.)

As you say, “i” is a unique case.

notdef · April 2019

According to FileFormat.info (I’d go directly to the source, but I don’t know where exactly Unicode lists this information), /i/ is not canonically equivalent with dotlessi + combining dot above.

George Thomas · April 2019

They aren't equivalent. The Unicode 11 book on page 291 says this:

"Diacritics on i and j. A dotted (normal) i or j followed by some common nonspacing marks above loses the dot in rendering. Thus, in the word naïve, the ï could be spelled with i + diaeresis. A dotted-i is not equivalent to a Turkish dotless-i + overdot, nor are other cases of accented dotted-i equivalent to accented dotless-i (for example, i + ¨ is not equivalent to dotless-i + ¨). The same pattern is used for j. Dotless-j is used in the Landsmålsalfabet, where it does not have a case pair.

To express the forms sometimes used in the Baltic (where the dot is retained under a top accent in dictionaries), use i + overdot + accent (see Figure 7-2).

All characters that use their dot in this manner have the Soft_Dotted property in Unicode."
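The non-equivalence quoted above can be checked with Python's standard `unicodedata` module by comparing canonical (NFD) normal forms. The helper name `canonically_equivalent` is made up for this sketch. The module does not expose the Soft_Dotted property, so that part is not shown.

```python
import unicodedata

DOTLESS_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I
DOT_ABOVE = "\u0307"   # COMBINING DOT ABOVE
DIAERESIS = "\u0308"   # COMBINING DIAERESIS
I_DIAERESIS = "\u00EF" # LATIN SMALL LETTER I WITH DIAERESIS

def canonically_equivalent(a, b):
    """Two strings are canonically equivalent if their NFD forms match."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(canonically_equivalent("i", DOTLESS_I + DOT_ABOVE))          # False
print(canonically_equivalent(I_DIAERESIS, "i" + DIAERESIS))        # True
print(canonically_equivalent(I_DIAERESIS, DOTLESS_I + DIAERESIS))  # False
print(repr(unicodedata.decomposition("i")))                        # '' (no decomposition)
```

So an app that follows canonical equivalence has no reason to ask a font for dotless i plus combining dot above when it needs a plain "i".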

Kent Lew · April 2019

Thanks for finding that reference, George. I quickly perused my Unicode 8 document and didn’t find it this morning. But now I see what you quoted, in Chapter 7.1. As I suspected.

Michael Rafailyk · October 2021

Andreas Stötzner said:
I always dissolve all components prior to generating the font.

I've researched system fonts and some of them leave components in production but some don't.
Composite diacritics: Arial, Baskerville, Charter, Cochin, Georgia, Helvetica, Hoefler, Lucida Grande, Menlo, New York, Palatino, SF, Tahoma, Times, Verdana, Zapfino.

Flat diacritics without components: Bodoni, Copperplate, Didot, Futura.

@John Hudson Are there any limitations of the OpenType standard in the use of components in the production version? In-app support, etc.

Michael Rafailyk · October 2021

Thomas Phinney said:
But it is quite clear from Unicode which characters are combining and which are not. If you make a new-fangled font that relies on dynamically combining accents with base characters, it surely makes sense to use... the combining accents, and not the stand-alone spacing accents.

@Thomas Phinney It makes sense, but I noticed that FontLab replaces combining marks with non-combining ones on export.
I generated "A+acutecomb" (0301 acutecomb with 0 width) and got a glyph built from these two components, as expected. But after export I found that Aacute contains the components "A+acute" rather than "A+acutecomb", which is not what I expected. A quick inspection of some system fonts with composite diacritics showed the same everywhere: Aacute contains A+acute, etc.
Is that OK?

John Hudson · October 2021

Michael Rafailyk said:
@John Hudson Are there any limitations of the OpenType standard in the use of components in the production version?

Composites are a TrueType mechanism (CFF fonts are decomposed, but may use subroutines to compress storage of identical outline segments).

I am not aware of any issues with using simple components in TTFs, e.g. base + mark precomposed diacritic glyphs. But the TrueType spec also allows for component transforms, i.e. scaled, rotated and flipped components, and these do have longstanding associated issues, and we always avoid them. I don’t recall specifics of the issues, i.e. what software was/is affected, only that Microsoft advised against using component transforms in fonts.

I also prefer to avoid nested components, although that is mostly due to wanting to simplify composite management in my sources. In the spec, it is possible to make a component of a composite, e.g. /aringacute/ could be made of components /aring/+/acutecomb/ (where /aring/ is itself a composite of /a/+/ringcomb/), but I prefer a composite of /a/+/ringcomb/+/acutecomb/.
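The difference between the nested and flat constructions can be sketched with a toy glyph table in plain Python. The data layout, the offsets, and the function names are invented for this illustration and do not correspond to any font editor's API. Only translation offsets are modelled, in keeping with the advice to avoid component transforms.

```python
# Hypothetical sources: each glyph maps to a list of (component, dx, dy).
# An empty list means the glyph holds its own outlines.
GLYPHS = {
    "a": [],
    "ringcomb": [],
    "acutecomb": [],
    "aring": [("a", 0, 0), ("ringcomb", 0, 0)],
    "aringacute": [("aring", 0, 0), ("acutecomb", 0, 180)],  # nested composite
}

def is_nested(name, glyphs):
    """True if any component of this glyph is itself a composite."""
    return any(glyphs[child] for child, _, _ in glyphs[name])

def flatten(name, glyphs, dx=0, dy=0):
    """Expand nested components into a flat list of simple-glyph references.

    Offsets accumulate down the chain. A simple glyph yields a single
    reference to itself.
    """
    components = glyphs[name]
    if not components:
        return [(name, dx, dy)]
    flat = []
    for child, cx, cy in components:
        flat.extend(flatten(child, glyphs, dx + cx, dy + cy))
    return flat

for name in GLYPHS:
    if GLYPHS[name] and is_nested(name, GLYPHS):
        print(name, "->", flatten(name, GLYPHS))
```

A check like `is_nested` run over the sources is one way to notice when nested composites have crept in.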

[FontLab Studio 5 did not support nested composites, which suited me fine. FontLab 7 does, and I am wishing for a preference setting to never use them. It is a pain in the neck trying to manage composites and find that some nested ones have crept in because of FL7’s default behaviour]

John Hudson · October 2021

Michael Rafailyk said:
I generated "A+acutecomb" (0301 acutecomb with 0 width) and got a glyph built from these two components, as expected. But after export I found that Aacute contains the components "A+acute" rather than "A+acutecomb", which is not what I expected. A quick inspection of some system fonts with composite diacritics showed the same everywhere: Aacute contains A+acute, etc.

Is your /acutecomb/ glyph itself a composite of the /acute/ glyph?

I am not seeing the behaviour you describe in any of my fonts.

Florian Pircher · October 2021

John Hudson said:
I am not aware of any issues with using simple components in TTFs, e.g. base + mark precomposed diacritic glyphs.

By default, Glyphs decomposes overlapping components on TT export. That might also be of interest to other TT production systems. I am not sure how relevant that is in today’s software landscape, and the behavior can be disabled if desired. Accents rarely overlap, anyway.

John Hudson · October 2021

TrueType renderers, to my knowledge, have never had an issue with overlapping components or overlapping outline contours in general (I regularly build cedilla diacritics with overlapping components), but there may be issues when such composites are converted to outlines in Bézier environments such as Illustrator.

In the variable font world, overlapping contours and components are very much expected (which is the primary reason for CFF2).
