Contextually offsetting diacritics

LeMo aka PatternMan aka Frank E Blokland · June 2018

When demonstrating OTM at TYPO Labs 2018 last April, I briefly showed a small experiment with contextually offsetting diacritics that I made last year. Inspired by a Renaissance practice of moving diacritics sideways to the right to prevent collisions with terminals, as can be found, for example, in the famous roman type of Nicolas Jenson, I wrote a small piece of feature code that replaces precomposed accented characters with mark-to-base ones.

Today the prevention of collisions of letter parts is often handled via kerning pairs. However, this can be a rather crude method, because one solves the collision at the price of a gap between characters, which cripples the pattern. For example, the combination /f with /? can be improved by positive kerning, but a more elegant alternative is to provide a /f with a shortened terminal for this combination. When positioning accented characters, the diacritics can be offset, with or without additional kerning.
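
As a minimal sketch of the shortened-terminal alternative mentioned above, the contextual swap could be written as follows. The glyph name f.short and the class @ACCENTED_AFTER_F are hypothetical and not part of the script further down; the rule also assumes that the accented letters are still precomposed glyphs when 'calt' runs.

```
# Hypothetical glyph: f.short is an f with a shortened terminal.
# Hypothetical class: the accented letters that would collide with the regular f.
@ACCENTED_AFTER_F = [aacute adieresis eacute edieresis iacute oacute yacute];

feature calt {
    # Swap in the short-terminal f only when one of these letters follows,
    # so no gap has to be opened up with positive kerning.
    sub f' @ACCENTED_AFTER_F by f.short;
} calt;
```

If the accented letters have already been decomposed by 'ccmp', the context would have to name the base letter followed by the combining mark instead.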

The script below is a rough start that I made last year; unfortunately I did not find time to enhance it any further. I use OTM for the precise positioning of the diacritics (in the image above especially the vertical positioning of the dieresis is a bit arbitrary). It is quite possible that the generation of the mark-to-base stuff on the basis of the accented characters will be included in FoundryMaster in the near future.

If you are interested, please feel free to adapt and enhance the code below. I am pretty sure that some on this forum will find room for improvement of its structure (I am not an expert).

---------------------

# --- Uses GPOS mark-to-base for characters with (contextually positioned) diacritics
# --- For the Latin script
# --- AFDKO syntax
# --- Basis for further development
# --- (c) FEB, last update: 1 May 2017

# --- LANGUAGE SYSTEMS

languagesystem DFLT dflt;
languagesystem latn dflt;

# --- KERNING
# --- To test the interaction with the stuff below:

feature kern {
pos T a -60;
pos T aacute -35;
pos T adieresis -35;
pos T ccedilla -75;
pos T e -70;
pos T eacute -70;
pos T edieresis -70;
pos T iacute 15;
pos T oacute -45;
pos T ocircumflex -15;
pos T yacute 15;
pos f a 20;
pos f aacute 10;
pos f adieresis 20;
pos f eacute -5;
pos f edieresis -15;
pos f iacute 25;
pos f oacute -10;
pos f yacute 40;
} kern;

# --- MARK CLASSES
# --- To prevent that all diacritics are combined with all base letters, lookups are used to form groups

lookup MRKCLS_1 {

# --- Standard combinations with diacritics on top:

markClass [gravecomb acutecomb circumflexcomb dieresiscomb tildecomb caroncomb macroncomb] <anchor 0 0> @DIACRITIC_TOP_1;

pos base [a c e o s u y z] <anchor 0 0> mark @DIACRITIC_TOP_1;

} MRKCLS_1;

lookup MRKCLS_2 {

# --- Variant of MRKCLS_1 to offset diacritics in relation to T ('calt' feature):

markClass [gravecomb acutecomb circumflexcomb dieresiscomb tildecomb caroncomb macroncomb] <anchor -50 0> @DIACRITIC_TOP_2;

pos base [a c e o s u y z] <anchor 0 0> mark @DIACRITIC_TOP_2;

} MRKCLS_2;

lookup MRKCLS_3 {

# --- Variant of MRKCLS_1 to offset diacritics in relation to f ('calt' feature):

markClass [gravecomb acutecomb circumflexcomb dieresiscomb tildecomb caroncomb macroncomb] <anchor -30 0> @DIACRITIC_TOP_3;

pos base [a c e o s u y z] <anchor 0 0> mark @DIACRITIC_TOP_3;

} MRKCLS_3;

lookup MRKCLS_4 {

# --- Adapted diacritics for i and j:

markClass [gravecomb.i acutecomb.i circumflexcomb.i dieresiscomb.i tildecomb.i caroncomb.i macroncomb.i] <anchor 0 0> @DIACRITIC_TOP_4;

pos base [dotlessi] <anchor 0 0> mark @DIACRITIC_TOP_4;

} MRKCLS_4;

lookup MRKCLS_5 {

# --- Combinations with diacritics below baseline:

markClass [cedillacomb ogonekcomb] <anchor 0 0> @DIACRITIC_BELOW_1;

pos base [a c] <anchor 0 0> mark @DIACRITIC_BELOW_1;

} MRKCLS_5;

feature ccmp {

# --- Glyph Composition/Decomposition:

# --- Substitutes the i and j:

sub i' @DIACRITIC_TOP_3 by dotlessi;

sub j' @DIACRITIC_TOP_3 by dotlessj;

# --- Substitutes precomposed characters with diacritics with mark-to-base variants:
sub agrave by a gravecomb;
sub egrave by e gravecomb;
sub igrave by i gravecomb.i;
sub ograve by o gravecomb;
sub ugrave by u gravecomb;
sub aacute by a acutecomb;
sub eacute by e acutecomb;
sub iacute by i acutecomb.i;
sub oacute by o acutecomb;
sub uacute by u acutecomb;
sub yacute by y acutecomb;
sub acircumflex by a circumflexcomb;
sub ecircumflex by e circumflexcomb;
sub icircumflex by i circumflexcomb.i;
sub ocircumflex by o circumflexcomb;
sub ucircumflex by u circumflexcomb;
sub adieresis by a dieresiscomb;
sub edieresis by e dieresiscomb;
sub idieresis by i dieresiscomb.i;
sub odieresis by o dieresiscomb;
sub udieresis by u dieresiscomb;
sub ydieresis by y dieresiscomb;
sub atilde by a tildecomb;
sub etilde by e tildecomb;
sub itilde by i tildecomb.i;
sub ntilde by n tildecomb;
sub otilde by o tildecomb;
sub utilde by u tildecomb;
sub ytilde by y tildecomb;
sub scaron by s caroncomb;
sub zcaron by z caroncomb;
} ccmp;

feature calt {

# --- For offsetting diacritics based on contextual characters:

sub T @DIACRITIC_TOP_1' by @DIACRITIC_TOP_2;
sub f @DIACRITIC_TOP_1' by @DIACRITIC_TOP_3;

} calt;

feature mark {

# --- Mark-to-base positioning

lookup MRKCLS_1;
lookup MRKCLS_2;
lookup MRKCLS_3;
lookup MRKCLS_4;
lookup MRKCLS_5;

} mark;

# --- Declaring base characters and marks for the Glyph Definition table:

@BASE = [a c e o s u y z dotlessi];

@MARKS = [@DIACRITIC_TOP_1 @DIACRITIC_TOP_2 @DIACRITIC_TOP_3 @DIACRITIC_TOP_4 @DIACRITIC_BELOW_1];

table GDEF {

GlyphClassDef @BASE,,@MARKS,;

} GDEF;

---------------------
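
A caveat on the 'calt' rules in the script above, as far as I can tell from reading it: the classes @DIACRITIC_TOP_1, @DIACRITIC_TOP_2 and @DIACRITIC_TOP_3 contain the same glyphs, so the substitution maps each mark onto itself, and after 'ccmp' a base letter sits between the T or f and the mark. One possible way around both points is sketched below. It is untested and assumes hypothetical duplicate mark glyphs (gravecomb.T and so on, the same outlines under new names); the anchors are placeholders as in the script.

```
# Hypothetical duplicates of the top marks: identical outlines, different names.
@TOP_DEFAULT = [gravecomb acutecomb dieresiscomb];
@TOP_AFTER_T = [gravecomb.T acutecomb.T dieresiscomb.T];

# One mark class, two anchors: the .T duplicates carry the shifted anchor.
markClass @TOP_DEFAULT <anchor 0 0>   @TOP;
markClass @TOP_AFTER_T <anchor -50 0> @TOP;

feature calt {
    # Context is T, then a base letter, then the mark; only the mark is replaced.
    sub T [a e o u] @TOP_DEFAULT' by @TOP_AFTER_T;
} calt;

feature mark {
    # A single mark-to-base lookup now serves both the default and the shifted marks.
    pos base [a e o u] <anchor 0 0> mark @TOP;
} mark;
```

The duplicates would also have to be listed as marks in the GDEF table, and a second set (for example with a .f suffix) would be needed for the offset after f.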

Vasil Stanev · June 2018

You've been busy

Nice job, although me non comprende the code. Personally, I prefer to make ligatures and change the letter before the one with the diacritic. So e.g. in T+adieresis combinations, I shorten the right hand of the T.

Same goes for ТЪ.

Are there combinations like ffá and the like?
Do you have a similar script for Vietnamese?

Christian Thalmann · June 2018

Do those offset diacritics actually look good in a wordshape? The /T/ä up there looks a bit tortured, but it's hard to tell without context.

Chris Lozos · June 2018

I usually only see the problem with diacritics over the i. I usually draw a set of diacritics that are narrower than normal for those glyphs only and make a kern class for those glyphs as well.

LeMo aka PatternMan aka Frank E Blokland · June 2018

I shorten the right hand of the T […]

Adapting the shape to the combination may work well for the /T, but is this solution also possible for, for example, the /V, /W, and /Y?

I prefer to make ligatures […]

Of course, that is a solution too. However, this could make the character set quite extensive. Whether one solution should exclude another one, remains open for discussion. After all, every solution has its restrictions and perhaps a combination of them would provide the best result?

Are there combinations like ffá and the like?

Do you have a similar script for Vietnamese?

One can add any character combination that one considers applicable to the code, including for Vietnamese, I reckon.

The /T/ä up there looks a bit tortured, but it's hard to tell without context.

The top image from Jenson’s De Evangelica Praeparatione shows the same sort of shifting and this archetypal roman is considered one of the best –if not the best, because it set the standard (for patterning). We are not much used to, i.e., conditioned with, this shifting of diacritics and that could be a reason for considering it unusual. One could state though, that if this offsetting works for Jenson’s roman, this does not by definition imply that it will work for any type design. I can imagine that a more condensed roman than the one applied in De Evangelica Praeparatione is less suitable for this solution.

I usually only see the problem with diacritics over the i.

Personally I see many more problematic combinations, if only looking at the number of kerning pairs for accented characters. That being said, I would like to emphasize here that this is an experiment whose practicality can always be questioned by referring to one’s daily practice.

notdef · June 2018

Precomposed letter-accent combinations encoded in Unicode, such as ë (“sub edieresis by e dieresiscomb”), are not decomposed in the Adobe Paragraph Composer. And if you type an “e” followed by a “combining dieresis”, the result is one glyph – not two. I am not sure if you can reliably modify positioning of marks once they are positioned with an anchor.

Christian Thalmann · June 2018

LeMo aka PatternMan aka Frank E Blokland said:

The top image from Jenson’s De Evangelica Praeparatione shows the same sort of shifting and this archetypal roman is considered one of the best –if not the best, because it set the standard (for patterning).

Just because Jenson's work was groundbreaking and standard-defining doesn't mean he didn't get things wrong.

We are not much used to, i.e., conditioned with, this shifting of diacritics and that could be a reason for considering it unusual.

That's putting it mildly. To my eye, that tiny stratospheric tittle looks detached and forlorn. I doubt something like that can work amongst modern sensibilities.

Bahman Eslami · June 2018

Why not put the contextual mark positioning in the 'mark' feature? Does it have any advantage to put it in the 'calt'?

Kent Lew · June 2018

Precomposed letter-accent combinations encoded in Unicode, such as ë (“sub edieresis by e dieresiscomb”), are not decomposed in the Adobe Paragraph Composer.

But they are by the Adobe World-Ready Paragraph Composer, FWIW.

LeMo aka PatternMan aka Frank E Blokland · June 2018

[…] does it have any advantage to put it in the 'calt'?

Not really, I reckon. I did this for demonstrating how kerning can be applied on precomposed characters for preventing collisions versus the offset of diacritics, by toggling on/off the features in OTM’s text viewer (which has HarfBuzz under the bonnet).
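
For what it is worth, here is a minimal, untested sketch of doing the contextual offset entirely in the 'mark' feature, with a contextual single adjustment placed after the regular attachment. The class and lookup names are made up for illustration and the anchors are placeholders as in the script above. I am assuming that a mark anchor of -50 corresponds to a shift of 50 units to the right; whether every layout engine adds the adjustment on top of the anchor attachment would need testing.

```
# Illustrative names; not taken from the original script.
@BASES     = [a c e o s u y z];
@TOP_MARKS = [gravecomb acutecomb circumflexcomb dieresiscomb tildecomb caroncomb macroncomb];

markClass @TOP_MARKS <anchor 0 0> @TOP;

feature mark {
    # First lookup: plain mark-to-base attachment, no context.
    lookup ATTACH_TOP {
        pos base @BASES <anchor 0 0> mark @TOP;
    } ATTACH_TOP;

    # Second lookup: nudge the already attached mark when T or f precedes the base.
    # Only xPlacement is changed; the advance widths stay untouched.
    lookup SHIFT_AFTER_T_F {
        pos T @BASES @TOP_MARKS' <50 0 0 0>;
        pos f @BASES @TOP_MARKS' <30 0 0 0>;
    } SHIFT_AFTER_T_F;
} mark;
```

This needs no duplicate mark glyphs and no 'calt' rule, at the price of not being able to toggle the offset separately from the mark positioning.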

I doubt something like that can work amongst modern sensibilities.

As such this is not a plea for reintroducing Renaissance peculiarities, like the dot of Jenson’s /i. I am just questioning whether Jenson’s shifting of diacritics, which was a simple technical solution to prevent collisions, can be translated into a more versatile present-day system. The answer could well be negative after a solid evaluation. I don’t see any risk in that.

ivan louette · June 2018

For me that's evident: in the Jenson case this shift also contributes to the visual dynamics of the font. It is not only a question of collisions (even if it elegantly solves that issue too).

John Hudson · June 2018

But they are by the Adobe World-Ready Paragraph Composer, FWIW.

Not so far as I have observed. If they were, I wouldn't have had a bug lodged with InDesign since CS5 that breaks mark-to-diacritic handling in Brill and Cambria, both of which rely on <ccmp> decomposition of precomposed diacritics that neither InDesign composer applies.

Kent Lew · June 2018

Hmm. I thought I tested it — with Brill, in fact. I’ll have to double-check what I thought I observed.

Kent Lew · June 2018

Okay, that’s interesting. I created a new test, taking a WIP font and putting in just the following test feature:

    feature ccmp { #testing one-to-many
        sub edieresis by e dieresiscmb;
        sub ecircumflex by e tildecmb;
    } ccmp;

I decomposed the edieresis glyph and shifted the dieresiscmb glyph up noticeably to be able to distinguish between the precomposed glyph and the combining accent.

In InDesign, I typed “ëê” and set the World-Ready Paragraph Composer. The edieresis rendered as the precomposed glyph. But the ecircumflex did indeed get replaced by e with a combining tilde.

Note that in the second instance, I purposely decomposed to a mismatched combination (and I don’t have a precomposed etilde in the font), which may be why it worked.

The only way I can think to explain this behavior is that InDesign is not actually ignoring the decomposition, but is subsequently recombining — based upon the underlying encoding of the substituted glyphs, which strikes me as highly unintuitive and noncompliant behavior.

My earlier belief that it was decomposing correctly was based upon a test of only the second case (which was the simplest thing to set up quickly for easy detection). And that’s why I got a false positive.

(I don’t recall what I tried with the Brill font. I thought I’d chosen a combination from your ccmp tables that would clearly show up a decomposition, but perhaps I inadvertently triggered a precomposed double-accent glyph that I wasn’t aware of.)

John Hudson · June 2018

The only way I can think to explain this behavior is that InDesign is not actually ignoring the decomposition, but is subsequently recombining — based upon the underlying encoding of the substituted glyphs, which strikes me as highly unintuitive and noncompliant behavior.

Yes. Initially, I thought InDesign was failing to apply contextual ccmp substitutions (i.e. decomposing precomposed diacritics only when followed by a combining mark), but then noted that the same problems occurred when I removed the context statement from the lookups. As you say, it looks like they're post-processing recomposition of Unicode precomposed diacritics after running ccmp. Which is, as you say, unintuitive and non-compliant, and to which I might add a few less polite adjectives.
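
To make the distinction concrete, a contextual ccmp decomposition of the kind described here might look like the sketch below. The glyph, class and lookup names are illustrative only; this is not the actual code from Brill or Cambria.

```
# Illustrative class of combining marks that can follow an accented letter.
@COMBINING_MARKS = [gravecomb acutecomb circumflexcomb dieresiscomb];

# One-to-many decomposition, kept in a standalone lookup so that it runs
# only when the contextual rule below calls it.
lookup DECOMPOSE {
    sub edieresis by e dieresiscomb;
    sub eacute by e acutecomb;
} DECOMPOSE;

feature ccmp {
    # Decompose only when another combining mark follows the precomposed glyph.
    sub [edieresis eacute]' lookup DECOMPOSE @COMBINING_MARKS;
} ccmp;
```

Removing the context statement amounts to placing the two sub rules directly inside the feature block, which decomposes the precomposed glyphs unconditionally.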

Bhikkhu Pesala · June 2018

I use a different approach. I have a set of low profile diacritics for use with Small Capitals or Capitals. The umlaut, tilde, and macron are narrower for narrow composites like ï and Ï

Kent Lew · June 2018

it looks like they're post-processing recomposition of Unicode precomposed diacritics after running ccmp.

The strangest thing, to me, is that it is using the codepoint of the substituted accent to recompose the precomposed glyph.

If I put in the following non-matching decomposition:

    sub acircumflex by a brevecmb;

then the World-Ready Composer will render â as a precomposed abreve.

The only way that could be happening is if the composer is actually sniffing out the codepoint mapped to the combining accent glyph.

(Which defies everything we said in that other discussion about using features to insert a wordjoiner to prevent hyphenation of Basque digraphs — i.e., that substituting a glyph would not get the character codepoint into the mix.)

John Hudson · June 2018

then the World-Ready Composer will render â as a precomposed abreve.

You're able to confirm that the resulting display is the precomposed /abreve/ glyph and not a + combining breve?

Kent Lew · June 2018

Yes. I purposely shifted the position of the breve component dramatically in the precomposed abreve glyph in order to be able to distinguish just that.
