Initial and Medial-Final Latin characters

Jacob Casal · May 2019

A feature unretained in digitized Didots (as far as I have seen) that I intend to retain in mine is the peculiar italic /v, which consistently has an initial and medial-final form across manuscripts. I suspect a similar treatment may have been given to /w, but I am still parsing through the few manuscripts that have /w to find out:

Image: https://us.v-cdn.net/5019405/uploads/editor/ze/gflefkyph2lx.png

One sees either one or the other used in the modern designs (cf. HTF Didot and LT Didot; also this undated file).

Having read through ArabicShaping.txt and articles on Unicode compatibility characters and on Unicode equivalence, I see that it is not so simple as designing a “/v.medi.” Would one go about this by designing the medial-final variant as the default /v and having it transform into the initial /v when paired with specific characters such as a /space? I can’t help but think there’s a better answer than that.

Craig Eliason · May 2019

Though this isn’t a swash exactly, you may find guidance in OpenType feature code that others have devised for swashes.
But I think your stab at the method is probably correct.

Georg Seifert · May 2019

This tutorial might help: https://glyphsapp.com/tutorials/features-part-4-positional-alternates

It is written for Glyphs but the Feature code is the same.

Paul Miller · May 2019

Version 1.8 of the OpenType specification was released in September 2016 and has significantly revised the following features: Initial Forms {init}, Isolated Forms {isol}, Medial Forms {medi}, and Terminal Forms {fina}.

They should no longer be used with Latin script. Most OpenType layout engines haven't been updated yet, but sooner or later they might drop support for Latin based init, medi, and fina.

Kent Lew · May 2019

This situation is encountered frequently when writing features for connecting scripts. For designs where the default forms of glyphs have both incoming and outgoing connections, it’s not uncommon to substitute initial variants that modify or eliminate the incoming stroke for some. The approach for your v would be the same.

I have usually used {calt} for this.

John Savard · May 2019

Paul Miller said:

Version 1.8 of the OpenType specification was released in September 2016 and has significantly revised the following features: Initial Forms {init}, Isolated Forms {isol}, Medial Forms {medi}, and Terminal Forms {fina}.
They should no longer be used with Latin script. Most OpenType layout engines haven't been updated yet, but sooner or later they might drop support for Latin based init, medi, and fina.

I certainly thank you for the useful information, but I am also surprised.

In that case, won't some fonts cease to display properly when operating systems are updated? Since fonts cost money, and companies that make fonts sometimes go out of business, one would expect that strict upwards compatibility should have been the rule for a font format standard.

Of course, if the font format includes coding for a version number, then as long as Open Type engines continue to support all previous versions, changes in new versions would not cause that issue.

André G. Isaak · May 2019

The medi, init, isol, and fina features were originally proposed for languages like Arabic and Syriac. Extending these to languages like English might seem useful, but it would in fact be rather problematic since the names of the features are potentially misleading. 'fina', for example, *doesn't* refer to a word-final character, but rather to the final character in a sequence of connected characters which may or may not be at the end of the word (e.g. a word like Quran ‘ القرآن‎’ contains initial (ل), final (ر), and isolated (آ) characters in word-medial positions). For scripts like Arabic, what constitutes a connected run is well-defined. For Latin-based scripts, not so much.

For Arabic, the shaping engine can determine whether a given character should be rendered as a connected form or not. For Latin-based scripts, on the other hand, which characters do or do not connect is font-specific, and thus can't be handled by the shaping engine. I'm assuming this is why connections in English are left to 'calt' rather than to the positional features defined for some Semitic languages.

I've also encountered a few Latin fonts which use 'init' and 'fina' to implement decorative initial or final forms which completely ignore questions of punctuation. Some final forms would be acceptable with a following question or exclamation mark, others not so much. An 'n' with a swashy thing on the right might be fine with a following quotation mark; a 't' with an elongated bar not so much...

André G. Isaak · May 2019

Wow, that worked. I'm always nervous about mixing Arabic with Latin punctuation: all sorts of BIDI directional weirdness can arise.

Paul Miller · May 2019

John Savard said:

I certainly thank you for the useful information, but I am also surprised.

In that case, won't some fonts cease to display properly when operating systems are updated? Since fonts cost money, and companies that make fonts sometimes go out of business, one would expect that strict upwards compatibility should have been the rule for a font format standard.

Of course, if the font format includes coding for a version number, then as long as Open Type engines continue to support all previous versions, changes in new versions would not cause that issue.

As André said, this feature was meant for Arabic, and its use in Latin was considered a misuse. What they said is that it isn't supported in Latin but it does continue to be supported in Arabic.

I'm sure people will continue to (mis)use this feature, but if you use it in a Latin font you're designing, be warned that the users might not get what they are expecting, and that probability will increase over time.

Thomas Phinney · May 2019

I am definitely guilty of promoting some of that misuse, ~ 15–20 years ago.

Sorry about that.

Jacob Casal · May 2019

Thanks for the admonition, Paul; that helps dodge a potential bullet. After a bunch of reading on features (the rest of the Glyphs tutorials on the matter were still helpful, as was The OpenType Cookbook) and looking at some other fonts’ code for them, I came up with something like this, keeping it small in case it’s wrong or obtuse, as it is my first code written for a lookup and feature (it’s like I’m learning SPSS all over again haha):

@spacepunct = [\space \hyphen ];
lookup caltLatinContextualAlternateslookup0 {
  lookupflag 0;
   sub @spacepunct \v by @spacepunct \v.alt;
   sub \quotedblleft \v \quotedblright by \quotedblleft \v.alt \quotedblright;
} caltLatinContextualAlternateslookup0;
feature calt {
  lookup caltLatinContextualAlternateslookup0;
} calt;

Though the Glyphs tutorial may need revising against the Latin usage of those features, it provides ample opportunity to go a little into something like how Mongolian script or Arabic use those features (Admittedly not having read too much into other connected scripts, Mongolian certainly gets very interesting in its extent of using these features, but I am biased toward it having followed its digital use for a time.)

Edit: Hmm, no I think I need to add a substitution lookup too for multiple glyphs to change as it gives an error… Ah, nevermind, I got it to work.

@spacepunct = [\space \hyphen];
lookup caltLatinContextualAlternateslookup0 {
    lookupflag 0;
    sub @spacepunct \v' by \v.alt;
    sub \quotedblleft \v' \quotedblright by \v.alt;
  } caltLatinContextualAlternateslookup0;
feature calt {
    lookup caltLatinContextualAlternateslookup0;
  } calt;

Paul Miller · May 2019

Apart from adding fancy alternatives to letters based on their positions 'fina' is used for substituting the final sigma for sigma in Greek. And I have done this in some of my fonts.

I guess I will have to change them. Jacob's code looks interesting; I will take a look and see if I can get it to work in one of my fonts.

Kent Lew · May 2019

The shortcoming of your approach to this feature is that it won’t account for initial characters at the beginning of a line, since there is no preceding trigger.

Usually, I approach these situations from the opposite angle, using an ignore sub (Adobe syntax).

Khaled Hosny · May 2019

They should no longer be used with Latin script. Most OpenType layout engines haven't been updated yet, but sooner or later they might drop support for Latin based init, medi, and fina.

It is the other way around: almost all OpenType layout engines only supported these features for scripts that have Arabic-like shaping behavior. AFAIK the only exception was InDesign.

John Savard · May 2019

Paul Miller said:

Apart from adding fancy alternatives to letters based on their positions 'fina' is used for substituting the final sigma for sigma in Greek. And I have done this in some of my fonts.

Of course, this shows that Greek, at least, has a legitimate need for 'final', so if that feature was excluded from Greek in addition to Latin, that was a mistake.

What surprises me is that this change means that an OpenType renderer would have to have an immense table telling it which Unicode characters are Latin and which are non-Latin. That just seems silly; features should work the same way on all glyphs, and language-specific details should be left to the font designer.

Of course, I'm forgetting that computers these days have gigabytes of RAM; we're not living in the days when a computer might have 4K 12-bit words of core memory, and people were thrilled to have it.

André G. Isaak · May 2019

John Savard said:

What surprises me is that this change means that an OpenType renderer would have to have an immense table telling it which Unicode characters are Latin and which are non-Latin. That just seems silly; features should work the same way on all glyphs, and language-specific details should be left to the font designer.

The renderer *has* to know which Unicode characters belong to which ranges. Otherwise it wouldn't know which shaping engine to use. Plus, of course, Unicode fonts will contain different lookup sets for each script value, which also requires this. So OpenType has always had to know this information.

André G. Isaak · May 2019

Also, I'm not sure whether it is typical to handle final sigma using OpenType features. Final sigma has its own Unicode code point, and its own position on Greek keyboard layouts, which I assume Greek speakers are used to dealing with. Similarly for Hebrew, I don't think Unicode layout is expected to handle word-final forms.

John Savard · May 2019

André G. Isaak said:

Otherwise it wouldn't know which shaping engine to use.

Huh? I thought a Bezier curve was the same in any language.

André G. Isaak · May 2019

The shaping engine doesn't render bezier curves. It's responsible for script-specific behaviour and preprocessing. That would include substituting positional forms in Arabic, or reordering characters in Indic Scripts before the glyph runs are passed on to the application.

https://en.wikipedia.org/wiki/Complex_text_layout

Thomas Phinney · May 2019

Yes, “shaping” has a special (and not entirely obvious) meaning in this context.

Khaled Hosny · May 2019

The Unicode FAQ has an entry for why Greek sigma has a separately encoded final form: https://www.unicode.org/faq/greek.html#5

Peter Baker · May 2019

A lookup that depends on the space character is problematic. I'm not sure why, but with some apps the shaping engine won't see the space, so the lookup will fail. I found this somewhere (can't remember where—but I can't take credit), and it seems to work reliably. In this case it substitutes an alternate p at the beginning of a word:

lookup calt_04 {
  lookupflag IgnoreMarks;
  ignore sub [@AllLowerCase @AllUpperCase] p' ;
  sub p' by p.alt ;
} calt_04;

You need big classes for all (Latin) alphabetic characters, of course, but Glyphs, and I suppose other editors, make these easy to create. And similar but inverted code will work at the end of the word.
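
As a rough illustration of the "similar but inverted" end-of-word version, here is a minimal sketch in the same Adobe feature syntax. The class @AllLetters, the glyph p.fin, and the lookup name are hypothetical, and the class is abbreviated; a real font would list every letter glyph.

```fea
# Hypothetical, abbreviated letter class (assumption: not from the original post).
@AllLetters = [A B C a b c p];

lookup calt_final_sketch {
  lookupflag IgnoreMarks;
  # Leave p alone whenever a letter follows it, i.e. it is not word-final.
  ignore sub p' @AllLetters;
  # Any p that reaches this rule has no letter after it.
  sub p' by p.fin;
} calt_final_sketch;

feature calt {
  lookup calt_final_sketch;
} calt;
```

Here the exception is tested against what follows the marked glyph rather than what precedes it, so a p before a space, punctuation, or the end of the run gets the alternate.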

Jacob Casal · May 2019

It’s been a bit hectic so I didn’t get to do much more research into the ignore sub rule. I think it’s starting to make sense to me. The discussion here is enlightening in its own regard as well, many thanks there.

@Khaled Hosny
It is the other way around: almost all OpenType layout engines only supported these features for scripts that have Arabic-like shaping behavior…

So it wasn’t supported, then it was, but it’s still a misuse for Latin in the sense of “just because you can doesn’t mean you should?” That’s at least what I’m gleaning from the discussion.

@Peter Baker and @Kent Lew Thank you for the idea, I’ll adapt it to my syntax and try it out; it should help me understand ignore sub a little more too.

Let me see if I understand what’s going on with the ignore rule here: it’s saying, “Hey! Before you look at any subs, Program, if any of these conditions here are met, then you are to ignore the sub commands with this class after this.” Hence, it’s saying here, “Program, if any of @thesecharacters ever appears before a regular /v, then ignore the following command to change /v to /v.alt.” Yes? Thus:

@allLatin = [\A \B \C \D \E \F \G \H \I \J \K \L \M \N \O \P \Q \R \S \T \U 
\V \W \X \Y \Z \a \b \c \d \e \f \g \h \i \j \k \l \m \n \o \p \q \r \s \t 
\u \v \w \x \y \z];
#etc etc everything necessary gets listed in the class

lookup caltLatinContextualAlternateslookup0 {
    lookupflag 0;
    ignore sub @allLatin \v';
    sub \v' by \v.alt;
  } caltLatinContextualAlternateslookup0;
feature calt {
    lookup caltLatinContextualAlternateslookup0;
  } calt;

Indeed it works! However, this now presents a new problem of contextual line breaks with hyphens (and soft hyphens) with the /v and /v.alt. This is another constant throughout the manuscripts. You see, in the Didot image above “quatre-vingt-trois” separates three words, so “-vingt” gets \v.alt. If the line were to break after “quatre-” (note the hyphen) “vingt” would still get \v.alt as a separate word. But, if the word “gouvernement” were to break in the middle of it, at “gou-,” the /v at the beginning of the next line would still be a regular medial \v. This one is still a headscratcher for me, perhaps a matter of organizing syntax in just the right way.

An example: élever

Image: https://us.v-cdn.net/5019405/uploads/editor/5u/z47j97iobzj2.png

André G. Isaak · May 2019

Indeed it works! However, this now presents a new problem of contextual line breaks with hyphens (and soft hyphens) with the /v and /v.alt. This is another constant throughout the manuscripts. You see, in the Didot image above “quatre-vingt-trois” separates three words, so “-vingt” gets \v.alt. If the line were to break after “quatre-” (note the hyphen) “vingt” would still get \v.alt as a separate word. But, if the word “gouvernement” were to break in the middle of it, at “gou-,” the /v at the beginning of the next line would still be a regular medial \v. This one is still a headscratcher for me, perhaps a matter of organizing syntax in just the right way.

This is simply a problem you'll have no choice but to live with. Once a line breaks (and it doesn't matter whether the break was explicit or put there by the software), you're dealing with a new OpenType run and it's simply not possible for contextual rules to refer to the previous run.

Christian Thalmann · May 2019

Couldn't you have the swashless /v/ as default and substitute the swashy /v/ whenever another letter comes before it? That should be robust vs line breaks.

The downside is that when CALT is not implemented, you're stuck with the swashless /v/.

Jacob Casal · May 2019

@Christian Thalmann I was thinking about that as well. Perhaps when the font is finally finished long down the road I could leave some up front, hard to miss documentation for the user on how to get the most out of the font on various platforms. And hey, on the bright side if they forget to turn on CALT the swashless /v is more space efficient, helpful in a design where Didot had very long ascenders and descenders taking some vertical space. I’ll look into implementing it when I get some time.

Many thanks to everyone for the help!

John Hudson · May 2019

Of course, this shows that Greek, at least, has a legitimate need for 'final', so if that feature was excluded from Greek in addition to Latin, that was a mistake.

No, it wasn't. Greek encodes the final sigma as a separate character, and it is input from the keyboard as such. There has never been any need to try to handle final sigma display as a glyph substitution.

The joining behaviour OTL features rely on shaping engine analysis of character strings and use rule-based application of the features to specific characters based on their adjacent characters. Those rules have to be defined somewhere, and the Unicode ArabicShaping.txt document is the only place where they are defined in a standard resource. There is no standard for joining behaviour for the Latin script: so it is excluded from application of these features on that basis.

[There is one grandfathered legacy implementation of one of these features — init — for a script outside the ArabicShaping.txt standard, and that is the initial form of vowel signs ে and ৈ in Bengali/Assamese, which relies on shaping engine analysis on the beng and bng2 layout models. Note that if this script were shaped using the Universal Shaping Engine (using the putative bng3 layout model) use of init would be excluded, since USE can only apply analysis as defined in ArabicShaping.txt. So the initial form of these Bengali vowel signs would need to be handled using contextual GSUB lookups, as is the case for word-position variants in Latin.]

Jacob Casal · May 2019

@Christian Thalmann In my rush I didn’t think it through far enough, but (and I mean no harshness here; it’s probably a concept I missed on my part) wouldn’t that not work either? I wasn’t very clear about the positions of /v and /v.alt in my posts: /v was already swashless and /v.alt swashy. If I changed the swashless /v to the swashy /v.alt when a letter preceded it then we would be in the opposite situation of before. Line breaks are fixed, but everything else reversed.

If I’m speaking off-base of what you were saying and misinterpreting you please correct me; thanks for the help.

In the meantime of sorting that out, however, I must give due kudos to @Ray Larabie with this comment. A shortcode for the medial, swashless /v “cheats” it into appearing at the beginning of a line when necessary. Not the ideal automatic, but an effective manual.

Example (pardon the rough appearance, they were quickly mocked up for testing):

Image: https://us.v-cdn.net/5019405/uploads/editor/54/khzy9y6s98rs.png

The code:

@allLatin = [\A \B \C \D \a \b \c];
#etc etc everything necessary gets listed in the class

lookup caltLatinContextualAlternateslookup0 {
    lookupflag 0;
    ignore sub @allLatin \v';
    sub \v' by \v.alt;
    sub \backslash' \v' \backslash' by \v;
  } caltLatinContextualAlternateslookup0;
feature calt {
    lookup caltLatinContextualAlternateslookup0;
  } calt;

Christian Thalmann · May 2019

Jacob: Maybe I don't understand your reply correctly either, but it doesn't matter what your swashy and swashless glyphs are currently named. My point is that if the swashless glyph is the default and the swashy one is cycled in by CALT, rather than the other way around, then you don't get your problem in the first place, since CALT then doesn't have to recognize line breaks, only preceding letters, which is easy.
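
A minimal sketch of the inversion described here, in Adobe feature syntax without the backslash prefix used in Jacob's posts. It assumes the default v is drawn as the word-initial shape; the glyph name v.medial and the lookup name are hypothetical, and the class is abbreviated.

```fea
# Assumption: v is the initial shape, v.medial (hypothetical) is the other one.
# The class includes v.medial itself so that a doubled v ("savvy") still
# finds a letter before the second v after the first has been substituted.
@allLatin = [A B C D a b c v v.medial];  # etc., every letter gets listed

lookup caltMedialV {
  # A v with any letter before it takes the medial form.
  sub @allLatin v' by v.medial;
} caltMedialV;

feature calt {
  lookup caltMedialV;
} calt;
```

With this arrangement a v at the very start of a run, including one that begins a line after a mid-word break, has no letter before it and so keeps whatever shape the default glyph has.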

Peter Baker · May 2019

I do something like what Christian is suggesting with long s: have hist (and in my case also a Stylistic Set, since hist isn't available in Word) change all instances of s to long s, and then:

sub @LowerCase_f longs' by s ;
ignore sub longs' @AllLowerCaseExceptf ;
sub longs' by s ;

It's true that you no longer have the problem of an unwanted substitution at the beginning of a line, but you unfortunately have the inverse problem of an unwanted substitution at the end of a line (or run). I don't know what to do about this, aside from expecting users to tidy up manually.

And then you've got the absolutely horrible problem of calt being off by default in MS Word and 90% of your users not knowing to turn it on. If your swash v is the default, most users are going to see only that.

Nick Shinn · May 2019

If you want something users can’t mess with, use <rlig>
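
For illustration, moving the ignore sub lookup from calt to rlig might look like the sketch below. The lookup name initialV is made up, the class is abbreviated, and the glyph names follow Jacob's v and v.alt. Whether a given application really applies rlig to Latin text without offering a switch is an assumption worth testing.

```fea
# Abbreviated class (assumption: a real font lists every letter glyph).
# v.alt is included so a v directly after an initial v.alt counts as medial.
@allLatin = [A B C D a b c v v.alt];

lookup initialV {
  # Skip any v that has a letter before it.
  ignore sub @allLatin v';
  # Every other v starts a word or a run and gets the initial form.
  sub v' by v.alt;
} initialV;

# Same lookup, registered under Required Ligatures instead of calt.
feature rlig {
  lookup initialV;
} rlig;
```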
