Revised definition of isol, init, medi, and fina layout features

John Hudson
edited February 2017 in Font Technology
Last year, I mentioned in a couple of threads that following OTWG discussions, I'd been asked to prepare revised definitions for the four OpenType Layout features Isolated Forms <isol>, Initial Forms <init>, Medial Forms <medi>, and Terminal Forms <fina>. This was to replace the very old and mostly speculative text, which suggested these features would be applied based on analysis of word position, with text describing how shaping engines actually apply these features: explicitly on the basis of the Unicode joining properties defined in the ArabicShaping.txt data file of the Unicode Character Database.
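The joining-based approach can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not any particular shaping engine's code: the joining-type table is a hypothetical, hand-copied subset of ArabicShaping.txt (unlisted characters are treated as non-joining), transparent characters such as marks are skipped when looking for neighbours, and script-specific special cases are ignored.

```python
from typing import List, Optional

# Hypothetical subset of joining types hand-copied from ArabicShaping.txt.
# Anything not listed is treated as non-joining ("U") in this sketch.
JOINING_TYPES = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u062F": "R",  # ARABIC LETTER DAL
    "\u0644": "D",  # ARABIC LETTER LAM
    "\u0645": "D",  # ARABIC LETTER MEEM
    "\u0647": "D",  # ARABIC LETTER HEH
    "\u0648": "R",  # ARABIC LETTER WAW
    "\u0640": "C",  # ARABIC TATWEEL (join-causing)
    "\u064E": "T",  # ARABIC FATHA (transparent)
}

# Joining types that can connect to the preceding / following character (logical order).
CONNECTS_BACKWARD = {"R", "D", "C"}
CONNECTS_FORWARD = {"L", "D", "C"}


def joining_type(ch: str) -> str:
    return JOINING_TYPES.get(ch, "U")


def neighbour_type(text: str, index: int, step: int) -> str:
    """Joining type of the nearest non-transparent character in direction `step`."""
    i = index + step
    while 0 <= i < len(text):
        jt = joining_type(text[i])
        if jt != "T":
            return jt
        i += step
    return "U"  # the start or end of the text behaves as non-joining


def assign_forms(text: str) -> List[Optional[str]]:
    """Return 'isol', 'init', 'medi' or 'fina' for each joining character,
    and None for characters that take no joining form (U, C, T)."""
    forms: List[Optional[str]] = []
    for i, ch in enumerate(text):
        jt = joining_type(ch)
        if jt not in ("R", "L", "D"):
            forms.append(None)
            continue
        joins_prev = jt in CONNECTS_BACKWARD and neighbour_type(text, i, -1) in CONNECTS_FORWARD
        joins_next = jt in CONNECTS_FORWARD and neighbour_type(text, i, +1) in CONNECTS_BACKWARD
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms
```

For example, `assign_forms("\u0628\u0627\u0628")` (باب) returns `['init', 'fina', 'isol']`: the word-final beh follows a right-joining alef, so it takes <isol> rather than <fina>. That is precisely why these features cannot double as word-positional variants.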

In all the excitement about variable fonts, it may have slipped notice that as of OpenType v1.8, these revisions have been incorporated into the format. Anyone supporting these features in fonts, layout engines, or applications should familiarise themselves with the new definitions. Note that, under the new definitions, these features expressly should not be used for word-positional variants. In my proposal, I did discuss the possibility of defining new features for that purpose, but decided not to propose them at this time. Such independent word-positional features would have the benefit of being available to all scripts, including those that also have joining behaviours, but they would present challenges for layout engines that would need to be worked out before such features could be viable.

Thank you to the people who reviewed and improved my proposal, especially Jonathan Kew, Vladimir Levantovsky, and Peter Constable.

Updated registered feature descriptions:
<isol> Isolated Forms
<init> Initial Forms
<medi> Medial Forms
<fina> Terminal Forms


[Note that the three Syriac-specific Medial Forms #2 <med2>, Terminal Forms #2 <fin2>, and Terminal Forms #3 <fin3> are not affected by this change.]

Comments

  • John Hudson
    Further on the <isol> feature:

    The example section in the Isolated Forms <isol> feature description may be confusing, even ignoring the small typo that mentions the <init> feature instead of <isol>. This should read:

    Example: In an Arabic-script font, the application would apply the 'isol' feature to the letter ARABIC LETTER HEH (U+0647 “ه”) when not adjacent to any joining character, thereby potentially replacing the default “ه” glyph with a special, isolated form (likely, a contextual and language-specific substitution, substituting one isolated form for another).

    The first thing to note about the Isolated Forms feature is that most fonts for joining scripts won't need this feature at all, because the isolated forms will be encoded as the default glyphs of the Unicode characters. That being the case, one can make e.g. an Arabic font that does not contain an <isol> feature.
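    As a hedged illustration of that point, the sketch below uses hypothetical glyph names and substitution tables, handles only single substitutions, and applies them per character rather than in lookup order as a real engine would. It maps each character to its default (cmap) glyph, then applies the substitution for the character's joining feature if the font defines one. A character in <isol> context in a font with no <isol> feature simply keeps its default glyph.

```python
from typing import Dict, List, Optional

# Hypothetical glyph names for a small Arabic font whose default (cmap) glyphs
# are the isolated forms, so it defines no "isol" substitutions at all.
CMAP: Dict[str, str] = {"\u0628": "beh", "\u0627": "alef"}
FEATURES: Dict[str, Dict[str, str]] = {
    "init": {"beh": "beh.init"},
    "medi": {"beh": "beh.medi"},
    "fina": {"beh": "beh.fina", "alef": "alef.fina"},
    # no "isol" entry: isolated-context characters keep their default glyph
}


def apply_joining_features(text: str, forms: List[Optional[str]]) -> List[str]:
    """Map characters to default glyphs, then apply the single substitution
    for each character's joining feature, if the font defines one."""
    glyphs: List[str] = []
    for ch, form in zip(text, forms):
        glyph = CMAP[ch]  # default encoded glyph
        if form is not None:
            glyph = FEATURES.get(form, {}).get(glyph, glyph)
        glyphs.append(glyph)
    return glyphs
```

    With forms already assigned by joining analysis, `apply_joining_features("\u0628\u0627\u0628", ["init", "fina", "isol"])` gives `['beh.init', 'alef.fina', 'beh']`. A font built the other way, with non-isolated default glyphs, would need an "isol" table to reach the isolated forms.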

    The existence of the <isol> feature, and its processing by shaping engines for joining scripts, enables two things:

    1. The possibility of making fonts in different ways, in which the isolated forms are not the default encoded glyphs. While this might be rare, it provides a flexible approach for font makers, and is perhaps a useful reminder that, with appropriate shaping engine support in place, the default encoded glyph for a character needn't look like the Unicode chart exemplar.*

    2. Exceptional behaviour of isolated forms involving additional contextual information during joining feature application. This is the sort of thing that is described in the feature example. For some languages using the Arabic script (notably Arabic itself), there are two possible forms of isolated hā’, which may be used conventionally, and hence contextually, in different situations. Some fonts may implement this as a contextual substitution in the <isol> feature, so that one form is the default non-joining form used within words, while the second form is used only in true isolation, i.e. when not adjacent to another letter. This convention helps avoid potential confusion between the default non-joining form and the Arabic-Indic digit five (U+0665 “٥”). [See attached image.]
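    The contextual rule in point 2 can be sketched as follows. This is an illustration under assumptions, not the lookup of any shipping font: the glyph names 'heh' and 'heh.isol.alt' are hypothetical, the caller is assumed to have already established by joining analysis that the HEH is in <isol> context, and combining marks are skipped when checking adjacency, as a lookup with the IgnoreMarks flag would do.

```python
import unicodedata
from typing import Optional

HEH = "\u0647"  # ARABIC LETTER HEH


def is_mark(ch: str) -> bool:
    return unicodedata.category(ch).startswith("M")


def is_letter(ch: str) -> bool:
    return unicodedata.category(ch).startswith("L")


def neighbour(text: str, index: int, step: int) -> Optional[str]:
    """Nearest non-mark character in direction `step` (marks are ignored)."""
    j = index + step
    while 0 <= j < len(text):
        if not is_mark(text[j]):
            return text[j]
        j += step
    return None


def isol_glyph_for_heh(text: str, index: int) -> str:
    """Choose the glyph for a HEH already known to be in <isol> context.

    Hypothetical glyph names: 'heh' is the default non-joining glyph used
    within words; 'heh.isol.alt' is used only in true isolation.
    """
    if text[index] != HEH:
        raise ValueError("expected ARABIC LETTER HEH")
    adjacent = (neighbour(text, index, -1), neighbour(text, index, +1))
    if any(ch is not None and is_letter(ch) for ch in adjacent):
        return "heh"  # non-joining, but inside a word
    return "heh.isol.alt"  # true isolation: no adjacent letter
```

    So a HEH standing alone, `isol_glyph_for_heh("\u0647", 0)`, takes the alternate, while the same character after a right-joining DAL, `isol_glyph_for_heh("\u062F\u0647", 1)`, keeps the default glyph even though both are in <isol> context.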

    Confusion may arise due to terminology. The feature name 'Isolated Forms' suggests isolation, but what the feature actually deals with is characters in non-joining circumstances. This could be, for instance, an Arabic letter at the end of a word, preceded by a letter that only joins on the right. The contextual substitution described above distinguishes this case from true isolation (non-adjacency) at the glyph level.

    Note also that the contextual substitution described could also be implemented elsewhere in OpenType Layout, most obviously in the Contextual Alternates <calt> feature if one wanted to allow users to disable the substitution, or in the Required Contextual Alternates <rclt> feature if not. When the <isol> feature was first registered and the first fonts were made using it in this way (I believe Microsoft's Arabic Typesetting font might have been the first), the <rclt> feature did not exist. Putting the contextual substitution for the variant isolated hā’ in the <isol> feature enabled it to be a required substitution independent of <calt>.

    _____

    *cf. A Soyombo test case for the Universal Shaping Engine, in which the default glyphs are decidedly not the typical forms of the characters. Soyombo is not a joining script in the ArabicShaping.txt sense, but that project is a good example of a font built in a way that is completely dependent on the shaping engine for even the most basically legible representation of text.