"[feature x] table maps the character sequence" #347

simoncozens · 2020-01-27T17:46:31Z

"The 'ccmp' table maps the character sequence"

ccmp is about glyph composition and decomposition, and acts (like any other feature) on glyph sequences, not character sequences.
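To make the distinction concrete, here is a minimal sketch of the two stages: characters are first mapped to glyphs through the cmap, and only then does a feature such as 'ccmp' see the run. The glyph names and the tiny tables are hypothetical, and the whole-run match is a simplification rather than how a real layout engine walks a glyph run.

```python
# Hypothetical toy font data; the glyph names are illustrative only.
CMAP = {0x0651: "shadda-ar", 0x064E: "fatha-ar"}
CCMP_LIGATURES = {("shadda-ar", "fatha-ar"): "shadda_fatha-ar"}


def characters_to_glyphs(text):
    """Stage 1: map code points to glyphs through the (toy) cmap."""
    return [CMAP[ord(ch)] for ch in text]


def apply_ccmp(glyphs):
    """Stage 2: the feature only ever sees glyphs, never code points.

    Simplification: the run is matched as a whole instead of being scanned.
    """
    key = tuple(glyphs)
    return [CCMP_LIGATURES[key]] if key in CCMP_LIGATURES else list(glyphs)


glyph_run = characters_to_glyphs("\u0651\u064E")
composed = apply_ccmp(glyph_run)
print(glyph_run)
print(composed)
```

Note that `apply_ccmp` has no access to the original text: its input is whatever glyph run the earlier stage produced.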

Document Details

⚠ Do not edit this section. It is required for docs.microsoft.com ➟ GitHub issue linking.

ID: 4da11419-0864-96ff-7f67-296680843fc1
Version Independent ID: 1d46dac1-d559-ce3e-2f6f-f47cf8cd9e0f
Content: Registered features, a-e - Typography
Content Source: typographydocs/opentype/spec/features_ae.md
Product: typography
GitHub Login: @PeterCon
Microsoft Alias: PeterCon

PeterCon · 2020-08-13T21:08:50Z

This may be a single instance of a generic issue: features that describe actions on characters rather than on glyphs (for characters). General review of feature descriptions for this issue might be a good idea.

PeterCon · 2020-09-07T21:01:16Z

Proposed changes for partial fix, covering features_ae.htm:

'afrc':

Recommended implementation: ~~The 'afrc' table maps sets of figures separated by slash (U+002F) or fraction (U+2044) characters to corresponding fraction glyphs in the font (GSUB lookup type 4).~~Sequences of default glyphs for figures (digits) separated by a slash (U+002F) or fraction slash (U+2044) are mapped to a corresponding ligature glyph for the fraction (GSUB lookup type 4).

Application interface: ~~The application must define the full sequence of GIDs to be replaced. When the full sequence is found in the 'afrc' coverage table, the application passes the sequence to the 'afrc' table and gets a new GID in return.~~The application applies the feature to a complete sequence of figures separated by slash (U+002F) or fraction slash (U+2044).

'akhn':

Function: Preferentially substitutes default glyphs for a sequence of characters with a ligature. This substitution is done irrespective of any characters that may precede or follow the sequence.

'ccmp':

Example: In Syriac, the character 0x0732 is a combining mark that has a dot above AND a dot below the base character. To avoid multiple glyph variants to fit all base glyphs, the default glyph for the character is decomposed into two glyphs~~...~~: a dot above and a dot below. These two glyphs can then be correctly placed using GPOS. In Arabic it might be preferred to combine the shadda with fatha (0x0651, 0x064E) into a ligature before processing shapes. This allows the font vendor to do special handling of the mark combination when doing further processing without requiring larger contextual rules.

Recommended implementation: ~~The 'ccmp' table maps the character sequence to its corresponding ligature (GSUB lookup type 4) or string of glyphs (GSUB lookup type 2). When using GSUB lookup type 4, sequences that are made up of larger number of glyphs must be placed before those that require fewer glyphs.~~Default glyphs for multiple characters are mapped to a single glyph (GSUB lookup type 4), or the default glyph for a character is mapped to a sequence of glyphs (GSUB lookup type 2).
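The two substitution shapes mentioned for 'ccmp' can be sketched as follows. This is an illustration under assumptions, not a GSUB implementation: the glyph names are hypothetical, rules are tried in list order with the first match winning, and coverage tables and lookup flags are ignored. It also shows why a longer ligature sequence needs to be listed before a shorter one that shares its prefix.

```python
# Hypothetical rule data; glyph names are placeholders.
LIGATURE_RULES = [            # many-to-one, in the style of lookup type 4
    (("g1", "g2", "g3"), "g1_g2_g3"),
    (("g1", "g2"), "g1_g2"),
]
DECOMPOSITIONS = {            # one-to-many, in the style of lookup type 2
    "dots-above-below": ["dot-above", "dot-below"],
}


def apply_ligatures(glyphs, rules):
    """Scan left to right; at each position the first matching rule wins."""
    out, i = [], 0
    while i < len(glyphs):
        for components, ligature in rules:
            if tuple(glyphs[i:i + len(components)]) == components:
                out.append(ligature)
                i += len(components)
                break
        else:
            out.append(glyphs[i])
            i += 1
    return out


def apply_decompositions(glyphs, table):
    """Replace each glyph found in the table by its glyph sequence."""
    out = []
    for glyph in glyphs:
        out.extend(table.get(glyph, [glyph]))
    return out


print(apply_ligatures(["g1", "g2", "g3"], LIGATURE_RULES))
print(apply_ligatures(["g1", "g2", "g3"], list(reversed(LIGATURE_RULES))))
print(apply_decompositions(["base", "dots-above-below"], DECOMPOSITIONS))
```

With the rules reversed, the two-glyph rule fires first and the three-glyph ligature can never be reached.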

'cjct':

Recommended implementation: ~~The 'cjct' table maps the sequence of a consonant (the nominal form) followed by a virama (halant) followed by a second consonant (the nominal form or a half form) to the corresponding conjunct form (GSUB lookup type 4).~~A glyph for a consonant conjunct form is mapped from a sequence of two or more glyphs for consonants separated by virama (halant) (e.g., C H C, or C H C H C, etc.); or from sequences of glyphs involving simpler consonant conjuncts or conjoining consonant forms resulting from earlier substitutions from default consonant + virama + consonant (+ etc.) sequences (GSUB lookup type 4).

'dnom':

Recommended implementation: The 'dnom' table maps ~~sets of~~ default glyphs for figures (digits) and related characters to corresponding denominator glyphs in the font (GSUB lookup type 1).

'dtls':

Recommended implementation: Single substitution (GSUB lookup type 1), for default glyphs of all dotted characters.

khaledhosny · 2020-09-07T21:43:20Z

> Application interface: The application applies the feature to a complete sequence of figures separated by slash (U+002F) or fraction slash (U+2044)

What is the “Application” here? Are there any OpenType implementations that do the described behavior? What if the user wanted to apply the feature to things other than figures and slash?

PeterCon · 2020-09-07T21:45:46Z

> What is the “Application” here?...

I think the same thing that is meant by "the application" in every other feature description: any application that supports the feature. Whether, in fact, there is an actual application that supports it is a separate question.

khaledhosny · 2020-09-07T21:51:30Z

That is still not clear to me. In the case of, say, a web browser, is it the responsibility of the web page author, the browser engine, or the OpenType layout engine to enforce this constraint? Or is it a mere recommendation that applications are free not to implement?

PeterCon · 2020-09-07T22:03:25Z

> [are] applications... free not to implement it?

As far as the OT spec itself is concerned, applications are not required to support any particular features. (Just like Unicode conformance doesn't require applications to support any particular characters.)

But, I think you've raised a valid question (which had occurred to me while I made the revision): Does a user control what span of characters a discretionary feature is applied to? Or should an application do that? For most discretionary features, it can be entirely up to the user. In this particular case, it's less clear: what should happen if a user applied the feature only to "/", or only to digits after "/"? In this case, I think the intent of the vendor that registered the feature was for an app to apply the feature over a complete digits-slash-digits span. (Whether or not that's a useful model is a separate question.)

But this is all a separate topic from this issue. Please open a separate issue if you think it needs further discussion.

PeterCon · 2020-09-07T23:06:49Z

A final note on discussion of 'afrc': looking at 'frac', that seems to suggest that the user determines the string the feature is applied to, like most discretionary features. So, that might have also been the intent when 'afrc' was registered.

PeterCon · 2020-09-07T23:43:04Z

Proposed changes for partial fix, covering features_fj:

'frac':

Recommended implementation: ~~The 'frac' table maps sets of figures separated by slash or fraction characters to corresponding fraction glyphs in the font. These may be precomposed fractions (GSUB lookup type 4) or arbitrary fractions (GSUB lookup type 1).~~Default glyphs for figures (digits) separated by a slash (U+002F) are mapped to variant forms (GSUB lookup type 1, or contextual substitutions that reference type 1 lookups), or sequences of such glyphs are mapped to ligature fraction glyphs (GSUB lookup type 4).

Application interface: ~~The application must define the full sequence of GIDs to be replaced, based on user input (i.e. user selection determines the string’s delimitation). When the full sequence is found in the 'frac' coverage table, the application passes the sequence to the 'frac' table and gets a new GID in return. When the 'frac' table does not contain an exact match, the application performs two steps. First, it uses the 'numr' feature to replace figures (as used in the 'numr' coverage table) preceding the slash with numerators, and to replace the typographic slash character (U+002F) with the fraction slash character (U+2044). Second, it uses the 'dnom' feature to replace all remaining figures (as listed in the 'dnom' coverage table) with denominators.~~The application evaluates the sequence of default glyphs for the span of characters over which this feature has been applied by the user. If an associated lookup subtable matches the entire glyph sequence, the substitutions described in that lookup are applied to the glyph sequence directly. If no lookup subtable is found that matches the entire sequence, then the application does the following:

Apply the 'numr' feature to the figures within the span that precede the slash and also to the slash, and process associated lookups.

Apply the 'dnom' feature to the figures within the span that follow the slash and process associated lookups.
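The whole-span match with a 'numr'/'dnom' fallback described above can be sketched roughly like this. All glyph names and tables are hypothetical, each feature is reduced to a plain one-to-one dictionary, and leaving a span without a slash unchanged is an assumption of this sketch, since the proposed text does not cover that case.

```python
# Hypothetical lookup data; glyph names are illustrative only.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]
FRAC_LIGATURES = {("one", "slash", "two"): "onehalf"}   # whole-span matches
NUMR = {d: d + ".numr" for d in DIGITS}
NUMR["slash"] = "fraction"                              # slash -> fraction slash
DNOM = {d: d + ".dnom" for d in DIGITS}


def apply_frac(span):
    """Apply 'frac' to the glyph span selected by the user."""
    whole = FRAC_LIGATURES.get(tuple(span))
    if whole is not None:
        return [whole]                  # a lookup matched the entire span
    if "slash" not in span:
        return list(span)               # assumption: nothing to do
    cut = span.index("slash")
    # Step 1: 'numr' on the figures before the slash, and on the slash itself.
    head = [NUMR.get(g, g) for g in span[:cut + 1]]
    # Step 2: 'dnom' on the figures after the slash.
    tail = [DNOM.get(g, g) for g in span[cut + 1:]]
    return head + tail


print(apply_frac(["one", "slash", "two"]))
print(apply_frac(["three", "slash", "one", "six"]))
```

The second call has no precomposed match, so it falls through to the two-step path and yields numerator forms, a fraction slash, and denominator forms.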

NOTE: The above changes are meant to pertain specifically to this issue. The draft attempts to retain the intent of the current 'frac' description. It is not attempting to address separate issues raised in #580.

'hlig':

Recommended implementation: ~~The 'hlig' table maps default ligatures and character combinations to corresponding historical ligatures (GSUB lookup type 1).~~Sequences of default glyphs for certain character combinations are mapped to corresponding historical ligature glyphs.

'hngl':

Recommended implementation: ~~This table associates each hanja character in the font with one or more hangul characters. The manufacturer may choose to build two tables (one for each lookup type) or only one which uses lookup type 3 for all substitutions.~~Default glyphs for hanja characters are mapped to corresponding glyphs for Hangul syllables (GSUB lookup type 1); or default glyphs for hanja characters are mapped to two or more corresponding alternate glyphs for Hangul syllables (GSUB lookup type 3). As in any one-from-many substitution, alternates should be ordered consistently across a family, so that those alternates can work correctly when switching between family members.

PeterCon · 2020-09-08T00:32:59Z

Proposed changes for partial fix, covering features_ko:

'ljmo':

Recommended implementation: ~~The 'ljmo' table maps the sequence required to convert a series of jamos into its leading jamo form (GSUB lookup type 4)~~The default glyph for a leading jamo is mapped into an alternate form required for conjoining in a syllable (GSUB lookup type 1, or a contextual substitution referencing a type 1 lookup).

(This also includes a fix for a separate issue pertaining to jamo sequences and 'ccmp'.)

'lnum':

Recommended implementation: ~~The 'lnum' table maps each oldstyle figure, and any associated characters to the corresponding lining form~~Default glyphs for figures (digits) or other characters used in numbers are mapped to corresponding lining forms (GSUB lookup type 1). If the default figures are non-lining, they too are mapped to the corresponding lining form.

'numr':

Recommended implementation: ~~The 'numr' table maps sets of figures and related characters to corresponding numerator glyphs in the font. It also maps the typographic slash (U+002F) to the fraction slash (U+2044). All mappings are one-to-one (GSUB lookup type 1).~~Default glyphs for figures (digits) or other characters used in numbers (grouping or decimal separators) are mapped to corresponding numerator glyphs; and the glyph for slash (U+002F) is mapped to a fraction slash (glyph for U+2044). Substitutions are one-to-one (GSUB lookup type 1).

'onum':

Recommended implementation: ~~The 'onum' table maps each lining figure, and any associated characters to the corresponding oldstyle form~~Default glyphs for figures (digits) or other characters used in numbers (grouping or decimal separators) are mapped to corresponding oldstyle forms (GSUB lookup type 1). If the default figures are non-lining, they too are mapped to the corresponding oldstyle form.

'ordn':

Recommended implementation: ~~The 'ordn' table maps various lowercase letters to corresponding ordinal forms in a chained context (GSUB lookup type 6), and the sequence No to the numero character (GSUB lookup type 4)~~Default glyphs for various lowercase letters are mapped to corresponding ordinal forms using a chained-context substitution (GSUB lookup type 6); and the sequence of default glyphs for “No” is mapped to a numero ligature glyph (GSUB lookup type 4).

'ornm':

Recommended implementation: ~~The 'ornm' table maps all ornaments in a font to the bullet character (GSUB lookup type 3) and each ornament in a font to a corresponding alphanumeric character (GUSB lookup type 1)~~All ornament glyphs are mapped from the default glyph of the bullet character (U+2022) as alternates (GSUB lookup type 3); or ornament glyphs are mapped from the default glyph of corresponding alphanumeric characters (GSUB lookup type 1). The manufacturer may choose to build two tables (one for each lookup type) or only one which uses lookup type 3 for all substitutions. As in any one-from-many substitution, alternates present in more than one face should be ordered consistently across a family, so that those alternates can work correctly when switching between family members.

PeterCon · 2020-09-08T01:16:36Z

Proposed changes for partial fix, covering features_pt:

'pcap':

Function: Some fonts contain an additional size of capital letters, shorter than the regular smallcaps and whimsically referred to as petite caps. Such forms are most likely to be found in designs with a small lowercase x-height, where they better harmonise with lowercase text than the taller smallcaps (for examples of petite caps, see the Emigre type families Mrs Eaves and Filosofia). This feature turns glyphs for lowercase characters into petite capitals. Forms related to petite capitals, such as specially designed figures, may be included.

'sinf':

Function: Replaces lining or oldstyle figures (digits) with inferior figures (smaller glyphs which sit lower than the standard baseline, primarily for chemical or mathematical notation). May also replace default glyphs for lowercase characters with alphabetic inferiors.

...

Recommended implementation: ~~The 'sinf' table maps figures to the corresponding inferior forms~~Default glyphs for figures (digits) are mapped to corresponding inferior forms (GSUB lookup type 1).

'smcp':

Function: This feature turns default glyphs for lowercase characters into small capitals. This corresponds to the common SC font layout. It is generally used for display lines set in Large & small caps, such as titles. Forms related to small capitals, such as oldstyle figures, may be included.

'stch':

Recommended implementation: ~~The 'stch' table maps the character to a set containing~~The default glyph for a character that requires stretching is mapped to a sequence comprised of an odd number of corresponding glyphs (GSUB lookup type 2). The rendering engine reorders the last glyph from the substituted set to the end of the set of characters being enclosed. The remaining glyphs from the substituted set are positioned at the start of the set of characters being enclosed. Odd-numbered glyphs in the decomposition set are positioned so that they are distributed evenly over the width of the text being enclosed. Even-numbered glyphs in the decomposition set are repeated by the rendering engine so the width of the space between fixed, odd-numbered glyphs is filled by the spacing, even-numbered glyphs.

Application interface: ~~For GIDs found in the 'stch' coverage table, the application passes the sequence of GIDs to the table, and gets back the GIDs for the multiple substitution.~~For characters that require stretching, such as Syriac abbreviation mark (U+070F), the 'stch' feature is applied. If the default glyph for the character is in the coverage of an associated lookup subtable, the mapped glyph sequence is retrieved. The last glyph of the substitute sequence is reordered to the end of the sequence of glyphs to be enclosed or encompassed. The remaining glyphs from the substitution sequence are inserted before the sequence of glyphs to be enclosed. Odd-numbered glyphs in the substitution sequence are positioned so as to be distributed evenly over the width of text being enclosed. Even-numbered glyphs are repeated so that the spaces between the odd-numbered glyphs are filled.
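As a rough illustration of the 'stch' steps, the sketch below separates the reordering from the fill computation. It rests on several assumptions that are not in the proposed text: glyph names and advance widths are hypothetical, the free width is split equally between the repeating glyphs, and repeat counts are rounded down. An actual rendering engine may round, overlap, or position glyphs differently.

```python
def place_stretch_glyphs(substitute, enclosed):
    """Move the last substitute glyph after the enclosed glyphs;
    the remaining substitute glyphs stay in front of them."""
    if len(substitute) % 2 == 0:
        raise ValueError("expected an odd number of substitute glyphs")
    return substitute[:-1] + enclosed + substitute[-1:]


def repeat_counts(substitute, enclosed_width, widths):
    """Estimate how often each even-numbered (repeating) glyph is repeated.

    Odd-numbered glyphs (1st, 3rd, ...) are treated as fixed. The width left
    over after placing them is shared equally by the repeating glyphs
    (assumption), and each count is rounded down (assumption).
    """
    fixed = substitute[0::2]        # 1st, 3rd, 5th, ... in 1-based counting
    repeating = substitute[1::2]    # 2nd, 4th, ...
    if not repeating:
        return []
    free = max(enclosed_width - sum(widths[g] for g in fixed), 0)
    gap = free / len(repeating)
    return [int(gap // widths[g]) for g in repeating]


# Hypothetical glyphs and advance widths in font units.
widths = {"stch.start": 100, "stch.fill": 200, "stch.end": 100}
sequence = ["stch.start", "stch.fill", "stch.end"]

print(place_stretch_glyphs(sequence, ["alaph", "beth"]))
print(repeat_counts(sequence, 1000, widths))
```

Splitting the work this way keeps the glyph-order question (which the feature description settles) apart from the spacing arithmetic (which it leaves to the rendering engine).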

'tjmo':

Recommended implementation: ~~The 'tjmo' table maps the sequence required to convert a series of jamos into its trailing jamo form (GSUB lookup type 4)~~The default glyph for a trailing jamo is mapped into an alternate form required for conjoining in a syllable (GSUB lookup type 1, or a contextual substitution referencing a type 1 lookup).

(This also includes a fix for a separate issue pertaining to jamo sequences and 'ccmp'.)

PeterCon · 2020-09-08T01:21:14Z

Proposed changes for partial fix, covering features_uz:

'vjmo':

Recommended implementation: ~~The 'vjmo' table maps the sequence required to convert a series of jamos into its vowel jamo form (GSUB lookup type 4)~~The default glyph for a vowel jamo is mapped into an alternate form required for conjoining in a syllable (GSUB lookup type 1, or a contextual substitution that references a type 1 lookup).

(This also includes a fix for a separate issue pertaining to jamo sequences and 'ccmp'.)

PeterCon · 2020-09-08T01:22:25Z

Proposed changes for the next version in features_ae, features_fj, features_ko, features_pt and features_uz are given in five preceding comments.

NorbertLindenberg · 2020-09-08T03:55:35Z

The proposed new texts quite frequently use the phrase “default glyph”. If I understand correctly, this phrase means the glyph IDs that are obtained by mapping code points through cmap tables. In complex scripts it’s quite common that glyph IDs that are the result of applying one lookup become inputs for the next lookup. Therefore, it cannot be assumed that the inputs for lookups are always default glyphs.

PeterCon · 2020-09-08T04:07:47Z

@NorbertLindenberg : understood. Of the revised feature descriptions, it's only an issue particularly for 'cjct', and I made a point to work around that in that case.

In some cases—akhn, ljmo, ornm, stch, tjmo, vjmo—the intent really is to act on the default glyph for a character.

For most discretionary features, the prototypical case is to act on the default glyph mapped from the cmap, and I think it's not unreasonable to describe the "recommended" implementation that way. But perhaps in some of these cases it would also work not to mention "default".

NorbertLindenberg · 2020-09-08T04:09:24Z

> What is the “Application” here?...
>
> I think the same thing that is meant by "the application" in every other feature description: any application that supports the feature.

See bug #290.

PeterCon · 2020-09-08T04:21:07Z

Per review feedback, removed "default" in descriptions for the following features:

afrc
ccmp (recommended implementation)
dnom
dtls
frac
hlig
hngl
lnum
numr
onum
sinf
smcp

tiroj · 2020-09-08T16:50:20Z

On frac and afrc:

The run for application of the frac feature is usually user-selected, either manually or via markup. I have seen some fonts that use more complex contextual GSUB to try to identify fraction strings within a text, such that the frac feature could be applied to a whole document and correctly identify, by context, what are fractions and what are not. This has always seemed to me frightening and beyond what OTL contextual substitutions were designed to handle.

The afrc feature description presumes that the default form of fraction in a font will be the slash form, and the alternative will be the stacked form with horizontal bar. Would it be incorrect for a font to have default stacked fractions? Use of the term 'nut fraction' is perhaps too specific, since a nut fraction was historically so called because it was roughly an en width, but that presumes a single numerator and single denominator. A stacked fraction could have e.g. multiple denominators, and hence be wider than a nut.

The afrc feature presumes stacked fractions will be handled via a ligature substitution, probably because at the time no one had figured out a way to handle arbitrary fractions in stacked form (outside of TeX or other specialised math layout software and fonts). But I was nutso enough to try it.

tiroj · 2020-09-08T17:17:46Z

> For most discretionary features, the prototypical case is to act on the default glyph mapped from the cmap, and I think it's not unreasonable to describe the "recommended" implementation that way. But perhaps in some of these cases it would also work not to mention "default".

I think that prototypical case is a bit Latin-centric, and even for such 'simple' scripts the output of preceding feature lookups sometimes needs to be accounted for. For complex scripts, by the time you get to discretionary features, orthographic unit shaping has already taken place, as has reordering, so the glyph run is a long way from the cmap output.

In developing GSUB, I tend to refer to 'input glyphs', which is the state of the glyph run at the point where the specific feature lookup is applied. This could be the cmap output, or the orthographic unit shaping output, or the output from any preceding feature lookup.
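The notion of 'input glyphs' can be illustrated with a small pipeline in which each feature receives the output of the previous one. This is only a sketch: the glyph names, the rules, and the two-stage feature order are hypothetical stand-ins for a real shaping sequence, and matching is plain first-match-wins scanning.

```python
def substitute(glyphs, rules):
    """Apply sequence substitutions left to right; the first match wins."""
    out, i = [], 0
    while i < len(glyphs):
        for pattern, replacement in rules:
            if tuple(glyphs[i:i + len(pattern)]) == pattern:
                out.extend(replacement)
                i += len(pattern)
                break
        else:
            out.append(glyphs[i])
            i += 1
    return out


# Hypothetical feature stages; names and rules are illustrative only.
STAGES = [
    ("half", [(("ka", "virama"), ["ka.half"])]),
    ("cjct", [(("ka.half", "ssa"), ["ka_ssa"])]),
]


def shape(cmap_output, stages):
    """Return the glyph run after the cmap and after each feature stage."""
    trace = [("cmap", list(cmap_output))]
    glyphs = list(cmap_output)
    for tag, rules in stages:
        glyphs = substitute(glyphs, rules)   # input = previous stage's output
        trace.append((tag, list(glyphs)))
    return trace


for tag, run in shape(["ka", "virama", "ssa"], STAGES):
    print(tag, run)
```

In this toy run the 'cjct' rule never sees the cmap output at all: its input is the half form produced by the earlier stage.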

PeterCon added the OpenType spec label Aug 13, 2020

PeterCon assigned PeterCon and unassigned PeterCon Aug 25, 2020

PeterCon added the Priority 4 label Aug 25, 2020

PeterCon closed this as completed Sep 8, 2020

PeterCon mentioned this issue Sep 9, 2020

Contradictory statements on order of lookups #380

Closed

PeterCon changed the title ~~ccmp description~~ "[feature x] table maps the character sequence" Sep 13, 2020

PeterCon mentioned this issue Sep 13, 2020

'opbd' is weird #578

Closed

PeterCon added this to the OpenType 1.8.4 milestone Nov 17, 2020
