"[feature x] table maps the character sequence" #347

Closed
simoncozens opened this issue Jan 27, 2020 — with docs.microsoft.com · 18 comments

"The 'ccmp' table maps the character sequence"

ccmp is about glyph composition and decomposition, and acts (like any other feature) on glyph sequences, not character sequences.
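To make the distinction concrete, here is a minimal sketch of the two stages. It assumes a toy font; the glyph names, the `CMAP` dictionary and the `CCMP_LIGATURES` rule are all invented for illustration and do not come from any real font or shaping API. Characters are only seen by the cmap step; the feature lookup only ever sees glyphs.

```python
# Hypothetical toy font data; glyph names are invented for illustration.
CMAP = {0x0628: "beh", 0x0651: "shadda", 0x064E: "fatha"}

# A 'ccmp'-style ligature rule, keyed on glyph sequences, not code points.
CCMP_LIGATURES = {("shadda", "fatha"): "shadda_fatha"}


def map_characters(text):
    """Stage 1: the cmap turns each character into its default glyph."""
    return [CMAP[ord(ch)] for ch in text]


def apply_ligatures(glyphs, rules):
    """Stage 2: a feature lookup rewrites the glyph run; it never sees characters."""
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in rules:
            out.append(rules[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


glyph_run = map_characters("\u0628\u0651\u064E")
shaped = apply_ligatures(glyph_run, CCMP_LIGATURES)
```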



@PeterCon
Collaborator

This may be a single instance of a generic issue: features that describe actions on characters rather than on glyphs (for characters). General review of feature descriptions for this issue might be a good idea.

@PeterCon PeterCon assigned PeterCon and unassigned PeterCon Aug 25, 2020
@PeterCon
Collaborator

PeterCon commented Sep 7, 2020

Proposed changes for partial fix, covering features_ae.htm:

'afrc':

Recommended implementation (replaced text): The 'afrc' table maps sets of figures separated by slash (U+002F) or fraction (U+2044) characters to corresponding fraction glyphs in the font (GSUB lookup type 4).

Recommended implementation (proposed text): Sequences of default glyphs for figures (digits) separated by a slash (U+002F) or fraction slash (U+2044) are mapped to a corresponding ligature glyph for the fraction (GSUB lookup type 4).

Application interface (replaced text): The application must define the full sequence of GIDs to be replaced. When the full sequence is found in the 'afrc' coverage table, the application passes the sequence to the 'afrc' table and gets a new GID in return.

Application interface (proposed text): The application applies the feature to a complete sequence of figures separated by slash (U+002F) or fraction slash (U+2044).

'akhn':

Function: Preferentially substitutes default glyphs for a sequence of characters with a ligature. This substitution is done irrespective of any characters that may precede or follow the sequence.

'ccmp':

Example: In Syriac, the character 0x0732 is a combining mark that has a dot above AND a dot below the base character. To avoid multiple glyph variants to fit all base glyphs, the default glyph for the character is decomposed into two glyphs...: a dot above and a dot below. These two glyphs can then be correctly placed using GPOS. In Arabic it might be preferred to combine the shadda with fatha (0x0651, 0x064E) into a ligature before processing shapes. This allows the font vendor to do special handling of the mark combination when doing further processing without requiring larger contextual rules.

Recommended implementation (replaced text): The 'ccmp' table maps the character sequence to its corresponding ligature (GSUB lookup type 4) or string of glyphs (GSUB lookup type 2). When using GSUB lookup type 4, sequences that are made up of larger number of glyphs must be placed before those that require fewer glyphs.

Recommended implementation (proposed text): Default glyphs for multiple characters are mapped to a single glyph (GSUB lookup type 4), or the default glyph for a character is mapped to a sequence of glyphs (GSUB lookup type 2).
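As a rough illustration of the two lookup types mentioned for 'ccmp', and of why longer ligature sequences have to be listed before shorter ones, here is a small sketch. It assumes first-match-wins semantics at each position; the function names and every glyph name are hypothetical, not taken from any font or library.

```python
def apply_ligature_rules(glyphs, rules):
    """Type 4 style: at each position, apply the first listed rule that matches."""
    out, i = [], 0
    while i < len(glyphs):
        for sequence, ligature in rules:
            if tuple(glyphs[i:i + len(sequence)]) == sequence:
                out.append(ligature)
                i += len(sequence)
                break
        else:
            # No rule matched here; copy the glyph through unchanged.
            out.append(glyphs[i])
            i += 1
    return out


def apply_multiple_substitution(glyphs, mapping):
    """Type 2 style: one glyph is replaced by a sequence of glyphs."""
    out = []
    for glyph in glyphs:
        out.extend(mapping.get(glyph, [glyph]))
    return out


# Hypothetical glyph names throughout.
longest_first = [
    (("a", "gravecomb", "acutecomb"), "a_grave_acute"),
    (("a", "gravecomb"), "agrave"),
]
shortest_first = list(reversed(longest_first))
run = ["a", "gravecomb", "acutecomb"]

good = apply_ligature_rules(run, longest_first)
shadowed = apply_ligature_rules(run, shortest_first)
decomposed = apply_multiple_substitution(
    ["dots_above_below"], {"dots_above_below": ["dotabove", "dotbelow"]}
)
```

With the shorter rule listed first, the three-glyph rule can never fire, which is the situation the ordering requirement guards against.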

'cjct':

Recommended implementation (replaced text): The 'cjct' table maps the sequence of a consonant (the nominal form) followed by a virama (halant) followed by a second consonant (the nominal form or a half form) to the corresponding conjunct form (GSUB lookup type 4).

Recommended implementation (proposed text): A glyph for a consonant conjunct form is mapped from a sequence of two or more glyphs for consonants separated by virama (halant) (e.g., C H C, or C H C H C, etc.); or from sequences of glyphs involving simpler consonant conjuncts or conjoining consonant forms resulting from earlier substitutions from default consonant + virama + consonant (+ etc.) sequences (GSUB lookup type 4).

'dnom':

Recommended implementation: The 'dnom' table maps sets of default glyphs for figures (digits) and related characters to corresponding denominator glyphs in the font (GSUB lookup type 1).

'dtls':

Recommended implementation: Single substitution (GSUB lookup type 1), for default glyphs of all dotted characters.

@khaledhosny

Application interface: The application applies the feature to a complete sequence of figures separated by slash (U+002F) or fraction slash (U+2044)

What is the “Application” here? Is there any OpenType implementation that implements the described behavior? What if the user wanted to apply the feature to things other than figures and slash?

@PeterCon
Collaborator

PeterCon commented Sep 7, 2020

What is the “Application” here?...

I think the same thing that is meant by "the application" in every other feature description: any application that supports the feature. Whether, in fact, there is an actual application that supports it is a separate question.

@khaledhosny

That is still not clear to me. In the case of, say, a web browser, is it the responsibility of the web page author, the browser engine, or the OpenType layout engine to enforce this constraint? Or is it a mere recommendation, and applications are free not to implement it?

@PeterCon
Collaborator

PeterCon commented Sep 7, 2020

[are] applications... free not to implement it?

As far as the OT spec itself is concerned, applications are not required to support any particular features. (Just like Unicode conformance doesn't require applications to support any particular characters.)

But, I think you've raised a valid question (which had occurred to me while I made the revision): Does a user control what span of characters a discretionary feature is applied to? Or should an application do that? For most discretionary features, it can be entirely up to the user. In this particular case, it's less clear: what should happen if a user applied the feature only to "/", or only to digits after "/"? In this case, I think the intent of the vendor that registered the feature was for an app to apply the feature over a complete digits-slash-digits span. (Whether or not that's a useful model is a separate question.)

But this is all a separate topic from this issue. Please open a separate issue if you think it needs further discussion.

@PeterCon
Collaborator

PeterCon commented Sep 7, 2020

A final note on discussion of 'afrc': looking at 'frac', that seems to suggest that the user determines the string the feature is applied to, like most discretionary features. So, that might have also been the intent when 'afrc' was registered.

@PeterCon
Collaborator

PeterCon commented Sep 7, 2020

Proposed changes for partial fix, covering features_fj:

'frac':

Recommended implementation (replaced text): The 'frac' table maps sets of figures separated by slash or fraction characters to corresponding fraction glyphs in the font. These may be precomposed fractions (GSUB lookup type 4) or arbitrary fractions (GSUB lookup type 1).

Recommended implementation (proposed text): Default glyphs for figures (digits) separated by a slash (U+002F) are mapped to variant forms (GSUB lookup type 1, or contextual substitutions that reference type 1 lookups), or sequences of such glyphs are mapped to ligature fraction glyphs (GSUB lookup type 4).

Application interface (replaced text): The application must define the full sequence of GIDs to be replaced, based on user input (i.e. user selection determines the string’s delimitation). When the full sequence is found in the 'frac' coverage table, the application passes the sequence to the 'frac' table and gets a new GID in return. When the 'frac' table does not contain an exact match, the application performs two steps. First, it uses the 'numr' feature to replace figures (as used in the 'numr' coverage table) preceding the slash with numerators, and to replace the typographic slash character (U+002F) with the fraction slash character (U+2044). Second, it uses the 'dnom' feature to replace all remaining figures (as listed in the 'dnom' coverage table) with denominators.

Application interface (proposed text): The application evaluates the sequence of default glyphs for the span of characters over which this feature has been applied by the user. If an associated lookup subtable matches the entire glyph sequence, the substitutions described in that lookup are applied to the glyph sequence directly. If no lookup subtable is found that matches the entire sequence, then the application does the following:

  • Apply the 'numr' feature to the figures within the span that precede the slash and also to the slash, and process associated lookups.
  • Apply the 'dnom' feature to the figures within the span that follow the slash and process associated lookups.
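The fallback described for 'frac' can be sketched roughly as follows. This is only an illustration of the control flow, under the assumption that lookups can be modelled as plain dictionaries; `FRAC_LIGATURES`, `NUMR`, `DNOM`, `apply_frac` and all glyph names are hypothetical.

```python
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]

# Hypothetical lookup data for a toy font.
FRAC_LIGATURES = {("one", "slash", "two"): "onehalf"}
NUMR = {d: d + ".numr" for d in DIGITS}
NUMR["slash"] = "fraction"  # 'numr' also maps slash to the fraction slash glyph
DNOM = {d: d + ".dnom" for d in DIGITS}


def apply_frac(span):
    """Apply 'frac' to a user-selected span of glyphs."""
    # Case 1: a lookup matches the entire span, so use it directly.
    whole = FRAC_LIGATURES.get(tuple(span))
    if whole is not None:
        return [whole]
    if "slash" not in span:
        return list(span)
    # Case 2: fall back to 'numr' up to and including the slash,
    # then 'dnom' for everything after it.
    cut = span.index("slash")
    numerator_part = [NUMR.get(g, g) for g in span[:cut + 1]]
    denominator_part = [DNOM.get(g, g) for g in span[cut + 1:]]
    return numerator_part + denominator_part


precomposed = apply_frac(["one", "slash", "two"])
arbitrary = apply_frac(["three", "slash", "seven"])
```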

NOTE: The above changes are meant to pertain specifically to this issue. The draft attempts to retain the intent of the current 'frac' description. It is not attempting to address separate issues raised in #580.

'hlig':

Recommended implementation (replaced text): The 'hlig' table maps default ligatures and character combinations to corresponding historical ligatures (GSUB lookup type 1).

Recommended implementation (proposed text): Sequences of default glyphs for certain character combinations are mapped to corresponding historical ligature glyphs.

'hngl':

Recommended implementation (replaced text): This table associates each hanja character in the font with one or more hangul characters. The manufacturer may choose to build two tables (one for each lookup type) or only one which uses lookup type 3 for all substitutions.

Recommended implementation (proposed text): Default glyphs for hanja characters are mapped to corresponding glyphs for Hangul syllables (GSUB lookup type 1); or default glyphs for hanja characters are mapped to two or more corresponding alternate glyphs for Hangul syllables (GSUB lookup type 3). As in any one-from-many substitution, alternates should be ordered consistently across a family, so that those alternates can work correctly when switching between family members.

@PeterCon
Collaborator

PeterCon commented Sep 8, 2020

Proposed changes for partial fix, covering features_ko:

'ljmo':

Recommended implementation (replaced text): The 'ljmo' table maps the sequence required to convert a series of jamos into its leading jamo form (GSUB lookup type 4)

Recommended implementation (proposed text): The default glyph for a leading jamo is mapped into an alternate form required for conjoining in a syllable (GSUB lookup type 1, or a contextual substitution referencing a type 1 lookup).

(This also includes a fix for a separate issue pertaining to jamo sequences and 'ccmp'.)

'lnum':

Recommended implementation (replaced text): The 'lnum' table maps each oldstyle figure, and any associated characters to the corresponding lining form

Recommended implementation (proposed text): Default glyphs for figures (digits) or other characters used in numbers are mapped to corresponding lining forms (GSUB lookup type 1). If the default figures are non-lining, they too are mapped to the corresponding lining form.

'numr':

Recommended implementation (replaced text): The 'numr' table maps sets of figures and related characters to corresponding numerator glyphs in the font. It also maps the typographic slash (U+002F) to the fraction slash (U+2044). All mappings are one-to-one (GSUB lookup type 1).

Recommended implementation (proposed text): Default glyphs for figures (digits) or other characters used in numbers (grouping or decimal separators) are mapped to corresponding numerator glyphs; and the glyph for slash (U+002F) is mapped to a fraction slash (glyph for U+2044). Substitutions are one-to-one (GSUB lookup type 1).

'onum':

Recommended implementation (replaced text): The 'onum' table maps each lining figure, and any associated characters to the corresponding oldstyle form

Recommended implementation (proposed text): Default glyphs for figures (digits) or other characters used in numbers (grouping or decimal separators) are mapped to corresponding oldstyle forms (GSUB lookup type 1). If the default figures are non-lining, they too are mapped to the corresponding oldstyle form.

'ordn':

Recommended implementation (replaced text): The 'ordn' table maps various lowercase letters to corresponding ordinal forms in a chained context (GSUB lookup type 6), and the sequence No to the numero character (GSUB lookup type 4)

Recommended implementation (proposed text): Default glyphs for various lowercase letters are mapped to corresponding ordinal forms using a chained-context substitution (GSUB lookup type 6); and the sequence of default glyphs for “No” is mapped to a numero ligature glyph (GSUB lookup type 4).

'ornm':

Recommended implementation (replaced text): The 'ornm' table maps all ornaments in a font to the bullet character (GSUB lookup type 3) and each ornament in a font to a corresponding alphanumeric character (GSUB lookup type 1)

Recommended implementation (proposed text): All ornament glyphs are mapped from the default glyph of the bullet character (U+2022) as alternates (GSUB lookup type 3); or ornament glyphs are mapped from the default glyph of corresponding alphanumeric characters (GSUB lookup type 1). The manufacturer may choose to build two tables (one for each lookup type) or only one which uses lookup type 3 for all substitutions. As in any one-from-many substitution, alternates present in more than one face should be ordered consistently across a family, so that those alternates can work correctly when switching between family members.

@PeterCon
Collaborator

PeterCon commented Sep 8, 2020

Proposed changes for partial fix, covering features_pt:

'pcap':

Function: Some fonts contain an additional size of capital letters, shorter than the regular smallcaps and whimsically referred to as petite caps. Such forms are most likely to be found in designs with a small lowercase x-height, where they better harmonise with lowercase text than the taller smallcaps (for examples of petite caps, see the Emigre type families Mrs Eaves and Filosofia). This feature turns glyphs for lowercase characters into petite capitals. Forms related to petite capitals, such as specially designed figures, may be included.

'sinf':

Function: Replaces lining or oldstyle figures (digits) with inferior figures (smaller glyphs which sit lower than the standard baseline, primarily for chemical or mathematical notation). May also replace default glyphs for lowercase characters with alphabetic inferiors.

...

Recommended implementation (replaced text): The 'sinf' table maps figures to the corresponding inferior forms

Recommended implementation (proposed text): Default glyphs for figures (digits) are mapped to corresponding inferior forms (GSUB lookup type 1).

'smcp':

Function: This feature turns default glyphs for lowercase characters into small capitals. This corresponds to the common SC font layout. It is generally used for display lines set in Large & small caps, such as titles. Forms related to small capitals, such as oldstyle figures, may be included.

'stch':

Recommended implementation (replaced text): The 'stch' table maps the character to a set containing

Recommended implementation (proposed text): The default glyph for a character that requires stretching is mapped to a sequence comprised of an odd number of corresponding glyphs (GSUB lookup type 2). The rendering engine reorders the last glyph from the substituted set to the end of the set of characters being enclosed. The remaining glyphs from the substituted set are positioned at the start of the set of characters being enclosed. Odd-numbered glyphs in the decomposition set are positioned so that they are distributed evenly over the width of the text being enclosed. Even-numbered glyphs in the decomposition set are repeated by the rendering engine so the width of the space between fixed, odd-numbered glyphs is filled by the spacing, even-numbered glyphs.

Application interface (replaced text): For GIDs found in the 'stch' coverage table, the application passes the sequence of GIDs to the table, and gets back the GIDs for the multiple substitution.

Application interface (proposed text): For characters that require stretching, such as Syriac abbreviation mark (U+070F), the 'stch' feature is applied. If the default glyph for the character is in the coverage of an associated lookup subtable, the mapped glyph sequence is retrieved. The last glyph of the substitute sequence is reordered to the end of the sequence of glyphs to be enclosed or encompassed. The remaining glyphs from the substitution sequence are inserted before the sequence of glyphs to be enclosed. Odd-numbered glyphs in the substitution sequence are positioned so as to be distributed evenly over the width of text being enclosed. Even-numbered glyphs are repeated so that the spaces between the odd-numbered glyphs are filled.
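One possible reading of the 'stch' steps, sketched in code. The reordering follows the description directly; the repeat-count arithmetic is an assumption of this sketch (equal gaps between fixed glyphs, each gap filled by rounding up), not something the feature description specifies. All function names, glyph names and widths are hypothetical.

```python
import math


def reorder_stch(substitute, enclosed):
    """Move the last substitute glyph after the enclosed glyphs; the rest go before."""
    if len(substitute) % 2 == 0:
        raise ValueError("the substitute sequence must have an odd number of glyphs")
    return substitute[:-1] + enclosed + substitute[-1:]


def split_fixed_and_repeating(substitute):
    """Odd-numbered glyphs (1st, 3rd, ...) are fixed; even-numbered ones repeat."""
    return substitute[0::2], substitute[1::2]


def repeat_counts(enclosed_width, fixed_widths, repeating_widths):
    """Assumed fill rule: split the leftover width evenly between the gaps."""
    if not repeating_widths:
        return []
    gap = (enclosed_width - sum(fixed_widths)) / len(repeating_widths)
    return [max(1, math.ceil(gap / width)) for width in repeating_widths]


reordered = reorder_stch(["s1", "s2", "s3"], ["a", "b"])
fixed, repeating = split_fixed_and_repeating(["s1", "s2", "s3"])
counts = repeat_counts(1000, [100, 100, 100], [50, 50])
```

A real shaping engine would also have to handle overlap and rounding when the gap is not a multiple of the repeating glyph's width; that is left out here.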

'tjmo':

Recommended implementation (replaced text): The 'tjmo' table maps the sequence required to convert a series of jamos into its trailing jamo form (GSUB lookup type 4)

Recommended implementation (proposed text): The default glyph for a trailing jamo is mapped into an alternate form required for conjoining in a syllable (GSUB lookup type 1, or a contextual substitution referencing a type 1 lookup).

(This also includes a fix for a separate issue pertaining to jamo sequences and 'ccmp'.)

@PeterCon
Collaborator

PeterCon commented Sep 8, 2020

Proposed changes for partial fix, covering features_uz:

'vjmo':

Recommended implementation (replaced text): The 'vjmo' table maps the sequence required to convert a series of jamos into its vowel jamo form (GSUB lookup type 4)

Recommended implementation (proposed text): The default glyph for a vowel jamo is mapped into an alternate form required for conjoining in a syllable (GSUB lookup type 1, or a contextual substitution that references a type 1 lookup).

(This also includes a fix for a separate issue pertaining to jamo sequences and 'ccmp'.)

@PeterCon
Collaborator

PeterCon commented Sep 8, 2020

Proposed changes for the next version in features_ae, features_fj, features_ko, features_pt and features_uz are given in five preceding comments.

@PeterCon PeterCon closed this as completed Sep 8, 2020
@NorbertLindenberg

The proposed new texts quite frequently use the phrase “default glyph”. If I understand correctly, this phrase means the glyph IDs that are obtained by mapping code points through cmap tables. In complex scripts it’s quite common that glyph IDs that are the result of applying one lookup become inputs for the next lookup. Therefore, it cannot be assumed that the inputs for lookups are always default glyphs.

@PeterCon
Collaborator

PeterCon commented Sep 8, 2020

@NorbertLindenberg : understood. Of the revised feature descriptions, it's only an issue particularly for 'cjct', and I made a point to work around that in that case.

In some cases—akhn, ljmo, ornm, stch, tjmo, vjmo—the intent really is to act on the default glyph for a character.

For most discretionary features, the prototypical case is to act on the default glyph mapped from the cmap, and I think it's not unreasonable to describe the "recommended" implementation that way. But perhaps in some of these cases it would also work not to mention "default".

@NorbertLindenberg

What is the “Application” here?...

I think the same thing that is meant by "the application" in every other feature description: any application that supports the feature.

See bug #290.

@PeterCon
Collaborator

PeterCon commented Sep 8, 2020

Per review feedback, removed "default" in descriptions for the following features:

  • afrc
  • ccmp (recommended implementation)
  • dnom
  • dtls
  • frac
  • hlig
  • hngl
  • lnum
  • numr
  • onum
  • sinf
  • smcp

@tiroj

tiroj commented Sep 8, 2020

On frac and afrc:

The run for application of the frac feature is usually user-selected, either manually or via markup. I have seen some fonts that use more complex contextual GSUB to try to identify fraction strings within a text, such that the frac feature could be applied to a whole document and correctly identify, by context, what are fractions and what are not. This has always seemed to me frightening and beyond what OTL contextual substitutions were designed to handle.

The afrc feature description presumes that the default form of fraction in a font will be the slash form, and the alternative will be the stacked form with horizontal bar. Would it be incorrect for a font to have default stacked fractions? Use of the term 'nut fraction' is perhaps too specific, since a nut fraction was historically so called because it was roughly an en width, but that presumes a single numerator and single denominator. A stacked fraction could have e.g. multiple denominators, and hence be wider than a nut.

The afrc feature presumes stacked fractions will be handled via a ligature substitution, probably because at the time no one had figured out a way to handle arbitrary fractions in stacked form (outside of TeX or other specialised math layout software and fonts). But I was nutso enough to try it.

@tiroj

tiroj commented Sep 8, 2020

For most discretionary features, the prototypical case is to act on the default glyph mapped from the cmap, and I think it's not unreasonable to describe the "recommended" implementation that way. But perhaps in some of these cases it would also work not to mention "default".

I think that prototypical case is a bit Latin-centric, and even for such 'simple' scripts the output of preceding feature lookups sometimes needs to be accounted for. For complex scripts, by the time you get to discretionary features, orthographic unit shaping has already taken place, as has reordering, so the glyph run is a long way from the cmap output.

In developing GSUB, I tend to refer to 'input glyphs', which is the state of the glyph run at the point where the specific feature lookup is applied. This could be the cmap output, or the orthographic unit shaping output, or the output from any preceding feature lookup.
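A small sketch of the 'input glyphs' idea, assuming lookups can be modelled as one-to-one dictionaries applied in order. The `shape` helper, the lookup data and the glyph names (`i.TRK`, `i.TRK.sc`) are hypothetical. The point is that the 'smcp' rule that actually fires is keyed on a glyph produced by an earlier lookup, not on the cmap output.

```python
def shape(text, cmap, lookups):
    """Map characters through the cmap, then apply each lookup to the current run."""
    run = [cmap[ch] for ch in text]
    trace = [("cmap", list(run))]
    for tag, mapping in lookups:
        # Each lookup sees the output of the previous stage as its input glyphs.
        run = [mapping.get(glyph, glyph) for glyph in run]
        trace.append((tag, list(run)))
    return run, trace


# Hypothetical toy data.
CMAP = {"i": "i"}
LOOKUPS = [
    ("locl", {"i": "i.TRK"}),
    ("smcp", {"i": "i.sc", "i.TRK": "i.TRK.sc"}),
]

final_run, trace = shape("i", CMAP, LOOKUPS)
```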

@PeterCon PeterCon changed the title from "ccmp description" to "[feature x] table maps the character sequence" Sep 13, 2020
@PeterCon PeterCon added this to the OpenType 1.8.4 milestone Nov 17, 2020