Adobe predefined slots for some glyphs and PUA

mauro sacchetto Posts: 353
edited February 2020 in Font Technology
Adobe provides slots for: 1) oldstyle figures (uniF730 to uniF739); 2) small caps (uniF761 to uniF77A); 3) ligatures (uniFB00 to uniFB06).
However, in most fonts those glyphs are placed freely in the PUA. Not in all, though: in some cases the ligatures sit exactly in the predefined slots.
So: a) is it still correct, or by now obsolete, to place those glyphs in the slots provided by Adobe (which, it seems to me, has for some time advised against their use), or should they go in the PUA instead?
b) why, in the case of ligatures, do some fonts use the predefined slots while others put them in the PUA, and what is the best solution?

Thank you
ms

Comments

  • I'm confused about what you mean by “put them in the PUA instead”. All of the codepoints you mention above are already in the PUA.

    The question most frequently asked is whether to encode them at all — i.e. whether to assign them to the PUA or to leave them unencoded. While opinions on this vary, I think the majority these days recommend against assigning PUA codepoints.

    Also, note that the PUA code points used in the past by Adobe for small caps, ligatures, etc. are by no means standardized. Fonts from other vendors often assigned these same glyphs to entirely different PUA codepoints, so software cannot make any assumptions about what a particular PUA value is intended to represent.
  • Igor Freiberger Posts: 273
    edited February 2020
    All PUA positions are part of the Unicode specification and are available for any use. Adobe does not provide them; it just uses some slots for special situations.

    PUA use is not standardized. As André pointed out, even Adobe does not use these codepoints consistently. Actually, the only standard for PUA I am aware of is the medieval set by MUFI.

    As a general suggestion, do not use PUA codes. You can have unencoded glyphs in your font without any problem. These glyphs should be accessible through OpenType features, like small caps, discretionary ligatures, or alternate designs.

    You can search for John Hudson's posts about PUA and Unicode. John gave excellent explanations about the subject several times.
  • So it's better to put all these glyphs in the slots from uni0030 on... Including all ligatures. Thank you
  • Thomas Phinney Posts: 2,888
    edited February 2020
    The codepoints at U+F7xx are indeed in the Private Use Area. That covers two of the three groups. HOWEVER, Adobe hasn’t used those PUA encodings for new fonts for at least a decade now, I think.

    So, current practice, for fonts aimed at professional users, is to not encode anything in the PUA that can reasonably be handled by OpenType features.

    For fonts aimed at non-professional users, there are a variety of opinions, including the possibility of having a bogus encoding that puts special characters in "normal" character slots. Or having extra fonts with small caps in lowercase slots and oldstyle figures in regular number slots.

    BUT, the codepoints at U+FB00–FB06 are standard Unicode codepoints for f-ligatures, due to legacy encodings. They have been in Unicode since 1993. However, one very rarely encounters such “hardcoded” ligatures in incoming text streams. It is a judgment call whether to bother using those slots for those particular ligatures nowadays (versus leaving them unencoded, or even possibly doing both, but that is probably overkill).

    Note that using the FBXX codepoints does not replace having the 'liga' and 'dlig' features for those same ligatures.
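
The distinction drawn above between the U+F7xx slots and the U+FB00–FB06 slots can be checked against the Unicode character database. The following is only a rough sketch using Python's standard `unicodedata` module; the helper names `is_private_use` and `describe` are made up for illustration, and the sample codepoints are simply taken from the ranges mentioned in this thread. Private-use codepoints carry the general category `Co` and have no standard name, whereas the f-ligature slots are ordinary named characters.

```python
import unicodedata


def is_private_use(cp: int) -> bool:
    """Return True if the codepoint has general category Co (private use)."""
    return unicodedata.category(chr(cp)) == "Co"


def describe(cp: int) -> str:
    """Build a one-line summary of a codepoint (hypothetical helper)."""
    ch = chr(cp)
    # Private-use codepoints have no name, so supply a fallback string.
    name = unicodedata.name(ch, "<no standard name>")
    kind = "private use" if is_private_use(cp) else "standard"
    return f"U+{cp:04X}  {kind:<11}  {unicodedata.category(ch)}  {name}"


# Sample codepoints from the ranges discussed in the thread.
for cp in (0xF730, 0xF761, 0xFB00, 0xFB06):
    print(describe(cp))
```

Since software sees nothing more than "private use" for the U+F7xx values, it has no way to know that a given font intended an oldstyle figure or a small cap there.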

  • mauro sacchetto Posts: 353
    edited February 2020
    Thanks for the extended explanation. What you say clarifies the widespread use of the U+FB00–FB06 slots for f-ligatures. I know you need the appropriate lookups, which I have already created.
  • So it's better to put all these glyphs in the slots from uni0030 on... Including all ligatures. Thank you
    No, it's better to not put them in any slots at all (assuming by slot you mean code point). Apart from the PUA code points, all Unicode code points either have well-defined meanings or are reserved for future use.

    As long as your small caps, ligatures, etc. are all accessible via OpenType features, they don’t need to have any Unicode value.
  • Helmut Wollmersdorfer
    edited February 2020
    If you create a precomposed glyph which has a precomposed Unicode code point, then define this code point for the glyph, and (maybe) provide a feature rule for it.

    For the Latin script these ligatures are defined in Unicode 12:

    $ uni s ligature | grep -i latin
    'Ĳ'  U+0132  LATIN CAPITAL LIGATURE IJ (Uppercase_Letter)
    'ĳ'  U+0133  LATIN SMALL LIGATURE IJ (Lowercase_Letter)
    'Œ'  U+0152  LATIN CAPITAL LIGATURE OE (Uppercase_Letter)
    'œ'  U+0153  LATIN SMALL LIGATURE OE (Lowercase_Letter)
    'ﬀ'  U+FB00  LATIN SMALL LIGATURE FF (Lowercase_Letter)
    'ﬁ'  U+FB01  LATIN SMALL LIGATURE FI (Lowercase_Letter)
    'ﬂ'  U+FB02  LATIN SMALL LIGATURE FL (Lowercase_Letter)
    'ﬃ'  U+FB03  LATIN SMALL LIGATURE FFI (Lowercase_Letter)
    'ﬄ'  U+FB04  LATIN SMALL LIGATURE FFL (Lowercase_Letter)
    'ﬅ'  U+FB05  LATIN SMALL LIGATURE LONG S T (Lowercase_Letter)
    'ﬆ'  U+FB06  LATIN SMALL LIGATURE ST (Lowercase_Letter)

    If your precomposed glyph, e. g. c_h ligature, doesn't have a code point in Unicode, then don't define a code point for it in your font.
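
The rule of thumb above (encode a ligature glyph only if Unicode already defines a character for it) could be approximated roughly as follows. This is a sketch under assumptions, not a definitive test: it uses Python's standard `unicodedata` module and treats a ligature as "defined in Unicode" when some character has a `<compat>` decomposition equal to the component sequence. The names `build_compat_index` and `precomposed_code_point` are hypothetical. Note that Œ and œ have no decomposition in Unicode, so this lookup does not find them, and it will also match non-ligature compatibility characters such as digraphs.

```python
import unicodedata
from typing import Dict, Optional


def build_compat_index() -> Dict[str, int]:
    """Map each <compat> decomposition sequence to its lowest codepoint."""
    index: Dict[str, int] = {}
    for cp in range(0x110000):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Decompositions look like "<compat> 0066 0069"; skip other kinds.
        if parts and parts[0] == "<compat>":
            sequence = "".join(chr(int(h, 16)) for h in parts[1:])
            index.setdefault(sequence, cp)
    return index


COMPAT_INDEX = build_compat_index()


def precomposed_code_point(components: str) -> Optional[int]:
    """Return the codepoint whose compat decomposition equals `components`."""
    return COMPAT_INDEX.get(components)


# "\u017ft" is long s followed by t; "ch" stands for a c_h ligature glyph.
for components in ("fi", "ffl", "\u017ft", "ch"):
    cp = precomposed_code_point(components)
    if cp is None:
        print(f"{components!r}: no Unicode character, leave the glyph unencoded")
    else:
        print(f"{components!r}: U+{cp:04X} {unicodedata.name(chr(cp))}")
```

The exact results depend on the Unicode version bundled with the Python interpreter in use.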
  • I wouldn’t consider Œ, œ, IJ, or ij as ligatures — these are digraphs which represent a single character, not two ligated characters.
  • Also: Æ/æ.
  • I wouldn’t consider Œ, œ, IJ, or ij as ligatures — these are digraphs which represent a single character, not two ligated characters.
    This might be so in some languages, but Unicode (mis?)defined them as ligatures. And some fonts have contextual rules for Oe -> Œ etc.
  • The diphthongs ae and oe, which in classical Latin (according to the "lectio restituta") were pronounced as two distinct vowels, were pronounced as a single vowel in ecclesiastical Latin, and hence in the medieval and modern periods. I'm not sure, but this may also have led to the merging of the two glyphs.
  • John Hudson Posts: 3,190
    Latin may be a unique case for ae and oe handling. In other languages regularly using æ and œ these tend to be non-optional forms and are often considered distinct letters. In classical Latin, these would have been written AE and OE, and modern classicists now favour ae and oe. But ecclesiastical Latin developed æ and œ and one still finds those used in some traditionalist Roman Catholic publishing. So Latin would be the one language for which I might consider ae -> æ and oe -> œ ligation valid, so as to avoid having to re-encode text to get those forms.