Ordinal/superscript feature for French

If the above characters are included in a font, how – and to what extent – should OpenType be involved in rendering them correctly for French? Would it make sense to use the ordn feature? Is this a stylistic choice, on par with raised “st” in English 1st? I assume it is not preferred default behaviour, but I could be wrong.

Comments

  • Chris Lozos
    Makes sense to me. Just include a substitution for every superscript glyph you include.
  • It is the preferred default behaviour in typographic guides.
    If the above characters are included in a font
    You do mean glyphs, right? One shouldn’t use the characters 1ᵉʳ, but should instead apply the ordinal feature to 1er to get those glyphs.
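To make the character/glyph distinction concrete, here is a minimal Python model (not font code) of what an ordn-style lookup does: the stored text stays 1er and only the glyphs change. It assumes superior letters are named with a hypothetical `.sups` suffix and represents the other glyphs by their characters for brevity. A real font would express this as a contextual substitution in its feature file.

```python
FIGURES = set("0123456789")

def apply_ordn(text: str) -> list[str]:
    """Model of an ordn-style lookup: letters directly following a figure become superiors.

    Glyphs are represented by their characters; '.sups' is a hypothetical suffix
    for the superior letter glyphs. The underlying text is never modified.
    """
    glyphs = []
    after_figure = False  # True while inside a letter run that started right after a figure
    for ch in text:
        if ch in FIGURES:
            glyphs.append(ch)
            after_figure = True
        elif ch.isalpha() and after_figure:
            glyphs.append(f"{ch}.sups")
        else:
            glyphs.append(ch)
            after_figure = False
    return glyphs

print(apply_ordn("1er"))  # ['1', 'e.sups', 'r.sups']
```

This naive rule would also raise the letters in a string like 2CV, which illustrates why applying such a substitution blindly by default is risky.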
  • Thanks, Denis. I think I get the terminology wrong a lot of the time re. glyphs/characters. Would it be right to say character refers to a linguistic unit and glyph refers to a designed instance of that unit?

    It is the preferred default behaviour in typographic guides.

    Meaning it could be implemented as a calt feature that is always on by default in French? I would think it should be an active choice.

  • Yeah, that’s the distinction between character and glyph that one draws to avoid the confusion.

    I wouldn’t put these in a calt feature always on by default in French. The user, or a smart application, should decide when to activate the ordn feature.
  • edited May 2016
    This led me down a rabbit hole. The ordn feature commonly substitutes ª for a and º for o, but these ordinal indicators have their own Unicode code points. Isn’t this really a hack that disrupts the data stream?

    I am inclined to replace letters (e.g. ‘egrave’, U+00E8) with unencoded glyph variants (e.g. ‘egrave.sups’, no Unicode value) in the sups feature, remove the ordn feature, and let users input ª and º manually.
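Python's standard `unicodedata` module shows why the question arises: ª and º are encoded characters with a `<super>` compatibility decomposition to a and o, so they are distinct from the plain letters in text, and only compatibility normalization (NFKC) folds them back. The helper name `describe` is made up for this illustration.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str, str]:
    """Return the code point, Unicode name and decomposition mapping of one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.decomposition(ch)

for ch in "\u00AA\u00BA":  # ª and º
    print(*describe(ch))
# U+00AA FEMININE ORDINAL INDICATOR <super> 0061
# U+00BA MASCULINE ORDINAL INDICATOR <super> 006F

# NFKC (compatibility normalization) turns the ordinal indicators back into plain letters.
print(unicodedata.normalize("NFKC", "1\u00AA 2\u00BA"))  # 1a 2o
```

So an ordn lookup that maps a to the ª glyph changes only the appearance, while text typed as ª carries the ordinal indicator in the data itself.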
  • Thank you, Denis.


    As for the rabbit hole, different uses of superior and inferior figures and letters seem to require different implementations. Some require a Unicode value. Examples include:

    Chemical formulas
    Pb⁴⁺
    C₆H₁₂O₆

    Math/measurements
    15 cm³
    60 m²

    Linguistics
    … scholarly outstripped long-vowel system (ē=h₁, ā=eh₂, ō=h₃).
    aidʰ-stu-s
    h₁ h₂ h₃ or hₑ hₐ hₒ



    Others seem more like “stylistic” variants:

    Note markers
    … as stated by Ulysses S. Grant.2

    Ordinals
    1st
    1a
    2ème

    Abbreviations
    Mlle
    Mgr
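The split between the two groups can be checked against the Unicode data itself. This sketch uses only Python's `unicodedata` module and a hypothetical helper `script_chars` to list the characters in the first group's examples that carry a `<super>` or `<sub>` decomposition tag, and shows that NFKC flattens them to plain digits and letters. When the second group's examples are typed as ordinary characters and raised by a feature, they carry no such tags.

```python
import unicodedata

def script_chars(text: str) -> list[tuple[str, str]]:
    """Return (char, tag) for characters whose Unicode decomposition is tagged <super> or <sub>."""
    found = []
    for ch in text:
        tag = unicodedata.decomposition(ch).split(" ")[0]  # "" when there is no decomposition
        if tag in ("<super>", "<sub>"):
            found.append((ch, tag))
    return found

for sample in ["Pb⁴⁺", "C₆H₁₂O₆", "15 cm³", "60 m²"]:
    tags = [tag for _, tag in script_chars(sample)]
    # NFKC shows what the text degrades to once the encoded super/subscripts are folded away.
    print(tags, unicodedata.normalize("NFKC", sample))
```

Because the raised or lowered form is part of the character here, it survives copy and paste; folding it away (as NFKC does) changes the meaning, e.g. C₆H₁₂O₆ becomes C6H12O6.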


    If I wanted to cover all cases, would it be best to have both encoded and unencoded superiors/inferiors? In your sources I also found some accented raised letters: í é ó. How would I begin to decide what to include and what not?

    Btw, here’s another related case from Proto-Indo-European. I have no idea how it might be encoded, but it probably should be, because if I copy this text, the semantic meaning is lost.
  • John Hudson
    edited May 2016
    The Proto Indo-European transcription example should presumably be encoded as

    ch=gᵘ̯ʰ

    i.e. with superscript characters, not <sups> feature styling, and would rely on the font to position the subscript inverted breve combining mark appropriately (possibly with variant mark form for added refinement).

    [The fallback font used on my system to display those characters handles it well, albeit with too much space between the superscript letters.]
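Assuming the sequence in the post is the four code points below (these are what render as gᵘ̯ʰ), Python's `unicodedata` module spells it out. The combining mark follows the superscript u, so in plain text it belongs to that modifier letter, which is why the font has to position it relative to a superscript base. `PIE_GWH` and `describe_sequence` are names made up for this illustration.

```python
import unicodedata

# g + MODIFIER LETTER SMALL U + COMBINING INVERTED BREVE BELOW + MODIFIER LETTER SMALL H
PIE_GWH = "g\u1D58\u032F\u02B0"

def describe_sequence(text: str) -> list[tuple[str, str, int]]:
    """Return (code point, Unicode name, canonical combining class) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.combining(ch)) for ch in text]

for row in describe_sequence(PIE_GWH):
    print(*row)
```

Only the breve has a non-zero combining class; the superscript letters are ordinary spacing characters, so the whole transcription survives copying as text.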
  • edited May 2016
    Ah, the U+032F, and then possibly a small size variant activated contextually + anchors on the superscripted characters. Did I understand that correctly?

    As for the other question –
    If I wanted to cover all cases, would it be best to have both encoded and unencoded superiors/inferiors?
    – do you have any thoughts?
  • John Hudson
    Ah, the U+032F, and then possibly a small size variant activated contextually + anchors on the superscripted characters. Did I understand that correctly? 

    Correct.
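A minimal sketch of the contextual part, as Python emitting AFDKO feature-file syntax. The glyph names (`uni1D58`, `uni02B0`, `uni032F.sups`), the class and lookup names, and the choice of `calt` rather than, say, `ccmp` are all assumptions; the anchors for mark positioning would go in a separate mark feature and are not shown. The output could be compiled with a feature-file compiler such as fontTools' feaLib.

```python
def small_breve_fea(bases, mark="uni032F", small_suffix=".sups", feature="calt"):
    """Emit feature code that swaps the breve for a smaller variant after superscript bases.

    All glyph, class and lookup names are hypothetical placeholders.
    """
    small = mark + small_suffix
    lines = [
        f"@SUPERSCRIPT_BASES = [{' '.join(bases)}];",
        # Standalone lookup: only applied when referenced from the contextual rule below.
        "lookup SMALL_BREVE {",
        f"    sub {mark} by {small};",
        "} SMALL_BREVE;",
        "",
        f"feature {feature} {{",
        # Chaining context: replace the mark only when it follows a superscript base.
        f"    sub @SUPERSCRIPT_BASES {mark}' lookup SMALL_BREVE;",
        f"}} {feature};",
    ]
    return "\n".join(lines)

print(small_breve_fea(["uni1D58", "uni02B0"]))
```

Keeping the substitution in a referenced lookup means the full-size breve stays untouched after ordinary letters.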

    If I wanted to cover all cases, would it be best to have both encoded and unencoded superiors/inferiors?

    Yes to all the encoded super/subscript characters if you want full coverage. For unencoded superscript variants, the most I've ever been asked for is A–Z, a–z[è], 0–1, Α–Ω, α–ω (for Brill), and only 0–1 for subscripts.

    If you're not fussed about PDF text reconstruction, you can use the same glyphs for both encoded and unencoded.
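The two options differ only in which glyph the sups substitution points at. This hypothetical Python helper writes a figures-only sups feature either way: pointing at the glyphs that are also mapped to the encoded superscript code points (AGL-style names such as `twosuperior` and `uni2074`), or at duplicate `.sups` glyphs with no Unicode value. The glyph names and the figures-only scope are assumptions for illustration.

```python
# Hypothetical glyph names for the glyphs mapped to U+2070, U+00B9, U+00B2, U+00B3, U+2074..U+2079.
ENCODED_SUPERIORS = {
    "zero": "uni2070", "one": "onesuperior", "two": "twosuperior",
    "three": "threesuperior", "four": "uni2074", "five": "uni2075",
    "six": "uni2076", "seven": "uni2077", "eight": "uni2078", "nine": "uni2079",
}

def sups_feature(share_encoded_glyphs: bool) -> str:
    """Emit a figures-only <sups> feature in feature-file syntax."""
    rules = []
    for figure, encoded in ENCODED_SUPERIORS.items():
        # Either reuse the encoded superscript glyph or point at an unencoded duplicate.
        target = encoded if share_encoded_glyphs else f"{figure}.sups"
        rules.append(f"    sub {figure} by {target};")
    return "feature sups {\n" + "\n".join(rules) + "\n} sups;"

print(sups_feature(share_encoded_glyphs=False))
```

The duplicated variant adds ten glyphs for the figures alone, but keeps the two routes distinguishable by glyph name.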

  • edited May 2016
    Thanks, John. I am still a little confused. You have been vocal in the past about the problems with substituting different semantic units for each other, like for example x by ×. Why is this different?

  • John Hudson
    edited May 2016
    Using the same glyph for an encoded superscript and a <sups> superscript isn't confusing the semantics except in the unique case of Acrobat text reconstruction from glyph names in printstream-distilled PDFs. It's not messing with the semantics at the text creation level, and it isn't misrepresenting the encoded characters (since the visual representation of encoded and <sups> superscripts is the same). It's only an issue if someone is creating a PDF in a way that doesn't preserve the original text encoding. Some customers will care about this, but many others won't. Brill wanted clean text reconstruction, so their fonts contain duplicate superscript glyphs for encoded and unencoded.
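The PDF caveat comes from reconstructing text from glyph names. The sketch below is a deliberately simplified version of the Adobe Glyph List naming rules (drop everything after the first period, then look up the base name or decode a `uniXXXX` name); the tiny `AGL_SUBSET` table and the helper names are hypothetical stand-ins for the full list. It shows why a shared glyph matters: a note marker typed as 2 and raised via sups to `twosuperior` comes back as ², whereas a duplicate `two.sups` glyph comes back as 2.

```python
# Tiny, hypothetical excerpt of an AGL-style name-to-character table.
AGL_SUBSET = {"one": "1", "two": "2", "e": "e", "r": "r", "twosuperior": "\u00B2"}
HEX_DIGITS = set("0123456789ABCDEF")

def char_from_glyph_name(name: str) -> str:
    """Simplified AGL-style mapping; ligature names and uXXXX[XX] forms are not handled."""
    base = name.split(".", 1)[0]  # suffixes such as '.sups' are ignored
    if base in AGL_SUBSET:
        return AGL_SUBSET[base]
    if base.startswith("uni") and len(base) == 7 and set(base[3:]) <= HEX_DIGITS:
        return chr(int(base[3:], 16))
    return "\uFFFD"  # no reliable text for this glyph

def reconstruct(glyph_names: list[str]) -> str:
    return "".join(char_from_glyph_name(n) for n in glyph_names)

print(reconstruct(["one", "e.sups", "r.sups"]))      # 1er
print(reconstruct(["two.sups"]))                     # 2
print(reconstruct(["twosuperior"]) == "\u00B2")      # True
```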