Digital Greek Typography is Broken

Comments

  • Some applications will normalize when using copy-paste or during other processes. This was likely less frequent a couple of decades ago, so it may not have been an apparent issue back then.

    Having U+0387 ANO TELEIA canonically normalized to U+00B7 MIDDLE DOT, or U+037E GREEK QUESTION MARK canonically normalized to U+003B SEMICOLON, means they should look the same anyway, as users should expect to be able to use them interchangeably.
    But, like the U+2019 RIGHT SINGLE QUOTATION MARK, they can sometimes look different in Greek than in Latin. It’s not that they must look distinct but that they sometimes do, for example if one uses an apla style for Greek and a modern style for Latin.

    Having smart font features that substitute the Latin glyphs for the Greek glyphs when in a Greek text run may be an adequate solution but it is not universal, neither in implementations (very few fonts do a substitution when they are different) nor in some contexts (a word in one script in text in the other script next to one of those common characters). There’s no clear solution besides using distinct fonts for Greek and Latin, when that is possible.
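The canonical equivalences mentioned above can be checked with a short script. This is only a sketch using Python's standard `unicodedata` module; the names `PAIRS` and `results` are made up for the illustration. Because both Greek characters have singleton canonical decompositions, every normalization form replaces them with the common character.

```python
import unicodedata

# Greek-specific code point -> the common code point it canonically decomposes to.
PAIRS = {
    "\u0387": "\u00b7",  # GREEK ANO TELEIA -> MIDDLE DOT
    "\u037e": "\u003b",  # GREEK QUESTION MARK -> SEMICOLON
}

FORMS = ("NFC", "NFD", "NFKC", "NFKD")

# Record what each normalization form turns the Greek character into.
results = {}
for greek, common in PAIRS.items():
    for form in FORMS:
        normalized = unicodedata.normalize(form, greek)
        results[(greek, form)] = normalized
        print(
            f"{form}: U+{ord(greek):04X} {unicodedata.name(greek)} -> "
            f"U+{ord(normalized):04X} {unicodedata.name(normalized)}"
        )
```

Any process that normalizes text (copy-paste in some applications, for instance) would therefore hand the font U+00B7 or U+003B, whatever the author originally typed.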
  • John Savard
    The Greek question mark is an interesting case. This also has a canonical decomposition, from U+037E GREEK QUESTION MARK to U+003B SEMICOLON.

    The ATD3 presentation and the petition both suggest that this is a problem because it prevents a distinct form being used for the Greek question mark. As with ano teleia, a grek locl glyph can be implemented, and should work so long as the script=common property of the semicolon means it is rolled into the adjacent Greek glyph run for OTL processing. And unlike ano teleia vs middle dot, I think there is no context in which the common semicolon might be used distinctively in Greek text.

    But I am also wondering what a distinct form of Greek question mark would look like? When does it not have the same shape as the semicolon?

    It's certainly possible that a typeface could include a Greek question mark which, while it still has the same general shape as a semicolon, is fatter or more cursive than a semicolon would be in a similar typeface for a Latin-alphabet language.
    This would be a problem if the typeface included both Greek and Latin characters, and therefore was intended to be used for both Greek text and, say, computer programs written in a language that used the Latin alphabet - and it did have a separate glyph defined for the semicolon. (In fact, a Greek-language textbook on programming in Pascal, say, would be an example of a context where the common semicolon and the Greek question mark are used distinctively within a Greek document, although, since code samples are set off distinctively from the actual text, it's not quite a case of "in Greek text".)
    I mean, it's not as if capital alpha gets normalized to Latin Capital A as a default decomposition. That is almost the same thing in one sense, although of course not as disastrous as we're talking about a non-alphabetic character.
  • Yannis Haralambous, Guidelines and Suggested Amendments to the Greek Unicode Tables, 2002 has this example:

  • Is this still relevant in Modern Greek in the absence of breathing marks?
  • Yannis Haralambous, Guidelines and Suggested Amendments to the Greek Unicode Tables, 2002 has this example:...
    In that example, given that the lowercase Greek characters have very different ductus from the Latin characters, that makes it unclear what to infer from the example. It's not even obvious that it's a single typeface.
  • John Savard

    In that example, given that the lowercase Greek characters have very different ductus from the Latin characters, that makes it unclear what to infer from the example. It's not even obvious that it's a single typeface.

    It may not be a single typeface. But while the Greek commas are different from the Latin commas, the Greek apostrophe differs from the Greek comma, while the Latin apostrophe is the same as the Latin comma.

    Is this still relevant in Modern Greek in the absence of breathing marks?

    And that's still true even if there are no Greek breathing marks to compare the Greek apostrophes to.

  • John Hudson
    edited March 6
    @Denis Moyogo Jacquerye
    Having U+0387 ANO TELEIA canonically normalized to U+00B7 MIDDLE DOT ... means they should look the same anyway, as users should expect to be able to use them interchangeably.
    But this is exactly what Greek typography experts are saying is wrong, and have been saying so for decades. U+0387 sits at the x-height (or at the smallcap height, or at the cap height, depending on text formatting), and U+00B7 sits at the mid-x-height. The ano teleia should not sit that low. So ‘look the same anyway’ implies that either the ano teleia needs to sit too low, the middle dot needs to sit too high, or one compromises and makes them both at the wrong height.

    @Peter Constable
    In that example, given that the lowercase Greek characters have very different ductus from the Latin characters, that makes it unclear what to infer from the example. It's not even obvious that it's a single typeface.
    This is the dominant style of Greek typography from the late 18th Century onward, combined with exactly the style of Latin typography to which it was stylistically and historically matched. They may not be the ‘same typeface’ in whatever sense that might be understood at the time, but they are very much characteristic of Greek and Latin types that were used side-by-side. The style of both originates in the types of the Didot family in France, who published books in both Latin and Greek in this kind of type.

    Regarding this shape of the ‘apostrophe’ in the Greek, this character might actually be the spacing koronis, and it is coordinated with the shape of the breathing marks because it is integrated in the polytonic orthography.

  • Denis Moyogo Jacquerye
    edited March 6
    So ‘look the same anyway’ implies that either the ano teleia needs to sit too low, the middle dot needs to sit too high, or one compromises and makes them both at the wrong height.
    Yes, that’s the issue. The "look the same anyway" means they should behave the same. Both middle dot and ano teleia should look the same in Greek, meaning middle dot should look like what ano teleia should look like in Greek, in Greek context.

    If we generalize this, any character may need to look different depending on what writing system it is used in. Having some of those characters duplicated as script-specific characters, in particular those that will be normalized to common ones or that are not available on standard keyboard layouts, creates a really messy situation.
  • John Hudson
    Both middle dot and ano teleia should look the same in Greek, meaning middle dot should look like what ano teleia should look like in Greek, in Greek context.
    It also means that both codepoints need to be supported in any Greek font, and then grek script locl substitution applied to map the /periodcentered/ glyph to an appropriate form (probably by mapping to the ano teleia glyph). A grek/DFLT locl substitution at least doesn’t suffer from the fragility of langsys mechanisms for a language-specific substitution, but it does still rely on script itemisation and run segmentation correctly getting the mid-dot character into the Greek run for glyph processing.
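To make the dependence on run segmentation concrete, here is a toy itemiser in Python. It is a rough sketch and not how any real shaping engine works: `letter_script` and `itemize` are hypothetical helpers, the script guess is taken from the character name rather than the real Unicode Script property, and the rule used (Common characters join the preceding run) is just one simple assumption.

```python
import unicodedata

def letter_script(ch):
    """Rough script guess from the character name (an assumption for this
    sketch; the real Unicode Script property is not in the standard library)."""
    if not ch.isalpha():
        return "Zyyy"  # treat every non-letter as Common
    name = unicodedata.name(ch, "")
    if name.startswith("GREEK"):
        return "Grek"
    if name.startswith("LATIN"):
        return "Latn"
    return "Zyyy"

def itemize(text):
    """Split text into (script, run) pairs; Common characters join the
    preceding run, and a leading Common run adopts the next letter's script."""
    runs = []
    for ch in text:
        script = letter_script(ch)
        if runs and (script == "Zyyy" or script == runs[-1][0] or runs[-1][0] == "Zyyy"):
            prev_script, prev_text = runs[-1]
            new_script = prev_script if prev_script != "Zyyy" else script
            runs[-1] = (new_script, prev_text + ch)
        else:
            runs.append((script, ch))
    return runs

greek_first = itemize("λόγος\u00b7 word")
latin_first = itemize("word\u00b7 λόγος")
print(greek_first)
print(latin_first)
```

Under these assumptions the middle dot following a Greek word ends up in the Grek run, where a grek locl lookup could reach it, while the same dot following a Latin word embedded in Greek text ends up in the Latn run and would keep the default glyph.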

  • Denis Moyogo Jacquerye
    edited March 6
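    My guess is font editors that add automatic feature code and QA tools should ensure something like the following is present or is taking place on Greek text runs:
      script grek;
      lookup grek_locl_lookup {
        # @UC is assumed to be a class of uppercase glyphs defined elsewhere.
        # The more specific rule comes first so the general rule does not shadow it.
        sub @UC periodcentered' by anoteleia.cap;
        sub periodcentered' by anoteleia;
      } grek_locl_lookup;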
    

    Looking at Greek corpora, the middle dot is sometimes used as a bullet at the beginning of lines, but that’s a corner case and U+2022 • is a better character for it. In this thread on periodcentered it is mentioned that periodcentered is sometimes used as a subordinate bullet to a full dark U+2022 •. No solution is perfect here but the main use case for periodcentered should be as ano teleia in Greek.
  • Kent Lew
    A grek/DFLT {locl} sub should serve the purpose. But isn’t it telling that this means substituting one encoded glyph with a differently encoded one? (Which we font makers are usually told is bad form!)
  • Denis Moyogo Jacquerye
    edited March 6
    From a normalized-text point of view they are the same character. Ignore /anoteleia if that helps; the substitution would then be "sub periodcentered' by periodcentered.loclgrek;"
  • Lucas de Groot
    edited March 7
      script grek;
        sub periodcentered' by anoteleia;
    
    With that, Greek dictionary division dots and math in Greek texts might look strange?
  • With that, Greek dictionary division dots and math in Greek texts might look strange?
    @Lucas de Groot Yes, it would. It currently looks strange when used as ano teleia (instead of U+0387). What compromise would Greek users rather have?

    Should dictionary division dots, Greek or otherwise, use U+2027 HYPHENATION POINT if possible? Its Unicode annotation is “visible symbol used to indicate correct positions for word breaking, as in dic·tion·ar·ies”.

    Should math in Greek texts use U+2219 BULLET OPERATOR or U+22C5 DOT OPERATOR if possible? These are specifically math operator symbols.

    The substitution can be contextual (after letters) if that’s more adequate.

    U+00B7 is messy because, pre-Unicode, the same middle dot in less extensive encodings was used for many things. It’s a pity that Greek has to deal with this, and that other languages and script systems have similar issues with it.
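For reference, the dot-like characters mentioned in this thread can be listed with their formal names, along with whether NFC leaves them untouched. This is a small sketch using Python's `unicodedata`; `CANDIDATES`, `describe` and `report` are names invented for the illustration.

```python
import unicodedata

# The dot-like characters discussed in this thread.
CANDIDATES = ["\u00b7", "\u0387", "\u2027", "\u2219", "\u22c5", "\u2022"]

def describe(ch):
    """Return the code point, formal name, and whether NFC leaves it unchanged."""
    nfc = unicodedata.normalize("NFC", ch)
    return {
        "code": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "stable_under_nfc": nfc == ch,
    }

report = [describe(ch) for ch in CANDIDATES]
for row in report:
    print(row["code"], row["name"], "stable" if row["stable_under_nfc"] else "changed by NFC")
```

Of these, only U+0387 is altered by normalization, so the dedicated hyphenation and math characters would at least survive a normalizing process as distinct code points.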