Greek all caps

I've noticed that in InDesign (ME version), Greek gets horridly mangled if you convert it to all caps. For example, a word like φύσης will get converted to ΦΎΣΗΣ, which is just plain wrong. However, in a handful of fonts (e.g. Arno Pro) you do actually get the correct ΦΥΣΗΣ.

In Arno, this appears to be achieved via large amounts of 'calt' code. But is this really the responsibility of the font, or does Arno do this simply because lots of applications don’t handle Greek casing correctly?

This seems to me to be something the application rather than the font should handle.
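
For reference, the kind of 'calt' logic involved might look roughly like the sketch below, in OpenType feature syntax. The glyph and class names are my guesses rather than Arno's actual code, and a production font would need more rules and contexts than this:

    @GreekUC      = [Alpha Beta Gamma Delta Epsilon Zeta Eta Theta Iota Kappa
                     Lambda Mu Nu Xi Omicron Pi Rho Sigma Tau Upsilon Phi Chi Psi Omega];
    @AccentedUC   = [Alphatonos Epsilontonos Etatonos Iotatonos
                     Omicrontonos Upsilontonos Omegatonos];
    @AccentlessUC = [Alphatonos.tonosless Epsilontonos.tonosless Etatonos.tonosless
                     Iotatonos.tonosless Omicrontonos.tonosless Upsilontonos.tonosless
                     Omegatonos.tonosless];

    feature calt {
        # Suppress the tonos when an accented capital has another capital
        # next to it, i.e. in an all-caps setting.
        sub @GreekUC @AccentedUC' by @AccentlessUC;
        sub @AccentedUC' @GreekUC by @AccentlessUC;
    } calt;

In ΦΎΣΗΣ, for instance, the Ύ follows a capital Φ, so the first rule swaps in the accentless glyph without touching the underlying text.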

Comments

  • John Hudson
    If suppression of accents (and, properly, contextual insertion of dialytika in some instances) is handled at the character level, it is non-reversible without dictionary support (and not 100% reliable even then). Handling it at the glyph level in the font enables correct display without changing the underlying text strings.

    A case can certainly be made for handling this at the character level, even with the above technical caveats, but there is also the question of what is ‘correct’ at the script level. Suppression of diacritics in all-caps Greek is a modern convention (as is left-side positioning of accent and breathing marks on capital letters, which is what leads to the suppression convention). For most of the history of Greek script, accents were not suppressed in all caps.
  • Thanks, John.

    You wouldn't know offhand of any good sources of information on the leftward migration of the uppercase diacritics? I'm curious about when and why this occurred.
  • John Hudson
    I am also very interested. A few years ago, when visiting a client, I looked through a book of reproductions of Greek typography since its inception, and unfortunately I have not been able to locate a copy of the book or recall its title. What I do recall is that no left-side accents occurred until the 19th century.

    Here are some nice examples of above-right and right-side Greek accents on caps, from a 1593 Estienne edition of Isocrates. The first shows both monumental, accentless all-caps for the main title, and then accented all-caps below.

    [Images: all-caps settings from the 1593 Estienne edition of Isocrates]
  • This was quite confusing to me, some years ago, when I was looking at the 1756 Foulis Homer with Wilson's type:
    [Image: all-caps Greek with right-side accents from the 1756 Foulis Homer]

    But I never found a modern font that placed accents that way. In a historically oriented font, would it make sense to offer that older placement as an option, or would no one ever use it?
  • John Hudson
    I don’t think anyone would be likely to use right-side positioning, unless they were actually trying to represent the typography of an older edition.

    I have made a font with above-letter positioning of accents and breathing marks on caps, but that was for transcription of Byzantine seals and coins, which, like icons, conventionally display the marks in that way.
  • Adam Jagosz
    AFAIK, the substitutions that go into supporting all-caps Greek do not involve specially designed glyphs, just copies of (or references to) existing glyphs under new names, e.g. Alphatonos.tonosless, which is simply Alpha again. So technically, an app could offer an all-caps style (not a conversion), and web browsers in fact do. However, this gets complicated with true small caps, which again are handled by the font (whether as a feature or as a separate small-caps font).
    Maybe in 30 years, we will have a standardized, app-level way of handling things like this, as well as Turkish support in all-caps/all-lowercase fonts, the Catalan punt volat and all the other copy-pasta that goes into each well-crafted font. But right now, ain't nobody got time fo' that. Regardless, all this being handled on the font level offers a unique opportunity for making fonts do surprising things.
  • John Savard
    “Regardless, all this being handled on the font level offers a unique opportunity for making fonts do surprising things.”
    Indeed, it does. But that is not necessarily a good thing.

    In the case of Turkish, to make this as simple as possible: where should the knowledge reside that in Turkish the lowercase of I is ı and the uppercase of i is İ?
    In every font? In every application?
    Why, when there is only one Turkish language, one Turkish writing system, and one Unicode standard?
    So the place where this belongs is in the operating system.
  • John Hudson
    “In every font? In every application?”
    Case conversion tends to be something that happens at the application level. It may use libraries or components at the OS level, but since language tagging and things like dictionary-driven hyphenation are usually at the app level, Unicode special casing rules tend to be implemented at that level also.

    Case conversion is a character operation, so is not something that fonts need to handle. The exception is smallcap mapping, which is a glyph substitution and hence needs to provide for langsys exceptions like Turkish i/ı since not all lowercase-to-smallcap implementations go via buffered case conversion.
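
    In feature syntax, that langsys exception looks roughly like this (a minimal sketch; the glyph names i.sc and i.sc.TRK are assumed):

        languagesystem DFLT dflt;
        languagesystem latn dflt;
        languagesystem latn TRK;

        feature smcp {
            sub i by i.sc;              # default: dotless small-cap form
            script latn;
            language TRK exclude_dflt;  # Turkish opts out of the default rule
            sub i by i.sc.TRK;          # and keeps its dot instead
        } smcp;

    Of course, this only helps if the application passes the Turkish language tag through to the shaping engine.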
  • John Savard
    A crazy idea occurred to me. B, the lower case of which is b, is not unified with В (U+0412), the lower case of which is в, or with Β (U+0392), the lower case of which is β. So why shouldn't Turkish I and i, which are not the upper and lower case of each other, have their own special code points?
    That would make the upper- and lower-case conversion independent of language.
  • This would solve the problem. The new problem is that there would then be two dotless Latin capital I's and two dotted Latin small i's in Unicode, while all existing text (including a lot of ASCII documents) uses the non-Turkish pair for Turkish, and many computer users would continue to input the non-Turkish pair on their keyboards. Effectively, it is too late to split these code points.
  • Nick Shinn
    To make Greek (small caps and) cap conversions possible, I create and substitute special glyphs named, for instance, Alphatonos.alt. It’s a good job I know what I’m doing, because otherwise it would be very strange to have a glyph named “Alphatonos” that has no tonos!

    I suppose I could have just linked to plain Alpha, but I found it made things less confusing, when writing the substitution code, to have glyphs specifically earmarked for the caps conversion.
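
    In feature syntax, that presumably amounts to something like this sketch; the 'case' feature, which applications apply to all-caps settings, is the natural home for such rules:

        feature case {
            sub Alphatonos   by Alphatonos.alt;    # same outline as Alpha, no tonos
            sub Epsilontonos by Epsilontonos.alt;
            sub Etatonos     by Etatonos.alt;
            # ... and so on for the remaining accented capitals
        } case;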
  • John Hudson
    My preference is to keep my glyph set tidy and not add duplicate glyphs to represent mark-less Greek letters. But when I explained the Acrobat text reconstruction mechanism to Brill, they decided it was worth supporting, and since that mechanism relies on parsing glyph names, I needed to include a huge number of such duplicate glyphs in the fonts to cover all-caps and smallcaps display of polytonic Greek text with marks suppressed.

    Still feels wrong every time I look at it.
  • Nick Shinn
    Is most of that because of polytonic accents on small caps?
  • John Hudson
    Yes, but even just the all-caps duplicate set ends up being over 100 duplicate glyphs.

    The smallcap set ends up being so large in this case because, again for Acrobat name parsing purposes, there are separate c2sc and smcp glyphs.
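
    Schematically, the doubling looks like this (glyph names assumed); each feature's output keeps a distinct, parseable name:

        feature smcp {
            sub alpha by alpha.sc;    # from lowercase; name parses back to U+03B1
        } smcp;

        feature c2sc {
            sub Alpha by Alpha.sc;    # from caps; name parses back to U+0391
        } c2sc;

    The two small-cap glyphs can be identical outlines; only the names need to differ.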
  • Oh my. I remember doing that for Hypatia Sans, I think it was.
    Traumatic memories. Ugh!
  • John Savard
    “Still feels wrong every time I look at it.”

    Oh, of course. At the very least, it should be possible to duplicate only pointers to the glyphs rather than the actual glyphs themselves.
    But fixing that would require a change to the specification of the font format, leading to an incompatibility. So it's a pity it wasn't gotten right the first time.
  • @John Hudson: Would you mind explaining why Acrobat name parsing requires duplicate sets of small caps? I noticed that in Brill, but I can't say I understand it.
  • Thomas Phinney
    The name parsing is used to determine “what’s the correct underlying character for this glyph?”

    If you have small caps, and need to know the underlying text, should it be lowercase or uppercase? Well, it depends on what the original text was: there are features that turn caps to small caps, and features that turn lowercase to small caps.

    If the font developer makes the small-caps-from-lowercase and small-caps-from-uppercase separate glyphs, then a PDF-consuming app can look at the glyph names and know from that what the underlying text ought to be. Otherwise, there will be only one set of glyph names (a.sc? A.sc?), and reverse-mapping based on that alone could be wrong.

    Of course, savvy PDF export apps will always include the actual text representation anyway. But this makes it even harder to lose the glyph-to-text mapping.
  • It's not actually Acrobat that's the culprit, but rather Acrobat Distiller (or other means of creating PDFs from PostScript files or print streams).

    When PostScript-flavoured fonts are embedded in a PostScript file, only the CFF table actually gets embedded, essentially as an old-style PostScript Type 1 font, a format which predates Unicode. The cmap table isn't included, so all Unicode information is lost.

    When you open a PDF file created from a .ps file, Acrobat tries to reconstruct the Unicode values, but it has only the glyph names to work with. So if you want it to be able to distinguish between (e.g.) a small capital A derived from a lowercase letter and one derived from an uppercase letter, you need two separate glyphs named A.sc and a.sc (Acrobat ignores anything after the first period). Similarly, if you substitute Atonos with Alpha in all-caps settings, Acrobat would reconstruct the Unicode value as U+0391 rather than U+0386. If you want it to reconstruct the latter, you'd need a duplicate glyph called Atonos.case, as sketched below.

    This issue typically doesn't arise with PDFs created directly from modern applications like InDesign, which do include Unicode information; it affects only those distilled from a .ps file or print stream.
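
    To make the Atonos example concrete, using the full AGL name Alphatonos (the form Acrobat's parser can actually map back to U+0386), the two alternatives differ only in the name of the output glyph; a real font would include just one of them:

        # Lossy: the output glyph is named 'Alpha', so Acrobat
        # reconstructs U+0391 and the tonos vanishes from the text layer.
        feature case {
            sub Alphatonos by Alpha;
        } case;

        # Round-trippable: a duplicate of Alpha whose name still parses
        # to 'Alphatonos', so Acrobat recovers U+0386.
        feature case {
            sub Alphatonos by Alphatonos.case;
        } case;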
  • I just want to emphasize that “all Unicode information is lost” is only true if the PDF is created (1) from a PostScript print stream AND (2) without the original font available. This has always been uncommon, although there are doubtless some mass-PDF-generation scenarios in which it happens.

    PDF has a thing called “ActualText” to represent the underlying text and relate it to the glyphs as displayed. See for example https://blog.idrsolutions.com/2012/04/understanding-the-pdf-file-format-actualtext/
  • Thanks for the information. It's interesting and helpful.