I've noticed that in InDesign (ME version), Greek gets horridly mangled if you convert it to all caps. For example, a word like φύσης will get converted to ΦΎΣΗΣ which is just plain wrong. However, in a handful of fonts (e.g. Arno Pro) you do actually get the correct ΦΥΣΗΣ.
In Arno, this appears to be achieved via large amounts of 'calt' code. But is this really the responsibility of the font, or does Arno do this simply because lots of applications don’t handle Greek casing correctly?
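The 'calt' code itself isn't shown, so the sketch below does not reproduce Arno's actual rules. It is only one plausible shape for a font-side workaround. A contextual rule swaps a capital-with-tonos glyph for the plain capital whenever a neighbouring glyph is also a capital. It is simulated in Python over glyph names. The AGL-style names, the one-neighbour context and the "first letter is uppercase means capital" test are all assumptions made for illustration.

```python
# Capital-with-tonos glyphs and their unaccented counterparts (AGL-style names,
# assumed for illustration; a real font may name these differently).
TONOS_TO_PLAIN = {
    "Alphatonos": "Alpha",
    "Epsilontonos": "Epsilon",
    "Etatonos": "Eta",
    "Iotatonos": "Iota",
    "Omicrontonos": "Omicron",
    "Upsilontonos": "Upsilon",
    "Omegatonos": "Omega",
}


def is_capital(glyph_name: str) -> bool:
    # Simplifying assumption: capital glyph names start with an uppercase letter.
    return glyph_name[:1].isupper()


def apply_calt_sketch(glyphs: list[str]) -> list[str]:
    """Replace a tonos capital with the plain capital when a neighbour is a capital.

    Context is read from the input sequence, not from already-substituted glyphs.
    """
    out = list(glyphs)
    for i, g in enumerate(glyphs):
        if g in TONOS_TO_PLAIN:
            prev_cap = i > 0 and is_capital(glyphs[i - 1])
            next_cap = i + 1 < len(glyphs) and is_capital(glyphs[i + 1])
            if prev_cap or next_cap:
                out[i] = TONOS_TO_PLAIN[g]
    return out


# ΦΎΣΗΣ as glyphs: the tonos is dropped because Upsilontonos sits between capitals.
print(apply_calt_sketch(["Phi", "Upsilontonos", "Sigma", "Eta", "Sigma"]))
# Title-case Άλφα keeps its accent: the neighbour is lowercase.
print(apply_calt_sketch(["Alphatonos", "lambda", "phi", "alpha"]))
```

A side effect of this kind of context is that a one-letter word like Ή keeps its accent, since it has no capital neighbour. It also shows why the approach needs "large amounts" of rules: every accented capital and every context has to be enumerated in the font.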
This seems to me to be something the application rather than the font should handle.
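To illustrate the character-level side, here is a minimal Python sketch (not what InDesign actually does). Plain Unicode uppercasing keeps the tonos and produces ΦΎΣΗΣ. An application can instead decompose after uppercasing, drop the accent marks and recompose. It assumes the modern monotonic all-caps convention. It deliberately ignores refinements such as keeping the accent on disjunctive ή, or adding a dialytika when a diphthong loses its accent.

```python
import unicodedata

# Combining marks suppressed in modern all-caps Greek (assumed convention).
# Dialytika (U+0308) is deliberately kept.
SUPPRESSED_MARKS = {
    "\u0300",  # combining grave (varia)
    "\u0301",  # combining acute (tonos/oxia)
    "\u0342",  # combining Greek perispomeni
    "\u0313",  # combining comma above (psili)
    "\u0314",  # combining reversed comma above (dasia)
}


def greek_upper_naive(text: str) -> str:
    # Plain Unicode full case mapping: accented lowercase -> accented capital.
    return text.upper()


def greek_upper_suppress_accents(text: str) -> str:
    # Uppercase, decompose to base letters + combining marks,
    # drop the suppressed marks, then recompose what is left.
    decomposed = unicodedata.normalize("NFD", text.upper())
    kept = "".join(ch for ch in decomposed if ch not in SUPPRESSED_MARKS)
    return unicodedata.normalize("NFC", kept)


word = "\u03c6\u03cd\u03c3\u03b7\u03c2"  # φύσης
print(greek_upper_naive(word))             # ΦΎΣΗΣ
print(greek_upper_suppress_accents(word))  # ΦΥΣΗΣ
```

Because the accent removal happens on characters, it works with any font, which is the argument for doing it in the application.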
A case can certainly be made for handling this at the character level, even with the above technical caveats, but there is also the question of what is ‘correct’ at the script level. Suppression of diacritics in all-caps Greek is a modern convention (as is left-side positioning of accent and breathing marks on capital letters, which is what leads to the suppression convention). For most of the history of Greek script, accents were not suppressed in all caps.
You wouldn't offhand know of any good sources of information on the leftward migration of the uppercase diacritics? I'm curious about when and why this occurred.
Here are some nice examples of above-right and right-side Greek accents on caps, from a 1593 Estienne edition of Isocrates. The first shows both monumental, accentless all-caps for the main title, and then accented all-caps below.
I have made a font with above-letter positioning of accents and breathing on caps, but that was for transcription of Byzantine seals and coins which, like icons, conventionally display the marks in that way.
Case conversion is a character operation, so it is not something that fonts need to handle. The exception is smallcap mapping, which is a glyph substitution and hence needs to provide for langsys exceptions like Turkish i/ı, since not all lowercase-to-smallcap implementations go via buffered case conversion.
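A minimal sketch of what "providing for langsys exceptions" means when smcp is applied directly to glyphs rather than via case conversion. The glyph names (`i.sc`, `idotaccent.sc`, `dotlessi`) are hypothetical. So is the choice of which language systems get the override. The point is only that the default lookup gives i a dotless small cap, while Turkish and Azerbaijani need a dotted one.

```python
# Hypothetical glyph names; real fonts will differ.
SMCP_DEFAULT = {
    "a": "a.sc",
    "i": "i.sc",         # default: small-cap I, drawn without a dot
    "dotlessi": "i.sc",  # ı also becomes the dotless small cap
}

# Per-langsys overrides (OpenType language tags TRK = Turkish, AZE = Azerbaijani).
SMCP_LANGSYS_OVERRIDES = {
    "TRK": {"i": "idotaccent.sc"},  # i -> dotted small-cap İ
    "AZE": {"i": "idotaccent.sc"},
}


def apply_smcp(glyphs: list[str], langsys: str = "dflt") -> list[str]:
    """Single-substitution smcp lookup with a language-specific override layer."""
    table = {**SMCP_DEFAULT, **SMCP_LANGSYS_OVERRIDES.get(langsys, {})}
    return [table.get(g, g) for g in glyphs]


print(apply_smcp(["i", "a"]))          # default language system
print(apply_smcp(["i", "a"], "TRK"))   # Turkish: dotted small cap for i
```

An implementation that instead uppercases the text (with a Turkish-aware case mapping) and then applies c2sc would get the dot right without the font's help. That is the "buffered case conversion" route the comment contrasts this with.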
I suppose I could have just linked to plain Alpha, but I found it made things less confusing, when writing the substitution code, to have glyphs specifically earmarked for the caps conversion.
Still feels wrong every time I look at it.
The smallcap set ends up being so large in this case because, again for Acrobat name parsing purposes, there are separate c2sc and smcp glyphs.
Traumatic memories. ugh!
If you have small caps, and need to know the underlying text, should it be lowercase or uppercase? Well, it depends on what the original text was: there are features that turn caps to small caps, and features that turn lowercase to small caps.
If the font developer makes the small-caps-from-lowercase and small-caps-from-uppercase glyphs separate, then a PDF-consuming app can look at the glyph names and know from them what the underlying text ought to be. Otherwise, there will be only one set of glyph names (a.sc? A.sc?), and reverse-mapping based on that alone could be wrong.
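A toy Python model of that ambiguity, assuming a hypothetical naming scheme: smcp maps `b` to `b.sc`, and c2sc maps `U` either to its own `U.sc` or to the shared `u.sc`. Recovery looks only at the glyph name before the first period. Input is assumed to be ASCII letters only.

```python
def set_small_caps(text: str, separate_c2sc: bool) -> list[str]:
    """Apply c2sc to capitals and smcp to lowercase, returning glyph names.

    Hypothetical naming: smcp yields '<lower>.sc'; c2sc yields '<Upper>.sc'
    if separate_c2sc is True, otherwise it reuses the smcp glyph.
    """
    names = []
    for ch in text:
        if ch.isupper():
            names.append(f"{ch}.sc" if separate_c2sc else f"{ch.lower()}.sc")
        else:
            names.append(f"{ch}.sc")
    return names


def recover_text(glyph_names: list[str]) -> str:
    # Name-based recovery: ignore everything after the first period. For these
    # single-letter base names, the base name is the character itself.
    return "".join(name.split(".", 1)[0] for name in glyph_names)


print(recover_text(set_small_caps("Unesco", separate_c2sc=True)))   # Unesco
print(recover_text(set_small_caps("Unesco", separate_c2sc=False)))  # unesco
```

Both runs render identically, but only the separate-glyph version lets the capital U survive the round trip.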
Of course, savvy PDF export apps will always include the actual text representation anyway. But this makes it even harder to lose the glyph-to-text mapping.
When PostScript-flavoured fonts are embedded in a PostScript file, only the CFF table actually gets embedded, essentially as an old-style PostScript Type 1 font, a format that predates Unicode. The cmap tables aren't included, so all Unicode information is lost.
When you open a .pdf file created from a .ps file, Acrobat tries to reconstruct the Unicode values, but it has only the glyph names to work with. So if you want it to be able to distinguish a small capital A derived from a lowercase letter from one derived from an uppercase letter, you need separate glyphs named A.sc and a.sc (Acrobat ignores anything after the first period). Similarly, if you substitute Atonos with Alpha in all-caps settings, Acrobat will reconstruct the Unicode value as U+0391 rather than U+0386. If you want it to reconstruct the latter, you need a duplicate glyph called Atonos.case.
This issue typically doesn't arise in PDFs created directly by modern applications like InDesign, which do include Unicode information; it affects only those distilled from a .ps file or print stream.
PDF has a thing called “ActualText” to represent the underlying text and relate it to the glyphs as displayed. See for example https://blog.idrsolutions.com/2012/04/understanding-the-pdf-file-format-actualtext/
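As a rough illustration, ActualText is attached to a marked-content span in the page content stream. The sketch below only builds that fragment as a string. It encodes the text as a UTF-16BE hex string with a byte-order mark, which is one valid PDF text-string form. The glyph-showing operators passed in (`(A) Tj`) are a placeholder, not output from any real PDF library.

```python
def pdf_text_string_hex(text: str) -> str:
    # PDF text string as UTF-16BE with a BOM, written as a hex string.
    return "<FEFF" + text.encode("utf-16-be").hex().upper() + ">"


def actual_text_span(text: str, shown_ops: str) -> str:
    """Wrap glyph-showing operators so extractors read `text` instead."""
    return f"/Span <</ActualText {pdf_text_string_hex(text)} >> BDC {shown_ops} EMC"


# A plain Alpha glyph shown on the page, but extracted as U+0386 (Ά).
print(actual_text_span("\u0386", "(A) Tj"))
```

With this in place, a consumer that honours ActualText gets the intended character regardless of the glyph name.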