Monotonic Greek in double encoded unicase fonts

Question 1 — quadruple encoding

Since Alpha and Alphatonos have identical letter shapes in all-caps Monotonic Greek (which is the case in double-encoded unicase fonts), I would like to encode all ~tonos versions to their base letters. Is this a good idea? If so, is my encoding map correct?

Alpha			0391 03B1 0386 03AC	← Alphatonos
Beta			0392 03B2
Gamma			0393 03B3
Delta			0394 03B4
Epsilon			0395 03B5 0388 03AD	← Epsilontonos
Zeta			0396 03B6
Eta			0397 03B7 0389 03AE	← Etatonos
Theta			0398 03B8
Iota			0399 03B9 038A 03AF	← Iotatonos
Iotadieresis		03AA 03CA 0390		← iotadieresistonos
Kappa			039A 03BA
Lambda			039B 03BB
Mu			039C 03BC
Nu			039D 03BD
Xi			039E 03BE
Omicron			039F 03BF 038C 03CC	← Omicrontonos
Pi			03A0 03C0
Rho			03A1 03C1
Sigma			03A3 03C3 03C2		← sigmafinal
SigmaLunateSymbol	03F9 03F2
Tau			03A4 03C4
Upsilon			03A5 03C5 038E 03CD	← Upsilontonos
Upsilondieresis		03AB 03CB 03B0		← upsilondieresistonos
Phi			03A6 03C6
Chi			03A7 03C7
Psi			03A8 03C8
Omega			03A9 03C9 038F 03CE	← Omegatonos
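
The Unicode side of this map can be sanity-checked mechanically. The following is a minimal Python sketch, assuming only the standard `unicodedata` module; `ENCODING`, `base_capital` and `find_mismatches` are hypothetical names, and only the rows with extra code points are copied from the table. It strips the combining acute (tonos) from each code point, uppercases the result, and compares it with the first code point of the row. It says nothing about whether collapsing the encoding is a good idea, only whether each code point sits in the right row.

```python
import unicodedata

# Rows copied from the table above (glyph name -> code points).
ENCODING = {
    "Alpha": [0x0391, 0x03B1, 0x0386, 0x03AC],
    "Epsilon": [0x0395, 0x03B5, 0x0388, 0x03AD],
    "Eta": [0x0397, 0x03B7, 0x0389, 0x03AE],
    "Iota": [0x0399, 0x03B9, 0x038A, 0x03AF],
    "Iotadieresis": [0x03AA, 0x03CA, 0x0390],
    "Omicron": [0x039F, 0x03BF, 0x038C, 0x03CC],
    "Sigma": [0x03A3, 0x03C3, 0x03C2],
    "SigmaLunateSymbol": [0x03F9, 0x03F2],
    "Upsilon": [0x03A5, 0x03C5, 0x038E, 0x03CD],
    "Upsilondieresis": [0x03AB, 0x03CB, 0x03B0],
    "Omega": [0x03A9, 0x03C9, 0x038F, 0x03CE],
}


def base_capital(code_point):
    """Drop the tonos (U+0301 after decomposition) and uppercase the rest."""
    decomposed = unicodedata.normalize("NFD", chr(code_point))
    without_tonos = decomposed.replace("\u0301", "")
    return unicodedata.normalize("NFC", without_tonos.upper())


def find_mismatches(encoding):
    """Return (glyph name, code point) pairs that do not reduce to the row's capital."""
    mismatches = []
    for glyph_name, code_points in encoding.items():
        expected = chr(code_points[0])  # first entry is the plain capital
        for code_point in code_points:
            if base_capital(code_point) != expected:
                mismatches.append((glyph_name, hex(code_point)))
    return mismatches


mismatches = find_mismatches(ENCODING)
```

An empty `mismatches` list means every listed code point reduces to the capital that heads its row.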

Question 2 — suppress tonos in ccmp

Also, when the tonos is a separate combining character in the text, I would like to compose it in ccmp, but directly to the Alpha glyph instead of Alphatonos (which is not present in the font; its code points are now encoded to the Alpha glyph). Will this substitution break the source text if the user copies it or changes the font?

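feature ccmp {
	script grek;
	language dflt;
	# Compose base letter + combining tonos directly to the base glyph,
	# because this font has no separate ~tonos glyphs.
	lookup ccmp_grek_1 {
		sub Alpha acutecomb by Alpha;
		sub Epsilon acutecomb by Epsilon;
		sub Eta acutecomb by Eta;
		sub Iota acutecomb by Iota;
		sub Omicron acutecomb by Omicron;
		sub Upsilon acutecomb by Upsilon;
		sub Omega acutecomb by Omega;
	} ccmp_grek_1;
} ccmp;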

Answers

  • John Hudson
    To confirm my understanding: this is an all-caps font, and you are expecting text to be displayed with the tonos mark suppressed, even at the beginning of words, correct?

    It can get a little more complicated than your proposed multiple encoding scheme suggests, because the diaeresis can behave contextually in all-caps text. I suspect this is a little more frequent in polytonic, but I think there are also instances in monotonic where a sequence of two vowel letters not forming a diphthong is distinguished by the tonos being applied to the first vowel instead of the second, but in all-caps the suppressed tonos is replaced by a diaeresis on the second vowel:
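
A minimal text-level sketch of this rule in Python, under stated assumptions: the `DIGRAPHS` set is an assumed list of vowel pairs that would read as a digraph if left unmarked, and `monotonic_caps` is a hypothetical helper, not something a font or shaping engine provides. It only illustrates the orthographic behaviour described above.

```python
import unicodedata

TONOS = "\u0301"      # combining acute accent, which carries the Greek tonos
DIAERESIS = "\u0308"  # combining diaeresis

# Assumed list of vowel pairs that read as one unit when unmarked.
DIGRAPHS = {"αι", "ει", "οι", "υι", "αυ", "ευ", "ηυ", "ου"}


def monotonic_caps(text):
    """Uppercase monotonic Greek, dropping the tonos.

    When the tonos sat on the first vowel of a would-be digraph,
    a diaeresis is added to the second vowel instead.
    """
    decomposed = unicodedata.normalize("NFD", text.lower())
    out = []
    i = 0
    while i < len(decomposed):
        ch = decomposed[i]
        if ch != TONOS:
            out.append(ch)
            i += 1
            continue
        # The tonos is always dropped; check whether it split a digraph.
        pair = (out[-1] if out else "") + decomposed[i + 1:i + 2]
        already_marked = decomposed[i + 2:i + 3] == DIAERESIS
        if pair in DIGRAPHS and not already_marked:
            out.append(decomposed[i + 1])
            out.append(DIAERESIS)
            i += 2
        else:
            i += 1
    return unicodedata.normalize("NFC", "".join(out)).upper()
```

For example, `monotonic_caps("Μάιος")` gives `ΜΑΪΟΣ`, while a word whose tonos does not split a vowel pair, such as `καλημέρα`, simply loses the mark.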



    Will this substitution break the source text if the user copies it or changes the font?
    No, because the substitution is happening at the glyph level, and does not affect the source text character string. Of course, if the user switches fonts, they will see different results presuming the other font does not include the kind of code you are proposing.
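
The separation between the character string and the glyph run can be modelled in a few lines of Python. This is only a toy simulation, not how a real shaping engine is implemented: `CMAP` is a hypothetical excerpt of the proposed many-to-one mapping, and `shape` is a made-up function that mimics the `ccmp` lookup from the question.

```python
# Hypothetical excerpt of a many-to-one cmap (code point -> glyph name).
CMAP = {
    0x0391: "Alpha", 0x03B1: "Alpha", 0x0386: "Alpha", 0x03AC: "Alpha",
    0x0301: "acutecomb",
}

# Base glyphs covered by the ccmp lookup in the question.
CCMP_BASES = {"Alpha", "Epsilon", "Eta", "Iota", "Omicron", "Upsilon", "Omega"}


def shape(text):
    """Map characters to glyph names, then drop acutecomb after a listed base."""
    glyphs = [CMAP[ord(ch)] for ch in text]
    shaped = []
    for glyph in glyphs:
        if glyph == "acutecomb" and shaped and shaped[-1] in CCMP_BASES:
            continue  # mimics 'sub Alpha acutecomb by Alpha;'
        shaped.append(glyph)
    return shaped


source = "\u0391\u0301"    # ALPHA + COMBINING ACUTE ACCENT, as stored in the document
glyph_run = shape(source)  # what the font would display
```

After shaping, `glyph_run` holds a single `Alpha` glyph, while `source` still contains both characters; copying the text or switching fonts starts again from `source`.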

    However, it could affect downstream text in a print-stream-distilled PDF. Acrobat (and some other PDF viewers?) parse glyph names in the embedded font to attempt to reconstruct the text for searching and copying. This is always an issue with multi-encoded glyphs, because the glyph name will only map to one of the possible encodings.

  • this is an all-caps font
    Yes.
    and you are expecting text to be displayed with the tonos mark suppressed, even at the beginning of words, correct?
    I expect the font to appear visually according to Gerry Leonidas' Monotonic conversion, as when the text is set in all caps.
    there are also instances in monotonic where a sequence of two vowel letters not forming a diphthong is distinguished by the tonos being applied to the first vowel instead of the second, but in all-caps the suppressed tonos is replaced by a diaeresis on the second vowel
    This is the most complicated part, and I'm not sure I understand it completely. John, are you talking about exceptional cases? How rare are they?
    the substitution is happening at the glyph level, and does not affect the source text character string
    Thanks for confirming that. Things are easier to understand when uppercase and lowercase letters are encoded to different glyphs; the double encoding had confused me, so it's good to know this now.
    However, it could affect downstream text in a print-stream-distilled PDF
    I remember a discussion about this. As I understand it, the problem occurs only when copying the text from such a PDF. But it should not be a problem for just viewing the PDF, right?
  • John Hudson
    This is the most complicated part, and I'm not sure I understand it completely. John, are you talking about exceptional cases? How rare are they?
    I don’t know what the frequency is. The point is that this is an orthographic rule that is possible to catch at the glyph level, but not if the cmap has already removed the distinction between accented and unaccented letters. In order to handle this rule contextually, one needs to preserve the character distinction into the GSUB level.
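
Why the distinction has to survive into GSUB can be shown with a toy model in Python. Everything here is an assumption made for illustration: the two cmap excerpts, the glyph names `Alphatonos` and `Iotadieresis`, and `gsub_caps`, which stands in for a contextual lookup followed by a tonos-removing substitution. Only the Alpha plus Iota pair is handled.

```python
# Cmap excerpt with accented letters collapsed onto the base glyphs.
COLLAPSED_CMAP = {
    0x039C: "Mu", 0x03BC: "Mu",
    0x0391: "Alpha", 0x03B1: "Alpha", 0x0386: "Alpha", 0x03AC: "Alpha",
    0x0399: "Iota", 0x03B9: "Iota",
    0x039F: "Omicron", 0x03BF: "Omicron",
    0x03A3: "Sigma", 0x03C3: "Sigma", 0x03C2: "Sigma",
}

# Same excerpt, but the accented alphas keep a glyph of their own.
PRESERVING_CMAP = dict(COLLAPSED_CMAP)
PRESERVING_CMAP.update({0x0386: "Alphatonos", 0x03AC: "Alphatonos"})


def to_glyphs(text, cmap):
    """Character string -> glyph names, as the cmap lookup would do."""
    return [cmap[ord(ch)] for ch in text]


def gsub_caps(glyphs):
    """Contextual pass: Iota after Alphatonos gets a diaeresis, then the tonos is dropped."""
    out = []
    for i, glyph in enumerate(glyphs):
        previous = glyphs[i - 1] if i > 0 else None
        if glyph == "Iota" and previous == "Alphatonos":
            out.append("Iotadieresis")
        elif glyph == "Alphatonos":
            out.append("Alpha")
        else:
            out.append(glyph)
    return out


word = "\u039c\u03ac\u03b9\u03bf\u03c2"  # Μάιος, tonos on the alpha
with_distinction = gsub_caps(to_glyphs(word, PRESERVING_CMAP))
without_distinction = gsub_caps(to_glyphs(word, COLLAPSED_CMAP))
```

With the preserving cmap the run ends up as Mu, Alpha, Iotadieresis, Omicron, Sigma. With the collapsed cmap the lookup never sees an accented glyph, so the Iota stays plain and the result is identical to that of the unaccented spelling.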

    As I understand it, the problem occurs only when copying the text from such a PDF. But it should not be a problem for just viewing the PDF, right?
    Text searching within the PDF would also be affected.
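
The copy and search problem can be sketched in Python as well. The assumptions are loud here: `CMAP` is a hypothetical excerpt of the proposed encoding, and `NAME_TO_CODEPOINT` stands for whatever single code point a name-based extractor would assign to each glyph name (assumed to be the plain capital). Real PDF viewers differ in how they recover text, so this shows only the many-to-one loss.

```python
# Hypothetical excerpt of the proposed many-to-one cmap.
CMAP = {
    0x039C: "Mu", 0x03BC: "Mu",
    0x0391: "Alpha", 0x03B1: "Alpha", 0x0386: "Alpha", 0x03AC: "Alpha",
    0x0399: "Iota", 0x03B9: "Iota", 0x038A: "Iota", 0x03AF: "Iota",
    0x039F: "Omicron", 0x03BF: "Omicron", 0x038C: "Omicron", 0x03CC: "Omicron",
    0x03A3: "Sigma", 0x03C3: "Sigma", 0x03C2: "Sigma",
}

# Assumption: each glyph name resolves back to exactly one code point.
NAME_TO_CODEPOINT = {
    "Mu": 0x039C, "Alpha": 0x0391, "Iota": 0x0399,
    "Omicron": 0x039F, "Sigma": 0x03A3,
}


def glyphs_for(text):
    """Character string -> glyph names written into the print stream."""
    return [CMAP[ord(ch)] for ch in text]


def extract_text(glyphs):
    """Rebuild a character string from glyph names alone."""
    return "".join(chr(NAME_TO_CODEPOINT[name]) for name in glyphs)


source = "\u039c\u03ac\u03b9\u03bf\u03c2"  # Μάιος as typed by the author
extracted = extract_text(glyphs_for(source))
found = source in extracted
```

Here `extracted` is the unaccented capital string, so `found` is false, and even a case-insensitive comparison against `source.upper()` fails because the tonos is gone.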