Monotonic Greek in double encoded unicase fonts

Michael Rafailyk · March 13

Question 1 — quadruple encoding

Since the letter shapes of Alpha and Alphatonos look identical in Monotonic Greek (and this is true for double encoded unicase fonts), I would like to encode all the ~tonos versions to their base letters. Is this a good idea? If so, is my encoding map correct?

Alpha			0391 03B1 0386 03AC	← Alphatonos
Beta			0392 03B2
Gamma			0393 03B3
Delta			0394 03B4
Epsilon			0395 03B5 0388 03AD	← Epsilontonos
Zeta			0396 03B6
Eta			0397 03B7 0389 03AE	← Etatonos
Theta			0398 03B8
Iota			0399 03B9 038A 03AF	← Iotatonos
Iotadieresis		03AA 03CA 0390		← iotadieresistonos
Kappa			039A 03BA
Lambda			039B 03BB
Mu			039C 03BC
Nu			039D 03BD
Xi			039E 03BE
Omicron			039F 03BF 038C 03CC	← Omicrontonos
Pi			03A0 03C0
Rho			03A1 03C1
Sigma			03A3 03C3 03C2		← sigmafinal
SigmaLunateSymbol	03F9 03F2
Tau			03A4 03C4
Upsilon			03A5 03C5 038E 03CD	← Upsilontonos
Upsilondieresis		03AB 03CB 03B0		← upsilondieresistonos
Phi			03A6 03C6
Chi			03A7 03C7
Psi			03A8 03C8
Omega			03A9 03C9 038F 03CE	← Omegatonos

Question 2 — suppress tonos in ccmp

Also, when the tonos is a separate character in the text, I would like to compose it in ccmp, but directly to the Alpha glyph instead of Alphatonos (which is not present in the font and is now encoded in the Alpha glyph). Will this substitution break the source text if the user copies it or changes the font?

script grek;
language dflt;
lookup ccmp_grek_1 {
	sub Alpha acutecomb by Alpha;
	sub Epsilon acutecomb by Epsilon;
	sub Eta acutecomb by Eta;
	sub Iota acutecomb by Iota;
	sub Omicron acutecomb by Omicron;
	sub Upsilon acutecomb by Upsilon;
	sub Omega acutecomb by Omega;
} ccmp_grek_1;

John Hudson · March 13

To confirm my understanding: this is an all-caps font, and you are expecting text to be displayed with the tonos mark suppressed, even at the beginning of words, correct?

It can get a little more complicated than your proposed multiple encoding scheme suggests, because the diaeresis can behave contextually in all-caps text. I suspect this is a little more frequent in polytonic, but I think there are also instances in monotonic where a sequence of two vowel letters not forming a diphthong is distinguished by the tonos being applied to the first vowel instead of the second, but in all-caps the suppressed tonos is replaced by a diaeresis on the second vowel.

Will this substitution break the source text if the user copies it or changes the font?

No, because the substitution is happening at the glyph level, and does not affect the source text character string. Of course, if the user switches fonts, they will see different results presuming the other font does not include the kind of code you are proposing.

However, it could affect downstream text in a print-stream-distilled PDF. Acrobat (and some other PDF viewers?) parse glyph names in the embedded font to attempt to reconstruct the text for searching and copying. This is always an issue with multi-encoded glyphs, because the glyph name will only map to one of the possible encodings.

Michael Rafailyk · March 13

John Hudson said:

this is an all-caps font

Yes.

and you are expecting text to be displayed with the tonos mark suppressed, even at the beginning of words, correct?

I expect the font to visually appear according to Gerry Leonidas' Monotonic conversion, like when the text is set to all-caps.

there are also instances in monotonic where a sequence of two vowel letters not forming a diphthong are distinguished by the tonos being applied to the first vowel instead of the second, but in all-caps the suppressed tonos is replaced by a diaeresis on the second vowel

This is the most complicated part, which I'm not sure I understand completely. John, are you talking about some exceptional cases? How rare are they?

the substitution is happening at the glyph level, and does not affect the source text character string

Thanks for confirming that. Things are easier to understand when uppercase and lowercase are encoded to different glyphs; with double encoding I was confused. So it's good to know now.

However, it could affect downstream text in a print-stream-distilled PDF

I remember a discussion about this. As I understand it, the problem occurs only when copying text from such a PDF. But it should not be a problem for just viewing the PDF, right?

John Hudson · March 13

This is the most complicated part, which I'm not sure I understand completely. John, are you talking about some exceptional cases? How rare are they?

I don’t know what the frequency is. The point is that this is an orthographic rule that is possible to catch at the glyph level, but not if the cmap has already removed the distinction between accented and unaccented letters. In order to handle this rule contextually, one needs to preserve the character distinction into the GSUB level.
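A minimal sketch of what preserving the distinction into GSUB could look like, in feature-file syntax. It assumes the accented vowels keep their own glyphs (hypothetical names `Alphatonos`, `Epsilontonos` and so on, each still double encoded from its uppercase and lowercase code points) instead of being folded into the base letters in the cmap. The lookup names are made up, and the vowel pairs listed are illustrative only and would need checking against Greek orthography. A commonly cited instance of the rule is γάιδαρος, which is set in all-caps as ΓΑΪΔΑΡΟΣ.

```fea
feature ccmp {
    script grek;
    language dflt;

    # 1. Hiatus rule (assumed vowel pairs): when the preceding vowel carries
    #    the tonos, move the distinction to a dieresis on the second vowel.
    lookup caps_dieresis {
        sub [Alphatonos Epsilontonos Omicrontonos] Iota' by Iotadieresis;
        sub [Alphatonos Epsilontonos] Upsilon' by Upsilondieresis;
    } caps_dieresis;

    # 2. Only afterwards replace every accented vowel with its base letter.
    lookup drop_tonos {
        sub [Alphatonos Epsilontonos Etatonos Iotatonos Omicrontonos Upsilontonos Omegatonos]
            by [Alpha Epsilon Eta Iota Omicron Upsilon Omega];
    } drop_tonos;
} ccmp;
```

Lookups are applied in the order they are defined, so the dieresis rule still sees the accented glyph before the second lookup removes it. If the tonos letters were already mapped to the base glyphs in the cmap, the first lookup would have nothing to match.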

As I understand it, the problem occurs only when copying text from such a PDF. But it should not be a problem for just viewing the PDF, right?

Text searching within the PDF would also be affected.

Michael Rafailyk · March 13

I see the pitfall, thanks for highlighting this. The reason is similar to why the lowercase "i" should be present even in all-caps fonts for correct Turkish localised substitution.
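For comparison, a sketch of the Turkish case. It assumes the all-caps font keeps a separate glyph `i`, mapped only from U+0069 and drawn like `I`, plus a dotted capital `Idotaccent`; both glyph names are assumptions here.

```fea
feature locl {
    script latn;
    language TRK;
    # Turkish lowercase i uppercases to dotted İ, so show the dotted capital.
    sub i by Idotaccent;
} locl;
```

If U+0069 were mapped straight to `I` in the cmap, this rule would have nothing to match, which is the same loss of information as folding the tonos letters into their base letters.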

Nick Shinn · March 13

As the standard for fonts is upper and lower case, it’s helpful to make the distinction between:

Majuscule and Minuscule (Upper and lower case)
Majuscule and Small Majuscule (Caps with Small Caps)
Majuscule and Majuscule (Caps with same-size Caps)
Unicase and Unicase (Unicase with same-size Unicase)
Unicase and Small Unicase

Your project would appear to be Majuscule and Majuscule, or Majuscule and Small Majuscule.
But not unicase.

https://typedrawers.com/discussion/986/unicase-cyrillic-and-greek

Erik · March 14

Michael Rafailyk said:

Will this substitution break the source text if the user copies it or changes the font?

script grek;
language dflt;
lookup ccmp_grek_1 {
	sub Alpha acutecomb by Alpha;
	sub Epsilon acutecomb by Epsilon;
	sub Eta acutecomb by Eta;
	sub Iota acutecomb by Iota;
	sub Omicron acutecomb by Omicron;
	sub Upsilon acutecomb by Upsilon;
	sub Omega acutecomb by Omega;
} ccmp_grek_1;

I expect the problem of PDF glyph names can be avoided by replacing the above substitutions with something like

sub [Alpha Epsilon Eta Iota Omicron Upsilon Omega] acutecomb' by acutecomb.suppressed;

…where acutecomb.suppressed is zero‐width and invisible.
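In context, that rule might sit in the ccmp feature roughly as follows. This is only a sketch: `acutecomb.suppressed` is the empty, zero-width glyph described above and has to be added to the font, and the lookup name is made up.

```fea
feature ccmp {
    script grek;
    language dflt;
    lookup ccmp_grek_suppress {
        # Contextual substitution: only the marked glyph (acutecomb') changes;
        # the preceding vowel glyph is left untouched.
        sub [Alpha Epsilon Eta Iota Omicron Upsilon Omega] acutecomb' by acutecomb.suppressed;
    } ccmp_grek_suppress;
} ccmp;
```

The glyph run then stays one glyph per character, so a vowel followed by a combining acute is not collapsed into a single glyph that carries only the vowel's name. Under the usual glyph-naming convention the suffix after the period is ignored, so `acutecomb.suppressed` should be read back as `acutecomb`.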

Michael Rafailyk · March 14

Nick Shinn said:

Your project would appear to be Majuscule and Majuscule, or Majuscule and Small Majuscule.
But not unicase.

Sorry for the confusion. Most of the letters have uppercase forms, but some, like Alpha or Epsilon, appear in lowercase forms (a stylistic decision), and they all have capital height. So the font is unicase, but I want all the letters to act as uppercase ones, and to apply the uppercase conversion rules to them even if Epsilon has a lowercase form. Nick, do you think that this is the right approach for such unicase fonts? I feel that to the reader it looks like an all-caps font in the first place, and the all-caps rules should be applied to it.

Erik said:
I expect the problem of PDF glyph names can be avoided by replacing the above substitutions with something like
sub [Alpha Epsilon Eta Iota Omicron Upsilon Omega] acutecomb' by acutecomb.suppressed;
…where acutecomb.suppressed is zero‐width and invisible.

It's a good find, thanks!

Nick Shinn · March 14

Nick, do you think that this is the right approach for such unicase fonts?

My unicase designs have all been lining, i.e. without extenders.*
But there are many other configurations, as Cassandre demonstrated with Peignot.
It’s all good.
*Even in typical Latin capitals, many typefaces have J and Q with descenders, and some of my unicase designs are like that.

As for what the reader expects, the reader is more flexible than the typographer, IMO.
That is particularly so for Greek—there is an orthodoxy which non-Greek type designers are encouraged to follow (which also makes adding Greek to a Latin design fairly straightforward), but Greek type designers and Greek typographic culture in general are not so dogmatic.
