.sc and .a glyph types

mauro sacchetto · June 2019

I've noticed that some fonts contain two versions of entire glyph sets.
Adobe Garamond Premier Pro has small caps (suffix .sc) and an almost identical set (suffix .a, which I believe stands for "alternate"). Apart from very few glyphs (such as the two versions of Q or W), the two sets are identical. Only the names differ: A.a = Latin Capital Letter A and a.sc = a small cap (Corporate Use), but I don't understand the purpose of such extensive duplication.

It is even more confusing to me when I look, for example, at the 'locl' lookup for the two Turkish forms of <i> (dotted and dotless).

Here two different rules are used, apparently for the same substitution. There is surely a precise logic to this, but it escapes me:
----------------
| I.a  | u0130.a |
| i    | i.dot   |
| i.sc | i.dotsc |
----------------

It is likely that I.a and i.sc have different logical functions, but the glyph is the same: the dotless small-cap I. Similarly, u0130.a and i.dotsc are both the dotted small-cap i.

So, in cases like this, what is the purpose of the duplicate .a and .sc glyphs?

André G. Isaak · June 2019

Which version of Adobe Garamond Premier Pro are you using? Mine is version 1.014 from 2005, and it doesn't have any glyphs ending in .a.

Is it possible that the .a glyphs are small caps from caps, whereas the .sc glyphs are simply small caps? That would be consistent with what you write above (A.a vs. a.sc), but I can't confirm this from my older version. If this is the case, then it is simply to ensure that proper Unicode values can be reconstructed from glyph names (i.e., that the uppercase/lowercase distinction can be reconstructed from glyph names after c2sc is applied).

mauro sacchetto · June 2019

Version 2.0 rev. 2 (2007).
A lot of Adobe fonts from Font Folio 11 have .sc, .a and .alt glyphs...

John Hudson · June 2019

Adobe insist on a one-to-one mapping from characters to glyph variants that can be back-traced by parsing the glyph name. This means that they need to duplicate smallcap glyphs to distinguish those mapped from uppercase letters and those mapped from lowercase letters. This enables Acrobat reconstruction of text strings by parsing glyph names in PDFs distilled from print streams (which lack text encoding information).
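To make the back-tracing concrete, here is a minimal Python sketch of glyph-name parsing, loosely following the Adobe Glyph List Specification: drop everything from the first period, split ligature components on underscores, and map each component through a name table or the uniXXXX / uXXXX forms. The name table is a hypothetical excerpt standing in for the real AGL, and the helper names are illustrative, not any tool's API.

```python
import re
import string

# Hypothetical excerpt of the Adobe Glyph List: ASCII letters map to themselves,
# plus two accented names. A real tool would load the full published list.
AGL_EXCERPT = {c: c for c in string.ascii_letters}
AGL_EXCERPT.update({"Agrave": "\u00c0", "agrave": "\u00e0"})

_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")  # "uni" + one or more 4-digit groups
_U = re.compile(r"u([0-9A-F]{4,6})")          # "u" + 4 to 6 hex digits


def _usable(cp: int) -> bool:
    """Reject values beyond U+10FFFF and surrogate code points."""
    return cp <= 0x10FFFF and not 0xD800 <= cp <= 0xDFFF


def component_to_text(comp: str) -> str:
    """Map one ligature component to text; unknown components map to ''."""
    if comp in AGL_EXCERPT:
        return AGL_EXCERPT[comp]
    m = _UNI.fullmatch(comp)
    if m:
        hexes = m.group(1)
        cps = [int(hexes[i:i + 4], 16) for i in range(0, len(hexes), 4)]
        return "".join(map(chr, cps)) if all(map(_usable, cps)) else ""
    m = _U.fullmatch(comp)
    if m:
        cp = int(m.group(1), 16)
        return chr(cp) if _usable(cp) else ""
    return ""


def glyph_name_to_text(name: str) -> str:
    """Recover text from a glyph name; anything after the first '.' is ignored."""
    base = name.split(".", 1)[0]
    return "".join(component_to_text(c) for c in base.split("_"))
```

With this, A.a and a.sc may be drawn identically, yet they recover "A" and "a" respectively, which is exactly why the glyph has to exist twice.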

Nick Shinn · June 2019

What André said about preserving the casing of the original text.

That is the correct way to do small caps, although I usually don’t bother.
It does seem like a lot of unnecessary work, as it really only addresses the rare occurrence where a PDF file has <c2sc> text applied, and someone wants to extract the original text in U&lc.

Any other situations where it would be useful?

mauro sacchetto · June 2019

John Hudson said:

This means that they need to duplicate smallcap glyphs to distinguish those mapped from uppercase letters and those mapped from lowercase letters.

But in any case, from a graphical point of view the glyph is the same, right? Apart from a few alternates...

Nick Shinn said:

Any other situations where it would be useful?

In a font I'm working on, I am able to enable both Turkish <i> forms. But with LaTeX, using this code:

\textbf{dotted}

dotted = İ i \textsc{i}

MakeUppercase \MakeUppercase{aabbccddiixx}

MakeLowercase \MakeLowercase{AABBCCDDİİ}

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

\textbf{dotless}

dotless = I ı \textsc{ı}

MakeUppercase \MakeUppercase{aabbccddeeffgghhıı}

MakeLowercase \MakeLowercase{AABBCCDDEEFFGGHHII}

I can't get correct uppercase and lowercase conversion. For example, in the first \MakeUppercase I don't see any <İ>: the resulting string is only AABBCCDDXX.
Moreover, in Adobe Garamond Premier Pro the f_i and f_f_i ligatures don't work correctly for Turkish (where the dot on the i must be preserved). So I think that in f_i and f_f_i the font does not identify the <i> as the "normal" <i>.
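For comparison outside LaTeX: generic Unicode case mapping is not Turkish-aware. In Python, for instance, str.upper() turns i into I (not İ), and "İ".lower() yields i followed by U+0307 COMBINING DOT ABOVE. A minimal sketch of Turkish-aware casing that pre-maps the two dotted/dotless pairs; turkish_upper and turkish_lower are hypothetical helpers, and this says nothing about how a TeX engine or the font itself handles the case.

```python
def turkish_upper(s: str) -> str:
    """Uppercase with Turkish rules: i -> İ (U+0130), ı (U+0131) -> I."""
    # Pre-map the special pairs; str.upper() leaves İ and I unchanged afterwards.
    return s.replace("i", "\u0130").replace("\u0131", "I").upper()


def turkish_lower(s: str) -> str:
    """Lowercase with Turkish rules: I -> ı (U+0131), İ (U+0130) -> i."""
    # Pre-map the special pairs; str.lower() leaves ı and i unchanged afterwards.
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()
```

For example, turkish_upper("aabbccddiixx") gives AABBCCDDİİXX, which is the output expected from the first \MakeUppercase above.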

Thomas Phinney · June 2019

John Hudson said:

Adobe insist on a one-to-one mapping from characters to glyph variants that can be back-traced by parsing the glyph name. This means that they need to duplicate smallcap glyphs to distinguish those mapped from uppercase letters and those mapped from lowercase letters. This enables Acrobat reconstruction of text strings by parsing glyph names in PDFs distilled from print streams (which lack text encoding information).

John is correct.

But that doesn’t explain the choice of “.a” names.

For some larger fonts, built as OT-CFF with name-keyed glyphs (as opposed to CID-keyed), Adobe sometimes ran into issues with a subtable that uses the glyph names overflowing its 64K limit.

Changing the glyph name suffixes was one way to pull back the total size of the set of glyph name strings, without hurting the most important uses of parsing glyph names. (As described by John above.)

So, one sees a few Adobe fonts where many or even all the glyph names that have “.something” suffixes have those suffixes as .a, .b, .c, etcetera, to shorten the names.

In such a case instead of A.c2sc and a.smcp, you might have A.a and a.b. (Or whatever arbitrary extension was substituted for each feature.)
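A minimal Python sketch of that trade-off, assuming a hypothetical suffix table that swaps feature tags for single letters: the part before the dot is untouched, so glyph-name parsing still recovers the same characters, while the total length of the name strings shrinks. The helper names are illustrative only.

```python
def shorten_suffixes(names, suffix_map):
    """Swap each name's suffix (the text after the first '.') for a shorter one.

    suffix_map is a hypothetical table such as {"c2sc": "a", "smcp": "b"}.
    Names without a suffix, or with an unmapped suffix, are kept as-is.
    """
    result = []
    for name in names:
        base, dot, suffix = name.partition(".")
        result.append((base + dot + suffix_map.get(suffix, suffix)) if dot else name)
    return result


def total_name_bytes(names):
    """Total size of the glyph-name strings (ASCII, so one byte per character)."""
    return sum(len(n.encode("ascii")) for n in names)
```

For a whole alphabet of small caps from both cases, each three- or four-letter tag shrinks to a single character, and that saving repeats for every suffixed glyph in the font.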

mauro sacchetto · June 2019

OK, thank you for your explanation.
Now I still need to understand why the substitution between uppercase and lowercase fails. In any case, it doesn't depend on the lookups but on the substitution tables. I have to look inside them more closely.

One last, probably silly question. Can all these glyphs be placed anywhere, since they have no predefined slots? In the Private Use Area or in the Corporate Use Area? What is the difference? And why are they not contiguous? I mean: in one area I find the alphabet, and somewhere else entirely some particular glyphs with dots, accents and so on. Why not all together?

PS
I'm no longer able to insert images...

Mark Simonson · June 2019

What's "corporate use area"?

mauro sacchetto · June 2019

For example, I see that Garamond Premier Pro puts the small-cap alphabet in slots 63329 to 63354, and I read: Unicode Value U+... Corporate Use. Moreover, up to slot 62718 I read: Private Use Area; from 62729, Corporate Use. I don't know, I'm just asking...

Thomas Phinney · June 2019

Unicode has a Private Use Area (PUA). I have never heard of a "corporate use area" though. What you are seeing in Garamond Premier Pro is PUA usage. Perhaps some app you are using to view it chooses to label certain slots from the PUA as corporate use.
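Reading the slot numbers in the question as decimal codepoints bears this out: 63329 to 63354 is U+F761 to U+F77A (26 slots, one per letter), and 62718 and 62729 are U+F4FE and U+F509, all inside the BMP Private Use Area (U+E000 to U+F8FF). A minimal Python sketch of the conversion; describe_slot is a hypothetical helper, and it assumes the font editor's slot numbers are decimal codepoints (consistent with slot 65 being A and 97 being a).

```python
# Basic Multilingual Plane Private Use Area: U+E000 through U+F8FF.
BMP_PUA = range(0xE000, 0xF8FF + 1)


def describe_slot(slot: int) -> str:
    """Format a decimal slot number as U+XXXX and flag BMP PUA membership."""
    label = f"U+{slot:04X}"
    return f"{label} (BMP PUA)" if slot in BMP_PUA else label
```

So describe_slot(63329) gives "U+F761 (BMP PUA)", while describe_slot(192) gives plain "U+00C0" (Agrave).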

Adobe formerly chose to encode all alternates in new OpenType fonts, in the PUA (starting at E000 and upwards), in the early days of OpenType. The idea was that apps that were Unicode-savvy but not OpenType-savvy would still offer users some way to access these glyphs. Adobe’s choice of PUA slots was semi-standardized, at least within a family.

The counter-argument was that bogus codepoints would not help anyone in the long run and, once OpenType feature access became more common, would just become another legacy irritant.

This was a contentious decision internally at the time, a tough call. I was one of the main people on the “encode” side of the fence, and that view won out. I was wrong.

Very few users made use of the capability (even I found myself reluctant to do it, it was a pain), and now years later it just confuses type designers and end users alike.

mauro sacchetto · June 2019

In practice, then, the distinction between .sc glyphs (smcp) and .a glyphs (c2sc) serves to distinguish the glyphs mapped from lowercase letters from those mapped from uppercase letters.

Can I place these glyphs anywhere, except in slots that have a Unicode value already assigned?

Can I arrange them in whatever order suits me, without jumps and gaps between slots, provided of course that the various substitution lookups (aalt, c2sc and smcp) are correctly built?
Which is the more convenient and up-to-date practice?

Thomas Phinney · June 2019

The most up-to-date behavior is not to encode them. Leave them accessible by features only. They get glyph names, but no codepoints.
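A toy Python model of what "accessible by features only" means, assuming plain dicts in place of real font tables: the cmap encodes only the base glyphs, and the small-cap variants are reached solely through single substitutions. CMAP, FEATURES and shape are hypothetical names, and the glyph names follow the A.c2sc / a.smcp style mentioned earlier in the thread.

```python
# Toy model (hypothetical names): the cmap encodes only base glyphs, and the
# variants are reachable solely through feature substitutions.
CMAP = {0x41: "A", 0x61: "a"}  # codepoint -> glyph name
FEATURES = {
    "smcp": {"a": "a.smcp"},  # lowercase -> small cap
    "c2sc": {"A": "A.c2sc"},  # uppercase -> small cap
}


def shape(text, features=()):
    """Map characters to glyphs via CMAP, then apply each feature's single
    substitutions in the order requested (real engines apply lookups in
    LookupList order instead)."""
    glyphs = [CMAP[ord(ch)] for ch in text]
    for tag in features:
        table = FEATURES[tag]
        glyphs = [table.get(g, g) for g in glyphs]
    return glyphs
```

Calling shape("Aa", ["smcp", "c2sc"]) returns ["A.c2sc", "a.smcp"], even though neither variant has a codepoint of its own.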

Otherwise, if you choose to encode them for some reason, you probably should put them in one of Unicode’s defined Private Use Areas (see https://en.wikipedia.org/wiki/Private_Use_Areas). One is in the Basic Multilingual Plane (BMP) and sees the most use, but they are all valid.

If the font has no lowercase, of course, you could consider using lowercase slots. Or if it has no caps, you could use the cap slots.

One thing you should definitely NOT do is use unassigned bits of Unicode as places to encode your small caps. Such unassigned codepoints are subject to later use. There really is no reason I can think of to use them now. (Unless you are trying to make a fake font from the future as “evidence” of time travel? But even then you would not be putting small caps in those slots.)
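If you do decide to encode, a quick way to tell private-use, unassigned and assigned codepoints apart is the Unicode general category: Co for private use, Cn for unassigned (and for noncharacters). A minimal Python sketch using the standard unicodedata module; classify_codepoint is a hypothetical helper, and its answers reflect the Unicode version bundled with the Python build, so a codepoint unassigned today may be assigned in a later version.

```python
import unicodedata


def classify_codepoint(cp: int) -> str:
    """Classify a codepoint by its general category, using the Unicode data
    bundled with this Python build (newer Unicode versions may differ)."""
    category = unicodedata.category(chr(cp))
    if category == "Co":
        return "private use"   # permanently set aside for private agreements
    if category == "Cs":
        return "surrogate"     # reserved for UTF-16, never a glyph slot
    if category == "Cn":
        return "unassigned"    # may be assigned later, or a noncharacter: avoid
    return "assigned"          # a real character: don't reuse it for variants
```

All three Private Use Areas come back as "private use" (for example U+F761 in the BMP and U+F0000 in plane 15), while a gap such as U+0378 in the Greek block comes back as "unassigned".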

Adam Jagosz · June 2019

Does this issue affect all PDFs? I always thought it was something of the past, only occurring with some outdated PDF generators...

mauro sacchetto · June 2019

Is the following correct then?

- 65 Latin Capital Letter A (u0041)
- 97 Latin Small Letter A (u0061)
- 63329 a.sc (Private Use Area, uF761)
- 66399 A.a Latin Capital Letter A (Private Use Area, u0041, as in the first line), that is, the same glyph as a.sc

@Jagosz

I don't understand what the question refers to. The fact that LaTeX does not correctly convert uppercase and lowercase? I have to re-encode the glyphs correctly, because I believe the error is in some substitution lookup (aalt, c2sc, smcp). I have found that it is not in the locl (localized forms) lookup for Turkish.

Thomas Phinney · June 2019

Maybe I wasn't clear enough: Adobe formerly chose to encode all alternates in new OpenType fonts, in the PUA. I think they stopped doing this about a decade ago. Very few other people do it these days.

Is there a particular reason to encode these glyphs?

Kent Lew · June 2019

Adam — I believe it is still possible to create such a PDF, with a specific sequence of distillation, but it has become extremely rare. (It’s been a while since I tried to recreate the situation. Perhaps someone will correct me.)

Paul Miller · June 2019

Thomas Phinney said:

Unicode has a Private Use Area (PUA). I have never heard of a "corporate use area" though.

The Corporate Use Area was a proposal by Andreas Stötzner (search for LINCUA), an attempt to harmonise the usage of the PUA between MUFI, TITUS and anyone else who may want to 'claim' space in the PUA.

I don't believe it was widely adopted.

Thomas Phinney · June 2019

@Kent Lew & @Adam Jagosz

PDF creation can be dependent on glyph names to determine encoding of glyphs, IFF all the following are true (rare workflows these days):

- the PDF is created from a PostScript print stream instead of directly from a live document. Acrobat Distiller is an example of this.
- the PDF creator does not have access to the original font (or does not make use of its access)
- the font in the print stream has no inherent encoding info. With OpenType CFF, the usual print method for PS was one in which the CFF was lifted from its OT wrapper, ditching the cmap. With TTF (OpenType or not), as long as it isn’t a truly ancient workflow that converts the TTF to a PS font, I *think* you should be OK. (Native support for TTF was introduced early in PostScript Level 2 history, but not in its initial release.)

So, yeah, if the stars are badly aligned, it can happen. It isn't super common. A designer could reasonably decide they can't be bothered to support such corner cases. If you are creating a font for a particular purpose or client, you might need to check whether it is needed.

Of course, just communicating back and forth about this with the client and trying to figure out if they care might be as much work as just doing the glyph names the good old safer way in the first place.

mauro sacchetto · June 2019

Thomas Phinney said:

Maybe I wasn't clear enough: Adobe formerly chose to encode all alternates in new OpenType fonts, in the PUA. I think they stopped doing this about a decade ago. Very few other people do it these days.

It is certainly I who have not fully understood how to proceed in practice.
So: in slot 192 the font has the capital glyph Agrave, and also the small cap of agrave (agrave.sc). With copy and paste I duplicate the small-cap glyph and give the copy only a glyph name, in this example Agrave.a.
Then, in slot 7680, I find Latin Capital Letter A with Ring Below, which has the Unicode value uni1E00, and also its small-cap version at slot 63626, labeled only as uniF88A. I also copy this glyph "somewhere", into a slot with no Unicode value assigned. At this point, is it correct to give it a glyph name built from the Unicode value of Latin Capital Letter A with Ring Below, that is, uni1E00.a? I ask because this last glyph has no name and no Unicode value of its own (the program shows it to me as NameMe.slot_number...)

mauro sacchetto · June 2019

...Or should I call it Aringbelow.a? Can you point me to an up-to-date (preferably free) font that I can study?
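On the naming question: the slot numbers here are decimal, so 7680 is U+1E00 and 63626 is U+F88A. Under the uniXXXX convention of the Adobe Glyph List Specification, a name such as uni1E00.a parses back to U+1E00 whatever follows the dot, whereas a descriptive name such as Aringbelow.a only parses if the consuming tool knows that name. A minimal Python sketch; alt_name and base_codepoint are hypothetical helpers, not part of any font editor's API.

```python
def alt_name(codepoint: int, suffix: str) -> str:
    """Build a glyph name like uni1E00.a for a feature-only variant.

    Uses uniXXXX for BMP codepoints and uXXXXX/uXXXXXX above U+FFFF, so the
    base part of the name parses back to the character.
    """
    if not 0 <= codepoint <= 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError(f"not a usable codepoint: {codepoint:#x}")
    base = f"uni{codepoint:04X}" if codepoint <= 0xFFFF else f"u{codepoint:05X}"
    return f"{base}.{suffix}"


def base_codepoint(name: str) -> int:
    """Inverse for names built by alt_name: parse the hex digits before the first '.'."""
    base = name.split(".", 1)[0]
    digits = base[3:] if base.startswith("uni") else base[1:]
    return int(digits, 16)
```

So alt_name(7680, "a") gives "uni1E00.a", and alt_name(192, "a") gives "uni00C0.a", an equally parseable alternative to Agrave.a.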

Khaled Hosny · June 2019

The PDF format has historically had poor support for mapping glyphs back to code points (and arguably it is still not ideal in this area; IMHO, XPS is the only document format I know of in this category that does this mapping right).

Before PDF 1.4, the only way to embed such information in PDF files was the ToUnicode dictionary of the PDF font CMap (not to be confused with the OpenType cmap table; they are totally different things). ToUnicode can only handle one-to-one and one-to-many glyph-to-code-point relationships (single substitutions and ligatures). Also, each glyph in the font can occur at most once (since it is the key in the dictionary), so a glyph used for both the smcp and c2sc features can map either to the lowercase or to the uppercase code point, but not both. The mapping also can’t handle many-to-one or many-to-many glyph-to-code-point relationships, which makes it unsuitable for many complex scripts or advanced uses of OpenType layout.
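The one-key-per-glyph limitation can be simulated in a few lines of Python. This is a simplified model, not real PDF code: a dict stands in for the ToUnicode mapping, and keeping the first mapping seen for a glyph is an assumption (real PDF producers may pick differently). build_to_unicode and extract_text are hypothetical helpers.

```python
def build_to_unicode(pairs):
    """Build a glyph -> text table from (glyph, source text) pairs.

    Like a ToUnicode mapping, each glyph is a single key, so a glyph reached
    from two different characters can keep only one of them. Keeping the
    first mapping is an assumption of this sketch.
    """
    table = {}
    for glyph, text in pairs:
        table.setdefault(glyph, text)
    return table


def extract_text(glyphs, table):
    """Reconstruct text from a glyph run; unmapped glyphs become U+FFFD."""
    return "".join(table.get(g, "\ufffd") for g in glyphs)
```

If "Aa" is set with c2sc and smcp using one shared glyph a.sc, the run is a.sc a.sc and can only come back as "AA" (or "aa"); with separate A.c2sc and a.smcp glyphs it comes back as "Aa".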

PDF 1.4 introduced /ActualText tagging, which can tag any number of glyphs with any number of code points, but many PDF producers still don’t use it, and some PDF readers still don’t support it or support it only with bugs.
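For reference, such tagging in a page content stream is a marked-content sequence whose property list carries /ActualText, here written as a UTF-16BE hex string with a byte-order mark. A minimal Python sketch that generates the wrapper; actual_text_span is a hypothetical helper, and the "<0042> Tj" operator in the usage note is a placeholder for whatever glyph-drawing operators the producer actually emits.

```python
def actual_text_span(text: str, content_ops: str) -> str:
    """Wrap content-stream operators in a /Span whose /ActualText is `text`.

    The replacement text is encoded as a PDF text string in UTF-16BE with a
    leading byte-order mark, written in hex form <FEFF...>.
    """
    hex_text = (b"\xfe\xff" + text.encode("utf-16-be")).hex().upper()
    return f"/Span << /ActualText <{hex_text}> >> BDC\n{content_ops}\nEMC"
```

For example, actual_text_span("fi", "<0042> Tj") tags a single ligature glyph with the two-character text "fi", a many-to-one case that a glyph-keyed mapping alone cannot express per occurrence.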

This, I think, is the main reason to still insist on unique one-to-one or one-to-many relationships between glyphs and code points in fonts, even if it means duplicating otherwise identical glyphs. Personally, I have given up on this, as the kind of fonts I usually make will have more serious text extraction problems with such PDF producers/readers.

mauro sacchetto · June 2019

OK, I'll proceed this way. Thank you.
