Hebrew composition in ccmp

Michael Rafailyk · July 2023

Hi! I'm trying to understand how OpenType layout works for GSUB rules in right-to-left scripts, and I have questions that I can't find answers to.

In the case of precomposed Hebrew letters with combined marks (such as alefdagesh-hb uniFB30):

1. Does it make sense to add a composition for them in ccmp?
2. If so, should I wrap these rules in a lookup with the script hebr and language IWR tags? The documentation says that ccmp is not script- or language-sensitive, so it looks like it doesn't make sense.
3. Is it necessary to specify the right-to-left (RTL) direction for such a lookup?

sub alef-hb dagesh-hb by alefdagesh-hb;

Thanks in advance.

John Hudson · July 2023

1. Yes, this is fairly typical in Hebrew fonts. You can also rely on GPOS anchor positioning rules to place the dagesh dot within the letter, but in some cases, e.g. shin+dagesh in heavier typeface designs, a precomposed glyph provides an opportunity to use a slightly smaller dot or even to adjust the stroke weight or width of the letter slightly to accommodate the dagesh.

2. Yes, you should associate the relevant ccmp lookups with the hebr script tag, but you can use the dflt language system tag, which will work with any text in Hebrew script.

3. No, it is not necessary to set the RTL direction tag on the Hebrew lookups, but if you are using a graphical GSUB tool like VOLT you might want to, as it will flip the input UI so that the logical input and visual order correspond. The important thing to remember is that GSUB uses logical order, so you need to think in terms of RTL directionality of the glyphs even though the coding might specify the glyph sequence LTR, as in your example.

sub alef-hb dagesh-hb by alefdagesh-hb;
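Putting points 2 and 3 together, a minimal sketch of the whole feature in AFDKO feature syntax might look like the following. It uses only the glyph names from the question; the languagesystem declarations at the top are an assumption about how the rest of the feature file is set up.

```fea
# Assumed global declarations; a real font may list more language systems.
languagesystem DFLT dflt;
languagesystem hebr dflt;

feature ccmp {
    script hebr;
    language dflt;
    # Logical order: the base letter comes first, then the mark,
    # even though the text is displayed right-to-left.
    # No RightToLeft lookup flag is needed for this substitution.
    sub alef-hb dagesh-hb by alefdagesh-hb;
} ccmp;
```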

Simon Cozens · July 2023

Nobody understands the "RightToLeft" lookup flag, partly because it's terribly named. You should almost never use it. Here's what the spec says:

This bit relates only to the correct processing of the cursive attachment lookup type (GPOS lookup type 3). When this bit is set, the last glyph in a given sequence to which the cursive attachment lookup is applied, will be positioned on the baseline.
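For illustration only, here is a rough sketch of the one place where that flag is meant to be used: a cursive attachment lookup. Hebrew has no cursive joining, so this sketch assumes an Arabic font, and the glyph names and anchor coordinates are made up.

```fea
feature curs {
    script arab;
    language dflt;
    # The flag only affects how GPOS type 3 (cursive attachment) is resolved:
    # the last glyph of the connected sequence stays on the baseline.
    lookupflag RightToLeft;
    # pos cursive <glyph> <entry anchor> <exit anchor>;
    pos cursive beh-ar.init <anchor NULL> <anchor 0 0>;
    pos cursive beh-ar.fina <anchor 500 0> <anchor NULL>;
} curs;
```

A GSUB composition rule like the one in the original question has no use for this flag.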

Michael Rafailyk · July 2023

@John Hudson
Thank you for the detailed explanation about lookups and especially about RTL.

I still do not understand why the language dflt should be specified instead of IWR in this case. Does language dflt mean all possible languages in Hebrew script (such as Hebrew, Yiddish, Ladino, etc.), or does it mean just the language activated by the user (the default for their OS or application)?

And what will happen if I specify the script but do not specify the language at all?

@Simon Cozens
Yes, that's really not what I was thinking.

John Hudson · July 2023

I still do not understand why the language dflt should be specified instead of IWR in this case. Does language dflt mean all possible languages in Hebrew script (such as Hebrew, Yiddish, Ladino, etc.), or does it mean just the language activated by the user (the default for their OS or application)?

The former. OpenType Layout begins with script itemisation based on the Unicode script property: a string of Hebrew characters is recognised as Hebrew text and passed to the layout engine responsible for Hebrew OTL processing. The dflt language system tag is used to process the features and lookups unless a) the text has been tagged as a specific language in some way that the layout engine can associate with a different langsys tag, and b) the font contains some language-specific shaping for that language that differs from dflt shaping for the script. So dflt means default shaping for the script.

A Hebrew font, for example, might contain some language-specific lookups for Yiddish digraphs, but most shaping would be dflt.
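As a rough sketch of that split in feature code: most rules sit under dflt, and a language-specific block adds to them. JII is the OpenType language system tag for Yiddish. The double-vav rule and its glyph names (vav-hb, vavvav-hb) are hypothetical and only stand in for whatever Yiddish-specific shaping a font might need.

```fea
languagesystem DFLT dflt;
languagesystem hebr dflt;
languagesystem hebr JII;

feature ccmp {
    script hebr;
    language dflt;
    # Default shaping: applies to any Hebrew-script text.
    sub alef-hb dagesh-hb by alefdagesh-hb;

    language JII;
    # Applies only to text tagged as Yiddish. In feature syntax the dflt
    # rules above are also included here unless exclude_dflt is specified.
    sub vav-hb vav-hb by vavvav-hb;  # hypothetical rule for illustration
} ccmp;
```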

And what will happen if I specify the script but do not specify the language at all?

You can’t really do that, because the table structure is a hierarchical tree: script -> langsys -> features -> lookups.

Every script tag should have a dflt langsys tag associated with it, while other langsys tags are all optional.

Michael Rafailyk · July 2023

John Hudson said:
the table structure is a hierarchical tree:
script -> langsys -> features -> lookups.

As I understand it, to see this structure I need a special application that lets me see how the tables are stored inside the font files.

It is interesting that when the feature is generated automatically by type design applications, the script and language may be specified inside the lookup, so it may look like this:

lookup ccmp_hebr_1 {
	script hebr;
	language dflt;
	sub alef-hb patah-hb by alefpatah-hb;
} ccmp_hebr_1;

And after the export it is converted into the structure you pointed out:

script hebr;
language dflt;
lookup ccmp_hebr_1 {
	sub alef-hb patah-hb by alefpatah-hb;
} ccmp_hebr_1;

This difference has confused me about which structure is correct. So thanks for clarifying.

Simon Cozens · July 2023

If you install the Python module "fontFeatures" you will get a utility called "otf2fea". Run this on your binary file and you will get back a feature file with a representation that's close to what's in the binary.

John Hudson · July 2023

This difference has confused me about which structure is correct. So thanks for clarifying.

You are not alone. I have met a number of people who assumed the way script and language tags are referenced in AFDKO feature code corresponded to the way the table data is structured.
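One way to write feature code that sits a little closer to the table structure, sketched here with the glyph and lookup names from the example above, is to define the lookup on its own and then reference it from the feature under the script and language it should apply to. This is only an illustration of the idea, not a required style.

```fea
# The lookup is defined once, independent of any script or language.
lookup ccmp_hebr_1 {
    sub alef-hb patah-hb by alefpatah-hb;
} ccmp_hebr_1;

# The feature then points to it: script -> langsys -> feature -> lookup.
feature ccmp {
    script hebr;
    language dflt;
    lookup ccmp_hebr_1;
} ccmp;
```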

Michael Rafailyk · July 2023

Simon Cozens said:

If you install the Python module "fontFeatures" you will get a utility called "otf2fea". Run this on your binary file and you will get back a feature file with a representation that's close to what's in the binary.

Just checked, cool, now I see everything.

Thomas Phinney · July 2023

I remember when I still thought that! It was several years after I got into feature code that I learned otherwise, and several more before I really understood the differences.
