Hi everyone,
I am not a font professional. I am using an existing Syriac font and trying to map one Unicode range to another: the Hebrew code points should map to the Syriac glyphs.
In FontForge, for each Hebrew alphabet Unicode point, I add an "Alternate Unicode Encoding" in Glyph Info > Unicode and the previously empty Unicode points now display the correct glyphs.
The problem is, the Substitutions are not applied, i.e. ligatures between adjacent glyphs etc. They are correctly applied when the original Unicode points of the glyphs are entered, but not the new "alternate encoding" (Hebrew) Unicode points.
Am I doing something wrong? I thought these Substitutions depended on glyphs, not Unicode points, so not sure why it doesn't work.
Thanks.
Benjamin
Comments
The first step of OpenType Layout processing is for software to itemise runs of text according to Unicode script property of the characters, and then to pass the itemised runs to the appropriate shaping engine. So when you have a sequence of Hebrew Unicode characters, these are going to be identified as Hebrew by the software, and passed to the Hebrew shaping engine. This means that any OpenType GSUB and GPOS lookups you want to be applied to the glyphs mapped from those characters a) need to be associated with the <hebr> script tag, and b) need to be associated with layout features that are processed by the Hebrew shaping engine.
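The itemisation step described above can be sketched roughly as follows. This is only a minimal Python illustration, not how any real engine is implemented: it approximates the Unicode script property with two hard-coded block ranges (Hebrew U+0590 to U+05FF, Syriac U+0700 to U+074F), and the names script_of and itemise are made up for the example.

```python
from itertools import groupby

# Assumption: approximate the Unicode Script property by block range.
# Real engines use the full Unicode script data, not two ranges.
SCRIPT_RANGES = [
    (0x0590, 0x05FF, "hebr"),
    (0x0700, 0x074F, "syrc"),
]

def script_of(ch):
    """Return a script tag for one character (hypothetical helper)."""
    cp = ord(ch)
    for lo, hi, tag in SCRIPT_RANGES:
        if lo <= cp <= hi:
            return tag
    return "DFLT"

def itemise(text):
    """Split text into runs of consecutive characters sharing a script tag."""
    return [(tag, "".join(run)) for tag, run in groupby(text, key=script_of)]

hebrew = "\u05D0\u05D1"  # alef, bet
syriac = "\u0710\u0712"  # alaph, beth
runs = itemise(hebrew + syriac)
print(runs)  # the Hebrew characters land in a "hebr" run, the Syriac ones in "syrc"
```

Each run would then be handed to the shaper for its tag, which is why the encoding of the text, not the appearance of the glyphs, decides which lookups are ever considered.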
The biggest difference between the Hebrew and Syriac shaping engines is that the latter does joining property analysis on the text and applies the associated shaping features <init>, <medi>, <fina>, etc. The Hebrew shaping engine does not do this kind of analysis and does not apply those features, because Hebrew is not treated as a joining script by Unicode.
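To make the joining analysis concrete, here is a rough Python sketch of how a joining form might be picked per letter. It is a simplification under stated assumptions: the joining-type table covers only four Syriac letters chosen for the example, marks and the extra Syriac-specific forms are ignored, and joining_forms is a hypothetical name.

```python
# Assumption: a tiny, hand-picked joining-type table.
# D = dual-joining, R = right-joining (joins only to the preceding letter).
JOINING_TYPE = {
    "\u0712": "D",  # SYRIAC LETTER BETH
    "\u0715": "R",  # SYRIAC LETTER DALATH
    "\u0720": "D",  # SYRIAC LETTER LAMADH
    "\u0721": "D",  # SYRIAC LETTER MIM
}

def joining_forms(text):
    """Return 'init'/'medi'/'fina'/'isol' per letter, or None if non-joining."""
    types = [JOINING_TYPE.get(ch) for ch in text]
    forms = []
    for i, t in enumerate(types):
        if t is None:
            # No joining type (e.g. a Hebrew letter): no joining feature at all.
            forms.append(None)
            continue
        prev_t = types[i - 1] if i > 0 else None
        next_t = types[i + 1] if i + 1 < len(types) else None
        joins_prev = prev_t == "D"                      # previous letter can connect forward
        joins_next = t == "D" and next_t in ("D", "R")  # this letter can connect forward
        if joins_prev and joins_next:
            forms.append("medi")
        elif joins_prev:
            forms.append("fina")
        elif joins_next:
            forms.append("init")
        else:
            forms.append("isol")
    return forms

print(joining_forms("\u0712\u0720\u0721"))  # beth, lamadh, mim
print(joining_forms("\u05D1\u05DC\u05DE"))  # Hebrew bet, lamed, mem
```

In this sketch the Hebrew letters come back with no form at all, which mirrors the point above: nothing in the Hebrew path ever asks the font for its joining lookups.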
So it is difficult to get Syriac shaping to happen on Hebrew characters.
Really, this sort of thing should be happening at the text processing level, not the font level. That is, if you want to be able to display Hebrew text in Syriac, you should actually convert the text to Syriac characters using a macro of some kind.
Why is it not possible to make the substitution at the font level, for instance as a stylistic alternate, and then let the shaping engine do its job with the resulting Syriac characters? I just tried and that did not work.
No. OpenType works in glyph space. Text is encoded in character space. There are two interfaces between character space and glyph space: the font cmap table, which maps characters to their default glyphs; and the shaping engine, which activates some of the glyph features in the font based on analysis of the text string. There is no mechanism by which you can encode text as Hebrew and tell it to behave like Syriac, because Syriac shaping behaviour is based on the text being encoded as Syriac characters. If you want Syriac shaping, you have to provide Syriac characters to the shaping engine; if you provide Hebrew characters, you're going to get Hebrew shaping.
A glyph is just an index in a font. The shaping engine has no idea what the glyph looks like or whether its shape is Hebrew or Syriac. The shaping engine is entirely dependent on the Unicode script property of the character in the text, and the mapping of the character code to a glyph index in the font cmap table. What comes after (the OpenType Layout features substituting and positioning glyphs) follows from what the shaping engine understands the text to be, and the path from the character code to the glyph index and through the layout features. But that path always begins with script itemisation, and once the shaping engine determines that the characters are Hebrew, that is going to determine how the runs are shaped, and there's nothing you can do in the font to tell the engine 'No, these are really Syriac!'
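A small Python sketch of that path may help. Everything in it is illustrative: the cmap dictionary, the glyph name "alaph", and the shape_plan function are hypothetical, and the script lookup is reduced to two block ranges. It shows two code points reaching the same glyph while still being treated differently.

```python
# Hypothetical cmap: Hebrew alef and Syriac alaph both point at one glyph,
# as happens after adding an alternate encoding to the glyph.
CMAP = {0x05D0: "alaph", 0x0710: "alaph"}

# Assumption: only Syriac is treated as a joining script in this sketch.
JOINING_SCRIPTS = {"syrc"}

def script_of(cp):
    """Very rough script lookup by block range (illustration only)."""
    if 0x0590 <= cp <= 0x05FF:
        return "hebr"
    if 0x0700 <= cp <= 0x074F:
        return "syrc"
    return "DFLT"

def shape_plan(text):
    """Return (glyph, script tag, joining features requested?) per character."""
    plan = []
    for ch in text:
        cp = ord(ch)
        script = script_of(cp)  # decided from the character, before any glyph lookup
        glyph = CMAP[cp]        # cmap: character to default glyph
        plan.append((glyph, script, script in JOINING_SCRIPTS))
    return plan

print(shape_plan("\u05D0"))  # same glyph, Hebrew path
print(shape_plan("\u0710"))  # same glyph, Syriac path
```

The glyph is identical in both cases; only the script tag derived from the character differs, and that tag is what decides whether the joining features are requested.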
I think that the source of confusion here may reside in the (pseudo) syntax used above, where you appear to be identifying glyphs by the Unicode values of their associated base characters.
GSUB tables deal exclusively with glyph IDs, not with Unicode values, so even if you write a substitution which *appears* to change the underlying character, it really does no such thing -- it simply replaces one GID with another, leaving the underlying character (and hence its Unicode value) unchanged.
As an example, consider the following (rather pointless) feature:
feature ss01 { # ROT-13
    sub [A B C D E F G H I J K L M N O P Q R S T U V W X Y Z] by
        [N O P Q R S T U V W X Y Z A B C D E F G H I J K L M];
} ss01;
This would implement ROT-13 within a font and applying this feature would result in text which looks like gibberish.
So, for example, "THE QUICK BROWN FOX" would be rendered as "GUR DHVPX OEBJA SBK".
However, if you were to apply this feature and then run your spell checker, it wouldn't find any errors, because the application would still see this as 'THE QUICK BROWN FOX'. Similarly, in your example above, you can map alef to alaph, but anything outside the font (including the shaping engine) is still going to see this as alef (U+05D0). All of the substitutions performed by your GSUB table take place after the shaping engine has already done its work.
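The same point can be shown in a few lines of Python. This is a toy model, not a real layout engine: it assumes each character maps to a glyph of the same name, and the render function and ROT13_GSUB table are made up for the illustration.

```python
import string

UPPER = string.ascii_uppercase

# Glyph-to-glyph table equivalent to the ROT-13 substitution in ss01.
ROT13_GSUB = {g: UPPER[(i + 13) % 26] for i, g in enumerate(UPPER)}

def render(text, ss01=False):
    """Return (characters the application stores, glyphs that get drawn)."""
    # cmap step. Assumption: every character maps to a glyph with the same name.
    glyphs = list(text)
    if ss01:
        # GSUB step: swap glyphs only; the stored characters are never touched.
        glyphs = [ROT13_GSUB.get(g, g) for g in glyphs]
    return text, "".join(glyphs)

chars, shown = render("THE QUICK BROWN FOX", ss01=True)
print(chars)  # THE QUICK BROWN FOX
print(shown)  # GUR DHVPX OEBJA SBK
```

A spell checker, a search function, or a shaping engine works from the first value; only the second one changes when the feature is switched on.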
André
When I have time, I will make another experiment: put all the Syriac characters in the Hebrew range (that's a big cheat, which may also require renaming derived glyphs), replace the tag syrc by hebr in the feature definitions and see if the converted "Syriac" features are applied on that "Hebrew" glyph run (to use Adobe's terminology).
PS. I expect that applying ttx, then a sed script on the resulting ttx file and finally applying again ttx should be enough to get the desired font.
This won't work -- the Hebrew shaping engine doesn't know anything about the cursive properties of Syriac, and as I pointed out in my previous post, no changes made by your features are going to affect the fact that the underlying text is Hebrew, not Syriac.
I think the only way you'd be able to get this to work would be to define some sort of 'calt' feature which basically does all the work normally done by the Syriac shaping engine (i.e. 'calt' would have to be used in place of 'init', 'medi', and 'fina'). As others have pointed out, though, this is probably not the best approach.
André
https://www.microsoft.com/en-us/Typography/OpenTypeSpecification.aspx
https://www.microsoft.com/typography/otfntdev/arabicot/shaping.htm
https://www.microsoft.com/typography/otfntdev/hebrewot/shaping.htm
The Arabic page describes the Arabic shaping engine (which is also used for Syriac), whereas the Hebrew page describes the Hebrew shaping engine. The crucial point here is that the Hebrew shaping engine doesn't call 'init', 'medi' and 'fina' for you, whereas the Syriac one does, and is aware of which Syriac characters can join and which can't.
Andre
Those links describe the Uniscribe shaping engine. Is that considered a spec with which all applications on all platforms need to comply?
But similar principles are going to hold on other platforms such as DirectWrite, HarfBuzz, or ATS. If the input characters are Hebrew, whatever shaping engine is used is going to treat them as Hebrew, which means it isn't going to call on the relevant joining features in the font.
I should note that I'm actually a Mac person, not a PC person. I realized after posting those links that Uniscribe is dated, but I don't know the relevant DirectWrite links.
Andre
XeLaTeX does not and I can't guess what other application would.
Pretty much. When we find inconsistencies between other shaping engine behaviour and Uniscribe, we report it as a bug, and generally the developers acknowledge it as such. Microsoft led the way on complex script shaping for OpenType, so defined the standard.
The whole OpenType Layout model is predicated on layout engines looking after shaping intelligence above the font level, with the font supporting that process (contra Apple's AAT and SIL's Graphite models, in which the shaping intelligence is built into the fonts).
I can't read or write Syriac, but I opened the .doc file provided by the link above in Pages, tried copying and pasting a few times, and saw no bad word reordering. Pages says that the fonts used by the Word document are missing and uses a default. The missing fonts are Talada and Adiabene.