Microsoft Word and OpenType Substitutions for non-Latin Sequences?
Greetings All,
I’ve gone down the rabbit hole of OpenType feature support in various applications and could use some help understanding the policies that Microsoft Word applies. It appears to be the odd application out, interpreting features differently from the others.
What I am observing is that substitutions (in calt, liga, or other features) appear to work for a Latin symbol sequence but not for a non-Latin one. Perhaps Word requires additional language configuration? I reduced the problem to this minimal features.fea file (FontLab 8 is my design tool):
languagesystem DFLT dflt; # I add this line only
languagesystem latn dflt; # added by FL8
languagesystem ethi dflt; # added by FL8
languagesystem grek dflt; # added by FL8

feature calt {
    sub a b by x; # works everywhere
    sub h a by uni1203; # works everywhere
    sub uni1200 a by uni1203; # fails everywhere
    sub h uniFE00 by uni1203; # fails in MS Word, works for LibreOffice & Chrome
    sub uni1200 uniFE00 by uni1203; # fails in MS Word, works for LibreOffice & Chrome
} calt;
In summary, I see for MS Word the following is ok:
sub <latin> <latin> by <any>;
but not:
sub <latin> <non-latin> by … ;
sub <non-latin> <non-latin> by … ;
and universally failing is:
sub <non-latin> <latin> by … ;
I would like to understand why these fail and which missing OT statement MS Word needs. To be fair, I haven’t tested with Cyrillic, Greek, etc., so “non-Latin” is limited to Ethiopic, variation sequences, and PUA symbols in my trials.
Any help is appreciated!
Thanks,
-Daniel
Comments
-
Shaping engines divide a text string into separate segments, and each segment can hold only one script. The OpenType layout features are processed per segment, so a substitution whose input mixes scripts will never work.
The shaping engine used by Microsoft Word is outdated, so not all features work as they should.
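To illustrate the per-segment processing described above, here is a minimal Python sketch. It assumes a toy script table covering only basic Latin letters and the Ethiopic block, and the helper names (`script_of`, `split_into_runs`, `pair_can_match`) are hypothetical. Real shapers use the full Unicode Script property and more elaborate rules, so treat this as a model of the idea rather than of any particular engine.

```python
# Toy script table: only the ranges needed for this thread.
# This is an assumption for illustration, not real Unicode Script data.
SCRIPT_RANGES = [
    (0x0041, 0x005A, "Latn"),  # A-Z
    (0x0061, 0x007A, "Latn"),  # a-z
    (0x1200, 0x137F, "Ethi"),  # Ethiopic block
]


def script_of(ch):
    """Return a script tag for one character (anything unknown is Common)."""
    cp = ord(ch)
    for start, end, script in SCRIPT_RANGES:
        if start <= cp <= end:
            return script
    return "Zyyy"


def split_into_runs(text):
    """Split text into (script, substring) runs, one script per run."""
    runs = []
    for ch in text:
        script = script_of(ch)
        if runs and runs[-1][0] == script:
            runs[-1] = (script, runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


def pair_can_match(text, first, second):
    """True if the pair `first second` occurs inside a single run."""
    return any(first + second in run for _, run in split_into_runs(text))


print(split_into_runs("ha"))        # one Latin run
print(split_into_runs("\u1200a"))   # an Ethiopic run, then a Latin run
```

In this model `sub h a by ...` can match because both glyphs sit in one Latin run, while `sub uni1200 a by ...` cannot, because the two characters are never in the same run.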
-
Thank you @Erwin Denissen, that explains the scenario that "fails everywhere", which wasn't a true use case of mine. The Unicode Variation Sequence code points should be treated as script-independent, though. This may be an area where the Word shaping engine is outdated and the others are more current.
-
Unicode variation sequences are not supposed to be handled via OTL GSUB. They are pre glyph processing substitutions made at the cmap level using a format 14 cmap subtable. I don’t know if there is any way to generate a format 14 cmap subtable from within typical font development tools: I use DTL OTMaster to hand code mine, like this:
-
The UFO format supports it as well: https://unifiedfontobject.org/versions/ufo3/lib.plist/#publicunicodevariationsequences
Fontmake or ufo2ft will use it when generating the OpenType files. Other tools might do so as well.
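As a rough sketch of what such a lib.plist entry could look like, the following Python uses only the standard-library `plistlib`. The key layout (variation selector as an uppercase hex string, then base code point as an uppercase hex string, then glyph name) is my understanding of the UFO 3 page linked above and should be checked against it. The helper `add_variation_sequences` and the mapping to the glyph name `uni1203` are hypothetical, borrowed from the rules in the original question.

```python
import plistlib


def add_variation_sequences(lib, sequences):
    """Merge (base, selector, glyph_name) triples into a lib.plist dict.

    Assumed layout: {selector_hex: {base_hex: glyph_name}} stored under
    the "public.unicodeVariationSequences" key.
    """
    uvs = lib.setdefault("public.unicodeVariationSequences", {})
    for base, selector, glyph_name in sequences:
        uvs.setdefault(f"{selector:04X}", {})[f"{base:04X}"] = glyph_name
    return lib


# Hypothetical mapping: U+1200 followed by U+FE00 selects glyph "uni1203".
lib = add_variation_sequences({}, [(0x1200, 0xFE00, "uni1203")])

# Serialize to XML plist bytes, as they would appear in lib.plist.
plist_bytes = plistlib.dumps(lib)
print(plist_bytes.decode("utf-8"))
```

In a real project you would load the existing lib.plist with `plistlib.load`, merge the entries, and write the file back before building with fontmake.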
-
Thanks @John Hudson, @Erwin Denissen, and @Denis Moyogo Jacquerye, the responses are eye-opening and fill in a critical knowledge gap on my end. It's great to see that there are a number of ways to tackle the problem. I look forward to trying them this week.
Thanks again!
-
While OpenType shapers do divide up text by script, what script a character is has some complications. Some characters have script Common or Inherited, or are listed in ScriptExtensions, and can therefore be included in text segments with other scripts. Not all applications handle this extra complication.
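One way to picture the complication is a run builder with a switch for how Common and Inherited characters are treated. This is a sketch under stated assumptions: the input is already labelled with script tags by hand, the function `resolve_runs` and its `merge_neutral` flag are hypothetical, and "attach to the preceding run" is only one of several policies an application might choose.

```python
NEUTRAL = ("Zyyy", "Zinh")  # Common and Inherited script tags


def resolve_runs(labelled, merge_neutral=True):
    """Group (char, script) pairs into (script, text) runs.

    With merge_neutral=True, Common/Inherited characters join the
    preceding run; otherwise they form runs of their own.
    """
    runs = []
    for ch, script in labelled:
        joins_previous = runs and (
            script == runs[-1][0] or (merge_neutral and script in NEUTRAL)
        )
        if joins_previous:
            runs[-1] = (runs[-1][0], runs[-1][1] + ch)
        else:
            runs.append((script, ch))
    return runs


# "h" followed by VARIATION SELECTOR-1, labelled here as Inherited.
sample = [("h", "Latn"), ("\uFE00", "Zinh")]
print(resolve_runs(sample, merge_neutral=True))   # a single Latin run
print(resolve_runs(sample, merge_neutral=False))  # two separate runs
```

Under the first policy a rule such as `sub h uniFE00 by ...` has a chance of matching; under the second it never sees both glyphs in one run, which would be consistent with applications giving different results for the same font.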
Word (on Windows) uses DirectWrite for text shaping. Notepad also uses DirectWrite, and for testing DirectWrite I would use both applications. I have encountered situations where Word would not apply liga by default (but could be enabled by the user), but would apply rlig by default. Notepad applied both with no user interaction.
From my reading of the Variation Sequences FAQ, you need to be using a variation sequence that is known by Unicode.
-
“Not all applications handle this extra complication.”
And those that do may not handle it consistently. There is no formal specification for how to perform OTL script itemisation and run segmentation, and different software makers have implemented it without common agreement. So, for example, I have found different results for script=Common integration into runs in Microsoft and Adobe shapers.
Really, this is something for which a standard algorithm is needed, one that would account even for edge cases such as adjacent sequences of different scripts with a script=Common character between them.
-
I greatly appreciate all the help and insight today. I gave it a try with fontmake and FontCreator, skipping DTL OTMaster for now since I didn't see a trial version.
I added a couple of VS entries to the UFO lib.plist, then built and installed the font. The VSs were accepted by MS Word! Unfortunately, FL8 was not including color data in its UFO export, so that became a new obstacle (I reported the issue to tech support; this may simply be an export limitation).
With FontCreator, I could open the COLR OTF file and add the VS mappings, and this finally worked as desired in Word. I'll test thoroughly during the week; at the moment it appears that I have a viable workflow.
Thanks again!
-
As @bdevos notes, Unicode says, in answer to the question “What variation sequences are valid?”: “Only those listed in StandardizedVariants.txt, emoji-variation-sequences.txt, or the registered sequences listed in the Ideographic Variation Database (IVD).”
I cannot find either "h uniFE00" or "uni1200 uniFE00" from the original question in any of the above references, though maybe I missed them. Are these the sequences you found were accepted by MS Word?
Also, as for
sub <non-latin> <non-latin> by … ;
not seeming to work, I can confirm that in Word 2016 on Windows 10 such sequences do work for Arabic.
-
I think you need to be more specific about "non-Latin".
Are they the same non-Latin writing system?
E.g.,
sub "non-latin-A" "non-latin-B" by ...
is expected to fail, but
sub "non-latin-A" "non-latin-A" by ...
should work.
-
@bobh, those two sequences failed in MS Word. I did get the 2nd case accepted by Word when I moved the substitution into the cmap table.
Regarding the observation by @bdevos, I found that Word, other apps, and the font rendering stack do not enforce the definitions in StandardizedVariants.txt, etc., as a permitted set. Fortunately, I was able to add my own custom variation sequences (in the cmap). I interpreted those files as more of a reference for font vendors who want to support variation sequences.
@Thomas Phinney, it was the 1st, mixed-script case that was failing. I hadn't tried the 2nd case; it seems more sensible that it should work.
-
“those two sequences failed in MS Word. The 2nd case I did get accepted by word when I moved the substitution into a CMAP table.”
I note that you were experimenting with these lookups in the calt feature, which would be dependent on the shaping engine applying that feature by default in Word. A more broadly reliable and better suited feature to use would be ccmp. However, the cmap format 14 subtable mechanism is definitely the better bet for variation selector sequences, since that is what it was specified for, and it bypasses shaping engine dependencies.
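To show why the cmap route avoids the shaping engine, here is a small Python model of character-to-glyph mapping with variation sequences resolved before any GSUB feature runs. It is a simplification under stated assumptions: the dictionaries stand in for the real binary subtables, a format 14 subtable stores default sequences as ranges rather than a set, and the function `map_characters` and all glyph names are hypothetical.

```python
def map_characters(text, cmap, default_uvs, non_default_uvs):
    """Map characters to glyph names, resolving variation sequences first.

    non_default_uvs: {(base, selector): glyph_name}, an explicit variant glyph.
    default_uvs: set of (base, selector) pairs that keep the regular glyph.
    """
    glyphs = []
    i = 0
    while i < len(text):
        base = ord(text[i])
        nxt = ord(text[i + 1]) if i + 1 < len(text) else None
        if nxt is not None and (base, nxt) in non_default_uvs:
            glyphs.append(non_default_uvs[(base, nxt)])  # variant glyph
            i += 2
        elif nxt is not None and (base, nxt) in default_uvs:
            glyphs.append(cmap[base])  # sequence supported, regular glyph
            i += 2
        else:
            glyphs.append(cmap.get(base, ".notdef"))
            i += 1
    return glyphs


# Hypothetical font data modelled on the sequences from the question.
cmap = {0x68: "h", 0x1200: "uni1200"}
default_uvs = {(0x68, 0xFE00)}
non_default_uvs = {(0x1200, 0xFE00): "uni1203"}

print(map_characters("\u1200\uFE00", cmap, default_uvs, non_default_uvs))
```

Because the selector is consumed at this stage, no feature (calt, ccmp, or otherwise) has to be enabled by the application for the variant glyph to appear. The model does not cover how an unsupported selector is handled, which real engines typically ignore rather than render.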
-
Can someone mention some fonts containing cmap format 14 tables, so that people can see them in action? I'd like to see both default and non-default UVS tables, if possible.
-
Here are some in the Google Fonts catalogue:
ofl/mplus1p
ofl/notoserifsc
ofl/notosanstc
ofl/notoserifhk
ofl/padauk
ofl/notosanshk
ofl/notoseriftc
ofl/notosanssc
ofl/bizudgothic
ofl/bizudpgothic
ofl/bizudpmincho
ofl/bizudmincho
ofl/notosansjp
ofl/notoserifkr
ofl/notosanskr
ofl/notoserifjp
ofl/stixtwomath
-
Thanks, Simon! These should keep me out of trouble for a while.