OpenType to AAT conversion

satya · June 2018

Hello!
Is there anyone who can help me convert a Telugu OpenType font to AAT?

Unfortunately, Apple's own Final Cut Pro (FCP) still does not support Indic OpenType fonts correctly (including the fonts shipped with macOS), and we urgently need to supply AAT fonts to one of our TV clients. FCP is used extensively in the broadcasting industry.

Thank you,
Satya

André G. Isaak · July 2018

That's going to be a rather non-trivial task. Most of the major font design applications don't handle AAT, and the Apple-supplied tools are (in my very limited experience) extremely cumbersome to use. I suspect you're looking at a large expenditure, if you can find someone qualified to take this on.

Adam Twardoch · July 2018

This is possible in principle, but non-trivial. In OpenType, Telugu requires actions done by the font and actions done by the engine: https://docs.microsoft.com/en-us/typography/script-development/telugu

In AAT, all the engine actions need to be encoded in the font. So you’d need a large Telugu text corpus, you’d run the OpenType font on it using something like HarfBuzz, and then you’d somehow need to encode all the syllable actions/transformations performed on the text (including all the reordering that in OT is performed by the engine) into a “morx” table.

In a way, the resulting “morx” would need to be like a “compressed cache” of all the glyph transformations performed by the OT engine plus GSUB.
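To picture the “compressed cache”: once a corpus has been segmented into syllables and shaped, each distinct input syllable only needs to be stored once, however often it occurs. The Python sketch below assumes the segmentation and shaping have already been done; all glyph IDs are hypothetical stand-ins for what a shaper like HarfBuzz would report.

```python
from collections import Counter

# Hypothetical (input glyph IDs, output glyph IDs) pairs, one per shaped syllable.
# A real workflow would get these by shaping a corpus with the OpenType font;
# the numbers here are invented for illustration.
shaped_syllables = [
    ((10, 50, 11), (10, 111)),  # consonant + virama + consonant -> base + below-base form
    ((10, 60), (160,)),         # consonant + vowel sign -> precomposed glyph
    ((10, 50, 11), (10, 111)),  # the same syllable seen again
    ((12,), (12,)),             # bare consonant, unchanged
    ((10, 60), (160,)),
]


def build_cache(pairs):
    """Store each distinct input syllable once; fail if it ever maps to two outputs."""
    cache = {}
    for source, target in pairs:
        if cache.setdefault(source, target) != target:
            raise ValueError(f"conflicting outputs for {source}: {cache[source]} vs {target}")
    return cache


cache = build_cache(shaped_syllables)
usage = Counter(source for source, _ in shaped_syllables)  # how often each entry is hit

print(f"{len(shaped_syllables)} syllables -> {len(cache)} cache entries")
```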

Quite a few font makers have expressed the need for such a conversion tool, but as far as I can tell, nothing of that kind has ever been released publicly.

It is possible that Monotype or someone else has developed such a workflow in-house.

FontForge has some ability to convert between OT and AAT layout tables, but it doesn’t handle any of the complex-script processing, so it’d be useless here.

Adam Twardoch · July 2018

BTW, I think the Indic OT engine in HarfBuzz is still implemented using a state table (originally that part was done in ICU Layout, the now-extinct open-source OT engine), and “morx” is also state-table-based.
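For readers who haven't seen a “morx”-style state table, here is a simplified Python model of one, loosely following AAT's Rearrangement subtable: for each (state, glyph class) entry, the machine can mark the first or last glyph of a range and apply a verb such as “xD => Dx” (move the last marked glyph to the front). The glyph names, classes and the pre-base vowel rule are invented for a made-up script, not taken from Telugu, and this is not the binary table format.

```python
# Simplified in-memory model of a rearrangement-style state machine.
# Glyph names, classes, states and the rule are all hypothetical.
CLASS_OF = {"ka": "CONS", "ssa": "CONS", "virama": "VIRAMA", "i_pre": "VPRE"}

# (state, glyph class) -> (next state, mark first, mark last, verb)
# Verb "xD=>Dx" moves the last glyph of the marked range to the front of it.
TABLE = {
    ("START", "CONS"): ("CLUSTER", True, False, None),
    ("CLUSTER", "CONS"): ("CLUSTER", True, False, None),        # a new cluster starts
    ("CLUSTER", "VIRAMA"): ("AFTER_VIRAMA", False, False, None),
    ("AFTER_VIRAMA", "CONS"): ("CLUSTER", False, False, None),  # the cluster continues
    ("CLUSTER", "VPRE"): ("START", False, True, "xD=>Dx"),
}
DEFAULT = ("START", False, False, None)  # anything else resets the machine


def rearrange(glyphs):
    glyphs = list(glyphs)
    state, first, last = "START", None, None
    for i in range(len(glyphs)):
        glyph_class = CLASS_OF.get(glyphs[i], "OTHER")
        state, mark_first, mark_last, verb = TABLE.get((state, glyph_class), DEFAULT)
        if mark_first:
            first = i
        if mark_last:
            last = i
        if verb == "xD=>Dx" and first is not None and last is not None:
            # Move the last marked glyph in front of the rest of the marked range.
            glyphs[first:last + 1] = [glyphs[last]] + glyphs[first:last]
            first = last = None
    return glyphs


print(rearrange(["ka", "virama", "ssa", "i_pre"]))  # ['i_pre', 'ka', 'virama', 'ssa']
```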

So most likely, it would be possible to make an analytical converter that would use the Indic HarfBuzz engine plus a font’s “GSUB” table and spit out a corresponding “morx”. But it would take a clever developer with deep knowledge of the technicalities of both processing systems, and a sizeable budget, to build such a tool.

I remember that the Google i18n team once considered attempting this, but I’m not aware of any outcome.

You may have more luck asking via https://github.com/harfbuzz/harfbuzz/issues

John Hudson · July 2018

As Adam says, this is non-trivial. In addition to standard reordering performed by the shaping engine, for Telugu AAT I think you'd also have to handle as reordering the ligation lookups that ignore mark classes in GSUB.
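A toy Python model of the point above: an OpenType ligature lookup that ignores marks can form a ligature across an intervening mark. In this model the AAT side has no such skipping, so the mark is first moved behind the second component (a reordering) and the now-adjacent components are then ligated; both routes end with the same glyphs. The glyph names and the single ligature rule are hypothetical, and real “morx” ligature subtables differ in detail.

```python
# Toy data: one mark glyph and one ligature rule, all hypothetical.
MARKS = {"m"}
LIGATURES = {("c1", "c2"): "c1_c2"}


def ot_ligate_ignoring_marks(glyphs):
    """OpenType-style: components may be separated by marks, which are skipped
    and end up after the ligature glyph."""
    out, i = [], 0
    while i < len(glyphs):
        if glyphs[i] not in MARKS:
            j = i + 1
            while j < len(glyphs) and glyphs[j] in MARKS:
                j += 1
            if j < len(glyphs) and (glyphs[i], glyphs[j]) in LIGATURES:
                out.append(LIGATURES[(glyphs[i], glyphs[j])])
                out.extend(glyphs[i + 1:j])  # the skipped marks
                i = j + 1
                continue
        out.append(glyphs[i])
        i += 1
    return out


def reorder_then_ligate(glyphs):
    """Model of the AAT-style route: no skipping, so first move intervening
    marks behind the second component, then ligate adjacent glyphs only."""
    glyphs = list(glyphs)
    for i in range(len(glyphs)):
        j = i + 1
        while j < len(glyphs) and glyphs[j] in MARKS:
            j += 1
        if (j > i + 1 and j < len(glyphs) and glyphs[i] not in MARKS
                and (glyphs[i], glyphs[j]) in LIGATURES):
            glyphs[i + 1:j + 1] = [glyphs[j]] + glyphs[i + 1:j]  # reordering step
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in LIGATURES:
            out.append(LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


print(ot_ligate_ignoring_marks(["c1", "m", "c2"]))  # ['c1_c2', 'm']
print(reorder_then_ligate(["c1", "m", "c2"]))       # ['c1_c2', 'm']
```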

Adam Twardoch · July 2018

There are a number of techniques where a finite state machine (FSM) can be automatically built from a corpus, see https://borjaballe.github.io/other/phdthesis.pdf

Since "morx" is an FSM, it would be quite possible to write a general tool that takes an OTL (OpenType Layout) font and a large text corpus, translates each Unicode character of that corpus into an initial glyph ID, then runs the OTL processing of that text with that font via HarfBuzz, and captures the final stream of glyph IDs.
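The data-collection step can be sketched in Python. Everything here is a stand-in: the cmap entries and glyph IDs are hypothetical, and `fake_shape` is a placeholder for real OpenType shaping (a real tool would run HarfBuzz, for instance through its Python bindings, on the text itself rather than rewriting glyph IDs).

```python
# Hypothetical cmap: Unicode code point -> nominal glyph ID.
CMAP = {0x0C15: 10, 0x0C37: 11, 0x0C4D: 50, 0x0C3F: 60}  # KA, SSA, VIRAMA, VOWEL SIGN I

# Stand-in for real OpenType shaping (HarfBuzz plus the font's GSUB).
# These rules are invented; a real tool would call an actual shaper instead.
FAKE_SHAPING_RULES = {(10, 50, 11): (10, 111), (10, 60): (160,)}
LONGEST_RULE = max(len(rule) for rule in FAKE_SHAPING_RULES)


def initial_glyphs(text):
    """Map each character to its nominal glyph ID (0 = .notdef)."""
    return [CMAP.get(ord(ch), 0) for ch in text]


def fake_shape(glyph_ids):
    """Greedy longest-match rewriting; only a placeholder for a real shaper."""
    out, i = [], 0
    while i < len(glyph_ids):
        for n in range(min(LONGEST_RULE, len(glyph_ids) - i), 0, -1):
            key = tuple(glyph_ids[i:i + n])
            if key in FAKE_SHAPING_RULES:
                out.extend(FAKE_SHAPING_RULES[key])
                i += n
                break
        else:
            out.append(glyph_ids[i])  # no rule applies: the glyph passes through
            i += 1
    return out


def collect_pairs(corpus):
    """Return (initial glyph stream, final glyph stream) for every corpus item."""
    pairs = []
    for text in corpus:
        before = initial_glyphs(text)
        pairs.append((before, fake_shape(before)))
    return pairs


corpus = ["\u0C15\u0C4D\u0C37", "\u0C15\u0C3F"]  # ka+virama+ssa, ka+vowel sign i
for before, after in collect_pairs(corpus):
    print(before, "->", after)
```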

The initial and final glyph ID streams are the input and output of the desired transformation that would need to be implemented with an FSM. Some of the methods described in the cited paper could be applied to automatically build such an FSM and then store it in the form of a "morx" table.
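The simplest FSM that can be built from such pairs is a prefix tree: one path per distinct input sequence, with its output attached at the end. The Python sketch below builds one and applies it with a greedy longest match. It is only the uncompressed starting point; a real inference method would go further, for example by merging equivalent states, and serializing the result as an actual “morx” subtable is not shown. The glyph-ID pairs are hypothetical.

```python
class Node:
    """One FSM state; a path from the root spells an input glyph sequence."""

    def __init__(self):
        self.next = {}      # glyph ID -> Node (transitions)
        self.output = None  # output glyphs if a known input sequence ends here


def build_prefix_tree(pairs):
    root = Node()
    for source, target in pairs:
        node = root
        for gid in source:
            node = node.next.setdefault(gid, Node())
        node.output = tuple(target)
    return root


def transduce(root, glyph_ids):
    """Longest-match walk through the tree from each position (greedy segmentation)."""
    out, i = [], 0
    while i < len(glyph_ids):
        node, best = root, None
        for j in range(i, len(glyph_ids)):
            node = node.next.get(glyph_ids[j])
            if node is None:
                break
            if node.output is not None:
                best = (j + 1, node.output)
        if best is None:
            out.append(glyph_ids[i])  # unknown input: pass through unchanged
            i += 1
        else:
            i, emitted = best
            out.extend(emitted)
    return out


# Hypothetical (initial, final) glyph-ID pairs taken from shaping a corpus.
pairs = [((10, 50, 11), (10, 111)), ((10, 60), (160,)), ((10,), (10,))]
tree = build_prefix_tree(pairs)
print(transduce(tree, [10, 60, 12, 10, 50, 11]))  # [160, 12, 10, 111]
```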

This is actually not a very complex task for a person who deals with these topics. It has nothing to do with fonts or texts — it’s just pattern recognition and compression. Similar techniques have been used for a long time in simple machine translation solutions and other such fields where you need to build a machine that would map one set of data onto another.

John Hudson · July 2018

> The initial and final glyph ID streams are the input and output of the desired transformation that would need to be implemented with an FSM.

And positioning?

Dave Crossland · July 2018

Positioning seems amenable to machine learning.

John Hudson · July 2018

I sure hope so. I've started thinking about a possible future TYPO Labs presentation on the hardest things I've had to deal with in the past quarter century, and realise that they're basically all positioning.

Adam Twardoch · July 2018

Positioning is of course more complex. Substitutions are, in the end, simple decisions: you have a small set of items (the glyphs in the font), say 200 or 700, and you make a chain of swaps. Even if this is a decision tree that depends on linear chunks of glyph IDs of varying length, it's still just binary swaps.
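A minimal sketch of “a chain of swaps”: each lookup checks, at every position, whether a run of glyph IDs matches (yes or no) and swaps it if so, and later lookups see the output of earlier ones. Real GSUB lookups have many more types and flags; the glyph IDs and the two rules here are hypothetical.

```python
# Hypothetical lookups, applied in order; each is a pattern -> replacement swap.
LOOKUPS = [
    ((10, 50, 11), (10, 111)),  # e.g. consonant + virama + consonant -> below-base form
    ((10, 111), (10, 112)),     # a later lookup reacting to the earlier swap
]


def apply_chain(glyph_ids, lookups):
    glyphs = list(glyph_ids)
    for pattern, replacement in lookups:
        n, i, out = len(pattern), 0, []
        while i < len(glyphs):
            if tuple(glyphs[i:i + n]) == pattern:  # the yes/no decision
                out.extend(replacement)
                i += n
            else:
                out.append(glyphs[i])
                i += 1
        glyphs = out
    return glyphs


print(apply_chain([10, 50, 11, 60], LOOKUPS))  # [10, 112, 60]
```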

For positioning, it's much trickier, because suddenly each glyph becomes a set of coordinates: in simplified terms, 1000 × 1000 = 1,000,000 possible coordinate points instead of a single glyph ID. So conceptually, positioning decisions are 6 orders of magnitude more complex than substitution decisions.
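The arithmetic behind that estimate, as a rough sketch: assume a font with $|G|$ glyphs (the 200 or 700 mentioned above) and a 1000-unit grid on which each glyph may be offset to any integer $(x, y)$. A substitution decision picks one of $|G|$ glyph IDs; a positioning decision additionally picks one of $1000 \times 1000$ offsets:

$$
\underbrace{|G|}_{\text{substitution}}
\quad\longrightarrow\quad
\underbrace{|G| \times 1000 \times 1000}_{\text{positioning}} = |G| \times 10^{6},
\qquad
\log_{10}\frac{|G| \times 10^{6}}{|G|} = 6
$$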

As Dave says, machine learning might be employed, because positioning involves deciding which spatial interactions between glyphs are “good-looking”, so these are much more qualitative decisions.

With substitutions, the decisions are purely quantitative: you can test whether the resulting stream of glyphs is "right" or "wrong", "true" or "false". No fancy computer vision is required, so you can use computing techniques that are 40 years old (conceptually, 100 years old) to build tools for that.
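Because substitution output either matches or doesn't, checking a converted font against the OpenType result can be an exact comparison over the corpus. A minimal Python sketch; `reference` and `candidate` are hypothetical callables standing in for the OpenType shaping output and the converted AAT font's output, and the glyph IDs are invented.

```python
def agreement(reference, candidate, corpus):
    """Exact comparison: each item is either right or wrong, no judgment needed."""
    if not corpus:
        raise ValueError("empty corpus")
    mismatches = [item for item in corpus if reference(item) != candidate(item)]
    return 1 - len(mismatches) / len(corpus), mismatches


# Hypothetical results: the OpenType shaping output vs. a converted font's output.
REFERENCE = {(10, 50, 11): (10, 111), (10, 60): (160,), (12,): (12,)}
CANDIDATE = {(10, 50, 11): (10, 111), (10, 60): (10, 60), (12,): (12,)}  # one deliberate error

rate, wrong = agreement(REFERENCE.get, CANDIDATE.get, list(REFERENCE))
print(f"agreement {rate:.0%}, mismatches: {wrong}")
```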

So a morx writer is effectively trivial; it hasn't been done yet because nobody in the type field has got in touch with people who build FSM-based systems. It's a matter of formulating the task well, gathering sufficient corpus data, and then finding a person who writes the code. It really has nothing to do with fonts, language or aesthetics. It's the same thing as building a mechanical calculator, a system of train switches or a mechanical clock.

Positioning is different because the results need to be judged subjectively. Substitutions can be judged objectively.
