Open Fontdata Storage

Georg Seifert
edited April 2019 in Font Technology
Continuing a discussion from twitter

It is desirable to have a font format that can store all data needed for different tools and workflows. Here are some goals and challenges:
  1. Human readable
  2. Extendable
  3. File structure
    • Cloud services have problems with a lot of files, so a single file might be better
    • Putting several files in a .zip defeats point 1
  4. Multi master support
    • Global and master specific font info (features vs vertical metrics)

Comments

  • Here is my suggestion for the basic structure:

    1. Font
      • family name
      • OpenType features
      • masters
      • glyphs
      • instances
    2. Masters
      • metrics
      • designspace coordinates
      • guides
    3. Glyph
      • name
      • unicode
      • export state
      • color label
      • (kerning groups)
      • layers
    4. Layers
      • outlines, components ...
      • width
      • color label
      • guides
      • association with masters (either directly or by designspace coordinates)
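The structure suggested above can be sketched as plain data classes. This is only an illustration: the field names are paraphrased from the list, the types are guesses, and serializing to JSON is an assumption made here to show one human-readable, single-file form (goals 1 and 3), not something the thread settled on.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Layer:
    width: float = 0.0
    master_id: Optional[str] = None  # direct association with a master
    outlines: list = field(default_factory=list)
    components: list = field(default_factory=list)
    guides: list = field(default_factory=list)
    color_label: Optional[str] = None


@dataclass
class Glyph:
    name: str
    unicode: Optional[int] = None
    export: bool = True
    color_label: Optional[str] = None
    kerning_groups: dict = field(default_factory=dict)
    layers: list = field(default_factory=list)


@dataclass
class Master:
    id: str
    metrics: dict = field(default_factory=dict)
    location: dict = field(default_factory=dict)  # designspace coordinates
    guides: list = field(default_factory=list)


@dataclass
class Font:
    family_name: str
    features: str = ""
    masters: list = field(default_factory=list)
    glyphs: list = field(default_factory=list)
    instances: list = field(default_factory=list)


# Build a tiny font and dump it as one readable text document.
font = Font(family_name="Example")
font.masters.append(
    Master(id="regular", metrics={"ascender": 800}, location={"wght": 400})
)
font.glyphs.append(
    Glyph(name="A", unicode=0x41, layers=[Layer(width=600, master_id="regular")])
)
text = json.dumps(asdict(font), indent=2)
print(text)
```

Global font info (features) lives on `Font`, while master-specific info (metrics) lives on `Master`, which mirrors goal 4.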
  • Bahman Eslami
    edited April 2019
    One aspect of developing fonts that is not being stored in an ideal way is OpenType features. Shouldn’t features be attached to glyphs instead of just one giant feature file attached to the font? Here is my proposal that I think should be a property of a Font > Glyph object:
    - Font
        - Glyph
            - Features
                - Direction (RTL, LTR, …)
                - Substitutions
                        - Substitute (input_glyph_list, output_glyph_list, before_context,
                        after_context, feature_list)
                        - Substitute (input_glyph_list, output_glyph_list, before_context,
                        after_context, feature_list)
                        ...
                - Positionings
                        - Mark (before_context, after_context, feature_list)
                        - Kern (before_context, after_context, feature_list)
                        ...
                - Type (Ligature, Component, Mark, …)
                - Script (Arabic, Cyrillic, Latin, …)
                - Groups (kern_group1, mark_group1, …)
                - Final name (unixxxx)
                - Carets

    Some properties, such as Carets, could be empty or absent. GSUB and GPOS rules are kept separate because they are unlikely to be shared within one feature. I could have proposed a feature property (e.g. liga, calt) that encloses the GSUB or GPOS rules, but the problem is that those rules can be shared between different features. Another advantage is that we no longer need glyph-naming schemes (e.g. glyph1_glyph2.case) to indicate features, which is very limiting considering contextual rules and lookups shared between features. It is also easier to change the character set without having to change a huge feature file. Finally, during font generation the compiler gathers the data glyph by glyph (results could be cached for any glyph that is unchanged) and converts it to binary data.


  • TimAhrens
    edited April 2019
    Shouldn’t features be attached to glyphs
    This seems to be the right way of thinking. It will involve some sort of translation but that should be manageable.
    More generally speaking, we should use the OOP logic (object-oriented programming), trying to store properties of items directly in these items and not in parallel data structures. Similarly, the co-ordinates of the masters and instances should be considered properties of these objects, then we do not need a parallel “design space” data structure.
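As a small illustration of that idea, the axis ranges of the design space can be derived from the coordinates stored on the masters themselves, so no separate structure has to be kept in sync. The dictionary layout and the names `masters`, `instances`, and `axis_ranges` are assumptions for this sketch.

```python
# Hypothetical masters and instances, each carrying its own coordinates.
masters = [
    {"name": "Light", "location": {"wght": 300, "wdth": 100}},
    {"name": "Bold", "location": {"wght": 700, "wdth": 100}},
    {"name": "Bold Condensed", "location": {"wght": 700, "wdth": 75}},
]
instances = [
    {"name": "Regular", "location": {"wght": 400, "wdth": 100}},
    {"name": "Black", "location": {"wght": 900, "wdth": 100}},
]


def axis_ranges(masters):
    """Derive (minimum, maximum) per axis from the masters' own coordinates."""
    ranges = {}
    for master in masters:
        for axis, value in master["location"].items():
            low, high = ranges.get(axis, (value, value))
            ranges[axis] = (min(low, value), max(high, value))
    return ranges


def is_inside(instance, ranges):
    """True if every coordinate of the instance lies within the derived ranges."""
    return all(
        axis in ranges and ranges[axis][0] <= value <= ranges[axis][1]
        for axis, value in instance["location"].items()
    )


ranges = axis_ranges(masters)
for instance in instances:
    print(instance["name"], is_inside(instance, ranges))
```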
  • One thing I said in the Twitter thread was that UFO seems to be based on a design philosophy of easily human readable and easily parsable, even if some of the necessary detail is not there yet.

    I think this is a great design philosophy for a storage format, and so what is needed is not a total rethink of a new format but added detail to the UFO spec. Let’s keep UFO but add what we need to it.

    Per-glyph feature storage seems like a sensible idea (especially in keeping with the version-control idea of letting different people work on different parts of the font simultaneously), but we would also need to ensure that this works alongside features which deal with classes or multiple glyphs. (Should the f_i ligature be stored with /f or /i? I think the answer is “neither”.)
  • Bahman Eslami
    edited April 2019
    (Should the f_i ligature be stored with /f or /i? I think the answer is “neither”.)
    It should be the ligature glyph f_i that holds the substitution. Right now it's part of the glyph name. For kerning it could be the first glyph in the pair that holds it.

  • John Hudson
    edited April 2019
    With regard to storing layout data, yes, let's do something: I find the whole 'shove a .fea file into the storage format' to be not much better than 'shove a hex representation of a binary table into the storage format'. Yes, the .fea code is readable, but it is a syntax that was designed to enable Adobe to mass-convert their PS Type 1 library to OTL in the late 90s, and I've never been convinced that it is actually a good way to represent layout features in developing new fonts. It's telling that people who do a lot of complex script development often still prefer to use VOLT or have built their own tools. Frankly, 22 years into OpenType, I'm still waiting for a really good OTL editor, and while I don't much care what code and compiler sits in the background in a tool, my instinct is that the storage and editing format should be at a higher level (such that, for example, a single layout source might generate not only OTL but also AAT and Graphite).

    If we're going to consider storing layout data at the glyph level, I suspect that will have to involve doing so in terms of specifying participation in lookups and groups. Frequently in complex script work, managing lookup order is important — which is why I still favour VOLT —, and a single glyph participates in the layout in multiple ways: as output from some GSUB, as input to other GSUB, as context in other GSUB or GPOS, as a base or as a mark in multiple and contextual GPOS, in sequential spacing adjustments. These roles are defined primarily in terms of how the glyph participates in lookups and groups, and only secondarily in how those lookups are mapped to features.
  • Bahman Eslami
    edited April 2019
    a single layout source might generate not only OTL but also AAT and Graphite
    I second that. The Glyphs app showed that many features can be automated, which is great. But we also need a higher-level interface to interact with features and change them when we want. If the features are separated by glyph, accessing them gets much easier too. Not all of the features that a glyph is involved in should be part of the glyph, but many of them can be, and that already gives the designer better control. The order of substitution can also be saved at the glyph level. So a substitution could be stored like this as part of the glyph:
    Priority: integer
    Input: array of string glyph names/one string of class name
    Output: array of string glyph names/one string of class name
    Before context: array of string glyph names/one string of class name
    After context: array of string glyph names/one string of class name
    Languages: array of string languages (human readable)
    Features: array of string features (human readable)
    Flags: array of strings (human readable)
    One might object that the Input, Output, Before context, and After context values may contain names of glyphs that do not exist in the font, but this can be handled by the implementation or the compiler. If the Features array holds more than one value, the implementation could decide at compile time to create a single lookup that is shared between those features.
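The record above could be sketched as follows. The class and function names are hypothetical, class names as values are not handled, and Languages and Flags are stored but ignored. The sketch only shows two points from the post: rules are ordered by Priority, and a rule listed under several features becomes one lookup shared by all of them. The rule text loosely follows .fea syntax, marking the input glyphs with an apostrophe when a context is present.

```python
from dataclasses import dataclass, field


@dataclass
class Substitution:
    priority: int
    input: list
    output: list
    before: list = field(default_factory=list)
    after: list = field(default_factory=list)
    languages: list = field(default_factory=list)  # not used in this sketch
    features: list = field(default_factory=list)
    flags: list = field(default_factory=list)  # not used in this sketch


def rule_text(rule):
    """Render one rule; input glyphs are marked only when a context exists."""
    contextual = bool(rule.before or rule.after)
    marked = [g + "'" for g in rule.input] if contextual else list(rule.input)
    parts = rule.before + marked + rule.after
    return "sub {} by {};".format(" ".join(parts), " ".join(rule.output))


def compile_lookups(rules):
    """Return ordered lookups and a feature -> lookup-name mapping."""
    ordered = sorted(rules, key=lambda r: r.priority)
    lookups = []
    features = {}
    for index, rule in enumerate(ordered):
        name = "lookup_{}".format(index)
        lookups.append((name, rule_text(rule)))
        for tag in rule.features:
            # The same lookup name is referenced from every listed feature.
            features.setdefault(tag, []).append(name)
    return lookups, features


rules = [
    Substitution(priority=20, input=["f", "i"], output=["f_i"], features=["liga"]),
    Substitution(priority=10, input=["a"], output=["a.alt"], before=["b"],
                 features=["calt", "ss01"]),
]
lookups, features = compile_lookups(rules)
print(lookups)
print(features)
```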



  • John Hudson
    I'm concerned that you talk in terms of glyphs being involved in features. I know that .fea syntax tends to encourage this view because of the efficiencies it uses at the code level, but it's not how OpenType Layout tables are actually structured. Glyphs are involved in lookups and groups, and it is the lookups and groups that are involved in the features (which in turn are involved in script and language system hierarchies). I'm not sure it makes sense to try to capture all this — and the multiple roles that a single glyph can play within this structure — at the glyph level. But I'm willing to be convinced.
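The indirection described here can be made concrete with a toy model: glyphs appear in lookups, lookups are referenced by features, and features are listed under script and language systems. All names and data below are made up for illustration, and the model says nothing about the binary table layout.

```python
# Hypothetical data: glyphs belong to lookups, lookups to features,
# and features to script/language systems.
lookups = {
    "lig_fi": {"table": "GSUB", "glyphs": {"f", "i", "f_i"}},
    "kern_fo": {"table": "GPOS", "glyphs": {"f", "o"}},
}
features = {"liga": ["lig_fi"], "dlig": ["lig_fi"], "kern": ["kern_fo"]}
language_systems = {
    ("latn", "dflt"): ["liga", "kern"],
    ("latn", "TRK "): ["kern"],
}


def roles_of(glyph):
    """Map each lookup the glyph takes part in to the features that use it."""
    roles = {}
    for lookup_name, lookup in lookups.items():
        if glyph in lookup["glyphs"]:
            roles[lookup_name] = sorted(
                tag for tag, names in features.items() if lookup_name in names
            )
    return roles


def lookups_for(script, language):
    """List the lookups reachable from one script/language system."""
    names = []
    for tag in language_systems[(script, language)]:
        for lookup_name in features[tag]:
            if lookup_name not in names:
                names.append(lookup_name)
    return names


print(roles_of("f"))
print(lookups_for("latn", "TRK "))
```

A glyph reaches a feature only through a lookup, which is why a per-glyph view has to record lookup membership before it can say anything about features.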
  • Bahman Eslami
    edited April 2019
    I have made projects with VOLT and am familiar with the structure you described. What I'm proposing doesn't have to be the only way to store features; glyph-level storage could be offered as one option. I don't know why VOLT or the OpenType Layout tables are structured the way they are. Some reasons could include efficiency or having a compact format. But at the design level these should not matter to the designer. Implementations could convert what the designer has made into whatever is best for the machine. What I see, even in VOLT, is a human-readable structure of the binary format. This structure is cumbersome and hard to manage for complex scripts.
    In reality, most features revolve around glyphs. In VOLT you can have a list of input and output glyphs in a lookup, and you can define the context shared by the lookup. Then you associate this lookup with a language and a writing system. So it's more lookup-centric. A compiler could decide how a glyph-centric substitution is converted to lookups and associated with features. I'd like to see complex examples that don't work with this philosophy. I've been generating complex features for Arabic script, and the reason I'm proposing this is to make things less complicated. If I'm proven wrong, I would still be happy to have the VOLT structure to work with, but with a compiler that is platform-independent.
  • John Hudson
    It's pretty normal in my Indic and SE Asian script projects for lookup order to involve interleaving of lookups from different features, which is the order in which I want the lookups to be applied by the layout engine (talking primarily in terms of post-reordering typographic features). There's usually more than one way to go about making layout for a complex script, and I like being able to change my methods and improve them over time, rather than being tied to a single approach that someone else has decided is the way it will be implemented in a tool. So while I'm okay with a source format that can be used as a basis for automating layout, I also want the source format to record how I have actually chosen to do my layout, and to preserve that in the source.
  • Question: is the plan to define a data storage format that covers the current state of business, or one that looks into the future?
  • That became a very different discussion than I had hoped for (a very interesting one, for sure). Can we move it into its own thread?
  • I don't see any obvious advantages in grouping features with their associated glyphs. There are some that operate on a single glyph – aalt, smcp and so on – but the result is another glyph. Where would this glyph be stored? In an object "inside" the original glyph definition? But do note that definitions for onum, pnum, lnum, and tnum may refer to each other's glyphs in a sort of recursive way: you can transform lining digits to tabular and the other way around. That calls for a glyph independent definition.

    It gets more complicated with multi-glyph features. Storing all of /fi, /ffi, and /fl code under /f sounds nice but if you add some special handling for an /i glyph (say, to handle a Turkish dotless /i) you need to hunt down all previous definitions using that /i elsewhere.

    "Human readable" does not mean that one must be able to write (or debug) this source document. There could be software for that! From a purely organizational point of view, for me the current (as in: binary storage) system makes most sense:

    1. Glyph outlines are defined and associated with a fairly random name and/or number;
    2. Those glyphs are indexed in one or more character maps, which maps each glyph to a codepoint that can be used on a computer;
    3. OpenType features refer to the glyphs by their names.
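The three-part organization in this list can be sketched as three independent structures that are tied together only by glyph names. The data and the `missing_glyphs` helper are hypothetical. The digit rules show the mutual references mentioned earlier in this post (lining to old-style and back), which sit naturally in a glyph-independent table.

```python
# 1. Named glyphs (outlines omitted in this sketch).
glyph_names = {"one", "one.onum", "one.tnum"}

# 2. Character map: code point -> glyph name.
cmap = {0x0031: "one"}

# 3. Features refer to glyphs by name; digit features point at each other's glyphs.
features = {
    "onum": [("one", "one.onum")],
    "lnum": [("one.onum", "one")],
    "tnum": [("one", "one.tnum")],
}


def missing_glyphs(glyph_names, cmap, features):
    """Return names referenced by the cmap or features that have no glyph."""
    referenced = set(cmap.values())
    for pairs in features.values():
        for source, target in pairs:
            referenced.update((source, target))
    return sorted(referenced - glyph_names)


print(missing_glyphs(glyph_names, cmap, features))
print(missing_glyphs(glyph_names - {"one.tnum"}, cmap, features))
```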

    On a more personal note: for me a ttx dump is perfectly readable. I can find whatever I want, and edit and add to it, then re-compile into a proper font again. Admittedly, a font editor is easier and less prone to random typo errors.
  • Human readable also means that someone can write software that deals with the file just by looking at the file. That is usually not possible with binary data (which can be reverse engineered too, but that is MUCH more complicated).
    And the XML in ttx files is considered human readable.

    And about storing OpenType substitution on a glyph level: Isn’t that done like this in FontForge? At least it was the last time I looked at it a few years ago.
  • AbrahamLee
    edited April 2019
    That’s correct, @Georg Seifert. In the source file, the OpenType subtable references exist at the font level, but the details are generally stored at the glyph level. Inside FontForge (the GUI), you can see a combined dataset of the entire table, but you can still see the lookups an individual glyph is taking part in when you view its info data. Very helpful if you ask me.
  • Since the end of the 1990s, when the development of DTL FontMaster (FM) started, we at the Dutch Type Library have used the IKARUS-based file system. After all, FM is a descendant of the IKARUS system. This file system dates back to the second half of the 1970s and has been enhanced and extended over time, to support OpenType Layout features, for example. However, the structure has remained fairly simple:


    More than a decade ago I advocated the file system at the ATypI TypeTech forums in Lisbon and St. Petersburg. This PDF explains a bit how things work. The format is very compact and versatile, but the fact that one can control (and consequently mess up) everything also makes the learning curve a bit steep. Like any format, it has its flaws: for example, we cannot store composites. These are built during the generation of TrueType fonts, using an entry in the .cha file that contains a reference to the two ‘base’ glyphs. During the design process we circumvent this limitation with an advanced find and research tool. That being said, the storage system does not have to be the same as the format that is used for designing, of course.

    As mentioned, we have used the file system for almost 25 years now, and I expect it to be used at DTL for the next 25 years (FoundryMaster, the successor of FM, supports the 4-byte glyph databases). The format is public, BTW.
  • Bahman Eslami
    edited April 2019
    Georg Seifert said:
    That became a very different discussion than I had hoped for (a very interesting one, for sure). Can we move it into its own thread?
    Sorry for hijacking, but this is still related to the topic. Others can still propose their ideas for different parts of the font and use quotes to make it easy to track.
    It's pretty normal in my Indic and SE Asian script projects for lookup order to involve interleaving of lookups from different features, which is the order in which I want the lookups to be applied by the layout engine (talking primarily in terms of post-reordering typographic features).
    Any substitution or positioning at the glyph level can be associated with a lookup (with an arbitrary name) at the font level, which in turn can be moved up or down. This gives you enough control to determine which one gets executed first. The implementation or compiler should decide whether a user may associate multiple substitutions or positionings with one shared lookup. This is getting closer to the VOLT structure, but with some pros and cons:
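A minimal sketch of that arrangement: each glyph-level rule names the lookup it belongs to, and the font keeps one ordered list of lookup names that can be rearranged. Lookups from different features can then be interleaved freely. The glyph names, lookup names, and helper functions are all hypothetical, and a rule that names a lookup missing from the font-level list raises a `KeyError` in this sketch.

```python
# Hypothetical glyph-level rules; each one names its font-level lookup.
glyphs = {
    "f_i": [{"lookup": "ligatures", "input": ["f", "i"], "feature": "liga"}],
    "a.alt": [{"lookup": "alternates", "input": ["a"], "feature": "calt"}],
    "f_f": [{"lookup": "ligatures", "input": ["f", "f"], "feature": "liga"}],
}

# Font-level order in which the lookups should be applied.
lookup_order = ["ligatures", "alternates"]


def rules_in_order(glyphs, lookup_order):
    """Group glyph-level rules by lookup, following the font-level order."""
    grouped = {name: [] for name in lookup_order}
    for glyph_name in sorted(glyphs):
        for rule in glyphs[glyph_name]:
            grouped[rule["lookup"]].append((glyph_name, rule["input"]))
    return [(name, grouped[name]) for name in lookup_order]


def move_lookup(lookup_order, name, offset):
    """Return a new order with one lookup moved up (negative) or down (positive)."""
    order = list(lookup_order)
    index = order.index(name)
    order.pop(index)
    new_index = max(0, min(len(order), index + offset))
    order.insert(new_index, name)
    return order


print(rules_in_order(glyphs, lookup_order))
print(move_lookup(lookup_order, "alternates", -1))
```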

    + Easier subsetting and transferring glyphs to different fonts
    + Data that needs to be shown in the glyph interface gets more compact and easier to manipulate
    - Some data could be duplicated in glyphs (Storage space is not much of a concern here)

    I don't see any obvious advantages in grouping features with their associated glyphs.
    If you have only worked with Latin or Cyrillic fonts, it's great that you like what you already have, but that does not mean designers who work with complex scripts can't benefit from a design-friendly structure. All developers and designers who haven't worked with complex scripts should think about the fact that there is no user interface other than text to manipulate OT feature data (VOLT is limited to Windows, and most design software is made for Mac). This already discourages designers from engaging and seeing the possibilities. Do you want to design the contours using text? How many contextual or positioning rules do you need in your fonts that are not already automated by tools? OT features need a graphical user interface, and that needs a better storage method to support it.
    There are some that operate on a single glyph – aalt, smcp and so on – but the result is another glyph. Where would this glyph be stored?
    You're looking at it from the opposite perspective. A glyph is an object that holds GSUB rules, and it is the destination glyph that holds the rule. If there is a glyph A and a glyph A.smcp, the substitution is stored in glyph A.smcp (sub A by A.smcp; fea: smcp). If A.smcp is removed, the feature is gone too. The other examples are also primitive and could work with a similar structure without any recursion (e.g. ffi or one.onum hold their own GSUBs). Limitations of compilers and binary formats should not determine how we design fonts. Their structure is not meant for designers, and user interfaces are not supposed to follow them.
  • For those interested, some examples of (plain) text files I made for the font production at DTL (I changed the suffixes ‘cha’, ‘fea’, and ‘ufm’ respectively, into ‘txt’ here), which are part of our IKARUS-based file system:

    Encoding (library-wide)
    Encoding (font-specific)
    GSUB (library-wide)
    GSUB (font-specific)
    GPOS (library-wide)
    Metrics/meta (font-specific)

    My colleagues and friends at URW reworked the AFDKO a bit, so during the generation of fonts the GSUB and GPOS are subsetted (OTM does this subsetting too). Hence, the ‘library-wide’ features files. Of course, in case of a font-specific layout, a font-specific features file can be used, as was the case for DTL Flamande, for example. Also the reworked AFDKO will accept any naming entry from the UFM file.
  • Additionally, background information on the IKARUS format, by Dr. Peter Karow himself, can be found in this PDF version of an offset-printed booklet that was published by Adobe and DTL in 2013.
  • Partly related, slightly overlapping, undoubtedly a bit (or more) off-topic, and additionally somewhat focusing on the production of what are now ‘classic’ cars (in relation to classic typefaces), is this post on Facebook.