Many historical fonts were crafted by amateurs using questionable techniques, such as:
- misusing code points (e.g. $ for long s)
- precomposed ligature glyphs with code points in the PUA (not following the MUFI standard)
- names not always following best practices
- no feature rules
- overdone, problematic feature rules (e.g. orthography for long s with all irregularities, but it is not possible to render a single long s between spaces)
- not supporting base character + combining mark as the preferred Unicode encoding where Unicode has no precomposed code point (e.g. [AOUaou] + combining e above)
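To illustrate the last point, here is a minimal sketch using only Python's standard `unicodedata` module. It checks whether NFC normalisation composes a base letter and a combining mark into a single code point. The helper name `has_precomposed` is my own and purely illustrative.

```python
import unicodedata

COMBINING_DIAERESIS = "\u0308"
COMBINING_E_ABOVE = "\u0364"  # COMBINING LATIN SMALL LETTER E


def has_precomposed(base: str, mark: str) -> bool:
    """Return True if NFC composes base + mark into one code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1


for base in "AOUaou":
    # The diaeresis composes (e.g. a + U+0308 becomes U+00E4),
    # the small e above stays a two-code-point sequence.
    print(
        base,
        has_precomposed(base, COMBINING_DIAERESIS),
        has_precomposed(base, COMBINING_E_ABOVE),
    )
```

Since there is no precomposed form for these sequences, a font has to handle the base + combining sequence itself, for example via mark positioning or a substitution rule.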
To repair them, I'd like to:
- normalise code points (for PUA use MUFI)
- normalise names of characters and variants
- support ZWJ and ZWNJ for explicit control of ligatures, parallel to hlig
- similar for fractions
- remove orthographic rules, or rather replace all ligature and substitution rules with a basic standardised rule set
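As a rough sketch of the code point normalisation, the snippet below rewrites `<map>` entries in a ttx-style `cmap` table with Python's standard `xml.etree.ElementTree`. The `REMAP` table, the sample XML and the function name `normalise_cmap` are assumptions made up for illustration. A real ttx file has more cmap subtables, and the glyph name would also have to be renamed in every other table that refers to it.

```python
import xml.etree.ElementTree as ET

# Hypothetical per-font repair table:
# misused code point -> (correct code point, normalised glyph name)
REMAP = {0x24: (0x17F, "longs")}  # this font misuses "$" for long s

# Tiny stand-in for a ttx dump (assumption: real files contain much more).
SAMPLE_TTX = """<ttFont>
  <cmap>
    <tableVersion version="0"/>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x24" name="dollar"/>
      <map code="0x73" name="s"/>
    </cmap_format_4>
  </cmap>
</ttFont>"""


def normalise_cmap(root, remap):
    """Rewrite remapped <map> entries in place.

    Returns {old_glyph_name: new_glyph_name} so that other tables
    can be renamed in a later pass (not shown here).
    """
    renamed = {}
    for entry in root.iter("map"):
        code = int(entry.get("code"), 16)
        if code in remap:
            new_code, new_name = remap[code]
            renamed[entry.get("name")] = new_name
            entry.set("code", hex(new_code))
            entry.set("name", new_name)
    return renamed


root = ET.fromstring(SAMPLE_TTX)
renamed = normalise_cmap(root, REMAP)
print(renamed)
```

The sketch does not check whether the target code point is already mapped to another glyph. That case would need a decision per font.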
The main purpose is training OCR systems. Perfect quality doesn't matter, as the training files are usually created by rendering sample images from hundreds of fonts for a language/period, and the images are also artificially degraded in several variations. Font identification can be trained to some degree as well. Reconstructed fonts are always digitised from a specific optical size, e.g. 12 or 16 pt, and these sizes differ in Fraktur: the larger the size, the more swashed the capital letters. Often the source is itself a reconstructed cut (late 19th century or a Linotype customisation).
My workflow is mostly developed in Perl, which has poor or no support for reading and manipulating OTF files.
But it's easy to work with XML (ttx). It's also easy to render specimens of characters and identify them optically, with relatively high accuracy, for unidentified glyphs in a font. The few that remain can be handled manually.
Planned steps are roughly:
- craft a standard set of rules in fea syntax for historical German
For each font:
- convert to ttx
- normalise code points and glyph names
- render and identify remaining glyphs
- report what's missing for manual solving or accept it
- remove feature rules and apply the standard set
- compile the repaired font
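Two of the per-font steps above can be sketched directly on the ttx XML with Python's standard library: dropping the existing feature rules and reporting missing code points. The list of tables to drop, the `REQUIRED` set and both function names are my assumptions for illustration. Compiling back to a font and applying the standard fea rules would need separate tooling and is not shown.

```python
import xml.etree.ElementTree as ET

# Assumption: the existing rules live in these layout tables.
LAYOUT_TABLES = ("GSUB", "GPOS")

# Hypothetical minimum repertoire for historical German.
REQUIRED = {0xDF, 0x17F, 0x364}  # sharp s, long s, combining e above

# Tiny stand-in for a ttx dump of one font.
SAMPLE_TTX = """<ttFont>
  <cmap>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x73" name="s"/>
      <map code="0x17f" name="longs"/>
    </cmap_format_4>
  </cmap>
  <GSUB><Version value="0x00010000"/></GSUB>
</ttFont>"""


def strip_layout_tables(root):
    """Remove layout tables from the <ttFont> root; return removed tags."""
    removed = []
    for child in list(root):  # copy, since we mutate while iterating
        if child.tag in LAYOUT_TABLES:
            root.remove(child)
            removed.append(child.tag)
    return removed


def missing_code_points(root, required):
    """Return the required code points that no <map> entry covers."""
    mapped = {int(m.get("code"), 16) for m in root.iter("map")}
    return sorted(cp for cp in required if cp not in mapped)


root = ET.fromstring(SAMPLE_TTX)
removed = strip_layout_tables(root)
missing = missing_code_points(root, REQUIRED)
print("removed tables:", removed)
print("missing:", [f"U+{cp:04X}" for cp in missing])
```

The missing list is what would go into the report for manual solving, or simply be accepted for a font that is only used to render OCR training images.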
This doesn't mean it is an easy development task done in one day. Some problems with the Python font libraries fonttools by @Simon Cozens can be expected, as they may not support some corner cases of this use case.