I would like to gather some additional ideas about which font metrics could be interesting before I rework my existing programs.
Some words about context and intention first. My focus is the automatic digital reconstruction of old books, mainly on natural history from the 17th and 18th centuries, in German, English and Latin. This includes OCR and image refinement and, for several reasons, also the automatic reconstruction of fonts. First, a digital reconstruction of the original can help scientists, who can then switch to a modern font with one click. Second, it helps to improve the OCR recognition rate, which is a chicken-and-egg problem: with nearly original fonts, the training data can be generated automatically. At the moment this is done by manual transcription, which is error-prone. The results are not bad: my models reach an error rate of 0.3%, whereas error rates in the range of 4-7% are usual for average texts from around 1800.
At the moment I use font metrics (not yet in production) together with image similarity for font identification and glyph clustering.
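Font identification from metrics can be sketched as a nearest-neighbour search: each glyph is described by a vector of normalized metrics, and candidate fonts are ranked by the mean distance over the glyphs they share with the sample. This is a minimal sketch under assumptions: the choice of metrics, the weights and the names `identify_font` and `distance` are hypothetical, and the image-similarity score mentioned above is not included (it could be added as a further weighted term).

```python
import math

# A metric vector per glyph, e.g. (aspect, density, top/size, bottom/size).
# Which metrics go in, and in which order, is an assumption of this sketch.
Metrics = tuple[float, ...]


def distance(a: Metrics, b: Metrics, weights: Metrics) -> float:
    """Weighted Euclidean distance between two metric vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def identify_font(sample: dict[str, Metrics],
                  fonts: dict[str, dict[str, Metrics]],
                  weights: Metrics) -> list[tuple[str, float]]:
    """Rank candidate fonts by mean glyph distance, best match first.

    Only glyphs present in both the sample and the candidate are compared;
    candidates sharing no glyph with the sample are skipped.
    """
    ranking = []
    for name, glyphs in fonts.items():
        shared = sample.keys() & glyphs.keys()
        if not shared:
            continue
        total = sum(distance(sample[g], glyphs[g], weights) for g in shared)
        ranking.append((name, total / len(shared)))
    return sorted(ranking, key=lambda item: item[1])
```

Averaging over shared glyphs keeps the score comparable between candidates that cover different parts of the character set; the same distance can also drive glyph clustering.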
There are 3 ways to get the data:
1) interpreting the digital font directly
2) rendering the glyphs and measuring them using image processing
3) taking the measurements from a real sample (scan or photo of a page)
For 2) and 3) I can use the same program.
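Because 2) and 3) share one program, the core step can be reduced to measuring a binary ink mask, regardless of whether it came from a rendering or a scan. A minimal sketch with NumPy, assuming the glyph has already been cut out, that the baseline row is known, and that a fixed threshold is good enough for binarization (real scans usually need an adaptive method such as Otsu or Sauvola). `binarize` and `glyph_box` are hypothetical helpers, not part of any library.

```python
import numpy as np


def binarize(gray: np.ndarray, threshold: int = 128) -> np.ndarray:
    """Boolean ink mask: True where a pixel is darker than `threshold`."""
    return gray < threshold


def glyph_box(ink: np.ndarray, baseline_row: int) -> dict[str, int]:
    """Ink bounding box relative to the baseline, in pixels.

    The baseline is taken to lie between rows baseline_row - 1 and
    baseline_row. Image rows grow downwards, so the vertical values are
    flipped: positive means above the baseline, negative means below it
    (descenders). left/right are columns relative to the image origin,
    with right exclusive so that width = right - left.
    """
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    if rows.size == 0:
        raise ValueError("image contains no ink")
    return {
        "top": int(baseline_row - rows[0]),
        "bottom": int(baseline_row - (rows[-1] + 1)),
        "left": int(cols[0]),
        "right": int(cols[-1] + 1),
    }
```

Only the input differs between 2) and 3): a rendered glyph gives an exact baseline, while for a scan the baseline has to be estimated first, for example from the line's horizontal projection.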
What I measure at the moment:
- top, left, bottom, right (in relation to the baseline)
- descender, x-line, ascender (h-line), H-line
- aspect (height/width)
- density (black/white pixels)
- font size
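The per-glyph and per-font values in the list above can be derived from the ink masks. A minimal sketch, assuming boolean glyph masks and per-glyph top/bottom values relative to the baseline. Density is taken here as the share of black pixels in the bounding box; if the ratio black/white is meant, it follows as `d / (1 - d)`. The glyph groups used for the x-line, ascender, H-line and descender are an assumption for roman/italic type (Fraktur would need its own groups), and all names are hypothetical.

```python
import statistics

import numpy as np

# Glyph groups used to estimate each line. The choice is an assumption
# for Latin-script roman/italic; Fraktur would need its own groups.
X_HEIGHT_GLYPHS = set("acemnorsuvwxz")
ASCENDER_GLYPHS = set("bdhkl")
CAPITAL_GLYPHS = set("EHILT")
DESCENDER_GLYPHS = set("gpqy")


def aspect_and_density(ink: np.ndarray) -> tuple[float, float]:
    """Aspect (height / width) and share of ink pixels in the bounding box."""
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    box = ink[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
    height, width = box.shape
    return height / width, float(box.mean())


def font_lines(tops: dict[str, int], bottoms: dict[str, int]) -> dict[str, float]:
    """Estimate font-level lines (relative to the baseline) as medians.

    Medians damp the overshoot of round glyphs and single bad measurements.
    A line is NaN when none of its glyphs was measured.
    """
    def median_of(values: dict[str, int], glyphs: set[str]) -> float:
        picked = [values[g] for g in glyphs if g in values]
        return float(statistics.median(picked)) if picked else float("nan")

    return {
        "x_line": median_of(tops, X_HEIGHT_GLYPHS),
        "ascender": median_of(tops, ASCENDER_GLYPHS),
        "H_line": median_of(tops, CAPITAL_GLYPHS),
        "descender": median_of(bottoms, DESCENDER_GLYPHS),
    }
```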
For reconstruction, additional metrics are needed:
- kerning, if there is an overlap (in metal type it's negative spacing)
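On a scan only the ink is visible, not the body of the sort, so what can be measured directly is the ink gap between neighbouring glyphs; a consistently negative gap for a pair points to a kerned sort or an overlapping combination. A minimal sketch, assuming glyphs are already segmented into horizontal extents in reading order and word spaces are excluded; `Box`, `ink_gaps` and `pair_gaps` are hypothetical names.

```python
import statistics
from collections import defaultdict
from typing import NamedTuple


class Box(NamedTuple):
    """Horizontal ink extent of one glyph in a text line (pixels)."""
    glyph: str
    left: int   # first ink column, inclusive
    right: int  # last ink column + 1, exclusive


def ink_gaps(line: list[Box]) -> list[tuple[str, int]]:
    """Gap between consecutive glyphs in reading order; negative = overlap."""
    return [(a.glyph + b.glyph, b.left - a.right) for a, b in zip(line, line[1:])]


def pair_gaps(lines: list[list[Box]]) -> dict[str, float]:
    """Median gap per glyph pair over many lines of the same font."""
    gaps: dict[str, list[int]] = defaultdict(list)
    for line in lines:
        for pair, gap in ink_gaps(line):
            gaps[pair].append(gap)
    return {pair: float(statistics.median(v)) for pair, v in gaps.items()}
```

Taking the median per pair rather than single occurrences makes the result less sensitive to ink spread and segmentation errors.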
What would also be possible to measure:
- distance of diacritics
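One simple way to measure the distance of a diacritic above the base glyph is a horizontal projection profile: rows containing ink form runs, and the blank rows between the topmost run and the next one are the gap. A minimal sketch, assuming a single-glyph boolean mask with the mark above the body; marks below the body (e.g. a cedilla) and marks touching the body are not handled, and the dot of i and j is also reported as a mark. The function names are hypothetical.

```python
from typing import Optional

import numpy as np


def ink_row_runs(ink: np.ndarray) -> list[tuple[int, int]]:
    """Runs of consecutive rows containing ink, as (first, last + 1) pairs."""
    has_ink = ink.any(axis=1).astype(np.int8)
    # Pad with zeros so that runs touching the image border are closed.
    edges = np.diff(np.concatenate(([0], has_ink, [0])))
    starts = np.flatnonzero(edges == 1)
    ends = np.flatnonzero(edges == -1)
    return list(zip(starts.tolist(), ends.tolist()))


def diacritic_gap(ink: np.ndarray) -> Optional[int]:
    """Blank rows between the topmost ink run (assumed mark) and the body.

    Returns None if the glyph has only one run of ink rows.
    """
    runs = ink_row_runs(ink)
    if len(runs) < 2:
        return None
    (_, mark_end), (body_start, _) = runs[0], runs[1]
    return body_start - mark_end
```

Dividing the gap by the x-height of the font would make it comparable across sizes.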
What else would be interesting? Sometimes information gains a new quality when it is available across fonts. For example, the vertical proportions of reading sizes do not differ much across fonts, and the same holds for the aspect.
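To check which proportions stay nearly constant across fonts, the measured lines can be divided by the font size and their spread compared. A minimal sketch, using the coefficient of variation as the spread measure (a choice made here, not taken from the text) and assuming one dictionary of line heights per font, e.g. x-line and ascender relative to the baseline; `proportions` and `spread_across_fonts` are hypothetical names.

```python
import statistics


def proportions(lines: dict[str, float], size: float) -> dict[str, float]:
    """Vertical lines as fractions of the font size (size-independent)."""
    return {name: value / size for name, value in lines.items()}


def spread_across_fonts(fonts: dict[str, dict[str, float]]) -> dict[str, float]:
    """Coefficient of variation (stdev / |mean|) of each proportion across fonts.

    Needs at least two fonts. Small values mark proportions that stay nearly
    constant across fonts and could therefore serve as priors.
    """
    if len(fonts) < 2:
        raise ValueError("need at least two fonts")
    common = set.intersection(*(set(p) for p in fonts.values()))
    spread = {}
    for name in sorted(common):
        values = [p[name] for p in fonts.values()]
        spread[name] = statistics.stdev(values) / abs(statistics.mean(values))
    return spread
```

A per-glyph aspect can be fed in the same way, with the glyph name as key instead of the line name.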