It would be nice to get some additional ideas about which font metrics could be interesting before I rework my existing programs.
First some words about context and intention. My focus is the automatic digital reconstruction of old books, mainly about natural history, 17th and 18th century, in German, English, and Latin. This includes OCR and image refinement, and for several reasons also the automatic reconstruction of fonts. First, it lets scientists have a digital reconstruction of the original and switch to a modern font with one click. Second, it helps to improve the OCR recognition rate, which is a chicken-and-egg problem: with nearly original fonts the training data can be generated automatically. At the moment it is done by manual transcription, which is error prone. It's not bad, as the error rate of my models is 0.3%, compared to the 4-7% that is usual for average texts from around 1800.
At the moment I use (not in production) font metrics along with image similarity for font identification and glyph clustering.
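To make the idea concrete, here is a minimal sketch of how metrics and image similarity could be combined for glyph clustering. The feature choice, the weighting, and the scipy-based hierarchical clustering are my own assumptions, not a fixed design:

```python
# Sketch: combine normalized font metrics with an image dissimilarity
# into one distance matrix and cluster glyphs hierarchically.
# Feature names and the weighting are illustrative assumptions.
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_glyphs(metric_vectors, image_dissim, weight=0.5, cutoff=0.3):
    """metric_vectors: (n, d) array, e.g. [aspect, density, rel_height, ...]
    image_dissim:   (n, n) array of pairwise image dissimilarities in [0, 1]."""
    # Scale each metric to [0, 1] so no single one dominates the distance.
    m = np.asarray(metric_vectors, dtype=float)
    m = (m - m.min(axis=0)) / (np.ptp(m, axis=0) + 1e-9)
    metric_dist = squareform(pdist(m, metric="euclidean")) / np.sqrt(m.shape[1])
    # Weighted combination of metric distance and image dissimilarity.
    combined = weight * metric_dist + (1.0 - weight) * image_dissim
    # Average-linkage clustering on the combined distances.
    z = linkage(squareform(combined, checks=False), method="average")
    return fcluster(z, t=cutoff, criterion="distance")
```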
There are 3 ways to get the data:
1) interpreting the digital font directly
2) rendering the glyphs and measuring them with image processing
3) taking the measurements from a real sample (scan or photo of a page)
For 2) and 3) I can use the same program.
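For 1), reading such values directly from a TTF/OTF could look roughly like this with fontTools (the library choice is my assumption; sxHeight and sCapHeight only exist in newer OS/2 table versions):

```python
# Sketch: read basic glyph metrics straight from a digital font (way 1).
# Values come back in font units, here scaled by the units-per-em (UPM).
from fontTools.ttLib import TTFont
from fontTools.pens.boundsPen import BoundsPen

def glyph_metrics(font_path, char):
    font = TTFont(font_path)
    upm = font["head"].unitsPerEm
    glyph_name = font.getBestCmap()[ord(char)]
    glyph_set = font.getGlyphSet()

    # Ink bounding box relative to the baseline (y = 0) and origin (x = 0).
    # pen.bounds is None for blank glyphs such as the space.
    pen = BoundsPen(glyph_set)
    glyph_set[glyph_name].draw(pen)
    x_min, y_min, x_max, y_max = pen.bounds

    advance, lsb = font["hmtx"][glyph_name]
    os2 = font["OS/2"]
    return {
        "left": x_min / upm, "right": x_max / upm,
        "bottom": y_min / upm, "top": y_max / upm,   # relative to baseline
        "aspect": (y_max - y_min) / max(x_max - x_min, 1),
        "advance": advance / upm, "lsb": lsb / upm,
        # Font-wide vertical lines.
        "descender": os2.sTypoDescender / upm,
        "ascender": os2.sTypoAscender / upm,
        "x_line": os2.sxHeight / upm if hasattr(os2, "sxHeight") else None,
        "H_line": os2.sCapHeight / upm if hasattr(os2, "sCapHeight") else None,
    }
```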
What I measure at the moment:
- top, left, bottom, right (in relation to the baseline)
- descender, x-line, ascender (h-line), H-line
- aspect (height/width)
- density (black/white pixels)
- font size
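For 2) and 3), most of the above can be taken from a binarized glyph image. A minimal sketch in plain numpy; the baseline position has to come from line segmentation and is only a parameter here, and "density" is read as the share of black pixels inside the ink box, which is just one possible interpretation:

```python
# Sketch: measure a single binarized glyph image (black ink = True).
# Works for rendered glyphs (way 2) and cut-outs from a scan (way 3).
import numpy as np

def measure_glyph(ink, baseline_row):
    """ink: 2D boolean array, True where the glyph is black.
    baseline_row: row index of the baseline within this image."""
    rows = np.flatnonzero(ink.any(axis=1))
    cols = np.flatnonzero(ink.any(axis=0))
    r0, r1 = rows[0], rows[-1]          # topmost / bottommost ink row
    c0, c1 = cols[0], cols[-1]          # leftmost / rightmost ink column
    height, width = r1 - r0 + 1, c1 - c0 + 1
    box = ink[r0:r1 + 1, c0:c1 + 1]
    return {
        # Extremes relative to the baseline (positive = above the baseline).
        "top": baseline_row - r0,
        "bottom": baseline_row - r1,
        "left": c0,
        "right": c1,
        "aspect": height / width,
        # Share of black pixels inside the ink bounding box.
        "density": box.mean(),
    }
```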
For reconstruction additional metrics are needed:
- spacing
- kerning if there is an overlap (in metal type this is negative spacing)
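If the digital font is available, existing kerning pairs can also be read out directly. A sketch for the legacy 'kern' table with fontTools; OpenType GPOS kerning is not covered here:

```python
# Sketch: dump (left, right) -> kern value pairs from a font's legacy
# 'kern' table, in font units. Negative values pull glyphs together,
# which corresponds to the overlap mentioned above.
from fontTools.ttLib import TTFont

def kern_pairs(font_path):
    font = TTFont(font_path)
    pairs = {}
    if "kern" in font:
        for subtable in font["kern"].kernTables:
            # Format-0 subtables store a plain dict of glyph-name pairs.
            pairs.update(getattr(subtable, "kernTable", {}))
    return pairs
```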
What would also be possible to measure:
- stroke-widths
- slant
- distance of diacritics
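Stroke width and slant can be estimated from the binarized image as well. A rough sketch, using the median horizontal run length as stroke width and second-order image moments for slant; both are simplifications:

```python
# Sketch: rough stroke-width and slant estimates from a binarized glyph.
# Stroke width = median length of horizontal black runs (ignores purely
# horizontal strokes); slant from second-order moments.
import numpy as np

def stroke_width(ink):
    runs = []
    for row in ink:
        # Lengths of consecutive black runs in this row.
        padded = np.diff(np.concatenate(([0], row.astype(int), [0])))
        starts, ends = np.flatnonzero(padded == 1), np.flatnonzero(padded == -1)
        runs.extend(ends - starts)
    return float(np.median(runs)) if runs else 0.0

def slant_degrees(ink):
    ys, xs = np.nonzero(ink)
    x, y = xs - xs.mean(), ys.mean() - ys   # flip so y grows upward
    mu11, mu02 = (x * y).mean(), (y * y).mean()
    # Shear of the vertical axis: tan(slant) = mu11 / mu02.
    return float(np.degrees(np.arctan2(mu11, mu02)))
```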
What else would be interesting? Sometimes information takes on a new quality when it is available across fonts. E.g. the vertical proportions for reading sizes do not differ much across fonts; the same holds for aspect.
Comments
For blackletter (black is another word for bold) it's obvious that the same characters set in a modern regular "Swiss" (sans-serif) face add up to a shorter line, or to an ugly wide space between the words.
The same problem appears if a searchable PDF is generated: the facsimile as a picture with an overlay of text in an "invisible" font, so that the text can be selected with the mouse. Then some people complain that the length of a word is not the same as in the underlying picture. In a PDF, single words can be positioned in the coordinate system, so the beginning of each word is correct, but different fonts have different run lengths.
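For illustration, this is roughly how such an overlay is built, here with reportlab and text render mode 3 (invisible); the library and parameters are my assumptions, not necessarily what the PDFs in question used:

```python
# Sketch: facsimile image with an invisible, searchable text layer.
# Each word is placed at its own origin, so the start position matches
# the scan, but the run length depends on the overlay font's metrics.
from reportlab.pdfgen import canvas

def make_searchable_page(out_path, scan_path, page_size, words):
    """words: iterable of (text, x, y) in PDF points, y at the baseline."""
    c = canvas.Canvas(out_path, pagesize=page_size)
    c.drawImage(scan_path, 0, 0, width=page_size[0], height=page_size[1])
    for text, x, y in words:
        t = c.beginText(x, y)
        t.setTextRenderMode(3)       # invisible but selectable/searchable
        t.setFont("Helvetica", 10)   # any font; widths will not match the scan
        t.textOut(text)
        c.drawText(t)
    c.showPage()
    c.save()
```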
I found only this example where it is nicely visible: in the second line (reading right to left), the first word is shorter and the second longer.
No, that’s not the issue I experienced, it was the justification algorithms in layout apps such as Quark and InDesign that caused the difficulty. Even with the same glyph metrics as foundry type, I found it impossible to duplicate hand-set spacing. Often, the word spaces in the originals were far, far larger than anything the software permitted in any of its automatic settings.
I haven’t done any systematic research on this, or investigated Linotype, I’m just relating a problem I faced when making “restoration” revivals, fine-tuning them by attempting to create facsimile settings of pages which contained the original, on which I based my digital versions.
In manual metal typesetting (everything before ~1900) the usual word space was 0.25 em, in Unicode U+2005 FOUR-PER-EM SPACE. At the end of a line the remaining space was then divided up by the typesetter, who inserted extra spacing material between the words. This was sometimes ugly.
Look at the last two lines of the sample: space between a word and a comma, and different spaces between the words.
But the space between words is something I do not see as a metric of a font or glyph; measuring it only makes sense for letterspacing.