Best practices for .null and .notdef?
Helmut Wollmersdorfer
Posts: 212
AFAIK the original TTF specification by Apple recommends:
- .notdef: Glyph-ID = 0, Unicode-Value = undefined
- .null: Glyph-ID = 1, Unicode-Value: U+0000
Many fonts define a glyph for .notdef with contours (e.g. a bordered rectangle), and with a width and LSB for horizontal advance. Makes sense.
But should .null have a glyph with contours, and a width and LSB other than zero?
The special context of my question is so-called "invisible" or "glyphless" fonts used for searchable PDFs. Such a PDF has one image per page and an invisible text overlay per word. E.g. the utility hocr2pdf converts the result of OCR in the popular hOCR format, plus the scanned images, into such a PDF using a glyphless font. Tesseract-OCR can output PDF directly using a font called pdf.ttx, which contains only .notdef and .null. Other codepoints and glyphs are inserted as needed. In pdf.ttx the two glyphs are defined as:
- .notdef: width="0" lsb="0", no contours, no codepoint
- .null: width="1024" lsb="0", contour is a filled rectangle 1024x2048, no codepoint
The definition of .null makes no sense to me.
Looking into a popular invisible font for hocr2pdf and friends:
- none of the ~650 glyphs has a contour
- .notdef: width="1536" lsb="0", no codepoint
- .null: width="0" lsb="0", codepoints: U+0000, U+0008, U+001d
The mapped codepoints of .null are interesting, because hOCR is an XML format using xml version="1.0", which allows only U+0009, U+000A and U+000D below U+0020 (= space). If U+0000, U+0008 or U+001D appears in the XML, a correct parser should throw an exception.
Makes no sense. But maybe this font is also used for other purposes.
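To make the XML point concrete, here is a minimal sketch using only the Python standard library. It checks the three codepoints mapped to .null against the XML 1.0 Char production and then lets the stdlib parser confirm the result. The helper names (is_xml10_char, parses_as_xml) and the dummy <span> wrapper are my own for illustration; they are not taken from hocr2pdf or Tesseract.

```python
import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 'Char' production."""
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def parses_as_xml(text: str) -> bool:
    """Wrap text in a dummy element and report whether the parser accepts it."""
    try:
        ET.fromstring("<span>" + text + "</span>")
        return True
    except ET.ParseError:
        return False


# Codepoints mapped to .null in the invisible font discussed above.
null_codepoints = [0x0000, 0x0008, 0x001D]

# Those that can never legally occur in an XML 1.0 document such as hOCR.
unreachable = [cp for cp in null_codepoints if not is_xml10_char(cp)]
```

With these inputs, all three codepoints end up in unreachable, so a font used only for text that came through a conforming XML 1.0 parser should never be asked for them.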
Comments
NULL* should be zero-width with no contours.
*The name should be NULL, not .null; .notdef should be the only glyph name in a font that begins with a period.
The NULL glyph is not really needed in modern fonts, and is regularly not included in CFF OpenType fonts. But I still set the first three glyphs of my sets to match the original Apple TTF spec:
.notdef
NULL (U+0000; zero-width)
CR (U+000D; match /space width)
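As a rough illustration of those rules, the sketch below checks a glyph list for .notdef at GID 0 with no codepoint, and for a zero-width, contourless NULL. The Glyph record and check_special_glyphs are hypothetical helpers, not part of any font library; the two data sets are the metrics quoted in the original post for pdf.ttx and for the hocr2pdf invisible font.

```python
from dataclasses import dataclass, field


@dataclass
class Glyph:
    # Hypothetical minimal record; a real font stores this across several tables.
    name: str
    width: int
    has_contours: bool = False
    codepoints: list = field(default_factory=list)


def check_special_glyphs(glyphs):
    """Return a list of problems with .notdef and NULL/.null, in glyph order."""
    problems = []
    if not glyphs or glyphs[0].name != ".notdef":
        problems.append(".notdef is not GID 0")
    elif glyphs[0].codepoints:
        problems.append(".notdef should not be mapped to any codepoint")
    for g in glyphs:
        if g.name in ("NULL", ".null"):
            if g.width != 0:
                problems.append(f"{g.name} has advance width {g.width}, expected 0")
            if g.has_contours:
                problems.append(f"{g.name} has contours, expected none")
    return problems


# Values as quoted in the question for Tesseract's pdf.ttx.
pdf_ttx = [
    Glyph(".notdef", width=0),
    Glyph(".null", width=1024, has_contours=True),
]

# Values as quoted in the question for the hocr2pdf invisible font.
invisible_font = [
    Glyph(".notdef", width=1536),
    Glyph(".null", width=0, codepoints=[0x0000, 0x0008, 0x001D]),
]

pdf_ttx_report = check_special_glyphs(pdf_ttx)
invisible_report = check_special_glyphs(invisible_font)
```

Under these assumptions the pdf.ttx data yields two complaints about .null (non-zero advance and a contour), while the invisible font passes. The check deliberately ignores the glyph name and the CR glyph to stay short.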
_____
.notdef should always be the first glyph (GID 0) in the font, and is what a text engine will display for any character in the text string that is unsupported in the font cmap table. This means the .notdef glyph should provide visual feedback to the user that something is wrong. Microsoft recommends using one of three forms for the .notdef glyph:
I favour the second of these, sometimes reversed out, i.e. a white ? on a black box, as it stands out better in text.
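The GID 0 fallback mentioned above can be pictured with a toy lookup. This is only a sketch: the cmap here is a plain dict with made-up glyph IDs, and real text engines do far more (shaping, font fallback) before they settle on .notdef.

```python
NOTDEF_GID = 0  # .notdef is expected to be the first glyph in the font


def glyph_ids(text, cmap):
    """Map each character to a glyph ID, using GID 0 for unmapped codepoints."""
    return [cmap.get(ord(ch), NOTDEF_GID) for ch in text]


# Hypothetical cmap: A -> GID 3, B -> GID 4; "C" is deliberately missing.
cmap = {0x41: 3, 0x42: 4}
gids = glyph_ids("ABC", cmap)
```

Here gids is [3, 4, 0], so the third character would be drawn with whatever outline sits at GID 0, which is why that glyph should be visibly "wrong" in ordinary fonts.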
The width and proportions of the .notdef glyph can be whatever you like. I typically align the height to the cap-height of a font, and space it so that a row of .notdef glyphs forms a box: