Best practices for .null and .notdef?

Helmut Wollmersdorfer · October 2021

AFAIK, Apple's original TrueType specification recommends the following:

- .notdef: Glyph-ID = 0, Unicode-Value = undefined
- .null: Glyph-ID = 1, Unicode-Value: U+0000

Many fonts define a .notdef glyph with contours (e.g. a bordered rectangle), and with a width and LSB for horizontal advance. That makes sense.

But should .null have a glyph with contours, and a width or LSB other than zero?

The specific context of my question is so-called "invisible" or "glyphless" fonts used for searchable PDFs. Such a PDF has one image per page and an invisible text overlay for each word. For example, the utility hocr2pdf converts OCR results in the popular hOCR format, together with the scanned images, into such a PDF using a glyphless font. Tesseract-OCR can output PDF directly using a font called pdf.ttx, which contains only .notdef and .null; other code points and glyphs are inserted as needed.

- .notdef: width="0" lsb="0", no contours, no codepoint
- .null: width="1024" lsb="0", contour is a filled rectangle 1024x2048, no codepoint
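For context on why a glyphless font can work at all: in PDF, the `Tr` operator with render mode 3 paints text with neither fill nor stroke, so glyph outlines never appear, while the advance widths still position the text for selection and search. The sketch below builds the content-stream operators for one invisible word. It is not taken from Tesseract or hocr2pdf; the function `invisible_word_ops`, its parameters, the use of 2-byte glyph IDs in a hex string (which assumes a Type0 font with Identity-H encoding), and the use of `Tz` horizontal scaling to stretch the word to its OCR bounding-box width are all illustrative assumptions.

```python
from __future__ import annotations


def invisible_word_ops(font_name: str, font_size: float, x: float, y: float,
                       gids: list[int], glyph_width: float,
                       word_width: float) -> str:
    """Build PDF operators that place one word as invisible text.

    Hypothetical helper. glyph_width is the glyph advance in PDF text-space
    units (1/1000 em), assumed equal for every glyph in the word.
    """
    # Width the word would have at 100% horizontal scaling, in points.
    natural = len(gids) * glyph_width / 1000 * font_size
    # Tz takes a percentage; stretch or squeeze to the target word width.
    scale = 100 * word_width / natural if natural else 100
    # Assumes 2-byte codes, i.e. a Type0 font with Identity-H encoding.
    hex_string = "".join(f"{gid:04X}" for gid in gids)
    return (
        "BT\n"
        f"/{font_name} {font_size:g} Tf\n"
        "3 Tr\n"                       # render mode 3: neither fill nor stroke
        f"{scale:.2f} Tz\n"            # horizontal scaling in percent
        f"1 0 0 1 {x:g} {y:g} Tm\n"    # text matrix: translate to (x, y)
        f"<{hex_string}> Tj\n"
        "ET"
    )


print(invisible_word_ops("F1", 10, 72, 700, [1, 1, 1, 1], 500, 40))
```

Because nothing is painted in mode 3, whatever contours the font's glyphs have are irrelevant to the visible page in this setup; only the metrics matter.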

The definition of .null makes no sense to me.

Looking into a popular invisible font for hocr2pdf and friends:

- none of the ~650 glyphs has a contour
- .notdef: width="1536" lsb="0", no codepoint
- .null: width="0" lsb="0", codepoints: U+0000, U+0008, U+001d

The code points mapped to .null are interesting, because hOCR is an XML format (xml version="1.0"), and XML 1.0 only allows U+0009, U+000A and U+000D below U+0020 (space). If any of the other control characters appear in the XML, a conforming parser must reject the document as not well-formed.
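The XML 1.0 restriction can be checked directly against the spec's `Char` production (`#x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]`). A small sketch follows; `is_xml10_char` and `parses` are illustrative helper names, and `parses` simply uses the standard library's Expat-based `xml.etree.ElementTree` as an example of a conforming parser.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def is_xml10_char(cp: int) -> bool:
    """True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def parses(xml_text: str) -> bool:
    """Return True if ElementTree (Expat) accepts xml_text as well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# Code points mapped to .null in the invisible font described above.
null_codepoints = [0x0000, 0x0008, 0x001D]
print([f"U+{cp:04X}" for cp in null_codepoints if not is_xml10_char(cp)])
# ['U+0000', 'U+0008', 'U+001D']

# Even as a character reference, U+0008 is rejected by the parser.
print(parses('<span class="ocrx_word">&#8;</span>'))
```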

That makes no sense to me. But maybe this font is also used for other purposes.

John Hudson · October 2021

NULL* should be zero-width with no contour

*The name should be NULL, not .null: .notdef should be the only glyph name in a font that begins with a period.

The NULL glyph is not really needed in modern fonts, and is often omitted from CFF OpenType fonts. But I still set the first three glyphs of my sets to match the original Apple TTF spec:

- .notdef
- NULL (U+0000; zero-width)
- CR (U+000D; match /space width)
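The conventions in this post can be expressed as a few mechanical checks on a glyph list. The sketch below uses a hypothetical `Glyph` record and a hypothetical `check_glyph_set` function (not fontTools or any other real library API); it checks that GID 0 is .notdef, that no other glyph name begins with a period, that NULL is zero-width with no contours, and that CR matches the width of space.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Glyph:
    """Hypothetical minimal glyph record; widths are in font units."""
    name: str
    advance: int
    has_contours: bool
    codepoints: list[int] = field(default_factory=list)


def check_glyph_set(glyphs: list[Glyph]) -> list[str]:
    """Return deviations from the NULL/.notdef conventions described above."""
    problems: list[str] = []
    if not glyphs or glyphs[0].name != ".notdef":
        problems.append("GID 0 should be .notdef")
    # Only .notdef may have a name starting with a period.
    for gid, g in enumerate(glyphs):
        if g.name.startswith(".") and g.name != ".notdef":
            problems.append(f"GID {gid}: name {g.name!r} should not begin with '.'")
    by_name = {g.name: g for g in glyphs}
    # Accept the legacy .null name so that older fonts are still checked.
    null = by_name.get("NULL", by_name.get(".null"))
    if null is not None:
        if null.advance != 0:
            problems.append(f"NULL should be zero-width, not {null.advance}")
        if null.has_contours:
            problems.append("NULL should have no contours")
    cr, space = by_name.get("CR"), by_name.get("space")
    if cr is not None and space is not None and cr.advance != space.advance:
        problems.append(f"CR width {cr.advance} should match space width {space.advance}")
    return problems


# The two glyphs of the Tesseract font as described in the question.
for problem in check_glyph_set([Glyph(".notdef", 0, False), Glyph(".null", 1024, True)]):
    print(problem)
```

Applied to the Tesseract glyphs from the question, this reports the period in `.null`, its non-zero width, and its contours.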

_____

.notdef should always be the first glyph (GID 0) in the font, and it is what a text engine will display for any character in the text string that is not supported by the font's cmap table. This means the .notdef glyph should give the user visual feedback that something is wrong. Microsoft recommends one of three forms for the .notdef glyph: an empty rectangle, a rectangle with a question mark inside it, or a rectangle with an X.

I favour the second of these, sometimes reversed out, i.e. a white ? on a black box, as it stands out better in text.

The width and proportions of the .notdef glyph can be whatever you like. I typically align its height to the cap height of the font, and space it so that a row of .notdef glyphs forms a box.
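The fallback to GID 0 described above amounts to a cmap lookup with a default. The sketch below is deliberately simplified: `to_glyph_ids` is a hypothetical helper, the glyph order and cmap are made-up examples following the NULL/CR layout from this thread, and real text engines also do shaping (GSUB and similar) and may try fallback fonts before ever showing .notdef.

```python
from __future__ import annotations


def to_glyph_ids(text: str, cmap: dict[int, int]) -> list[int]:
    """Map each character to a glyph ID; unmapped characters get GID 0 (.notdef)."""
    return [cmap.get(ord(ch), 0) for ch in text]


# Illustrative font: GID 0 is .notdef, followed by NULL and CR as recommended.
glyph_order = [".notdef", "NULL", "CR", "space", "A", "B"]
cmap = {0x0000: 1, 0x000D: 2, 0x0020: 3, 0x0041: 4, 0x0042: 5}

gids = to_glyph_ids("AB \u20ac", cmap)  # the euro sign is not in the cmap
print([glyph_order[g] for g in gids])  # ['A', 'B', 'space', '.notdef']
```

Since every unsupported character lands on GID 0, whatever .notdef looks like is exactly what the user sees for missing characters, which is why an outline that stands out is useful in ordinary fonts.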
