Best practices for .null and .notdef?

AFAIK the original TrueType specification by Apple recommends:

- .notdef: Glyph-ID = 0, Unicode-Value = undefined
- .null: Glyph-ID = 1, Unicode-Value: U+0000

Many fonts define a glyph for .notdef with contours (e.g. a bordered rectangle) and with a non-zero advance width and LSB. That makes sense.

But should .null have a glyph with contours, and a width or LSB other than zero?

The special context of my question is so-called "invisible" or "glyphless" fonts used for searchable PDFs. Such a PDF has one image per page and an invisible text overlay per word. E.g. the utility hocr2pdf converts OCR results in the popular hOCR format plus the scanned images into such a PDF using a glyphless font. Tesseract-OCR can output PDF directly using a font called pdf.ttx, which contains only .notdef and .null. Other codepoints and glyphs are inserted as needed. In that font the two glyphs are defined as:

- .notdef: width="0" lsb="0", no contours, no codepoint
- .null: width="1024" lsb="0", contour is a filled rectangle 1024x2048, no codepoint

The definition of .null makes no sense to me.
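
For anyone who wants to check such metrics themselves, here is a minimal sketch in Python that reads width and LSB from the hmtx table of a TTX dump. The fragment below is hand-written for illustration and only reproduces the two values quoted above; it is not a copy of the real pdf.ttx, and the helper name read_hmtx is my own.

```python
import xml.etree.ElementTree as ET

# Hand-written TTX-style fragment (an assumption for illustration only);
# it repeats the metrics quoted in this post, not the full font dump.
TTX_FRAGMENT = """
<ttFont>
  <hmtx>
    <mtx name=".notdef" width="0" lsb="0"/>
    <mtx name=".null" width="1024" lsb="0"/>
  </hmtx>
</ttFont>
"""

def read_hmtx(ttx_text):
    """Return {glyph name: (advance width, lsb)} from a TTX dump."""
    root = ET.fromstring(ttx_text)
    metrics = {}
    for mtx in root.iter("mtx"):
        metrics[mtx.get("name")] = (int(mtx.get("width")), int(mtx.get("lsb")))
    return metrics

metrics = read_hmtx(TTX_FRAGMENT)
for name, (width, lsb) in metrics.items():
    print(f"{name}: width={width} lsb={lsb}")
```

The same function should work on a full dump produced by the ttx tool, since it only looks at the mtx elements and ignores everything else.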

Looking into a popular invisible font for hocr2pdf and friends:

- none of the ~650 glyphs has a contour
- .notdef: width="1536" lsb="0", no codepoint
- .null: width="0" lsb="0", codepoints: U+0000, U+0008, U+001d

The codepoints mapped to .null are interesting, because hOCR is an XML format using xml version="1.0", which below U+0020 (space) only allows U+0009, U+000A and U+000D. If any other control character appears in the XML, a conforming parser should reject the document.

So for hOCR input these mappings make no sense. But maybe this font is also used for other purposes.
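
To make the XML argument concrete, here is a small sketch in Python. The function follows the Char production of XML 1.0 (U+0009, U+000A, U+000D, U+0020 to U+D7FF, U+E000 to U+FFFD, U+10000 to U+10FFFF); the names is_xml10_char and parser_accepts are made up for this example, and the parser check uses Python's built-in expat-based parser as one example of a conforming parser.

```python
import xml.etree.ElementTree as ET

def is_xml10_char(cp):
    """True if the codepoint is allowed by the XML 1.0 Char production."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# Codepoints mapped to .null in the invisible font discussed above.
null_codepoints = [0x0000, 0x0008, 0x001D]
unreachable = [cp for cp in null_codepoints if not is_xml10_char(cp)]
print(["U+%04X" % cp for cp in unreachable])

def parser_accepts(text):
    """True if the parser accepts text as element content."""
    try:
        ET.fromstring("<span>" + text + "</span>")
        return True
    except ET.ParseError:
        return False

print(parser_accepts("word\tword"))   # tab is a legal XML 1.0 character
print(parser_accepts("word\x08word")) # U+0008 is not
```

All three codepoints fail the check, so none of them can reach the font through a well-formed hOCR file.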

