Best practices for .null and .notdef?
 
            
                
Helmut Wollmersdorfer
AFAIK the original TrueType specification by Apple recommends:
- .notdef: Glyph-ID = 0, Unicode-Value = undefined
- .null: Glyph-ID = 1, Unicode-Value = U+0000
Many fonts define .notdef with contours (e.g. a bordered rectangle) and with a non-zero advance width and LSB. That makes sense.
But should .null have contours, or a width and LSB other than zero?
The context of my question is so-called "invisible" or "glyphless" fonts used for searchable PDFs. Such a PDF has one image per page and an invisible text overlay per word. For example, the utility hocr2pdf converts OCR results in the popular hOCR format, plus the scanned images, into such a PDF using a glyphless font. Tesseract-OCR can output PDF directly, using a font called pdf.ttx which contains only .notdef and .null; other codepoints and glyphs are inserted as needed. In pdf.ttx:
- .notdef: width="0" lsb="0", no contours, no codepoint
- .null: width="1024" lsb="0", contour is a filled rectangle 1024x2048, no codepoint
This definition of .null makes no sense to me.
Looking into a popular invisible font for hocr2pdf and friends:
- none of the ~650 glyphs has a contour
- .notdef: width="1536" lsb="0", no codepoint
- .null: width="0" lsb="0", codepoints: U+0000, U+0008, U+001D
The codepoints mapped to .null are interesting, because hOCR is an XML format using xml version="1.0", which allows only U+0009, U+000A, and U+000D below U+0020 (space). If any other control character appears in the XML, a conforming parser should throw an exception.
So these mappings make no sense for hOCR input. But maybe this font is also used for other purposes.
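The XML point can be checked mechanically. The following is a minimal sketch, assuming the `Char` production of XML 1.0; the function name `is_xml10_char` is made up for illustration and is not part of any library.

```python
def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production."""
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )

# Code points that the invisible font maps to .null (taken from the post).
null_codepoints = [0x0000, 0x0008, 0x001D]

# Those that can never appear in a well-formed XML 1.0 document.
unreachable = [cp for cp in null_codepoints if not is_xml10_char(cp)]

# Control characters below U+0020 that XML 1.0 does allow.
allowed_controls = [cp for cp in range(0x20) if is_xml10_char(cp)]

print([f"U+{cp:04X}" for cp in unreachable])
print([f"U+{cp:04X}" for cp in allowed_controls])
```

All three codepoints mapped to .null end up in `unreachable`, so text coming from a well-formed hOCR file could not select that glyph through them.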
                        
            Comments
NULL* should be zero-width with no contour.
 *The name should be NULL, not .null; .notdef should be the only glyph name in a font that begins with a period.
 The NULL glyph is not really needed in modern fonts, and is regularly omitted from CFF OpenType fonts. But I still set the first three glyphs of my sets to match the original Apple TTF spec:
 - .notdef
 - NULL (U+0000; zero-width)
 - CR (U+000D; matches the /space width)
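These conventions can be written down as a small check. This is only a sketch: the `Glyph` record and `check_leading_glyphs` are hypothetical names for illustration, not the API of any font library, and the metrics would have to be read from a real font by other means.

```python
from dataclasses import dataclass, field

@dataclass
class Glyph:
    name: str
    advance: int                       # advance width in font units
    contours: int = 0                  # number of contours in the outline
    codepoints: list = field(default_factory=list)

def check_leading_glyphs(glyphs):
    """Return a list of problems; an empty list means the conventions hold."""
    problems = []
    by_name = {g.name: g for g in glyphs}
    if not glyphs or glyphs[0].name != ".notdef":
        problems.append(".notdef is not GID 0")
    # Accept either spelling of the name, since existing fonts use both.
    null = by_name.get("NULL", by_name.get(".null"))
    if null is not None:
        if null.advance != 0:
            problems.append("NULL has a non-zero advance width")
        if null.contours != 0:
            problems.append("NULL has contours")
    cr, space = by_name.get("CR"), by_name.get("space")
    if cr is not None and space is not None and cr.advance != space.advance:
        problems.append("CR width does not match space width")
    return problems

# Metrics as reported in the question for Tesseract's pdf.ttx.
tesseract_like = [Glyph(".notdef", 0), Glyph(".null", 1024, contours=1)]

# A layout following the three-glyph convention (widths are made up).
conventional = [
    Glyph(".notdef", 1536, contours=2),
    Glyph("NULL", 0, codepoints=[0x0000]),
    Glyph("CR", 500, codepoints=[0x000D]),
    Glyph("space", 500, codepoints=[0x0020]),
]

print(check_leading_glyphs(tesseract_like))
print(check_leading_glyphs(conventional))
```

The first call flags both the width and the contour of the null glyph; the second returns an empty list.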
 _____
 .notdef should always be the first glyph (GID 0) in the font, and is what a text engine will display for any character in the text string that is not supported in the font's cmap table. This means the .notdef glyph should provide visual feedback to the user that something is wrong. Microsoft recommends using one of three forms for the .notdef glyph:
 [Image: the three recommended .notdef forms: an empty rectangle, a rectangle containing a question mark, and a rectangle containing an X]
 I favour the second of these, sometimes reversed out, i.e. a white ? on a black box, as it stands out better in text.
 The width and proportions of the .notdef glyph can be whatever you like. I typically align the height to the cap-height of a font, and space it so that a row of .notdef glyphs forms a box:
 [Image: a row of .notdef glyphs forming a box]
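The fallback to GID 0 described above can be illustrated with a toy character map. This is a simplified sketch, assuming the cmap is a plain dictionary from codepoint to glyph ID; the name `glyph_ids` and the sample mapping are invented for illustration, and real text engines do considerably more (font fallback, shaping).

```python
NOTDEF_GID = 0  # .notdef is always the first glyph in the font

def glyph_ids(text, cmap):
    """Map each character to a glyph ID, using .notdef for unmapped ones."""
    return [cmap.get(ord(ch), NOTDEF_GID) for ch in text]

# Toy cmap: NULL at GID 1, CR at GID 2, space at GID 3, "A" at GID 4.
toy_cmap = {0x0000: 1, 0x000D: 2, 0x0020: 3, ord("A"): 4}

# U+00E9 is not in the toy cmap, so it resolves to GID 0.
ids = glyph_ids("A \u00e9", toy_cmap)
print(ids)
```

Every character missing from the map resolves to the same glyph, which is why a visible .notdef is useful as a warning in ordinary fonts.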
