Visible BOM

The SIL Recommended Characters for Non-Roman Fonts (https://scriptsource.org/entry/gg5wm9hhd3) suggests, for U+FEFF ZERO WIDTH NO-BREAK SPACE, that: "BYTE ORDER MARK, making this visible might be helpful".

What glyph should I use?

I currently have a glyph with no contours or advance width. To "make it visible", would I use:
(A) a dotted-box character with "ZWNBSP" in it (as shown in the Standard), or
(B) something like a vertical caret character as might be shown in a word processor?

Or ... could someone suggest a method (Windows / MS Office-based, preferably) by which I could test this, i.e. create a scenario that would show U+FEFF as a visible character? I just don't know whether all such apps use their own concocted shape or whether some of them might use the font's glyph if you "show invisibles" ...
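
One way to set up such a test, as a minimal sketch: generate a small UTF-8 text file with U+FEFF placed between visible letters, open it in the application under test, apply the font, and toggle the "show formatting marks" option. The file name `feff_test.txt` and the function names are made up for illustration. U+FEFF is deliberately kept away from the start of the file, where many programs would treat it as a byte order mark and drop it.

```python
from pathlib import Path

ZWNBSP = "\uFEFF"  # U+FEFF ZERO WIDTH NO-BREAK SPACE / BYTE ORDER MARK


def build_test_text() -> str:
    """Return a few lines in which U+FEFF sits between visible letters."""
    return "\n".join([
        "plain:    AB",
        f"one FEFF: A{ZWNBSP}B",
        f"two FEFF: A{ZWNBSP}{ZWNBSP}B",
    ]) + "\n"


def write_test_file(path: str = "feff_test.txt") -> Path:
    """Write the test text as UTF-8 without a leading BOM."""
    target = Path(path)
    target.write_text(build_test_text(), encoding="utf-8")
    return target


if __name__ == "__main__":
    print(f"wrote {write_test_file()}")
```

Comparing the "plain" line with the other two in the target app shows whether anything is drawn for U+FEFF, and switching fonts shows whether the drawn shape depends on the font.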

Comments

  • MS Word (Office 365, Win10x64) seems to compose its own character, regardless of the font in use. With ...

    File => Options => Display => Show All Formatting Marks or
    File => Options => Display => Optional Breaks

    ... turned on, U+FEFF shows up as:

    ... so the character does appear to be composed based on some font metric.

    I surveyed every installed font I have, and no font shows a glyph with contours. They are all either blank or default to .notdef.

  • I also see that the SIL Recommended Characters for Non-Roman Fonts document calls for setting the 16 BMP Variation Selector code points to an identifiable glyph (long rationale, don't really understand it). So I'm setting those 16 code points to glyphs like:



    and setting the BOM / ZWNBSP at U+FEFF to:

     

    ...
    I hope this is not a bad idea ... could anyone offer advice on the Recommendation of using visible glyphs for code points that typically render blank / empty?

    Direct link to the document: 
    https://scriptsource.org/entry/gg5wm9hhd3
  • John Hudson
    Are you making fonts for scripts that have variation selector sequences specified by Unicode? If not, then there really isn't a good reason to include those characters in your fonts.

    If you are making fonts that include support for Unicode variation selector sequences, then it isn't critical whether you provide visible VS glyphs or not, because variation selector sequences are processed using a format 14 cmap table before any glyphs are painted, and a font doesn't need to contain any glyph at all for a variation selector character to be referenced in the format 14 cmap. If correctly handled, valid variation selector characters mapped to sequences in the format 14 subtable will never be displayed. A visible VS glyph would only display for invalid or unsupported sequences, or possibly in text editing modes that display formatting characters (in which case they could be displayed with generic symbols at the app level).
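
    The lookup order described above can be sketched as follows. This is a simplified model, not the binary layout of a real format 14 subtable, and the glyph names are invented. The pair U+2764 U+FE0F is used only as an example sequence.

```python
# Simplified model only: a real cmap format 14 subtable is binary and
# separates default from non-default sequences. Glyph names are invented.
BASE_CMAP = {0x0041: "A", 0x2764: "heart"}               # code point -> glyph
VARIATION_SEQUENCES = {(0x2764, 0xFE0F): "heart.emoji"}  # (base, selector) -> glyph


def map_to_glyphs(text: str) -> list:
    """Map characters to glyph names, consuming supported (base, VS) pairs."""
    glyphs = []
    i = 0
    while i < len(text):
        cp = ord(text[i])
        nxt = ord(text[i + 1]) if i + 1 < len(text) else None
        if nxt is not None and (cp, nxt) in VARIATION_SEQUENCES:
            # Supported sequence: one glyph for the pair, so the selector
            # itself never needs a glyph of its own.
            glyphs.append(VARIATION_SEQUENCES[(cp, nxt)])
            i += 2
            continue
        # Anything else goes through the ordinary cmap lookup.
        glyphs.append(BASE_CMAP.get(cp, ".notdef"))
        i += 1
    return glyphs


print(map_to_glyphs("\u2764\uFE0F"))  # supported sequence
print(map_to_glyphs("A\uFE0F"))       # unsupported sequence
```

    In this model the selector only reaches the ordinary cmap lookup when the sequence is unsupported. The `.notdef` result for that case is an artifact of the sketch; a real text engine may instead drop the selector.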
  • So here's the relevant text from the Recommendations document. I think it addresses the situation of issuing a font today with no format 14 cmap table and how it handles text with legal (but atypical) embedded VS sequences.

    If I'm reading this correctly, the machinery for implementing (and testing) the recommended strategy (additional glyphs, triggered with OT features, and generating quixotic bytestreams with embedded VS characters in non-standard locations) seems daunting for this far-off scenario. 

    Excerpt from https://scriptsource.org/entry/gg5wm9hhd3:

    Variation Selectors FE00..FE0F

    We recommend that all fonts include support for Unicode variation selectors, even if the characters supported by a font don't combine with VSs — in fact, especially if they don't — i.e. add them to the cmap and point them to null glyphs.

    The reason is this: it's possible that at some point in the future a VS mapping could be defined for potentially any character in Unicode. It's not all that likely to happen for the characters in the standard now, but there is no way in principle to guarantee it. If at some point in the future text started appearing with VSs where you didn't expect them before (e.g. a VS within Cyrillic script), then you wouldn't want people (using your previously existing fonts) to suddenly start seeing boxes (or whatever is used to represent unsupported glyphs). Of course, your font would not display the variant glyph they would like to see, but the font would still display something legible.

    Related to this, people need a way on occasion to see hidden control characters such as VSs, ZWJs, viramas and other similar characters. All fonts should include control picture glyphs for all of these ... and there should be an OT feature that turns off any shaping based on these controls and causes these control pictures to be displayed.
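
    As a rough illustration of the "add them to the cmap and point them to null glyphs" part of the excerpt, the sketch below builds the 16 cmap entries as a plain dictionary. The glyph name `vs_null` is hypothetical, and a real font would need the equivalent entries written with a font editor or font library.

```python
import unicodedata

NULL_GLYPH = "vs_null"  # hypothetical glyph: no contours, zero advance width

# The 16 BMP variation selectors, U+FE00..U+FE0F, all mapped to one null glyph.
vs_cmap = {cp: NULL_GLYPH for cp in range(0xFE00, 0xFE0F + 1)}

for cp in sorted(vs_cmap):
    print(f"U+{cp:04X}  {unicodedata.name(chr(cp))}  ->  {vs_cmap[cp]}")
```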
  • John Hudson
    I think SIL are overthinking this. It isn't necessarily the case that software encountering a VS character will display a .notdef glyph if the font doesn't contain a glyph (visible or otherwise), because as noted a font doesn't need to contain a VS glyph in order for a format 14 cmap to be valid and work. In the same way, software shouldn't need to display any VS character except in specific text editing situations, and those can be supported without the need of a visible glyph in the font.

    I've included visible glyphs for various control characters in fonts for many years — based on the mechanism developed by Microsoft to display these in text editing situations where such characters are displayed —, and I've come to the conclusion that the whole paradigm is a bad idea. The display of control and formatting characters in text editing should be standardised, not font dependent.
  •  — based on the mechanism developed by Microsoft to display these in text editing situations where such characters are displayed —
    Extremely helpful, John!

    Could you provide a pointer to the Microsoft mechanism? ... or the name of what specifically you are referring to ??
  • ClintGoss Some programs can show invisibles.

    E.g. for the string <space>a<tab>b<tab>c<nl>:

    TextWrangler/BBEdit:



    LibreOffice:



    This works at the application level. There is no standard AFAIK. You should provide many punctuation characters and symbols if you are working on a text or book font.

    Some also use these characters for display of ZWJ, ZWNJ and other non spacing characters:



    Also this one (not sure if it is the correct code point) for the display of isolated combining characters (e.g. at the start of a line):



  • Very helpful, Helmut! U+25CC is in the SIL Recommendations, but not some of the others you point out.

    It would be great to have a central resource of these recommendations ... First place I thought of was the Font Development Best Practices  resource (https://silnrsi.github.io/FDBP/en-US/index.html), but I could not find much there on characters to include.
  • Sorry, I forgot where I read it. I only remember that I tried it in BBEdit and others, but they didn't display LRM or RLM with View -> Text Display -> Show Invisibles. They just ignore them completely.

    Look at how the big console fonts (Consolas, Courier, Menlo, Monaco, Noto) handle it. The above is from Consolas.

    My uni utility on the command line does not display them, because it replaces them with U+FFFD REPLACEMENT CHARACTER:
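
    The same idea can be sketched in a few lines of Python. This is not the uni utility itself, only an assumed approximation of its behaviour: characters in the Unicode "Cf" (format) category are listed with U+FFFD in place of the character, next to their code point and name.

```python
import unicodedata

REPLACEMENT = "\uFFFD"  # U+FFFD REPLACEMENT CHARACTER


def reveal(text):
    """Return (code point, shown character, name) for every character."""
    rows = []
    for ch in text:
        # Format characters (category Cf) include LRM, RLM, ZWJ, ZWNJ, U+FEFF.
        is_format = unicodedata.category(ch) == "Cf"
        shown = REPLACEMENT if is_format else ch
        rows.append((f"U+{ord(ch):04X}", shown, unicodedata.name(ch, "<unnamed>")))
    return rows


for row in reveal("a\u200Eb\uFEFFc"):
    print(*row)
```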




  • Another way of looking at it is this:

    An application developer can be pretty sure that most fonts do not have visible characters in these slots. So, when their programmers add a feature to show formatting codes, they are not going to rely on or assume that every arbitrary font has these characters. Instead, they’ll use some standard system font or their own internal font for the purpose. Likely they do this across the board for all fonts, because that’s simplest and provides a consistent user experience.

    But if they wanted to get fancy, they could check for a marking glyph from the current font, and use it if present. I can’t say how rare that is without testing, but I can guarantee it is at least uncommon. It’s extra work for minimal gain.

    Of course, less well-developed apps may make bad assumptions. But they’ll be a small minority of the app usage of most font users.
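
    The two strategies can be put side by side in a short sketch. Everything here is hypothetical (the symbol table, the function name, the `prefer_font` switch); it only illustrates the decision an app would have to make, not how any particular app works.

```python
# App-level symbols used when the app ignores the font (invented labels).
APP_SYMBOLS = {0x200C: "[ZWNJ]", 0x200D: "[ZWJ]", 0xFEFF: "[ZWNBSP]"}


def marker_for(cp, font_cmap, visible_glyphs, prefer_font=False):
    """Choose what to draw for a formatting character in 'show marks' mode.

    font_cmap:      code point -> glyph name for the current font
    visible_glyphs: names of glyphs in that font that have contours
    prefer_font:    the "fancy" behaviour; off by default
    """
    glyph = font_cmap.get(cp)
    if prefer_font and glyph in visible_glyphs:
        return ("font", glyph)
    # Simple, consistent path: always use the app's own symbol.
    return ("app", APP_SYMBOLS.get(cp, f"[U+{cp:04X}]"))


font_cmap = {0xFEFF: "zwnbsp.visible", 0x200D: "zwj"}
visible = {"zwnbsp.visible"}
print(marker_for(0xFEFF, font_cmap, visible))                    # app symbol
print(marker_for(0xFEFF, font_cmap, visible, prefer_font=True))  # font glyph
```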
  • John Hudson
    edited February 2020
    Could you provide a pointer to the Microsoft mechanism?
    See Helmut's reply and examples. Apps like MS Word and LibreOffice have an editing mode that displays formatting characters. To support this, Microsoft recommended having visible glyphs for e.g. Zero-Width Non-Joiner, the display of which is normally suppressed in text, but is made visible in editing mode.

    The trouble with this is that a) it is inconsistent within a font, since some formatting characters still need to have invisible glyphs (e.g. Combining Grapheme Joiner); and b) it is inconsistent across fonts, because not all font makers use the same glyph shapes for formatting characters, or make them visible at all. I think the editing mode depictions of formatting characters should be standardised (at the app level if not at the system level), so users have a consistent editing experience regardless of what font is used.