Punctuation Space U+2008

Out of all the encoded fixed-width spaces, this is the one whose purpose completely eludes me.

The Unicode Standard defines it thus: “U+2008 punctuation space is a space defined to be the same width as a period.” [Version 8.0 p. 266]

I have seen Adobe documentation about their Insert White Space > Punctuation Space (which does indeed insert the U+2008 codepoint) described as “Same width as an exclamation point, period, or colon in the typeface” — which is, of course, somewhat arbitrary, given that in any given font those three punctuation marks are not necessarily going to have the same advance width.

Does anyone know precisely why the Unicode Consortium chose to encode this space and what the proposed usage was that made this a convincing case for inclusion?

I have my own thoughts about what such a space could/should actually be useful for, practically speaking; but they don’t necessarily coincide with the minimal definition, and I have no reason to believe that others would expect it to be designed for such usage.

Comments

  • I’ve always understood its use to be practical across tables – for example, in Norwegian numbers above the hundreds are divided with a space instead of a period or comma. But that’s me reading into how it would make sense to use it, rather than me interpreting the spec. I’m curious as well to hear what the intentions are.
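    As a rough sketch of the tabular reading above: the snippet below groups the digits of an integer in threes and joins the groups with U+2008 instead of a period or comma. The function name `group_thousands` is hypothetical, and whether U+2008 (rather than, say, a thin space) is the right separator is exactly the open question in this thread, so treat the choice as an assumption.

```python
# U+2008 PUNCTUATION SPACE, the character under discussion.
PUNCTUATION_SPACE = "\u2008"


def group_thousands(n: int, sep: str = PUNCTUATION_SPACE) -> str:
    """Group the digits of n in threes, joined by sep.

    Hypothetical helper for illustration: it formats with commas first,
    then swaps each comma for the chosen separator.
    """
    return f"{n:,}".replace(",", sep)


if __name__ == "__main__":
    # repr() makes the otherwise invisible separator visible as \u2008.
    print(repr(group_thousands(1234567)))
```

    In a font where the punctuation space really does match the period and comma, a column set this way should line up with a column that uses those marks as group separators.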
  • Hrant Հրանդ Փափազեան Papazian
    edited February 2017
    It sounds like they just wanted to have a space that was always a predictable width. (And Rob'in's Norwegian example is rather convincing.)

    The Adobe description though is indeed wishy-washy.
  • Khaled Hosny
    edited February 2017
    From Wiktionary:

    Etymology

    Used to give consistent presentation of quotation marks irrespective of comma or period placement.

    Noun

    punctuation space sg

    1. (in dated typography) A space of non-variable width: ⊣ ⊢ equal to the width of a period (full stop) or comma, inserted after ⟨“⟩ or ⟨‘⟩ and before ⟨”⟩ or ⟨’⟩ (and sometimes ⟨?⟩ or ⟨!⟩), unless a punctuation mark occurs there. It is Unicode character U+2008. When using X11 input method, the conventional key combination is Compose+Space+Period.
      Many examples occur on page 204 of the 1837 New Sporting Magazine XIII.
    No idea how accurate that is, though.

  • Kent Lew
    Khaled — I saw that. But I’m not inclined to take this at face value. For one thing, the 1837 example cited is not really an example of a specific “punctuation” space. That’s kinda just how English was typeset back then — with various spaced punctuation marks and double word-spaces between sentences, etc. (Even the positions described in that definition are not non-variable in that 1837 example; some spaces after an opening quotation mark are larger than others, for instance.)

    But I can imagine the Unicode punctuation space being intended for use in traditions that feature some spaced punctuation marks, like with some French punctuation. However, I doubt that any French typographers view that codepoint in that way. I believe they tend to have their own conventions, using other fixed-width spaces, like the thin space.

    If that was indeed the intention of the punctuation space, it would be nice to have it confirmed in some Unicode documentation. And have French experts confirm that the width of a period is indeed the proper reference (which I kinda doubt, but who am I to say).
  • Kent Lew
    Rob'in — That is basically another of the potential practical uses I can imagine. But, as you say, that’s reading into things a bit.

    To expand upon this a little: to be really practical for tabular use, in my opinion, such a space would be fixed across styles (RIBBI at least), which is what I personally advocate for tabular figures. At which point, if it is going to align with the default period and comma, those punctuation marks would also have to be duplexed across RIBBI styles, which begins to place an unnatural constraint on them for regular, non-tabular purposes.

    It is not uncommon for fonts to feature separate tabular punctuation for such purposes, often on half-tabular width. In which case, such a half-tabular space could be encoded as the U+2008, I suppose. But I’m not sure of the value of having that encoded, given that the tabular punctuation marks are not separately encoded. Usually these are just deployed with an OTL feature.

    Which brings me back to my question: What is the real purpose for which Unicode enshrined such a space?
  • Bhikkhu Pesala
    edited February 2017
    I believe that the practice in French typography is to use a thin space or punctuation space before a semicolon or colon. A regular space is unsuitable because a line-break can occur there. A non-breaking space will often be too big, so a punctuation space or thin space is best. 

    It costs nothing to include all available Unicode spaces from em-space to zero-width space in one's fonts, so I usually do. It's a simple cut and paste operation, with some minor adjustments needed for figure space and punctuation space. 

    In practice, in my publications I sometimes use the zero-width space or the hair space, but have not used the punctuation space much, if at all. 

    Looking at my fonts, I probably don't always remember to adjust for the font. Garava is my base font, which I use for testing FontCreator, and learning how to design OpenType features etc. For that, the spaces are all what they should be:

    en-space = 1024 
    em-space = 2048 (2048 funits per em)
    3 per em = 683
    4 per em = 512
    6 per em = 341
    figure space = 1014 (tabular figures for Garava Regular)
    punctuation space = 557 (comma, period, colon and semi-colon are all the same advance width)
    thin space = 410 (5 per em)
    hair space = 128 (16 per em)
    zero width = 0
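
    The widths listed above can be reproduced with a little arithmetic. This is only a sketch: the function and dictionary names are hypothetical, and the 1/5 em thin space and 1/16 em hair space are the choices made in the list above, not values mandated by Unicode. The figure space and punctuation space are not fractions of the em at all; they are copied from the advance widths of the tabular figures and the period, which are passed in as parameters here.

```python
# Spaces defined as a fraction of the em: name -> divisor.
# Thin (1/5) and hair (1/16) follow the list above; other fonts may differ.
EM_FRACTION_SPACES = {
    "em space (U+2003)": 1,
    "en space (U+2002)": 2,
    "three-per-em space (U+2004)": 3,
    "four-per-em space (U+2005)": 4,
    "six-per-em space (U+2006)": 6,
    "thin space (U+2009)": 5,
    "hair space (U+200A)": 16,
}


def space_widths(units_per_em, tabular_figure_advance, period_advance):
    """Return advance widths, in font units, for the fixed-width spaces.

    tabular_figure_advance and period_advance are read from the font's own
    glyphs, so they change from font to font (and possibly style to style).
    """
    widths = {
        name: round(units_per_em / divisor)
        for name, divisor in EM_FRACTION_SPACES.items()
    }
    widths["figure space (U+2007)"] = tabular_figure_advance
    widths["punctuation space (U+2008)"] = period_advance
    widths["zero width space (U+200B)"] = 0
    return widths


if __name__ == "__main__":
    # Values quoted above for Garava Regular: 2048 units per em,
    # tabular figures 1014 wide, period 557 wide.
    for name, width in space_widths(2048, 1014, 557).items():
        print(f"{name}: {width}")
```

    Only the two glyph-derived values need manual attention when moving to another font; the em fractions follow from the units-per-em value alone.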
  • Kent Lew
    Thanks, Si.

    Interesting that the XCCS definition provides no particular reference for the fixed width itself (not referencing a period or comma or anything).

    I read “nonprinting” as meaning that there is no printing portion of this character — i.e., it is an encoded white space.

    Not sure what to make of “device dependent.” Presumably the fixed-width spaces that are defined as rational increments of the em — e.g., en space, four-to-em space, et al. — are device independent. So, what’s so device-dependent about a punctuation space and who used it for what?

    Curiouser and curiouser.