Variants of the decimal point

Hi,

In a sample of Austrian newspapers from 1850-1911, different styles of decimal point appear:

1) 21˙3: dot aligned with the top of the figures, e.g. temperatures
2) 21˙³³: dot aligned with the top of the figures, followed by superscript figures, e.g. exchange rates
3) 17·12, 17·—: dot at roughly the height of a dash, e.g. prices
4) 21.3: dot on the baseline
5) 21.₇₇: dot on the baseline, followed by subscript figures, e.g. exchange rates

Assuming a moderate level of transcription, i.e. as close as possible to the original typography and orthography but without using the PUA (Private Use Area), it's unclear which codepoints to use.

Here are the candidates I found so far in Unicode Version 12:

'.'  U+002E  FULL STOP (Other_Punctuation)
'·'  U+00B7  MIDDLE DOT (Other_Punctuation)

'˙'  U+02D9  DOT ABOVE (Modifier_Symbol)
'·'  U+0387  GREEK ANO TELEIA (Other_Punctuation)
'᛫'  U+16EB  RUNIC SINGLE PUNCTUATION (Other_Punctuation)
'․'  U+2024  ONE DOT LEADER (Other_Punctuation)
'‧'  U+2027  HYPHENATION POINT (Other_Punctuation)
'∙'  U+2219  BULLET OPERATOR (Math_Symbol)
'⋅'  U+22C5  DOT OPERATOR (Math_Symbol)
'⸱'  U+2E31  WORD SEPARATOR MIDDLE DOT (Other_Punctuation)
'⸳'  U+2E33  RAISED DOT (Other_Punctuation)
'・' U+30FB  KATAKANA MIDDLE DOT (Other_Punctuation)
'ꞏ'  U+A78F  LATIN LETTER SINOLOGICAL DOT (Other_Letter)
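The names and General_Category values in this list can be cross-checked against the Unicode Character Database. Below is a minimal Python sketch using the standard `unicodedata` module. The `LONG_NAMES` mapping from two-letter category codes to the long names used above is written out by hand, and the data reflect whatever Unicode version is bundled with the Python interpreter (printed as `unidata_version`), which may be newer than version 12:

```python
import unicodedata

# The candidate codepoints from the list above.
CANDIDATES = [
    0x002E, 0x00B7, 0x02D9, 0x0387, 0x16EB, 0x2024, 0x2027,
    0x2219, 0x22C5, 0x2E31, 0x2E33, 0x30FB, 0xA78F,
]

# unicodedata reports two-letter codes; map them to the long names used above.
LONG_NAMES = {
    "Po": "Other_Punctuation",
    "Sk": "Modifier_Symbol",
    "Sm": "Math_Symbol",
    "Lo": "Other_Letter",
}

def describe(cp: int) -> tuple[str, str, str]:
    """Return (U+XXXX, character name, long General_Category name)."""
    ch = chr(cp)
    cat = unicodedata.category(ch)
    return f"U+{cp:04X}", unicodedata.name(ch), LONG_NAMES.get(cat, cat)

if __name__ == "__main__":
    print("Unicode data version:", unicodedata.unidata_version)
    for cp in CANDIDATES:
        print(*describe(cp), sep="  ")
```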

MIDDLE DOT appears frequently in current and old typography and is available in most fonts.

But I hesitate to use DOT ABOVE, because it's a modifier symbol. 

In a text format that allows OpenType (OT) features to be activated, that's a possible solution.
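One practical consideration for a transcription is how each candidate survives Unicode normalization, which search indexes and text pipelines often apply. U+0387 GREEK ANO TELEIA is canonically equivalent to U+00B7, so even NFC replaces it. U+2024 ONE DOT LEADER folds to a plain full stop under NFKC. DOT ABOVE, a spacing modifier, is compatibility-decomposed under NFKC into a space plus U+0307 COMBINING DOT ABOVE. The following Python sketch demonstrates this; the sample strings and the helper name `middle` are illustrative only:

```python
import unicodedata

def middle(form: str, text: str) -> list[str]:
    """Normalize `text` with `form` and return the codepoints that sit
    between the leading '21' and the final '3' (illustrative helper)."""
    norm = unicodedata.normalize(form, text)
    return [f"U+{ord(c):04X}" for c in norm[2:-1]]

# Illustrative samples: the decimal 21.3 written with different dots.
SAMPLES = {
    "MIDDLE DOT": "21\u00b73",
    "GREEK ANO TELEIA": "21\u03873",
    "ONE DOT LEADER": "21\u20243",
    "DOT ABOVE": "21\u02d93",
}

if __name__ == "__main__":
    for name, text in SAMPLES.items():
        print(f"{name:17}", "NFC:", middle("NFC", text), "NFKC:", middle("NFKC", text))
```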

I'm a little surprised that superscript and subscript codepoints exist in Unicode, but no superscript or subscript punctuation. Did I miss something?
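The Superscripts and Subscripts block (U+2070 to U+209F) can be checked mechanically. Apart from digits and modifier letters, it contains only superscript and subscript plus, minus, equals and parentheses, and no full stop or comma. Below is a Python sketch using the standard `unicodedata` module; the function name is illustrative, and the result reflects the Unicode version bundled with the interpreter:

```python
import unicodedata

def non_alnum_in_block(start: int = 0x2070, end: int = 0x209F) -> list[str]:
    """List assigned characters in [start, end] that are neither digits (No)
    nor modifier letters (Lm); unassigned codepoints (Cn) are skipped."""
    found = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        cat = unicodedata.category(ch)
        if cat not in ("No", "Lm", "Cn"):
            found.append(f"U+{cp:04X} {unicodedata.name(ch)} ({cat})")
    return found

if __name__ == "__main__":
    for line in non_alnum_in_block():
        print(line)
```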

Apart from the appropriate encoding, the second question is which features to use when reconstructing a font. In case 2), the figures after the dot are not positioned like the usual superiors (sups) but are aligned at the top, and in case 5), the bottom of the subscripts sits on the baseline.
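As a plain-text fallback, the raised or lowered figures of cases 2) and 5) can be written with Unicode's superscript and subscript digits, as in the examples 21˙³³ and 21.₇₇ above. This encodes only the figures, not their exact alignment, which remains a matter for the font (e.g. via the `sups` and `subs` features). These digits also fold back to plain ASCII digits under NFKC. Superscript one, two and three live in Latin-1 (U+00B9, U+00B2, U+00B3), while the other superscript digits are at U+2070 and U+2074 to U+2079. In the minimal sketch below, `to_sup` and `to_sub` are hypothetical helper names:

```python
# Superscript digits are split across two blocks: ¹²³ come from Latin-1
# (U+00B9, U+00B2, U+00B3); ⁰ and ⁴-⁹ come from U+2070 and U+2074..U+2079.
SUP_DIGITS = "\u2070\u00b9\u00b2\u00b3" + "".join(chr(c) for c in range(0x2074, 0x207A))
# Subscript digits are contiguous: U+2080..U+2089.
SUB_DIGITS = "".join(chr(c) for c in range(0x2080, 0x208A))

TO_SUP = str.maketrans("0123456789", SUP_DIGITS)
TO_SUB = str.maketrans("0123456789", SUB_DIGITS)

def to_sup(digits: str) -> str:
    """Hypothetical helper: map ASCII digits to superscript digits."""
    return digits.translate(TO_SUP)

def to_sub(digits: str) -> str:
    """Hypothetical helper: map ASCII digits to subscript digits."""
    return digits.translate(TO_SUB)

if __name__ == "__main__":
    print("21\u02d9" + to_sup("33"))  # case 2): DOT ABOVE + superscript figures
    print("21." + to_sub("77"))       # case 5): FULL STOP + subscript figures
```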

Comments

  • John Hudson
    I note that U+1BC84 DUPLOYAN AFFIX HIGH DOT is annotated as 'French number thousands'. I've not been able to track down any references for French using a high dot as a thousands separator, but possibly this was an historical usage? In that case, the same codepoint might be suitable for the historical Austrian decimal separator. Unfortunately the Unicode website is down at the moment, so I can't investigate further.

    Re. U+02D9 DOT ABOVE: this might work, but it would depend on the height in particular fonts. This character is used for Mandarin Chinese tone 5, which is a high tone. I'd be inclined to align the dot with the cap height, but some people might make it higher.
  • I don't know that I’d want to use codepoints from the Duployan block for things set in standard type.

    With the possible exception of the mid dot, my instinct would be to encode most of these using just the standard 0x002E codepoint with subscript or superscript features applied.
  • John Savard
    edited March 11
    I had seen the middle dot used in some old books from the UK. I had thought Austria used the comma, like most of Continental Europe, so this is interesting news. U+00B7 is, of course, the obvious choice for the middle dot, but it's interesting that Unicode has no good choice for the dot aligned with the top line. (Actually, U+2E33 would be a good codepoint, it seems, from its name, even if it didn't print here as the right character.)
  • John Hudson: Thanks, I didn't know about U+1BC84 DUPLOYAN AFFIX HIGH DOT until now.

    From the annotation in the Unicode code charts:

    DUPLOYAN AFFIX HIGH DOT
    • French number thousands
    • French suffix -eur
    • Romanian affix trans-/-lui
    • not Romanian hundreds - use 0307 combining dot above and 0308 combining diaeresis
    → 02D9 dot above

    It's not clear whether the French use for number thousands refers to French typography in general or only to Duployan shorthand, which was also used for Chinook Jargon, a nearly extinct American indigenous language. But I understand it as an abbreviation sign, not as a separator.

    The vertical alignment is interesting. Superscript figures were used very early in historical typography for footnotes, but printers took them from smaller fonts and raised them with non-printing material. Later, separate superior figures were cast to speed up typesetting, and also for special uses in fractions, tables, calendars, and train schedules.

    Many current fonts and revivals have no subscripts, and the alignment also differs. It's hard to find good fonts for transcriptions.

    Some examples: [images not preserved]
  • John Savard
    While Duployan Affix High Dot looks right, if it's an affix, doesn't that mean that it's also a combining form, even if it's off to the side instead of on top or on the bottom?
  • John Savard wrote: "I had seen the middle dot used in some old books from the UK. I had thought Austria used the comma, like most of Continental Europe, so this is interesting news. U+00B7 is, of course, the obvious choice for the middle dot, but it's interesting that Unicode has no good choice for the dot aligned with the top line. (Actually, U+2E33 would be a good codepoint, it seems, from its name, even if it didn't print here as the right character.)"
    The use of a dot also surprised me. But my focus is usually on historical scientific books in natural history. They seldom contain decimal fractions, but rather old measures like the rod, Klafter (~2 yards), foot ('), inch (''), part (1/4 inch), and line (''', 1/10 inch).

    U+2E33 RAISED DOT according to Unicode is "glyph position intermediate between U+002E . [full stop] and U+00B7 · [middle dot]".

  • John Hudson
    edited March 11
    John Savard wrote: "While Duployan Affix High Dot looks right, if it's an affix, doesn't that mean that it's also a combining form, even if it's off to the side instead of on top or on the bottom?"
    Unicode doesn't classify the Duployan affixes as combining marks. But quite probably it isn't appropriate for this use, and Helmut is right that it was perhaps used as an abbreviation sign rather than a separator.

    Unicode has a problem with dots. It encodes quite a lot of individual dots and dot patterns in various scripts, and the UTC is very hesitant to encode any more. So whenever someone finds a new use of a dot and proposes to encode it, the standard response now is 'Couldn't you use one of the existing dot characters?' So it's very likely that, over time, various dots will accumulate additional annotations and recommended uses, regardless of which block they are encoded in.

  • John Savard
    edited March 11
    Quote: "U+2E33 RAISED DOT according to Unicode is 'glyph position intermediate between U+002E . [full stop] and U+00B7 · [middle dot]'."

    Oh. In that case, it's not an option.


    John Hudson wrote: "Unicode has a problem with dots. It encodes quite a lot of individual dots and dot patterns in various scripts, and the UTC is very hesitant to encode any more. So whenever someone finds a new use of a dot and proposes to encode it, the standard response now is 'Couldn't you use one of the existing dot characters?' So it's very likely that, over time, various dots will accumulate additional annotations and recommended uses, regardless of which block they are encoded in."


    In general, this is not an unreasonable policy. However, as this thread highlights, Unicode does not appear to have a good alternative for a dot that can be relied on to have a glyph that lines up with the tops of characters. And yet that would seem to be a fairly basic character.

    Also, I don't see how "superscript" and "subscript" will help with this. I suppose applying superscript to a center dot might end up moving it somewhere vaguely near the top of a character, but that's not the reference point for its position; it is moved to the center of a superscript character, wherever that might be.

    As an emergency stopgap that is better than nothing, yes, it might be considered, but only as that.

    As well, these Austrian example texts highlight another deficiency in our current word processing software. We have superscripts. We have subscripts. And we can also change the point size of type, which keeps the baseline constant.

    So style 5 of the decimal point is easy: 21. followed by 77 in a smaller point size. To do style 2, we don't just need a dot at the top of the character; we also need a way to say "reduced-size characters, aligned at cap height instead of at the baseline". That's more fundamental than just adding a character to Unicode.

  • Quote: "With the possible exception of the mid dot, my instinct would be to encode most of these using just the standard 0x002E codepoint with subscript or superscript features applied."
    Your "instinct" has more pros than cons. My rule of thumb is: if unsure which codepoint to use, take the one with the lower value, i.e. 0x002E (ASCII range), which is available in any font and on any keyboard.
  • John Hudson
    I'd be cautious about relying on superscript styling to get an appropriate, specific height and size for the period (full stop). Some fonts will include a .sups variant of the period that is scaled, weighted, and aligned to superior letters and numbers, and algorithms that fake superiors will tend to align it similarly.
