Special dash things: softhyphen, horizontalbar
softhyphen (uni00AD), aka shy or &shy;. The Microsoft standards have a section on this, which reads as though its use is highly application-specific. Is it wise to include this character (I presume, as a component of the regular hyphen*), since it is (or can be) the thing used/displayed for automatic hyphenation in page layout applications? Any other reasons? Do browsers’ implementations of hyphenation display this glyph, or are they fine with the base hyphen glyph?
horizontalbar (uni2015). This is identical to the emdash in some fonts, and slightly shorter in others. I’m not quite clear on what this does. I’ve seen it referred to as a “quotation dash”; would that suggest anything in particular for its design? I’m wondering about the usefulness of supporting this at all, and would welcome any input on design parameters (including the need/usefulness of a case-sensitive form).
Thanks!
Comments
-
I might like to see horizontalbar as a variant of the emdash: either with the same black body but connecting (if the emdash doesn't, otherwise not connecting); or a longer black body (shorter if the emdash does connect) but with the same set width.
The softhyphen sounds like it should be identical to the hyphen. Unless it's for Armenian, in which case make it a yentamna. :->
-
Soft Hyphen = Discretionary Hyphen.
In InDesign: <Type> <Insert Special Character> <Hyphens> <Discretionary Hyphen>
When you insert it in a word, it looks like nothing happens. But when that word breaks between lines, then the hyphen appears at the pre-determined position.
I would assume an extra, special character is required, so that the text can be copied to other applications with the same breaking stipulation.
-
Horizontal Bar
Comments: quotation dash
long dash introducing quoted text
In my fonts, I make it 150% of the em-dash. In the Unicode charts it is the same size as the em-dash. I don't see much point in including it in fonts if it's identical to the em-dash. Give the user options.
-
My preferred approach to U+00AD softhyphen is the same as U+00A0 nbspace — which is to say, I prefer to double-encode them with their default references hyphen and space.
These two are unique characters whose distinct properties I feel should be properly handled at a higher level than the font: the discretionary character of the softhyphen and the non-breaking character of the nbspace are characteristics that should be managed at the layout level. The glyphs themselves should be the exact same as the hyphen and space in every other way. I would guess that the fact that they are encoded at all is a legacy residue.
In practical reality, I believe most modern apps treat them this way. If these codepoints are not present, then the renderer should use the U+002D hyphen and U+0020 space. In some apps, the renderer does so regardless. However, in some apps, the renderer uses the mapped glyph if the 00AD/00A0 codepoints are actually present. And if the glyphs differ from the 002D/0020 glyphs, then unexpected behavior can happen.
If you double-encode then you don’t need to be concerned with .case versions of U+00AD, since the hyphen glyph will be utilized and still be a proper target for any existing relevant GSUBs.
I wouldn’t even really bother with the codepoints in <cmap> except that I suspect there are a number of validators that would flag a font as not supporting various legacy codepages that include these codepoints (especially U+00AD) if these codepoints are not in fact present.
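For anyone who wants to see the double-encoding idea spelled out, here is a minimal sketch in Python. It assumes the character map is available as a plain dict of codepoint to glyph name; the `double_encode` helper and the glyph names are hypothetical and only illustrate the idea, they are not any particular tool's API.

```python
# Hypothetical stand-in for a font's character map: codepoint -> glyph name.
# Alias codepoints on the left reuse the glyph of the base codepoint on the right.
DOUBLE_ENCODINGS = {
    0x00AD: 0x002D,  # soft hyphen -> hyphen-minus
    0x00A0: 0x0020,  # no-break space -> space
}


def double_encode(cmap, pairs=DOUBLE_ENCODINGS):
    """Return a copy of cmap in which each alias codepoint points at the
    same glyph as its base codepoint (if the base is mapped at all)."""
    result = dict(cmap)
    for alias, base in pairs.items():
        if base in result:
            result[alias] = result[base]
    return result


cmap = {0x0020: "space", 0x002D: "hyphen", 0x2014: "emdash"}
encoded = double_encode(cmap)
print(encoded[0x00AD], encoded[0x00A0])  # same glyph names as U+002D and U+0020
```

Because both codepoints resolve to one glyph, there is no second outline to keep in sync, and any substitution that targets the hyphen glyph applies to both.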
I interpret that the horizontalbar U+2015 is the dash that is [supposed] to be used as an introduction to speech in those traditions that utilize this style: https://en.wikipedia.org/wiki/Quotation_mark#Quotation_dash
As such, I generally draw it as a nominal em dash — i.e., I make it a full em wide with zero sidebearings (which is not typically how I draw my emdash U+2014 these days).
In fact, I have a simple script that just creates the horizontalbar glyph with a width from f.info.unitsPerEm, reads the thickness and vertical position from the bounding box of my emdash, and draws the relevant rectangle. Never give it a second thought. ;-)
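A rough sketch of the geometry such a script would compute, in Python. This is not the actual script: the function name and the sample numbers are made up for illustration, and it works on plain tuples rather than a font editor's glyph objects.

```python
def horizontal_bar_outline(units_per_em, emdash_bounds):
    """Return (advance_width, corner_points) for a bar that is one em wide
    with zero sidebearings, reusing the em dash's thickness and height.

    emdash_bounds is (x_min, y_min, x_max, y_max) of the em dash outline.
    """
    _x_min, y_min, _x_max, y_max = emdash_bounds
    # Corners listed counter-clockwise; reverse them if your outline
    # format expects the opposite contour direction.
    points = [
        (0, y_min),
        (units_per_em, y_min),
        (units_per_em, y_max),
        (0, y_max),
    ]
    return units_per_em, points


# Assumed example values: a 1000-unit em and an em dash with sidebearings of 60.
width, points = horizontal_bar_outline(1000, (60, 250, 940, 320))
print(width, points)
```

In a real font editor you would read the em dash's bounds from the glyph object and draw these four points with the editor's pen or contour API.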
-
The soft hyphen (U+00AD) is a control character that tells the application that this is a possible hyphenation point (to supplement or replace automatic hyphenation), and as such it does not need a glyph at all, or even to be present in the font.
Applications that handle this “correctly” will use the soft hyphen as a line-breaking opportunity and, if the line is broken there, will insert the glyph of the hyphen (U+2010) or hyphen-minus (U+002D), so the glyph for the soft hyphen (U+00AD) will never be used.
However, there is no shortage of broken applications; some will not use the soft hyphen unless it has a glyph in the font, and others will use its glyph when breaking the line at it. So I agree with @Kent Lew that the best approach is to double-encode it with the hyphen (U+2010).
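The “correct” behaviour described above can be illustrated with a toy line breaker in Python. This is only a sketch under simplifying assumptions (monospaced character counts instead of glyph advances, greedy breaking, breaks only at spaces and soft hyphens); the function name is made up and no real layout engine works exactly like this.

```python
SHY = "\u00ad"  # U+00AD SOFT HYPHEN


def wrap_soft_hyphens(text, width, hyphen="-"):
    """Greedy wrap: break at spaces, or at a soft hyphen when a word does
    not fit. A visible hyphen is emitted only where a break actually
    happens; unused soft hyphens produce no output at all."""
    lines, current = [], ""
    for word in text.split(" "):
        if not word:
            continue
        pieces = word.split(SHY)
        while True:
            whole = "".join(pieces)
            candidate = whole if not current else current + " " + whole
            if len(candidate) <= width:
                current = candidate
                break
            # Try the longest leading run of pieces that fits with a hyphen.
            placed = False
            for k in range(len(pieces) - 1, 0, -1):
                head = "".join(pieces[:k]) + hyphen
                candidate = head if not current else current + " " + head
                if len(candidate) <= width:
                    lines.append(candidate)
                    current, pieces, placed = "", pieces[k:], True
                    break
            if placed:
                continue
            if current:
                # Start a fresh line and retry the same word.
                lines.append(current)
                current = ""
                continue
            # Nothing fits even on an empty line: accept the overlong word.
            current = whole
            break
    if current:
        lines.append(current)
    return lines


text = "The town of Pen\u00adistone is in Yorkshire"
print(wrap_soft_hyphens(text, 16))
print(wrap_soft_hyphens(text, 40))
```

Note that the output never contains U+00AD itself, which is why a well-behaved application never needs a glyph for it.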
-
I think horizontalbar is intended to provide the functionality of 2em and 3em dashes. These were used in metal type to indicate that a word or part of a word was missing or omitted. Some people mention it having no sidebearing so it can be joined up to fill large spaces. I’m not sure if these were the same as a quote dash.
Felici shows the 3em dash being used to indent bibliography entries; see page 235. Indexes are sometimes indented this way. But most examples on my shelves use the em dash. Reed’s History of the Old English Letter Foundries uses what looks like two em dashes. The only 3em dash I found is in The Indic Scripts, Paleographic and Linguistic Perspectives, but it’s typeset digitally, so I think it’s three em dashes in succession.
My hypothesis is that 2em, 3em, and quote dashes were all rare in the metal type days, and easily confused, so they’re lumped into one Unicode entry.
-
Further to what @Khaled Hosny said about U+00AD being a control character, in the Unicode Standard, you will find this description in the chapter on punctuation:
Soft Hyphen. Despite its name, U+00AD soft hyphen is not a hyphen, but rather an invisible format character used to indicate optional intraword breaks. As described in Section 23.2, Layout Controls, its effect on the appearance of the text depends on the language and script used.
[Unicode Standard, version 8.0, p. 268]
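The “format character, not a hyphen” point can be checked against the Unicode character database that ships with Python. A small sketch (the `describe` helper is just a convenience wrapper around the standard `unicodedata` module):

```python
import unicodedata


def describe(codepoint):
    """Return (name, general category) for a codepoint."""
    ch = chr(codepoint)
    return unicodedata.name(ch), unicodedata.category(ch)


# Cf = format character, Pd = dash punctuation, Zs = space separator.
for cp in (0x00AD, 0x002D, 0x2010, 0x2014, 0x2015, 0x00A0):
    name, category = describe(cp)
    print(f"U+{cp:04X} {name}: {category}")
```

U+00AD comes out as Cf (format), whereas the hyphen, em dash and horizontal bar are all Pd (dash punctuation).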
-
The quotation dash is also the normal way, in Norwegian, of indicating quotes. They usually forgo other typical quote marks, and tend to open with a dash. As for how they do that, well, that’s another matter. A lot of newspapers and magazines tend to use the en dash, and in print I have seen one that could be the em dash but also plausibly the quotation dash. The real problem is that it can’t be typed normally, and none of my Norwegian peers have ever explicitly missed the glyph. Perhaps it’s another piece of Unicode optimism.
-
Because of lack of standard advice regarding the length of U+2015, I've tended to literally interpret the name HORIZONTAL BAR, and make it as wide as my /bar glyph is tall. Obviously, this needs the additional observation that my /bar glyph is always full height: extending from the descender to the ascender height (with overshoot), so close to /emdash length.
Interestingly, U+2015 is included in Windows codepage 1253 Greek. I've not checked to see whether it might be accessible from some deep level of the Greek keyboard layout.
-
> Interestingly, U+2015 is included in Windows codepage 1253 Greek. I've not checked to see whether it might be accessible from some deep level of the Greek keyboard layout.
A "find -exec grep" search tells me that x2015 is output by none of the standard keyboards that come with SIL Ukelele. On the other hand, if I search (from the right folder) for the character itself, which I can copy from the Character Viewer and paste into a terminal window, I get the following output:
% find . -iname "*.keylayout" -exec grep -l 'output="―"' {} \;
./Unicode.bundle/Contents/Resources/Greek Polytonic.keylayout
./Unicode.bundle/Contents/Resources/Greek.keylayout
You had thus guessed correctly.
-
> I've not checked to see whether it might be accessible from some deep level of the Greek keyboard layout.
Shift-Option-Q (―) on Mac Greek keyboard.
But I don’t actually see it on the Mac Greek Polytonic keyboard (unless it’s the function of some dead key that I’m not seeing.)
-
Thanks all for the input!
Kent Lew said:
> Further to what @Khaled Hosny said about U+00AD being a control character, in the Unicode Standard, you will find this description in the chapter on punctuation:
> Soft Hyphen. Despite its name, U+00AD soft hyphen is not a hyphen, but rather an invisible format character used to indicate optional intraword breaks. As described in Section 23.2, Layout Controls, its effect on the appearance of the text depends on the language and script used.
> [Unicode Standard, version 8.0, p. 268]
Re. quotation dash, I have typeset a grand total of one book that used quotation dashes (a translation from Norwegian whose editor opted to preserve this style), and have found that endashes seemed a little short and emdashes maybe a little long, so I could see how that would imply having a length somewhere in between. Wonder how much it would actually get used though.
Devil’s Advocate question: Is there any serious downside to just not having this character? It seems relatively uh, nonessential?
-
> Oh wait, but would that imply that it should better not be double encoded with hyphen but rather be present as a non printing character? (Sorry, confused)
You're not alone. I suspect it really doesn't matter at all what a font does with U+00AD: whether it includes it as a duplicate of the hyphen glyph, double-encodes the hyphen glyph, or doesn't support it in the glyph set at all. Any software that implements soft hyphen support is not going to be relying on the presence of a glyph, and won't be displaying a glyph if it is present, but will instead be using the presence of the control character in text to identify a preferred hyphenation point. If the text is hyphenated, then the appropriate hyphen glyph will be displayed (which will depend on the script and language involved).
That said, Khaled, do you have any experience with working with soft hyphen in text editing software in show-control-character mode? I wonder if this is a situation in which having a visible glyph — as we do for ZWJ and ZWNJ — might be useful?
-
Three type designers walked into a bar. The first two said, “I must dash,” and left without another word. The third muttered to himself—after a space—“What’s wrong with ’em?”
-
Nina — As I said, I only double-encode U+00AD in case there is a validation tool that needs the codepoint in order to permit indication of certain codepage coverages. I suspect, as John does, that it doesn’t actually matter what’s in the font or not.
-
My interpretation of the "representation" of U+00AD in the Unicode font chart http://www.unicode.org/charts/PDF/U0080.pdf is that the character has no associated glyph, like all the other characters whose representation is a dashed square, the kind of representation that is used in a fallback font.
-
> My interpretation of the "representation" of U+00AD in the Unicode font chart http://www.unicode.org/charts/PDF/U0080.pdf is that the character has no associated glyph, like all the other characters whose representation is a dashed square, the kind of representation that is used in a fallback font.
Take caution applying this interpretation to other blocks: a dotted square is also used in Unicode code charts to indicate an enclosing character, such as the first six codepoints in the Arabic block.
-
> Take caution applying this interpretation to other blocks
... and if I add that the “name” of the character is written in capital letters inside the dashed rectangle?
-
I think it could be useful to include the character.
This is a somewhat hypothetical premise, but I do recall in the past, working as a graphic designer, situations where a client or product name was broken in a strange place—they don’t like that. And I see words broken in ways that disrupt expectation daily in my newspaper.
So entering the discretionary hyphen in a document, once, will ensure that the break will be transported with the text when it is copied to other documents.
For some reason, I am reminded of a small Yorkshire town, Penistone (pronounced pen-iss-tun).
-
> So entering the discretionary hyphen in a document, once, will ensure that the break will be transported with the text when it is copied to other documents.
Right, but the point of control characters is that they don't need to be displayed under normal circumstances, and hence don't require glyphs to be present in the font. The soft-hyphen in a document is just a code in the text string.
-
Nick, in TeX, soft hyphens are written as \- in the source file and I have been using them for years be it only to force breaking a word that the hyphenation dictionary does not know. I guess that your Yorkshire Town's name break pattern is Pen\-istone. Fine. However, TeX fonts do not contain a soft hyphen.
The question is what is needed in the font to produce the final text. Clearly TeX or InDesign will eventually use those soft hyphens to decide where the word is to be broken at the end of a line. Does InDesign or LaTeX need a special character to put as a hyphen before the line break? No: the hyphen needs to look exactly like the other hyphens that were not triggered by a soft hyphen, and what is used in TeX is the hyphen character. The application needs to know the soft hyphen is there in the source file, but only needs a hyphen to do its job.
Can you give me the name of an application that will handle correctly soft hyphens in the source file if the soft hyphen is encoded in the font and that will break if it is not encoded in the font?
-
Penistone would be a terrible Crayola color name.
-
> Oh wait, but would that imply that it should better not be double encoded with hyphen but rather be present as a non printing character? (Sorry, confused)
For a well-behaved implementation, it does not matter what you put there, as it will be ignored anyway. The double encoding is needed for broken implementations that will try to use the font glyph (e.g. some versions of Google Chrome used the glyph for U+00AD when breaking the line at it, but I think this is fixed now).
> That said, Khaled, do you have any experience with working with soft hyphen in text editing software in show-control-character mode? I wonder if this is a situation in which having a visible glyph — as we do for ZWJ and ZWNJ — might be useful?
There are not that many applications that have this feature, but the two I’m familiar with, LibreOffice and Scribus, do not use the U+00AD glyph. Scribus actually just draws some predefined shapes for the few “invisible” characters it supports.
HarfBuzz, however, has an option to preserve Default_Ignorable characters which will just keep whatever glyph the font has for them, but I don’t know any applications that use it.
-
If I run the command (from the Mac font tools)
ftxinstalledfonts -f -U00AD
I get that (among others) Apple Chancery, Chalkboard, Chalkduster, Copperplate, Didot Italic, Hoefler Text and Skia do not have the character U+00AD. What can be the consequences?
I also checked that all the ttf and otf files for which ftxinstalledfonts -f -U00AD gives a YES answer have a contour or a reference for it (by checking U+00AD with the fontforge function isWorthOutputting)
-
> Penistone would be a terrible Crayola color name.
We once visited Peníscola, Spain. When I called Europcar to reserve a car there, the lady thought I was crank-calling.
-
> Scribus actually just draws some predefined shapes for the few “invisible” characters it supports.
InDesign also has its own predefined symbols for displaying invisible characters when Show Hidden Characters is activated, independent of what might be encoded in the font.
FWIW, TextWrangler does appear to actually display the encoded outline for discretionary hyphen U+00AD. So, in Nick’s example above, pasting such a text into TextWrangler makes the discretionary hyphen point visible. As far as I can tell, if one is not encoded, it will use a fallback font. (Not a good reason to draw one, I’m just reporting . . . )
But this is a text-processing and coding app, not a layout app. Most people don’t go around applying different fonts in TextWrangler. ;-)
-
Regarding the horizontal bar, I wanted to add that I have since learned that the U+2015 has a different linebreaking behavior in “at least some software” (as per Wikipedia Quotation Dash), in that automatic line breaks directly following this character are suppressed, unlike for the standard dashes. (Thanks to Frode for pointing this out.) I did some quick testing and it seems that Word (for Mac 2011) and TextEdit do appear to honor this distinction whereas current versions of InDesign and Illustrator do not. Hm.
The abovementioned Wikipedia link also names a whole row of languages that use quotation dashes: Bulgarian, French, Greek, Hungarian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Turkish, and Vietnamese. Still curious to hear in which of these (if any) use of the dedicated codepoint is frequent vs. people just using en/emdashes as seems to be the case for Norwegian.
-
> We once visited Peníscola, Spain.
That is why, in Florida, they changed the spelling, swapping the i with the s ;-)
-
Nina Stössinger said:
> [...] Abovementioned Wikipedia link also names a whole row of languages that use quotation dashes: Bulgarian, French, [...]
-
Nina Stössinger said:
> Still curious to hear in which of these (if any) use of the dedicated codepoint is frequent vs. people just using en/emdashes as seems to be the case for Norwegian.