Avoiding hyphenation of digraphs through OpenType features
I am trying to write a feature to avoid the hyphenation of certain digraphs in the Basque language (e.g. tx, tz, dd, tt...). Hyphenation plug-ins don't work very well for Basque, and splitting those digraphs is considered unacceptable in a formal text. I thought it could be solved through localized substitutions (locl) of these combinations by their respective ligatures. Apparently I was wrong. Ligatures seem to substitute the digraphs at the presentation level, but they are still encoded as two letters (which makes sense, on the other hand), and they remain subject to hyphenation when the edge of the column is close enough.
I also considered using the word joiner (U+2060) or the zero width joiner (U+200D) to keep the two letters together, but a) it would involve a third glyph, and I don't know if such a substitution is possible (I mean, substituting two glyphs by three), and b) if that is possible, would the result remain the same word, become a different word, or become two distinct words?
Comments
Does 'rlig' behave differently than 'liga' in this respect?
I can understand your frustration with existing Basque hyphenation resources, but hyphenation dictionaries/rules really are the proper avenue for a solution. Hyphenation happens at the text-processing level. It really shouldn’t be attempted at the font level.

That said, in response to your specific questions:

a) It might be possible to insert a wordjoiner U+2060 into a sequence, but you would have to be tricky about it. There are one-to-many substitutions, but I don't think you can contextualize them. So you would probably have to use a global substitution and then contextually undo it in the majority of situations. Maybe something like this:
    feature ccmp {
        script latn;
        language EUQ;
        sub t by t wordjoiner;
        sub t' wordjoiner' @nondigraph by t;
    } ccmp;
where the @nondigraph class includes all glyphs that do *not* form an unbreakable digraph with t. (This is untested. I can’t say with confidence that it would work in all environments, but I believe it would compile as legitimate code, albeit probably pretty bloated.)

b) Assuming these substitutions worked in theory, I can’t say whether text rendering environments would interpret them and respect the hack you are trying to implement.

Note that the non-breaking quality is a function of the character code point, not of the glyph itself. So it would depend on where in the process the substitution happens, and on whether the underlying code point corresponding to the substituted glyph is presented to the layout engine at that point, so that the engine can recognize the non-breaking quality of the inserted word joiner. (I have a feeling probably not.)

You might read more in the Unicode Standard about the specific control characters you are contemplating. Check out Section 23.2 on Line and Word Breaking.
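To make the logic of the two ccmp rules above concrete, here is a toy Python model that applies them as two passes over a list of glyph names. It is not a shaping engine, and the `T_DIGRAPH_PARTNERS` set (s, t, x, z, i.e. the complement of the hypothetical `@nondigraph` class) is an assumption for illustration only. Note that the model only ever rearranges glyph names; the character string that a line breaker or hyphenator works from is never touched, which is exactly the limitation discussed in this thread.

```python
# Toy model of the two ccmp rules, applied as two passes over glyph names.
# NOT a real OpenType shaper: lookup flags, coverage tables etc. are ignored.

# Assumption: letters that form an unbreakable digraph with a preceding "t"
# (ts, tt, tx, tz). Every other glyph plays the role of @nondigraph.
T_DIGRAPH_PARTNERS = {"s", "t", "x", "z"}


def apply_ccmp(glyphs):
    # Pass 1, "sub t by t wordjoiner;": insert a wordjoiner after every t.
    inserted = []
    for g in glyphs:
        inserted.append(g)
        if g == "t":
            inserted.append("wordjoiner")

    # Pass 2, "sub t' wordjoiner' @nondigraph by t;": undo the insertion
    # wherever the following glyph does not complete a digraph with t.
    result = []
    for i, g in enumerate(inserted):
        if g == "wordjoiner":
            nxt = inserted[i + 1] if i + 1 < len(inserted) else None
            if nxt is not None and nxt not in T_DIGRAPH_PARTNERS:
                continue  # @nondigraph context matched: drop the wordjoiner
        result.append(g)
    return result


print(apply_ccmp(list("etxea")))  # ['e', 't', 'wordjoiner', 'x', 'e', 'a']
print(apply_ccmp(list("atea")))   # ['a', 't', 'e', 'a']
# At the end of a run there is no following glyph to match @nondigraph,
# so the wordjoiner survives (one source of the "bloat" mentioned above).
print(apply_ccmp(list("bat")))    # ['b', 'a', 't', 'wordjoiner']
```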
Hyphenation happens at the character processing and line-layout level, and there isn't much you can reliably do to influence it at the glyph processing level. You could try applying an 'rlig' feature substitution, which might be enough to prevent breaking at line boundaries in some software, but I wouldn't bet on it and I would be very doubtful indeed that it would work everywhere.
@Kent Lew: "It might be possible to insert a wordjoiner U+2060 into a sequence..."

It's possible to insert a wordjoiner glyph into the run using GSUB, as you suggest, but that isn't the same as inserting the U+2060 character into the text string. The hyphenation engine doesn't get feedback from the glyph level.
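The distinction drawn here, between a glyph in the run and a character in the text string, can be illustrated by doing the insertion on the text itself, before it reaches the layout application. The sketch below is a hypothetical Python pre-processing step; the digraph list comes from this thread, and `join_digraphs` is an illustrative name. It assumes that the target application's line breaker and hyphenator honor U+2060 WORD JOINER, which the Unicode line-breaking rules (UAX #14) define as prohibiting a break on either side; whether InDesign's hyphenation actually respects it is not something this sketch can confirm.

```python
import re

WORD_JOINER = "\u2060"  # U+2060 WORD JOINER: a character, not a glyph

# Digraphs mentioned in the thread (lowercase only in this sketch).
DIGRAPHS = ("dd", "ll", "rr", "ts", "tt", "tx", "tz")
DIGRAPH_RE = re.compile("|".join(DIGRAPHS))


def join_digraphs(text: str) -> str:
    """Insert U+2060 between the two letters of every listed digraph."""
    return DIGRAPH_RE.sub(
        lambda m: m.group(0)[0] + WORD_JOINER + m.group(0)[1], text
    )


marked = join_digraphs("etxea hitza")
# Escape the invisible joiner so it shows up when printed.
print(marked.encode("unicode_escape").decode("ascii"))  # et\u2060xea hit\u2060za
```

One trade-off of working at this level: the joiner becomes part of the text content, so search, spell-checking and copy/paste may all see it.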
@Nick Shinn @John Hudson
"rlig" does not solve the problem, at least not in InDesign, which is the software we are using to test the font and the one most likely to be used by potential users here. It behaves the same way other kinds of ligatures do in this respect.
@Kent Lew @John Hudson
I agree, this issue should be addressed at the character-processing and line-layout level. There are actually some attempts, but besides not being reliable enough, they cover only a narrow range of software (e.g. InDesign versions from CS6 on are not covered). I was looking for a provisional solution to circumvent a very common problem among graphic designers laying out Basque texts.
As John anticipated, the trick of inserting the wordjoiner U+2060 did not work. It seems the solution will have to come from the plug-in developers, together with good management and implementation of hyphenation dictionaries.
Anyway, thank you very much for your comprehensive and informative replies.
@John Hudson That’s what I suspected. I figured you’d chime in to confirm. ;-)
How about if you made alternates for these combinations with reeeeeally wide right sidebearings for the left glyph, then compensated by kerning? Would the wide glyphs prevent breaking by exceeding the line length? Just an idea; I haven’t experimented.
Nick Shinn said: "How about if you made alternates for these combinations with reeeeeally wide right sidebearings for the left glyph, then compensated by kerning? Would the wide glyphs prevent breaking by exceeding the line length? Just an idea; I haven’t experimented."
Sounds pretty damn smart to me.
@Nick Shinn
I will definitely try and let you know the result :-) Many thanks!
I just had an idea for a funny hack. To prevent hyphenation between t and x, substitute the t with a t_x and the x with an empty (zero-width) glyph. That way it would look like it breaks after the x.
@Nick Shinn and @Georg Seifert, thanks a lot for such clever suggestions. I tried both, but unfortunately neither of them worked. The text flows exactly as if the original characters were there.
Juan: as you are using InDesign, the better solution seems to be a character style with the no-break attribute, triggered through a GREP style in the paragraph style. My interface is in Portuguese, but you can easily find the matching controls by their positions.
Character style:
Paragraph style:
(As the InDesign interface does not show the whole GREP code, I copied it in red.)
The GREP code means: find a digraph (dd or ll or rr or ts or tt or tx or tz) preceded by a vowel (?<=) and followed by a vowel (?=), and apply the Nobreak style to it.
I built it supposing that digraphs in Basque are always preceded and followed by vowels. If this is not true, the GREP code could be improved, but the idea is the same, and you don't need to edit the font.
Juan said:
I am trying to write a feature to avoid the hyphenation of certain digraphs in Basque language (e.g. tx, tz, dd, tt...). Hyphenation plug-ins don't work very well for Basque and disconnecting those digraphs is considered unacceptable in a formal text. ...
Now, not being a native Basque speaker, I am uncertain how well it works in usage wider than my own (everything gets run past editors and I don't always see the results of edits).
If you want a Zipped bundle, let me know and I can get it to you.
Mike
I just tried this. And you are right, my suggestion doesn't work in InDesign.
But then I tried something else. I made a stylistic set with much smaller letters. But the line breaks are calculated from the default glyphs, not from the narrower alternates. In my case, the stylistic alternates would have made the word fit the line easily, but it would still break. I know that this is very difficult to compute, because the OpenType feature changes the context that was its basis. But it is still disappointing.
The method I suggested does work in InDesign.
The trick is to give both the default left letter glyph and the default hyphen extra wide right sidebearings.
I’m not sure how robust this method is in other applications, or what the optimum width of the super-wide glyphs should be.
@Igor Freiberger: I had been thinking a GREP style in InDesign might do the trick. Nice.

But I think I might see a typo in your GREP. If I’m not mistaken, the first closing parenthesis and bracket are transposed. I believe the bracket needs to close first and then the parenthesis. And there might be an extraneous right parenthesis in the final pattern. I think maybe you meant it to be:
(?<=[aeiouü])(dd|ll|rr|ts|tt|tx|tz)(?=[aeiouü])
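InDesign's GREP flavor differs from Python's in some details, but the lookbehind/lookahead syntax is close enough that Python's `re` module is a convenient way to sanity-check a pattern like the one above against sample words. One point worth noting for any version of it: in standard regex syntax, square brackets define a class that matches a single character, so the digraph alternatives need a group `( … )` rather than `[ … ]`. The function name and the sample words below are arbitrary test scaffolding, not part of the InDesign setup.

```python
import re

# Python rendering of the GREP style: a listed digraph between two vowels.
# (...) groups the alternatives; [...] would instead match any ONE of the
# characters inside it (d, l, r, t, s, x, z or "|").
NOBREAK_RE = re.compile(r"(?<=[aeiouü])(dd|ll|rr|ts|tt|tx|tz)(?=[aeiouü])")


def protected_digraphs(word: str):
    """Return (start, end, digraph) for each span the Nobreak style would cover."""
    return [(m.start(), m.end(), m.group(1)) for m in NOBREAK_RE.finditer(word)]


for w in ["etxea", "hitza", "kalle", "txakur"]:
    print(w, protected_digraphs(w))
```

The word-initial "tx" in "txakur" is not matched, which is harmless because no break can occur before it; digraphs adjacent to consonants would need a broader pattern, as Igor notes.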
@Nick Shinn Probably looks interesting with an underline!
Thanks @Igor Freiberger and @Kent Lew for the GREP approach. I did not know it could be so powerful. I will explore it.

"The trick is to give both the default left letter glyph and the default hyphen extra wide right sidebearings."

@Nick Shinn do you mean increasing the right sidebearing of the default hyphen or of an alternate hyphen?
The default. Because, as John Hudson has noted, "Hyphenation happens at the character processing and line-layout level."