Avoiding hyphenation of digraphs through OpenType features

I am trying to write a feature to avoid the hyphenation of certain digraphs in the Basque language (e.g. tx, tz, dd, tt...). Hyphenation plug-ins don't work very well for Basque, and breaking those digraphs apart is considered unacceptable in a formal text. I thought it could be solved through localized substitutions (locl) of these combinations by their respective ligatures. Apparently I was wrong. Ligatures seem to substitute the digraphs at the presentation level, but they are still encoded as two letters (which makes sense, on the other hand) and they remain subject to hyphenation when the edge of the column is close enough.

I also considered using the word joiner (U+2060) or the zero width joiner (U+200D) to keep the two letters together, but a) it would involve a third glyph and I don't know if such a substitution is possible (I mean, substituting two glyphs by three), and b) if the former is possible, would it remain the same word, a different word or two distinct words?

Comments

  • Nick Shinn
    Nick Shinn Posts: 2,225
    Does (rlig) behave differently than (liga) in this respect?
  • Kent Lew
    Kent Lew Posts: 959
    I can understand your frustration with existing Basque hyphenation resources, but really hyphenation dictionaries/rules are the proper avenue for solution.

    Hyphenation happens at the text-processing level. It really shouldn’t be attempted at the font level.

    That said, in response to your specific questions:

    a) It might be possible to insert a wordjoiner U+2060 into a sequence, but you would have to be tricky about it. There are one-to-many substitutions, but I don't think you can contextualize that. So, you would probably have to use a global substitution and then contextually undo that substitution for the majority of situations. Maybe something like this:

    feature ccmp {
        script latn;
        language EUQ;
        sub t by t wordjoiner;
        sub t' wordjoiner' @nondigraph by t;
    } ccmp;

    where that @nondigraph class includes all glyphs that do *not* form an unbreakable digraph with t.

    (This is untested. I can’t say with confidence that it would work in all environments, but I believe it would compile as legitimate code, albeit probably pretty bloated.)

    b) Assuming that these substitutions worked in theory, I can’t say whether text rendering environments would interpret it and respect the hack you are trying to implement.

    Note that the non-breaking quality is a function of the character codepoint, not the glyph itself. So it would depend upon where in the process the substitution is happening and whether the underlying codepoint that corresponds with the substituted glyph is being presented to the layout engine at that point in order to interpret the non-breaking quality of that inserted wordjoiner. (I have a feeling probably not.)

    You might read more in the Unicode Standard about the specific control characters you are contemplating. Check out Section 23.2 on Line and Word Breaking.
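
To illustrate the point about codepoints, here is a minimal Python sketch of doing the insertion in the text itself rather than in the font. The helper name `protect_digraphs` and the digraph list (taken from the examples in the original question) are assumptions for illustration. Whether a given layout application's hyphenator honours U+2060 is not something this sketch can show.

```python
import re
import unicodedata

# U+2060 is a character in the text stream, not a glyph in the font.
WORD_JOINER = "\u2060"

# Assumption: only the digraphs named in the original question.
DIGRAPHS = ("tx", "tz", "dd", "tt")

# One pass over the text, so replacements never overlap.
_DIGRAPH_RE = re.compile("|".join(DIGRAPHS))


def protect_digraphs(text):
    """Insert U+2060 between the two letters of each listed digraph."""
    return _DIGRAPH_RE.sub(
        lambda m: m.group()[0] + WORD_JOINER + m.group()[1], text
    )


if __name__ == "__main__":
    protected = protect_digraphs("etxea")
    # The string now holds one extra codepoint between t and x.
    print([unicodedata.name(ch) for ch in protected])
```

Because the joiner is a real character here, the layout engine sees it before any glyph substitution takes place, which is the distinction drawn above.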
  • @Nick Shinn @John Hudson
    "rlig" does not solve the problem. Not in InDesign at least, which is the software we are using to test the font, and the one most likely to be used by potential users here. It behaves the same way other kinds of ligatures do in this respect.

    @Kent Lew @John Hudson
    I agree, this issue should be addressed at the character-processing and line-layout level. There have actually been some attempts but, besides not being reliable enough, the range of software they cover is pretty narrow (e.g. InDesign versions from CS6 on are not covered). I was looking for a provisional solution to circumvent a very common problem among graphic designers laying out Basque texts.

    As John anticipated, the trick of inserting the wordjoiner U+2060 did not work. It seems the solution will have to come from the developers of the plug-ins and a good management and implementation of hyphenation dictionaries.

    Anyway, thank you very much for your comprehensive and informative replies.
  • Kent Lew
    Kent Lew Posts: 959
    @John Hudson That’s what I suspected. I figured you’d chime in to confirm. ;-)
  • Nick Shinn
    Nick Shinn Posts: 2,225
    edited June 2018
    How about if you made alternates for these combinations with reeeeeally wide right sidebearings for the left glyph, then compensated by kerning—would the wide glyphs prevent breaking by exceeding the line length? Just an idea—I haven’t experimented.
  • Nick Shinn said:
    How about if you made alternates for these combinations with reeeeeally wide right sidebearings for the left glyph, then compensated by kerning—would the wide glyphs prevent breaking by exceeding the line length? Just an idea—I haven’t experimented.

    Sounds pretty damn smart to me
  • @Nick Shinn
    I will definitely try and let you know the result :-) Many thanks!
  • Craig Eliason
    Craig Eliason Posts: 1,441
    edited June 2018
    Nick Shinn said:
    How about if you made alternates for these combinations with reeeeeally wide right sidebearings for the left glyph, then compensated by kerning—would the wide glyphs prevent breaking by exceeding the line length? Just an idea—I haven’t experimented.

    The hack of all hacks!
  • I just had an idea for a funny hack. To prevent hyphenation between t and x, substitute the t with a t_x and the x with an empty (zero-width) glyph. That way it would look like it breaks after the x.
  • @Nick Shinn and @Georg Seifert, thanks a lot for such clever suggestions. I tried both, but unfortunately neither of them worked. The text flow behaves exactly as if the original characters were there.
  • Juan said:

    I am trying to write a feature to avoid the hyphenation of certain digraphs in Basque language (e.g. tx, tz, dd, tt...). Hyphenation plug-ins don't work very well for Basque and disconnecting those digraphs is considered unacceptable in a formal text. ...

    Juan, I have a working Hunspell set (spelling and hyphenation) I use in ID. It's sort of a pita adding them to ID, but if you already have the working dictionary/spelling parts, it shouldn't be hard to add the hyphenation part.

    Now, not being a native Basque speaker, I am uncertain how well it works in wider usage than I have done (everything gets run past editors and I don't always see the results of edits).

    If you want a Zipped bundle, let me know and I can get it to you.

    Mike
  • I just tried this. And you are right, my suggestion doesn't work in InDesign.

    But then I tried something else. I made a stylistic set with much smaller letters. But the line breaks are calculated from the default glyphs and not from the shorter alternates. In my case, the stylistic alternates would have made the word fit the line easily, but it would still break. I know that this is very difficult to compute because the OpenType feature changes the context that was its basis. But it is still disappointing.
  • Nick Shinn
    Nick Shinn Posts: 2,225
    The method I suggested does work in InDesign.
    The trick is to give both the default left letter glyph and the default hyphen extra wide right sidebearings.

    I’m not sure how robust this method is in other applications, or what the optimum width of the super-wide glyphs should be.

  • Kent Lew
    Kent Lew Posts: 959
    @Igor Freiberger — I had been thinking a GREP style in InDesign might do the trick. Nice.

    But I think I might see a typo in your GREP. If I’m not mistaken, the first closing parenthesis and bracket are transposed. I believe the bracket needs to close first and then the parenthesis.

    And there might be an extraneous right parenthesis in the final pattern.


    I think maybe you meant it to be:
    (?<=[a|e|i|o|u|ü])[dd|ll|rr|ts|tt|tx|tz](?=[a|e|i|o|u|ü])
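
A side note on the pattern above, offered as a hedged sketch: in most regex flavours square brackets denote a character class, so `[dd|ll|...]` matches one character from the set rather than a whole digraph. The Python snippet below contrasts that with a grouped alternation. It uses Python's `re` module, not InDesign's GREP engine, so behaviour there may differ, and the names `BETWEEN_VOWELS`, `CHAR_CLASS` and `find_digraphs` are hypothetical.

```python
import re

VOWELS = "aeiouü"
DIGRAPHS = ("dd", "ll", "rr", "ts", "tt", "tx", "tz")

# Grouped alternation: matches a whole digraph, but only between vowels.
BETWEEN_VOWELS = re.compile(
    "(?<=[" + VOWELS + "])(?:" + "|".join(DIGRAPHS) + ")(?=[" + VOWELS + "])"
)

# Bracketed form as written in the thread: in Python this is a character
# class, i.e. a single character out of d, l, r, t, s, x, z or "|".
CHAR_CLASS = re.compile(r"[dd|ll|rr|ts|tt|tx|tz]")


def find_digraphs(text):
    """Return (offset, digraph) pairs for digraphs flanked by vowels."""
    return [(m.start(), m.group()) for m in BETWEEN_VOWELS.finditer(text)]


if __name__ == "__main__":
    print(find_digraphs("etxea"))    # digraph found as one unit
    print(CHAR_CLASS.findall("tx"))  # two separate one-character matches
```

If the GREP style is meant to apply "No Break" to the digraph as a unit, the grouped form is the one that selects both letters together in this flavour.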

  • notdef
    notdef Posts: 168
    @Nick Shinn Probably looks interesting with an underline!
  • Thanks @Igor Freiberger and @Kent Lew for the GREP approach. I did not know it could be so powerful. I will explore it.
    The trick is to give both the default left letter glyph and the default hyphen extra wide right sidebearings.
    @Nick Shinn do you mean increasing the right sidebearing of the default hyphen or of an alternate hyphen?
  • Nick Shinn
    Nick Shinn Posts: 2,225
    The default. Because as John Hudson has noted: 
    Hyphenation happens at the character processing and line-layout level,