Dutch IJ exceptions

Hoi!
I rarely outsource my type design research but I'm hitting a wall and would love to know if I'm the only one with this particular problem:
When it comes to creating a locl feature for the Dutch language to substitute /i/j with the digraph /ij, how do you handle exceptions to the rule, e.g. loan words like bijou, Fiji, or even the Dutch word bijectie etc? Do you turn a blind eye to this and let the user figure out how to disable the substitution? Do you make exceptions for the most common words? Did anyone come up with a rule that would take care of all exceptions neatly and efficiently without weighing down my font files?
Thanks in advance for your help!
BB

Comments

  • And: Yes, I agree, this shouldn't be handled at the font level, but to my knowledge applications don't care. But my users do.
  • Theunis de Jong
    edited July 2018
    Applications should care – through their users.

    The common method of manually breaking a ligature is to insert a Zero-Width Joiner between those characters. This will, for instance, typeset the – made up word – zalfindustrie 'ointment industry' as two separate words:
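
    [Image: four typeset samples of "zalfindustrie": default fi ligature; with ZWJ; with compound hyphen; with 1/6th space]
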
    The default use of the ligature in the top line encourages reading as "zalfin-dustrie". (I tried to come up with a funnier example on short notice, where the actual meaning depends on the reading. I might come back to that later.)

    The second line has a ZWJ, which IMO helps with reading. Of course you get that f/i clash in return, which was pretty much the point of having a ligature in the first place.

    The third and fourth lines suggest doing something else: compound hyphen, 1/6th space.

    Whatever the method a user prefers, I think this should definitely not be something to try and fix in the font file. The default form of your /ij should be "correct" for most regular cases, but in addition to that also "acceptable" for when it rather should not appear.

    It's up to the users – and their software – to prevent its use when inappropriate, but if they don't, you still should have a reasonable /ij.
  • Please CMIIW. Shouldn’t that be a ZWNJ, a zero-width non-joiner?
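
For reference, the text-level approach discussed here (a zero-width non-joiner, U+200C, between i and j) could be scripted roughly as follows. This is only a sketch under stated assumptions: the exception list and the function name `break_ij` are hypothetical, and a real list would be open-ended.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Hypothetical exception list, taken from words mentioned in this thread.
EXCEPTIONS = ("bijou", "bijous", "bijouterie", "bijectie", "Fiji", "Beijing")

_WORD = re.compile(r"\w+")
# Zero-width position between an i and a following j (either case).
_IJ_BOUNDARY = re.compile(r"(?<=[iI])(?=[jJ])")


def break_ij(text, exceptions=EXCEPTIONS):
    """Insert ZWNJ between i and j in listed exception words only."""
    lowered = {word.lower() for word in exceptions}

    def fix(match):
        word = match.group(0)
        if word.lower() in lowered:
            return _IJ_BOUNDARY.sub(ZWNJ, word)
        return word

    return _WORD.sub(fix, text)
```

Words that are not on the list, such as blij, pass through unchanged, so a font-level substitution would still apply to them.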
  • Given that features like an /ij ligature are usually not on by default, I'd trust a user who knows how to switch it on to know how to switch it off where needed, too...

    As for /fi, I remain convinced that it doesn't break compound word recognition, unless it is spectacularly badly designed, and that the prescription to avoid such cases is purely academic. I don't have research to back this up, though. Hyphens seem like the simplest solution to break such compounds. This doesn't work for /ij in «Fiji», of course...
  • John Hudson
    edited July 2018
    Given that features like an /ij ligature are usually not on by default, I'd trust a user who knows how to switch it on to know how to switch it off where needed, too... 

    That suggests implementation in a standard or discretionary ligature feature substitution; whereas when we've been asked to implement /ij/ substitution support by Dutch publishers they've preferred this to be a language system <ccmp> or <locl> substitution, i.e. behaviour linked to tagging of text as Dutch and not something that can be disabled by turning a feature off at the UI level.

    So disabling the substitution requires either a) intervention at the text level, whether by tagging a word as something other than Dutch or by inserting ZWNJ between the letters, or b) defining all exception words that might be expected to occur in Dutch text as contextual exceptions in the <ccmp> feature code. Obviously, the latter doesn't meet Bianca's requirement of 'neatly and efficiently without weighing down my font files', and the set of such words is open-ended and liable to change. [I am reminded of efforts to support Urdu newspaper setting with fonts that contained 20,000+ complete word ligatures. The system struggled to accommodate the rapid succession of Soviet leaders in the 1980s whose names needed to be transliterated into nastaliq. There will always be new foreign words and proper nouns being introduced into a language.]
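
To make option b) concrete, here is a minimal sketch of how such contextual exceptions might be generated as OpenType feature code. It assumes a `locl` feature under the `latn`/`NLD` language system, glyphs named after their letters, and a digraph glyph named `ij`; the function names are hypothetical, only lowercase i+j is handled, and the generated code has not been run through a feature compiler here.

```python
def glyph_contexts(word):
    """Yield one marked glyph sequence per lowercase i+j pair in word.

    Assumes each letter's glyph is named after the letter itself.
    """
    for pos in range(len(word) - 1):
        if word[pos:pos + 2] == "ij":
            glyphs = list(word)
            glyphs[pos] += "'"      # mark i as input
            glyphs[pos + 1] += "'"  # mark j as input
            yield " ".join(glyphs)


def dutch_ij_feature(exceptions):
    """Build feature code: one `ignore sub` per exception, then the rule."""
    lines = ["feature locl {", "    script latn;", "    language NLD;"]
    for word in exceptions:
        for context in glyph_contexts(word):
            lines.append(f"    ignore sub {context};")
    lines.append("    sub i' j' by ij;")
    lines.append("} locl;")
    return "\n".join(lines)


print(dutch_ij_feature(["bijou", "Fiji"]))
```

Every exception adds at least one rule, and the contexts are not word-bounded, which illustrates why this route grows with the word list rather than staying neat and compact.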

  • Michel Boyer
    edited July 2018
    I just downloaded the OpenOffice Dutch spelling dictionary and it contains 10254 entries with the ij digraph and 99 with IJ. That means that if a user types the word when the dictionary is active, the substitution is made in the source file. I think you can also ask for spell checking a selection.

    Doesn't InDesign use a similar dictionary? Why then would publishers ask for an /ij/ substitution in liga (or even dlig)?

  • @Michel Boyer The OpenOffice and LibreOffice Dutch spelling dictionaries do not distinguish between words with the single Dutch alphabet letter ij (as in blij, makkelijk, where ij represents one diphthong or vowel, [ɛi] or [ə]) and words with the two separate letters i and j (as in bijectie, Fiji, bijou, where ij represents a vowel followed by a consonant, [i] and [j] or [dʒ] or [ʒ]). They are all stored with the single characters ĳ (U+0133) and Ĳ (U+0132), which are only used in the dic file: the two-letter sequences ij (i+j) and IJ (I+J) are mapped to them when spellchecking.

    So those dictionaries aren’t directly helpful on the matter.



  • Michel Boyer
    edited July 2018
    Indeed, unfortunately the dictionary contains the following entries with the digraph ij.
    Bijou/PN
    bijouterie/Zb
    bijou/Ya
    bijous
    Beijing
    Fiji
    
    The problem is with the list, not with the error correcting program. Such dictionaries should be useful when corrected, no? The word list comes from https://www.opentaal.org/bestanden
  • James Puckett
    edited July 2018
    Given that ij is an obsolete/novelty glyph anyway perhaps it’s best to just make it a character variant so it’s easily accessed by software features rather than used automatically.
  • John Hudson
    edited July 2018
    Given that ij is an obsolete/novelty glyph anyway perhaps it’s best to just make it a character variant so it’s easily accessed by software features rather than used automatically.

    There are a couple of different aspects to the Dutch IJ/ij vowel.

    One is as a glyph that can take a distinctive visual form, especially in display typefaces. I suppose this could be considered a 'novelty glyph', although it gives Dutch display typography a distinctive national character.

    The other aspect is as a digraph letter with specific textual layout behaviours with regard to letterspacing (tracking), vertical text, and accentuation.

    I'm generally making text types for continuous reading, so the form of IJ/ij in my fonts is typically identical to a regularly spaced I+J and i+j combination; indeed, I build the IJ/ij glyphs as composites. But at least some Dutch publishers are concerned with those layout behaviours and have requested glyph level support for these as default for Dutch language processing.
  • Michel Boyer
    edited July 2018
    According to https://hunspell.github.io, the spell checker for LibreOffice, OpenOffice and InDesign is the same, namely Hunspell:
    Hunspell is the spell checker of LibreOffice, OpenOffice.org, Mozilla Firefox 3 & Thunderbird, Google Chrome, and it is also used by proprietary software packages, like macOS, InDesign, memoQ, Opera and SDL Trados.
    That list is far from exhaustive, as can be seen from https://en.wikipedia.org/wiki/Hunspell. The Hunspell site mentions the project "Dutchspell in Hunspell", sponsored by the OpenTaal Foundation and the Dutch Language Union. The word lists from OpenTaal for OSX products, Mozilla products and OpenOffice are UTF-8 encoded and use U+0132 and U+0133 for IJ and ij.

    It looks like U+0132 and U+0133 may soon be found in almost all texts written in Dutch.
  • Denis Moyogo Jacquerye
    edited July 2018
    @Michel Boyer As I have said earlier, U+0132 and U+0133 are only used inside the dictionary, they are converted to the digraphs IJ and ij when spellchecking. It will output words with i+j when correcting spelling, but keeps the words as is regardless of whether they use U+0133 or i+j when already spelled “correctly”.
    See the ICONV and OCONV definitions in the affix file.
  • Michel Boyer
    @Michel Boyer As I have said earlier, U+0132 and U+0133 are only used inside the dictionary, they are converted to the digraphs IJ and ij when spellchecking. It will output words with i+j when correcting spelling, but keeps the words as is regardless of whether they use U+0133 or i+j when already spelled “correctly”.
    See the ICONV and OCONV definitions in the affix file.
    It is indeed the case that the spell checker currently accepts both rĳbroek (with digraph) and rijbroek (without digraph); does your reading of the file tell you that this is to remain that way?
  • Denis Moyogo Jacquerye
    edited July 2018
    Yes, testing in LibreOffice, neither character sequence is flagged as incorrectly spelled. But if you swap letters, the correction has ij as two characters.
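
The behaviour described in the last few comments can be modelled with a toy sketch. This is not Hunspell's code or the actual Dutch affix file: the two conversion tables are assumptions about what ICONV and OCONV amount to, and the dictionary entries and the letter-swap `suggest` helper are hypothetical.

```python
# Assumed ICONV-style mapping: applied to input before dictionary lookup.
ICONV = {"ij": "\u0133", "IJ": "\u0132"}
# Assumed OCONV-style mapping: applied to suggestions before output.
OCONV = {"\u0133": "ij", "\u0132": "IJ"}

# Hypothetical dictionary entries, stored with the single character U+0133.
DICTIONARY = {"r\u0133broek", "bl\u0133"}


def convert(word, table):
    """Apply every replacement in the conversion table to the word."""
    for source, target in table.items():
        word = word.replace(source, target)
    return word


def is_correct(word):
    """Accept a word if its input-converted form is in the dictionary."""
    return convert(word, ICONV) in DICTIONARY


def suggest(word):
    """Suggest entries with the same letters in another order."""
    key = sorted(convert(word, ICONV))
    return sorted(convert(entry, OCONV)
                  for entry in DICTIONARY if sorted(entry) == key)
```

In this model both spellings of rijbroek pass the check, while a suggestion for a misspelling comes back with i and j as two characters.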