How do you want your text today?

The video of my 2024 Unicode Technology Workshop presentation is now online. Due to a last-minute situation, I wasn’t able to attend the conference in person, but Unicode kindly invited me to record my talk for inclusion with the other videos.


The presentation introduces the concept of ‘text modes’: contextual rules for inserting Unicode formatting control characters, such as zero-width joiner (ZWJ), zero-width non-joiner (ZWNJ), or variation selectors, into text. These rules could operate as a buffered state between document encoding and local display, and would provide a means to apply governmental standards, publishing house styles, community norms, or individual reader preferences.

The examples in the presentation focus on the use of ZWJ and ZWNJ in Indic scripts, with particular attention to Sinhala and Malayalam.
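
To make the idea concrete, here is a minimal sketch of a text mode as a rule table applied to a string. It is not taken from the presentation: the function name `apply_text_mode`, the rule format, and the Devanagari pair used in the example are all assumptions chosen for illustration. The sketch inserts a control character after a virama only where the document has not already encoded one.

```python
import re

ZWJ = "\u200D"   # ZERO WIDTH JOINER
ZWNJ = "\u200C"  # ZERO WIDTH NON-JOINER


def apply_text_mode(text, virama, rules):
    """Insert a joiner control after `virama` according to `rules`.

    `rules` (a hypothetical format) maps a (first, second) consonant pair
    to the control character to insert between the virama and the second
    consonant. Pairs not listed are left as encoded, and so are sequences
    that already carry ZWJ or ZWNJ, because the character after the virama
    is then the control itself and matches no rule.
    """
    # The lookahead leaves the second consonant unconsumed, so it can act
    # as the first consonant of a following conjunct.
    pattern = re.compile(f"(.){re.escape(virama)}(?=(.))")

    def insert_control(match):
        first, second = match.group(1), match.group(2)
        return first + virama + rules.get((first, second), "")

    return pattern.sub(insert_control, text)


# Illustrative house style: request a half form for Devanagari KA + SSA.
DEVA_VIRAMA = "\u094D"
KA, SSA = "\u0915", "\u0937"
house_style = {(KA, SSA): ZWJ}

styled = apply_text_mode(KA + DEVA_VIRAMA + SSA, DEVA_VIRAMA, house_style)
untouched = apply_text_mode(KA + DEVA_VIRAMA + ZWNJ + SSA, DEVA_VIRAMA, house_style)
```

Because the rules are data rather than code, a government standard, a publisher's style, or a reader preference could each be a different table passed to the same function.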

Unicode have not made comments open for the video on YouTube, so I have started this thread to invite feedback and suggestions regarding text modes or other aspects of the presentation.

Comments

  • Thanks for a very informative and useful presentation.

    I had a similar problem with Bengali font development.
    Bengali also has a traditional orthography containing opaque conjuncts, visible in newspapers, story books, and road signs, and a modern version, used in school books since the beginning of this millennium in the schools of West Bengal and Bangladesh, as per the recommendation of the Bangla Academy for an overhaul of Bangla spelling.
    So, to create material for school children who are not exposed to traditional conjuncts, a font needs to support both traditional and modern conjuncts.
    Creating modern conjuncts using ZWJ is difficult in Bengali because, unlike in Devanagari, the second consonant also changes form to a below-base form.
    While creating my monospace Bengali font in 2002, I tried to create a contextual lookup where the base consonant changes shape to a below-base form following a half form.

    However, for my other font I decided to keep the modern form as a stylistic alternate of the traditional form; that way there is no need to insert ZWNJ or ZWJ.

    That way one can select the stylistic alternate in LibreOffice, or set CSS for web typography, to show the modern forms.

    Now, which is desirable: a Unicode solution, or a typographic one using the "salt" feature?
  • John Hudson
    I find the ‘modern version’ introduced in education very problematic. As you observe, the standard form of the script is used in newspapers and on road signs, and also in both fiction and non-fiction books, and in a lot of online and broadcast media. Teaching children something different is not preparing them to be able to read in the world outside of school, and only means they will need to learn a different set of forms in order to operate in that world. [I’ll also note that introducing different forms for conjuncts introduces problems for optical character recognition.]

    To consider your specific questions, though...

    The behaviour of ZWNJ in Bengali to force explicit hasanta (hôsônto) is defined in Unicode; however, no general behaviour is specified for ZWJ in conjunct display (with the special exception of using ZWJ before the hasanta to trigger yaphala in the sequence র‍্য). This is because Bengali script traditionally does not have anything like the Devanagari half form mechanism: conjuncts are either written with ligatures or with explicit hasanta. Some font makers introduced small prescript forms as a kind of Bengali half form, but these are a novelty, and the publishers and broadcasters I have worked with rejected them as a Devanagarisation of the writing system. So this is not standard behaviour defined in the Unicode Standard:


    That said, I believe the bng2 OpenType Layout shaping is built on top of the model of dev2 shaping, so if you have a ‘half’ feature in a Bengali font implementing this kind of small prescript form substitution, then it is quite likely that it will be triggerable using ZWJ after hasanta as in Devanagari.

    So these aspects of traditional versus various kinds of reformed, novel, or pedagogical Bengali could be handled at the formatting control character level using ZWNJ and ZWJ, even though the latter is not official and would be considered a hack.
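
As a rough illustration of the three encodings discussed above, the following sketch spells them out code point by code point using Python's standard `unicodedata` module. The choice of KA as the example consonant and the helper name `describe` are assumptions made for the example; how a given font and shaping engine render the third sequence is not guaranteed.

```python
import unicodedata

ZWJ = "\u200D"      # ZERO WIDTH JOINER
ZWNJ = "\u200C"     # ZERO WIDTH NON-JOINER
HASANTA = "\u09CD"  # BENGALI SIGN VIRAMA
KA, RA, YA = "\u0995", "\u09B0", "\u09AF"


def describe(sequence):
    """Return one 'U+XXXX NAME' string per code point in the sequence."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in sequence]


# ZWNJ after hasanta: explicit hasanta, the behaviour defined in Unicode.
explicit_hasanta = KA + HASANTA + ZWNJ + KA

# ZWJ before hasanta: the special case that triggers yaphala after RA.
ra_yaphala = RA + ZWJ + HASANTA + YA

# ZWJ after hasanta: the Devanagari-style half-form request, which the post
# describes as unofficial for Bengali and dependent on the font's 'half' feature.
half_form_hack = KA + HASANTA + ZWJ + KA

for label, seq in [("explicit hasanta", explicit_hasanta),
                   ("ra + yaphala", ra_yaphala),
                   ("half-form hack", half_form_hack)]:
    print(label)
    for line in describe(seq):
        print("   ", line)
```

The only difference between the second and third sequences is whether the joiner sits before or after the hasanta, which is easy to miss in an editor because the control characters are invisible.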

    However, the distinction between different forms of ligatures, between e.g.

    and

    is not something that can be managed using formatting control characters. That needs to be a discretionary selection at the glyph processing level applied by the user.

    My recommendation would be to class the various kinds of forms you want to apply together, and put them into one of the ssXX stylistic set features. You might have a stylistic set feature that includes all of the ‘modern version’ educational forms and shaping for example, so that someone could switch between standard and this other form using a single toggle. Or you could split the forms and behaviour across multiple stylistic set features based on the kinds of visual shaping in each, e.g. to be able to distinguish
    from  in different features.

    Note that this might mean using different GSUB lookup types within the same feature, for one-to-one or one-to-many substitution.
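
For web typography, a stylistic set of this kind would typically be switched on with the CSS `font-feature-settings` property. The sketch below is only a small Python helper that builds such a rule as a string; the helper name `feature_css`, the selector `.bn-school`, and the assignment of the educational forms to `ss01` are all hypothetical, since the actual feature tag depends on how the font is built.

```python
def feature_css(selector, features):
    """Build a CSS rule that switches OpenType features on or off.

    `features` maps a four-character OpenType feature tag to True or False.
    """
    parts = []
    for tag, enabled in features.items():
        if len(tag) != 4:
            raise ValueError(f"OpenType feature tags are four characters: {tag!r}")
        parts.append(f'"{tag}" {1 if enabled else 0}')
    return f"{selector} {{ font-feature-settings: {', '.join(parts)}; }}"


# Hypothetical: the font exposes the 'modern version' forms as ss01.
rule = feature_css(".bn-school", {"ss01": True})
print(rule)
```

The same helper works for any other discretionary feature tag, including `salt`, so a page could offer readers a single toggle between the standard and educational forms.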

    It is also worth noting that the xxx2 Indic shaping models all apply a rule derived from modern Hindi practice in Devanagari, in which repha and ikar are not reordered past an explicit virama. Since reordering happens at a specific stage in OTL processing, which precedes the stage at which discretionary features such as stylistic sets are applied, this means that you would get different results introducing explicit hasanta using ZWNJ vs using a stylistic set feature. So, for example, the standard ligature form vs form with explicit hasanta forced using ZWNJ:

    but possibly this form if one were to decompose the ligature to an explicit hasanta sequence in a post-reordering stylistic set feature:

    [This is the reason why things like the BIS Hindi recommendations for Devanagari cannot be implemented via a stylistic set.]
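
The ordering issue can be sketched with a toy model. This is not a real shaping engine: the token names, the three stage functions, and the single ka + ta conjunct are assumptions made for illustration, and the only rule modelled is the one stated above, that ikar is not reordered past an explicit hasanta.

```python
VIRAMA, ZWNJ, IKAR = "virama", "ZWNJ", "i"
CONSONANTS = {"ka", "ta"}


def reorder_ikar(tokens):
    """Toy reordering stage: move ikar to the front of its cluster, but not
    past an explicit virama (one followed by ZWNJ). ZWNJ is then dropped."""
    out = list(tokens)
    if IKAR in out:
        pos = out.index(IKAR)
        out.pop(pos)
        start = pos - 1  # the base consonant the vowel sign follows
        # Walk left over consonant + virama pairs; a ZWNJ breaks the walk.
        while start >= 2 and out[start - 1] == VIRAMA:
            start -= 2
        out.insert(start, IKAR)
    return [t for t in out if t != ZWNJ]


def ligate(tokens):
    """Toy conjunct stage: consonant + virama + consonant becomes one glyph."""
    out, i = [], 0
    while i < len(tokens):
        if (i + 2 < len(tokens) and tokens[i] in CONSONANTS
                and tokens[i + 1] == VIRAMA and tokens[i + 2] in CONSONANTS):
            out.append(tokens[i] + "_" + tokens[i + 2])
            i += 3
        else:
            out.append(tokens[i])
            i += 1
    return out


def stylistic_set(tokens):
    """Toy discretionary stage, applied after reordering: decompose ligatures
    back into an explicit virama sequence."""
    out = []
    for token in tokens:
        if "_" in token:
            first, second = token.split("_")
            out += [first, VIRAMA, second]
        else:
            out.append(token)
    return out


# Explicit hasanta requested in the text with ZWNJ.
zwnj_path = ligate(reorder_ikar(["ka", VIRAMA, ZWNJ, "ta", IKAR]))

# Explicit hasanta produced late by a stylistic set feature.
ss_path = stylistic_set(ligate(reorder_ikar(["ka", VIRAMA, "ta", IKAR])))

print(zwnj_path)  # ['ka', 'virama', 'i', 'ta']
print(ss_path)    # ['i', 'ka', 'virama', 'ta']
```

In the ZWNJ path the vowel sign stays next to the second consonant, while in the stylistic set path it has already moved to the front of the whole conjunct before the ligature is decomposed, so the two approaches end with different glyph orders.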

  • But what about contextually changing base glyphs to a below-base form when they follow an explicit half form, as in my first example? Is that a valid solution?
  • Sorry, you have already answered that:

    "So these aspects of traditional versus various kinds of reformed, novel, or pedagogical Bengali could be handled at the formatting control character level using ZWNJ and ZWJ, even though the latter is not official and would be considered a hack."