Language-specific considerations/best practices

Daniel Benjamin Miller
edited January 2023 in Technique and Theory
Most fonts are designed for the Latin script, but many do not support certain diacritics and extensions which are necessary for certain languages. Or, even when they do, the appearance is not always in line with the expectations of readers of that language. Even many professionally designed fonts have issues with certain characters.
As many of you have probably seen before, @Adam Twardoch wrote a series of webpages (https://www.twardoch.com/download/polishhowto/intro.html) on considerations specific to the Polish language.
The localization features in OpenType are useful in order to ensure that users get the best output from our fonts. But does anyone know of a list of resources like Adam's which discuss considerations applicable to other languages? For example, I know of issues with Ş/Ș and Ţ/Ț in Romanian fonts (which are well known). But what about more niche examples like the different appearances of capital Ŋ, or different placements/accents in Navajo or Marshallese?
Of course, not all fonts will support all Latin-based languages. But while it would take a lot of effort to add an entirely new character set for non-Latin languages, it would probably be much easier to add support for these Latin-based languages, if only the information were collected for the benefit of designers.
Does anyone know of any resources or have any tips to share?

Comments

  • There are several such language-specific exceptions that can be fixed through Localized Forms. Here are tutorials from FontCreator, but the information is also useful if you use another font editor:
    And there is the Dutch IJ. In the Dutch language, IJ is sometimes considered a ligature, or even a letter in itself. You could add a locl feature with substitutions for IJ and ij as well as variants with acute accents (Iacute_J_acutecomb and iacute_j_acutecomb).

    There are several more, but I do not have a list of those.
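
    As a rough illustration of the Dutch case, the Python sketch below assembles feature-file text for such a locl rule. The helper name, the glyph names IJ and ij, and the input sequences are assumptions made for illustration; Iacute_J_acutecomb and iacute_j_acutecomb are the names mentioned above, and NLD is the OpenType language system tag for Dutch. The output is a fragment, not a complete feature file.

```python
# Hypothetical helper: builds feature-file (.fea) text for a Dutch `locl` rule.
# Glyph names and input sequences are assumptions; adjust them to your font.

DUTCH_IJ_LIGATURES = {
    ("I", "J"): "IJ",
    ("i", "j"): "ij",
    ("Iacute", "J", "acutecomb"): "Iacute_J_acutecomb",
    ("iacute", "j", "acutecomb"): "iacute_j_acutecomb",
}


def dutch_locl_feature(ligatures=DUTCH_IJ_LIGATURES):
    """Return feature-file text that applies the substitutions for Dutch only."""
    lines = ["feature locl {", "    script latn;", "    language NLD;"]
    for components, ligature in ligatures.items():
        # One ligature substitution rule per entry.
        lines.append(f"    sub {' '.join(components)} by {ligature};")
    lines.append("} locl;")
    return "\n".join(lines)


if __name__ == "__main__":
    print(dutch_locl_feature())
```

    Whether these substitutions should be active by default depends on the scope of the font.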
  • John Hudson
    I strongly advocate the use of form (1) as the new default.

    I think that makes sense if one is supporting large character sets that cover African languages as well as European ones. African use of the letter, in its (1) form, is much more widespread than any European use.

    A lot of fonts, simply for market reasons, target smaller sets of languages, often focusing on European languages and following standardised character sets for that purpose. I don’t think it is possible to determine what the correct default form of Eng should be in a font without considering the remainder of the character set and what target languages it supports.
  • Yes, I agree. But including the n-like Eng only makes sense if the font supports African languages. If it is focused on European languages, there is no need for two Engs and thus no default to be set.
  • I like the presentation of such differences on the website ScriptSource, under the Design & Typography sections of scripts. Sadly, the site does not appear to be actively maintained.

    For example, for Latin: https://scriptsource.org/cms/scripts/page.php?item_id=script_detail_des&key=Latn
  • Maybe not as detailed as you are looking for, but the Hyperglot database has some of these; in the data they are noted with "design_requirements" (see, for example, Bosnian or Romanian). It doesn't give you visual instructions or samples, but it has some pointers to start your research from. Feel free to submit new ones :)
  • What should also be mentioned in this thread is language-specific considerations for kerning pairs and for designing glyphs to avoid clashing with adjacent glyphs. It's one thing to design a glyph to support a specific Latin-script language, yet attention should also be given to clashes in pairs that aren't typical in Latin.

    This thread would benefit from having a list of words with those tricky glyph pairings.
  • John Hudson
    Test words—as the phrase suggests—are useful for testing, but if one is trying to make a general purpose font, not targeting specific languages but able to cleanly display any language you throw at it, you have to approach spacing and kerning as aspects of the font as a system. Inevitably, that means providing for character sequences that may never occur, but the obverse is also true: if you don’t implement spacing and kerning systematically, you are inevitably going to fail on unanticipated sequences that do occur.

    Unfortunately, the place where this systematic analysis and implementation has to happen is exactly the place where OpenType GPOS and most font tools are weakest: in the interaction between spacing, kerning, mark positioning and, in some scripts, things like cursive attachment positioning.
  • Pairs with repeated i and diacritics (like ïï) are a sure bet for positive kerning. I also expand this to include l, j, and Latin iota. In South American and African languages, there are several occurrences of doubled accented vowels.
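
    One way to check such pairs systematically is to generate every two-letter combination from a small set of narrow letters and inspect the result in a proofing tool. A minimal Python sketch; the letter list is only an example, not a complete inventory, and the function name and the context letter are arbitrary choices:

```python
import unicodedata
from itertools import product

# Example set only: narrow base letters plus a few accented i forms.
# U+0269 is Latin small letter iota. NFC keeps the accented letters precomposed.
NARROW_LETTERS = [
    unicodedata.normalize("NFC", s)
    for s in ["i", "ï", "í", "ì", "î", "ĩ", "l", "j", "\u0269"]
]


def narrow_pairs(letters=NARROW_LETTERS, context="n"):
    """Return every ordered pair of letters, wrapped in a neutral context letter."""
    return [f"{context}{a}{b}{context}" for a, b in product(letters, repeat=2)]


if __name__ == "__main__":
    print(" ".join(narrow_pairs()))
```

    This covers sequences that may never occur in real text, which is the price of treating the pairs as a system instead of a word list.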
  • Nick Shinn

    Two practices I’ve occasionally indulged in:

    German-tagged Ä, Ö and Ü, with lowered dieresis.

    English-tagged vertically-flipped left quote marks.


  • Denis Moyogo Jacquerye
    edited January 2023
    Regarding some of these, we might want to revisit them and see if the pros still beat the cons, especially the ones that may change the meaning of some characters.

    For Romanian, some users actually prefer to have the S cedilla and T cedilla look different from the S comma below and T comma below as this helps them identify what text was written with the deprecated characters. Additionally, Turkish or Gagauz names used in Romanian text should retain their cedilla. Most Romanian users have keyboard layouts that use the comma below characters nowadays and the text that still uses the cedilla characters instead is becoming more rare. Maybe doing the substitution from cedilla to comma below should be optional in new fonts and not the default.
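
    Because the cedilla and comma-below letters are separate code points, text that still uses the cedilla forms can be detected, and optionally converted, outside the font. A minimal Python sketch of that idea; the function names are hypothetical, only precomposed letters are handled, and a blind conversion would also change Turkish or Gagauz names, which is the risk described above:

```python
# Precomposed cedilla letters mapped to the comma-below letters.
# Function names are illustrative, not from any library.
CEDILLA_TO_COMMA = {
    "\u015E": "\u0218",  # S with cedilla -> S with comma below
    "\u015F": "\u0219",  # s with cedilla -> s with comma below
    "\u0162": "\u021A",  # T with cedilla -> T with comma below
    "\u0163": "\u021B",  # t with cedilla -> t with comma below
}


def find_cedilla_letters(text):
    """Return (index, character) for every cedilla S/T in the text."""
    return [(i, ch) for i, ch in enumerate(text) if ch in CEDILLA_TO_COMMA]


def to_comma_below(text):
    """Replace cedilla S/T with comma-below S/T. Only safe for purely Romanian text."""
    return text.translate(str.maketrans(CEDILLA_TO_COMMA))
```

    Keeping detection separate from conversion lets a user review each hit before anything is changed.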

    For Dutch ij:
    There are a few Dutch words that have ij as two separate letters: minijurk (officially spelled mini-jurk since 2005, but users do whatever they want), strooijonker (same officially strooi-jonker), bloeijaar (bloei-jaar), groeijaar (groei-jaar), but also odd ones like holadijee, and of course borrowed words like hijab, Beijing, Khadija, etc. Depending on the scope of a font, it may be a terrible idea to ligate ij in those words.
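
    If ij is ligated at the text level instead of in the font, a word list can guard the known exceptions. A minimal Python sketch, assuming a hypothetical exception list built only from the examples above, and using the Unicode characters U+0132 and U+0133 purely for illustration:

```python
# Hypothetical exception list taken from the examples in this post; a real
# list would need inflected forms and many more names.
IJ_AS_TWO_LETTERS = {
    "minijurk", "strooijonker", "bloeijaar", "groeijaar",
    "holadijee", "hijab", "beijing", "khadija",
}


def ligate_ij(word):
    """Replace ij/IJ with U+0133/U+0132 unless the word is a known exception."""
    if word.casefold() in IJ_AS_TWO_LETTERS:
        return word
    return word.replace("IJ", "\u0132").replace("ij", "\u0133")
```

    A font-level substitution has no access to such a list, which is why ligating every ij can go wrong in these words.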

    For Dutch ij-acute:
    For the ij-acute, the current spelling rule that places two acutes is from 1995. Before that, a common rule was to put one acute on a digraph composed of different letters, like níet or zíjn, and two acutes on repeated letters, like één or nóóit. Several users still use that rule, several users use the current rule, and many users don’t seem to care and write whatever is fastest on their keyboard.
    Considering that the official rules are only mandatory in government documents and education, a random user may very well prefer something else. (See this previous discussion.)
    Text written before 1995 cannot be assumed to follow the current spelling rules either.
    There are also a few foreign names that have íj that may occur in Dutch text, like Spanish or Hungarian place names or family names.
    Maybe doing the substitution from iacute-j to iacute-jacute is a terrible idea if the font will be used by users with different spelling habits.

    For Catalan punt volat:
    Maybe Catalan names with l·l like Marcel·lí or Gal·la should look correct in all languages and not just in Catalan?

  • jeremy tribby
    edited January 2023
    Christoph Koeberlin generously wrote up his approach to some extended characters when he published his extended Latin glyph sets along with his Pangea font family: https://github.com/koeberlin/Designing-Latin-S
    As the initial post in this thread suggests, Latin-M is pretty manageable once you have a base Latin set.