Language-specific considerations/best practices

Daniel Benjamin Miller · January 2023

Most fonts are designed for the Latin script, but many lack diacritics and extensions that some languages require. Even when they include them, the appearance is not always in line with the expectations of readers of that language. Even many professionally designed fonts have issues with certain characters.

As many of you have probably seen before, @Adam Twardoch wrote a series of webpages (https://www.twardoch.com/download/polishhowto/intro.html) on considerations specific to the Polish language.

The localization features in OpenType are useful for ensuring that users get the best output from our fonts. But does anyone know of a list of resources like Adam's that discuss considerations applicable to other languages? For example, I know of the issues with Ş/Ș and Ţ/Ț in Romanian fonts (which are well known). But what about more niche examples, like the different appearances of capital Ŋ, or different placements/accents in Navajo or Marshallese?

Of course, not all fonts will support all Latin-based languages. But while it would take a lot of effort to add an entirely new character set for a non-Latin script, it would probably be easier to support more Latin-based languages, if only the information were collected for the benefit of designers.

Does anyone know of any resources or have any tips to share?

Erwin Denissen · January 2023

There are several such language-specific exceptions that can be fixed through Localized Forms. Here are tutorials from FontCreator, but the information is also useful if you use another font editor:

And there is the Dutch IJ. In the Dutch language, IJ is sometimes considered a ligature, or even a letter in itself. You could add a locl feature with substitutions for Ĳ and ĳ as well as variants with acute accents (Iacute_J_acutecomb and iacute_j_acutecomb).
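
A minimal sketch of what such a locl feature could look like, written as a small Python helper that emits feature-file text. The helper name `locl_feature` and the rule list are hypothetical. The output glyph names Iacute_J_acutecomb and iacute_j_acutecomb are the ones mentioned above; the input glyph names (Iacute, J, iacute, j) are assumptions about how the font names its glyphs. NLD is the OpenType language tag for Dutch.

```python
def locl_feature(script, language, rules):
    """Return feature-file text for a locl block.

    rules is a list of (input_glyph_names, output_glyph_name) pairs.
    Glyph names are assumptions and must match the names in your font.
    """
    lines = [
        "feature locl {",
        f"    script {script};",
        f"    language {language};",
    ]
    for inputs, output in rules:
        # One substitution per rule; several inputs make it a ligature substitution.
        lines.append(f"    sub {' '.join(inputs)} by {output};")
    lines.append("} locl;")
    return "\n".join(lines)


# Hypothetical rule set for the accented Dutch IJ described above.
dutch_rules = [
    (["Iacute", "J"], "Iacute_J_acutecomb"),
    (["iacute", "j"], "iacute_j_acutecomb"),
]

print(locl_feature("latn", "NLD", dutch_rules))
```

The substitution only fires when the text is tagged as Dutch, so untagged text keeps the plain í + j sequence.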

There are several more, but I do not have a list of those.

Igor Freiberger · January 2023

About the capital Eng

Eng (1) is used in several African languages. There are two regional variants, (3) and (4). AFAIK, they are not tied to languages but to geography. So, you can't use OpenType features to set an automatic substitution for these variants. The best one can do is to offer (3) and (4) as alternates. The number of fonts that do this is near zero; I only see this in my own projects and in SIL fonts.

There is also the (2) form, used in four of the five Sámi languages. In all fonts I know, the standard is to use the Sámi variation as the default, which is clearly a historical remnant and a Eurocentric approach. The number of languages and users for the African forms is huge, and their adoption is much more common.

I strongly advocate the use of form (1) as the new default, leaving form (2) as an automatic OpenType substitution when Sámi is used. I also advocate including forms (3) and (4) if the font aims to properly support languages beyond Europe.

Image: https://us.v-cdn.net/5019405/uploads/editor/e8/2or3x0et11mb.jpg

Denis Jacquerye is the specialist to follow on Eng and other African issues. Victor Gaultney and John Hudson are also amazing sources of good, trustworthy info.

[My 200th post in TD, I hope it becomes useful for some of my type fellows!]

John Hudson · January 2023

"I strongly advocate the use of form (1) as the new default."

I think that makes sense if one is supporting large character sets that cover African languages as well as European ones. African use of the letter, in its (1) form, is much more widespread than any European use.

A lot of fonts, simply for market reasons, target smaller sets of languages, often focusing on European languages and following standardised character sets for that purpose. I don’t think it is possible to determine what the correct default form of Eng should be in a font without considering the remainder of the character set and what target languages it supports.

Igor Freiberger · January 2023

Yes, I agree. But including the n-like Eng only makes sense if the font supports African languages. If it is focused on European languages, there is no need for two Engs and thus no default to be set.

Florian Pircher · January 2023

I like the presentation of such differences on the website ScriptSource, under the Design & Typography sections of scripts. Sadly, the site does not appear to be actively maintained.

For example, for Latin: https://scriptsource.org/cms/scripts/page.php?item_id=script_detail_des&key=Latn

Johannes Neumeier · January 2023

This may not be as detailed as what you are looking for, but the Hyperglot database has some of these. In the data they are noted under "design_requirements"; see for example Bosnian or Romanian. It doesn't give you visual instructions or samples, but it has some pointers to start your research from. Feel free to submit new ones.

Paul Hanslow · January 2023

What should also be mentioned in this thread is language-specific considerations for kerning pairs and for designing glyphs to avoid clashing with adjacent glyphs. It's one thing to design a glyph to support a specific Latin-script language, but attention should also be given to clashes in pairs which aren't typical in Latin.

This thread would benefit from having a list of words with those tricky glyph pairings.

John Hudson · January 2023

Test words—as the phrase suggests—are useful for testing, but if one is trying to make a general purpose font, not targeting specific languages but able to cleanly display any language you throw at it, you have to approach spacing and kerning as aspects of the font as a system. Inevitably, that means providing for character sequences that may never occur, but the obverse is also true: if you don’t implement spacing and kerning systematically, you are inevitably going to fail on unanticipated sequences that do occur.

Unfortunately, the place where this systematic analysis and implementation has to happen is exactly the place where OpenType GPOS and most font tools are weakest: in the interaction between spacing, kerning, mark positioning and, in some scripts, things like cursive attachment positioning.

Igor Freiberger · January 2023

Pairs with repeated i and diacritics (like ïï) are a sure bet for positive kerning. I also expand this to include l, j, and Latin iota. In South American and African languages, there are many occurrences of doubled accented vowels.
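
As a rough sketch, test strings for such pairs can be generated rather than collected by hand. The base letters below follow the post above (i, l, j, Latin iota); the choice of combining marks and the function names are assumptions for illustration, not a list of what any particular language needs.

```python
import itertools
import unicodedata

# Narrow base letters named above: i, j, l and Latin iota (U+0269).
BASES = ["i", "j", "l", "\u0269"]
# Assumed sample of combining marks: diaeresis, acute, circumflex, tilde.
MARKS = ["\u0308", "\u0301", "\u0302", "\u0303"]


def accented_forms(bases=BASES, marks=MARKS):
    """Combine each base with each mark, precomposed (NFC) where Unicode allows."""
    return [unicodedata.normalize("NFC", base + mark) for base in bases for mark in marks]


def kerning_pairs(forms):
    """Return every ordered pair of forms as a two-letter test string."""
    return ["".join(pair) for pair in itertools.product(forms, repeat=2)]


forms = accented_forms()
pairs = kerning_pairs(forms)
print(" ".join(pairs))
```

Pasting the output into a proofing document shows quickly which accent collisions need positive kerning. Combinations with no precomposed character stay as base plus combining mark, so they also exercise mark positioning.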

Nick Shinn · January 2023

Two practices I’ve occasionally indulged in:

German-tagged Ä, Ö and Ü, with lowered dieresis.

English-tagged vertically-flipped left quote marks.

Denis Moyogo Jacquerye · January 2023

Regarding some of these, we might want to revisit them and see if the pros still beat the cons, especially the ones that may change the meaning of some characters.

For Romanian, some users actually prefer to have the S cedilla and T cedilla look different from the S comma below and T comma below, as this helps them identify which text was written with the deprecated characters. Additionally, Turkish or Gagauz names used in Romanian text should retain their cedilla. Most Romanian users have keyboard layouts that use the comma below characters nowadays, and text that still uses the cedilla characters is becoming rarer. Maybe doing the substitution from cedilla to comma below should be optional in new fonts and not the default.
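
For reference, the two sets of characters are distinct code points, so text using the deprecated ones can be detected outside the font. A minimal sketch, with hypothetical function names: one function only reports the cedilla characters, the other converts them. Given the point about Turkish and Gagauz names, the conversion is an opt-in step here and not something to run blindly.

```python
# Cedilla code points (U+015E, U+015F, U+0162, U+0163) mapped to the
# comma-below code points (U+0218, U+0219, U+021A, U+021B).
CEDILLA_TO_COMMA = str.maketrans({
    "\u015E": "\u0218",  # S cedilla -> S comma below
    "\u015F": "\u0219",  # s cedilla -> s comma below
    "\u0162": "\u021A",  # T cedilla -> T comma below
    "\u0163": "\u021B",  # t cedilla -> t comma below
})


def find_cedilla_forms(text):
    """Return (index, character) for each S/T cedilla character in text."""
    return [(i, ch) for i, ch in enumerate(text) if ord(ch) in CEDILLA_TO_COMMA]


def to_comma_below(text):
    """Convert S/T cedilla to comma below; also changes Turkish or Gagauz names."""
    return text.translate(CEDILLA_TO_COMMA)


sample = "\u015Etefan"  # "Ştefan" typed with S cedilla
print(find_cedilla_forms(sample))
print(to_comma_below(sample))
```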

For Dutch ij:

There are a few Dutch words that have ij as two separate letters: minijurk (officially spelled mini-jurk since 2005, but users do whatever they want), strooijonker (likewise officially strooi-jonker), bloeijaar (bloei-jaar), groeijaar (groei-jaar), but also odd ones like holadijee, and of course borrowed words like hijab, Beijing, Khadija, etc. Depending on the scope of a font, it may be a terrible idea to ligate ij in those words.
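
To illustrate why this cannot be decided from the letter sequence alone, here is a small sketch that finds ij sequences in a word but skips the words listed above. The exception set contains only those examples and is certainly incomplete; the function name is hypothetical.

```python
# Words from the post above in which "ij" is two separate letters.
# This set is illustrative only and far from complete.
IJ_EXCEPTIONS = {
    "minijurk", "strooijonker", "bloeijaar", "groeijaar",
    "holadijee", "hijab", "beijing", "khadija",
}


def ij_spans(word, exceptions=IJ_EXCEPTIONS):
    """Return start indexes of "ij" sequences that may be treated as the digraph."""
    lowered = word.lower()
    if lowered in exceptions:
        return []
    return [i for i in range(len(lowered) - 1) if lowered[i:i + 2] == "ij"]


for word in ["zijn", "ijsvrij", "Beijing", "minijurk"]:
    print(word, ij_spans(word))
```

A font has no access to a word list like this, which is the point: an automatic ij ligature treats all of these words the same way.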

For Dutch ij-acute:

For the ij-acute, the current spelling rule that places two acutes is from 1995. Before that, a common rule was to put one acute on a digraph composed of different letters, like níet or zíjn, and two acutes on repeated letters, like één or nóóit. Several users still use that rule, several users use the current rule, and many users don’t seem to care and write whatever is fastest on their keyboard.

Considering that the official rules are only mandatory in government documents and education, a random user may very well prefer something else. (See this previous discussion.)

Text written before 1995 cannot be assumed to follow the current spelling rules either.

There are also a few foreign names that have íj that may occur in Dutch text, like Spanish or Hungarian place names or family names.

Maybe doing the substitution from iacute-j to iacute-jacute is a terrible idea if the font will be used by users with different spelling habits.

For Catalan punt volat:

Maybe Catalan names with l·l like Marcel·lí or Gal·la should look correct in all languages and not just in Catalan?

jeremy tribby · January 2023

Christoph Koeberlin generously wrote up his approach to some extended characters when he published his extended Latin glyph sets along with his Pangea font family: https://github.com/koeberlin/Designing-Latin-S
As the initial post in this thread suggests, Latin-M is pretty manageable once you have a base Latin set.
