Latin Extended-B Languages?
Fernando Díaz
Posts: 133
Hey there,
I wanted to know which languages are part of the Latin Extended-B set. I can find the glyphs but not the languages that they support.
Are they very 'important'? Or should I focus on other languages like Cyrillic/Greek?
Up to now, no client has ever asked for Latin Extended-B; nevertheless, a lot of 'pro' typefaces seem to have this character set covered.
Thank you
1
Comments
-
The Unicode Latin Extended-B block contains a mix of diacritic characters from different sources:
Non-European and historic
This section includes a lot of letters used in African languages, as well as an assortment of Zhuang Chinese tone letters, a couple of Vietnamese letters (see also the Latin Extended Additional block for Vietnamese tone vowel diacritics), and a small number of archaic letters from older regional European alphabets.
African letters for clicks
Self-explanatory. Used for the Khoisan languages of southern Africa.
Croatian digraphs
These are an historical oddity. They were inherited into Unicode from a Yugoslav 8-bit national standard, and were encoded to provide a one-to-one mapping from the Serbo-Croat Cyrillic alphabet to the Serbo-Croat Latin alphabet. This allowed Yugoslav documents to be easily presented in either orthography simply by changing the font. Croatian nationalism in the 1990s made much of differences between Serbian and Croatian, so these characters are presumably obsolete.
Pinyin diacritic combinations
For Mandarin Chinese romanisation.
Phonetic and historical letters
A few used in African languages, some Uralist phonetic transcription characters, a couple used in one or more Sami alphabets, and more regional historical letters such as wynn.
Additions for Slovenian and Croatian
These are specialist diacritics used in prosody (analysis of metrical and stress patterns in poetry); they are not used for everyday Slovenian and Croatian text.
Additions for Romanian
These disunify the Romanian comma-below letters from the corresponding cedilla letters used for Turkic languages, with which they were earlier encoded together. Important.
Miscellaneous Additions
A couple of regional historic letters, letters from native North American alphabets, and the rest phoneticist characters.
Additions for Livonian
Recently moribund Finnic language; object of study but last native speaker died in 2013.
Additions for Sinology
IPA extensions used in transcription of classical Chinese.
(more) Miscellaneous Additions
More native North American letters, case pair additions for IPA letters previously encoded with only lowercase, odds and ends, dotless j.
My take on this is that there are very few fonts that would need to support the whole block. Unless one is either setting out to support all of Unicode, à la Noto, or providing fonts for broad academic publishing, à la Brill, only a subset of this block is likely to be necessary. The four Romanian diacritics are the most important for a font targeting European languages, as they correct an earlier encoding issue. If your font is supporting Vietnamese, then obviously you'll need the horn letters.
15 -
Latin Extended-B is needed for
Languages: Romanian, Azeri, Vietnamese, Slovenian (Latin), Croatian (Latin), Sami, Khoisan, Zulu, a number of Native American languages from western Canada, and several West African languages which use the pan-African and pan-Nigerian alphabets. Also supports minority languages which use the pan-Turkic alphabet, mainly lesser-known languages from small communities inside Russia with roots linked to Latin script.
Transliterations: Pinyin, Serbian Cyrillic transliterated to Croatian Latin
Old languages and orthographies: Zhuang, Gothic, Scots, Old Norse, Old English, Old Saxon and also legacy orthographies of West African languages.
Phonetics: sparse additions to IPA, APA and UPA.
Of course, the relevance of this block needs to be evaluated in light of your audience and targets. But if you are aiming at a wider market, Cyrillic represents more potential licensees.
5 -
Thanks John and Igor, this really helps!
0
-
Yup. Don’t do the whole block just for the sake of doing the whole block, unless you’re aiming to cover all Latin characters that are in Unicode.
There are two things you need to consider when looking at Latin Extended-B.
As John says, this is a mix of characters from various sources. The way they are grouped is also problematic, in particular the 'Non-European and historic' group, which contains both characters used by millions and characters used only in historical documents or documents related to them.
As a client, it would make more sense to look for specific characters rather than the whole block.
The other thing is that some of the letters in Latin Extended-B have their uppercase or lowercase in a different Unicode block, and some letters are only used in orthographies that also use characters from other Unicode blocks: IPA Extensions, Latin Extended-C, Latin Extended-D, Combining Marks.
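To illustrate the point about case pairs living in other blocks, here is a minimal Python sketch. It relies only on the case mappings built into the Python interpreter's Unicode data (so results can vary slightly between Python versions), and the function name is just an illustrative choice, not part of any font tool.

```python
import unicodedata

# Latin Extended-B spans U+0180..U+024F
LATIN_EXT_B = range(0x0180, 0x0250)

def cross_block_case_pairs():
    """Return (code point, case partner) pairs whose partner lies outside Latin Extended-B."""
    pairs = []
    for cp in LATIN_EXT_B:
        ch = chr(cp)
        for partner in (ch.lower(), ch.upper()):
            # Skip characters without a case mapping, and multi-character mappings
            if partner == ch or len(partner) != 1:
                continue
            if ord(partner) not in LATIN_EXT_B:
                pairs.append((cp, ord(partner)))
    return pairs

for cp, partner in cross_block_case_pairs():
    print(f"U+{cp:04X} {unicodedata.name(chr(cp), '?')} <-> "
          f"U+{partner:04X} {unicodedata.name(chr(partner), '?')}")
```

One of the pairs this reports is U+0181 (capital B with hook), whose lowercase U+0253 sits in IPA Extensions, so covering only the Latin Extended-B block would leave that letter without its lowercase.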
2 -
Is there a list of the language tags corresponding to Latin Extended-A and B? Otherwise, what must I do if, for example, I want all my small caps to work for any language?
0 -
Is there a list of the language tags corresponding to Latin Extended-A and B? Otherwise, what must I do if, for example, I want all my small caps to work for any language?
Latin Extended-A corresponds to Central European encodings. Pretty much all the other Latin Extended-N blocks don’t correspond to any well-defined set of languages. Rather, they are “dumping grounds” for Latin characters that didn’t fit into the first three blocks of Latin characters. So it makes far more sense to think first in terms of which languages and special needs (poetics, phonetics, historical uses, etc.) you want to support rather than thinking in terms of Unicode blocks.
5 -
Thanks! That's a good idea. But for Roman language, for example, it seems that some characters are lacking in Latin Extended-A and are included in Latin Extended-B. So I should probably check which characters are needed for each language I want to support.
0 -
I added the scommaaccent and tcommaaccent (which are part of Latin Extended-B) and now my small caps in Roman language work fine. Thanks again!
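For reference, the four comma-below letters in question sit at U+0218 to U+021B. The sketch below prints their Unicode names and checks a set of code points for them. The helper name and the sample coverage set are hypothetical; in practice the set would come from your font's character map.

```python
import unicodedata

# The four Romanian comma-below letters in Latin Extended-B
ROMANIAN_COMMA_BELOW = [0x0218, 0x0219, 0x021A, 0x021B]

def missing_romanian(supported):
    """Return the comma-below code points absent from `supported` (a set of ints)."""
    return [cp for cp in ROMANIAN_COMMA_BELOW if cp not in supported]

for cp in ROMANIAN_COMMA_BELOW:
    print(f"U+{cp:04X} {unicodedata.name(chr(cp))}")

# Hypothetical coverage: a font that only has the older cedilla forms
cedilla_only = {0x015E, 0x015F, 0x0162, 0x0163}
print([f"U+{cp:04X}" for cp in missing_romanian(cedilla_only)])
```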
0 -
Do you mean "Romanian"?
1
-
@Mark Simonson Yes, sorry
0
-
I use the OTM by URW++ to check which languages are covered by my font.
0 -
-
@Vasil Stanev & @Paul Miller thanks a lot for these resources! I am a newbie in this area and that's so valuable!
0 -
OTM and several other tools/resources use the Unicode CLDR database. It's a good starting point for mapping characters and languages, but should be used with caution if trying to determine what characters are needed for a language. A good example is the legacy digraph characters that CLDR identifies as Croatian: these were inherited into Unicode from Yugoslav 8-bit encodings that were designed to enable one-to-one transcription between Cyrillic and Latin orthographies for Serbo-Croatian. So in a sense, yes, these are Croatian characters in that they represent sequences of letters used in the Croatian Latin orthography to write phonemes represented by a single letter in Cyrillic, but these legacy characters are not actually needed for Croatian, don't appear on Croatian keyboards and so forth.
1
-
Thanks again. And a related question (which could even open another topic): is it possible to define several different language-specific kerning sets in the same font?
0
-
I have defined different kerning sets for different languages with Font Creator, I expect it is also possible with Fontographer.
0 -
You can define different kerning sets for different languages using OpenType GPOS kerning. In basic terms, you create separate lookups and assign them to different language system tags. Note, however, that application of language-specific kerning is dependent on a) text language being correctly tagged, and b) software recognising the tags and knowing to apply the specific OT language system instead of the default.
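As a rough illustration of the "separate rules per language system" idea, the Python sketch below builds feature-file (.fea) text for a kern feature with additional pairs for one language. Everything specific here is an assumption for illustration: the function name, the glyph names, the kerning values, and the choice of the `ROM` language tag. The emitted syntax should be checked against the feature file specification and your compiler before use.

```python
def kern_feature(default_pairs, language_pairs, script="latn"):
    """Build feature-file text for a kern feature with per-language additions.

    default_pairs: list of (left_glyph, right_glyph, value) tuples
    language_pairs: dict mapping an OpenType language tag to such a list
    """
    lines = ["feature kern {"]
    # Rules before any script/language statement go to the default language systems
    for left, right, value in default_pairs:
        lines.append(f"    pos {left} {right} {value};")
    if language_pairs:
        lines.append(f"    script {script};")
        for tag, pairs in language_pairs.items():
            # Rules after a language statement are registered for that language system
            lines.append(f"    language {tag};")
            for left, right, value in pairs:
                lines.append(f"    pos {left} {right} {value};")
    lines.append("} kern;")
    return "\n".join(lines)

# Hypothetical data: one default pair, one extra pair for Romanian
fea = kern_feature([("A", "V", -80)], {"ROM": [("T", "a", -40)]})
print(fea)
```

The language-specific pair here is an extra pair rather than an override of a default one. This keeps the sketch simple, since how default rules combine with language-specific ones depends on the include/exclude-default behaviour of the feature syntax.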
0 -
...Fontographer?
0
-
@John Hudson Thanks a lot! That's exactly what I expected, but I wasn't absolutely sure of it.
0
-
Fontographer certainly does not handle that. I think you could supplement/replace the usual kerning with manual coding in FontLab 5 or VI to get language-specific variations, if you wished, but you'd need to take care not to overwrite your special code with an auto-generated kern feature.
1
-
I use the OTM by URW++ […]
For the record: DTL OTMaster (OTM) is a product of the Dutch Type Library (DTL). OTM is jointly developed with URW Type Foundry (formerly URW++). The programming is done at URW in Hamburg, Germany. DTL and URW have worked together since 1991.
0 -
Mark Simonson said:...Fontographer?
0 -
@Thomas Phinney Thanks, at the moment I work with FontForge and it can do it (and I am astonished at what it can do). I had Fontographer in the past and I liked it a lot. But there was a long gap (before OTF) where it was unusable by Windows users… and after that time I went to Linux. For some time I was also a (very newbie) Fontlab IV user, and I installed it successfully on Linux with Wine, but I was already fairly comfortable with FF on Linux and its very useful cut and paste capabilities with Inkscape. I tried but never used auto-generated kerning in any program, even if I understand that it can be an effective first step.
0
-
I think what Thomas meant by the “auto-generated kern feature” comment was that most commercial tools compile the {kern} feature out of the source file’s kerning data at the time of generation, and if you’re manually writing a {kern} feature into your .fea file in order to implement some kind of language-specific kerning, you may need to take care to see that the compiler gives precedence to your manual {kern} feature and doesn’t replace it with whatever native kerning data you might have in the source file when compiling.
2 -
@John Hudson Where does the CLDR identify those legacy characters as Croatian? Maybe that used to be the case but has been fixed since? The default characters exemplar sets of Bosnian, Croatian and Latin Serbian are all [a b c č ć d {dž} đ e f g h i j k l {lj} m n {nj} o p r s š t u v z ž], and the auxiliary characters exemplar sets of Bosnian and Croatian are [q w x y] and that of Latin Serbian [å q w x y].
1
-
I did see the digraphs recently reported as Croatian characters in tools that I believe use CLDR. It may be an issue of how those tools choose to interpret the {} inclusions.
0
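To make the {} question concrete, here is a small Python sketch that splits the Croatian exemplar set quoted above into its items and checks whether any item uses a legacy single-code-point digraph. It is a deliberately simplified parser written for this thread (single characters and braced sequences only, no ranges or escapes), not CLDR's own parsing logic.

```python
import re
import unicodedata

def parse_exemplars(text):
    """Split a simple exemplar set like '[a b {dž} c]' into its items.

    Handles only single characters and {braced} sequences.
    """
    inner = unicodedata.normalize("NFC", text).strip().strip("[]")
    return [item.strip("{}") for item in re.findall(r"\{[^}]*\}|\S", inner)]

croatian = "[a b c č ć d {dž} đ e f g h i j k l {lj} m n {nj} o p r s š t u v z ž]"
items = parse_exemplars(croatian)

# Braced items are sequences of ordinary letters
digraphs = [item for item in items if len(item) > 1]
# Items containing any code point from the legacy digraph range U+01C4..U+01CC
legacy = [item for item in items if any(0x01C4 <= ord(c) <= 0x01CC for c in item)]

print(digraphs)
print(legacy)
```

Read this way, each braced item is a two-character sequence, and none of the legacy digraph code points appear in the set. A tool that maps {dž} to U+01C6 would be making its own interpretation.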
-
What I understand from the Unicode collation chart for Croatian is that the characters between braces need to be taken as a whole in alphabetical ordering of Croatian words.
I seem to be able to select individual characters (I mean select L and J individually) in that chart. On the other hand, if I copy the first line, paste it in a text editor and dump the Unicode content of that file, I get these characters:
01C4 LATIN CAPITAL LETTER DZ WITH CARON
01C5 LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
01C6 LATIN SMALL LETTER DZ WITH CARON
01C7 LATIN CAPITAL LETTER LJ
01C8 LATIN CAPITAL LETTER L WITH SMALL LETTER J
01C9 LATIN SMALL LETTER LJ
01CA LATIN CAPITAL LETTER NJ
01CB LATIN CAPITAL LETTER N WITH SMALL LETTER J
01CC LATIN SMALL LETTER NJ
I don't quite understand what is going on. I just checked and those characters appear to be hidden. They are not those that I could select as individual characters (LJ, lj etc).
PS: If I select only the section between LJ and m, I get the following output from hidden text (with blanks removed):
01C7 LATIN CAPITAL LETTER LJ
006C LATIN SMALL LETTER L
0135 LATIN SMALL LETTER J WITH CIRCUMFLEX
004C LATIN CAPITAL LETTER L
0135 LATIN SMALL LETTER J WITH CIRCUMFLEX
004C LATIN CAPITAL LETTER L
0134 LATIN CAPITAL LETTER J WITH CIRCUMFLEX
006C LATIN SMALL LETTER L
01F0 LATIN SMALL LETTER J WITH CARON
004C LATIN CAPITAL LETTER L
01F0 LATIN SMALL LETTER J WITH CARON
Three hours later: On a better screen at home I can now see those "hidden" characters (and much better with a large font; my eyesight is no longer what it once was).
0 -
The MySQL 8.0 documentation clearly says:
"Croatian collations are tailored for these Croatian letters: Č, Ć, Dž, Đ, Lj, Nj, Š, Ž."
Three of those letters are digraphs, i.e. composed of two Unicode characters (implying, so it seems, that we need to distinguish not only glyphs from characters, but also characters from letters!). There is no guarantee that the corresponding single-code-point digraph characters, which NFKC normalisation maps to two-character sequences, would be handled properly by MySQL when sorting on fields containing Croatian text and, for interoperability, I would not use the Unicode characters in the U+01C4 to U+01CC range in input files.
0
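If input text might already contain the legacy digraph characters, one possible way to fold them into ordinary letter sequences is compatibility normalisation. The sketch below uses Python's standard `unicodedata` module; the function name is just illustrative. Note that NFKC also rewrites other compatibility characters (ligatures, full-width forms and so on), so it is not a targeted fix.

```python
import unicodedata

def expand_digraphs(text):
    """Apply NFKC, which maps U+01C4..U+01CC to two-letter sequences."""
    return unicodedata.normalize("NFKC", text)

# Show what each legacy digraph character becomes
for cp in range(0x01C4, 0x01CD):
    ch = chr(cp)
    expanded = " ".join(f"U+{ord(c):04X}" for c in expand_digraphs(ch))
    print(f"U+{cp:04X} {unicodedata.name(ch)} -> {expanded}")

# A word typed with U+01C9 comes out with plain l + j
print(expand_digraphs("\u01c9ubav"))
```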
-
@Kent Lew Yes, exactly what I was trying to say. Thank you for putting it so clearly.
0 -
Hudson's alert about CLDR is quite important. The CLDR has many errors. Some, like Underware, take it as their main reference, which makes their Latin Pro inconsistent. Take Portuguese, for example:
- The ò is not part of the Portuguese alphabet. It can be called auxiliary to support older (pre-1973) orthographies, but no more than that.
- The ü is part of the alphabet in countries that have not yet adopted the 1990 reform, like Angola or Mozambique. Thus, listing it as auxiliary is wrong.
- The other characters (marked) are definitely not used.
1