Hey there,
I wanted to know which languages are part of the Latin Extended-B set. I can find the glyphs but not the languages that they support.
Are they very 'important'? Or should I focus on other scripts like Cyrillic/Greek?
Up to now, no client has ever asked for Latin Extended-B; nevertheless, a lot of 'pro' typefaces seem to have this character set covered.
Thank you
Comments
Non-European and historic
This section includes a lot of letters used in African languages, as well as an assortment of Zhuang Chinese tone letters, a couple of Vietnamese letters (see also the Latin Extended Additional block for Vietnamese tone vowel diacritics), and a small number of archaic letters from older regional European alphabets.
African letters for clicks
Self-explanatory. Used for the Khoisan languages of southern Africa.
Croatian digraphs
These are an historical oddity. They were inherited into Unicode from a Yugoslav 8-bit national standard, and were encoded to provide a one-to-one mapping from the Serbo-Croat Cyrillic alphabet to the Serbo-Croat Latin alphabet. This allowed Yugoslav documents to be easily presented in either orthography simply by changing the font. Croatian nationalism in the 1990s made much of differences between Serbian and Croatian, so these characters are presumably obsolete.
Pinyin diacritic combinations
For Mandarin Chinese romanisation.
Phonetic and historical letters
A few used in African languages, some Uralist phonetic transcription characters, a couple used in one or more Sami alphabets, and more regional historical letters such as wynn.
Additions for Slovenian and Croatian
These are specialist diacritics used in prosody (analysis of metrical and stress patterns in poetry); they are not used for everyday Slovenian and Croatian text.
Additions for Romanian
These disunify the Romanian comma-below letters from the earlier encoding, which was shared with the corresponding Turkic cedilla letters. Important.
Miscellaneous Additions
A couple of regional historic letters, letters from native North American alphabets, and the rest phoneticist characters.
Additions for Livonian
Recently moribund Finnic language; object of study but last native speaker died in 2013.
Additions for Sinology
IPA extensions used in transcription of classical Chinese.
(more) Miscellaneous Additions
More native North American letters, case pair additions for IPA letters previously encoded with only lowercase, odds and ends, dotless j.
My take on this is that there are very few fonts that would need to support the whole block. Unless one is either setting out to support all of Unicode, à la Noto, or providing fonts for broad academic publishing, à la Brill, only a subset of this block is likely to be necessary. The four Romanian diacritics are the most important for a font targeting European languages, as they correct an earlier encoding issue. If your font is supporting Vietnamese, then obviously you'll need the horn letters.
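To make the "only a subset" point concrete, here is a minimal sketch of checking just the two subsets named above. It assumes you already have the font's supported code points as a Python set (however you extract them from the font); the `SUBSETS` table and the `missing_from` helper are hypothetical names made up for this illustration.

```python
import unicodedata

# Hypothetical subsets of Latin Extended-B, keyed by the need they serve.
SUBSETS = {
    "Romanian comma-below": [0x0218, 0x0219, 0x021A, 0x021B],
    "Vietnamese horn": [0x01A0, 0x01A1, 0x01AF, 0x01B0],
}

def missing_from(supported):
    """Return {subset name: [missing code points]} for a set of supported code points."""
    return {
        name: [cp for cp in cps if cp not in supported]
        for name, cps in SUBSETS.items()
    }

if __name__ == "__main__":
    # Made-up example: a font with the Romanian letters but only O/o with horn.
    supported = {0x0218, 0x0219, 0x021A, 0x021B, 0x01A0, 0x01A1}
    for name, gaps in missing_from(supported).items():
        labels = [f"U+{cp:04X} {unicodedata.name(chr(cp))}" for cp in gaps]
        print(name, "->", labels or "complete")
```

The same table can be extended with whatever other characters your target languages need, which is usually a more useful checklist than the block as a whole.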
Languages: Romanian, Azeri, Vietnamese, Slovenian (Latin), Croatian (Latin), Sami, Khoisan, Zulu, a number of Native American languages from western Canada, and several West African languages which use the pan-African and pan-Nigerian alphabets. Also supports minority languages which use the pan-Turkic alphabet, mainly lesser-known languages of small communities inside Russia with roots linked to Latin script.
Transliterations: Pinyin, Serbian Cyrillic transliterated to Croatian Latin
Old languages and orthographies: Zhuang, Gothic, Scots, Old Norse, Old English, Old Saxon and also legacy orthographies of West African languages.
Phonetics: sparse additions to IPA, APA and UPA.
Of course, the relevance of this block needs to be evaluated in light of your audience and targets. But if you are aiming at a wider market, Cyrillic represents more potential licensees.
There are two things you need to consider when looking at Latin Extended-B.
As John says, this is a mix of characters from various sources. The way they are grouped is also problematic, in particular the 'Non-European and historic' group, which contains both characters used by millions and characters used only in historical documents or documents related to them.
As a client, it would make more sense to look for specific characters rather than the whole block.
The other thing is that some of the letters in Latin Extended-B have their uppercase or lowercase in a different Unicode block, and some letters are only used in orthographies that also use characters from other Unicode blocks: IPA Extensions, Latin Extended-C, Latin Extended-D, Combining Marks.
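A rough way to see which letters have their case partner elsewhere is to walk the block and compare each character's case mapping against the block range. This is only a sketch: it relies on Python's built-in `str.upper()` and `str.lower()`, so the result depends on the Unicode version shipped with your Python, and `cross_block_case_pairs` is a name invented here.

```python
# Latin Extended-B spans U+0180..U+024F.
LATIN_EXT_B = range(0x0180, 0x0250)

def cross_block_case_pairs():
    """List (code point, partner) pairs where the case partner lies outside the block."""
    pairs = []
    for cp in LATIN_EXT_B:
        ch = chr(cp)
        for partner in (ch.upper(), ch.lower()):
            # Skip unchanged characters and multi-character special casings.
            if len(partner) == 1 and partner != ch and ord(partner) not in LATIN_EXT_B:
                pairs.append((cp, ord(partner)))
    return pairs

if __name__ == "__main__":
    for cp, partner in cross_block_case_pairs():
        print(f"U+{cp:04X} {chr(cp)} <-> U+{partner:04X} {chr(partner)}")
```

For example, the capital hooked B at U+0181 pairs with a lowercase at U+0253 in IPA Extensions, so a font covering only Latin Extended-B would ship that letter without its lowercase.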
Latin Extended-A corresponds to Central European encodings. Pretty much all the other Latin Extended-N blocks don’t correspond to any well-defined set of languages. Rather, they are “dumping grounds” for Latin characters that didn’t fit into the first three blocks of Latin characters. So it makes far more sense to think first in terms of which languages and special needs (poetics, phonetics, historical uses, etc.) you want to support rather than thinking in terms of Unicode blocks.
For the record: DTL OTMaster (OTM) is a product of the Dutch Type Library (DTL). OTM is jointly developed with URW Type Foundry (formerly URW++). The programming is done at URW in Hamburg, Germany. DTL and URW have worked together since 1991.
I seem to be able to select individual characters (I mean select L and J individually) in that chart. On the other hand, if I copy the first line, paste it into a text editor and dump the Unicode content of that file, I get those characters. I don't quite understand what is going on. I just checked and those characters appear to be hidden. They are not the ones that I could select as individual characters (LJ, lj etc.).
PS: If I select only the section between LJ and m, I get the following output from hidden text (with blanks removed).
Three hours later: On a better screen at home I can now see those "hidden" characters (and much better with a large font; my eyesight is no longer what it once was).
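For anyone who wants to repeat this kind of check, here is a small sketch of dumping the Unicode content of a pasted string. It is not the tool used above, just an assumption about what such a dump could look like; the `dump` helper is a made-up name. It skips blanks and also prints the compatibility decomposition, which is what tells a single digraph character apart from two separate letters.

```python
import unicodedata

def dump(text):
    """Return one line per non-blank character: code point, name, NFKD decomposition."""
    lines = []
    for ch in text:
        if ch.isspace():
            continue  # blanks removed, as in the post above
        name = unicodedata.name(ch, "<unnamed>")
        decomp = " ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize("NFKD", ch))
        lines.append(f"U+{ord(ch):04X} {name} -> {decomp}")
    return lines

if __name__ == "__main__":
    # The three LJ digraph characters from Latin Extended-B, followed by a plain L and J.
    for line in dump("\u01C7 \u01C8 \u01C9 LJ"):
        print(line)
```

A digraph such as U+01C7 shows up as one character that decomposes to U+004C U+004A, whereas a separately typed L and J show up as two lines.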