Rather than start a new thread every time I have a question about some obscure glyphs, I figured it'd be better to create a new thread as a catch-all. Feel free to chime in with your own questions.
This isn't about whether or not certain glyphs are valid or worth bothering with. It's about knowing what they're used for so you can make a better judgment call as to whether or not to include them. For example, if you're making a text font that might be used in a dictionary, you might want to include IPA characters. If you're making a display font, you might want to leave those out.
Anyway, today, I'm fooling around with Latin Extended Additional and I got to the 1E80-1E85 range: ẀẁẂẃẄẅ
I've been including these glyphs for about a decade simply because someone emailed me and told me that they're needed for Irish... or was it Welsh? It's been a while; I can't remember. Anyway, I looked them up today and I can't figure out who, if anyone, actually uses these. Any ideas about these accented W's?
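For reference, here is a minimal sketch that lists the official Unicode names for that range using Python's standard `unicodedata` module. The function name `w_accent_names` is just something I made up for illustration. It only tells you what the characters are called and how they decompose, not which languages use them.

```python
import unicodedata

def w_accent_names(start=0x1E80, end=0x1E85):
    """Return (code point label, character, Unicode name) for each code point in the range."""
    return [
        (f"U+{cp:04X}", chr(cp), unicodedata.name(chr(cp)))
        for cp in range(start, end + 1)
    ]

for label, char, name in w_accent_names():
    # NFD splits each precomposed letter into a base W/w plus a combining accent.
    parts = " + ".join(f"U+{ord(c):04X}" for c in unicodedata.normalize("NFD", char))
    print(label, char, name, "=", parts)
```

Each of the six letters decomposes into a plain W or w followed by a combining grave, acute, or diaeresis, so they can be built as composites if the base and combining accents are already in the font.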
Comments
http://www.200words-a-day.com/typing-welsh-characters.html
https://fr.wikipedia.org/wiki/Diacritiques_de_l'alphabet_latin#Tableau_r.C3.A9capitulatif
It's in French, but seems pretty easy to understand (at least for the table part). Unfortunately, I didn't find an English equivalent.
As for the Tironian et, I don't recall ever seeing one in my 2 1/2 years here. FWIW, Irish road signs use a lot of archaic forms. They're all set in a proprietary version of MOT Transport.
My understanding from Michael Everson is that both the traditional and reformed orthographies have official status, although the former is limited in contemporary use and mostly found only when the traditional script form is also used.
Uralicists. These characters are not, to my knowledge, used in any natural language orthographies. They are part of a system of the Uralic Phonetic Alphabet, used by linguists primarily in reconstructions of proto-Finno-Ugric and -Samoyed languages.
www.eki.ee/letter/ is useful for more common glyphs. For example: Latin Extended B hasn't been filled in much. The Serer language uses Ƈ (0187), but no languages are listed here. Over a million people speak a language that uses that glyph but it's hard to determine that from any of the links mentioned.
Very often, the French wikipedia pages have more detailed explanations of alphabets. For example: Serer French vs Serer English
One of the more useful resources I've been using is Omniglot.
Ẁ is used in Welsh and Ẃ in Lower Sorbian; they could also be used in some tonal languages. Ẅ is used in a few Cameroon languages.
Serer has used Ƈ for a while. That was also made official in a Senegalese decree from 2005.
A good criterion is to check the colonial language, like Spanish for Central and South America, French for Western Africa and the Pacific Islands, Dutch for the Caribbean Islands and so on. Another criterion is to check the German version, as Germans have a long and deep tradition in linguistic studies. And, surprisingly, some Russian pages are quite good on Central and East European languages.
An example: the extinct Polabian language has a good page in English, but the German page is better and the Russian one is impressive.
I already got caught in a search error, kindly corrected by Nicolas Silva here in TypeDrawers: the Brazilian material I read about Guarani did not mention G̃, but the studies published in Spanish are more complete and include it in the alphabet. As Guarani is limited to very few regions of Brazil, but widely used in Paraguay, Bolivia and northern Argentina, I should have supposed the best source wouldn't be in Portuguese.
A lesser-known way to get info is to read proposals submitted to Unicode. These documents are usually quite informative, with samples of use and historical background, which may be extremely relevant in deciding how to design the characters. SIL has a page with their proposals, but most of them are spread over the web.
fileformat.info is a really sweet site that will give you a list, with links, of many fonts that support the Unicode point you're looking for.
http://www.fileformat.info/info/unicode/char/03a7/fontsupport.htm
It gives a list of at least a hundred fonts that have chi glyphs. Many of them are available for free download.
Rich
Frode: In my searches, I found several fonts with the bar crossing the G descender even when it is a one-bowl g, and none with it in the upper position. So I keep the position regardless of whether it is a one- or two-bowl G.
Regarding Kadiwéu, the language was documented for the first time in 1977 by a couple of SIL linguists. They used a hyphen over G as this is easily achieved with typewriters. So the Kadiwéu variant may be like your right sample, which is also the way I built it. Publications with Kadiwéu are very rare and, due to the lack of fonts, no one used the barred G except in those made with typewriters.
There are no other stroked letters in that alphabet so it doesn't have to match anything. Personally, if my typeface didn't already have a hooked g, I'd combine q, dotless j and put the stroke under the bowl. I don't think it can look attractive with a binocular g, except in lighter weights. A bold Ǥ is problematic if your G has a horizontal stroke. Once I dealt with this by removing the horizontal stroke, extending the horizontal stroke and adding the bar.
Some glyphs were created to punish type designers.
But I do have a partial answer to your question if I understand it correctly.
In browsers, at least, the behavior described in the Wikipedia entry for Zero-Width Space is accurate and you can even test that behavior right there on the Wikipedia page itself by resizing the browser window.
It says:
"In HTML pages, the zero-width space can be used as a potential line-break in long words as an alternative to the <wbr> element."
And so the zero-width space is not simply empty space. It's more akin to a control character. But one that only kicks in under certain circumstances such as when the viewport is too small to display an unbroken string of text and the character has been inserted at the preferred breakpoints.
AFAIK - it's most useful when you've got a long URI and you don't want it to break in a weird spot in a small viewport. Maybe long place names, too - I'd have to think about it.
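To make the URI case concrete, here is a rough Python sketch that inserts U+200B at what I'd consider reasonable break points. The function name `add_break_opportunities` and the choice of breaking after `/`, `?` and `&` are my own assumptions, not anything from a spec.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE

def add_break_opportunities(uri):
    """Insert a zero-width space after '/', '?' or '&' when another
    non-slash character follows, so 'https://' is not split between
    its two slashes and nothing is appended at the very end."""
    return re.sub(r"([/?&])(?=[^/])", r"\1" + ZWSP, uri)

original = "https://example.com/a/b"
breakable = add_break_opportunities(original)

# The visible text is unchanged; only invisible break points were added.
print(breakable.replace(ZWSP, "") == original)
print(breakable.count(ZWSP))
```

One caveat with this approach: the invisible characters stay in the text, so anyone copying the URI out of the page carries them along. That is one reason to prefer `<wbr>` for URIs people are expected to copy.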
And yeah, they are zero width blanks.
SOTA TypeCon presentation evaluators take heed!
If a font contains U+034F, then yes, it should probably be zero-width, no-outline, and the same is obviously true for U+200B.
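As a sanity check, something like the following sketch could be used to see how Unicode classifies these characters and to flag any that ended up with a non-zero advance. The `advances` mapping is a hypothetical `{code point: advance width}` dictionary that you would have to export from your font editor yourself; nothing here reads a real font file.

```python
import unicodedata

# The invisible characters under discussion, plus the two joiner controls.
ZERO_WIDTH_CODEPOINTS = [0x034F, 0x200B, 0x200C, 0x200D]

def describe(cp):
    """Return (label, Unicode name, general category) for a code point."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch))

def nonzero_advances(advances):
    """Return the code points from the list above whose advance is not zero.
    `advances` is a hypothetical {code point: advance width} mapping;
    code points missing from it are treated as absent from the font."""
    return [cp for cp in ZERO_WIDTH_CODEPOINTS if advances.get(cp, 0) != 0]

for cp in ZERO_WIDTH_CODEPOINTS:
    print(*describe(cp))

# Example with made-up metrics: U+200C was accidentally given a width.
print([f"U+{cp:04X}" for cp in nonzero_advances({0x200B: 0, 0x200C: 120})])
```

U+034F is classified as a nonspacing mark (Mn) and the other three as format characters (Cf), which fits the idea that none of them should take up horizontal space.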
U+200C and U+200D are a little different. They can be no-outline glyphs, but for scripts in which these are used as layout control characters it is helpful to have visual representations for editing purposes. Software like MS Word has an option to display control characters in text, and does so by using glyphs in the font; when this option is disabled (most of the time), display of these glyphs is suppressed. Conventions for display of these and other layout control characters vary, but typically involve a thin vertical bar to make it easy to identify the insertion point in text when displayed, topped by a small symbol indicating the character. And, of course, all zero width. These are the forms I have come to favour, the first few following Microsoft conventions:
The chi or lambda or most Greek lowercase are not so tightly controlled a system as Latin glyphs. Greek is more free-flowing and not so dependent on geometry as Latin may appear. Think of it as crafted writing rather than construction and architecture. While the modern, more Latinized Greek fonts do attain more rigidity of construction, they do not shy away from this; they embrace it. The more traditional Greek forms for lowercase have more the feel of Matisse gesture drawings, with vitality. They flow together rather than fit together. Latinized Greek is more like soldiers marching in step while the more humanized Greek forms are line-dancing in harmony with each other.
Two more questions:
Spacing Modifier Letters (02B0-02FF) contains the non-zero width accents we usually include in our fonts. For example: ring at 02DA. Apart from using these glyphs in composites, when are these actually used? We already have a combining ring at 030A. Who actually uses the 02DA ring? Do some applications use these as combining accents?
If I'm using regular accents for lowercase and unencoded compact accents for capitals, how do I deal with combining accent substitution? I've got @comb, @combcap and @cap classes so I can check if the preceding glyph is a capital and substitute the alternate combining accent. Should these combining accents be placed at lowercase height so applications will then raise them to cap height? If I place those alternate combining accents at cap level, will applications bump them up too high over the capitals?
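On the first question, the character properties at least show how Unicode itself treats the two rings. This is a small sketch using Python's standard `unicodedata` module; the helper name `summarize` is my own. It says nothing about what any particular application does with them.

```python
import unicodedata

SPACING_RING = "\u02da"    # RING ABOVE, from Spacing Modifier Letters
COMBINING_RING = "\u030a"  # COMBINING RING ABOVE

def summarize(ch):
    """Collect the properties that separate a spacing accent from a combining one."""
    return {
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),           # Sk = modifier symbol, Mn = nonspacing mark
        "combining_class": unicodedata.combining(ch),   # 0 means it does not attach to a base
    }

print(summarize(SPACING_RING))
print(summarize(COMBINING_RING))

# Only the combining ring composes with a preceding base letter under NFC.
print(unicodedata.normalize("NFC", "a" + COMBINING_RING) == "\u00e5")
print(unicodedata.normalize("NFC", "a" + SPACING_RING) == "a" + SPACING_RING)

# The spacing ring has a compatibility decomposition: space + combining ring.
print(unicodedata.normalize("NFKD", SPACING_RING) == " " + COMBINING_RING)
```

So as far as the standard is concerned, 02DA is a free-standing symbol with its own width, equivalent to a combining ring sitting on a space, and it never attaches to the letter before it.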
Regarding the accent substitution, this is done with the ccmp and mark features. You define, for example, that á is made by a+acutecomb while Á is made by A+acutecomb.uc.
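To illustrate the logic only, here is a Python model of that contextual substitution applied to a list of glyph names. It is not feature code and not how a shaping engine is implemented. `acutecomb.uc` comes from the example above; the other glyph names and the contents of the two tables are placeholders standing in for the @cap, @comb and @combcap classes.

```python
# Stand-in for the @cap class (a real class would list every capital).
CAPS = {"A", "E", "O"}

# Stand-in for pairing @comb with @combcap: default accent -> compact cap accent.
COMB_TO_CAP = {
    "acutecomb": "acutecomb.uc",
    "gravecomb": "gravecomb.uc",
}

def substitute_cap_accents(glyphs):
    """Swap each combining accent for its cap variant when the most recent
    base glyph is a capital. Stacked accents all follow the same base."""
    out = []
    base_is_cap = False
    for name in glyphs:
        if name in COMB_TO_CAP:
            out.append(COMB_TO_CAP[name] if base_is_cap else name)
        else:
            # Any non-accent glyph becomes the new base.
            base_is_cap = name in CAPS
            out.append(name)
    return out

print(substitute_cap_accents(["A", "acutecomb", "a", "acutecomb"]))
```

This covers only the substitution half. Where the cap accents sit vertically is a separate matter of anchor placement, which this sketch does not model.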
John Hudson said: The Wikipedia page for 034F does a good job of explaining. With Hebrew examples, too.
Sorry to go off topic, but wasn't Primordial Xerox the name of a heavy metal band from the eighties?