The Mysteries of the Unicode Table

Rather than start a new thread every time I have a question about some obscure glyphs, I figured it'd be better to create a new thread as a catch-all. Feel free to chime in with your own questions.

This isn't about whether or not certain glyphs are valid or worth bothering with. It's about knowing what they're used for so you can make a better judgment call as to whether or not to include them. For example, if you're making a text font that might be used in a dictionary, you might want to include IPA characters. If you're making a display font, you might want to leave those out.

Anyway, today, I'm fooling around with Latin Extended Additional and I got to the 1E80-1E85 range.  ẀẁẂẃẄẅ
I've been including these glyphs for about a decade simply because someone emailed me and told me that they're needed for Irish... or was it Welsh? It's been a while; I can't remember.  Anyway, I looked them up today and I can't figure out who, if anyone, actually uses these. Any ideas about these accented W's?
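
For anyone who wants to check what a range like this contains before deciding whether to draw it, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is made up for this example. It only reports the official character name and the canonical decomposition, which says nothing about which languages actually use the glyph.

```python
import unicodedata


def describe(codepoint):
    """Return (official name, canonical decomposition) for one code point."""
    char = chr(codepoint)
    # NFD splits a precomposed letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", char)
    parts = [f"U+{ord(c):04X}" for c in decomposed]
    return unicodedata.name(char), parts


# The six accented W's from Latin Extended Additional (U+1E80 to U+1E85).
for cp in range(0x1E80, 0x1E86):
    name, parts = describe(cp)
    print(f"U+{cp:04X} {chr(cp)} {name} = {' + '.join(parts)}")
```

Each of the six decomposes into a plain W or w plus a combining grave, acute or diaeresis, so in most fonts they can be built as composites once the base letters and marks exist.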

Comments

  • Welsh. Also ŵ. But relatively rare.

    http://www.200words-a-day.com/typing-welsh-characters.html
  • Thanks. Here it is down in the section on diacritics: en.wikipedia.org/wiki/Welsh_orthography
  • Belleve Invis Posts: 157
    edited January 2016
    There's even something more interesting. For example, do you know Unicode has two half o's? That's U+1D16 & U+1D17 (ᴖ and ᴗ). I am really curious about who uses them.
  • Ray, do you already have this link in your collection?

    https://fr.wikipedia.org/wiki/Diacritiques_de_l'alphabet_latin#Tableau_r.C3.A9capitulatif

    It's in French, but seems pretty easy to understand (at least for the table part). Unfortunately, I didn't find an English equivalent.
  • edited January 2016
    Another mystery of the British Isles: The Tironian et (⁊).

    The Microsoft Scottish Gaelic style guide states:

    Ampersand: Gaelic (along with Irish) requires the use of an additional punctuation mark called the (left-facing) Tironian Ampersand. This is located at U+204A (⁊). The mathematical operator U+2510 (┐) is also commonly used if there are font issues with U+204A.

    http://tinyurl.com/gs7dyx4

    Evertype gives it as required for Irish Gaelic
    http://evertype.com/alphabets/irish-gaelic.pdf

    It is used in the bilingual Irish road signs
    https://stancarey.wordpress.com/2014/09/18/the-tironian-et-in-galway-ireland
    https://www.flickr.com/photos/underware/8572256718/#comment72157633046120788
    https://en.wikipedia.org/wiki/Road_signs_in_the_Republic_of_Ireland

    The Tironian et died everywhere except for in Ireland, where it still lives in the wild. 

    http://english.stackexchange.com/questions/200677/how-did-7-come-to-be-an-abbreviation-for-and-in-old-english


    Tironian notes are still used today, particularly, the Tironian "et", used in Ireland and Scotland to mean and (where it is called agus in Irish and Scottish Gaelic), and in the "z" of "viz." (for 'et' in videlicet).

    https://en.wikipedia.org/wiki/Tironian_notes

    If one adds the tironian et (⁊), should one also add ḃ ċ ḋ ḟ ġ ṁ ṗ ṡ ṫ? These letters are commonly transliterated to letter+h in modern Irish orthography. Following the same logic, also, a dotless i would be required, as one can observe in the earlier mentioned parking sign. The thing is, this dotless i is technically just a stylistic variant of i.

    https://en.wikipedia.org/wiki/Irish_orthography
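
The letter+h transliteration mentioned above can be sketched in a few lines. This is only an illustration, assuming Python 3 and its standard `unicodedata` module; the function name `dots_to_h` and the set of nine consonants are taken from the list in this comment, not from any official conversion table, and the sketch ignores capitalisation rules and any other spelling differences between the two orthographies.

```python
import unicodedata

DOT_ABOVE = "\u0307"            # COMBINING DOT ABOVE
DOTTED_BASES = set("bcdfgmpst")  # base letters of ḃ ċ ḋ ḟ ġ ṁ ṗ ṡ ṫ


def dots_to_h(text):
    """Replace a dotted consonant with the plain consonant followed by h."""
    # NFD turns each precomposed dotted letter into base + U+0307.
    decomposed = unicodedata.normalize("NFD", text)
    out = []
    i = 0
    while i < len(decomposed):
        ch = decomposed[i]
        nxt = decomposed[i + 1] if i + 1 < len(decomposed) else ""
        if ch.lower() in DOTTED_BASES and nxt == DOT_ABOVE:
            out.append(ch + "h")
            i += 2  # skip the combining dot
        else:
            out.append(ch)
            i += 1
    # Recompose so that other accented letters (á, é, ...) stay precomposed.
    return unicodedata.normalize("NFC", "".join(out))


print(dots_to_h("a\u1e41\u00e1in"))  # aṁáin
```

Because the dotted letters decompose to a base letter plus U+0307, a font that already has a combining dot above can build all of them as composites.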

  • Georg Seifert Posts: 413
    edited January 2016
    Wasn’t decodeunicode.org meant to document this kind of stuff?
  • And I usually look at: www.eki.ee/letter/
  • edited January 2016
    I also look to these, but they are not always correct. (For example, unlike what EKI claims, Turkmen has officially been written in the Latin script since 1991.) Neither adds much about the Tironian Et. Decode Unicode seems to imply the “z” in viz is a yogh (ȝ), but other sources contradict that.
  • If one adds the tironian et (⁊), should one also add ḃ ċ ḋ ḟ ġ ṁ ṗ ṡ ṫ? These letters are commonly transliterated to letter+h in modern Irish orthography.
    I think ḃ ċ ḋ ḟ ġ ṁ ṗ ṡ ṫ would only be useful for scholarly work. The Irish government mandated their replacement by a boatload of initial h's sometime in the 1960s, and I believe pretty much all daily readers of Irish are used to reading this way.

    As for the Tironian et, I don't recall ever seeing one in my 2 1/2 years here.  FWIW, Irish road signs use a lot of archaic forms.  They're all set in a proprietary version of MOT Transport.


  • I think ḃ ċ ḋ ḟ ġ ṁ ṗ ṡ ṫ would only be useful for scholarly work. The Irish government mandated their replacement by a boatload of initial h's sometime in the 1960s, and I believe pretty much all daily readers of Irish are used to reading this way. 

    My understanding from Michael Everson is that both the traditional and reformed orthographies have official status, although the former is limited in contemporary use and mostly found only when the traditional script form is also used.
  • There's even something more interesting. For example, do you know Unicode has two half o's? That's U+1D16 & U+1D17 (ᴖ and ᴗ). I am really curious about who uses them.

    Uralicists. These characters are not, to my knowledge, used in any natural language orthographies. They are part of a system of the Uralic Phonetic Alphabet, used by linguists primarily in reconstructions of proto-Finno-Ugric and -Samoyed languages.
  • Wasn’t decodeunicode.org meant to document this kind of stuff?
    I thought that was what scriptsource.org is for. But those are places to post answers, this can be the place to ask the questions.
  • edited January 2016
    Everson also states both (Irish) orthographies use the Tironian Et. 

    http://evertype.com/alphabets/irish-gaelic.pdf
  • What is so strange about the Tironian Et?
  • Ray, do you already have this link in your collection?
    I've encountered it but I usually get to the same destination by Googling the glyph + wiki.

    www.eki.ee/letter/ is useful for more common glyphs. For example: Latin Extended B hasn't been filled in much. The Serer language uses Ƈ (0187), but no languages are listed here. Over a million people speak a language that uses that glyph but it's hard to determine that from any of the links mentioned.

    Very often, the French wikipedia pages have more detailed explanations of alphabets. For example: Serer French  vs  Serer English

    One of the more useful resources I've been using is Omniglot.
  • Unicode’s Latin extensions are a mixed bag of things. Many IPA characters are needed for some language orthographies and have uppercase forms in the Latin extensions, and there are also a couple that don’t have uppercase. So if broad language support is part of the scope of your project, don’t leave them out completely.

    Ẁ is used in Welsh and Ẃ in Lower Sorbian; they could also be used in some tonal languages. Ẅ is used in a few Cameroonian languages.

    Serer has used Ƈ for a while. That was also made official in a Senegalese decree from 2005.


  • edited January 2016
    @Denis Moyogo Jacquerye What is your source for the ẃ in Sorbian? I thought that was discontinued, along with b́, ṕ and ḿ.
  • @Frode Bo Helland You’re right, sorry. It was used in Lower Sorbian.
  • Igor Freiberger Posts: 93
    edited January 2016
    Regarding Wikipedia, it is mostly a decent source of information about languages and alphabets. But often the English version is not the best one, as Ray already pointed out. 

    A good criterion is to check the colonial language, such as Spanish for Central and South America, French for Western Africa and the Pacific Islands, Dutch for the Caribbean Islands and so on. Another criterion is to check the German version, as Germans have a long and deep tradition of linguistic studies. And, surprisingly, some Russian pages are quite good on Central and East European languages.

    An example: the extinct Polabian language has a good page in English, but the German page is better and the Russian one is impressive.

    I already got caught by a search error, kindly corrected by Nicolas Silva here on TypeDrawers: the Brazilian material I read about Guarani did not mention G̃, but the studies published in Spanish are more complete and include it in the alphabet. As Guarani is limited to very few regions of Brazil but widely used in Paraguay, Bolivia and northern Argentina, I should have supposed the best source wouldn't be in Portuguese.

    A lesser-known way to get info is to read proposals submitted to Unicode. These documents are usually quite informative, with samples of use and historical background, which may be extremely relevant when deciding how to design the characters. SIL has a page with their proposals, but most of them are spread over the web.
  • Does anyone have any tips for lowercase chi? I can get a handle on most of the Greek alphabet but that one letter eludes me. I went through the docs on Gerry Leonidas' site. But I still can't figure chi out. What other letters does the stroke relate to? What other letters can I use for reference for the angle or width? Pictured below is Corbel's chi.
  • Ray,

    fileformat.info is a really sweet site that will give you a list, with links, of many fonts that support the Unicode point you're looking for. 
    http://www.fileformat.info/info/unicode/char/03a7/fontsupport.htm
    It gives a list of at least a hundred fonts that have chi glyphs.  Many of them are available for free download.

    Rich
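
For the reverse check (does a font you already have on disk cover a given code point?), a small sketch along these lines may help. It assumes the third-party fontTools package is installed; `load_cmap` and `missing_codepoints` are names invented for this example, and the font path in the usage comment is a placeholder.

```python
def load_cmap(font_path):
    """Read the best available Unicode cmap from a font file."""
    # fontTools is a third-party package; it is imported here so the
    # helper below can still be used with any plain dict of code points.
    from fontTools.ttLib import TTFont

    return TTFont(font_path).getBestCmap()


def missing_codepoints(cmap, wanted):
    """Return the wanted code points that have no entry in the cmap."""
    return sorted(cp for cp in wanted if cp not in cmap)


# Hypothetical usage, with a placeholder path:
#   cmap = load_cmap("MyFont.otf")
#   print([f"U+{cp:04X}" for cp in missing_codepoints(cmap, [0x03A7, 0x03C7])])
```

A cmap entry only proves that some glyph is mapped to the code point; it says nothing about how well that glyph is drawn.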

  • I know what a chi looks like, I just don't get how it relates to the rest of the Greek lowercase. Like sometimes it's drawn like a lowercase x that reaches the descender line. But that type of chi design seems to be restricted to designs with heavy Latinization such as ones with the eta and upsilon with no leading curl. In Corbel, the chi stroke goes past the left sidebearing...and it looks correct to me. The chi seems to relate to the lambda but has hardly any resemblance to a lowercase x. The stroke gets close to the eta's descender and I'm guessing that's how you're supposed to determine how far past the left sidebearing it can go. Does anyone have any good chi strategies?
  • edited February 2016
    Is it acceptable to run the bar of ǥ through the bowl? Specifically, the Sami variant. The Brazilian language Kadiweu also uses it (on the right).


  • Igor Freiberger Posts: 93
    edited February 2016
    Ray: I would make lowercase chi coherent with lowercase lambda.

    Frode: In my searches, I found several fonts with the bar crossing the g descender even when it is a one-bowl g, and none with it in the upper position. So I keep the position regardless of whether it is a one- or two-bowl g.

    Regarding Kadiwéu, the language was documented for the first time in 1977 by a couple of SIL linguists. They used a hyphen over G as this is easily achieved with typewriters. So, the Kadiwéu variant may be as in your right sample; this is also the way I built it. Publications in Kadiwéu are very rare and, due to lack of fonts, no one used the barred G except in those made with typewriters.

  • @Frode Bo Helland 
    There are no other stroked letters in that alphabet so it doesn't have to match anything. Personally, if my typeface didn't already have a hooked g, I'd combine q, dotless j and put the stroke under the bowl. I don't think it can look attractive with a binocular g, except in lighter weights. A bold Ǥ is problematic if your G has a horizontal stroke. Once I dealt with this by removing the horizontal stroke, extending the horizontal stroke and adding the bar.

    Some glyphs were created to punish type designers.
  • Ray Larabie Posts: 554
    Is combining grapheme joiner (034F) the same as zero width non joiner (200C) and zero width space (200B)? As far as font construction goes, are all three of these simply zero width blanks?



  • Richard Fink Posts: 163
    edited March 2016
    Is combining grapheme joiner (034F) the same as zero width non joiner (200C) and zero width space (200B)? As far as font construction goes, are all three of these simply zero width blanks?
    Ray, I sent in a proposal for a presentation at TypeCon in Seattle this year titled "Empty Space Characters In Modern Character Sets" and I was unaware of the grapheme joiner 034F until your post, so thanks. (Gotta find out what that's about.)
    But I do have a partial answer to your question if I understand it correctly. 

    In browsers, at least, the behavior described in the Wikipedia entry for Zero-Width Space is accurate and you can even test that behavior right there on the Wikipedia page itself by resizing the browser window.
    It says:
    "In HTML pages, the zero-width space can be used as a potential line-break in long words as an alternative to the <wbr> element."

    And so the zero-width space is not simply empty space. It's more akin to a control character. But one that only kicks in under certain circumstances such as when the viewport is too small to display an unbroken string of text and the character has been inserted at the preferred breakpoints.
    AFAIK - it's most useful when you've got a long URI and you don't want it to break in a weird spot in a small viewport. Maybe long place names, too - I'd have to think about it.

    And yeah, they are zero width blanks. 

    SOTA TypeCon presentation evaluators take heed! 
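
The long-URI case described above can be sketched as follows. This is an illustration only, assuming Python 3; the function name `add_break_opportunities` and the choice to break only after slashes are assumptions made for the example, not a rule taken from any specification.

```python
import re

ZWSP = "\u200b"  # ZERO WIDTH SPACE: invisible, but allows a line break


def add_break_opportunities(url):
    """Insert a zero-width space after each run of slashes, except at the end."""
    # (?=.) makes sure something follows, so no ZWSP is added after a final slash.
    return re.sub(r"(/+)(?=.)", lambda m: m.group(1) + ZWSP, url)


display_text = add_break_opportunities("http://www.eki.ee/letter/")
# Show the result with the invisible character made visible as "|".
print(display_text.replace(ZWSP, "|"))
```

One caveat with this approach: the inserted characters are part of the text, so a reader who copies the displayed string also copies the zero-width spaces. It is safer to use the modified string only for display and keep the original for the link target.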

  • John Hudson Posts: 948
    Is combining grapheme joiner (034F) the same as zero width non joiner (200C) and zero width space (200B)? As far as font construction goes, are all three of these simply zero width blanks?

    If a font contains U+034F, then yes, it should probably be zero-width, no-outline, and the same is obviously true for U+200B.

    U+200C and U+200D are a little different. They can be no-outline glyphs, but for scripts in which these are used as layout control characters it is helpful to have visual representations for editing purposes. Software like MS Word has an option to display control characters in text, and does so by using glyphs in the font; when this option is disabled (most of the time), display of these glyphs is suppressed. Conventions for display of these and other layout control characters vary, but typically involve a thin vertical bar to make it easy to identify the insertion point in text when displayed, topped by a small symbol indicating the character. And, of course, all zero width. These are the forms I have come to favour, the first few following Microsoft conventions:




  • John Hudson Posts: 948
    BTW, note that the 'combining grapheme joiner' is a bit of an oddity, not least because it doesn't join graphemes. There was a period when it looked like it might be deprecated, but then it was found to be useful for preventing mark reordering during normalisation, which is necessary for Biblical Hebrew.
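
The mark-reordering point can be checked with Python's standard `unicodedata` module. The sketch below is only an illustration: the two Hebrew vowel points (patah and hiriq on a lamed) were picked because they have different canonical combining classes, and the helper name `show` is made up for the example.

```python
import unicodedata

CGJ = "\u034f"    # COMBINING GRAPHEME JOINER
LAMED = "\u05dc"  # HEBREW LETTER LAMED
PATAH = "\u05b7"  # HEBREW POINT PATAH (combining class 17)
HIRIQ = "\u05b4"  # HEBREW POINT HIRIQ (combining class 14)


def show(text):
    """Render a string as a list of code points."""
    return " ".join(f"U+{ord(c):04X}" for c in text)


plain = LAMED + PATAH + HIRIQ          # marks in the order the text wants
guarded = LAMED + PATAH + CGJ + HIRIQ  # same, with CGJ between the marks

# Normalisation sorts adjacent marks by combining class, so the plain
# sequence comes back with hiriq before patah.
print(show(unicodedata.normalize("NFC", plain)))
# CGJ has combining class 0, which stops the sort from crossing it.
print(show(unicodedata.normalize("NFC", guarded)))

# The characters discussed in this thread are not the same kind of thing:
for cp in (0x034F, 0x200B, 0x200C, 0x200D):
    ch = chr(cp)
    print(f"U+{cp:04X}", unicodedata.category(ch), unicodedata.combining(ch))
```

The last loop shows that U+034F is classified as a nonspacing mark, while U+200B, U+200C and U+200D are format characters, even though all of them can be zero-width in a font.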