Cyrillics I really need to bother with

While there's a time and a place for a fully decked-out Cyrillic Unicode range, I'm trying to come up with some better choices for where to draw the line. 

I've noticed a lot of fonts have a limited Cyrillic set that goes from 0400 to 045F.  0460 to 0489 are historical glyphs so I probably won't bother with those. But I don't know much about the 048A to 04F9 range. I know the 0490 Ґ and 0491 ґ are used in Ukrainian so I'll start including those in my Cyrillic set. Are there any other characters in that range which I should definitely include or not bother with?
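
For anyone who wants to browse what actually sits in each of these ranges, here is a minimal Python sketch using only the standard `unicodedata` module. The range labels are informal shorthand taken from the description above, not official Unicode names, and the results depend on the Unicode version bundled with your Python.

```python
import unicodedata

# Sub-ranges of the Cyrillic block as described in the post above.
# The labels are informal, not official Unicode terminology.
RANGES = {
    "basic (0400-045F)": (0x0400, 0x045F),
    "historical (0460-0489)": (0x0460, 0x0489),
    "extended (048A-04F9)": (0x048A, 0x04F9),
}

def describe(start, end):
    """Return (hex code point, character, name) for each assigned code point."""
    rows = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), None)
        if name is not None:  # skip code points with no name (unassigned)
            rows.append((f"{cp:04X}", chr(cp), name))
    return rows

# Summary per range.
for label, (start, end) in RANGES.items():
    print(f"{label}: {len(describe(start, end))} assigned code points")

# The two Ukrainian letters mentioned above.
for row in describe(0x0490, 0x0491):
    print(*row)
```

Printing `describe(0x048A, 0x04F9)` in full gives a name for every letter in the range in question, which is a reasonable starting point for looking up which languages use it.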



Comments

  • I always found @Thomas Phinney’s research for Adobe quite helpful.
  • Ray Larabie
    Ray Larabie Posts: 1,436
    Here's an updated image with the glyphs mentioned in Thomas' article. In case my goal wasn't clear: I'm trying to decide what a slightly more ambitious Cyrillic set could look like. Ending at 045F, as many fonts do, seems like a waste since, with just a few more glyphs, so many more languages could be covered. But I don't want to waste my time adding historical glyphs. I also want to avoid supporting languages that are almost extinct or transitioning to Latin. It's cruel but I can't support every language with every font.


  • Kent Lew
    Kent Lew Posts: 944
    edited December 2015
    Not that long ago, I had occasion to attempt to sort out something similar. I wound up parsing through the data in this Wikipedia entry: https://en.wikipedia.org/wiki/List_of_Cyrillic_letters

    And cross-checking with data here: http://www.eki.ee/letter/
  • I've been researching this same thing. The page Kent references on Wiki is an excellent one, and another Wiki page I found quite useful, especially the comparison chart at the very bottom, is here:
    https://en.wikipedia.org/wiki/Cyrillic_alphabets

    There is also the Bulgarian design differences issue which involves some of the lowercase.
  • Thank you, Kent.
  • Ray Larabie
    Ray Larabie Posts: 1,436
    Thanks, everybody. The glyphs I've highlighted seem to cover all of these except for these two languages.

    Abkhaz (7000 speakers/22 glyphs) ӶҔҼҾӠҞҨҦҬҲҴӷҕҽҿӡҟҩҧҭҳҵ
    Kildin Sami (600 speakers/18 glyphs) ЙҊӅӍӉӇҎҌӬйҋӆӎӊӈҏҍӭ
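
    A quick way to see how far a given character set falls short of lists like these is a simple set comparison. This is only a sketch: `basic_font` is a hypothetical stand-in for a real font's character map, and `missing_letters` is an illustrative helper, not part of any font tool.

```python
import unicodedata

# Letters listed above as the extra glyphs each language would need.
EXTRA_LETTERS = {
    "Abkhaz": "ӶҔҼҾӠҞҨҦҬҲҴӷҕҽҿӡҟҩҧҭҳҵ",
    "Kildin Sami": "ЙҊӅӍӉӇҎҌӬйҋӆӎӊӈҏҍӭ",
}

def missing_letters(supported, letters):
    """Return the letters whose code points are not in `supported`."""
    # Normalize first so that pasted text in decomposed form still matches.
    letters = unicodedata.normalize("NFC", letters)
    return [ch for ch in letters if ord(ch) not in supported]

# Hypothetical font that stops at 045F, as many fonts do.
basic_font = set(range(0x0400, 0x0460))

for language, letters in EXTRA_LETTERS.items():
    gaps = missing_letters(basic_font, letters)
    print(f"{language}: missing {''.join(gaps)}")
```

    With a real font, the `supported` set would come from its cmap table instead of a hard-coded range.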
  • Kent Lew
    Kent Lew Posts: 944
    Frode — Thanks for directing me to that PDF. Kildin Sami was definitely one of the more difficult to find any consistent information for. It was hard to ascertain whether the lengthened vowels were truly alphabetic. And most of those do not have codepoints anyway.

    I see that in the version of my data I posted above, the Ӣӣ and Ӯӯ went missing. You’ll note my hedged comment at the end of that line about the combining macron. As far as I can tell the ӢӣӮӯ were encoded in Unicode for Tajik; but the rest of the “macroned” vowels for Kildin Sami were never included.

    Hard to tell what’s the best approach in a situation like this, where only a few of a pattern of related characters are encoded and the rest must be achieved with combining accents.

    But yeah, I suppose the precomposed ӢӣӮӯ should have been left in the listing. They are necessary but not sufficient. (Which may be true of some of the other langs as well; such is the lot of so-called “minority” languages.)
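
    The "necessary but not sufficient" situation can be checked with Unicode normalization: composing a base letter with U+0304 COMBINING MACRON collapses to a single code point only where a precomposed letter was encoded. A minimal sketch follows; the sample vowels are chosen for illustration only and are not a claim about the Kildin Sami alphabet.

```python
import unicodedata

MACRON = "\u0304"  # COMBINING MACRON

def compose(base):
    """NFC-normalize base + macron; report whether it became one code point."""
    composed = unicodedata.normalize("NFC", base + MACRON)
    return composed, len(composed) == 1

# Illustrative sample of Cyrillic vowels, not a language-specific list.
for base in "иуаеоы":
    composed, precomposed = compose(base)
    status = "precomposed letter exists" if precomposed else "needs the combining macron"
    print(f"{base} + macron: {status}")
```

    For a font, this means the letters that stay as two code points depend on mark positioning rather than on a dedicated glyph slot.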
  • Is there a reason to add the palochka glyph when, in practice, people just type 1 or an uppercase I because there's no keyboard layout for it?
  • Ray Larabie
    Ray Larabie Posts: 1,436
    @Joon Park I came here to ask exactly the same question. If you click View all you can see how palochkas are represented in different fonts.

    http://www.fileformat.info/info/unicode/char/04c0/fontsupport.htm
    http://www.fileformat.info/info/unicode/char/04cF/fontsupport.htm
  • Great to know many fonts still include them regardless.

    It's a bit confusing though: I see the lowercase represented as an uppercase I as well as a lowercase L. Which is correct, both in practice and semantically?
  • Ray Larabie
    Ray Larabie Posts: 1,436
    edited December 2015
    Although the wiki doesn't clearly state what the deal is with the lowercase form, there's some helpful background on the talk page. https://en.wikipedia.org/wiki/Talk:Palochka

    I couldn't even find a web page that displays a lowercase palochka in a sentence. There are probably very few people on the planet who can answer this.
  • Kent Lew
    Kent Lew Posts: 944
    FWIW, the note about palochka in the Unicode Standard says this:
    Palochka. U+04C0 “I” CYRILLIC LETTER PALOCHKA is used in Cyrillic orthographies for a number of Caucasian languages, such as Adyghe, Avar, Chechen, and Kabardian. The name palochka itself is based on the Russian word for “stick,” referring to the shape of the letter. The glyph for palochka is usually indistinguishable from an uppercase Latin “I” or U+0406 “I” CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I; however, in some serifed fonts it may be displayed without serifs to make it more visually distinct.

    In use, palochka typically modifies the reading of a preceding letter, indicating that it is an ejective. The palochka is generally caseless and should retain its form even in lowercased Cyrillic text. However, there is some evidence of distinctive lowercase forms; for those instances, U+04CF CYRILLIC SMALL LETTER PALOCHKA may be used.
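
    One practical consequence is that software lowercasing can produce U+04CF even if a writer never types it, because the Unicode case mapping pairs the two code points. The sketch below shows what Python's built-in case mapping does. It says nothing about what native writers prefer, and `lower_keep_palochka` is a hypothetical helper for the "caseless" reading of the note above.

```python
import unicodedata

PALOCHKA = "\u04C0"  # CYRILLIC LETTER PALOCHKA

# What the standard case mapping does with the palochka.
lowered = PALOCHKA.lower()
for ch in (PALOCHKA, lowered):
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")

def lower_keep_palochka(text):
    """Lowercase text but leave U+04C0 untouched (treats it as caseless)."""
    return "".join(ch if ch == PALOCHKA else ch.lower() for ch in text)
```

    A font that has U+04C0 but not U+04CF may therefore show a missing glyph in text that has been lowercased automatically, for example by a CSS text-transform.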
  • Does the lowercase palochka resemble a lowercase L?
  • When you dig that deep into the “which for what” issue: maybe you’re quicker arriving at a comprehensive solution when you just do all the characters, and can thus be sure no one misses anything?

  • Chris Lozos
    Chris Lozos Posts: 1,458
    you’re quicker arriving at a comprehensive solution
    This is also true of Joon Park's post about Greek glyphs. I am pretty sure you can draw the glyph more quickly than you can unearth how much it is actually used.
  • Kent Lew
    Kent Lew Posts: 944
    Does the lowercase palochka resemble a lowercase L?
    I do not think the matter is settled at all. The very notion of a lowercase palochka seems to be a matter of debate.

    We would need native speakers to weigh in on current preferences.

  • Ray Larabie
    Ray Larabie Posts: 1,436
    Since many of us here will spend the rest of our lives filling in these Unicode ranges over and over again, it pays to spend more time working out what to include and what not to include. If you're working on a long-term project where you intend to fill everything in, go for it. Filling in Unicode ranges without knowing how, or whether, the glyphs are ever going to be used wastes time in the long term and bloats fonts. New type designers, unsure of which glyphs they should include, may look to existing fonts for guidance, which perpetuates wrong or junk glyphs. Except in the case of comprehensive language fonts, we all decide which characters we're going to support and which we're not. Knowing which forms are historical or deprecated is important in making those decisions.

    In the case of textured/distressed fonts, there's a breaking limit to the number of non-composite glyphs that can be included. Knowing which glyphs are deprecated, historical or rarely used can contribute to more language coverage and more stable fonts.

    For example, the long s: ſ. Even beginners know that this is a historical glyph. It's certainly appropriate in a comprehensive cover-everything font, an old-timey Caslon or a distressed pirate-themed font, but in an ultramodern design it's clutter. There's a deprecated character right in the middle of Latin Extended-A that we all know about: ŉ (U+0149). There it is at the top of table 2. http://unicode.org/review/pr-122.html
    A lot of new fonts still include this glyph, not because of its usefulness, but because it just happens to be in the middle of Latin Extended-A.

    Feel free to fill in everything if you want but perhaps we shouldn't use Unicode tables to decide where to stop.
  • Joon Park
    Joon Park Posts: 56
    edited December 2015
    0344 ( ̈́ ) COMBINING GREEK DIALYTIKA TONOS * 
    037E ( ; ) GREEK QUESTION MARK * 
    0387 ( · ) GREEK ANO TELEIA *
    20A4 ( ₤ ) LIRA SIGN 
    2126 ( Ω ) OHM SIGN * 

    Interesting proposal list. So is their use discouraged because they are rarely used in practice, or because duplicate glyphs already exist?

    Edit: NM, had to look up Normalization Form C.
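
    For anyone else who has to look it up: Normalization Form C replaces some characters with their canonical equivalents, so normalized text never reaches the font under the original code point. A small sketch applying NFC to the five code points listed above, using Python's standard `unicodedata` module (`nfc_report` is just an illustrative helper name):

```python
import unicodedata

# The code points quoted in the list above.
CODEPOINTS = [0x0344, 0x037E, 0x0387, 0x20A4, 0x2126]

def nfc_report(cp):
    """Return the NFC form of a code point as a list of hex strings."""
    normalized = unicodedata.normalize("NFC", chr(cp))
    return [f"{ord(c):04X}" for c in normalized]

for cp in CODEPOINTS:
    result = nfc_report(cp)
    note = "unchanged" if result == [f"{cp:04X}"] else "replaced by NFC"
    print(f"{cp:04X} -> {' '.join(result)} ({note})")
```

    A character that NFC replaces still needs a glyph if the font is meant to handle unnormalized text.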
  • Wei Huang
    Wei Huang Posts: 98
    Does anyone have reliable documentation on the localised forms of Cyrillic? 
  • Stefan Peev
    Stefan Peev Posts: 103
    edited July 2017
    Wei Huang, I have started documentation of this kind. Look at Local Fonts. And here are the local forms by language: Bulgarian Cyrillic Feature Locl, Serbian Cyrillic Feature Locl, Macedonian Cyrillic Feature Locl, Bashkir Cyrillic Feature Locl, Chuvash Cyrillic Feature Locl.






  • John Savard
    John Savard Posts: 1,135
    The document pointed out by Frode Bo Heiland reminds me of a political issue I stumbled across in listening to music on YouTube. It appears that the Sami are recognized as an indigenous people by Sweden, but Finland refuses to give them the same recognition.

    Given that Finnish, Estonian, and Sami are all very closely related languages, I would think that the Finns do have an excuse for this apparently retrograde political position. Recognizing the Sami as an indigenous people would suggest that the Finns themselves are an indigenous people - as opposed to a civilized people every bit the equal of Swedes, Frenchmen, and so on.

    But they could be recognized as a separate nationality, like Basques or Welshmen, without the Finns having to categorize themselves or anyone else as primitive savages. This would make everyone happy.
  • Josh_F
    Josh_F Posts: 52
    So are the combining diacritical marks truly necessary for modern Cyrillic text?

    I recently came across the Lettersoup page on Bulgarian Cyrillic localized forms. Under the "Marks in the Cyrillic Script" section, they say...

    "Some characters in the Cyrillic script need marks but they do not have a Unicode and actually do not exist as precomposed characters."

    Is this true?
  • André G. Isaak
    André G. Isaak Posts: 634
    edited July 2017
    Yes, for full coverage of Cyrillic you minimally need a combining dieresis, a combining breve (the Cyrillic-looking kind), a combining macron, and a combining acute. I'm not sure about grave. Double acute, double grave, and inverted breve *might* be needed for Serbian poetics but not for actual day-to-day use (they're used in the Latin alphabet for this purpose, but I'm not 100% sure whether they are used in Cyrillic).
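
    One rough cross-check is to list which combining marks already appear in the canonical decompositions of the precomposed letters in the Cyrillic block. This is a sketch only: it shows which marks Unicode itself pairs with encoded Cyrillic letters, not which marks are needed for unencoded combinations or how often any of them occur in real text. `marks_in_precomposed` is an illustrative helper name.

```python
import unicodedata

def marks_in_precomposed(start=0x0400, end=0x04FF):
    """Map each combining mark to the Cyrillic letters that decompose to it."""
    marks = {}
    for cp in range(start, end + 1):
        decomposition = unicodedata.decomposition(chr(cp))
        # Skip letters with no decomposition and compatibility mappings (<...>).
        if not decomposition or decomposition.startswith("<"):
            continue
        # First part is the base letter; the rest are the marks.
        for part in decomposition.split()[1:]:
            mark = chr(int(part, 16))
            if unicodedata.combining(mark):
                marks.setdefault(mark, []).append(chr(cp))
    return marks

for mark, letters in sorted(marks_in_precomposed().items()):
    print(f"U+{ord(mark):04X} {unicodedata.name(mark)}: {''.join(letters)}")
```

    Any mark that is used in practice but does not show up here exists only in unencoded combinations, so it can only be supported through the combining mark itself.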
  • The Cyrillic set I use has all the usual Latin combining accents with the exception of circumflex, dot, ogonek and ring. I devised the set by referring to a variety of sources and as much as I could, determined they were valid. Most are unencoded and likely of use primarily in loanwords and transliteration. The time spent including them is so minimal I can't find a valid reason to omit them.
  • Sorry. I forgot to mention combining caron, which is also used.

    André
  • Josh_F
    Josh_F Posts: 52
    The Cyrillic set I use has all the usual Latin combining accents with the exception of circumflex, dot, ogonek and ring. I devised the set by referring to a variety of sources and as much as I could, determined they were valid. Most are unencoded and likely of use primarily in loanwords and transliteration. The time spent including them is so minimal I can't find a valid reason to omit them.

    The main reason I'm asking is because it will only add minimal time for someone who uses anchors, but I don't usually build accented glyphs with anchors, meaning it will add a bit of time to include any combining marks... 
  • Josh_F
    Josh_F Posts: 52
    Yes, for full coverage of Cyrillic you minimally need a combining dieresis, a combining breve (the Cyrillic-looking kind), a combining macron, and a combining acute. I'm not sure about grave. Double acute, double grave, and inverted breve *might* be needed for Serbian poetics but not for actual day-to-day use (they're used in the Latin alphabet for this purpose, but I'm not 100% sure whether they are used in Cyrillic).

    Hi André (or anyone else who wants to join in). A couple more Cyrillic combining accent questions if you don't mind...

    When you say "full coverage of Cyrillic", are you saying combining diacritics are necessary in day-to-day use, like an é in French or ñ in Spanish?

    Or do you mean they are needed to cover every possible orthographic need, in things like grammar books and dictionaries that mark pronunciation and stress?

    The reason I ask is because I've been trying to see how extensive the Cyrillic language support is from some of the larger foundries, and while I know Hoefler isn't known for making Cyrillic fonts, Gotham contains no combining diacritics, yet mentions...

    "A survey into linguistic, cultural, political, economic, and technological conditions in the region, along with a review of typography created by native speakers, led to H&Co’s Cyrillic-X character set, which is included standard in all Gotham packages. Consulting with H&Co on the project were two Cyrillists: Maxim Zhukov, former Typographic Coordinator to the United Nations, and Ilya Ruderman, creator of the Type & Typography program at the British Higher School of Art and Design in Moscow."   https://www.typography.com/fonts/gotham/features/gotham-language-support

    Commercial Type hired Ilya Ruderman as well for their Cyrillic extensions and none of their typefaces contain combining accents.

    This is by no means trying to question your knowledge, it is more so me just trying to sort out my confused and uninformed mind.