Hi, I’m looking for a solution to add Bulgarian glyphs with grave that don’t have their own Unicode code point, such as ъ̀ (ъ with grave) or Я̀ (Я with grave). What’s the Bulgarian way of solving this, and what does a real-life solution look like? The most obvious *technical* solution is to combine “Я” and “Combining Grave Accent”, but is that the way Bulgarians deal with this?
I've seen one solution where the Bulgarian “ъ̀” (Cyrillic hard sign 0x044A with grave) is simply put on the code point of “њ” (Cyrillic nje 0x045A), but that badly messes with the Unicode standard, which we really do not want to do.
Comments
On Mac OS the keyboard Bulgarian Phonetic has a dead key for grave accent, but then translates to
Other Bulgarian keyboards have no grave accent.
It's up to the user to solve this. E.g. for historic English, German or Latin I need my own customised keyboards for historic characters (long_s, rotunda, combining e above). Or keep a collection of rarely used characters in a file and use copy & paste.
The only feature you can support on font level in this case is a stylistic set (default off) to substitute 'њ' by the glyph of
A solution at the font level is always ugly and needs documentation.
- Local Fonts
- Omniglot
- Wikipedia (English)
- Wikipedia (Bulgarian)
- r12a Apps
The internal database of macOS' FontBook also doesn't include other characters. Independently of this, combinations not included in Unicode should be built as base letter + combining diacritic.* You don't need to add the composite to your font, just the letter and the diacritic. But the font needs proper <mark> and <mkmk> features to provide correct diacritic positioning.**
How these combinations are typed is up to the OS and its keyboard layouts. Although you can include some substitution rules in the font, this is hardly needed for well-established languages such as Bulgarian. Special input methods are used for languages with poor or no OS support.
* This is the Unicode approach, but only for diacritics that can be positioned above or below the base letter. Combinations with diacritics crossing the base letter are eligible for encoding.
** I'm not adding an explanation of them because there is lots of very good material about <mark> and <mkmk> on the web.
@Martin Wenzel The best route will be speaking with writers/readers of Bulgarian directly, but there are a few hacks to get an impression of how Bulgarians may be dealing with this.
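The base-letter-plus-combining-mark approach described above can be illustrated with a minimal Python sketch. It uses only the standard `unicodedata` module; the helper name `with_grave` and the list of vowels are illustrative assumptions, not part of any font tool. It shows which Bulgarian vowels have a precomposed form with grave and which remain two-code-point sequences that the font has to position via <mark>.

```python
import unicodedata

COMBINING_GRAVE = "\u0300"  # COMBINING GRAVE ACCENT

def with_grave(base: str) -> str:
    """Return base + combining grave, composed (NFC) where Unicode allows it."""
    return unicodedata.normalize("NFC", base + COMBINING_GRAVE)

# Bulgarian vowels (illustrative list, lowercase only).
for vowel in "аъоуеиюя":
    accented = with_grave(vowel)
    code_points = " ".join(f"U+{ord(ch):04X}" for ch in accented)
    kind = "precomposed" if len(accented) == 1 else "base + combining mark"
    print(f"{accented}  {code_points}  ({kind})")
```

Under this sketch only е and и come out as single code points (U+0450 and U+045D); every other vowel stays as the letter followed by U+0300, which is the sequence the font's mark positioning has to handle.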
For instance:
From that wiki, I see that the grave accents are used to indicate stressed syllables, just as acute accents are used for that purpose in Russian.
@Mark Simonson these characters will never be added to Unicode as precomposed characters because they are already in Unicode as composed sequences of characters. Unicode stopped adding canonically equivalent precomposed characters almost two decades ago.
The situation with Burmese wasn’t to do with ‘computers with the ability to handle Unicode properly were too expensive for people there’. The problem was that there were no computers that handled Burmese OpenType shaping properly. The embargo against the military regime made it impossible/unattractive for foreign companies to do business in Myanmar, which led to very slow implementation of Myanmar script shaping in operating systems, leading to the development of a local hack encoding/shaping model to fill the gap.
This is (one of the reasons) why normalisation exists and is part of Unicode.
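As a rough illustration of what normalisation buys you, the following Python sketch compares the precomposed ѝ (U+045D) with the sequence и + combining grave. The helper name `canonically_equal` is an assumption made for this example; only the standard `unicodedata.normalize` call is real API.

```python
import unicodedata

def canonically_equal(a: str, b: str) -> bool:
    """Compare two strings after normalising both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

precomposed = "\u045d"        # ѝ as a single code point
decomposed = "\u0438\u0300"   # и followed by COMBINING GRAVE ACCENT

print(precomposed == decomposed)                   # raw comparison differs
print(canonically_equal(precomposed, decomposed))  # equal after normalisation

# The "put it on nje" hack is a different character altogether:
hard_sign_grave = "\u044a\u0300"  # ъ + combining grave
nje = "\u045a"                    # њ
print(canonically_equal(hard_sign_grave, nje))
```

The last comparison is the reason the њ hack is problematic: text typed with it is not equivalent to ъ with grave under any normalisation form, so search, spell-checking and copy-paste to other fonts all see the wrong letter.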
- on vowels in individual words to avoid ambiguity [сèдмица (week) vs. седмѝца (seven)]
- on vowels in the particle [по] for grading nouns, verbs and prepositional combinations [пò юнак, пò обичам, пò към тебе, още пò на запад]
- in case of transcribed proper names from foreign languages, if it is necessary to indicate the place of the stress [Мàртин and Мартѝн, Àгата and Агàта, Ивàнов and Иванòв]
- in specialized publications such as dictionaries, reference books, encyclopedias [кàжа, кàжеш, кàжем; кàзах, кàза; кàжех, кàжеше; кàзал; кàжел; кàзан; кажѝ!, кажèте!]
- in the short forms of the personal and possessive pronouns for the third person feminine singular [Трябва да ѝ изпратиш поздравления. Книгата ѝ лежеше на масата.]
However, none of these cases reflects everyday practice in the Bulgarian literary language. In other words, accented vowels are rarely used. Only the accented letters Ѝ (uni040D) and ѝ (uni045D) are used extremely often, but they have their own Unicode code points.
That is precisely the use case illustrated by the парà/пàра example provided by Martin’s client. Because such distinctions are not always marked in Bulgarian text does not mean that they do not sometimes need to be marked, and the fact that use cases for the grave mark are explicitly described in the official spelling dictionary indicates that these cases should be supported in fonts for Bulgarian text.
None of these platforms includes combining marks in the standard keyboard layouts for most languages, and many fail to even include proper punctuation (as with em dashes on Windows). Even touch keyboards with flyouts à la iOS include absurd subsets of accented letters: č and ž exist in iOS English keyboards but ý doesn't.
And the funny thing is: keyboard layouts are easy to create, they're TINY and OSes can easily ship multiple layouts per language.
I think the type community would benefit from teaming up with linguists to create a set of »Typo« layouts for major languages, esp. the LCG ones (because those are the ones that lag behind the most).
Make a common repository, submit them to CLDR/Unicode, produce the layouts in various formats, and submit them for inclusion in various OSes.
There are even 3rd-party keyboard vendors for mobile apps (I use Google’s Gboard on iOS). Those vendors compete, and they have a business case (data collection, machine learning for better text input prediction). I wouldn't be surprised if one of these vendors agreed to sponsor such a project.