A new character?

Andreas Stötzner · April 2022

Image: https://us.v-cdn.net/5019405/uploads/editor/gb/fi1f3oll9kkx.png

I consider to propose a new character. It is going to be another umlaut for German.

John Savard · April 2022

It's interesting that a company chose to do this in one of their brand names. Perhaps to make it more German-looking. But before we could consider adding the character to Unicode - or the German alphabet - one important thing needs to be known. What sound is it supposed to represent?

Or could there be found somewhere a language other than German in which the eszet could be a useful character, and the sound it would represent happens to have two variants, so an accented version is needed?

Jens Kutilek · April 2022

Perhaps the umlaut marks the ß as uppercase, because it otherwise looks like a lowercase design here?

K Pease · April 2022

It says to non-Germans, "pay attention, this is not a B". It's a dieresis, kind of.

John Savard · April 2022

I think I have found a use for this letter, to help provide a replacement alphabet for one of the Slavic languages (which I propose to call East Ruthenian):

Image: https://us.v-cdn.net/5019405/uploads/editor/v3/r6z7f9y1mg22.jpg

The alphabet is based on the Latin alphabet; its intent is to help dampen harmful extreme nationalist sentiments. The umlaut is used to distinguish zh from z, a y that is used to modify the sound of a preceding vowel from a y that represents a vowel sound, ch from ts, and shch from sh, as well as palatalized forms of u and a from their ordinary forms, as well as to distinguish v from an alternate form of the letter f, and i from an alternate form of i.

Note that the Latin letter for z is not used, due to its having been abused, and is instead replaced by the Ukrainian letter used for g.

Some missing letters, required for returning the language to its correct spelling, have been restored, and they remain in their Cyrillic form.

James Puckett · April 2022

This is Georg’s fault. Nobody would do this if Glyphs didn’t make it so easy to create fonts with combining marks.

Chris Lozos · April 2022

Perhaps it means BS ;-)

Scott-Martin Kosofsky · April 2022

I had always associated this kind of unnecessary accenting with the byways of U.S. commerce. There was a Chrysler car in the 1970s called the “Volaré,” as if the name needed an acute accent to get the pronunciation right. And there is the much-loved ice cream brand “Häagen-Dazs,” which was invented in the Bronx by a couple who wanted to give it a “Danish-sounding name.” Neither name is is even remotely Danish, but who cares, anyway?

Now we have an über-Deutsch clothing brand, über-identified in case the ß wasn't enough to tip you off. So, how do we pronounce the ß-umlaut? I suggest that it should be über-sibilant, enough to produce a bit of spittle.

André G. Isaak · April 2022

I blame Spin̈al Tap.

Nick Shinn · April 2022

Respect where respect is due: rock band Mötley Crüe often gets credit for this pseudolinguistic genre, in 1981, but perhaps punk band Hüsker Dü was earlier, named after a Danish board game.
There is a minor tradition of archly mis-spelled band names, from the 1960s at least—Beatles, Byrds, Monkees, Led Zeppelin, etc.

Alan Bernard Hughes · April 2022

André G. Isaak said:

I blame Spin̈al Tap.

Chrome 100

Edge 100

Firefox 99

IE 11 (soon to be retired)

All these were viewed on Windows 10. It is 38 years since the band found fame, and Firefox still can’t spell their name correctly.

Rob Barba · April 2022

André G. Isaak said:
I blame Spin̈al Tap.

Everyone does.

Adam Jagosz · May 2022

Actually Firefox is not to blame. It just has a different way of handling combining marks when mark positioning data is not present in the font. Note that on Chromium browsers the dieresis is also not positioned correctly above the n, but a tad too low. That's right, fonts served from Google Fonts do not contain mark and mkmk features. They are stripped of most non-essential features to save bandwidth.

In 2022, you should really self-host webfonts, especially those from Google — font cache can no longer be shared between domains, and GF is actually kinda slow anyway. And you absolutely must self-host if you want to use all fanciness built into a font.

(By self-hosting a GF I mean downloading the untampered OTFs/TTFs from GF site and converting them to WOFF2 through FontSquirrel's generator — or downloading WOFF2 directly using this nifty helper — uploading the files to your server, and linking to them in CSS). Typedrawers.com, I'm looking at you.

Helmut Wollmersdorfer · May 2022

Adam Jagosz said:

Actually Firefox is not to blame. It just has a different way of handling combining marks when mark positioning data is not present in the font. Note that on Chromium browsers the dieresis is also not positioned correctly above the n, but a tad too low. That's right, fonts served from Google Fonts do not contain mark and mkmk features. They are stripped of most non-essential features to save bandwidth.

As a user I would expect that popular fonts (delivered by the OS vendor) work in the major browsers. Nice to see, that this is not the case for U+0308 COMBINING DIAERESIS, which can not be considered exotic. It's reasonable, that e.g. an environment supporting fall-back fonts, takes the base AND the combining character from a different font, if the combing character is not defined by the font.

OK, \m and \n have no predefined code points in Unicode for some combining characters. As the combinations are seldom, fonts (except some special ones) don't include predefined glyphs for them. But I would expect them to combine in the right way, even if not perfect.

E.g. TextEdit on MacOS does it well, the Terminal (command line) and Google Chrome too.

BBedit (a general purpose text editor) has funny (broken) behaviour:

Image: https://us.v-cdn.net/5019405/uploads/editor/xf/ay1ffrf31jvo.png

For U+0364 COMBINING LATIN SMALL LETTER E BBedit does it right combined with \a but not not combined with \ä, where the combining should be placed above the diaeresis:

Image: https://us.v-cdn.net/5019405/uploads/editor/o8/n1jerz8h8189.png

Didn't look into the font if it uses mark and mkmk.

This is how it looks in TextEdit for some fonts (not perfect vertical position in Arial, Times and Maguntia):

Image: https://us.v-cdn.net/5019405/uploads/editor/5l/73o4394zpz4b.png

Adam Jagosz · May 2022

Helmut Wollmersdorfer said:

Adam Jagosz said:

Actually Firefox is not to blame.

As a user I would expect that popular fonts (delivered by the OS vendor) work in the major browsers.

Oh, but they do (some of them, at least — screenshot from Firefox/Windows below — as seen originally and with the font swapped using DevTools):

Image: https://us.v-cdn.net/5019405/uploads/editor/7z/mp2d98enee3j.png

As I said, the issue lies in the webfont served directly from Google Fonts (and GF developers who only seem to be testing their products in Google Chrome) — this forum currently uses Noto Sans served from GF, not a system font.

Coincidentally, finally a proof that Comic Sans is a bad font (I didn't check the macOS version though).

Florian Pircher · May 2022

Current macOS, Safari, Chrome, Firefox:

Image: https://us.v-cdn.net/5019405/uploads/editor/yf/qyxvml92gako.png

“You know, it was never the aesthetics that bothered me about Comic Sans; it was the missing combining diaeresis.”

RichardW · July 2022

John Savard said:

But before we could consider adding the character to Unicode - or the German alphabet - one important thing needs to be known.

The thing to be known is that precomposed characters composed of pre-existing characters will not be added to Unicode.

John Savard · July 2022

RichardW said:

The thing to be known is that precomposed characters composed of pre-existing characters will not be added to Unicode.

This, however, is a disastrous policy, because it means that some languages are less than equal. Languages like French and German have a character in Unicode for every one of their accented letters, while Burmese doesn't have precomposed characters in Unicode for every character used by that language.

So processing Burmese on a computer is made more difficult, at least if existing standard character codes are used. Presumably, computers in Burma will just use their own character code, and translate to and from Unicode when necessary, for things like using the Internet.

In fact, I see this has happened: the code most devices in Burma use is called Zawgyi, and it is incompatible with Unicode.

Not that Zawgyi was really a good idea, since it is a modified Unicode that replaces other characters with the extra glyph codes needed to serve Burmese properly. Instead, I'd have favored code pages not only for Burmese, but all the other languages using its script and needing different precomposed characters, which don't interfere with also supporting Unicode. But mobile phones, unlike desktop computers, have limitations that prevented a good solution.

Andreas Stötzner · July 2022

John Savard said:

RichardW said:

The thing to be known is that precomposed characters composed of pre-existing characters will not be added to Unicode.

This, however, is a disastrous policy, because it means that some languages are less than equal. Languages like French and German have a character in Unicode for every one of their accented letters, while Burmese doesn't have precomposed characters in Unicode for every character used by that language. …

It is a well-known inconsistency inherent in the UCS, quite from its very beginning. That languages like German, French, Czech or Greek got their sets of accented characters fully accomodated in the standard, but a lot of other languages not, is due to the standard’s history, in the early days of it the basis has been much older character set standards, which have been transmitted to the UCS for compatibility reasons. Many languages got supported after that stage and in the course of its development (and of the development of text processing in general), the dogma was enforced not to continue the encoding of precomposed characters which could get generated otherwise. – It is the result of a mere technical perspective, disregarding scientific or practical aspects (or even ethical ones). – It is not the only inconsistency of the UCS. However, I’m afraid we’ll have to live with that for a long while. Because the idea that typography is a science in its own right has no lobby. Otherwise it would be ‘good practice’ to maintain consistency in such matters across the standard, like it is good and well-established practice in most subjects which have a scientific basement.

For some stakeholders contributing to the standard it is far more important to get A pile of shit or a Screamin’ cats face encoded right away, than to bother with a practical solution for Burmese or any other ‘such stuff’.

John Hudson · July 2022

Precomposed diacritic characters were encoded in Unicode simply for one-to-one compatibility mappings with pre-existing 7- and 8-bit encoding standards. Where no such standards existed, there is zero need for precomposed diacritic characters; indeed, as time goes on, those encodings are becoming more of a burden than they are a benefit. Yes, it meant that some languages got supported earlier than others as software transitioned to Unicode text and layout—since combining marks require more intelligent keyboard mechanisms and glyph processing at the font level—, but we’re presently stuck having to include them in fonts, even though mark attachment GPOS is so much more flexible and productive.

This, however, is a disastrous policy, because it means that some languages are less than equal. Languages like French and German have a character in Unicode for every one of their accented letters, while Burmese doesn't have precomposed characters in Unicode for every character used by that language.

This is not accurate. Burmese text is perfectly representable using standard Unicode encoding. The Zawgyi hack encoding you refer to was not developed because of any deficiency in the Unicode Myanmar character set, but because international sanctions on the military dictatorship meant that there was little impetus for major software companies to implement Myanmar script shaping engines and glyph layout models to support that character set. The delayed support led to developers with Burma coming up with Zawgyi: a pseudo encoding that hijacks 16-bit codepoints from the Unicode Myanmar block to represent some glyph variants that are needed for Burmese shaping. It fails to do a good job of representing all the typographic subtleties of proper shaping, and is much less flexible than the actual Unicode encoding plus OpenType model.

A new character?

Comments

Categories