U+FDF2 'ARABIC LIGATURE ALLAH ISOLATED FORM' not always rendered correctly #125

Open
Manishearth opened this issue Jun 13, 2017 · 13 comments

Comments

@Manishearth
Member

U+FDF2 'ARABIC LIGATURE ALLAH ISOLATED FORM' (ﷲ) is supposed to render as alef-lam-lam-heh (with diacritics), but in some fonts, including Courier New, the alef is missing.

http://www.fileformat.info/info/unicode/char/fdf2/fontsupport.htm

The code point could conceivably mean "the main l-l-h ligature in 'allah'"; however, the spec decomposes it as a-l-l-h, so all fonts should render the leading alef.
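The decomposition can be checked against the Unicode Character Database. This is a minimal sketch, assuming Python's standard `unicodedata` module; the variable names are only illustrative.

```python
import unicodedata

ALLAH_LIGATURE = "\uFDF2"

# Compatibility decomposition as recorded in the Unicode Character Database.
decomposition = unicodedata.decomposition(ALLAH_LIGATURE)

# NFKC normalization applies that decomposition to actual text.
expanded = unicodedata.normalize("NFKC", ALLAH_LIGATURE)
letter_names = [unicodedata.name(ch) for ch in expanded]

print(unicodedata.name(ALLAH_LIGATURE))
print(decomposition)
print(letter_names)
```

The first name in `letter_names` is ARABIC LETTER ALEF, which is why a glyph without the leading alef does not match the character's decomposition.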

@behnam
Member

behnam commented Jun 13, 2017

Screenshot on my system, with buggy fonts highlighted in red:

screen shot 2017-06-13 at 5 07 38 pm

screen shot 2017-06-13 at 5 08 27 pm

Ligatures of this kind, especially RIAL and ALLAH, are very common in fonts.

The bug here seems to be that the font assigns U+FDF2 to the ligature glyph for the second joining segment of the word ALLAH (which is LLAH), instead of building a composed glyph for U+FDF2 that uses the ligature.

CLDR data, which is our primary source for character support, has no information about ligatures (or their possible code points). Since this bug is common, especially in the more open-source fonts, I think we can cover the topic in ALReq and maybe even provide an annex with details about the important ligatures and how to implement them in fonts (such as the detail here that the ligature glyph should not get the U+FDF2 code point; rather, U+FDF2 should use the ligature).

What do you think?

@behnam behnam self-assigned this Jun 13, 2017
@khaledhosny

Since U+FDF2 is a presentation form character, I think we shouldn’t say much more than discouraging the use of presentation forms in text input. As for the fonts, though they indeed break the glyph for U+FDF2, the ligatures for الله and لله still work correctly.
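One way to act on that advice in tooling is to flag presentation-form characters in input text. The sketch below is an assumption about how such a check might look, not something proposed in this thread; `find_presentation_forms` is a hypothetical helper name, and the ranges cover the Arabic Presentation Forms-A block and the letters of Presentation Forms-B.

```python
# Code point ranges treated as presentation forms in this sketch.
PRESENTATION_RANGES = (
    (0xFB50, 0xFDFF),  # Arabic Presentation Forms-A (whole block)
    (0xFE70, 0xFEFC),  # Arabic Presentation Forms-B, excluding U+FEFF
)

def find_presentation_forms(text):
    """Return (index, 'U+XXXX') pairs for presentation-form characters."""
    hits = []
    for index, ch in enumerate(text):
        cp = ord(ch)
        if any(lo <= cp <= hi for lo, hi in PRESENTATION_RANGES):
            hits.append((index, f"U+{cp:04X}"))
    return hits

# U+FDF2 followed by the spelled-out letters: only the first is flagged.
print(find_presentation_forms("\uFDF2 \u0627\u0644\u0644\u0647"))
```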

@behnam
Member

behnam commented Jun 13, 2017

Right, @khaledhosny. True that we want to discourage them in text. So, the question is, do we want to cover the issue for the sake of improving font development processes and font products for the script?

Since the topic is not exactly text layout, I think it could be a separate (wiki) document, or maybe an annex on font development.

@khaledhosny

I agree this does not belong in the main document; an annex on Arabic font development best practices might be a good idea.

@ntounsi
Contributor

ntounsi commented Jun 24, 2017

My thinking is:

  • Do not use the presentation form character U+FDF2. Besides its use being discouraged, many fonts omit the initial ALEF.
  • Write ALLAH in full letters (ALEF LAM LAM HEH). Many fonts, if they do not replace it with the ligature shape, at least decorate it by adding the "formal" ARABIC SHADDA ّ (U+0651) and ARABIC LETTER SUPERSCRIPT ALEF ٰ (U+0670). Note, however, that you should not put explicit diacritics such as SHADDA and FATHA after the LAM: they may be drawn on top of the added formal marks.

HTML code to test your fonts:
<p>&nbsp;&#xFDF2; &#x627;&#x644;&#x644;&#x64E;&#x651;&#x647; &#x627;&#x644;&#x644;&#x647; </p>
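The same three test strings can be generated from their code points, which makes it easier to add further variants. This is only a sketch: `to_char_refs` is a hypothetical helper, and the output omits the leading `&nbsp;` and the trailing space of the line above.

```python
def to_char_refs(text):
    """Encode each non-space character as a hexadecimal character reference."""
    return "".join(ch if ch == " " else f"&#x{ord(ch):X};" for ch in text)

samples = [
    "\uFDF2",                                # presentation-form character
    "\u0627\u0644\u0644\u064E\u0651\u0647",  # letters with FATHA and SHADDA after LAM
    "\u0627\u0644\u0644\u0647",              # bare letters
]

html = "<p>" + " ".join(to_char_refs(s) for s in samples) + "</p>"
print(html)
```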

@behnam and @khaledhosny, +1 to covering font development best practices.

@behnam behnam added this to the First_Public_Draft milestone Jun 27, 2017
@behnam behnam added this to Ready to Pick Up in Authoring ALReq 1.0 Jul 18, 2017
@behnam behnam moved this from Ready to Pick Up to Ideas + Discussions in Authoring ALReq 1.0 Jul 18, 2017
@behnam behnam removed this from Ideas + Discussions in Authoring ALReq 1.0 Jul 18, 2017
@moyogo
Contributor

moyogo commented Jun 20, 2018

The Unicode Standard 11.0.0 says the following in section 9.2 Arabic Presentation Forms-A: U+FB50–U+FDFF, Word Ligatures (this was added in Unicode 7.0.0):

U+FDF2 ARABIC LIGATURE ALLAH ISOLATED FORM is a very common ligature, used to display the name of God. When the formation of the allah ligature is desired, the recommended way to represent the word would be <alef, lam, lam, shadda, superscript alef, heh> <0627, 0644, 0644, 0651, 0670, 0647>. In non-Arabic languages, other forms of heh, such as heh goal (U+06C1), may also form the ligature. Extra care should be taken not to form the ligature in the absence of the shadda and the superscript alef, as the sequence <alef, lam, lam, heh> and <alef, lam, lam, shadda, heh> exist in Persian and other languages with different meanings or pronunciations, where the formation of the ligature would be incorrect and inappropriate.
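To make the three sequences in that passage concrete, here is a small sketch that spells them out and applies the rule as quoted. The names `describe` and `wants_ligature` are hypothetical, and the whole-word comparison is a simplification: a real shaping rule would match inside running text.

```python
import unicodedata

# Spelling recommended by the quoted passage when the ligature is desired.
RECOMMENDED = "\u0627\u0644\u0644\u0651\u0670\u0647"
# Spellings the passage says should not form the ligature.
BARE = "\u0627\u0644\u0644\u0647"
SHADDA_ONLY = "\u0627\u0644\u0644\u0651\u0647"

HEH_GOAL = "\u06C1"

def describe(text):
    """List each character as 'U+XXXX NAME'."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch)}" for ch in text]

def wants_ligature(word):
    """True only for the explicitly marked spelling, ending in heh or heh goal."""
    return word in (RECOMMENDED, RECOMMENDED[:-1] + HEH_GOAL)

for line in describe(RECOMMENDED):
    print(line)
print(wants_ligature(RECOMMENDED), wants_ligature(BARE), wants_ligature(SHADDA_ONLY))
```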

@r12a
Contributor

r12a commented Jun 21, 2018

I decided it was time for me to explore this a little more deeply. Here are some other results. I created a test page at:
https://w3c.github.io/alreq/gap-analysis/tests/ligation/ligation_000.html

Here are some results i screen-captured on my Mac. Grey backgrounds from a v quick scan indicate things i think are probably incorrect.

screen shot 2018-06-21 at 17 55 31

Essentially, this whole thing is quite broken, it seems. (Which is surprising given the content involved.)

@Manishearth
Member Author

Arial overcompensating by adding a double shadda/alif is very surprising (and somewhat hilarious) to me given how commonly that font is used.

Then again, I guess very little about non-Latin text not working on computers should surprise me anymore 😩

@khaledhosny

My perception is that, contrary to what Unicode suggests, Arabic users expect bare [alef] lam lam heh to ligate, and that is what almost all Arabic fonts do. Arabic words other than the name of God that match the same sequence of letters are so uncommon that I never encountered any of them until I was researching this very issue. In Amiri I approached this from the other end: actively matching sequences that are unlikely to be the name of God and unligating them, e.g. خالله does not ligate, but فالله ligates while فالَله does not.
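The "ligate by default, exclude unlikely contexts" idea can be sketched in a few lines. This is not Amiri's actual rule set: the allow-list of prefixes below is a hypothetical placeholder chosen only so that the three examples above come out as described, and the function ignores the alef-less lam lam heh case.

```python
ALEF, LAM, HEH = "\u0627", "\u0644", "\u0647"
BARE_SUFFIX = ALEF + LAM + LAM + HEH

# Hypothetical allow-list of one-letter prefixes (none, feh, waw).
# An illustrative assumption, NOT the list used by any real font.
ALLOWED_PREFIXES = {"", "\u0641", "\u0648"}

def should_ligate(word):
    """Ligate bare alef-lam-lam-heh unless the context makes it unlikely."""
    if not word.endswith(BARE_SUFFIX):
        # Explicit marks inside the sequence (e.g. a fatha) block the match.
        return False
    return word[: -len(BARE_SUFFIX)] in ALLOWED_PREFIXES

print(should_ligate("\u062E" + BARE_SUFFIX))              # khah prefix
print(should_ligate("\u0641" + BARE_SUFFIX))              # feh prefix
print(should_ligate("\u0641\u0627\u0644\u064E\u0644\u0647"))  # feh prefix, fatha on lam
```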

@Manishearth
Member Author

Manishearth commented Jun 21, 2018

When I discussed this issue with @roozbehp he had some examples of Persian words that do this, IIRC.

Just to lay it out, there are multiple issues here, of varying severity:

  • U+FDF2 is rendered as l-l-h in some fonts, which is completely wrong
  • a-l-l-h is automatically ligated with an added shadda/dagger alif, which is incorrect if it is part of some other words (but as Khaled says, this may be what people expect)
  • a-l-l-h with diacritics gets more diacritics added to it in Arial, which is again completely wrong
  • Arial, Tahoma, Al Bayan, Damascus add diacritics to l-l-h when there is no alif, which seems similarly incorrect to me (PR adding them to this file)

@moyogo
Copy link
Contributor

moyogo commented Jun 22, 2018

As @r12a notes in https://r12a.github.io/scripts/arabic/block#charFDF2 the compatibility decomposition for FDF2 is <alif, lam, lam, heh> (“≈ [isolated] 0627 0644 0644 0647”).

While the (non-normative) reference glyph is a ligature <alif, lam, lam, shadda, superscript alif, heh>, this hasn’t always been the case. In Appendix H, New Characters, of the Unicode Standard 1.1, the reference glyph used is a ligature <alif, lam, lam, heh> without shadda or superscript alif.
This may explain where the compatibility decomposition of FDF2 comes from.
capture d ecran 2018-06-22 a 10 18 25

@asmusf

asmusf commented Jun 23, 2018

The production process changed between Unicode 2.x and 3.0. From that point on, different custom software was used with an entirely new collection of TrueType fonts. With many upgrades, both to the software and the font collection, that process is still very much in place today.

Every update of the font collection bears the risk of unintentional changes, and not all of them are caught by reviewers. Therefore, it would take some digging to find out whether the change from a glyph matching the decomposition to a glyph adding shadda and alif was indeed intentional at the time.

@moyogo
Contributor

moyogo commented Jun 24, 2018

I was curious to see if any fonts have FDF2 as alif, lam, lam, heh without shadda and superscript alif.

I managed to find a handful:

There are most probably more.

Including these, there are also more typefaces that do not ligate <lam, lam, heh> (regardless of what FDF2 glyph they have). Some of these do have an optional discretionary ligature feature that forms the ligature.

There may also be fonts that do FDF2 with shadda but no superscript alif, like https://www.linotype.com/1079191/hasan-alquds-unicode-regular-product.html?site=webfonts&format=ot-ttf&branding=std
or fonts that do FDF2 with shadda and fatha, like https://fonts.google.com/specimen/Harmattan.


7 participants