Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lookup flags in fonts with no GDEF table #2647

Open
khaledhosny opened this issue Aug 19, 2020 · 35 comments
Open

Lookup flags in fonts with no GDEF table #2647

khaledhosny opened this issue Aug 19, 2020 · 35 comments

Comments

@khaledhosny
Copy link
Collaborator

Someone reports an issue in LibreOffice about some Iranian fonts like B Nazanin (fonts attached in the LIbreOffice issue).

Basically the font has mark ligatures for shadda and the lookups has ignore base and ignore ligatures set (which makes no sense), which causes HarfBuzz to correctly skip bases and make ligature between shadda and fathatan in strings like:

$ hb-view "B Nazanin.TTF"  اتّفاقاً

HarfBuzz
Oddly enough, both Uniscribe and Core Text ignore the lookup flags and don’t for the ligatures.

$ hb-view "B Nazanin.TTF"  اتّفاقاً --shaper=coretext

Core Text

As it turns out, the font has no GDEF table, so HarfBuzz synthesizes one and use it here.

It is a rather odd font, and I HarfBuzz synthesis of a GDEF table actually matches what the spec is saying (emphasis mine):

In addition, a client uses class definitions to apply GSUB and GPOS LookupFlag data correctly. For example, a LookupFlag may specify ignoring ligatures and marks during a glyph operation. If the font does not include a GlyphClassDef table, the client must define and maintain this information when using the GSUB and GPOS tables.

But it might be an interoperability issue that we might want to consider (my 2¢ is to 🤷🏽 and declare it working as designed).

@behdad
Copy link
Member

behdad commented Aug 19, 2020

Sounds related to the GDEF change we made and reverted. The carrying-forward of the flags is wrong...
Anyway. I'll check later. This was just me speculating based on the title.

@khaledhosny
Copy link
Collaborator Author

That is with 2.7.1 where the offending commit is reverted, unless you mean the old behavior was also wrong.

@khaledhosny
Copy link
Collaborator Author

khaledhosny commented Aug 19, 2020

Hmm, if I add a GDEF table with glyph classes, HarfBuzz stops making the ligatures: BNazanin#2.zip

@khaledhosny
Copy link
Collaborator Author

(Core Text and Uniscribe don’t make the ligatures when GDEF table is present as well, so my initial guess was wrong)

@behdad
Copy link
Member

behdad commented Aug 19, 2020

unless you mean the old behavior was also wrong.

Yes.

@behdad
Copy link
Member

behdad commented Aug 19, 2020

Simon found problems with pre-2.7.0 HB-no-GDEF. We tried to address, broke others.

Even then there's more issues. I'll try to write them down here soon.

@behdad
Copy link
Member

behdad commented Jun 14, 2021

Let's revive this issue and figure out what needs to be fixed.

@behdad
Copy link
Member

behdad commented Jan 19, 2022

Fixed by #3365 ?

@khaledhosny
Copy link
Collaborator Author

Does not seem so.

@behdad
Copy link
Member

behdad commented Jan 19, 2022

Can you send me the font in question please?

@behdad
Copy link
Member

behdad commented Jan 19, 2022

Can you send me the font in question please?

Nevermind. Got it.

@behdad
Copy link
Member

behdad commented Jan 19, 2022

I think I know what to do.

@behdad
Copy link
Member

behdad commented Jan 19, 2022

Basically, what they are doing, I suppose, is that when matching ligatures (and context, etc) other than the first item, ignore IgnoreBase and IgnoreLigature... lol. We do the same during mark-attachment lookups.

behdad added a commit that referenced this issue Jan 19, 2022
Do not respect IgnoreBaseGlyphs and IgnoreLigatures when matching
input, lookahead, and backtrack.

Makes five tests fail; four of them in aots. One in our mark-attachment.
I think this matches what other shapers do though. We should test more.

Fixes #2647
behdad added a commit that referenced this issue Jan 19, 2022
Do not respect IgnoreBaseGlyphs and IgnoreLigatures in the matcher.

Makes five tests fail; four of them in aots. One in our mark-attachment.
I think this matches what other shapers do though. We should test more.

Fixes #2647
@behdad
Copy link
Member

behdad commented Jun 25, 2022

If the font does not include a GlyphClassDef table, the client must define and maintain this information when using the GSUB and GPOS tables.

Do we actually have evidence that Uniscribe does this?

@behdad
Copy link
Member

behdad commented Jun 25, 2022

I'm leaning towards closing this.

@khaledhosny
Copy link
Collaborator Author

I have no more information to add and since apparently it is only this font and IMHO that is a badly built one, lets close.

@khaledhosny khaledhosny closed this as not planned Won't fix, can't repro, duplicate, stale Jun 26, 2022
@hosseinn
Copy link

hosseinn commented Jul 17, 2022

This problem is not limited to B Nazanin. I have the same problem with Arial and Tahoma:

$ hb-view /usr/share/fonts/truetype/msttcorefonts/Arial.ttf  اتّفاقاً
## PROBLEMATIC OUTPUT HERE ##

$ hb-view --version
hb-view (HarfBuzz) 2.7.4
Available shapers: graphite2,ot,fallback

I am using Ubuntu 22.04.

The same problem is visible with Arial and the latest version of harfbuzz:

$ ./util/hb-view /usr/share/fonts/truetype/msttcorefonts/Times_New_Roman.ttf اتّفاقاً
## PROBLEMATIC OUTPUT HERE ##

$ ./util/hb-view --version
hb-view (HarfBuzz) 4.4.1
Available shapers: ot,fallback

@khaledhosny
Copy link
Collaborator Author

This is not the same issue. The old corefonts are broken, I get the same failure in with DirectWrite as well. Newer versions of the fonts are fine.

@khaledhosny
Copy link
Collaborator Author

To be clear, the fonts have ligature lookups for the mark ligatures that set both ignore base and ignore ligatures lookup flags, so HarfBuzz is doing what the fonts asked for.

@khaledhosny
Copy link
Collaborator Author

So the difference is that Arial has a GDEF table, but B Nazanin does not. So again I think the issue is that HarfBuzz is synthesizing glyph classes when GDEF table is absent and DirectWrite is apparently not doing that.

@khaledhosny khaledhosny reopened this Jul 17, 2022
@khaledhosny
Copy link
Collaborator Author

Re-opening to see if we actually want to follow MS implementation here or not.

@khaledhosny
Copy link
Collaborator Author

If the font does not include a GlyphClassDef table, the client must define and maintain this information when using the GSUB and GPOS tables.

Do we actually have evidence that Uniscribe does this?

Apparently the answer is no, it does not do this.

@khaledhosny
Copy link
Collaborator Author

With:

diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
index 0806abb7d..29c2ec2f9 100644
--- a/src/hb-ot-shape.cc
+++ b/src/hb-ot-shape.cc
@@ -137,8 +137,8 @@ hb_ot_shape_planner_t::compile (hb_ot_shape_plan_t           &plan,
    * Decide who provides glyph classes. GDEF or Unicode.
    */

-  if (!hb_ot_layout_has_glyph_classes (face))
-    plan.fallback_glyph_classes = true;
+  //if (!hb_ot_layout_has_glyph_classes (face))
+  //  plan.fallback_glyph_classes = true;

   /*
    * Decide who does substitutions. GSUB, morx, or fallback.

The B* fonts stop making the incorrect mark ligature.

@khaledhosny
Copy link
Collaborator Author

khaledhosny commented Jul 17, 2022

If the font does not include a GlyphClassDef table, the client must define and maintain this information when using the GSUB and GPOS tables.

Do we actually have evidence that Uniscribe does this?

Apparently the answer is no, it does not do this.

I opened MicrosoftDocs/typography-issues#954 for spec clarification.

@behdad
Copy link
Member

behdad commented Jul 18, 2022

If the font does not include a GlyphClassDef table, the client must define and maintain this information when using the GSUB and GPOS tables.

Do we actually have evidence that Uniscribe does this?

Apparently the answer is no, it does not do this.

From the Uniscribe API limitations also I have deduced that they can't be doing this.

@behdad
Copy link
Member

behdad commented Jul 18, 2022

If the font does not include a GlyphClassDef table, the client must define and maintain this information when using the GSUB and GPOS tables.

Do we actually have evidence that Uniscribe does this?

Apparently the answer is no, it does not do this.

From the Uniscribe API limitations also I have deduced that they can't be doing this.

At least not for positioning.

@behdad
Copy link
Member

behdad commented Jul 30, 2022

I'm thinking about blacklisting the GPOS of these fonts.

Where can we find them all?

@behdad
Copy link
Member

behdad commented Jul 30, 2022

This problem is not limited to B Nazanin. I have the same problem with Arial and Tahoma:

FWIW we intentionally nuke the GDEF of these fonts, up to Windows 8. Wonder if we should do the same to their GPOS.

@behdad
Copy link
Member

behdad commented Jul 30, 2022

This is one way to handle this: Only synthesize glyph classes if font doesn't have GDEF and doesn't have GPOS.

diff --git a/src/hb-ot-shape.cc b/src/hb-ot-shape.cc
index 3da317a5c..5c68bee21 100644
--- a/src/hb-ot-shape.cc
+++ b/src/hb-ot-shape.cc
@@ -137,7 +137,8 @@ hb_ot_shape_planner_t::compile (hb_ot_shape_plan_t           &plan,
    * Decide who provides glyph classes. GDEF or Unicode.
    */
 
-  if (!hb_ot_layout_has_glyph_classes (face))
+  if (!hb_ot_layout_has_glyph_classes (face) &&
+      !hb_ot_layout_has_positioning (face))
     plan.fallback_glyph_classes = true;
 
   /*

One test fails though. Need to investigate.

I haven't checked that it actually fixes the case at hand.

@behdad
Copy link
Member

behdad commented Jul 30, 2022

One test fails though. Need to investigate.

One of the failing tests is in Thai and we fail to zero mark advances, I suppose because there is no mark attachment in the GPOS. This change seems fragile at this point in time :(.

@behdad
Copy link
Member

behdad commented Jul 30, 2022

One of the failing tests is in Thai and we fail to zero mark advances, I suppose because there is no mark attachment in the GPOS. This change seems fragile at this point in time :(.

Humm. The other failing test is Latin. It does have mark attachment, so I'm surprised why marks are not getting zero advances. Investigating.

@behdad
Copy link
Member

behdad commented Jul 30, 2022

Humm. The other failing test is Latin. It does have mark attachment, so I'm surprised why marks are not getting zero advances. Investigating.

Oh. Because we don't synthesize, we now are not applying mark attachment either.

@behdad
Copy link
Member

behdad commented Jul 30, 2022

Blocklisting the GPOS of the affected fonts sounds like the best option to me. Where can we get all these fonts?

@khaledhosny
Copy link
Collaborator Author

I only know the fonts attached to the LO issue above (https://bugs.documentfoundation.org/show_bug.cgi?id=135778)

@khaledhosny
Copy link
Collaborator Author

... and there is the font in #2888

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants