Can you do reorder to the glyphs and apply ligature after that?

WAY KYI
WAY KYI Posts: 130
edited June 2022 in Technique and Theory
Can you reorder glyphs and then apply a ligature?
For example, suppose the glyphs come out of shaping in the order
ABMC, and I have a ligature for B_C. Could the process be quicker if
I reorder ABMC (switching B and M) to get AMBC, and then apply the
ligature to BC, so that the final output is AMB_C?
Since this takes two steps, can the reordering and the ligature be done
in one step, and if so, how? I am afraid the shaping engine will reorder
the text again right after I reorder B and M.
What is the best way to do this? Please share any suggestions you have.
And will the text still be correct if I change the font? Thanks


Comments

  • Reordering glyphs can be done through OpenType layout features, but in general this is not a good idea.

    Can you be more specific, e.g. are all the glyphs base glyphs? If M is a mark, then setting the IgnoreMarks flag might solve your problem.
  • WAY KYI
    WAY KYI Posts: 130
    OK, A is a base glyph, M is a medial, and B and C are above-base glyphs. The shaping engine reorders them correctly, but I need the B_C ligature for BC, and the two glyphs must be adjacent for the ligature to work. So I need to reorder the glyphs to bring B and C together and get B_C. That is it. There are no mark glyphs in there. Thanks
  • John Hudson
    John Hudson Posts: 3,227
    It would help to know what actual writing system you are working with, rather than trying to abstract this to ABMC, because character properties, shaping engine behaviour, and glyph categorisation are all factors.
  • @WAY KYI Regarding Erwin's question, what matters is how the glyphs are classified in your font data (the GDEF table), and the key distinctions are base, ligature, mark and component (not used in GSUB). See the Glyph Class Definition subtable. This classification can be used in lookup tables to tell the engine to ignore certain glyphs when it is processing. In addition, you can also create your own classification through filtering sets, and tell the engine to ignore particular sets of glyphs. See lookup flags  and mark filtering sets in the Lookup table description. 
  • WAY KYI
    WAY KYI Posts: 130
    edited June 2022
    The script is mym2 (Myanmar), and since many of you may not know it, I gave the example as ABMC. You can access the font here: https://drive.google.com/file/d/1RoC7fFV-ZVrAnJS3Hsp62h7QZC5l96S-/view?usp=sharing
    In the second image, at "End Lookup 90", uni102E is replaced with uniE392. But that is not the end. If I could reorder uniE390 and uni103B (switch their places) before lookup 90 and use the ligature uniE390 + uni102E = uniE392, I would not need to go through the rest of the steps starting from lookup 90. Sorry, I am not too knowledgeable about skipping Medial u103B here. How can I do this without going through so many steps to get there? Thanks
  • Simon Cozens
    Simon Cozens Posts: 752
    edited June 2022
    The way I handle this in Myanmar is to make a ligature with the IgnoreBaseGlyphs flag set in the rphf feature:
    
    feature rphf {
        lookupflag IgnoreBaseGlyphs UseMarkFilteringSet @abovemarks;
        sub kinzi-myanmar iMark-myanmar by kinzi_iMark-myanmar;
    } rphf;
    
    IgnoreBaseGlyphs normally causes more problems than it solves (you can get interactions between faraway marks), but in the rphf feature it's scoped to the current cluster, not to the whole run.
  • John Hudson
    John Hudson Posts: 3,227
    WAY KYI said:
    I am not too knowledgeable about skipping Medial u103B here.
    A typical procedure for skipping a glyph like this would be to categorise the /uni103B/ glyph as a mark in the font’s  GDEF table, and then to filter all marks out of your ligature lookup processing by setting the process marks flag to NONE, or to set the flag to only allow a group of marks that does not include /uni103B/.

    As Simon has indicated, it is also possible to set a lookup flag to ignore non-mark base glyphs, which can be useful if you want to manage mark interaction and skip intervening base glyphs.

    It has been a while since I worked on Myanmar, but when I am a bit less busy I will go and look at how we handled this in Microsoft’s Myanmar Text fonts.
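
The skipping behaviour described in the comments above can be modelled in a few lines of Python. This is only a rough sketch of the lookup-flag logic, not shaping-engine code: the GDEF classes assigned to the glyphs are hypothetical (in particular, treating uniE390 as a mark is an assumption), and the IgnoreLigatures flag and mark attachment types are left out.

```python
BASE, LIGATURE, MARK = "base", "ligature", "mark"


def is_skipped(glyph, gdef_class, ignore_base=False, ignore_marks=False,
               mark_filter=None):
    """Return True if a lookup with these flags steps over `glyph`."""
    cls = gdef_class.get(glyph)
    if cls == BASE:
        return ignore_base
    if cls == MARK:
        if ignore_marks:
            return True
        # With a mark filtering set, marks outside the set are skipped.
        return mark_filter is not None and glyph not in mark_filter
    return False


def visible(glyphs, gdef_class, **flags):
    """The glyph sequence as the lookup sees it after skipping."""
    return [g for g in glyphs if not is_skipped(g, gdef_class, **flags)]


# Hypothetical classification for the mark-based approach:
# uni103B is a mark, but it is left out of the filtering set.
gdef_class = {"uni1000": BASE, "uniE390": MARK,
              "uni103B": MARK, "uni102E": MARK}
above_marks = {"uniE390", "uni102E"}
run = ["uni1000", "uniE390", "uni103B", "uni102E"]

seen = visible(run, gdef_class, mark_filter=above_marks)
```

In this model the lookup sees uniE390 and uni102E as neighbours, so a ligature rule for that pair can match even though uni103B sits between them in the run.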
  • WAY KYI
    WAY KYI Posts: 130
    edited June 2022
    Simon Cozens said:
    The way I handle this in Myanmar is to make a ligature with the IgnoreBaseGlyphs flag set in the rphf feature:
    feature rphf {
        lookupflag IgnoreBaseGlyphs UseMarkFilteringSet @abovemarks;
        sub kinzi-myanmar iMark-myanmar by kinzi_iMark-myanmar;
    } rphf;
    
    IgnoreBaseGlyphs normally causes more problems than it solves (you can get interactions between faraway marks), but in the rphf feature it's scoped to the current cluster, not to the whole run.

    Thank you very much for your suggestion. Let me try it and see whether it works. I will update you with the result. One other thing I would like to know: can you put any kind of glyph in a mark class or mark set? What is the difference between the two, and when should each be used, and why? You want to use a mark set here, right? And my original question about reordering (which OpenType feature can do this task?) followed by a ligature: will that work in this case too? Thanks
  • WAY KYI
    WAY KYI Posts: 130
    edited June 2022
    John Hudson said:
    A typical procedure for skipping a glyph like this would be to categorise the /uni103B/ glyph as a mark in the font’s  GDEF table, and then to filter all marks out of your ligature lookup processing by setting the process marks flag to NONE, or to set the flag to only allow a group of marks that does not include /uni103B/.

    As Simon has indicated, it is also possible to set a lookup flag to ignore non-mark base glyphs, which can be useful if you want to manage mark interaction and skip intervening base glyphs.

    It has been a while since I worked on Myanmar, but when I am a bit less busy I will go and look at how we handled this in Microsoft’s Myanmar Text fonts.

    Glad to meet one of the original Myanmar Text engineers. Thank you very much. The Pyidaungsu font was developed maybe 3-4 years later than Microsoft's Myanmar Text font. It is still widely used as the default font for Myanmar here. Myanmar Unicode font development is still not easy, and font designers reuse the OpenType programming of either Pyidaungsu or Myanmar Text and replace the glyphs with new ones, without needing to know a thing about OpenType. I am originally from Myanmar and lived and worked as a Sr. Software Engineer in the USA for about 30 years, then came back to retire in Myanmar. I started learning font development over a year ago, and I want to change this and hand my experience and knowledge on to the younger generation in Myanmar to carry on.

    OK, back to the subject of setting the /uni103B/ glyph as a mark in the font’s GDEF table. I cannot find where to set it in FontForge, or how to set it in the table. Could you point me to documentation or a sample showing how to implement it? Thanks
  • Simon Cozens
    Simon Cozens Posts: 752
    edited June 2022
    WAY KYI said:

    “Thank you very much for your suggestion. Let me try it and see whether it works.”
    It does work, I promise. :-)
    “Can you put any kind of glyph in a mark class or mark set?”
    I think you can only put those glyphs which have the GDEF category Mark.
    “What is the difference between the two, and when should each be used, and why? You want to use a mark set here, right?”
    Mark attachment classes are an older mechanism and the main characteristic of them is that a glyph can only belong to one class. Mark sets can overlap, in that glyphs can belong to more than one mark set. In general the rule is: always use mark sets, never use mark classes.
    “And my original question about reordering (which OpenType feature can do this task?) followed by a ligature: will that work in this case too?”
    It depends on what you want to do. You may want to do the swap afterwards, because in situations like င်္ကျ, I believe the above marks should be anchored onto the medial ya. (Even though they are not in the font on this board!) This is much easier to achieve if you have only one above mark glyph to swap with the medialYa instead of trying to move all the above marks, so you need to ligate first and then swap. Swapping glyphs in OpenType is not supported directly, but you can do it with something like this:
    lookup AddYaBefore {
        sub kinzi-myanmar by medialYa-myanmar kinzi-myanmar;
        sub repha-myanmar by medialYa-myanmar repha-myanmar;
        sub iMark-myanmar by medialYa-myanmar iMark-myanmar;
        # ...
    } AddYaBefore;

    lookup RemoveYa {
        lookupflag UseMarkFilteringSet @abovemarks;
        sub kinzi-myanmar medialYa-myanmar by kinzi-myanmar;
        sub repha-myanmar medialYa-myanmar by repha-myanmar;
        sub iMark-myanmar medialYa-myanmar by iMark-myanmar;
        # ...
    } RemoveYa;

    feature abvs {
        sub @abovemarks' lookup AddYaBefore medialYa-myanmar' lookup RemoveYa;
    } abvs;
    
    (These rules are a lot easier to generate in my FEZ language):
    Routine AddYaBefore {Substitute @abovemarks -> medialYa-myanmar $1; };
    Routine RemoveYa    {Substitute @abovemarks medialYa-myanmar -> $1; } UseMarkFilteringSet @abovemarks;
    

    Notice how the reordering works: the "AddYaBefore" applies to the first glyph in the sequence and the "RemoveYa" applies to the second glyph. So with kinzi-myanmar|medialYa-myanmar, this is what happens:

    kinzi-myanmar|medialYa-myanmar
    AddYaBefore applies to first glyph, giving:
    medialYa-myanmar|kinzi-myanmar|medialYa-myanmar
    RemoveYa applies to the new second glyph, giving:
    medialYa-myanmar|kinzi-myanmar
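
The walk-through above can be replayed in plain Python to check the ordering of the two steps. This is a toy model of the add-then-remove trick, not a shaping engine: the helper names are made up for illustration, the glyph lists are ordinary Python lists, and lookup flags are ignored.

```python
ABOVE_MARKS = {"kinzi-myanmar", "repha-myanmar", "iMark-myanmar"}
MEDIAL_YA = "medialYa-myanmar"


def add_ya_before(glyphs, i):
    """Multiple substitution at index i: X -> medialYa X."""
    return glyphs[:i] + [MEDIAL_YA, glyphs[i]] + glyphs[i + 1:]


def remove_ya(glyphs, i):
    """Ligature substitution at index i: X medialYa -> X."""
    assert glyphs[i] in ABOVE_MARKS and glyphs[i + 1] == MEDIAL_YA
    return glyphs[:i + 1] + glyphs[i + 2:]


def swap_above_mark_and_ya(glyphs):
    """Apply both steps wherever an above mark is followed by medialYa."""
    out = list(glyphs)
    i = 0
    while i + 1 < len(out):
        if out[i] in ABOVE_MARKS and out[i + 1] == MEDIAL_YA:
            out = add_ya_before(out, i)   # ya, mark, ya
            out = remove_ya(out, i + 1)   # ya, mark
            i += 2
        else:
            i += 1
    return out


swapped = swap_above_mark_and_ya(["kinzi-myanmar", "medialYa-myanmar"])
```

The net effect is that the pair changes places while the glyph count stays the same, which is why the two substitutions together behave like a swap.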
    
  • WAY KYI
    WAY KYI Posts: 130
    It seems your suggestion is the way to go. The other one seems rather complicated. I found in the .fea file that the Pyidaungsu font already has the mark set below:

    @GDEF_Mark = [\uni102D \uni102E \uni102F \uni1030 \uni1033 \uni1035 \uni1037 
    \uni103D \uni103E \uniE1D1 \uni103D.blws \uniE1D1.blws \uniE1F2 \uniE430 ];

    But 103B is not included. It is in the @GDEF_Simple set. So if I include 103B in the mark set, that is OK, right? It will be in both sets. One more question: after you do this,
    feature rphf {
        lookupflag IgnoreBaseGlyphs UseMarkFilteringSet @abovemarks;
        sub kinzi-myanmar iMark-myanmar by kinzi_iMark-myanmar; 
    } rphf;

    The kinzi_iMark-myanmar will be the last glyph, right? Thanks
  • Simon Cozens
    Simon Cozens Posts: 752
    John and I are talking about two different approaches. In my way, we make 103B a base and skip it using IgnoreBaseGlyphs. You still need the MarkFilteringSet to ignore the below marks. In John's way, you make 103B a mark and skip it using the MarkFilteringSet.
  • WAY KYI
    WAY KYI Posts: 130
    edited June 2022
    Simon Cozens said:
    John and I are talking about two different approaches. In my way, we make 103B a base and skip it using IgnoreBaseGlyphs. You still need the MarkFilteringSet to ignore the below marks. In John's way, you make 103B a mark and skip it using the MarkFilteringSet.

    OK, I will try it; whichever works is fine for me. I was able to find information on IgnoreBaseGlyphs but could not find UseMarkFilteringSet. You said "You still need the MarkFilteringSet to ignore the below marks." I am confused, because the text will only include a base, above-base glyphs and a medial in this case. So where is the below mark coming from? Sorry, there are so many things I need to study, and these terms are so new to me. I need all the help I can get, or a pointer to the right place. Thank you both very much; I will update you later. Thanks
  • Simon Cozens
    Simon Cozens Posts: 752
    Consider a sequence like င်္က္ကျိ. This will turn out to be something like:

    ka-myanmar | kinzi-myanmar | virama-myanmar | ka-myanmar | medialYa-myanmar | iMark-myanmar

    When you form the kinzi_iMark ligature, you need to skip over the medialYa, but also you need to skip over the conjunct glyphs as well.
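
To see why both flags are needed for this sequence, here is a small Python sketch of a ligature match that steps over skipped glyphs. It is a simplified model under stated assumptions: ka and medialYa are treated as GDEF bases (the IgnoreBaseGlyphs approach), virama is assumed to be a mark outside @abovemarks, and the function names are invented for the example.

```python
BASES = {"ka-myanmar", "medialYa-myanmar"}                      # assumed GDEF bases
MARKS = {"kinzi-myanmar", "virama-myanmar", "iMark-myanmar"}    # assumed GDEF marks
ABOVE_MARKS = {"kinzi-myanmar", "iMark-myanmar"}                # mark filtering set


def skipped(glyph):
    """Models IgnoreBaseGlyphs plus UseMarkFilteringSet @abovemarks."""
    if glyph in BASES:
        return True
    return glyph in MARKS and glyph not in ABOVE_MARKS


def apply_ligature(glyphs, components, ligature):
    """Replace the first match of `components` with `ligature`.

    Skipped glyphs between components stay in the run; the ligature
    takes the place of the first component.
    """
    for start, glyph in enumerate(glyphs):
        if glyph != components[0] or skipped(glyph):
            continue
        matched = [start]
        pos = start + 1
        for wanted in components[1:]:
            while pos < len(glyphs) and skipped(glyphs[pos]):
                pos += 1
            if pos >= len(glyphs) or glyphs[pos] != wanted:
                break
            matched.append(pos)
            pos += 1
        else:
            drop = set(matched[1:])
            return [ligature if i == start else g
                    for i, g in enumerate(glyphs) if i not in drop]
    return list(glyphs)


run = ["ka-myanmar", "kinzi-myanmar", "virama-myanmar",
       "ka-myanmar", "medialYa-myanmar", "iMark-myanmar"]
result = apply_ligature(run, ["kinzi-myanmar", "iMark-myanmar"],
                        "kinzi_iMark-myanmar")
```

If virama were added to the filtering set in this model, the match would stop at it and the ligature would not form, which is the point of keeping the below-base material out of @abovemarks.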
  • WAY KYI
    WAY KYI Posts: 130
    edited June 2022
    Right, in this case you need to consider the below marks too. Now I see that you wanted me to look at all possible combinations, so that everything related to kinzi + above mark is completely thought out. This is expert-level advice. Thank you, thank you very much.
    PS: I now understand that IgnoreBaseGlyphs and UseMarkFilteringSet are switches that flag the lookup to ignore or use glyphs as needed. And I found how to set a mark set in FontForge. Thanks
  • WAY KYI
    WAY KYI Posts: 130
    This is my test font. rphf worked on the first try; then I added the blwf feature through Merge Feature File, and now it only works in the FontForge Metrics window. I generated a font and tried it in CorelDRAW, MS Word and AI, and neither feature works at all. I don't know what has happened. Can someone take a look? My friend said FontLab reported Script/Language problems. Below is my test font:

    https://drive.google.com/file/d/1sK7AAjoi1TMqv_c351-n1CWSq_cXdckH/view?usp=sharing
  • Your font has no OpenType layout features at all.
  • WAY KYI
    WAY KYI Posts: 130
    edited July 2022
    Arrr... I saw them in FontForge. What happened when I generated the font? Below are the features I am trying to get working. Thanks

    lookup rphfRephForminMyanmar2lookup0 {

        sub \uni1004 \uni103A \uni1039  by \uniE02D;
    } rphfRephForminMyanmar2lookup0;

    feature rphf {
      script DFLT;
         language dflt ;
          lookup rphfRephForminMyanmar2lookup0;
      script mym2;
         language dflt ;
          lookup rphfRephForminMyanmar2lookup0;

    } rphf;

    lookup blwfBelowBaseFormsinMyanmar2lookup1 {
        sub \u1039 \u1000  by \uE000;
        sub \u1039 \u1001  by \uE001;
        sub \u1039 \u1002  by \uE002;
        sub \u1039 \u1003  by \uE003;
        sub \u1039 \u1004  by \uE004;
        sub \u1039 \u1005  by \uE005;
        sub \u1039 \u1006  by \uE006;
        sub \u1039 \u1007  by \uE007;
        sub \u1039 \u1008  by \uE008;
        sub \u1039 \u100A  by \uE00A;
        sub \u1039 \u100B  by \uE00B;
        sub \u1039 \u100C  by \uE00C;
        sub \u1039 \u100D  by \uE00D;
        sub \u1039 \u100F  by \uE00F;
        sub \u1039 \u1010  by \uE010;
        sub \u1039 \u1011  by \uE011;
        sub \u1039 \u1012  by \uE012;
        sub \u1039 \u1013  by \uE013;
        sub \u1039 \u1014  by \uE014;
        sub \u1039 \u1015  by \uE015;
        sub \u1039 \u1016  by \uE016;
        sub \u1039 \u1017  by \uE017;
        sub \u1039 \u1018  by \uE018;
        sub \u1039 \u1019  by \uE019;
        sub \u1039 \u100A  by \uE00A;
        sub \u1039 \u100B  by \uE00B;
        sub \u1039 \u101C  by \uE01C; 
        sub \u1039 \u101D  by \uE01D;
        sub \u1039 \u100E  by \uE00E;
        sub \u1039 \u100F  by \uE00F;
        #sub \u1039 \u1020  by \uE020;
    } blwfBelowBaseFormsinMyanmar2lookup1;
  • I didn't know you are allowed to use code-points instead of glyph names.

    Apart from that it seems valid to me, but be aware blwfBelowBaseFormsinMyanmar2lookup1 is unused.
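
A quick way to spot a standalone lookup that no feature refers to is a rough text scan like the one below. It is a regex-based sketch, not a feature-file parser: it assumes lookups are defined at the top level as in the code above, it would wrongly flag lookups defined inside a feature block, and the function name and the shortened sample are made up for illustration.

```python
import re


def unused_lookups(fea_text):
    """List named lookups that are defined but never referenced."""
    defined = re.findall(r"^\s*lookup\s+(\w+)\s*\{", fea_text, flags=re.M)
    # A reference is "lookup NAME" that does not open a block.
    referenced = set(re.findall(r"\blookup\s+(\w+)\b(?!\s*\{)", fea_text))
    return [name for name in defined if name not in referenced]


sample = r"""
lookup rphfRephForminMyanmar2lookup0 {
    sub \uni1004 \uni103A \uni1039 by \uniE02D;
} rphfRephForminMyanmar2lookup0;

feature rphf {
  script mym2;
     language dflt;
      lookup rphfRephForminMyanmar2lookup0;
} rphf;

lookup blwfBelowBaseFormsinMyanmar2lookup1 {
    sub \u1039 \u1000 by \uE000;
} blwfBelowBaseFormsinMyanmar2lookup1;
"""

orphans = unused_lookups(sample)
```

On the shortened sample, only the blwf lookup is reported, because no feature block references it.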


  • WAY KYI
    WAY KYI Posts: 130
    edited July 2022
    I got it working now after reinstalling FF. Thanks, everyone!!!
  • Thomas Phinney
    Thomas Phinney Posts: 2,896
    I didn't know you are allowed to use code-points instead of glyph names.

    Those are glyph names, which happen to look like code points. Both “uXXXX” and “uniXXXX” styles are standard strings for glyph names, from Adobe’s naming approaches as documented in the Adobe Glyph List. https://github.com/adobe-type-tools/agl-specification

    (At one point, long ago, there was a separate document called “Unicode and Glyph Names,” but I think that got folded into the AGL spec somewhere along the way.)
  • Simon Cozens
    Simon Cozens Posts: 752
    edited July 2022
    Thomas Phinney said:
    Those are glyph names, which happen to look like code points. Both “uXXXX” and “uniXXXX” styles are standard strings for glyph names
    Of course, if you used FEZ, you could use  Unicode glyph selectors and get the best of both worlds. :-)

  • Thomas Phinney said:
    Those are glyph names, which happen to look like code points. Both “uXXXX” and “uniXXXX” styles are standard strings for glyph names, from Adobe’s naming approaches as documented in the Adobe Glyph List. https://github.com/adobe-type-tools/agl-specification

    (At one point, long ago, there was a separate document called “Unicode and Glyph Names,” but I think that got folded into the AGL spec somewhere along the way.)
    Thank you for your feedback, but the specific glyph names all start with \uni while lookup blwfBelowBaseFormsinMyanmar2lookup1 contains \u1039, etc.

    Therefore I have added support for both conventions in FontCreator, so it now successfully compiles the fea code provided by WAY KYI.





  • Thomas Phinney
    Thomas Phinney Posts: 2,896
    Thank you for your feedback, but the specific glyph names all start with \uni while lookup blwfBelowBaseFormsinMyanmar2lookup1 contains \u1039, etc.

    No, that is not correct—at least, not as a matter of the glyph naming spec.

    In short:
    • for a single BMP codepoint, one can use either form.
    • for beyond-BMP (“supra-BMP”) codepoints, one must only use “u” and not “uni”
    There are some complications with ligatures:
    • if one wishes to express a ligature of BMP codepoints, one can do so with “uni” by stringing them together like “uni20AC0034”, an option unavailable with “u”
    • one could however do “u20AC_u0034” which is clearer, and no longer… but would become longer if there were more than two codepoints involved. This can be a reason to use “uni” names with ligatures involving long strings of BMP codepoints, if one is worried about total glyph name length.
     See section 2 of the Readme portion of the AGL, partly quoted here:

    Otherwise, if the component is of the form ‘uni’ (U+0075, U+006E, and U+0069) followed by a sequence of uppercase hexadecimal digits (0–9 and A–F, meaning U+0030 through U+0039 and U+0041 through U+0046), if the length of that sequence is a multiple of four, and if each group of four digits represents a value in the ranges 0000 through D7FF or E000 through FFFF, then interpret each as a Unicode scalar value and map the component to the string made of those scalar values. Note that the range and digit-length restrictions mean that the ‘uni’ glyph name prefix can be used only with UVs in the Basic Multilingual Plane (BMP).

    Otherwise, if the component is of the form ‘u’ (U+0075) followed by a sequence of four to six uppercase hexadecimal digits (0–9 and A–F, meaning U+0030 through U+0039 and U+0041 through U+0046), and those digits represents a value in the ranges 0000 through D7FF or E000 through 10FFFF, then interpret it as a Unicode scalar value and map the component to the string made of this scalar value.
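
The two quoted rules can be turned into a short Python sketch. It covers only the "uni" and "u" forms quoted above, plus the spec's earlier steps of dropping any suffix after the first period and splitting on underscores; the lookup of named glyphs in the AGL itself is deliberately left out, and the function names are made up for the example.

```python
import re

_UNI = re.compile(r"uni((?:[0-9A-F]{4})+)")   # groups of four uppercase hex digits
_U = re.compile(r"u([0-9A-F]{4,6})")          # four to six uppercase hex digits


def _is_scalar(value):
    """True for Unicode scalar values, i.e. everything except surrogates."""
    return 0 <= value <= 0xD7FF or 0xE000 <= value <= 0x10FFFF


def component_to_text(component):
    """Map one name component to a string; empty if neither rule applies."""
    m = _UNI.fullmatch(component)
    if m:
        digits = m.group(1)
        values = [int(digits[i:i + 4], 16) for i in range(0, len(digits), 4)]
        if all(_is_scalar(v) for v in values):
            return "".join(chr(v) for v in values)
    m = _U.fullmatch(component)
    if m:
        value = int(m.group(1), 16)
        if _is_scalar(value):
            return chr(value)
    return ""


def glyph_name_to_text(name):
    """Drop any '.suffix', split on '_', and map each component."""
    base = name.split(".", 1)[0]
    return "".join(component_to_text(c) for c in base.split("_"))


same_a = glyph_name_to_text("uni20AC0034")
same_b = glyph_name_to_text("u20AC_u0034")
```

Both spellings of the two-codepoint ligature map to the same string, while a five-digit value is accepted only in the "u" form.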

  • Erwin Denissen
    Erwin Denissen Posts: 302
    edited July 2022
    Thank you for your feedback, but the specific glyph names all start with \uni while lookup blwfBelowBaseFormsinMyanmar2lookup1 contains \u1039, etc.

    No, that is not correct—at least, not as a matter of the glyph naming spec.

    The font and the fea code are not mine, but I ensured that FontCreator can cope with it.
  • WAY KYI
    WAY KYI Posts: 130
    OK, here is my first try, but it failed. I imported it into FF (FontForge) and got a lot of errors, and the features were not imported. I don't know where to put lookup flags in FF, but I was able to add abovemarks there. It seemed so easy, but my syntax is not correct. Please look at my code and tell me where I went wrong. Thanks

    # GDEF Mark Attachment Sets
    @abovemarks=[\uni102D \uni102E \uni1032 \uni1036 ];
    @Kinzi_abovemarks=[\uniE030 \uniE031 \uniE032 \uniE033 ];

    languagesystem DFLT dflt;
    languagesystem mym2 dflt;

    lookup rphfRephForminMyanmar2lookup0 {
      lookupflag 0;
        sub \uni1004 \uni103A \uni1039  by \uniE02F;
    } rphfRephForminMyanmar2lookup0;

    lookup rphfRephFormlookup2 {
        lookupflag IgnoreBaseGlyphs UseMarkFilteringSet @abovemarks;
        sub \uniE02F @abovemarks by @Kinzi_abovemarks;
    } rphfRephFormlookup2;

    feature rphf {

     script DFLT;
         language dflt ;
          lookup rphfRephForminMyanmar2lookup0;
          lookup rphfRephFormlookup2;

     script mym2;
         language dflt ;
          lookup rphfRephForminMyanmar2lookup0;
          lookup rphfRephFormlookup2;
    } rphf;
  • WAY KYI
    WAY KYI Posts: 130
    Thank you for your feedback, but the specific glyph names all start with \uni while lookup blwfBelowBaseFormsinMyanmar2lookup1 contains \u1039, etc.

    Therefore I have added support for both conventions in FontCreator, so it now successfully compiles the fea code provided by WAY KYI.

    In FF, I have always been able to use uniXXXX, and it is written out as \uXXXX in the .fea file. But it works. Thanks for your help; I am using FontForge for my font development. Thanks
  • The OpenType layout features don't match that fea code.

    This is inside your font:


    And this is what it looks like after importing the fea code:


  • WAY KYI
    WAY KYI Posts: 130
    edited July 2022
    The OpenType layout features don't match that fea code.

    Yes, I knew the font only has the first part of rphf, plus blwf. I was not able to import the second part, the second rphf lookup. I was following Simon Cozens's suggestion and failed; the failure is on my part, as I was not able to follow the directions through. There is no setting for lookup flags in FF, so I tried to get them in through a .fea file import, which also failed. Simon and John know what I am trying to do, if you follow this thread from the very beginning. Thank you very much for trying to help me here. We are on different font tools. But thanks
  • Well, FontCreator was able to import your fea code into your font, including IgnoreBaseGlyphs and MarkFilteringSet.

    See:

    Hope this helps.