Matching marks while also ignoring marks

Is it possible to match a mark glyph in a sequence (to chain a lookup on it) but also ignore other mark glyphs in that sequence (including other copies of the glyph itself)? It's probably best understood by example.

With the sequence "a MARK b c" I want to chain a lookup on the MARK, whether or not there's another MARK between the b and the c.

I tried this:

lookupflag IgnoreMarks;
sub a MARK' lookup lu b c;
but it didn't match at all. Using a MarkFilteringSet did match, but it didn't match the "a MARK b MARK c" case.

I could, of course, write out all the possibilities:
sub a MARK' lookup lu b c;
sub a MARK' lookup lu b MARK c;
But my real situation has a longer context and many more combinations. I have Python and can generate class-based lookups, but that's just ugly, so I'd rather not enumerate them all if there's a better way to do it.
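
For reference, the enumeration described above can be sketched in Python along these lines. This is only an illustration: the `?` placeholder and the `enumerate_rules` helper are made-up names, not part of any font tool, and the output is plain feature-file text that would still need a suitable lookup flag around it. The number of rules doubles with each optional slot.

```python
from itertools import product

OPTIONAL = "?"  # hypothetical placeholder: a slot where the mark may appear


def enumerate_rules(context, mark):
    """Return one FEA `sub` rule per way of filling the optional slots."""
    slots = context.count(OPTIONAL)
    rules = []
    for choice in product([False, True], repeat=slots):
        picks = iter(choice)
        items = []
        for item in context:
            if item != OPTIONAL:
                items.append(item)
            elif next(picks):
                # This combination includes the mark in this slot.
                items.append(mark)
        rules.append("sub " + " ".join(items) + ";")
    return rules


rules = enumerate_rules(["a", "MARK' lookup lu", "b", OPTIONAL, "c"], mark="MARK")
print("\n".join(rules))
```

With one optional slot this prints the two rules written out by hand above.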

Comments

  • Sami Artur Mandelbaum
    edited August 2020
    Hi Simon,
    This is possible using Microsoft VOLT.
    You can choose the "process marks" filtering by Class.
    There is a difference between Class and Group in VOLT.
    Sami
  • Hmmm... I wonder how that would work. What lookup flag would it translate to? I have a feeling that if it does work, it probably relies on different behavior between useMarkFilteringSet and markAttachmentType, and I'm not sure that's something that should be relied on.

    Would you mind making an example font for me?
  • Would you mind making an example font for me?
    The SBL Hebrew font is a good example.
    http://www.sbl-site.org/Fonts/SBL_Hbrw.ttf
  • Did you try ‘UseMarkFilteringSet’?
  • Did you try ‘UseMarkFilteringSet’?

    Yes, but no luck:

    @below_dots = [sdb ddb tdb];
    
    lookup Routine_1 { sub @below_dots by [sdb.yb ddb.yb tdb.yb]; } Routine_1;
    
    lookup Routine_4 {
        lookupflag UseMarkFilteringSet @below_dots;
        sub @inits @medis @below_dots' lookup Routine_1 BARI_YEf1;
    } Routine_4;
    
    Now if we have a sequence with @below_dots between the init and the medi, it does not match. (This makes sense: we've told it to skip only the marks that are not in @below_dots, so the dot after the init is not skipped and the rule has to match it.)
    $ hb-shape font.ttf 'ببے'
    [BARI_YEf1=2+272|sdb=1@-26,58+0|BEm8=1@0,275+227|sdb=0@11,97+0|BEi2=0@0,311+389]
    
    Enumerating the possibilities does work:
    lookup Routine_4 {
        lookupflag UseMarkFilteringSet @below_dots;
        sub @inits @below_dots' lookup Routine_1 @medis BARI_YEf1;
        sub @inits @below_dots' lookup Routine_1 @medis' @below_dots' lookup Routine_1 BARI_YEf1;
    } Routine_4;
    
    If we put something other than @below_dots in the mark filtering set, then the rule does not match at all because the @below_dots have been skipped over.
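
The three behaviours reported so far (no match with IgnoreMarks, a match only without the extra mark when the mark is in the filtering set, no match when it is not) can be reproduced with a toy model of how a lookup flag skips glyphs. This is a simplified sketch, not HarfBuzz's actual matcher: `match_at`, `ignore_marks` and `filtering_set` are hypothetical names, and `OTHER` stands for any unrelated mark.

```python
MARKS = {"MARK", "OTHER"}  # glyphs classed as marks in this toy model


def ignore_marks(glyph):
    """Model of IgnoreMarks: every mark is skipped."""
    return glyph in MARKS


def filtering_set(keep):
    """Model of UseMarkFilteringSet: marks outside `keep` are skipped."""
    return lambda glyph: glyph in MARKS and glyph not in keep


def match_at(glyphs, start, pattern, skip):
    """Return the indices matched by `pattern` from `start`, or None."""
    matched = []
    i = start
    for wanted in pattern:
        # Step over glyphs the lookup flag tells the matcher to skip.
        while i < len(glyphs) and skip(glyphs[i]):
            i += 1
        if i >= len(glyphs) or glyphs[i] not in wanted:
            return None
        matched.append(i)
        i += 1
    return matched


pattern = [{"a"}, {"MARK"}, {"b"}, {"c"}]
plain = ["a", "MARK", "b", "c"]
extra = ["a", "MARK", "b", "MARK", "c"]

print(match_at(plain, 0, pattern, ignore_marks))             # None
print(match_at(plain, 0, pattern, filtering_set({"MARK"})))  # [0, 1, 2, 3]
print(match_at(extra, 0, pattern, filtering_set({"MARK"})))  # None
print(match_at(plain, 0, pattern, filtering_set({"OTHER"}))) # None
```

In this model a glyph is either always skipped or never skipped within one lookup, which is why the same mark cannot be both matched and ignored.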
  • John Hudson
    You need to separate out the marks you want to process and the marks you want to ignore. In VOLT, you'd do this by setting up a group of marks to be ignored in this specific lookup, and then compile that to a Mark Filtering Set referenced in the ignore marks field for the lookup (in the VOLT UI, you indicate that the group should be compiled to a Mark Filter Set by preceding the name of the group in the lookup ignore marks field with *).

    I consider Mark Attachment Class to be essentially obsolete for mark filtering purposes. Mark Attachment Classes have to be exclusive, i.e. the same mark cannot be included in more than one Mark Attachment Class, which makes them useless for lots of purposes. In recent projects, I have simply used Mark Filtering Sets for all ignore marks filtering groups, regardless of whether they are exclusive or not.
  • You need to separate out the marks you want to process and the marks you want to ignore.
    Right - the problem is that these are not distinct sets. I literally want to match a thing and ignore all other instances of the same thing. I'm coming to the conclusion it's not possible; enumerating the possibilities it is, then...
  • John Hudson
    I literally want to match a thing and ignore all other instances of the same thing.

    That's the sort of situation that can lead to adding duplicate glyphs under different names, so you can use GSUB to contextually set up distinct sets.
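
One possible reading of the duplicate-glyph suggestion, sketched as a toy Python simulation rather than real feature code: a first lookup renames the mark directly after the first glyph to a duplicate, and a second lookup puts only the duplicate in its filtering set, so remaining copies of the original mark are skipped. The glyph names `dot`, `dot.ctx` and `dot.alt`, and the helpers `match_at` and `shape`, are assumptions made for illustration.

```python
MARKS = {"dot", "dot.ctx", "dot.alt"}  # all classed as marks in this model


def match_at(glyphs, start, pattern, keep):
    """Match `pattern` from `start`, skipping marks outside the set `keep`."""
    matched, i = [], start
    for wanted in pattern:
        while i < len(glyphs) and glyphs[i] in MARKS and glyphs[i] not in keep:
            i += 1
        if i >= len(glyphs) or glyphs[i] != wanted:
            return None
        matched.append(i)
        i += 1
    return matched


def shape(glyphs):
    glyphs = list(glyphs)
    # Pass 1 (filtering set {dot}): rename the dot directly after "a".
    for start in range(len(glyphs)):
        hit = match_at(glyphs, start, ["a", "dot"], keep={"dot"})
        if hit:
            glyphs[hit[1]] = "dot.ctx"
    # Pass 2 (filtering set {dot.ctx}): plain "dot" is now skipped,
    # so one rule covers the context with or without a later dot.
    for start in range(len(glyphs)):
        hit = match_at(glyphs, start, ["a", "dot.ctx", "b", "c"], keep={"dot.ctx"})
        if hit:
            glyphs[hit[1]] = "dot.alt"  # stands in for the chained lookup
    return glyphs


print(shape(["a", "dot", "b", "c"]))
print(shape(["a", "dot", "b", "dot", "c"]))
```

A real font would probably also need a later lookup that maps any leftover `dot.ctx` back to `dot` when the second rule does not fire; that step is left out of the sketch.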