Crowbar: A text shaping debugger

Simon Cozens Posts: 724
edited June 2020 in Font Technology
For anyone who hasn't seen it on Twitter already, I have created a tool called Crowbar which will help you to debug shaping/layout issues. It's web-based, but it runs in your own browser so you're not uploading fonts anywhere. Source code is here.


Comments

  • John Hudson Posts: 2,955
    Not only is this an elegant and interesting tool, it also helped me spot a bug in a font this morning.
  • John Hudson Posts: 2,955
    Since Crowbar doesn't actually involve uploading a font, maybe re-word the drag-and-drop field text to read

    Drag and drop to load a font locally
  • Vasil Stanev Posts: 759
    edited June 2020
    I used it, don't quite understand what I'm looking at but it looks cool. :)
    _____
    Address bar says "Not secure!"
  • This is great Simon, thank you!
  • Simon Cozens Posts: 724
    I used it, don't quite understand what I'm looking at but it looks cool. :)

    What you're seeing is the steps that an OpenType shaping/layout engine goes through as it applies feature rules from a font to an input string and turns it into a set of glyphs to be laid out visually: first, characters are mapped to glyphs, then the shaper applies all the substitution rules from the GSUB table, and then the positioning rules.

    So in the example picture above, lookup 182 is probably something like "feature rlig { sub sp0 @spacer by @spacer; } rlig" and the last thing that happens to the glyph stream is that the sp0 glyph gets substituted out.

    Then the first thing that happens in positioning is that the shaper gets the default horizontal advances for each of the glyphs in the stream (that's the "330", "156", etc.) before applying the first lookup, which is a cursive attachment lookup and starts adding positioning information to the glyph stream. At the bottom, you have the finalized set of glyphs with their glyph IDs and positions ready to be displayed.
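
The stages described above can be sketched as a toy model. This is only an illustration, not how HarfBuzz or Crowbar is implemented: the glyph names, advance widths, ligature rule and kerning value are all made up, and real GSUB/GPOS lookups are far richer than the single rule of each kind shown here.

```python
# Toy model of the three stages: character mapping, substitution, positioning.
# All data below is hypothetical and exists only for illustration.

CMAP = {"f": "f", "i": "i", "x": "x"}                    # character -> glyph name
ADVANCES = {"f": 330, "i": 156, "x": 500, "f_i": 450}    # default horizontal advances
GSUB_LIGATURES = {("f", "i"): "f_i"}                     # one ligature substitution rule
GPOS_KERNING = {("f_i", "x"): -20}                       # one pair adjustment rule


def map_characters(text):
    """Stage 1: map each character to a glyph."""
    return [CMAP[ch] for ch in text]


def apply_gsub(glyphs):
    """Stage 2: apply the substitution rule across the glyph stream."""
    out, i = [], 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in GSUB_LIGATURES:
            out.append(GSUB_LIGATURES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


def apply_gpos(glyphs):
    """Stage 3: start from default advances, then apply positioning rules."""
    advances = [ADVANCES[g] for g in glyphs]
    for i in range(len(glyphs) - 1):
        advances[i] += GPOS_KERNING.get((glyphs[i], glyphs[i + 1]), 0)
    return list(zip(glyphs, advances))


def shape(text):
    return apply_gpos(apply_gsub(map_characters(text)))


print(shape("fix"))
```

Each function corresponds to one block of rows in Crowbar's output: the glyph stream after mapping, after each substitution lookup, and after each positioning lookup.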


    Address bar says "Not secure!"
    Yeah, it's just on an HTTP server and I should put it on HTTPS as well. As John points out, this is entirely client side and your fonts stay where they are.
  • Simon Cozens Posts: 724
    Since Crowbar doesn't actually involve uploading a font, maybe re-word the drag-and-drop field text to read

    Drag and drop to load a font locally
    Good catch - done that (and fixed The crazy Caps too).
  • Excellent work, Simon.
    I was surprised to see a familiar font (Noto Nastaliq) as the first example, but one needs a tool like Crowbar to work through its details. 
  • It looks very useful, but there were some confusing features in the display.  Firstly, Indic rearrangements need to be shown as separate steps.  Secondly, it seems to assume that feature execution does not overlap, and that each primary lookup is only invoked by a single feature.  Optional features need to interact with standard features.

    In my Da Lekh family, localised shaping is available both by specifying the language (which not all applications support) and by using stylistic sets.  Therefore the lookups required occur both in the standardly invoked features and in stylistic sets.  An even more extreme, but probably very unusual example, is that the family implements its idiosyncratic transliteration from Latin script to an Indic script as a fallback for renderer failings or hostility.  Thus many of the Indic shaping lookups also occur in stylistic set ss02, which is only defined for the Latin script.  I was taken aback when stylistic set ss02 was reported to be in use for shaping Indic text.
  • RichardW said:
    It looks very useful, but there were some confusing features in the display.  Firstly, Indic rearrangements need to be shown as separate steps.
    If I understand you correctly (and I'm not sure if I do) then this can be achieved by setting "Clustering" to "Characters".

      Secondly, it seems to assume that feature execution does not overlap, and that each primary lookup is only invoked by a single feature.  Optional features need to interact with standard features.
    Ah, yes, this is probably where I am trying to be too clever. Crowbar uses Harfbuzz to shape the text, and Harfbuzz (correctly) uses a list of features as part of selecting the lookups to execute in a stage. But once it's chosen the lookup IDs, it throws the feature information away. So Crowbar tries to look back into the feature-lookup mapping to try to understand why a lookup got called and where it came from, but if a lookup appears in more than one feature then I can see that it might guess wrong.
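
A minimal sketch of that kind of reverse lookup, assuming a hypothetical feature-to-lookup mapping (this is not Crowbar's actual code, and the feature tags and lookup numbers are invented). It shows why a lookup shared by two features cannot be attributed with certainty:

```python
from collections import defaultdict

# Hypothetical feature -> lookup index mapping, as might be read from a GSUB table.
FEATURE_LOOKUPS = {
    "ccmp": [3],
    "rlig": [25, 94],
    "ss02": [94],
}


def build_reverse_map(feature_lookups):
    """Invert feature -> lookups into lookup -> features."""
    reverse = defaultdict(list)
    for feature, lookups in feature_lookups.items():
        for lookup in lookups:
            reverse[lookup].append(feature)
    return reverse


def guess_features(lookup, reverse):
    """Return every feature that could have triggered this lookup."""
    return sorted(reverse.get(lookup, []))


reverse = build_reverse_map(FEATURE_LOOKUPS)
print(guess_features(25, reverse))  # one candidate: the guess is safe
print(guess_features(94, reverse))  # two candidates: any single label is a guess
```

Returning the full candidate list rather than the first match would at least let a display flag the ambiguous cases.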

    I am not sure what to do about that. Most of the time the feature-guesser is correct, and it's useful information, so I'm loath to turn it off even if it makes mistakes sometimes.

    The right thing to do is for you to build your fonts with source level debugging, and then Crowbar will not only give you the right feature name, but the lookup name and the source line it came from as well!
  • That's not the rearrangement problem, though I do like having clustering set to character.  The problem is that the output says that 'start lookup 25 ss99' introduces glyph uni25CC.  No lookup generates glyph uni25CC out of anything but U+25CC - this is inserted as part of Indic rearrangement.

    The lookup to feature mapping could be made more reliable if you know the OTL script and language being used for rendering.  It is much rarer for two features in the same script and language to share a lookup, though it can happen.  To minimise the violence done to Ed Trager's Hariphunchai to create Lamphun, I ended up using one of his lookups in both ccmp (before Indic rearrangement) and in blws (after rearrangement).  One can even have the irretrievable situation where two optional features include the same lookup, and then 'which feature?' is a meaningless question!
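
The narrowing suggested above could look roughly like this, assuming the renderer's script and language tags are known. The table below is hypothetical; it is arranged so that one lookup is still shared by ccmp and blws within a single language system, as in the Lamphun case.

```python
# Hypothetical data: (script, language) -> feature -> lookup indices.
LANGSYS_FEATURES = {
    ("lana", "dflt"): {"ccmp": [81], "blws": [81, 95]},
    ("latn", "dflt"): {"ss02": [81, 94]},
}


def candidate_features(lookup, script, language):
    """Features in the given language system that reference this lookup."""
    features = LANGSYS_FEATURES.get((script, language), {})
    return sorted(name for name, lookups in features.items() if lookup in lookups)


print(candidate_features(94, "latn", "dflt"))  # ss02 only applies to Latn here
print(candidate_features(94, "lana", "dflt"))  # not a candidate for this script
print(candidate_features(81, "lana", "dflt"))  # still shared by two features
```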

    I agree that the current feature-guesser *is better* than having no feature-guesser.

    Where is the Debg table documented?  I did consider generating Zapf tables, but I couldn't find any examples of Zapf tables to check that I had read the syntax correctly.  The Debg table sounds closer to what I want.  It sounds as though it could help with decompilation, just as populating the POST table does.
  • I think I can see the required bits of the table.  There is a component named "com.github.fonttools.feaLib" indexed by OTL table (GSUB or GPOS) and then lookup number.  The elements are an array of source file name, lookup name and as element 2 a further structure which is an array of script, language and feature.  I trust element 2 can be null as most lookups used for a script will be used by most of the language systems within that script.  I can't see where Crowbar will get the line number from.

    I suppose I could modify the compiler to assign multiple index numbers to the same lookup, but it could get really confusing.  If duplicates were run together, I would have to be sure that the second run changed nothing.  (Perhaps I am already too optimistic in hoping that all renderers run a set of lookups, as prescribed, and not a bag of lookups.)
  •   The problem is that the output says that 'start lookup 25 ss99' introduces glyph uni25CC.  No lookup generates glyph uni25CC out of anything but U+25CC - this is inserted as part of Indic rearrangement.
    I think I'm going to need a reproducible example to be able to diagnose and fix this. What's the font / text sequence?

    I trust element 2 can be null as most lookups used for a script will be used by most of the language systems within that script.

    Right.

    I can't see where Crowbar will get the line number from.

    Element 0 is a "location" in whatever format makes sense to the font editor. For FEA code it's "filename:line:column". The Debg table is an ad-hoc experiment that I put together to store this debugging code, although it can be generalized to be used for other debugging information too. The idea is that it can be stripped out to produce a production font.
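
As a rough illustration of reading such an entry, here is a sketch that assumes the array layout discussed in this thread (table, then lookup index, then `[location, lookup name, [script, language, feature] or null]`). The JSON payload, file name and lookup names are hypothetical, and since the format is described as an ad-hoc experiment, the layout may differ in practice.

```python
import json

# Hypothetical Debg payload; the entries are invented for illustration.
DEBG_JSON = """
{"com.github.fonttools.feaLib": {
  "GSUB": [
    ["features.fea:12:5", "example_liga", ["latn", "dflt", "liga"]],
    ["features.fea:40", "example_shared", null]
  ],
  "GPOS": [
    ["features.fea:90:1", "example_kern", null]
  ]
}}
"""


def parse_location(location):
    """Split 'filename:line[:column]'; missing parts become None.

    Assumes the file name itself contains no colon.
    """
    parts = location.split(":")
    filename = parts[0]
    line = int(parts[1]) if len(parts) > 1 else None
    column = int(parts[2]) if len(parts) > 2 else None
    return filename, line, column


def describe_lookup(debg, table, index):
    """Collect the debugging information stored for one lookup."""
    location, name, langsys = debg["com.github.fonttools.feaLib"][table][index]
    filename, line, column = parse_location(location)
    feature = langsys[2] if langsys else None  # element 2 may be null
    return {"name": name, "file": filename, "line": line,
            "column": column, "feature": feature}


debg = json.loads(DEBG_JSON)
print(describe_lookup(debg, "GSUB", 0))
print(describe_lookup(debg, "GSUB", 1))
```

The parser accepts a location without a column because a compiler may only record the file and line.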

  • The text is the 5-character word ᨯᩮᩬᩥᩁ.  The font is available as http://wrdingham.co.uk/lanna/dalekh_si.ttf, which is at Version 0.011.  For the source code, see http://wrdingham.co.uk/lanna/renderer_test.htm#fonts .  Since that page was published, I have published the compiler at http://wrdingham.co.uk/fonts/oft.html .  It is in no way ready for general use - I published it so I could release LKLUG_T (http://wrdingham.co.uk/fonts/lklugt.html) so people could read Pali as it is written in the Sinhala script.




  • OK, I understand what's going on, and it was something I was thinking about already - needing to add tracing to Harfbuzz's inter-phase "pause" functions. I'll get on that.
  • You've got a lookup labelling problem with context substitutions (which is a rather tricky area).  Same font, text string ᩈ᩠ᨶ᩻ᩮᩢ᩶ᩣ.  Lookup 94 invokes, with effect, lookups 81 and 95.  Your labelling goes: "start lookup 94 ss02" with same glyphs as before; "(arrow) end lookup 81" and glyphs after lookup 81; "end lookup 81" with glyphs after lookup 95.

    Same font, same lookup, but string ᨳ᩠ᨶ᩻ᩦ᩵ shows different behaviour.  Lookup 94 invokes, with effect, lookups 96 and 95.  Your labelling goes: "start lookup 94 ss02" with same glyphs listed as before; "(arrow) end lookup 96" with contents of glyphs positions changed by lookup 96 updated, "(arrow) end lookup 95" with some shuffling up because lookup 96 merged two glyphs into one, and finally "end lookup 94 ss02" with all the changes shown.  These problems may come from HarfBuzz not being as supportive as one might hope.






  • Yes, that's also a known issue with Harfbuzz, to do with the way that Harfbuzz's buffer is not serializing inside of lookups (which means that it doesn't necessarily make sense when lookups chain to other lookups).
  • Has the tool been withdrawn?  I can no longer drag a font to it.
  • Sorry, I broke it while trying to add some new features (USE reordering tracking etc.). Should be fixed now.
  • Belay that - I seem to be having a connectivity problem.
  • RichardW Posts: 100
    edited September 2020
    Thanks for putting the extra stages in.  I've now got it reading my Debg table, so the lookups are also labelled by name, and in addition it's nice to see the font-independent shaping stages.  There are still some glitches in the rendering.  For the word ᨯᩮᩬᩥᩁ /dɯan/ 'month' the dotted circle is listed but not displayed, though subsequent dotted circles are displayed for ᨯᩮᩬᩥᩁᨯᩮᩬᩥᩁᨴᩘ᩠ᩃᩣ᩠ᨿ, where I have artificially repeated the 'month' word and added ᨴᩘ᩠ᩃᩣ᩠ᨿ /taŋlaːi/ 'all'.  The step by step display of ᩈᩘᨥᩮᩣ <saṅɡho> (nominative singular of Pali saṅɡha) is a bit disconcerting because the mark above for <ṅ> has the repha-like behaviour of Burmese kinzi outside the northwest of the range.  This is achieved by removing it from ᩈ and letting it sprout from ᨥ.

    As I understand the Debg table to be unstable, for the record, and as I seem not to be the only one here who rolls his own compiler, the Debg contents that worked for me were:
    {"com.github.fonttools.feaLib":{"GSUB":[
    ["dalekh_si.tmp:40360", "ss02_fake_lanna_2", null],
    …
    ["dalekh_si.tmp:41253", "rlig_test", null]], "GPOS":[
    ["dalekh_si.tmp:45059", "plkp4", null],
    …
    ["dalekh_si.tmp:45195", "side_by_side_below", null],
    ["dalekh_si.tmp:45201", "space_mark", null]]}}




