feaLib source debugging #2052

simoncozens · 2020-08-25T19:32:13Z

This PR is not entirely serious. It's a terrible hack and probably breaks all kinds of abstraction boundaries. But I am using it in my work and it's extremely helpful, so maybe there's a germ of an idea in here worth discussing.

Implement the non-standard Debg table, a reverse-DNS namespaced JSON dictionary for holding debugging metadata.
For each lookup built by feaLib.builder, add an entry into the Debg dictionary with the original lookup name (if any), feature name / script / language combination (if any), and original source filename and line location.
Annotate the ttx output for a lookup with the information from the Debg table.

The end result is ttx files that look like this:

      <!-- ybmkmk: features.fea:84:1 in mkmk (arab/dflt) -->
      <Lookup index="39">
        <LookupType value="6"/>
        <LookupFlag value="2"/><!-- ignoreBaseGlyphs -->
        <!-- SubTableCount=1 -->

alerque · 2020-08-25T19:34:17Z

This PR is not entirely serious.

That's what draft mode PR's are for ;-) Conversion link under the Reviewers in the sidebar.

simoncozens · 2020-08-25T19:37:13Z

In theory... in practice, draft mode means "everybody else ignore this until I'm done."

anthrotype

Thanks Simon, this is seriously useful.
Can you address the review feedback and add some tests?

anthrotype · 2020-09-09T08:59:39Z

Lib/fontTools/feaLib/builder.py

@@ -638,6 +642,12 @@ def buildGDEFMarkGlyphSetsDef_(self):
            sets.append(glyphs)
        return otl.buildMarkGlyphSetsDef(sets, self.glyphMap)

+    def buildDebg(self):
+        if not "Debg" in self.font:


Suggested change

if not "Debg" in self.font:

if "Debg" not in self.font:

anthrotype · 2020-09-09T09:01:26Z

Lib/fontTools/feaLib/builder.py

@@ -647,6 +657,11 @@ def buildLookups_(self, tag):
            if lookup.table != tag:
                continue
            lookup.lookup_index = len(lookups)
+            self.lookup_locations[tag][len(lookups)] = [


this looks more readable

Suggested change

self.lookup_locations[tag][len(lookups)] = [

self.lookup_locations[tag][lookup.lookup_index] = [

anthrotype · 2020-09-09T09:06:58Z

Lib/fontTools/feaLib/builder.py

@@ -685,6 +700,9 @@ def makeTable(self, tag):
            if len(lookup_indices) == 0 and not size_feature:
                continue

+            for ix in lookup_indices:
+                self.lookup_locations[tag][ix][2] = key


hm what's [2]? Shall we turn this one into a namedtuple as well?

anthrotype · 2020-09-09T09:09:13Z

Lib/fontTools/feaLib/builder.py

+        if not "Debg" in self.font:
+            self.font["Debg"] = newTable("Debg")
+            self.font["Debg"].data = {}
+        self.font["Debg"].data["io.github.fonttools.feaLib"] = self.lookup_locations


in ufoLib we use "com.github.fonttools.ufoLib" as the fileCreator:

fonttools/Lib/fontTools/ufoLib/__init__.py

Line 887 in 097b255

fileCreator="com.github.fonttools.ufoLib",

anthrotype · 2020-09-09T09:10:11Z

Lib/fontTools/ttLib/tables/D__e_b_g.py

+class table_D__e_b_g(DefaultTable.DefaultTable):
+
+	def decompile(self, data, ttFont):
+		self.data = json.loads(data)


can you please use four spaces? I know we aren't consistent elsewhere but for new modules I'd prefer to ditch the tabs

anthrotype · 2020-09-09T09:11:11Z

Lib/fontTools/ttLib/tables/D__e_b_g.py

+		self.data = json.loads(data)
+
+	def compile(self, ttFont):
+		return str.encode(json.dumps(self.data))


why str.encode instead of the more idomatic s.encode() method on a str instrance?
Also, I think we want to explicitly encode as UTF-8, I'm not sure what the default is these days but explicit is better than implicit.

anthrotype · 2020-09-09T09:13:54Z

Lib/fontTools/ttLib/tables/D__e_b_g.py

+		return str.encode(json.dumps(self.data))
+
+	def toXML(self, writer, ttFont):
+		writer.writecdata(self.compile(ttFont).decode())


why the self.compile and the extra encode/decode step? Can't you just do writer.writecdata(json.dumps(self.data))?

anthrotype · 2020-09-09T09:18:16Z

Lib/fontTools/ttLib/tables/otTables.py

+				for i in range(len(value)):
+					item = value[i]
+					if str(i) in s:
+						thisLu = s[str(i)]


thisLu reads a bit cryptic to me, plain lookup is just fine

anthrotype · 2020-09-09T09:18:46Z

Lib/fontTools/ttLib/tables/otTables.py

+	def table(self):
+		for l in self.Lookup:
+			for st in l.SubTable:
+				if isinstance(st, SingleSubst): return "GSUB"


how about you check if the subtable class name ends with "Subst" or "Pos" instead?

anthrotype · 2020-09-09T09:19:26Z

Lib/fontTools/ttLib/tables/otTables.py

@@ -1187,6 +1187,52 @@ def preWrite(self, font):
 		}


+class LookupList(BaseTable):
+	def table(self):


given the method name is a noun instead of a verb, I guess you could make this a @property getter

…ound-tripping without writing the file out.

anthrotype · 2020-09-10T16:04:06Z

Lib/fontTools/feaLib/__main__.py

@@ -58,7 +64,8 @@ def main(args=None):

    font = TTFont(options.input_font)
    try:
-        addOpenTypeFeatures(font, options.input_fea, tables=options.tables)
+        addOpenTypeFeatures(font, options.input_fea, tables=options.tables,
+            debug=options.debug)


minor: this indentation style is not black-compliant

anthrotype · 2020-09-10T16:05:59Z

Lib/fontTools/feaLib/builder.py

@@ -2,6 +2,7 @@
 from fontTools.misc import sstruct
 from fontTools.misc.textTools import binary2num, safeEval
 from fontTools.feaLib.error import FeatureLibError
+from fontTools.feaLib.lookupdebuginfo import LookupDebugInfo, LOOKUP_DEBUG_INFO_KEY


all-lowercase for multi-word names are unusual in fonttools (or python in general); I suggest you use camelCase like most of the other modules in here

anthrotype · 2020-09-10T16:07:26Z

Lib/fontTools/feaLib/builder.py

@@ -685,6 +704,9 @@ def makeTable(self, tag):
            if len(lookup_indices) == 0 and not size_feature:
                continue

+            for ix in lookup_indices:
+                self.lookup_locations[tag][str(ix)] = self.lookup_locations[tag][str(ix)]._replace(feature = key)


Suggested change

self.lookup_locations[tag][str(ix)] = self.lookup_locations[tag][str(ix)]._replace(feature = key)

self.lookup_locations[tag][str(ix)] = self.lookup_locations[tag][str(ix)]._replace(feature=key)

anthrotype · 2020-09-10T16:12:23Z

Lib/fontTools/ttLib/tables/otTables.py

+	def table(self):
+		for l in self.Lookup:
+			for st in l.SubTable:
+				if type(st).__name__.endswith("Subst"): return "GSUB"


sorry to always nag about style, but can you please stick to indenting all if blocks over multiple lines, no matther how short, thanks :)

Happy to do this, but this would all be so much easier if we could run black on the codebase. If we're going to insist on a code style, let's make the repo reflect that, and then running black on any changes afterwards will produce manageable diffs.

After this is in, I'm going to put in a set of PRs reformatting the parts of the code that I plan to work on.

I've been hesitant reformatting all codebase with black so far, but now I convinced myself that it's worth doing it once and for all so we don't waste more time on these styling debates. But this only makes sense if at the same time we enforce it for all subsequent commits (e.g. black --check --diff in CI before running test suite).
Let's move the "blacken everything" to its own issue/PR.

anthrotype · 2020-09-10T16:15:09Z

Tests/feaLib/builder_test.py

        path = self.temp_path(suffix=".ttx")
        font.saveXML(path, tables=['head', 'name', 'BASE', 'GDEF', 'GSUB',
                                   'GPOS', 'OS/2', 'hhea', 'vhea'])
        actual = self.read_ttx(path)
        expected = self.read_ttx(expected_ttx)
+        if replace:
+            for i in range(len(expected)):
+                for k,v in replace.items():


Suggested change

for k,v in replace.items():

for k, v in replace.items():

anthrotype · 2020-09-10T16:15:24Z

Tests/feaLib/builder_test.py

+        if replace:
+            for i in range(len(expected)):
+                for k,v in replace.items():
+                    expected[i] = expected[i].replace(k,v)


Suggested change

expected[i] = expected[i].replace(k,v)

expected[i] = expected[i].replace(k, v)

anthrotype · 2020-09-10T16:16:59Z

Tests/feaLib/builder_test.py

+        debugttx = self.getpath("%s-debug.ttx" % name)
+        if os.path.exists(debugttx):
+            addOpenTypeFeatures(font, feapath,
+                debug=True)


plaese remove indent and make it a single line with the previous one

anthrotype · 2020-09-10T16:17:38Z

Tests/feaLib/builder_test.py

+        if os.path.exists(debugttx):
+            addOpenTypeFeatures(font, feapath,
+                debug=True)
+            self.expect_ttx(font, debugttx, replace = { "__PATH__": feapath })


Suggested change

self.expect_ttx(font, debugttx, replace = { "__PATH__": feapath })

self.expect_ttx(font, debugttx, replace = {"__PATH__": feapath})

anthrotype · 2020-09-10T16:19:07Z

Tests/feaLib/builder_test.py

@@ -618,7 +628,6 @@ def test_pairPos_enumRuleOverridenBySinglePair_DEBUG(self):
                "} test;")
        captor.assertRegex('Already defined position for pair A V at')

-


don't remove this line here. between definitions at module-level there needs to be two line breaks, as per pep8 and black

simoncozens · 2020-09-13T09:12:42Z

This is now as good as it needs to be technically (IMO), but I am still not sure it should be done!

The positive is some very helpful debugging information, which can also be used by other applications (again, I'm already doing this); the cost is that it introduces a new[1] non-standard and undocumented[2] OT table and creates a complicated special-case in the serialization of lookups.

I'd like to make sure that people are OK with that cost, so would like opinions from others before merging. @justvanrossum, @dscorbett, @brawer?

The reason I didn't use meta is that the intent for Debg is that it can be stripped to produce a production font, while meta should be retained.
And undocumented, although I can easily fix that in CommonType.

dscorbett · 2020-09-13T16:00:56Z

I think it’s okay to introduce a non-standard table, as long as it is opt-in and its tag is neither all uppercase nor all lowercase. The cost of an undocumented table is especially low: if there is no documentation, there are no backwards compatibility constraints.

As for the special-case cost for developers, I don’t think it’s too bad, because it is so very useful, but perhaps it could be less of a special case while still being as useful. What if instead of having special code in LookupList.toXML2 (and any other tables for which debugging information is added), the structure of the stored JSON reflects the nested table structure? The XML writer could know that for every structure it writes, it should check the 'Debg' table for nested keys based on the position of that structure (e.g. {"GSUB": {"LookupList": ["features.fea:4:5 in ccmp (DFLT/dflt)", ...]}}). That would make it no longer specific to feaLib. It would also mean that feaLib would have to format the lookup name, tag, script, and language into a string instead of an array, because a generic mechanism wouldn’t know what to do with an array. Maybe this is too complex to bother doing. As long as there’s no documentation, or the documentation makes few guarantees, it can be done later.

table_D__e_b_g.toXML can produce invalid XML, but that’s not a problem specific to this table.

It is unfortunate that LOOKUP_DEBUG_INFO_KEY should begin with “com.github.fonttools” instead of “io.github.fonttools”, since only the latter is a reversed domain name controlled by the fontTools organization. For consistency with ufoLib I guess it’s okay.

FontForge has its own debug table: 'PfEd'. It might have some good ideas for future extensions to 'Debg'.

Richard57 · 2020-09-15T12:33:53Z

I don't like the storage of the 'feature identification' attribute for lookups. If I understand correctly, the intention is that for each lookup there will be stored a triple of (source location, lookup name, feature identification). Feature is primarily a triple of {script, language, feature) - I presume the last is the 4-letter tag.

When a font supports multiple languages with distinctive rendering, most lookups will be used for most of the languages. Looking at the code, it seems that the (script, language, feature) combination stored will be the last one. One way round this is to have different lookups for different combinations of script, language and feature. At the source code level, this would be bad to inconvenient, depending on how it was implemented. (Commonality could be maintained by use of include files, but even so.) However it was implemented, it would require debug data to be stored for each new lookup, with at the very best extra maps from lookup to offset of the lookup definition.

A better solution would be to store a list of 3-element feature identifications. This would require less debug storage than multiplying lookups, and would avoid increasing complexity in the compilation or encoding of a font.

However, I'm not sure why one needs to store this feature identification in the debug table. The information is already stored in the GSUB and GPOS tables. Is it just to speed up access? The lookup name and source location are, however, useful.

simoncozens · 2020-09-15T13:39:44Z

One way round this...

I'm not sure why you don't like the storage, or what the problem is that you're trying to get around, so my responses below might not be addressing the problem you're talking about. Can you explain what it is that seems wrong about this?

A better solution would be to store a list of 3-element feature identifications.

Such a list would be indexed by lookup ID, right? Isn't that what this patch does - links lookup IDs to 3-element feature identifiers?

This would require less debug storage than multiplying lookups, and would avoid increasing complexity in the compilation or encoding of a font.

It certainly wouldn't be correct for debugging annotations to change the way that lookups are laid out, but even if it was, in a debugging build, I don't think that how much extra storage the debugging info takes up should be a consideration. Just strip it out if you want an optimized file.

Richard57 · 2020-09-15T15:41:01Z

I'll give a real example not crafted by me. In the DejaVu Sans font (book, Version 2.30), substitution lookup 2 occurs in the feature ccmp in the Latin script for the default language and the languages ISM, KSM, LSM, MOL, NSM, ROM, SKS and SSM. I think that with the current code, if one recompiled the font (preserving functional equivalence), as is legal, that will end up as being recorded as saying that GSUB lookup 2 is used by the combination ["Latn", "SSM ", "ccmp"], with no mention of the other Saami languages, let alone the default. I don't see any use in recording that in the debug table. (I'm writing arrays for fixed lists of fields.) In full, the record would be something like. That it pertains to lookup 2 is recorded by its being the third element in the array of 'lookup locations'.

What I'm suggesting as a way of recording the lookup location in a programmer-friendly way would be something like:

["location":"dejavusans.txt:7123:3", "name":"slkp2", "feature":[ ["Latn", "default", "ccmp"], ["Latn", "ISM ", "ccmp"], ["Latn", "KSM ", "ccmp"], ["Latn", "LSM ", "ccmp"], ["Latn", "MOL ", "ccmp"], ["Latn", "NSM ", "ccmp"], ["Latn", "ROM ", "ccmp"], ["Latn", "SKS ", "ccmp"], ["Latn", "SSM ", "ccmp"]]]

In a more limited fashion, substitution lookup 0 occurs in feature locl but only for the Romanian and Moldavian languages, in the Latin script.

Or have I badly misread the code, and you are doing this already?

Storing the data this way is no more excessive than any other way of doing it.

Releasing code with debugging information helps others investigate any problems for one. As there are no trade secrets in my fonts, my only objection to documenting them is the effort it takes. I'd rather release a font with the debugging information in it.

simoncozens · 2020-09-15T17:57:18Z

I'm not sure that what you're saying is correct. We can distinguish between "preamble" lookups and "feature" lookups in AFDFO syntax:

# Preamble lookups here

# lookup 0
lookup thing {
  ...
} thing;

feature ccmp {
    # Feature lookups here
    sub foo by bar; # lookup 1

    script arab
    language URD;
    sub baz by quux; # lookup 2

    script arab;
    language ARA;
    lookup thing; # reference to lookup 0
};

Preamble lookups don't get a feature identification entry in the debug tuple, but feature lookups do: so lookups 1 and 2 will have such entries, 0 will not. And 0 will be the lookup that is shared, potentially between many features.

In other words, the debug info stores information about where the lookup is defined, not where it is "used".

Richard57 · 2020-09-15T19:30:17Z

OK. The way I allocate lookups to features, everything would be a preamble lookup. Moreover if you want to know what features a lookup is in, you have to use the GSUB/GPOS tables.

simoncozens · 2020-09-15T20:48:19Z

Well, this PR is specifically a feaLib debugger! Still, you can use tools based on it if you persuade your compiler to place relevant values in the right places in the Debg table. If they’re all shared lookups, then leave the third member of the tuple empty.

simoncozens added 5 commits August 25, 2020 14:06

The long-threatened Debg table!

aebb2e5

Dump feature locations into Debg

2e112a4

Parse Debg info in LookupList

50e1105

Make Debg table optional

e2433bd

Namespace debug output.

b548c5f

anthrotype approved these changes Sep 9, 2020

View reviewed changes

simoncozens added 10 commits September 9, 2020 19:56

Use a named tuple, tidy ttx dump code

4cddc38

Tidy up as per review

41c1dcc

Reformat, tidy from review comments

98803b7

Add debug arg to addOpenTypeFeatures

b680eaf

Add debug flag to feaLib command line utility

8869e0d

Add debug tests

70b5142

Ensure LookupCount is processed

4e70b30

JSON ends up stringifying number keys, so do that now in case we’re r…

87c65ac

…ound-tripping without writing the file out.

Serialize JSON object before writing to XML

02c305c

Prefer module constant to hard-coded string for Debg table key

af4f1f6

anthrotype reviewed Sep 10, 2020

View reviewed changes

simoncozens added 3 commits September 11, 2020 09:31

Re-case debug info module

95a941e

Style nits

7135255

Debugging should not be the default (oops)

2ef66ea

simoncozens merged commit a18b6bf into fonttools:master Sep 17, 2020

devosb mentioned this pull request Sep 22, 2020

new Debg table breaks our font builds #2065

Closed

simoncozens mentioned this pull request Nov 19, 2020

Turn on feaLib debugging from the environment #2106

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feaLib source debugging #2052

feaLib source debugging #2052

simoncozens commented Aug 25, 2020

alerque commented Aug 25, 2020

simoncozens commented Aug 25, 2020

anthrotype left a comment

anthrotype Sep 9, 2020

anthrotype Sep 9, 2020

anthrotype Sep 9, 2020

anthrotype Sep 9, 2020

anthrotype Sep 9, 2020

anthrotype Sep 9, 2020

anthrotype Sep 9, 2020

anthrotype Sep 9, 2020

anthrotype Sep 9, 2020

anthrotype Sep 9, 2020 •

edited

anthrotype Sep 10, 2020

anthrotype Sep 10, 2020

anthrotype Sep 10, 2020

anthrotype Sep 10, 2020

simoncozens Sep 11, 2020

anthrotype Sep 11, 2020

anthrotype Sep 10, 2020

anthrotype Sep 10, 2020

anthrotype Sep 10, 2020

anthrotype Sep 10, 2020

anthrotype Sep 10, 2020

simoncozens commented Sep 13, 2020

dscorbett commented Sep 13, 2020

Richard57 commented Sep 15, 2020

simoncozens commented Sep 15, 2020

Richard57 commented Sep 15, 2020

simoncozens commented Sep 15, 2020

Richard57 commented Sep 15, 2020

simoncozens commented Sep 15, 2020

	self.lookup_locations[tag][len(lookups)] = [
	self.lookup_locations[tag][lookup.lookup_index] = [

	self.lookup_locations[tag][str(ix)] = self.lookup_locations[tag][str(ix)]._replace(feature = key)
	self.lookup_locations[tag][str(ix)] = self.lookup_locations[tag][str(ix)]._replace(feature=key)

	expected[i] = expected[i].replace(k,v)
	expected[i] = expected[i].replace(k, v)

	self.expect_ttx(font, debugttx, replace = { "__PATH__": feapath })
	self.expect_ttx(font, debugttx, replace = {"__PATH__": feapath})

		@@ -618,7 +628,6 @@ def test_pairPos_enumRuleOverridenBySinglePair_DEBUG(self):
		"} test;")
		captor.assertRegex('Already defined position for pair A V at')

feaLib source debugging #2052

feaLib source debugging #2052

Conversation

simoncozens commented Aug 25, 2020

alerque commented Aug 25, 2020

simoncozens commented Aug 25, 2020

anthrotype left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

anthrotype Sep 9, 2020 • edited

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

simoncozens commented Sep 13, 2020

dscorbett commented Sep 13, 2020

Richard57 commented Sep 15, 2020

simoncozens commented Sep 15, 2020

Richard57 commented Sep 15, 2020

simoncozens commented Sep 15, 2020

Richard57 commented Sep 15, 2020

simoncozens commented Sep 15, 2020

anthrotype Sep 9, 2020 •

edited