Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feaLib source debugging #2052

Merged
merged 18 commits into from Sep 17, 2020
Merged

Conversation

simoncozens
Copy link
Collaborator

This PR is not entirely serious. It's a terrible hack and probably breaks all kinds of abstraction boundaries. But I am using it in my work and it's extremely helpful, so maybe there's a germ of an idea in here worth discussing.

  • Implement the non-standard Debg table, a reverse-DNS namespaced JSON dictionary for holding debugging metadata.
  • For each lookup built by feaLib.builder, add an entry into the Debg dictionary with the original lookup name (if any), feature name / script / language combination (if any), and original source filename and line location.
  • Annotate the ttx output for a lookup with the information from the Debg table.

The end result is ttx files that look like this:

      <!-- ybmkmk: features.fea:84:1 in mkmk (arab/dflt) -->
      <Lookup index="39">
        <LookupType value="6"/>
        <LookupFlag value="2"/><!-- ignoreBaseGlyphs -->
        <!-- SubTableCount=1 -->

@alerque
Copy link
Contributor

alerque commented Aug 25, 2020

This PR is not entirely serious.

That's what draft mode PR's are for ;-) Conversion link under the Reviewers in the sidebar.

@simoncozens
Copy link
Collaborator Author

In theory... in practice, draft mode means "everybody else ignore this until I'm done."

Copy link
Member

@anthrotype anthrotype left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Simon, this is seriously useful.
Can you address the review feedback and add some tests?

@@ -638,6 +642,12 @@ def buildGDEFMarkGlyphSetsDef_(self):
sets.append(glyphs)
return otl.buildMarkGlyphSetsDef(sets, self.glyphMap)

def buildDebg(self):
if not "Debg" in self.font:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if not "Debg" in self.font:
if "Debg" not in self.font:

@@ -647,6 +657,11 @@ def buildLookups_(self, tag):
if lookup.table != tag:
continue
lookup.lookup_index = len(lookups)
self.lookup_locations[tag][len(lookups)] = [
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks more readable

Suggested change
self.lookup_locations[tag][len(lookups)] = [
self.lookup_locations[tag][lookup.lookup_index] = [

@@ -685,6 +700,9 @@ def makeTable(self, tag):
if len(lookup_indices) == 0 and not size_feature:
continue

for ix in lookup_indices:
self.lookup_locations[tag][ix][2] = key
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm what's [2]? Shall we turn this one into a namedtuple as well?

if not "Debg" in self.font:
self.font["Debg"] = newTable("Debg")
self.font["Debg"].data = {}
self.font["Debg"].data["io.github.fonttools.feaLib"] = self.lookup_locations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in ufoLib we use "com.github.fonttools.ufoLib" as the fileCreator:

fileCreator="com.github.fonttools.ufoLib",

class table_D__e_b_g(DefaultTable.DefaultTable):

def decompile(self, data, ttFont):
self.data = json.loads(data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you please use four spaces? I know we aren't consistent elsewhere but for new modules I'd prefer to ditch the tabs

self.data = json.loads(data)

def compile(self, ttFont):
return str.encode(json.dumps(self.data))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why str.encode instead of the more idomatic s.encode() method on a str instrance?
Also, I think we want to explicitly encode as UTF-8, I'm not sure what the default is these days but explicit is better than implicit.

return str.encode(json.dumps(self.data))

def toXML(self, writer, ttFont):
writer.writecdata(self.compile(ttFont).decode())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the self.compile and the extra encode/decode step? Can't you just do writer.writecdata(json.dumps(self.data))?

for i in range(len(value)):
item = value[i]
if str(i) in s:
thisLu = s[str(i)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thisLu reads a bit cryptic to me, plain lookup is just fine

def table(self):
for l in self.Lookup:
for st in l.SubTable:
if isinstance(st, SingleSubst): return "GSUB"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about you check if the subtable class name ends with "Subst" or "Pos" instead?

@@ -1187,6 +1187,52 @@ def preWrite(self, font):
}


class LookupList(BaseTable):
def table(self):
Copy link
Member

@anthrotype anthrotype Sep 9, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

given the method name is a noun instead of a verb, I guess you could make this a @property getter

@@ -58,7 +64,8 @@ def main(args=None):

font = TTFont(options.input_font)
try:
addOpenTypeFeatures(font, options.input_fea, tables=options.tables)
addOpenTypeFeatures(font, options.input_fea, tables=options.tables,
debug=options.debug)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor: this indentation style is not black-compliant

@@ -2,6 +2,7 @@
from fontTools.misc import sstruct
from fontTools.misc.textTools import binary2num, safeEval
from fontTools.feaLib.error import FeatureLibError
from fontTools.feaLib.lookupdebuginfo import LookupDebugInfo, LOOKUP_DEBUG_INFO_KEY
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

all-lowercase for multi-word names are unusual in fonttools (or python in general); I suggest you use camelCase like most of the other modules in here

@@ -685,6 +704,9 @@ def makeTable(self, tag):
if len(lookup_indices) == 0 and not size_feature:
continue

for ix in lookup_indices:
self.lookup_locations[tag][str(ix)] = self.lookup_locations[tag][str(ix)]._replace(feature = key)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.lookup_locations[tag][str(ix)] = self.lookup_locations[tag][str(ix)]._replace(feature = key)
self.lookup_locations[tag][str(ix)] = self.lookup_locations[tag][str(ix)]._replace(feature=key)

def table(self):
for l in self.Lookup:
for st in l.SubTable:
if type(st).__name__.endswith("Subst"): return "GSUB"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry to always nag about style, but can you please stick to indenting all if blocks over multiple lines, no matther how short, thanks :)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to do this, but this would all be so much easier if we could run black on the codebase. If we're going to insist on a code style, let's make the repo reflect that, and then running black on any changes afterwards will produce manageable diffs.

After this is in, I'm going to put in a set of PRs reformatting the parts of the code that I plan to work on.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been hesitant reformatting all codebase with black so far, but now I convinced myself that it's worth doing it once and for all so we don't waste more time on these styling debates. But this only makes sense if at the same time we enforce it for all subsequent commits (e.g. black --check --diff in CI before running test suite).
Let's move the "blacken everything" to its own issue/PR.

path = self.temp_path(suffix=".ttx")
font.saveXML(path, tables=['head', 'name', 'BASE', 'GDEF', 'GSUB',
'GPOS', 'OS/2', 'hhea', 'vhea'])
actual = self.read_ttx(path)
expected = self.read_ttx(expected_ttx)
if replace:
for i in range(len(expected)):
for k,v in replace.items():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for k,v in replace.items():
for k, v in replace.items():

if replace:
for i in range(len(expected)):
for k,v in replace.items():
expected[i] = expected[i].replace(k,v)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
expected[i] = expected[i].replace(k,v)
expected[i] = expected[i].replace(k, v)

debugttx = self.getpath("%s-debug.ttx" % name)
if os.path.exists(debugttx):
addOpenTypeFeatures(font, feapath,
debug=True)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

plaese remove indent and make it a single line with the previous one

if os.path.exists(debugttx):
addOpenTypeFeatures(font, feapath,
debug=True)
self.expect_ttx(font, debugttx, replace = { "__PATH__": feapath })
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.expect_ttx(font, debugttx, replace = { "__PATH__": feapath })
self.expect_ttx(font, debugttx, replace = {"__PATH__": feapath})

@@ -618,7 +628,6 @@ def test_pairPos_enumRuleOverridenBySinglePair_DEBUG(self):
"} test;")
captor.assertRegex('Already defined position for pair A V at')


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't remove this line here. between definitions at module-level there needs to be two line breaks, as per pep8 and black

@simoncozens
Copy link
Collaborator Author

This is now as good as it needs to be technically (IMO), but I am still not sure it should be done!

The positive is some very helpful debugging information, which can also be used by other applications (again, I'm already doing this); the cost is that it introduces a new[1] non-standard and undocumented[2] OT table and creates a complicated special-case in the serialization of lookups.

I'd like to make sure that people are OK with that cost, so would like opinions from others before merging. @justvanrossum, @dscorbett, @brawer?

  1. The reason I didn't use meta is that the intent for Debg is that it can be stripped to produce a production font, while meta should be retained.
  2. And undocumented, although I can easily fix that in CommonType.

@dscorbett
Copy link
Member

I think it’s okay to introduce a non-standard table, as long as it is opt-in and its tag is neither all uppercase nor all lowercase. The cost of an undocumented table is especially low: if there is no documentation, there are no backwards compatibility constraints.

As for the special-case cost for developers, I don’t think it’s too bad, because it is so very useful, but perhaps it could be less of a special case while still being as useful. What if instead of having special code in LookupList.toXML2 (and any other tables for which debugging information is added), the structure of the stored JSON reflects the nested table structure? The XML writer could know that for every structure it writes, it should check the 'Debg' table for nested keys based on the position of that structure (e.g. {"GSUB": {"LookupList": ["features.fea:4:5 in ccmp (DFLT/dflt)", ...]}}). That would make it no longer specific to feaLib. It would also mean that feaLib would have to format the lookup name, tag, script, and language into a string instead of an array, because a generic mechanism wouldn’t know what to do with an array. Maybe this is too complex to bother doing. As long as there’s no documentation, or the documentation makes few guarantees, it can be done later.

table_D__e_b_g.toXML can produce invalid XML, but that’s not a problem specific to this table.

It is unfortunate that LOOKUP_DEBUG_INFO_KEY should begin with “com.github.fonttools” instead of “io.github.fonttools”, since only the latter is a reversed domain name controlled by the fontTools organization. For consistency with ufoLib I guess it’s okay.

FontForge has its own debug table: 'PfEd'. It might have some good ideas for future extensions to 'Debg'.

@Richard57
Copy link

I don't like the storage of the 'feature identification' attribute for lookups. If I understand correctly, the intention is that for each lookup there will be stored a triple of (source location, lookup name, feature identification). Feature is primarily a triple of {script, language, feature) - I presume the last is the 4-letter tag.

When a font supports multiple languages with distinctive rendering, most lookups will be used for most of the languages. Looking at the code, it seems that the (script, language, feature) combination stored will be the last one. One way round this is to have different lookups for different combinations of script, language and feature. At the source code level, this would be bad to inconvenient, depending on how it was implemented. (Commonality could be maintained by use of include files, but even so.) However it was implemented, it would require debug data to be stored for each new lookup, with at the very best extra maps from lookup to offset of the lookup definition.

A better solution would be to store a list of 3-element feature identifications. This would require less debug storage than multiplying lookups, and would avoid increasing complexity in the compilation or encoding of a font.

However, I'm not sure why one needs to store this feature identification in the debug table. The information is already stored in the GSUB and GPOS tables. Is it just to speed up access? The lookup name and source location are, however, useful.

@simoncozens
Copy link
Collaborator Author

One way round this...

I'm not sure why you don't like the storage, or what the problem is that you're trying to get around, so my responses below might not be addressing the problem you're talking about. Can you explain what it is that seems wrong about this?

A better solution would be to store a list of 3-element feature identifications.

Such a list would be indexed by lookup ID, right? Isn't that what this patch does - links lookup IDs to 3-element feature identifiers?

This would require less debug storage than multiplying lookups, and would avoid increasing complexity in the compilation or encoding of a font.

It certainly wouldn't be correct for debugging annotations to change the way that lookups are laid out, but even if it was, in a debugging build, I don't think that how much extra storage the debugging info takes up should be a consideration. Just strip it out if you want an optimized file.

@Richard57
Copy link

I'll give a real example not crafted by me. In the DejaVu Sans font (book, Version 2.30), substitution lookup 2 occurs in the feature ccmp in the Latin script for the default language and the languages ISM, KSM, LSM, MOL, NSM, ROM, SKS and SSM. I think that with the current code, if one recompiled the font (preserving functional equivalence), as is legal, that will end up as being recorded as saying that GSUB lookup 2 is used by the combination ["Latn", "SSM ", "ccmp"], with no mention of the other Saami languages, let alone the default. I don't see any use in recording that in the debug table. (I'm writing arrays for fixed lists of fields.) In full, the record would be something like. That it pertains to lookup 2 is recorded by its being the third element in the array of 'lookup locations'.

What I'm suggesting as a way of recording the lookup location in a programmer-friendly way would be something like:

["location":"dejavusans.txt:7123:3", "name":"slkp2", "feature":[ ["Latn", "default", "ccmp"], ["Latn", "ISM ", "ccmp"], ["Latn", "KSM ", "ccmp"], ["Latn", "LSM ", "ccmp"], ["Latn", "MOL ", "ccmp"], ["Latn", "NSM ", "ccmp"], ["Latn", "ROM ", "ccmp"], ["Latn", "SKS ", "ccmp"], ["Latn", "SSM ", "ccmp"]]]

In a more limited fashion, substitution lookup 0 occurs in feature locl but only for the Romanian and Moldavian languages, in the Latin script.

Or have I badly misread the code, and you are doing this already?

Storing the data this way is no more excessive than any other way of doing it.

Releasing code with debugging information helps others investigate any problems for one. As there are no trade secrets in my fonts, my only objection to documenting them is the effort it takes. I'd rather release a font with the debugging information in it.

@simoncozens
Copy link
Collaborator Author

I'm not sure that what you're saying is correct. We can distinguish between "preamble" lookups and "feature" lookups in AFDFO syntax:

# Preamble lookups here

# lookup 0
lookup thing {
  ...
} thing;

feature ccmp {
    # Feature lookups here
    sub foo by bar; # lookup 1

    script arab
    language URD;
    sub baz by quux; # lookup 2

    script arab;
    language ARA;
    lookup thing; # reference to lookup 0
};

Preamble lookups don't get a feature identification entry in the debug tuple, but feature lookups do: so lookups 1 and 2 will have such entries, 0 will not. And 0 will be the lookup that is shared, potentially between many features.

In other words, the debug info stores information about where the lookup is defined, not where it is "used".

@Richard57
Copy link

OK. The way I allocate lookups to features, everything would be a preamble lookup. Moreover if you want to know what features a lookup is in, you have to use the GSUB/GPOS tables.

@simoncozens
Copy link
Collaborator Author

Well, this PR is specifically a feaLib debugger! Still, you can use tools based on it if you persuade your compiler to place relevant values in the right places in the Debg table. If they’re all shared lookups, then leave the third member of the tuple empty.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants