New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feaLib source debugging #2052
feaLib source debugging #2052
Conversation
That's what draft mode PR's are for ;-) Conversion link under the Reviewers in the sidebar. |
In theory... in practice, draft mode means "everybody else ignore this until I'm done." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Simon, this is seriously useful.
Can you address the review feedback and add some tests?
Lib/fontTools/feaLib/builder.py
Outdated
@@ -638,6 +642,12 @@ def buildGDEFMarkGlyphSetsDef_(self): | |||
sets.append(glyphs) | |||
return otl.buildMarkGlyphSetsDef(sets, self.glyphMap) | |||
|
|||
def buildDebg(self): | |||
if not "Debg" in self.font: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if not "Debg" in self.font: | |
if "Debg" not in self.font: |
Lib/fontTools/feaLib/builder.py
Outdated
@@ -647,6 +657,11 @@ def buildLookups_(self, tag): | |||
if lookup.table != tag: | |||
continue | |||
lookup.lookup_index = len(lookups) | |||
self.lookup_locations[tag][len(lookups)] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this looks more readable
self.lookup_locations[tag][len(lookups)] = [ | |
self.lookup_locations[tag][lookup.lookup_index] = [ |
Lib/fontTools/feaLib/builder.py
Outdated
@@ -685,6 +700,9 @@ def makeTable(self, tag): | |||
if len(lookup_indices) == 0 and not size_feature: | |||
continue | |||
|
|||
for ix in lookup_indices: | |||
self.lookup_locations[tag][ix][2] = key |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm what's [2]
? Shall we turn this one into a namedtuple as well?
Lib/fontTools/feaLib/builder.py
Outdated
if not "Debg" in self.font: | ||
self.font["Debg"] = newTable("Debg") | ||
self.font["Debg"].data = {} | ||
self.font["Debg"].data["io.github.fonttools.feaLib"] = self.lookup_locations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in ufoLib we use "com.github.fonttools.ufoLib" as the fileCreator:
fonttools/Lib/fontTools/ufoLib/__init__.py
Line 887 in 097b255
fileCreator="com.github.fonttools.ufoLib", |
class table_D__e_b_g(DefaultTable.DefaultTable): | ||
|
||
def decompile(self, data, ttFont): | ||
self.data = json.loads(data) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please use four spaces? I know we aren't consistent elsewhere but for new modules I'd prefer to ditch the tabs
self.data = json.loads(data) | ||
|
||
def compile(self, ttFont): | ||
return str.encode(json.dumps(self.data)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why str.encode
instead of the more idomatic s.encode()
method on a str instrance?
Also, I think we want to explicitly encode as UTF-8, I'm not sure what the default is these days but explicit is better than implicit.
return str.encode(json.dumps(self.data)) | ||
|
||
def toXML(self, writer, ttFont): | ||
writer.writecdata(self.compile(ttFont).decode()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the self.compile and the extra encode/decode step? Can't you just do writer.writecdata(json.dumps(self.data))
?
for i in range(len(value)): | ||
item = value[i] | ||
if str(i) in s: | ||
thisLu = s[str(i)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thisLu
reads a bit cryptic to me, plain lookup
is just fine
def table(self): | ||
for l in self.Lookup: | ||
for st in l.SubTable: | ||
if isinstance(st, SingleSubst): return "GSUB" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
how about you check if the subtable class name ends with "Subst" or "Pos" instead?
@@ -1187,6 +1187,52 @@ def preWrite(self, font): | |||
} | |||
|
|||
|
|||
class LookupList(BaseTable): | |||
def table(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
given the method name is a noun instead of a verb, I guess you could make this a @property
getter
…ound-tripping without writing the file out.
Lib/fontTools/feaLib/__main__.py
Outdated
@@ -58,7 +64,8 @@ def main(args=None): | |||
|
|||
font = TTFont(options.input_font) | |||
try: | |||
addOpenTypeFeatures(font, options.input_fea, tables=options.tables) | |||
addOpenTypeFeatures(font, options.input_fea, tables=options.tables, | |||
debug=options.debug) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
minor: this indentation style is not black-compliant
Lib/fontTools/feaLib/builder.py
Outdated
@@ -2,6 +2,7 @@ | |||
from fontTools.misc import sstruct | |||
from fontTools.misc.textTools import binary2num, safeEval | |||
from fontTools.feaLib.error import FeatureLibError | |||
from fontTools.feaLib.lookupdebuginfo import LookupDebugInfo, LOOKUP_DEBUG_INFO_KEY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
all-lowercase for multi-word names are unusual in fonttools (or python in general); I suggest you use camelCase like most of the other modules in here
Lib/fontTools/feaLib/builder.py
Outdated
@@ -685,6 +704,9 @@ def makeTable(self, tag): | |||
if len(lookup_indices) == 0 and not size_feature: | |||
continue | |||
|
|||
for ix in lookup_indices: | |||
self.lookup_locations[tag][str(ix)] = self.lookup_locations[tag][str(ix)]._replace(feature = key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.lookup_locations[tag][str(ix)] = self.lookup_locations[tag][str(ix)]._replace(feature = key) | |
self.lookup_locations[tag][str(ix)] = self.lookup_locations[tag][str(ix)]._replace(feature=key) |
def table(self): | ||
for l in self.Lookup: | ||
for st in l.SubTable: | ||
if type(st).__name__.endswith("Subst"): return "GSUB" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry to always nag about style, but can you please stick to indenting all if
blocks over multiple lines, no matther how short, thanks :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to do this, but this would all be so much easier if we could run black on the codebase. If we're going to insist on a code style, let's make the repo reflect that, and then running black on any changes afterwards will produce manageable diffs.
After this is in, I'm going to put in a set of PRs reformatting the parts of the code that I plan to work on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've been hesitant reformatting all codebase with black so far, but now I convinced myself that it's worth doing it once and for all so we don't waste more time on these styling debates. But this only makes sense if at the same time we enforce it for all subsequent commits (e.g. black --check --diff
in CI before running test suite).
Let's move the "blacken everything" to its own issue/PR.
Tests/feaLib/builder_test.py
Outdated
path = self.temp_path(suffix=".ttx") | ||
font.saveXML(path, tables=['head', 'name', 'BASE', 'GDEF', 'GSUB', | ||
'GPOS', 'OS/2', 'hhea', 'vhea']) | ||
actual = self.read_ttx(path) | ||
expected = self.read_ttx(expected_ttx) | ||
if replace: | ||
for i in range(len(expected)): | ||
for k,v in replace.items(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for k,v in replace.items(): | |
for k, v in replace.items(): |
Tests/feaLib/builder_test.py
Outdated
if replace: | ||
for i in range(len(expected)): | ||
for k,v in replace.items(): | ||
expected[i] = expected[i].replace(k,v) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
expected[i] = expected[i].replace(k,v) | |
expected[i] = expected[i].replace(k, v) |
Tests/feaLib/builder_test.py
Outdated
debugttx = self.getpath("%s-debug.ttx" % name) | ||
if os.path.exists(debugttx): | ||
addOpenTypeFeatures(font, feapath, | ||
debug=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
plaese remove indent and make it a single line with the previous one
Tests/feaLib/builder_test.py
Outdated
if os.path.exists(debugttx): | ||
addOpenTypeFeatures(font, feapath, | ||
debug=True) | ||
self.expect_ttx(font, debugttx, replace = { "__PATH__": feapath }) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self.expect_ttx(font, debugttx, replace = { "__PATH__": feapath }) | |
self.expect_ttx(font, debugttx, replace = {"__PATH__": feapath}) |
Tests/feaLib/builder_test.py
Outdated
@@ -618,7 +628,6 @@ def test_pairPos_enumRuleOverridenBySinglePair_DEBUG(self): | |||
"} test;") | |||
captor.assertRegex('Already defined position for pair A V at') | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't remove this line here. between definitions at module-level there needs to be two line breaks, as per pep8 and black
This is now as good as it needs to be technically (IMO), but I am still not sure it should be done! The positive is some very helpful debugging information, which can also be used by other applications (again, I'm already doing this); the cost is that it introduces a new[1] non-standard and undocumented[2] OT table and creates a complicated special-case in the serialization of lookups. I'd like to make sure that people are OK with that cost, so would like opinions from others before merging. @justvanrossum, @dscorbett, @brawer?
|
I think it’s okay to introduce a non-standard table, as long as it is opt-in and its tag is neither all uppercase nor all lowercase. The cost of an undocumented table is especially low: if there is no documentation, there are no backwards compatibility constraints. As for the special-case cost for developers, I don’t think it’s too bad, because it is so very useful, but perhaps it could be less of a special case while still being as useful. What if instead of having special code in
It is unfortunate that FontForge has its own debug table: 'PfEd'. It might have some good ideas for future extensions to 'Debg'. |
I don't like the storage of the 'feature identification' attribute for lookups. If I understand correctly, the intention is that for each lookup there will be stored a triple of (source location, lookup name, feature identification). Feature is primarily a triple of {script, language, feature) - I presume the last is the 4-letter tag. When a font supports multiple languages with distinctive rendering, most lookups will be used for most of the languages. Looking at the code, it seems that the (script, language, feature) combination stored will be the last one. One way round this is to have different lookups for different combinations of script, language and feature. At the source code level, this would be bad to inconvenient, depending on how it was implemented. (Commonality could be maintained by use of include files, but even so.) However it was implemented, it would require debug data to be stored for each new lookup, with at the very best extra maps from lookup to offset of the lookup definition. A better solution would be to store a list of 3-element feature identifications. This would require less debug storage than multiplying lookups, and would avoid increasing complexity in the compilation or encoding of a font. However, I'm not sure why one needs to store this feature identification in the debug table. The information is already stored in the GSUB and GPOS tables. Is it just to speed up access? The lookup name and source location are, however, useful. |
I'm not sure why you don't like the storage, or what the problem is that you're trying to get around, so my responses below might not be addressing the problem you're talking about. Can you explain what it is that seems wrong about this?
Such a list would be indexed by lookup ID, right? Isn't that what this patch does - links lookup IDs to 3-element feature identifiers?
It certainly wouldn't be correct for debugging annotations to change the way that lookups are laid out, but even if it was, in a debugging build, I don't think that how much extra storage the debugging info takes up should be a consideration. Just strip it out if you want an optimized file. |
I'll give a real example not crafted by me. In the DejaVu Sans font (book, Version 2.30), substitution lookup 2 occurs in the feature ccmp in the Latin script for the default language and the languages ISM, KSM, LSM, MOL, NSM, ROM, SKS and SSM. I think that with the current code, if one recompiled the font (preserving functional equivalence), as is legal, that will end up as being recorded as saying that GSUB lookup 2 is used by the combination ["Latn", "SSM ", "ccmp"], with no mention of the other Saami languages, let alone the default. I don't see any use in recording that in the debug table. (I'm writing arrays for fixed lists of fields.) In full, the record would be something like. That it pertains to lookup 2 is recorded by its being the third element in the array of 'lookup locations'. What I'm suggesting as a way of recording the lookup location in a programmer-friendly way would be something like:
In a more limited fashion, substitution lookup 0 occurs in feature locl but only for the Romanian and Moldavian languages, in the Latin script. Or have I badly misread the code, and you are doing this already? Storing the data this way is no more excessive than any other way of doing it. Releasing code with debugging information helps others investigate any problems for one. As there are no trade secrets in my fonts, my only objection to documenting them is the effort it takes. I'd rather release a font with the debugging information in it. |
I'm not sure that what you're saying is correct. We can distinguish between "preamble" lookups and "feature" lookups in AFDFO syntax:
Preamble lookups don't get a feature identification entry in the debug tuple, but feature lookups do: so lookups 1 and 2 will have such entries, 0 will not. And 0 will be the lookup that is shared, potentially between many features. In other words, the debug info stores information about where the lookup is defined, not where it is "used". |
OK. The way I allocate lookups to features, everything would be a preamble lookup. Moreover if you want to know what features a lookup is in, you have to use the GSUB/GPOS tables. |
Well, this PR is specifically a feaLib debugger! Still, you can use tools based on it if you persuade your compiler to place relevant values in the right places in the Debg table. If they’re all shared lookups, then leave the third member of the tuple empty. |
This PR is not entirely serious. It's a terrible hack and probably breaks all kinds of abstraction boundaries. But I am using it in my work and it's extremely helpful, so maybe there's a germ of an idea in here worth discussing.
Debg
table, a reverse-DNS namespaced JSON dictionary for holding debugging metadata.Debg
dictionary with the original lookup name (if any), feature name / script / language combination (if any), and original source filename and line location.ttx
output for a lookup with the information from theDebg
table.The end result is ttx files that look like this: