I trained a neural network to kern a font (mostly)

TLDR: I basically have autokerning working. I consider the concept proven. It needs a little bit more polishing, but I don't have time right now. Will come back to it later.

Ever since watching the video of the panel discussion at Typolabs, I have been intrigued by the idea of using a neural network to kern fonts. As Just made clear, as the number of expected glyphs in a font grows, the number of kerning pairs grows quadratically and, even with kern classes, this isn't a job that human beings should still be doing. At the very least it's something we should be trying to get automated.

I've had various philosophical discussions about whether we should be solving spacing or kerning or both, but the way I approach this kind of task is to automate the jobs we're already doing. While of course we integrate spacing and kerning into the design, the usual rule is to space first and kern later, so I wanted to attack this part of the job. We have autospacers; can we next have an autokerner? (Of course we can - Igino Marini has one. But I think we should all get to have one.)

So for the past few months I have been reading all the books about machine learning and deep neural networks, and fiddling around with various ideas to make autokerning happen.

Approach

A neural network solution has two components: a trainer, which learns from available data, and a predictor, which gives you output from data it hasn't seen before. My original plan was to semi-automate the process: that is, you kern half the font the way you want it kerned, and it learns from that and kerns the other half for you. But neural networks learn best on lots and lots of data, so in order to improve my results I ended up throwing all the fonts I could find (that I considered well-designed and well-kerned) at it. Now I have something which does a reasonable job at kerning a font it has not seen before.

So in this case, the training component takes a directory full of fonts together with, for each font, a list of kern pairs generated by kerndump. It then extracts, for each glyph that it considers "safe", a set of 100 samples of the left and right sidebearings, from the origin to the ascender. Here is what I mean by samples of sidebearings (if you've studied how the HT Autospacer works, you'll be familiar with the concept):

Let's call them the "right contour" and the "left contour" because that's what they represent. The right contour of this /a is an array of numbers [576,576,576,250,142,81,81,81,81,...].

The contours are then divided by the width of the font's /m in units, so that different fonts can be compared on the same basis. These contours are extracted for all the safe glyphs; a "safe" glyph is one where I can be reasonably sure that the designer will have checked its kerning against other safe glyphs: upper- and lower-case basic Latin letters, numbers, comma, colon, period.
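
Here is a small self-contained sketch of the sampling and scaling. It works on a binary bitmap of a glyph rather than on real outline data, and the names are mine for illustration, not the actual atokern code:

    import numpy as np

    SAMPLES = 100

    def contours(bitmap, m_width):
        """bitmap: 2-D array, rows from origin to ascender, 1 where there is ink.
        Returns (left_contour, right_contour), each scaled by the /m width."""
        rows = np.linspace(0, bitmap.shape[0] - 1, SAMPLES).astype(int)
        width = bitmap.shape[1]
        left, right = [], []
        for r in rows:
            ink = np.flatnonzero(bitmap[r])
            if len(ink) == 0:                      # no ink at this height: fully open
                left.append(width)
                right.append(width)
            else:
                left.append(ink[0])                # gap from the left edge to the first ink
                right.append(width - 1 - ink[-1])  # gap from the last ink to the right edge
        # Divide by the /m width so that different fonts are comparable
        return np.array(left) / m_width, np.array(right) / m_width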

Now we feed the neural network a bunch of sets of contours. For each kern pair XY, we feed it the right contour from X and the left contour from Y. We also give as input to the network the right contour of /n, the left contour of /o, the right contour of /H and the left contour of /O. This is for two reasons: first, because you can't "correctly kern" a pair in isolation without looking at the surrounding context of a word; and second, so that the network learns how the different contours found in different fonts and different styles affect the kerning process.

Each of these contours is passed to a one-dimensional convolutional layer, which essentially extracts its "shapeness". Two-dimensional convolutional layers are used in image recognition to extract edges and other salient features from a noisy photograph; a one-dimensional layer does the same sort of thing with a one-dimensional object, so I'm using it to work out what the contour "means" as a shape.

These six convolutional layers are then thrown into one big pot and boiled down through various-sized layers to generate an output. (It's not very easy to describe how neural networks actually work. They're basically a lot of numbers that get multiplied and added together, but it's pretty much impossible to say what each set of numbers contributes to the process.)
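
In rough outline, the architecture is something like the following Keras sketch. This is an illustrative reconstruction rather than the actual atokern code: the layer sizes, filter counts and variable names are placeholders.

    from keras import layers, models

    SAMPLES = 100      # points per contour
    NUM_CLASSES = 26   # one output class per "kern bucket" (see below)

    def contour_branch():
        """One 1-D convolutional branch, reading a single contour."""
        inp = layers.Input(shape=(SAMPLES, 1))
        x = layers.Conv1D(filters=16, kernel_size=5, activation="relu")(inp)
        x = layers.MaxPooling1D(pool_size=2)(x)
        x = layers.Flatten()(x)
        return inp, x

    # Six contours: right of X, left of Y, plus the /n /o /H /O context contours
    branches = [contour_branch() for _ in range(6)]
    inputs = [inp for inp, _ in branches]
    merged = layers.concatenate([out for _, out in branches])

    # "One big pot", boiled down through dense layers to a classification output
    x = layers.Dense(128, activation="relu")(merged)
    x = layers.Dense(64, activation="relu")(x)
    output = layers.Dense(NUM_CLASSES, activation="softmax")(x)

    model = models.Model(inputs=inputs, outputs=output)
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])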

I initially planned to have the system output a kern value, making this a regression problem. However, regression problems are really hard for neural networks to do. The kern value under regression could be any float from minus infinity to plus infinity, whereas real kern values don't work like that. NNs are much better at classification problems, so next I looked at a three-way classification: given this pair (and the background context /n/o/H/O), does it need to be kerned tighter, looser, or left alone?

Of course it's quite useful to know that a pair needs kerning, but what we really want to know is by how much. So once I had got the three-way classification working to a good degree of accuracy (about 99%), I got the network to classify a pair into one of 26 "kern buckets". As before, everything is scaled to the /m width to allow comparison between different fonts, so one bucket might be, for example, "negative kern somewhere between 0.1875 and 0.125 of the /m width" (giving you a range of, say, -150 to -100 units with a fairly condensed font). Other buckets, particularly around the zero mark, are narrower: "between -5 and 0 units", "between 5 and 10 units" and so on.
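
To make the bucketing concrete, here is a small sketch. The bucket edges below are invented for illustration; the real model uses 26 buckets with different boundaries, narrower around zero:

    import numpy as np

    # Bucket boundaries expressed as fractions of the /m width
    BUCKET_EDGES = np.array([
        -0.50, -0.25, -0.1875, -0.125, -0.0625, -0.03, -0.01,
         0.0,
         0.01, 0.03, 0.0625, 0.125, 0.1875, 0.25, 0.50,
    ])

    def kern_bucket(kern_units, m_width):
        """Map a kern value in font units to a bucket index."""
        scaled = kern_units / m_width
        return int(np.digitize(scaled, BUCKET_EDGES))

    # A -120 unit kern in a font whose /m is 800 units wide scales to -0.15,
    # which lands in the "-0.1875 to -0.125" bucket.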

Results

I said I had got to 99% accuracy in a three-way classification. To be honest I am no longer sure what degree of accuracy I am getting. It looks good, but it needs more work.

The reason I'm not sure is this: most pairs in most fonts aren't kerned, or have zero kern. There's a subtle but important difference between "aren't kerned" and "have zero kern". If the designer explicitly put a zero in the kern table for that pair, then great, they're happy with how it looks. But if they didn't put a zero in the kern table, there's no kern - which is effectively a kern value of zero - except that this time, we don't know whether the designer looked at the pair with no kern and decided no action was needed, or whether the designer didn't look at it at all and it might actually be ugly and we don't want to use it for learning. (Put your hands up if you kerned /t/Q recently.)

So for a while I didn't want to treat an empty kern entry as useful data, and I only fed the network explicit entries from the kerning tables. There aren't many explicit zero-valued kerns, so of course the network learned to apply kerning to nearly everything, because that is what it saw. But a font should have lots of zero-valued kern pairs, so I had to put the "silent" zeros back in - and because there should be so many of them, I couldn't artificially balance the size of each bucket. The network should learn that most kern pairs are zero.

And of course this makes it pretty hard to assess accuracy, because the network learns that if it guesses zero for everything then it gets the right answer 90% of the time, so it doesn't bother learning anything else. So I had to write my own loss function which penalized the network very heavily every time it guessed a zero when there should have been a kern. At this point accuracy started going down, which was fine, because I really wanted the network to be more interested in getting things wrong in interesting ways than getting them right in stupid ways. (This is also a good principle for life in general.)
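
The loss function was along these lines - this is a sketch of the idea rather than the exact code, and the penalty factor and bucket index are illustrative:

    import keras.backend as K

    ZERO_BUCKET = 13           # index of the "zero kern" class (illustrative)
    FALSE_ZERO_PENALTY = 10.0  # how much worse a wrongly-guessed zero is

    def kern_loss(y_true, y_pred):
        """Cross-entropy, plus a penalty for predicting zero on pairs that are kerned."""
        base = K.categorical_crossentropy(y_true, y_pred)
        predicted_zero = y_pred[:, ZERO_BUCKET]       # probability mass placed on the zero bucket
        truly_nonzero = 1.0 - y_true[:, ZERO_BUCKET]  # 1 if the true class is a real kern
        penalty = FALSE_ZERO_PENALTY * truly_nonzero * predicted_zero
        return base + penalty

    # model.compile(optimizer="adam", loss=kern_loss, metrics=["accuracy"])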

Here are some of the results of running the predictor on a font that the network had not seen before, my own Coolangatta Thin.



And looking at some of the values it outputs gives me hope that the network really has understood how certain shapes fit together. For instance, it has realised that similar shapes should have similar kern values:

T d -51 - -45 p=23 %
T e -40 - -35 p=21 %
T g -45 - -40 p=20 %
T o -40 - -35 p=21 %

(That's the kern bucket range, followed by the probability: how sure the network is that the pair belongs to that category.) Rounds kern against a /T, but it is pretty clear that straights don't:

T B 0 p=99 %
T D 0 p=96 %
T E 0 p=94 %
T F 0 p=94 %

Similarly it knows that there is a lot of counterspace between the right of /T and the left of /A, and that these shapes fit together, but that even though there is a lot of counterspace between /A and /A, these shapes cannot fit together. To me, I think this proves that it "gets" the idea of kerning. And that makes me happy.

Now what?

I think I have proved the concept that this is viable. To improve it, I would need to come back to this and train the network again on a larger variety of well-kerned fonts (currently I'm using a directory of 110 fonts; I am sure there are more) and for a longer time, on a more powerful computer, to squeeze out even more accuracy. If anyone wants to poke at the code, it's available on GitHub: https://github.com/simoncozens/atokern and I would be very happy to talk you through it.

I'm open to other suggestions of how to develop this. But not yet. I have been obsessing over this for the past few weeks and my family are sick of it, so I need to take a break. Will come back to it later.

Comments

  • Nice work!

    I think the concept is undoubtedly viable – but the question is whether it can be simultaneously useful and legal/ethical.

    Since this ultimately works by processing the data of other designers, surely it constitutes data/IP theft? I think unless you're training exclusively on libre fonts, and generating new kern data exclusively for libre fonts, it probably isn't legal – and if you can't use it for commercial projects it isn't exceptionally useful.

    Side note: I played around a while ago, trying to train a neural network to generate glyph paths — just for my own amusement — but the initial results were so awful I lost enthusiasm. Here's probably the best /A that my approach produced :neutral:


  • Ben Blom

    Interesting approach.

    But if they didn’t put a zero in the kern table, there’s no kern - which is effectively a kern value of zero - except that this time, we don’t know whether the designer looked at the pair with no kern and decided no action was needed, or whether the designer didn’t look at it at all and it might actually be ugly and we don’t want to use it for learning.

    In a well-spaced and well-kerned font, you can expect that the designer is happy with the sum of the spacing and kerning between pairs. For this reason, you may believe that the spacing between common unkerned pairs in such a font represents spacing the designer is happy with (just like you may believe that the designer is happy with the non-zero kerning in such a font).

    The weakness of your approach is that it seems to ignore the spacing information which is in a font. Kerning information is irrelevant without the accompanying spacing information. Spacing and kerning are “communicating vessels”: the spacing between a pair without a kern is equal to a little more spacing between that pair with a small negative kern, or to a little less spacing between that pair with a small positive kern.

  • If the spacing is done right, most kerning will come from the punctuation. How do you handle punctuation?

  • Lewis: I’m not a lawyer, but I am not worried about the ethical dimensions of this. How does a human learn to kern a font? Initially and unconsciously, by looking at lots of well-kerned fonts and learning from them the relationships between shapes. Every time I kern a sans /A/V, I am “processing” my memories of Helvetica or Univers or whatever. That’s just how the discipline works.

    Ben: I’m not sure why you say that this ignores the spacing. The spacing data is there in the sidebearing contours - the contours represent how far the shape is away from the edge. A contour of [50,75,100,100,...] and one of [25,50,75,75,..] both represent curves into the center but one has tighter spacing than the other. So I think the network is getting this information about the glyph shapes and their spacing together.
  • Adam Jagosz
    edited November 2017
    Simon, I don't think computers and humans have the same rights. I'm not a lawyer either, but that's what my gut says. I mean, humans have way more rights. As a human you can look at people legally, but as a computer, you'd look way more suspicious (e.g. as a smartphone taking people's photos).
  • Memo to self: I was thinking about how accurate this needs to be to be trustworthy. It’s never going to be 100% accurate but where do the failures fall? If it’s 99% accurate over 160,000 kern pairs that’s still 1,600 mistakes. And the point is to stop a human having to evaluate each pair by hand. If there’s a chance that the network gets, say, “To” badly wrong then nobody’s going to use it.

    When I come back to this, I should weight the samples by the frequency of each letter pair in a text corpus, so that getting “To” wrong is penalised much more strongly than “w7”. That should enable us to have confidence in the pairs that really matter.
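
    Something like this sketch, where the bigram counts are invented stand-ins for numbers from a real text corpus:

        import math

        BIGRAM_COUNTS = {("T", "o"): 120000, ("T", "h"): 250000, ("w", "7"): 3}
        MIN_WEIGHT = 1.0

        def sample_weight(left_glyph, right_glyph):
            """Weight a training pair by how often it occurs in running text."""
            count = BIGRAM_COUNTS.get((left_glyph, right_glyph), 0)
            # Log-scale, so common pairs dominate without drowning out the rest
            return MIN_WEIGHT + math.log1p(count)

        # These weights could be passed to Keras via model.fit(..., sample_weight=...)
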
  • John Hudson
    Unless a license explicitly prohibits analysing the kerning data, where's the infringement? Collecting and analysing kerning data doesn't even necessarily involve decompiling the font, since it can be collected from text output. Kern values, as implemented in the kern table and typical GPOS pair adjustment lookups, are data, which typically has minimal copyright protection. In the US, for instance, I believe it is still the case that data itself is not protected by copyright, only particular selections or arrangements of data. So a kern table might be protected as part of the font software, but the kern values in that table would not be. [Usual caveat: I am not a lawyer.]
  • This is really neat, Simon, and I hope you succeed. Keep at it and keep us informed!
  • Nick Shinn
    A better method would be to identify the most “similar to” font (by algorithm)—with the proviso of your subjective filter “well-designed and well kerned”—and just copy its spacing and kerning.

    Alternatively, you might identify a suite of fonts in a specific genre, and apply your AI to that.

    This will make the results slightly less generic.

    The way it is now will produce a very bland average, that will not be specifically tailored to the font in hand.

    **

    The problem with neural net AI is the problem of all algorithmic “creation”: it is mired in the past and generics. However, as a new technology, it can create newness, but not by imitating the effects of existing technologies.
  • Nick, aren't you mixing up creativity and drudgework? No, the algorithm will not be creative – hopefully! But if a designer wants "creative" kerning (if such a thing is possible; surprise us!), s/he won't be using this tool to begin with.

    I imagine robots in car factories aren't meant to be creative either.
  • Ben Blom
    edited November 2017
    Ben: I’m not sure why you say that this ignores the spacing. The spacing data is there in the sidebearing contours

    Simon, I’m sorry. I didn’t read carefully enough.

    If it’s 99% accurate over 160,000 kern pairs that’s still 1,600 mistakes.

    The question is: How bad are those mistakes? If those mistakes are nothing more than a slight deviation from what the kerning should be, then those mistakes are not a real problem. (Perhaps many of those mistakes only concern uncommon pairs.) 

  • Nick Shinn
    edited November 2017
    Theunis, I wasn’t addressing drudgery, but the quality of a type design. 

    By artificially removing the type designer’s exercise of taste in determining spacing, design veers towards impoverished me-too sequel, proscribing emergent qualities. That’s how algorithms create bubbles to trap people in cultural stasis.

    Artificial means fake, dontcha know. What’s intelligent about that?

    On the subject of drudgery, I suspect many type designers might actually enjoy putting on some favorite tunes and playing with the lovely glyphs they’ve drawn, getting to know the relationships between them all a little more profoundly, exercising one’s judgement, thereby keeping one’s good taste in shape (fitness!) and up to date.

    As well as drudgery, kerning might also be considered meaningful human activity, after Morris, or Gandhi’s spinning. Type drawing is craft.

    It’s important to develop new technology in a humanistic manner. For algorithms, that means treating them as tools to enhance the user’s abilities, not bypass and atrophy them. Therefore, the interface is critical. The worst situation is offering no control of the settings (preferences). Even when those are editable, so often the user just sticks with the default. Worse, many agencies and design firms change the default kerning settings in InDesign etc. to “Optical”, a dodgy proposition which can go spectacularly wrong.

    But all this is theory; let’s see some typography.



  • Nick, 

    I consider automation of the type design process an assistive tool for the designer. In old Fontographer, one had to position each diacritical mark over each base letter separately, and to kern each letter against each. But then we got anchors and class kerning.

    Which doesn't mean that the designer shouldn't be free to adjust the exceptions in class-derived kerning or anchor-derived mark positioning. Basically, set rules, have software do the repetitive task of applying those rules and then fix exceptions. 

    If I draw a large set of basic glyphs and then create bold or condensed derivatives of a few dozen glyphs, I think that software should predict how to create bold or condensed derivations for the other hundreds. Or when I space and kern a smaller set of glyphs, software should predict the spacing and kerning for the rest.

    These predictions don't need to replace the designer. The designer would always need to be able to intervene and fix. But today, digital type design already works so that the designer makes some key decisions at the beginning, and then often implements these decisions to a multitude of other symbols — sometimes tweaking on the way. 

    Of course with machine learning, you wouldn't need to train the system just on fonts from other foundries. The designer would be able to train such a system only on her/his own data (previous fonts  or the current project). 
  • This tool sounds extremely useful and I am excited to watch its continued development.
  • I came back to this yesterday, threw another couple of hundred fonts at it to learn from, and gave it lots of time on some very expensive hardware. It claims to be up to 92% validation accuracy, which certainly sounds good. I've uploaded the latest kerning model and added some instructions: https://github.com/simoncozens/atokern/blob/master/README.md

    I would be interested to hear feedback on whether it's unusably off the mark in its predictions; off-the-mark but promising; reasonably close to accurate; or whatever.

  • T B 0 p=99 %
    T D 0 p=96 %
    T E 0 p=94 %
    T F 0 p=94 %
    Pardon my ignorance, but those pairs the script outputs, am I understanding correctly that the first is a kern value for the pair that the script thinks appropriate based on its training material (so 0 in the above), and the p value indicates the confidence in grouping this kern pair in the correct bracket?

    So... presumably you are curious about three things, the kern value accuracy, the correct bracket identification for glyph groups, and the confidence of that identification, yes?
  • Johannes Neumeier said:
    Pardon my ignorance, but those pairs the script outputs, am I understanding correctly that the first is a kern value for the pair that the script thinks appropriate based on its training material (so 0 in the above), and the p value indicates the confidence in grouping this kern pair in the correct bracket?
    Right, yes.
    So... presumably you are curious about three things, the kern value accuracy, the correct bracket identification for glyph groups, and the confidence of that identification, yes?
    In a way. We can do mathematical tests and find that it's 92% accurate, but that's not very meaningful. The only test that matters with a font is whether or not it looks right to a well-trained eye. I want to see what happens when you run this script on your fonts, and apply the kerning settings it suggests - is it helpful, is it credible, is it consistent, or does it produce rubbish?
  • Ben Blom
    Simon Cozens: I want to see what happens when you run this script on your fonts, and apply the kerning settings it suggests

    So the tool produces only kerning data. I am not sure if I understand the logic of this. The tool is trained by feeding it the spacing and kerning data from a corpus of well-spaced and well-kerned fonts. However, in the output the tool produces, only kerning data is provided. Does that make sense? Spacing and kerning are interrelated. Why would the tool refrain from producing spacing data, while it contains knowledge of both spacing and kerning?

    If only the kerning data is provided by the tool, to please those who still live in the paradigm of spacing and kerning as separate things that belong to a different phase of making a font (or to please those who consider spacing to be an intrinsic part of the design of glyphs, while happily using a tool to assist with the kerning), then I would suggest adding an option to switch between (1) providing only spacing data, (2) providing only kerning data, and (3) providing both spacing and kerning data.

    Perhaps the tool does produce spacing data. If so, I’m sorry for not reading carefully enough.

  • No, you’re right, it is based on the “space first and then kern” model. And it’s a legitimate question. I guess the answer is: because that’s how I trained the model.

    I imagine that in theory you could send in glyph pairs with zero sidebearings and it would output the correct spacing between the glyphs. I just haven’t done that.

    I agree that it is the space between a glyph pair that matters. I suppose there were two reasons why I didn’t go that way. One is that the way we store this information in a font file is via separate spacing and kerning tables. So you need those data separately. And since we have autospacers already, the other part of the puzzle is filling the kern tables, so that’s what I focused on.

    Feel free to train the model differently! :smile:
  • Without retraining the network, some quick tests seem to mostly point you towards "potential" kerning pairs. Aside from the p% value, I'd much rather know how confident the script is about the amount it suggests a pair should be kerned by - is it, based on the training data, a heavily scattered or homogeneous pair? I do see the end goal of the p% for automating kerning classes, though: based on how often a character shares kerns with high confidence with other characters, those could be suggested as groups.

    I only ran the script on two fonts and manually went in to compare just a few of the kerning finds. Some observations:
    • Characters that I'd often consider to have mirrored kerning, like A V T O o, don't seem to be recognised as such (i.e. mirrored kern values when the same symmetrical glyph appears on either side). I can't see how the learning material would fail to reflect that too, but maybe other factors carry more weight in the output; e.g. kerns that are mirrored or nearly equal should probably be emphasised in evaluation, to skew the result towards symmetry where such symmetry is identifiable from the training material
    • Some kerns that I would consider common, and that are also non-negligible in value in my own manual kerning, seem to have been completely disregarded, e.g. F-o, A-T (first image)
    • Identifying classes seems to work reasonably well, meaning it finds similar shapes to receive the same kerning
    Examples:

    A-T kern missing completely (the serif?), while T-A does have an okayish value; A-V-A a tad looser than my manual:

    A-V-A-T-A all with kerns, but A-V-A and A-T-A disturbingly unsymmetrical:

    Example of a visually quite obvious, and also not uncommon, F-o kern that wasn't in the table at all:

    Good example of appropriate identification of kern classes, even though the actual kern value is small:

    A c -10 p= 83 %
    A d -10 p= 95 %
    A e -10 p= 87 %
    ...
    A g -10 p= 92 %
    A o -10 p= 95 %
    A q -10 p= 95 %

    Mixed results of kern class identification:

    F n -25 p= 99 %
    F p -25 p= 99 %
    ...
    F r -25 p= 81 % < Good: Less certain, but left stem identified correctly as same group with n and p
    ...
    F v -25 p= 55 %
    F w -25 p= 69 %
    F x -65 p= 61 % < How come different class?
    F y -25 p= 83 % < ~25% more certainty than for F-v and F-w?
    F z -20 p= 40 %

    Very interesting work! I'm curious how this develops.
  • Thanks for that - it was useful feedback. I've changed the training process so that it also trains against a mirrored dataset: each time it learns about a set (right-side-of-L, left-side-of-R, kernvalue), it also learns (mirrored-left-side-of-R, mirrored-right-side-of-L, kernvalue). This seems to improve accuracy by a few percent, as well as making the symmetrical kerns more regular.
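
    In sketch form (the names here are illustrative, not the actual code): since a left contour measures the gap from the left edge, the mirrored glyph's right contour is just the original left contour, so the augmentation only needs to swap the slots.

        def augment_with_mirror(samples):
            """samples: list of (right_contour_of_L, left_contour_of_R, kern_value) tuples."""
            augmented = list(samples)
            for right_of_l, left_of_r, kern in samples:
                # The mirrored pair re-uses the same contours in swapped roles
                augmented.append((left_of_r, right_of_l, kern))
            return augmented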

    The new model is too big for github so I will have to work out somewhere else to put it.

    Here's another fun experiment: an advantage (?) of automated kerning is that it can be moved into the layout layer, meaning that you can kern pairs between different fonts.


  • Thomas Phinney
    edited November 2017
    Well, that's an advantage of automated kerning *in the layout app*. It doesn't help if that automated kerning is being done in a font editor, as the fonts can't store cross-font kerning pairs. (And due to space concerns, one wouldn't want them to, either!)

    And for one-off cases, which font changes almost always are, the auto-kerning already in the layout apps is generally just fine.

    Not that I don't think this is cool. It definitely is, even if it isn’t “there” yet. The potential for such things is clear already.
  • Presumably a layout system would only apply these autokerns at the interfont moments, and continue to use the font's own kerning for same-font strings
  • Presumably a layout system would only apply these autokerns at the interfont moments, and continue to use the font's own kerning for same-font strings
    We should be so lucky ;)  
  • John Hudson
    Presumably a layout system would only apply these autokerns at the interfont moments...


    In a directionally agnostic manner, across run boundaries.  :#

  • Adam Twardoch
    edited December 2017
    Presumably a layout system would only apply these autokerns at the interfont moments...


    In a directionally agnostic manner, across run boundaries.  :#

    Yes! For example between Latin and Greek letters, where kerning is possible using Type 1, TrueType or AAT but isn't possible using OpenType. I.e. generally to fix 20-year-old misguidances and design flaws in next-generation layout systems.
  • jeremy tribby
    edited December 2017
    Very cool project! IANAL, but I think kerning metadata is (copyrightable) software, which might mean you could be violating the EULA of the fonts by running them through this? I dunno. Personally I think it's awesome, looking forward to digging through the code
  • Jeremy, if that's the case and this is trained on OFL font data, then any font spaced and kerned with that training set would have to be OFL too?
  • jeremy tribby
    edited December 2017
    That sounds correct -- other than reserved name stuff, OFL's only real requirement is that derivative work be OFL as well. If the tool were run on well-kerned OFL fonts and released under the OFL too, I can't imagine anyone taking exception to that. Unless people using the tool would then have to release their own kerning data under the OFL as well? Seems unlikely, but IDK.

    A non-trivial way to get around all legal stuff and include copyrighted fonts is to do this to printed matter using vision learning, because then I think Simon's "this is the same as a human studying a font" comparison is apt. Right now it's really just making averages after being fed kerning metadata (or as a lawyer might call it, reverse engineering font software). Humans don't look at the font software, they see a representation of it, so maybe you could make some sort of legal ceci n'est pas une font argument for vision learning.

    But I think that would also be overkill given your OFL suggestion ;)
  • I really hope this is not true, because most of what I learnt about OpenType layout (especially in my early years) was by looking into the OpenType layout tables of existing fonts, libre and non-libre.