I trained a neural network to kern a font (mostly)

TLDR: I basically have autokerning working. I consider the concept proven. It needs a little bit more polishing, but I don't have time right now. Will come back to it later.

Ever since watching the video of the panel discussion at Typolabs, I have been intrigued by the idea of using a neural network to kern fonts. As Just made clear, as the number of glyphs expected in a font grows, the number of kerning pairs grows quadratically and, even with kern classes, this isn't a job that human beings should still be doing. At the very least, it is something we should be trying to automate.

I've had various philosophical discussions about whether we should be solving spacing or kerning or both, but the way I approach this kind of task is to automate the jobs we're already doing. While of course we integrate spacing and kerning into the design, the usual rule is to space first and kern later, so I wanted to attack this part of the job. We have autospacers; can we next have an autokerner? (Of course we can - Igino Marini has one. But I think we should all get to have one.)

So for the past few months I have been reading all the books about machine learning and deep neural networks, and fiddling around with various ideas to make autokerning happen.

Approach

A neural network solution has two components: a trainer, which learns from available data, and a predictor, which gives you output from data it hasn't seen before. My original plan was to semi-automate the process: you kern half the font the way you want it kerned, and the network learns from that and kerns the other half for you. But neural networks learn best from lots and lots of data, so to improve my results I ended up throwing every font I could find that I considered well designed and well kerned at it. Now I have something which does a reasonable job of kerning a font it has not seen before.

So in this case, the training component takes a directory full of fonts together with, for each font, a list of kern pairs generated by kerndump. It then extracts, for each glyph that it considers "safe", 100 samples of the left and right sidebearings at heights from the origin to the ascender. By samples of sidebearings I mean the horizontal distance between the edge of the glyph and its outline at each of those heights. (If you've studied how the HT Autospacer works, you'll be familiar with the concept.)

Let's call them the "right contour" and the "left contour" because that's what they represent. The right contour of this /a is an array of numbers [576,576,576,250,142,81,81,81,81,...].
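To make the sampling concrete, here is a minimal sketch. It assumes the outline has already been flattened into straight-line polygons, measures the left contour from x = 0 and the right contour from the advance width, and gives heights with no ink the full advance width. Those choices and the function names are illustrative assumptions, not what atokern necessarily does.

```python
# Sketch only: assumes outlines are already flattened into closed polygons,
# each a list of (x, y) points. Reading real font files is not shown.

def crossings(polygon, y):
    """x positions where the horizontal line at height y crosses the polygon."""
    xs = []
    for i in range(len(polygon)):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % len(polygon)]
        # Half-open test so that a shared vertex is not counted twice.
        if (y0 <= y < y1) or (y1 <= y < y0):
            t = (y - y0) / (y1 - y0)
            xs.append(x0 + t * (x1 - x0))
    return xs

def sidebearing_contours(polygons, advance, ascender, samples=100):
    """Sample the left and right contours at `samples` heights from 0 to the ascender.

    Left contour: distance from x = 0 to the leftmost ink at each height.
    Right contour: distance from the advance width to the rightmost ink.
    Heights with no ink get the full advance width (an assumption).
    """
    left, right = [], []
    for i in range(samples):
        y = ascender * i / (samples - 1)
        xs = [x for polygon in polygons for x in crossings(polygon, y)]
        if xs:
            left.append(min(xs))
            right.append(advance - max(xs))
        else:
            left.append(advance)
            right.append(advance)
    return left, right
```

For a rectangle drawn from x = 100 to 400 in a 500-unit advance, every inked height gives 100 on both sides; above the ink, the contours jump to the advance width.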

The contours are then divided by the width of the font's /m in units, so that different fonts can be compared on the same basis. These contours are extracted for all the safe glyphs; a "safe" glyph is one for which I can be reasonably sure the designer will have checked the kerning against the other safe glyphs: upper and lower case basic Latin letters, numbers, comma, colon and period.
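A small sketch of the normalisation and the "safe" filter described above. The glyph names (AGL-style names such as "zero" and "comma") and the helper names are assumptions for illustration.

```python
import string

# Hypothetical safe set following the description: basic Latin letters,
# digits, comma, colon and period, using AGL-style glyph names.
SAFE_GLYPHS = (
    set(string.ascii_uppercase)
    | set(string.ascii_lowercase)
    | {"zero", "one", "two", "three", "four",
       "five", "six", "seven", "eight", "nine"}
    | {"comma", "colon", "period"}
)

def normalise_contour(contour, m_width):
    """Express every sample as a fraction of the font's /m advance width."""
    return [value / m_width for value in contour]

def is_safe_pair(left_glyph, right_glyph):
    """Only pairs of safe glyphs are used for training."""
    return left_glyph in SAFE_GLYPHS and right_glyph in SAFE_GLYPHS
```

Dividing by /m means that a 576-unit sample in a font whose /m is 500 units wide becomes 1.152, which can be compared directly with samples from a wider or narrower font.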

Now we feed the neural network a bunch of sets of contours. For each kern pair XY, we feed it the right contour of X and the left contour of Y. We also give the network the right contour of /n, the left contour of /o, the right contour of /H and the left contour of /O, for two reasons: first, because you can't "correctly kern" a pair in isolation without looking at the surrounding context of a word, and second, so that the network learns how the different contours found in different fonts and different styles affect the kerning process.
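The six inputs for one pair can be sketched like this, assuming the contours have already been extracted and normalised into a dictionary keyed by glyph name (a data layout I've assumed for illustration).

```python
def pair_inputs(contours, left_glyph, right_glyph):
    """Build the six contours fed to the network for the pair left_glyph/right_glyph.

    `contours` maps glyph name -> {"left": [...], "right": [...]}. The last
    four entries are the /n /o /H /O context and are the same for every
    pair in a given font.
    """
    return [
        contours[left_glyph]["right"],   # right contour of X
        contours[right_glyph]["left"],   # left contour of Y
        contours["n"]["right"],
        contours["o"]["left"],
        contours["H"]["right"],
        contours["O"]["left"],
    ]
```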

Each of these six contours is passed to a one-dimensional convolutional layer, which essentially extracts its "shapeness". Two-dimensional convolutional layers are used in image recognition to extract edges and other salient features from a noisy photograph; a 1D layer does the same sort of thing with a one-dimensional object, so I'm using it to work out what the contour "means" as a shape.
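What a single 1D convolutional filter does to a contour can be shown in plain Python. This is a sketch: in a trained network the kernels are learned rather than hand-picked, and there are many of them per layer. The [1, -1] kernel here is hand-chosen to respond where the contour steps inwards, much as an edge filter does in an image.

```python
def conv1d(signal, kernel, bias=0.0):
    """'Valid' 1D convolution (cross-correlation, as deep-learning libraries
    implement it) followed by a ReLU activation."""
    k = len(kernel)
    out = []
    for i in range(len(signal) - k + 1):
        total = sum(signal[i + j] * kernel[j] for j in range(k)) + bias
        out.append(max(0.0, total))
    return out

# A contour that steps inwards partway up the glyph.
contour = [5, 5, 5, 2, 1, 1]
features = conv1d(contour, [1, -1])
```

The output is zero where the contour is flat and positive exactly where it moves inwards, which is the kind of local "shape" feature the layer passes on.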

These six convolutional layers are then thrown into one big pot and boiled down through various-sized layers to generate an output. (It's not very easy to describe how neural networks actually work. They're basically a lot of numbers that get multiplied and added together, but it's pretty much impossible to say what each set of numbers contributes to the process.)
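The "one big pot" step amounts to concatenating the outputs of the six convolutional branches and passing them through fully connected layers that end in one probability per kern bucket. Here is a toy forward pass with untrained random weights; the layer sizes (48, 32, 16, 26) are made up for illustration, and a real implementation would use a deep-learning framework.

```python
import math
import random

def dense(inputs, weights, biases, relu=True):
    """Fully connected layer: one weighted sum per output, optional ReLU."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(x * w for x, w in zip(inputs, row)) + bias
        outputs.append(max(0.0, z) if relu else z)
    return outputs

def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def random_layer(n_in, n_out, rng):
    """Untrained weights; in a real network these are what training learns."""
    weights = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(branch_outputs, layers):
    """Concatenate the six branch outputs into one vector, then boil it
    down through dense layers to one probability per kern bucket."""
    x = [value for branch in branch_outputs for value in branch]
    for weights, biases in layers[:-1]:
        x = dense(x, weights, biases)
    weights, biases = layers[-1]
    return softmax(dense(x, weights, biases, relu=False))

if __name__ == "__main__":
    rng = random.Random(0)
    branches = [[rng.random() for _ in range(8)] for _ in range(6)]  # six branches, 8 features each
    layers = [random_layer(48, 32, rng), random_layer(32, 16, rng), random_layer(16, 26, rng)]
    probs = forward(branches, layers)
    print(len(probs), round(sum(probs), 6))
```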

I initially planned to have the system output a kern value, making this a regression problem. However, regression problems are really hard for neural networks. Under regression the kern value could be any float from minus infinity to plus infinity, whereas real kern values don't work like that. NNs are much better at classification problems, so next I looked at a three-way classification: given this pair (and the background context /n/o/H/O), does it need to be kerned tighter, looser, or left alone?
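A sketch of how kern values could be turned into the three classes, assuming the label comes straight from the sign of the kern value. The optional `tolerance` dead zone is a hypothetical parameter, not something described in the post.

```python
def three_way_label(kern_value, tolerance=0):
    """Map a kern value in font units to 'tighter', 'looser' or 'alone'."""
    if kern_value < -tolerance:
        return "tighter"
    if kern_value > tolerance:
        return "looser"
    return "alone"
```

Instead of predicting an unbounded number, the network only has to pick one of three labels, which is the kind of problem classifiers handle well.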

Of course it's quite useful to know that a pair needs kerning, but what we really want to know is by how much. So once I had got the three-way classification working to a good degree of accuracy (about 99%), I got the network to classify each pair into one of 26 "kern buckets". As before, everything is scaled to the /m width to allow comparison between different fonts, so one bucket might be, for example, "negative kern of between 0.125 and 0.1875 of the /m width" (giving you a range of, say, -150 to -100 units with a fairly condensed font). Other buckets, particularly around the zero mark, are narrower: "between -5 and 0 units", "between 5 and 10 units" and so on.
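A sketch of the bucketing. The 25 edges below (giving 26 half-open buckets) are hypothetical: they are chosen so that, for a font whose /m is 800 units wide, one bucket spans -150 to -100 units and the buckets next to zero span -5 to 0 and 5 to 10 units, matching the examples above. The edges atokern actually uses may differ.

```python
import bisect

# Hypothetical bucket edges as fractions of the /m width: narrow near zero,
# wider further out. 25 edges give 26 half-open buckets [edge, next edge).
EDGES = [
    -0.3125, -0.25, -0.1875, -0.125, -0.09375, -0.0625, -0.046875,
    -0.03125, -0.025, -0.01875, -0.0125, -0.00625, 0.0,
    0.00625, 0.0125, 0.01875, 0.025, 0.03125, 0.046875,
    0.0625, 0.09375, 0.125, 0.1875, 0.25, 0.3125,
]

def kern_bucket(kern_units, m_width):
    """Index (0-25) of the bucket a kern value falls into, after scaling by /m."""
    return bisect.bisect_right(EDGES, kern_units / m_width)

def bucket_range(index, m_width):
    """Convert a bucket index back into a range of font units for this font."""
    low = EDGES[index - 1] * m_width if index > 0 else float("-inf")
    high = EDGES[index] * m_width if index < len(EDGES) else float("inf")
    return low, high
```

With an 800-unit /m, a kern of -120 lands in bucket 3, whose range converts back to -150 to -100 units.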

Results

I said I had got to 99% accuracy in a three-way classification. To be honest I am no longer sure what degree of accuracy I am getting. It looks good, but it needs more work.

The reason I'm not sure is this: most pairs in most fonts aren't kerned, or have zero kern. There's a subtle but important difference between "aren't kerned" and "have zero kern". If the designer explicitly put a zero in the kern table for that pair, then great, they're happy with how it looks. But if they didn't put a zero in the kern table, there's no kern - which is effectively a kern value of zero - except that this time, we don't know whether the designer looked at the pair with no kern and decided no action was needed, or whether the designer didn't look at it at all and it might actually be ugly and we don't want to use it for learning. (Put your hands up if you kerned /t/Q recently.)

So for a while I didn't want to treat an empty kern entry as useful data, and I only fed the network explicit entries from the kerning tables. There aren't many explicit zero-valued kerns, so of course the network learned to apply kerning to nearly everything, because that is what it saw. But a font should have lots of zero-valued kern pairs, so I had to put the "silent" zeros back in, and because there are so many of them I couldn't artificially balance the size of each bucket either. The network should learn that most kern pairs are zero.
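Putting the silent zeros back in can be sketched as filling every safe/safe pair that the kern dump doesn't mention with zero. The input format (a dictionary of (left, right) tuples to kern values) is an assumption for illustration.

```python
from itertools import product

def training_pairs(explicit_kerns, safe_glyphs):
    """Combine explicit kern entries with 'silent' zeros.

    `explicit_kerns` maps (left, right) -> kern value from the font's kern
    dump, explicit zeros included. Every other safe/safe pair is treated as
    a zero kern that the designer presumably accepted. Pairs involving
    non-safe glyphs are left out.
    """
    data = {}
    for left, right in product(sorted(safe_glyphs), repeat=2):
        data[(left, right)] = explicit_kerns.get((left, right), 0)
    return data
```

Because every unlisted pair becomes a zero, the zero bucket naturally ends up much larger than the others, which is the imbalance described above.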

And of course this makes it pretty hard to assess accuracy, because the network learns that if it guesses zero for everything it gets the right answer 90% of the time, so it doesn't bother learning anything else. So I had to write my own loss function which penalized the network very heavily every time it guessed zero when there should have been a kern. At this point accuracy started going down, because I really wanted the network to be more interested in getting things wrong in interesting ways than getting them right in stupid ways. (This is also a good principle for life in general.)
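The post doesn't give the loss function itself, so this is one hypothetical way to express the idea: ordinary cross-entropy on the true bucket, plus an extra penalty proportional to the probability the network put on the zero bucket when the true answer was a kern. The penalty weight of 10 is arbitrary. In real training this would be written with the framework's tensor operations so it can be differentiated; plain Python is used here only to show the arithmetic.

```python
import math

def kern_loss(probs, true_bucket, zero_bucket, penalty=10.0):
    """Cross-entropy, plus a penalty for betting on 'zero' when a kern was needed.

    `probs` is the network's probability for each bucket; `penalty` is a
    hypothetical weight.
    """
    eps = 1e-12  # avoid log(0)
    loss = -math.log(max(probs[true_bucket], eps))
    if true_bucket != zero_bucket:
        loss += penalty * probs[zero_bucket]
    return loss
```

A network that puts 80% of its probability on "zero" for a pair that really needs kerning now pays several times the plain cross-entropy, so guessing zero everywhere stops being a cheap strategy.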

Here are some of the results of running the predictor on a font that the network had not seen before, my own Coolangatta Thin.



And looking at some of the values it outputs gives me hope that the network really has understood how certain shapes fit together. For instance, it has realised that similar shapes should have similar kern values:

T d -51 - -45 p=23 %
T e -40 - -35 p=21 %
T g -45 - -40 p=20 %
T o -40 - -35 p=21 %

(That's the kern bucket range followed by the probability, i.e. how sure the network is that the pair belongs to that category.) Rounds kern against /T, but the network is pretty clear that straights don't:

T B 0 p=99 %
T D 0 p=96 %
T E 0 p=94 %
T F 0 p=94 %
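The listings above show, for each pair, the most likely kern bucket as a range in font units followed by the network's probability for it. Here is a sketch of how such a line could be produced from one softmax output, assuming the bucket ranges have already been converted to font units for this font. The function name and the convention that a zero bucket has equal low and high bounds are my own.

```python
def describe_prediction(pair, probs, bucket_ranges):
    """Format the most likely bucket as 'X Y low - high p=NN %'."""
    best = max(range(len(probs)), key=probs.__getitem__)
    low, high = bucket_ranges[best]
    pct = round(probs[best] * 100)
    if low == high:  # a single-value bucket such as zero
        return f"{pair[0]} {pair[1]} {low} p={pct} %"
    return f"{pair[0]} {pair[1]} {low} - {high} p={pct} %"
```

Note that a winning probability of around 21% can still be the clear favourite when it is spread over 26 buckets.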

Similarly, it knows that there is a lot of counterspace between the right of /T and the left of /A and that these shapes fit together, but that even though there is a lot of counterspace between /A and /A, these shapes cannot fit together. I think this proves that it "gets" the idea of kerning. And that makes me happy.

Now what?

I think I have proved that the concept is viable. To improve it, I would need to come back to this and train the network again on a larger variety of well-kerned fonts (currently I'm using a directory of 110 fonts; I am sure there are more) and for longer, on a more powerful computer, to squeeze out more accuracy. If anyone wants to poke at the code, it's available on GitHub: https://github.com/simoncozens/atokern and I would be very happy to talk you through it.

I'm open to other suggestions of how to develop this. But not yet. I have been obsessing over this for the past few weeks and my family are sick of it, so I need to take a break. Will come back to it later.

Comments

  • Nice work!

    I think the concept is undoubtedly viable – but the question is whether it can be simultaneously useful and legal/ethical.

    Since this ultimately works by processing the data of other designers, surely it constitutes data/IP theft? I think unless you're training exclusively on libre fonts, and generating new kern data exclusively for libre fonts, it probably isn't legal – and if you can't use it for commercial projects it isn't exceptionally useful.

    Side note: I played around a while ago, trying to train a neural network to generate glyph paths (just for my own amusement), but the initial results were so awful I lost enthusiasm. Here's probably the best /A that my approach produced:


  • Ben Blom

    Interesting approach.

    But if they didn’t put a zero in the kern table, there’s no kern - which is effectively a kern value of zero - except that this time, we don’t know whether the designer looked at the pair with no kern and decided no action was needed, or whether the designer didn’t look at it at all and it might actually be ugly and we don’t want to use it for learning.

    In a well-spaced and well-kerned font, you can expect that the designer is happy with the sum of the spacing and kerning between pairs. For this reason, you may believe that the spacing between common unkerned pairs in such a font represents spacing the designer is happy with (just as you may believe that the designer is happy with the non-zero kerning in such a font).

    The weakness of your approach is that it seems to ignore the spacing information in a font. Kerning information is irrelevant without the accompanying spacing information. Spacing and kerning are “communicating vessels”: the spacing between a pair without a kern is equal to a little more spacing between that pair with a small negative kern, or a little less spacing between that pair with a small positive kern.

  • If the spacing is done right, most kerning will come from the punctuation. How do you handle punctuation?

    Lewis: I’m not a lawyer, but I am not worried about the ethical dimensions of this. How does a human learn to kern a font? Initially and unconsciously, by looking at lots of well-kerned fonts and learning from them the relationships between shapes. Every time I kern a sans /A/V, I am “processing” my memories of Helvetica or Univers or whatever. That’s just how the discipline works.

    Ben: I’m not sure why you say that this ignores the spacing. The spacing data is there in the sidebearing contours - the contours represent how far the shape is away from the edge. A contour of [50,75,100,100,...] and one of [25,50,75,75,..] both represent curves into the center but one has tighter spacing than the other. So I think the network is getting this information about the glyph shapes and their spacing together.
  • Adam Jagosz
    edited November 14
    Simon, I don't think computers and humans have the same rights. I'm not a lawyer either, but that's what my gut says. I mean, humans have way more rights. As a human you can look at people legally, but as a computer, you'd look way more suspicious (e.g. as a smartphone taking people's photos).
  • Memo to self: I was thinking about how accurate this needs to be to be trustworthy. It’s never going to be 100% accurate but where do the failures fall? If it’s 99% accurate over 160,000 kern pairs that’s still 1,600 mistakes. And the point is to stop a human having to evaluate each pair by hand. If there’s a chance that the network gets, say, “To” badly wrong then nobody’s going to use it.

    When I come back to this, I should weight the samples by the frequency of each letter pair in a text corpus, so that getting “To” wrong is penalised much more strongly than “w7”. That should enable us to have confidence in the pairs that really matter.
  • John Hudson
    Unless a license explicitly prohibits analysing the kerning data, where's the infringement? Collecting and analysing kerning data doesn't even necessarily involve decompiling the font, since it can be collected from text output. Kern values, as implemented in the kern table and typical GPOS pair adjustment lookups, are data, which typically has minimal copyright protection. In the US, for instance, I believe it is still the case that data itself is not protected by copyright, only particular selections or arrangements of data. So a kern table might be protected as part of the font software, but the kern values in that table would not be. [Usual caveat: I am not a lawyer.]
  • AbrahamLee
    This is really neat, Simon, and I hope you succeed. Keep at it and keep us informed!
  • Nick Shinn
    A better method would be to identify the most “similar to” font (by algorithm)—with the proviso of your subjective filter “well-designed and well kerned”—and just copy its spacing and kerning.

    Alternatively, you might identify a suite of fonts in a specific genre, and apply your AI to that.

    This will make the results slightly less generic.

    The way it is now will produce a very bland average, that will not be specifically tailored to the font in hand.

    **

    The problem with neural net AI is that of all algorithmic “creation”: it is mired in the past and in generics. However, as a new technology, it can create newness, but not by imitating the effects of existing technologies.
  • Nick, aren't you mixing up creativity and drudgework? No, the algorithm will not be creative – hopefully! But if a designer wants "creative" kerning (if such a thing is possible; surprise us!), s/he won't be using this tool to begin with.

    I imagine robots in car factories aren't meant to be creative either.
  • Ben Blom
    edited November 15
    Ben: I’m not sure why you say that this ignores the spacing. The spacing data is there in the sidebearing contours

    Simon, I’m sorry. I didn’t read carefully enough.

    If it’s 99% accurate over 160,000 kern pairs that’s still 1,600 mistakes.

    The question is: How bad are those mistakes? If those mistakes are nothing more than a slight deviation from what the kerning should be, then those mistakes are not a real problem. (Perhaps many of those mistakes only concern uncommon pairs.) 

  • Nick Shinn
    edited November 15
    Theunis, I wasn’t addressing drudgery, but the quality of a type design. 

    By artificially removing the type designer’s exercise of taste in determining spacing, design veers towards impoverished me-too sequel, proscribing emergent qualities. That’s how algorithms create bubbles to trap people in cultural stasis.

    Artificial means fake, dontcha know. What’s intelligent about that?

    On the subject of drudgery, I suspect many type designers might actually enjoy putting on some favorite tunes and playing with the lovely glyphs they’ve drawn, getting to know the relationships between them all a little more profoundly, exercising one’s judgement, thereby keeping one’s good taste in shape (fitness!) and up to date.

    As well as drudgery, kerning might also be considered meaningful human activity, after Morris, or Gandhi’s spinning. Type drawing is craft.

    It’s important to develop new technology in a humanistic manner. For algorithms, that means treating them as tools to enhance the user’s abilities, not bypass and atrophy them. Therefore, the interface is critical. The worst situation is offering no control of the settings (preferences). Even when those are editable, so often the user just sticks with the default. Worse, many agencies and design firms change the default kerning settings in InDesign etc. to “Optical”, a dodgy proposition which can go spectacularly wrong.

    But all this is theory; let’s see some typography.



  • Nick, 

    I consider automation of the type design process an assistive tool for the designer. In old Fontographer, one had to position each diacritical mark over each base letter separately, and to kern each letter against each. But then we got anchors and class kerning.

    Which doesn't mean that the designer shouldn't be free to adjust the exceptions in class-derived kerning or anchor-derived mark positioning. Basically, set rules, have software do the repetitive task of applying those rules and then fix exceptions. 

    If I draw a large set of basic glyphs and then create bold or condensed derivatives of a few dozen glyphs, I think that software should predict how to create bold or condensed derivations for the other hundreds. Or when I space and kern a smaller set of glyphs, software should predict the spacing and kerning for the rest.

    These predictions don't need to replace the designer. The designer would always need to be able to intervene and fix. But today, digital type design already works so that the designer makes some key decisions at the beginning, and then often implements these decisions to a multitude of other symbols — sometimes tweaking on the way. 

    Of course, with machine learning you wouldn't need to train the system just on fonts from other foundries. The designer would be able to train such a system only on her/his own data (previous fonts or the current project).
  • This tool sounds extremely useful and I am excited to watch its continued development.
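Simon's reply to Ben Blom in the comments argues that the sidebearing contours carry spacing and shape together. A small sketch, for illustration only (the network itself is fed the raw contours and does not split them like this): subtracting a contour's minimum separates the tightest distance from the shape, and the two example contours from that reply come out with identical shapes but different spacing. The second function illustrates Ben's "communicating vessels" point: at any sampled height, the gap between two glyphs is the first glyph's right contour plus the kern plus the second glyph's left contour, so 10 units less sidebearing looks the same as a -10 kern.

```python
def split_shape_and_spacing(contour):
    """Split a sidebearing contour into its tightest distance (the spacing)
    and its shape relative to that distance."""
    spacing = min(contour)
    shape = [value - spacing for value in contour]
    return spacing, shape

def gap_between(right_contour, kern, left_contour):
    """Gap between two glyphs at each sampled height: the first glyph's right
    contour, plus the kern, plus the second glyph's left contour."""
    return [r + kern + l for r, l in zip(right_contour, left_contour)]

looser = split_shape_and_spacing([50, 75, 100, 100])
tighter = split_shape_and_spacing([25, 50, 75, 75])
```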
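Simon's memo to self in the comments suggests weighting each training sample by how often the letter pair occurs in a text corpus. Here is a minimal sketch of computing such weights. It assumes glyphs can be mapped to single characters, and the `floor` that keeps rare pairs from getting zero weight is a hypothetical choice. The resulting weights would then be handed to the framework's per-sample weighting (Keras, for instance, accepts a `sample_weight` argument to `Model.fit`).

```python
from collections import Counter

def pair_weights(corpus, pairs, floor=1):
    """Weight each kern pair by how often it occurs as a bigram in `corpus`.

    `pairs` is a list of (left_char, right_char) tuples. `floor` is added to
    every count so rare pairs still get some weight. Weights are scaled so
    that their mean is 1.
    """
    counts = Counter(a + b for a, b in zip(corpus, corpus[1:]))
    raw = {pair: counts[pair[0] + pair[1]] + floor for pair in pairs}
    mean = sum(raw.values()) / len(raw)
    return {pair: weight / mean for pair, weight in raw.items()}
```

With a realistic corpus, a common pair like "To" would end up weighted far more heavily than "w7", so mistakes on it cost the network much more during training.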