Most common kern pairs

Simon Cozens · May 2018

@Wei Huang asked me on Twitter whether, in the course of my messing about with automated kerning, I had compiled a table of the most common kern pairs. I replied that I hadn't but that it wouldn't be hard to do. So I did it. Based on a set of 514 fonts, here is a file with the top ten-thousand most common kern pairs and how many of those 514 fonts implement them.

The top fifty are:

504	L	T
499	L	V
495	L	Y
493	T	A
491	P	comma
490	T	a
489	V	A
489	T	period
489	A	V
488	Y	o
488	Y	a
488	P	A
486	Y	A
486	T	o
486	T	comma
486	P	period
484	F	comma
484	A	Y
481	V	period
481	V	a
480	F	period
480	A	T
477	T	e
475	V	o
474	L	W
473	Y	period
472	Y	e
470	Y	comma
469	F	A
467	V	comma
462	V	e
461	T	AE
460	Y	u
459	W	A
459	L	quoteright
457	T	hyphen
457	A	W
452	W	a
452	T	oe
449	v	period
446	y	period
445	w	period
443	W	period
442	T	w
442	T	ae
441	V	AE
439	W	o
438	P	AE
437	r	period
436	v	comma

Benedikt Bramböck · May 2018

Interesting stuff, thanks for sharing.

Could you also give some background information on what kind of fonts you were examining? Are these fonts from 514 different families?
What stylistic characteristics do they have – sans, serif, script, blackletter, … ?
Did you also include italics?
How is your analysis dealing with class kerning?
Does the size of the kerning value play a role too?

Ramiro Espinoza · May 2018

One list I am really interested in is a list of kerning pairs that never occur in the Latin alphabet, organised by language.

Theunis de Jong · May 2018

Ramiro, you mean every possible combination that is not in use anywhere? Kern pairs such as "rX" and "pY"? There must be 100!s of these (using ! in its mathematical meaning as 'factorial').

Craig Eliason · May 2018

Though count on things like RacerX brand and LeapYear Inc. to arise and make even those useful.

Ramiro Espinoza · May 2018

@Theunis de Jong Nope, I mean kerning pair combinations that make sense. I don't kern lc/uc. I kern every possible combination of uc/uc, uc/lc, lc/lc, uc/sc, sc/sc. Of course some of these will never occur in any language. Therefore, I would like to know which pairs these are, both to avoid spending time on them and to crop existing kerning tables.

Kent Lew · May 2018

Ramiro, a couple of years ago I spent a few days compiling a massive word list from several Latin-using languages, using the corpora at the An Crúbadán project, and then writing a script to extract the most common words for every possible lc-lc pair (including a subset of Uc-lc where the pair is at the beginning of the word).

I could probably run an analysis for which pairs are absent. It would be limited by my initial choice of languages to combine and by the limitations of the source corpora. If I find myself in the next few days looking for a distraction from work for a while, I'll see what I can come up with.

Unless someone beats me to it. ;-)

Hrant Հրանդ Փափազեան Papazian · May 2018

Somewhat related:
http://www.typophile.com/node/5106

@Ramiro Espinoza https://en.wikipedia.org/wiki/Electronvolt

Andreas Stötzner · May 2018

How useful are statistics of that sort in practice?

They give a hint about the most common instances in ‘normal’ typefaces, sure. But they say very little about equally necessary cases like f” f] (j and so on. The choice of pairings also depends on the specific design of a typeface, e.g. whether it is a blackletter or scriptish design or otherwise different from mainstream models.
I was always more interested in a systematic approach to the generally most important pairings, as well as in a practical overview of language-specific pairings which are not that obvious at first glance for most of us (e.g. f_ð).

Mark Simonson · May 2018

I realize that this is somewhat off-topic, but discussions like this always remind me how limiting the kerning-pair system is, especially as character sets have grown so large. Class kerning helps, but...

I've long thought that a more flexible approach to sidebearings--something beyond simple advance widths that takes into account glyph shape--would eliminate most of the need for kerning pairs, and even allow automatic spacing between different styles and fonts, or even different sizes of the same font/style.

There have been some tools on the developer side that let the type designer work as if this were the case, but in the end you still need to generate kerning pairs in the finished font since that's all the font formats support.

Ideally, something like this could be incorporated into the font spec, eliminating the need to worry about wasting time on kerning combinations that will never arise in actual use. Fonts would, in effect, be self-kerning.

Anyway, sorry to go off-topic.

Hrant Հրանդ Փափազեան Papazian · May 2018

@Mark Simonson However (and this is veering even more off-topic...) automating/detaching spacing from the letterforms implies a favoring of the black over notan. We spend so much time tweaking the black to within 1/1000 or finer, but when it comes to the white (at least the whites between glyphs) we settle for things like ±5/1000 or even ignore many pairs. Sure it's an expediency, but at the very least we need to admit the flaw. And ideally, if we do automate the inter-glyph white, it needs to depend on more than the black's lateral profiles.

Simon Cozens · May 2018

Answers to Benedikt's questions:

Are these fonts from 514 different families? What stylistic characteristics do they have – sans, serif, script, blackletter, … ? Did you also include italics?

It's a really mixed bag; the idea was to produce a fairly representative training set of all the different things you might throw at an autokerner, so that it would cope well with whatever it saw. In practice this meant some families and some individual styles; no blackletters but lots of sans, a few serif, and one or two script. Some text, but generally display. (I wish there were a subcategory of "display" meaning "not exactly a text font, but at the same time something more like Optima rather than graffiti, handdrawn, roughened unicase alphabets, and pictures of cats.")

How is your analysis dealing with class kerning?

These are computed kern values between the pair - the value you would send to a layout engine. So after class kerns have been "decomposed".

Does the size of the kerning value play a role too?

Nope, anything non-zero is in there.
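
For readers who want to reproduce a count like this, here is a minimal Python sketch of the two steps described in these answers: expanding class kerning into glyph-to-glyph pairs, then counting how many fonts give each pair a non-zero value. It is not the script that produced the table above. The input layout (a `classes` dict and a `kerning` dict per font), the `@`-prefixed class names and the sample values are all assumptions made for illustration; reading the kerning out of real font files is left out.

```python
from collections import Counter
from itertools import product


def flatten_kerning(classes, kerning):
    """Expand class-based kerning into plain glyph-to-glyph pairs.

    classes: dict mapping a class name (hypothetical "@..." naming) to its glyphs.
    kerning: dict mapping (left, right) to a value; either side may be a
             class name or a single glyph name.
    Entries that name glyphs directly override entries that come from a class.
    """
    def members(name):
        return classes.get(name, [name])

    def specificity(key):
        # 0 = class/class, 1 = one side is a glyph, 2 = glyph/glyph
        return sum(side not in classes for side in key)

    flat = {}
    # Apply the least specific entries first so that exceptions win.
    for key in sorted(kerning, key=specificity):
        left, right = key
        for pair in product(members(left), members(right)):
            flat[pair] = kerning[key]
    return flat


def count_pairs(fonts):
    """Count, for each glyph pair, how many fonts kern it with a non-zero value."""
    counts = Counter()
    for classes, kerning in fonts:
        flat = flatten_kerning(classes, kerning)
        counts.update(pair for pair, value in flat.items() if value != 0)
    return counts


# Two made-up fonts: one with class kerning, one with flat pairs only.
fonts = [
    ({"@A": ["A", "Aacute"]}, {("@A", "V"): -80, ("T", "o"): -60}),
    ({}, {("A", "V"): -70, ("T", "o"): 0}),
]
counts = count_pairs(fonts)

# Same tab-separated layout as the table in the first post.
for (left, right), n in counts.most_common():
    print(f"{n}\t{left}\t{right}")
```

Counting after flattening is what makes fonts with and without class kerning comparable: a single class pair such as `@A V` contributes one count for every glyph in the class.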

Kent Lew · May 2018

Ramiro — I don’t know what value these lists will actually have, from a practical standpoint; but I went ahead and ran an analysis last night.

As mentioned, my master word list was compiled from An Crúbadán corpora. Since those corpora were basically scraped from the internet, there’s a lot of flotsam in there. When originally compiling my list, I did some broad work to remove acronyms and such. But there’s no accounting for misspellings, etc. It’s not a carefully curated set.

The languages that I combined were English, German, French, Italian, Spanish, Portuguese, Dutch, Danish, Swedish, Finnish, Norwegian, Estonian, Polish, Czech, Slovak, Albanian, Hungarian, Turkish, Lithuanian, Romanian, Latvian, Maltese, Northern Sami, Welsh. The goal was to have a representative for all the extended Latin characters in my typical character set.

My combined list consists of 951,208 words.

Using a character set gleaned from the word list itself — abcdefghijklmnopqrstuvwxyzàáâãäāăąåæçćčċðďđèéêěëēėęğġģħìíîïīįıķĺļľłńňñņŋòóôõöőøŕřśšşșßțťŧþùúûüūůűųŵẁỳýŷÿźžż — yields 11,449 possible combinations.

Out of those combinations, 7,917 pairs had no occurrence in the word list. Checking against initial position only, that list increases to 9,200 non-occurring UC-lc pairs.
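
The shape of this analysis can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the script actually used: the words are assumed to be lowercased and NFC-normalised so that each accented letter is a single character, and the function names and the three-word sample are made up. With the 107-letter character set quoted above, the number of ordered pairs is 107 × 107 = 11,449, which matches the figure given.

```python
import unicodedata
from itertools import product


def normalise(words):
    """Lowercase and NFC-normalise so each accented letter is one character."""
    return [unicodedata.normalize("NFC", w.lower()) for w in words]


def absent_pairs(words):
    """Return the alphabet found in the words and the pairs that never occur."""
    alphabet = sorted({ch for word in words for ch in word})
    seen = {word[i:i + 2] for word in words for i in range(len(word) - 1)}
    all_pairs = {a + b for a, b in product(alphabet, repeat=2)}
    return alphabet, all_pairs - seen


def absent_initial_pairs(words, alphabet):
    """Pairs that never occur at the start of a word (the Uc-lc case)."""
    seen_initial = {word[:2] for word in words if len(word) >= 2}
    return {a + b for a, b in product(alphabet, repeat=2)} - seen_initial


# Tiny stand-in for the real word list.
words = normalise(["Tab", "bat", "at"])
alphabet, missing = absent_pairs(words)
missing_initial = absent_initial_pairs(words, alphabet)

print(len(alphabet) ** 2, "possible pairs")
print(len(missing), "never occur anywhere")
print(len(missing_initial), "never occur word-initially")
```

Any pair missing everywhere is also missing in initial position, which is why the second count can only be larger than the first.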

The vast majority of these are combinations of two accented letters that don’t occur in the same language (out of those surveyed).

There may be some false positives here. For instance, I saw some basic ß-combinations in the results for which folks came up with rare examples in another thread here, but which must not have occurred in the corpora I used.

So, like I said, I’m not sure what practical use this has. But here you go anyway.

Ramiro Espinoza · May 2018

Cool! Thanks!

notdef · May 2018

Would you not also want your kerning to work for acronyms, business names, product names etc? Some languages merge compounds and form letter pairs not found in any dictionary. Some languages stick lowercase letters to the left of capitals.

Ramiro Espinoza · May 2018

IMHO, you shouldn't kern everything (and I kern a lot). Some situations, like brands, are best left to the typesetters or designers.

notdef · May 2018

I’ve recently tried to approach this in a slightly less specific/detailed, yet broader-reaching way, by making generalised groups of shapes. For example, in my next release the left-hand sides of all accented lowercase a’s are grouped as “a_accented”. I omit the á and others that do not interfere too much with the whitespace at the top left. Tä Tà go in “a_accented”; Ta Tá go in “a”.

Thomas Phinney · May 2018

A few years back I started adopting the same approach as Frode as far as how I group kern shapes for kern classes.

Where we differ is that I kern lc-to-cap combinations, and nearly everything.

I have also come to feel in recent years that oddball glyphs that “sort of almost fit” in a class, though not perfectly, and that wouldn't be worth kerning on their own, are better lumped into the class and given imperfect kerning than left unkerned, as long as this does not cause over-tight kerning of those glyphs.

notdef · May 2018

Oh, I kern lc-to-cap. And do my best to solve the problematic ones with drawing and spacing before kerning.

Kent Lew · May 2018

Would you not also want your kerning to work for acronyms, business names, product names etc? Some languages merge compounds and form letter pairs not found in any dictionary. Some languages stick lowercase letters to the left of capitals.

Indeed. Before the recent rise of the far-right in Germany, I would not have thought I needed to consider the “fD” combination.

I definitely took notice when the following showed up in my mailbox in a Sept 2016 issue of TIME magazine.

I was somewhat relieved to see that the combination did not crash horribly in the fonts that I designed for them. I’m not sure that a kern pair would have improved things much, anyway, but it certainly put this combination on my radar.

Kent Lew · May 2018

I once toyed with the idea of using different lookup types to create a “cascading” kern feature, where accents might be handled as additive kern adjustments instead of enumerated exceptions.

Something like this:

feature kern {
    @MMK_R_a = [a agrave aacute acircumflex atilde adieresis aring amacron abreve aogonek ae];
    @MMK_R_e = [e egrave eacute ecircumflex edieresis emacron ebreve edotaccent eogonek ecaron];
    pos T @MMK_R_a -90;
    pos V @MMK_R_a -100;
    pos Y @MMK_R_a -110;
    pos T @MMK_R_e -80;
    pos V @MMK_R_e -95;
    pos Y @MMK_R_e -110;

    lookup accent_kern {
        pos [V Y]' atilde 60;
        pos [T V Y]' [atilde adieresis amacron edieresis emacron] 80;
        pos [T V Y]' agrave 40;
        pos [T V Y]' egrave 20;
        pos [T V Y]' aring 40;
        pos [T V Y]' [abreve ebreve] 60;
        pos T' [acircumflex ecircumflex] 50;
        pos [V Y]' [acircumflex ecircumflex] 30;
    } accent_kern;

} kern;

So, for example, the final rendered adjustment for Tà would be -90+40 = -50.
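
The arithmetic of the cascade can be checked with a small Python model of the sketch above. This is only a simulation of the intended result, not a feature compiler or a layout engine: the table names and `cascaded_kern` are made up for illustration, and the model assumes that within the `accent_kern` lookup the first rule that matches a pair is the only one applied (so V atilde gets +60 and T atilde gets +80).

```python
# Class-level values from the feature sketch: left glyph against the "a" and "e" classes.
BASE_KERN = {
    ("T", "a"): -90, ("V", "a"): -100, ("Y", "a"): -110,
    ("T", "e"): -80, ("V", "e"): -95, ("Y", "e"): -110,
}

A_CLASS = "a agrave aacute acircumflex atilde adieresis aring amacron abreve aogonek ae".split()
E_CLASS = "e egrave eacute ecircumflex edieresis emacron ebreve edotaccent eogonek ecaron".split()
CLASS_OF = {**{g: "a" for g in A_CLASS}, **{g: "e" for g in E_CLASS}}

# Additive accent adjustments, in the same order as the accent_kern lookup.
ACCENT_RULES = [
    ("V Y", "atilde", 60),
    ("T V Y", "atilde adieresis amacron edieresis emacron", 80),
    ("T V Y", "agrave", 40),
    ("T V Y", "egrave", 20),
    ("T V Y", "aring", 40),
    ("T V Y", "abreve ebreve", 60),
    ("T", "acircumflex ecircumflex", 50),
    ("V Y", "acircumflex ecircumflex", 30),
]


def cascaded_kern(left, right):
    """Base class kern plus the first matching accent adjustment, if any."""
    base = BASE_KERN.get((left, CLASS_OF.get(right, right)), 0)
    for lefts, rights, adjustment in ACCENT_RULES:
        if left in lefts.split() and right in rights.split():
            return base + adjustment  # assumption: first matching rule wins
    return base


print(cascaded_kern("T", "agrave"))  # -50, i.e. -90 + 40 as in the example above
```

Seen this way, the appeal of the approach is that the base table needs one value per class pair and the accent table a handful of corrections, instead of an enumerated exception for every accented pair.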

It always seemed like a promising idea, but I never took the time to try to develop a tool for managing, a format for storing, and a routine for generating or compiling the feature.

Hrant Հրանդ Փափազեան Papazian · May 2018

There's even a country name with an intercap now (although rarely requiring a kern).
http://www.bbc.com/news/world-africa-43821512

Wei Huang · May 2018

@Kent Lew
I like your idea a lot!

@Simon Cozens
Is there a way to make it find the group kerning, like one can assume all the following:

Aring V

Atilde V

Agrave V

Acircumflex V

Aacute V

Should be `A V`. Can you run it through the kerning table instead?

Hrant Հրանդ Փափազեան Papazian · September 2020

@Ramiro Espinoza https://typedrawers.com/discussion/comment/49078/#Comment_49078

John Hudson · September 2020

Of course some of these will never occur in any language.

Very few, in fact. Almost all letter pair combinations occur in some language or other. Even uppercase-to-lowercase, word-initial pairs are attested for almost all combinations:
https://docs.google.com/document/d/1fQOvESU6pVXkNAFWzc83zwpBirBbyCb-G3Ov1ebKQNQ

Albert_Jan_Pool · September 2020

Kent Lew said:

[…]
[…] Before the recent rise of the far-right in Germany, I would not have thought I needed to consider the “fD” combination.
I definitely took notice when the following showed up in my mailbox in a Sept 2016 issue of TIME magazine.

The AfD is not worth a single kerning pair!

Simon Cozens · September 2020

There's some merit to allowing that combination to look ugly.

Hrant Հրանդ Փափազեան Papazian · September 2020

@Albert_Jan_Pool @Simon Cozens No, because that only makes the person talking about it look bad... Design is not about your feelings, but serving the user. Otherwise for example all Aldine revivals would be unusable because Griffo was a murderer...

Piotr Grochowski · September 2020

How much kerning is necessary depends on how the font has been designed. A font already designed to space in an optically correct way would not require any kerning, while a font not so optically spaced (for instance if the giant shapes of display fonts make it impossible) would have to compensate for this by kerning.

John Hudson · September 2020

A font already designed to space in an optically correct way would not require any kerning

? That assumes that all shapes in a writing system can be optically spaced using only a single method. But if e.g. the spacing of T is set to be optically correct in HTH then it won't be optically correct in ATA, and if you set A to be correct in TAT, then it won't be correct in HAH. So the need for a secondary spacing method (kerning) is determined by the shapes. How much kerning is needed will depend how well the initial spacing method is implemented, but it is unlikely to be zero in any non-monospaced font.

Piotr Grochowski · September 2020

John Hudson said:

A font already designed to space in an optically correct way would not require any kerning
? That assumes that all shapes in a writing system can be optically spaced using only a single method. But if e.g. the spacing of T is set to be optically correct in HTH then it won't be optically correct in ATA, and if you set A to be correct in TAT, then it won't be correct in HAH. So the need for a secondary spacing method (kerning) is determined by the shapes. How much kerning is needed will depend how well the initial spacing method is implemented, but it is unlikely to be zero in any non-monospaced font.

It depends heavily on the design of the glyphs: in some designs it is absolutely necessary to kern HTH/ATA/TAT/HAH, while in others the spacing would always be optically correct and kerning unnecessary. In particular, designs with more width variety in the capital letterforms tend to require more kerning to compensate. Design flaws are unavoidable; every typeface has them, whatever the design. Monospaced fonts, lacking kerning, force type designers to build more optical corrections into the letterforms themselves, which gets closer to correct spacing but still not all the way. Personally, I do not like the way kerning looks, as it looks like someone had to compensate for unfortunate design flaws. When designing type I would rather not add kerning but find other ways to fix the problem, such as reducing the width of wide capitals (which monospacing already does).

[Image: spacing test HTH ATA TAT HAH]

All critiques reserved
