504 | L | T |
499 | L | V |
495 | L | Y |
493 | T | A |
491 | P | comma |
490 | T | a |
489 | V | A |
489 | T | period |
489 | A | V |
488 | Y | o |
488 | Y | a |
488 | P | A |
486 | Y | A |
486 | T | o |
486 | T | comma |
486 | P | period |
484 | F | comma |
484 | A | Y |
481 | V | period |
481 | V | a |
480 | F | period |
480 | A | T |
477 | T | e |
475 | V | o |
474 | L | W |
473 | Y | period |
472 | Y | e |
470 | Y | comma |
469 | F | A |
467 | V | comma |
462 | V | e |
461 | T | AE |
460 | Y | u |
459 | W | A |
459 | L | quoteright |
457 | T | hyphen |
457 | A | W |
452 | W | a |
452 | T | oe |
449 | v | period |
446 | y | period |
445 | w | period |
443 | W | period |
442 | T | w |
442 | T | ae |
441 | V | AE |
439 | W | o |
438 | P | AE |
437 | r | period |
436 | v | comma |
Comments
Could you also give some background information on what kind of fonts you were examining? Are these fonts from 514 different families?
What stylistic characteristics do they have – sans, serif, script, blackletter, … ?
Did you also include italics?
How is your analysis dealing with class kerning?
Does the size of the kerning value play a role too?
I could probably run an analysis for which pairs are absent. It would be limited by my initial choice of languages to combine and by the limitations of the source corpora. If I find myself in the next few days looking for a distraction from work for a while, I'll see what I can come up with.
Unless someone beats me to it. ;-)
http://www.typophile.com/node/5106
@Ramiro Espinoza https://en.wikipedia.org/wiki/Electronvolt
It gives a hint about the most common instances in ‘normal’ typefaces, sure. But it says very little about equally necessary cases like f” f] (j – and so on. The choice of pairings also depends on the specific design of a typeface, e.g. whether it is a blackletter or scriptish design or otherwise different from mainstream models.
I was always more interested in a systematic approach to the generally most important pairings, as well as a practical overview of language-specific pairings that are not obvious at first glance to most of us (e.g. f_ð).
I've long thought that a more flexible approach to sidebearings--something beyond simple advance widths that takes into account glyph shape--would eliminate most of the need for kerning pairs, and even allow automatic spacing between different styles and fonts, or even different sizes of the same font/style.
There have been some tools on the developer side that let the type designer work as if this were the case, but in the end you still need to generate kerning pairs in the finished font since that's all the font formats support.
Ideally, something like this could be incorporated into the font spec, eliminating the need to worry about wasting time on kerning combinations that will never arise in actual use. Fonts would, in effect, be self-kerning.
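As a rough illustration of that idea (a toy sketch only, not an existing font-format feature): suppose each glyph carried, instead of two sidebearing numbers, the outermost ink position in a few horizontal bands. The function name `origin_distance`, the band lists, and all the numbers below are made up for the example, and "smallest gap equals a target" is a deliberately crude spacing rule.

```python
def origin_distance(left_glyph_right_edge, right_glyph_left_edge, target_gap):
    """Distance between the two glyph origins so that the smallest
    horizontal gap between the shapes equals target_gap.

    left_glyph_right_edge[i]: x of the rightmost ink of the left glyph in
        band i, measured from its origin (None if the band has no ink).
    right_glyph_left_edge[i]: x of the leftmost ink of the right glyph in
        band i, measured from its origin (None if the band has no ink).
    """
    # With the origins d units apart, the gap in band i is (d + r) - l,
    # so band i needs d >= l - r + target_gap. The tightest band decides.
    needed = [
        l - r + target_gap
        for l, r in zip(left_glyph_right_edge, right_glyph_left_edge)
        if l is not None and r is not None
    ]
    return max(needed) if needed else None


# Hypothetical profiles in three bands (top, middle, bottom), font units.
T_right = [600, 350, 350]   # wide bar on top, narrow stem below
A_left = [250, 130, 0]      # apex set in at the top, wide foot at the bottom
H_left = [80, 80, 80]       # straight stem

ta = origin_distance(T_right, A_left, 50)
th = origin_distance(T_right, H_left, 50)
print(ta, th)  # 400 570
```

In this toy case T followed by A ends up much tighter than T followed by H without any pair being listed, which is the "self-kerning" behaviour described above. A real system would need far more than a minimum-gap rule.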
Anyway, sorry to go off-topic.
Are these fonts from 514 different families? What stylistic characteristics do they have – sans, serif, script, blackletter, … ? Did you also include italics?
It's a really mixed bag; the idea was to produce a fairly representative training set of all the different things you might throw at an autokerner, so that it would cope well with whatever it saw. In practice this meant some families and some individual styles; no blackletters but lots of sans, a few serif, and one or two script. Some text, but generally display. (I wish there were a subcategory of "display" meaning "not exactly a text font, but at the same time something more like Optima rather than graffiti, handdrawn, roughened unicase alphabets, and pictures of cats.")
How is your analysis dealing with class kerning?
These are computed kern values between the pair - the value you would send to a layout engine. So after class kerns have been "decomposed".
Does the size of the kerning value play a role too?
Nope, anything non-zero is in there.
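To make "decomposed" and "anything non-zero" concrete, here is a minimal Python sketch under a simplified data layout. The class names (`@A`, `@V`), the kern values, and the helper names are hypothetical; this is not the script that produced the table above, and a real font's kerning would be read with a font library rather than typed in as dictionaries.

```python
from collections import Counter
from itertools import product


def decompose(class_kerns, left_classes, right_classes, glyph_kerns=None):
    """Expand class kerning into flat glyph-pair kerning.

    class_kerns:   {(left_class, right_class): value}
    left_classes:  {class_name: [glyph names]}
    right_classes: {class_name: [glyph names]}
    glyph_kerns:   {(left_glyph, right_glyph): value}, applied last so
                   that glyph-level exceptions override class values.
    """
    flat = {}
    for (left, right), value in class_kerns.items():
        for pair in product(left_classes[left], right_classes[right]):
            flat[pair] = value
    flat.update(glyph_kerns or {})
    return flat


def count_nonzero(fonts):
    """For each glyph pair, count the fonts that give it a non-zero kern."""
    counts = Counter()
    for flat in fonts:
        counts.update(pair for pair, value in flat.items() if value != 0)
    return counts


# Two made-up fonts: one uses classes, the other plain pairs.
font_a = decompose(
    {("@A", "@V"): -80},
    {"@A": ["A", "Aring"]},
    {"@V": ["V"]},
    {("T", "period"): -60},
)
font_b = decompose({}, {}, {}, {("A", "V"): -70, ("T", "period"): 0})

counts = count_nonzero([font_a, font_b])
for (left, right), n in counts.most_common():
    print(n, "|", left, "|", right, "|")
```

Here `A V` is counted in both fonts, while `Aring V` and `T period` are counted in one each: the zero-valued `T period` in the second font is ignored, and `Aring V` only exists because the class was expanded.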
As mentioned, my master word list was compiled from An Crúbadán corpora. Since those corpora were basically scraped from the internet, there’s a lot of flotsam in there. When originally compiling my list, I did some broad work to remove acronyms and such. But there’s no accounting for misspellings, etc. It’s not a carefully curated set.
The languages that I combined were English, German, French, Italian, Spanish, Portuguese, Dutch, Danish, Swedish, Finnish, Norwegian, Estonian, Polish, Czech, Slovak, Albanian, Hungarian, Turkish, Lithuanian, Romanian, Latvian, Maltese, Northern Sami, Welsh. The goal was to have a representative for all the extended Latin characters in my typical char set.
My combined list consists of 951,208 words.
Using a character set gleaned from the word list itself (abcdefghijklmnopqrstuvwxyzàáâãäāăąåæçćčċðďđèéêěëēėęğġģħìíîïīįıķĺļľłńňñņŋòóôõöőøŕřśšşșßțťŧþùúûüūůűųŵẁỳýŷÿźžż) yields 11,449 possible combinations: 107 characters, so 107 × 107 ordered pairs.
Out of those combinations, 7,917 pairs had no occurrence in the word list. Checking against initial position only, that list increases to 9,200 non-occurring UC-lc pairs.
The vast majority of these are combinations of two accented letters that don’t occur in the same language (out of those surveyed).
There may be some false positives here. For instance, I saw some basic ß-combinations in the results for which folks came up with rare examples in another thread here, but which must not have occurred in the corpora I used.
So, like I said, I’m not sure what practical use this has. But here you go anyway.
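For anyone who wants to repeat the absent-pair count on their own word list, the procedure described above can be sketched roughly as follows. This is a reconstruction in Python, not the script actually used; the function name `absent_pairs` and the three-word list are placeholders, and it assumes the words are already lowercase and NFC-normalized so that each accented letter is a single character.

```python
from itertools import product


def absent_pairs(words):
    """Return (charset, absent_anywhere, absent_initial) for a word list.

    charset:         every character that occurs in the list
    absent_anywhere: ordered pairs that never occur adjacent in any word
    absent_initial:  ordered pairs that never occur at the start of a word
                     (the positions where a UC-lc pair would arise)
    """
    charset = sorted(set("".join(words)))
    possible = {a + b for a, b in product(charset, repeat=2)}
    seen_anywhere = {w[i:i + 2] for w in words for i in range(len(w) - 1)}
    seen_initial = {w[:2] for w in words if len(w) >= 2}
    return charset, possible - seen_anywhere, possible - seen_initial


words = ["ab", "ba", "abc"]  # toy stand-in for the real word list
charset, absent, absent_initial = absent_pairs(words)
print(len(charset) ** 2, len(absent), len(absent_initial))  # 9 6 7
```

As with the real numbers, the initial-position list is always at least as long as the anywhere list, since every word-initial pair is also a pair that occurs somewhere.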
Where we differ is that I kern lc-to-cap combinations, and nearly everything.
I have also come to feel in recent years that, for oddball glyphs that “sort of almost fit” in a class but wouldn't be worth kerning on their own, it is better to lump them into the class and give them imperfect kerning than to leave them unkerned, as long as that does not make those glyphs over-tight.
Indeed. Before the recent rise of the far-right in Germany, I would not have thought I needed to consider the “fD” combination.
I definitely took notice when the following showed up in my mailbox in a Sept 2016 issue of TIME magazine.
I was somewhat relieved to see that the combination did not crash horribly in the fonts that I designed for them. I’m not sure that a kern pair would have improved things much, anyway, but it certainly put this combination on my radar.
http://www.bbc.com/news/world-africa-43821512
I like your idea a lot!
@Simon Cozens
Is there a way to make it find the group kerning, like one can assume all the following:
Aring V
Should be `A V`. Can you run it through the kerning table instead?
https://docs.google.com/document/d/1fQOvESU6pVXkNAFWzc83zwpBirBbyCb-G3Ov1ebKQNQ
spacing test HTH ATA TAT HAH
All critiques reserved