How to determine a perfect proofing/test word?
Hugo Jourdan
Posts: 10
Hey,
I'm currently working on an open-source algorithm that determines, from a list of words, which ones are best for proofing/testing a font.
I mostly work with language dictionaries as input. My script assigns a "score" to each word based on several conditions.
Depending on whether a word meets or fails a condition, its score increases or decreases.
For now, here are the conditions already set:
- Word length: the longer a word is, the more points it gains.
- No repeated letters: if a word doesn't contain the same letter twice, it gains points.
Because the goal is to check as many letters as possible, repeated letters are not something you want (most of the time).
- No hyphen: if a word contains a hyphen, it loses points.
A hyphen breaks the word's rhythm.
- Letter singularity: each letter in the word earns points depending on how "singular" it is. An "a" brings more points than an "i" or an "n", for example.
Ideally, we want a word that contains letters showing the characteristics of a font.
- Diversity: if a word contains too many oblique/round/descender/ascender/short letters, it loses points.
Being able to see the height of the descenders/ascenders, how obliques behave against vertical stems, and how round shapes work with straight shapes is useful.
At the end, my script returns a list of words ordered by score, highest first.
Here is a short example with an English dictionary as input:
{'housewarming': 156, 'motherfucking': 155, 'backgrounds': 147, 'thunderclaps': 146, 'considerably': 146, 'unforgivable': 146, 'indistinguishable': 145, 'malnourished': 145, 'counterintelligence': 145, 'misunderstandings': 145, 'guardsmen': 144, 'macpherson': 143, 'motherfuckin': 143, 'buckingham' ...}
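The conditions listed above could be combined roughly as follows. This is only a sketch: the weights, the `SINGULARITY` and `GROUPS` tables, and the `MAX_PER_GROUP` threshold are hypothetical values chosen for illustration, not the ones used by the original script, so the numbers it produces will not match the scores shown above.

```python
# Hypothetical tables and weights: none of these values come from the original script.
SINGULARITY = {"a": 4, "g": 4, "i": 1, "n": 1}  # letters not listed default to 2
GROUPS = {
    "descender": set("gjpqy"),
    "ascender": set("bdfhkl"),
    "round": set("ceos"),
    "oblique": set("kvwxyz"),
}
MAX_PER_GROUP = 3  # assumed threshold for "too many" letters of one kind


def score_word(word: str) -> int:
    """Score one word against the five conditions described in the post."""
    word = word.lower()
    letters = [ch for ch in word if ch.isalpha()]
    score = len(letters)  # word length: longer words gain more
    if len(set(letters)) == len(letters):
        score += 5  # no repeated letters
    if "-" in word:
        score -= 5  # a hyphen breaks the rhythm
    score += sum(SINGULARITY.get(ch, 2) for ch in letters)  # letter singularity
    for members in GROUPS.values():  # diversity
        if sum(ch in members for ch in letters) > MAX_PER_GROUP:
            score -= 5
    return score


def rank(words):
    """Return the words ordered by score, highest first."""
    return sorted(words, key=score_word, reverse=True)
```

Calling `rank(["in", "well-off", "housewarming"])` would put "housewarming" first under these assumed weights.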
For the moment, my script is made specifically for Latin, but I would like to support as many scripts as possible.
Do you have any other ideas for conditions that would improve the algorithm for Latin?
Or any additional conditions for other scripts?
7
Comments
-
Maybe it should have an initial cap, to show cap height.
(R may be the preferable cap letter, as it has a vertical stem, bowl, *and* diagonal. What was your highest-scoring word that started with R?)
1 -
You are pointing out something important. My script uses lowercase letters by default.
So the results are the "best" words set in lowercase. It could be great to add another option to find the "best" word set in UPPERCASE.
Here are the 10 highest-scoring words starting with an "r":
{'r': ['regulations', 'republicans', 'representatives', 'rosamund', 'replaying', 'ramblings', 'relocating', 'regionals', 'regulation', 'readings']}
3 -
Hugo Jourdan said: It could be great to add another option to find the "best" word set in UPPERCASE.
1
-
I like this approach very much. Will be interested to see it adapted to words from languages other than English first, and then to Cyrillic and Greek. Your concept of ‘singular’ will be useful for Cyrillic; less so for Greek, since basically all the Greek lowercase letters can be considered singulars.
For many languages, you will probably want to ignore the presence of accent signs on letters when identifying words in this way; i.e. your results should include words with accented letters scored the same as the base letters. Although such words may not be helpful for testing during initial type design stages when diacritic marks have not been created, they give a more complete impression of characters within the language.
2 -
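The accent-folding suggestion in the comment above (score accented letters the same as their base letters) could be sketched with the standard library. The function name `strip_accents` is just an illustrative choice, and note the limitation: letters that have no canonical decomposition, such as "ø", pass through unchanged.

```python
import unicodedata


def strip_accents(word: str) -> str:
    """Return the word with combining accent marks removed."""
    # NFD splits a precomposed letter into base letter + combining mark(s)
    decomposed = unicodedata.normalize("NFD", word)
    # Drop the combining marks, keep everything else
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))
```

A scorer could then be applied to `strip_accents(word)` while still reporting the original accented word in the results.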
I have no ‘algorythm’ for this but I always found endumachtoligryphe fairly demonstrative for reviewing my designs.
For looking at various languages I like to use pangrams, like:
Árvíztűrő tükörfúrógép
Nechť již hříšné saxofony ďáblů rozezvučí síň úděsnými tóny waltzu
Törkylempijävongahdus
Høj bly gom vandt fræk sexquiz på wc
2 -
Great tool! You don't mention whether diagonal strokes are part of the criteria. I’d give letters with diagonals some points in your system because diagonals usually differ from other straight strokes and often hide errors in a type design. Letters with diagonals are missing from many of your highest scoring words.
2 -
This reminds me of my font name generator, which lets you specify letters that should be showcased in the name and sorts a dictionary based on that:
https://github.com/jenskutilek/WoLiBaFoNaGen
5 -
What is the score for Hamburgefonts?
It lacks, of course, diagonals—an example of what Alastair Johnston called “prime rib” in Alphabets to Order, omitting the bothersome, nasty, spiky characters in specimens which were, after all, sales tools. For a similar reason Latin lower case (Quousque tandem etc…) was favored for body text, the spacing always being nice, before massive kerning became viable, to better fit the diagonal letters.
Of more use to type designers, “-iv” has been added: Hamburgefontsiv.
1 -
Hamburgefonts is a classic, Hamburgefonstiv one of my favourites. In the early stage of a design process I like to look at daumenchor. Nonsense words don’t distract the mind. Next come Hengoldyframpusch or Dampfguldenchorist. The trick is, they look like real words … Of course Downton Abbey Road has some merits, too.
0 -
Some time ago I wrote a Glyphs plugin for this purpose, but crucially it will find words with the selected letters and spellable with the available letters in the font, which is much more useful for work-in-progress proofing. There are two criteria for your algorithm.
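Those two criteria could be sketched as a simple filter. This is not the plugin's code; the function name `proofing_candidates` and its parameters are hypothetical, and the word list is assumed to be plain strings.

```python
def proofing_candidates(words, available, selected=""):
    """Keep words that use only available glyphs and contain every selected letter."""
    available = set(available)
    selected = set(selected)
    return [
        word for word in words
        # spellable with what exists in the font, and showing the letters under review
        if set(word) <= available and selected <= set(word)
    ]
```

For example, with only "n", "i", "h", "l", "m" and "o" drawn, asking for words that contain both "h" and "l" would keep "nihilim" and "hill" but drop "mini" and "hamburg".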
0 -
How about a "Next glyph to design" script that assesses which glyphs are not yet designed, and reports which of those, combined with the ones that are done, would add most to the list of words that could be spelled?
0
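One possible reading of the "Next glyph to design" idea is to count, for each missing glyph, how many words it would unlock on its own. The sketch below makes that assumption (words missing two or more glyphs are ignored), and `next_glyph_ranking` is a hypothetical name.

```python
from collections import Counter


def next_glyph_ranking(words, designed):
    """Rank undesigned glyphs by how many new words each one would make spellable."""
    designed = set(designed)
    gains = Counter()
    for word in words:
        missing = set(word) - designed
        if len(missing) == 1:  # exactly one new glyph would unlock this word
            gains[missing.pop()] += 1
    return gains.most_common()  # list of (glyph, unlocked word count), best first
```

With "n" and "o" designed and the words "nod", "don", "one" and "done", the ranking would suggest "d" first (it unlocks two words) and "e" second.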
-
For capitals I like MASSIVE ATTACK, HEMINGWAY, and LAWYERS.
The premise being that if you get the hard stuff right first, the rest falls into place.
1 -
This is a bit of a digression from your project, Hugo, but I hope it’s helpful: When I began working on The Anatomy of Type, it became clear I’d need a bank of pithy words that could be used to illustrate the distinctive characteristics of each featured typeface. We settled on a set of letters that contained most of a typeface’s DNA: aegilos, at least one capital (preferring GRSQ), and at least one diagonal stroke. The list is now up in the Typographica Library.
1
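The selection criteria described in the comment above could be approximated as a filter plus a ranking. This is a rough sketch, not the method used for the book: the `DIAGONALS` set is an assumption about which letters count as having a diagonal stroke, and ranking by coverage of "aegilos" is one possible interpretation.

```python
KEY_LETTERS = set("aegilos")       # letters named in the comment
PREFERRED_CAPS = set("GRSQ")       # preferred capitals named in the comment
DIAGONALS = set("kvwxyzAKVWXYZ")   # assumption: letters treated as having a diagonal


def coverage(word):
    """Number of distinct key letters the word contains."""
    return len(KEY_LETTERS & set(word))


def is_candidate(word):
    """Require at least one capital and at least one diagonal letter."""
    has_cap = any(ch.isupper() for ch in word)
    has_diagonal = any(ch in DIAGONALS for ch in word)
    return has_cap and has_diagonal


def rank_candidates(words):
    """Order qualifying words by key-letter coverage, then by a preferred initial cap."""
    keep = [w for w in words if is_candidate(w)]
    return sorted(keep, key=lambda w: (coverage(w), w[0] in PREFERRED_CAPS), reverse=True)
```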
-
Nick Shinn said: What is the score for Hamburgefonts?
It lacks, of course, diagonals—an example of what Alastair Johnston called “prime rib” in Alphabets to Order, omitting the bothersome, nasty, spiky characters in specimens which were, after all, sales tools. For a similar reason Latin lower case (Quousque tandem etc…) was favored for body text, the spacing always being nice, before massive kerning became viable, to better fit the diagonal letters.
Of more use to type designers, “-iv” has been added: Hamburgefontsiv.
{'hamburgefontsiv': 178, 'hamburgefonts': 159, 'housewarming': 156, 'motherfucking': 155, 'backgrounds': 147, 'thunderclaps': 146, 'considerably': 146, 'unforgivable': 146, 'malnourished': 145, 'guardsmen': 144, 'macpherson': 143, 'motherfuckin': 143, ...
0 -
Johannes Neumeier said: Some time ago I wrote a Glyphs plugin for this purpose, but crucially it will find words with the selected letters and spellable with the available letters in the font, which is much more useful for work-in-progress proofing. There are two criteria for your algorithm.
1 -
Stephen Coles said: This is a bit of a digression from your project, Hugo, but I hope it’s helpful: When I began working on The Anatomy of Type, it became clear I’d need a bank of pithy words that could be used to illustrate the distinctive characteristics of each featured typeface. We settled on a set of letters that contained most of a typeface’s DNA: aegilos, at least one capital (preferring GRSQ), and at least one diagonal stroke. The list is now up in the Typographica Library.
Here is the rating for each lowercase/uppercase letter:
"a": 4, "A": 2
"b": 2, "B": 3
"c": 1, "C": 1
"d": 2, "D": 1
"e": 2, "E": 1
"f": 2, "F": 1
"g": 4, "G": 3
"h": 2, "H": 1
"i": 1, "I": 1
"j": 2, "J": 1
"k": 3, "K": 2
"l": 1, "L": 1
"m": 2, "M": 2
"n": 1, "N": 2
"o": 1, "O": 1
"p": 2, "P": 2
"q": 2, "Q": 2
"r": 3, "R": 3
"s": 3, "S": 2
"t": 2, "T": 1
"u": 1, "U": 1
"v": 2, "V": 1
"w": 2, "W": 2
"x": 2, "X": 1
"y": 3, "Y": 2
"z": 2, "Z": 1
2 -
The usefulness of particular test or specimen words or pseudo-words depends on what one is wanting to test or illustrate, which will vary at different stages of typeface development and, after, in presentation of the design. The first pseudo-word I type in the early stages of a Latin typeface design is almost always ‘nihilim’, because those (with n and sometimes o) are the letters I have on hand and this sequence of letters is useful for testing rhythm of proportion and spacing.
1
-
nuonuvnu is where I start.
1
-
John Hudson said: The usefulness of particular test or specimen words or pseudo-words depends on what one is wanting to test or illustrate, which will vary at different stages of typeface development and, after, in presentation of the design. The first pseudo-word I type in the early stages of a Latin typeface design is almost always ‘nihilim’, because those (with n and sometimes o) are the letters I have on hand and this sequence of letters is useful for testing rhythm of proportion and spacing.
I added an option to do the opposite: when a word contains more than one letter of the same type, it gains more points. For example, with this option "chocolate" gets a better score, because it contains 3 "round" letters ["o", "c", "e"].
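A minimal sketch of how such an option might work, using the lowercase group lists that appear later in this thread. The function name `consistency_bonus` and the `bonus` weight are assumptions, and counting distinct letters (rather than every occurrence) is inferred from the "chocolate" example.

```python
# Group lists as posted later in the thread (lowercase Latin, subset)
GROUPS = {
    "round": set("ceos"),
    "oblique": set("vwyzkx"),
    "ascender": set("bdfhklt"),
    "descender": set("pqyjg"),
}


def consistency_bonus(word, group, bonus=1):
    """Reward words that contain more than one distinct letter of the given group."""
    count = len(set(word) & GROUPS[group])
    return count * bonus if count > 1 else 0
```

Under these assumptions `consistency_bonus("chocolate", "round")` gives 3, while a word with a single round letter gets no bonus.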
This is useful for finding words to check the consistency of round, oblique, ascender, or descender letters.
2 -
But no diagonals, unfortunately.
2 -
I finished this script and use its data in my new Glyphs plugin: Context Manager
This algorithm for finding the best proofing words works fine for Latin letters, but it can be greatly improved for other scripts.
For each script I need to have something like this:
Latin = {
    "lowercase": {
        "groups": {
            'descender': ['p', 'q', 'y', 'j', 'g'],
            'ascender': ['b', 'd', 'f', 'h', 'k', 'l', 't'],
            'round': ['c', 'e', 'o', 's'],
            'oblique': ['v', 'w', 'y', 'z', 'k', 'x'],
            'short': ['r', 't', 'f', 'i', 'j', 'l'],
            'shoulder': ['n', 'h', 'u'],
            'bowl': ['b', 'd', 'p', 'q']
        },
        "rating": {'a': 4, 'b': 2, 'c': 1, 'd': 2, 'e': 2, 'f': 2, 'g': 4, 'h': 2, 'i': 1, 'j': 2, 'k': 3, 'l': 1, 'm': 2, 'n': 1, 'o': 1, 'p': 2, 'q': 2, 'r': 3, 's': 3, 't': 2, 'u': 1, 'v': 2, 'w': 2, 'x': 2, 'y': 3, 'z': 2}
    },
    "uppercase": {
        "groups": {
            'round': ['C', 'D', 'G', 'O', 'Q'],
            'horizontal': ['A', 'E', 'F', 'H', 'I', 'T', 'Z'],
            'bowl': ['B', 'P', 'R'],
            'oblique': ['V', 'W', 'Y', 'Z', 'K', 'X', 'A']
        },
        "rating": {'A': 2, 'B': 3, 'C': 1, 'D': 1, 'E': 1, 'F': 1, 'G': 3, 'H': 1, 'I': 1, 'J': 1, 'K': 2, 'L': 1, 'M': 2, 'N': 2, 'O': 1, 'P': 2, 'Q': 2, 'R': 3, 'S': 2, 'T': 1, 'U': 1, 'V': 1, 'W': 2, 'X': 1, 'Y': 2, 'Z': 1}
    }
}
"groups" are lists of letters sharing the same characteristics.
"rating" gives a score to each letter, depending on how much or how little of the font's character that letter shows.
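To illustrate how data in this shape could be consumed for any script, here is a small sketch that profiles a word against one case of a script definition. The `profile` function is hypothetical; only the lowercase Latin data is copied from the post above.

```python
# Lowercase Latin data as posted above
LATIN_LOWER = {
    "groups": {
        "descender": ["p", "q", "y", "j", "g"],
        "ascender": ["b", "d", "f", "h", "k", "l", "t"],
        "round": ["c", "e", "o", "s"],
        "oblique": ["v", "w", "y", "z", "k", "x"],
        "short": ["r", "t", "f", "i", "j", "l"],
        "shoulder": ["n", "h", "u"],
        "bowl": ["b", "d", "p", "q"],
    },
    "rating": {"a": 4, "b": 2, "c": 1, "d": 2, "e": 2, "f": 2, "g": 4, "h": 2,
               "i": 1, "j": 2, "k": 3, "l": 1, "m": 2, "n": 1, "o": 1, "p": 2,
               "q": 2, "r": 3, "s": 3, "t": 2, "u": 1, "v": 2, "w": 2, "x": 2,
               "y": 3, "z": 2},
}


def profile(word, data):
    """Return (sum of letter ratings, number of letters of the word in each group)."""
    rating = sum(data["rating"].get(ch, 0) for ch in word)  # unknown letters add 0
    groups = {
        name: sum(ch in members for ch in word)
        for name, members in data["groups"].items()
    }
    return rating, groups
```

Because `profile` only reads the "groups" and "rating" keys, a definition for another script with the same two keys could be passed in unchanged.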
If anyone is interested in helping me with other scripts, I'd love to hear from you.
1 -
t does not belong to ‘ascender’.
0 -
Hugo Jourdan said:
Depending on whether the word responds positively or negatively to a condition its score increases or decreases.
... Do you have any others idea of condition? ...
I have an idea for a new condition in your list: "Kerning"
Some letter pairs naturally tend to need kerning, while others do not (for example: AV, LV, LT, PJ, Av, Vo, vo, ke, xc, etc.).
It can be useful to purposely avoid these letter pairs in the early development stages (when we have not kerned the font yet), since they can mislead our spacing/rhythm decisions.
In early stages we may want the occurrence of these pairs to lower the score.
On the other hand, in later stages when we work on kerning, we may want the opposite: to avoid letters with no kerning pairs and find words with many kerned pairs, so the occurrence of these pairs should increase the score.
So, we may want to choose how kerning scores from these 3 options:
1) Ignore kerning pairs (as it is now)
2) No kerning pairs (decrease score, to avoid rhythm contamination)
3) Many kerning pairs (increase score, to check for kerning consistency)
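The three options could be sketched as a score adjustment added to a word's existing score. The pair set below contains only the examples given in this comment, and the mode names, the `weight` value, and the function name `kerning_adjustment` are assumptions made for illustration.

```python
# Example pairs taken from the comment; a real list would be much longer
KERNING_PAIRS = {"AV", "LV", "LT", "PJ", "Av", "Vo", "vo", "ke", "xc"}


def kerning_adjustment(word, mode="ignore", weight=2):
    """Return the score change for a word under one of the three kerning options."""
    if mode == "ignore":  # option 1: kerning pairs do not affect the score
        return 0
    # Count adjacent letter pairs that typically need kerning
    hits = sum(word[i:i + 2] in KERNING_PAIRS for i in range(len(word) - 1))
    if mode == "avoid":   # option 2: penalise pairs to protect spacing/rhythm decisions
        return -weight * hits
    if mode == "prefer":  # option 3: reward pairs to check kerning consistency
        return weight * hits
    raise ValueError(f"unknown mode: {mode}")
```

For "Avocado", which contains "Av" and "vo", the adjustment would be 0, -4 or +4 depending on the chosen mode.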
The problem is that there is no "standard" kerning pairs list for all typeface designs, since the pairs vary from typeface to typeface. To solve that, I made a Python script (1) that summarized the most common pairs across 1000 great fonts, and compiled the results in my Font Testing Page a few years ago. You can see the resulting pairs here (they are shown in the context of control letters, like HH or OO, but I hope you can easily extract the pairs to create a list you can use):
http://www.cyreal.org/Font-Testing-Page/index-latin-02.php (navigate to "Minimal Kerning Pairs" tab)
(1) https://github.com/impallari/Impallari-Fontlab-Macros/blob/master/IMP Kerning/21 Analize Kerning all fonts.py
By the way, if you need a Spanish dictionary, mine is here for you to grab:
https://github.com/impallari/Font-Testing-Page/tree/master/includes/dictionaries
Also, if you are curious, I have another "proof of concept" idea for a tool that shows kerning in an easy and intuitive way, for both kerned and unkerned pairs, in the context of long words here: https://github.com/impallari/Contextual-Kerning-Tool
Feel free to make it happen as a glyphs plugin if you like it.
Again, many many congrats for your awesome algorithm. I love it!!!
2 -
PabloImpallari said:
I have an idea for a new condition in your list: "Kerning"
Because some letter pairs have a natural tendency for the need of kerning while others not. (for example: AV,LV, LT, PJ, Av, Vo, vo, ke, xc etc)
I think I will use the Relevant Kerning List made by Andre Fuchs.
I will also add an extra filter in my Context Manager Plugin, to filter words with potential kerning.
I will soon make a repo with my algorithm.
2