How to determine a perfect test/proofing word?

Hugo Jourdan
Hugo Jourdan Posts: 10
edited October 2022 in Technique and Theory
Hey,

I'm currently working on an open-source algorithm to determine, from a list of words, which one is the best to proof/test a font.
I work mostly with language dictionaries as input. My script sets a "score" for each word based on different conditions.
Depending on whether the word responds positively or negatively to a condition, its score increases or decreases.

For now, here are some conditions already set:

- Word length: The longer a word is, the more points it gains.

- No repeated letters: If a word doesn't have the same letter twice, it gains points.
Because the goal is to check as many letters as possible, a repeated letter is not something you want (most of the time).

- No hyphen: If there is a hyphen in the word, it loses points.
A hyphen breaks the word rhythm.

- Letter singularity: Each letter in the word earns points depending on how "singular" it is. An "a" brings more points than an "i" or an "n", for example.
Ideally, we want a word to contain letters that show the characteristics of a font.

- Diversity: If a word contains too many oblique/round/descender/ascender/short letters, it loses points.
Being able to see the height of the descenders/ascenders, how obliques behave against vertical stems, and how round shapes work with straight shapes are all useful things.

At the end, my script returns an ordered list of the words with the highest scores.
Here is a short example with an English dictionary as input:

{'housewarming': 156, 'motherfucking': 155, 'backgrounds': 147, 'thunderclaps': 146, 'considerably': 146, 'unforgivable': 146, 'indistinguishable': 145, 'malnourished': 145, 'counterintelligence': 145, 'misunderstandings': 145, 'guardsmen': 144, 'macpherson': 143, 'motherfuckin': 143, 'buckingham' ...}
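The conditions listed above could be sketched roughly as follows. This is only an illustration, not the actual script: the function names `score_word` and `rank` and every bonus or penalty value are assumptions, and the letter ratings are the lowercase ones posted further down in this thread. The diversity condition is left out to keep the sketch short.

```python
# Lowercase letter ratings as posted later in this thread.
RATING = {'a': 4, 'b': 2, 'c': 1, 'd': 2, 'e': 2, 'f': 2, 'g': 4, 'h': 2,
          'i': 1, 'j': 2, 'k': 3, 'l': 1, 'm': 2, 'n': 1, 'o': 1, 'p': 2,
          'q': 2, 'r': 3, 's': 3, 't': 2, 'u': 1, 'v': 2, 'w': 2, 'x': 2,
          'y': 3, 'z': 2}

# Assumed weights: the thread does not state the real values.
NO_REPEAT_BONUS = 5
HYPHEN_PENALTY = 5


def score_word(word):
    """Score one word on length, repeats, hyphens and letter singularity."""
    word = word.lower()
    letters = [ch for ch in word if ch.isalpha()]
    score = len(letters)                         # word length
    if len(set(letters)) == len(letters):
        score += NO_REPEAT_BONUS                 # no repeated letters
    if "-" in word:
        score -= HYPHEN_PENALTY                  # hyphen breaks the rhythm
    score += sum(RATING.get(ch, 1) for ch in letters)  # letter singularity
    return score


def rank(words):
    """Return a dict of word -> score, highest score first."""
    scores = {w: score_word(w) for w in words}
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


print(rank(["well-being", "housewarming", "banana"]))
```

Because the weights are invented, the numbers this prints will not match the scores shown in the post; only the ordering logic is meant to carry over.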


For the moment, my script is made specifically for Latin, but I would like to integrate as many scripts as possible.
Do you have any other ideas for conditions that would improve the algorithm for Latin?
Or any additional conditions for other scripts?

Comments

  • Craig Eliason
    Craig Eliason Posts: 1,436
    edited October 2022
    Maybe it should have an initial cap, to show cap height.
    (R may be the preferable cap letter, as it has a vertical stem, a bowl, *and* a diagonal. What was your highest-scoring word that started with R?)
  • You are pointing out something important. My script uses lowercase letters by default.
    So the results are the "best" words set in lowercase. It could be great to add another option to find the "best" word set in UPPERCASE.

    Here are the 10 words with the highest scores starting with an "r":
    {'r': ['regulations', 'republicans', 'representatives', 'rosamund', 'replaying', 'ramblings', 'relocating', 'regionals', 'regulation', 'readings']}
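A per-initial listing like the one above can be derived from any word-to-score mapping. A minimal sketch, assuming a hypothetical helper named `top_by_initial`; apart from 'housewarming', the scores in the usage example are made up for illustration.

```python
from collections import defaultdict


def top_by_initial(scores, n=10):
    """Group the n highest-scoring words under each initial letter."""
    buckets = defaultdict(list)
    ordered = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    for word, _score in ordered:
        if not word:
            continue                      # skip empty entries
        initial = word[0]
        if len(buckets[initial]) < n:
            buckets[initial].append(word)
    return dict(buckets)


# Illustrative scores only; three of the four values are invented.
scores = {"housewarming": 156, "regulations": 140,
          "republicans": 139, "readings": 120}
print(top_by_initial(scores, n=2))
```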
  • It could be great to add another option to find the "best" word set in UPPERCASE.

    Yes but also Mixed Case, as neither UPPERCASE nor lowercase would in themselves show the relationships between the two.
  • John Hudson
    John Hudson Posts: 3,186
    I like this approach very much. Will be interested to see it adapted to words from languages other than English first, and then to Cyrillic and Greek. Your concept of ‘singular’ will be useful for Cyrillic; less so for Greek, since basically all the Greek lowercase letters can be considered singulars.

    For many languages, you will probably want to ignore the presence of accent signs on letters when identifying words in this way; i.e. your results should include words with accented letters scored the same as the base letters. Although such words may not be helpful for testing during initial type design stages when diacritic marks have not been created, they give a more complete impression of characters within the language.
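One way to score accented words the same as their base letters is to strip combining marks before looking letters up. A minimal sketch using Python's standard `unicodedata` module; the function name `strip_accents` is an assumption. Letters that have no canonical decomposition, such as ø, pass through unchanged and would need their own mapping.

```python
import unicodedata


def strip_accents(word):
    """Return the word with combining diacritical marks removed."""
    decomposed = unicodedata.normalize("NFD", word)   # split base + marks
    return "".join(ch for ch in decomposed
                   if not unicodedata.combining(ch))  # drop the marks


print(strip_accents("Árvíztűrő tükörfúrógép"))
```

The original spelling can still be kept alongside the stripped form, so that the accented word is what ends up in the proof.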
  • I have no ‘algorithm’ for this, but I have always found endumachtoligryphe fairly demonstrative for reviewing my designs.
    For looking at various languages I like to use pangrams, like:
    Árvíztűrő tükörfúrógép
    Nechť již hříšné saxofony ďáblů rozezvučí síň úděsnými tóny waltzu
    Törkylempijävongahdus
    Høj bly gom vandt fræk sexquiz på wc

  • Great tool! You don't mention whether diagonal strokes are part of the criteria. I’d give letters with diagonals some points in your system because diagonals usually differ from other straight strokes and often hide errors in a type design. Letters with diagonals are missing from many of your highest scoring words.
  • Nick Shinn
    Nick Shinn Posts: 2,207
    edited October 2022
    What is the score for Hamburgefonts?

    It lacks, of course, diagonals—an example of what Alastair Johnston called “prime rib” in Alphabets to Order, omitting the bothersome, nasty, spiky characters in specimens which were, after all, sales tools. For a similar reason Latin lower case (Quousque tandem etc…) was favored for body text, the spacing always being nice, before massive kerning became viable, to better fit the diagonal letters.

    Of more use to type designers, “-iv” has been added: Hamburgefontsiv.
  • Hamburgefonts is a classic, Hamburgefonstiv one of my favourites.
    In the early stages of a design process I like to look at daumenchor. Nonsense words don’t distract the mind. Next come Hengoldyframpusch or Dampfguldenchorist. The trick is, they look like real words …
    Of course Downton Abbey Road has some merits, too.
  • Some time ago I wrote a Glyphs plugin for this purpose, but crucially it will find words with the selected letters and spellable with the available letters in the font, which is much more useful for work-in-progress proofing. There are two criteria for your algorithm. :)
  • How about a "Next glyph to design" script that assesses which glyphs are not yet designed, and reports which of those, combined with the ones that are done, would add most to the list of words that could be spelled?
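The "next glyph to design" idea could be sketched as below. This is a rough illustration, not an existing plugin: the names `spellable` and `next_glyph` are hypothetical, and a glyph's gain is counted as the number of dictionary words that become spellable once that single glyph is added.

```python
def spellable(word, available):
    """True if every character of the word is in the available set."""
    return set(word) <= available


def next_glyph(words, designed, candidates):
    """Rank undesigned glyphs by how many new words each one unlocks."""
    designed = set(designed)
    gains = {}
    for glyph in set(candidates) - designed:
        with_glyph = designed | {glyph}
        # Only words containing the glyph can be newly unlocked by it.
        gains[glyph] = sum(1 for w in words
                           if glyph in w and spellable(w, with_glyph))
    # Highest gain first; ties broken alphabetically.
    return sorted(gains.items(), key=lambda kv: (-kv[1], kv[0]))


words = ["nihil", "million", "him", "moon", "noon", "onion"]
print(next_glyph(words, designed="nihl", candidates="mo"))
```

Counting only single-glyph gains is a simplification; a word that needs two missing glyphs contributes nothing until one of them has been drawn.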
  • Nick Shinn
    Nick Shinn Posts: 2,207
    edited October 2022
    For capitals I like MASSIVE ATTACK, HEMINGWAY, and LAWYERS.
    The premise being that if you get the hard stuff right first, the rest falls into place.
  • This is a bit of a digression from your project, Hugo, but I hope it’s helpful: When I began working on The Anatomy of Type, it became clear I’d need a bank of pithy words that could be used to illustrate the distinctive characteristics of each featured typeface. We settled on a set of letters that contained most of a typeface’s DNA: aegilos, at least one capital (preferring GRSQ), and at least one diagonal stroke. The list is now up in the Typographica Library.
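The selection criteria described here could be checked mechanically. A rough sketch, assuming a hypothetical `specimen_fit` function; the set of diagonal letters is an assumption borrowed from the 'oblique' groups posted later in this thread, not from The Anatomy of Type.

```python
KEY_LETTERS = set("aegilos")        # letters named in the post
PREFERRED_CAPS = set("GRSQ")        # preferred capitals named in the post
DIAGONALS = set("vwyzkxVWYZKXA")    # assumption: taken from the 'oblique' groups


def specimen_fit(word):
    """Report how well a word matches the criteria described above."""
    chars = set(word)
    return {
        "key_letters": len(chars & KEY_LETTERS),
        "has_capital": any(ch.isupper() for ch in word),
        "preferred_capital": bool(chars & PREFERRED_CAPS),
        "has_diagonal": bool(chars & DIAGONALS),
    }


print(specimen_fit("Rosalyn"))
```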
  • What is the score for Hamburgefonts?

    It lacks, of course, diagonals—an example of what Alastair Johnston called “prime rib” in Alphabets to Order, omitting the bothersome, nasty, spiky characters in specimens which were, after all, sales tools. For a similar reason Latin lower case (Quousque tandem etc…) was favored for body text, the spacing always being nice, before massive kerning became viable, to better fit the diagonal letters.

    Of more use to type designers, “-iv” has been added: Hamburgefontsiv.
    Not surprisingly, they are at the top of the list:

     {'hamburgefontsiv': 178, 'hamburgefonts': 159, 'housewarming': 156, 'motherfucking': 155, 'backgrounds': 147, 'thunderclaps': 146, 'considerably': 146, 'unforgivable': 146, 'malnourished': 145, 'guardsmen': 144, 'macpherson': 143, 'motherfuckin': 143, ...
  • Some time ago I wrote a Glyphs plugin for this purpose, but crucially it will find words with the selected letters and spellable with the available letters in the font, which is much more useful for work-in-progress proofing. There are two criteria for your algorithm. :)
    This script is part of a future Glyphs plugin, presented here: Find Context - Glyph Plugin. I decided to write this script to help me build a basic database for this plugin.

  • This is a bit of a digression from your project, Hugo, but I hope it’s helpful: When I began working on The Anatomy of Type, it became clear I’d need a bank of pithy words that could be used to illustrate the distinctive characteristics of each featured typeface. We settled on a set of letters that contained most of a typeface’s DNA: aegilos, at least one capital (preferring GRSQ), and at least one diagonal stroke. The list is now up in the Typographica Library.
    This is not a digression at all, thanks for this resource. It's exactly how my script works.
    Here is the rating for each lowercase/uppercase letter:

    "a":4, "A":2
    "b":2, "B":3
    "c":1, "C":1
    "d":2, "D":1
    "e":2, "E":1
    "f":2, "F":1
    "g":4, "G":3
    "h":2, "H":1
    "i":1, "I":1
    "j":2, "J":1
    "k":3, "K":2
    "l":1, "L":1
    "m":2, "M":2
    "n":1, "N":2
    "o":1, "O":1
    "p":2, "P":2
    "q":2, "Q":2
    "r":3, "R":3
    "s":3, "S":2
    "t":2, "T":1
    "u":1, "U":1
    "v":2, "V":1
    "w":2, "W":2
    "x":2, "X":1
    "y":3, "Y":2
    "z":2, "Z":1
  • John Hudson
    John Hudson Posts: 3,186
    edited October 2022
    The usefulness of particular test or specimen words or pseudo-words depends on what one is wanting to test or illustrate, which will vary at different stages of typeface development and, after, in presentation of the design. The first pseudo-word I type in the early stages of a Latin typeface design is almost always ‘nihilim’, because those—with n and sometimes o—are the letters I have on hand and this sequence of letters is useful for testing rhythm of proportion and spacing.
  • Chris Lozos
    Chris Lozos Posts: 1,458
    nuonuvnu is where I start.

  • The usefulness of particular test or specimen words or pseudo-words depends on what one is wanting to test or illustrate, which will vary at different stages of typeface development and, after, in presentation of the design. The first pseudo-word I type in the early stages of a Latin typeface design is almost always ‘nihilim’, because those—with n and sometimes o—are the letters I have on hand and this sequence of letters is useful for testing rhythm of proportion and spacing.
    By default, my script gives a better score when a word contains only one letter of each type (descender, ascender, oblique, short, round).

    I added an option to do the opposite: when a word contains more than one letter of a given type, it gains more points. For example, with this option chocolate gets a better score because it contains 3 "round" letters ["o", "c", "e"].

    This is useful for finding words to check the consistency of round /or/ oblique /or/ ascender /or/ descender letters.
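The default behaviour and the inverted option could be sketched like this. The function name `diversity_score`, the `favour_repeats` flag and the one-point increments are assumptions; the letter groups are the lowercase ones posted later in this thread, and letters are counted once each, matching the "3 round letters" count given for chocolate.

```python
GROUPS = {
    "descender": set("pqyjg"),
    "ascender": set("bdfhklt"),
    "round": set("ceos"),
    "oblique": set("vwyzkx"),
    "short": set("rtfijl"),
}


def diversity_score(word, favour_repeats=False):
    """Score a word on how its letters spread across the shape groups."""
    distinct = set(word.lower())
    score = 0
    for letters in GROUPS.values():
        count = len(distinct & letters)       # distinct letters of this type
        if favour_repeats:
            score += max(count - 1, 0)        # reward each extra letter
        elif count == 1:
            score += 1                        # reward exactly one per type
    return score


print(diversity_score("chocolate"), diversity_score("chocolate", favour_repeats=True))
```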
  • Nick Shinn
    Nick Shinn Posts: 2,207

    But no diagonals, unfortunately.
  • Hugo Jourdan
    Hugo Jourdan Posts: 10
    edited October 2022
    I finished this script and use its data in my new Glyphs plugin: Context Manager
    This algorithm for finding the best proofing words works fine for Latin letters, but it can be greatly improved for other scripts.

    For each script I need to have something like this:

    Latin = {
        "lowercase": {
            "groups": {
                'descender': ['p', 'q', 'y', 'j', 'g'],
                'ascender': ['b', 'd', 'f', 'h', 'k', 'l', 't'],
                'round': ['c', 'e', 'o', 's'],
                'oblique': ['v', 'w', 'y', 'z', 'k', 'x'],
                'short': ['r', 't', 'f', 'i', 'j', 'l'],
                'shoulder': ['n', 'h', 'u'],
                'bowl': ['b', 'd', 'p', 'q'],
            },
            "rating": {'a': 4, 'b': 2, 'c': 1, 'd': 2, 'e': 2, 'f': 2, 'g': 4, 'h': 2, 'i': 1, 'j': 2, 'k': 3, 'l': 1, 'm': 2, 'n': 1, 'o': 1, 'p': 2, 'q': 2, 'r': 3, 's': 3, 't': 2, 'u': 1, 'v': 2, 'w': 2, 'x': 2, 'y': 3, 'z': 2},
        },
        "uppercase": {
            "groups": {
                'round': ['C', 'D', 'G', 'O', 'Q'],
                'horizontal': ['A', 'E', 'F', 'H', 'I', 'T', 'Z'],
                'bowl': ['B', 'P', 'R'],
                'oblique': ['V', 'W', 'Y', 'Z', 'K', 'X', 'A'],
            },
            "rating": {'A': 2, 'B': 3, 'C': 1, 'D': 1, 'E': 1, 'F': 1, 'G': 3, 'H': 1, 'I': 1, 'J': 1, 'K': 2, 'L': 1, 'M': 2, 'N': 2, 'O': 1, 'P': 2, 'Q': 2, 'R': 3, 'S': 2, 'T': 1, 'U': 1, 'V': 1, 'W': 2, 'X': 1, 'Y': 2, 'Z': 1},
        },
    }

    groups are lists of letters sharing the same characteristics.
    rating is the score for each letter, depending on how much of the font's character the letter shows.
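With data in this shape, the letter ratings of a mixed-case word can be summed by picking the table that matches each letter's case. A minimal sketch, assuming a hypothetical function `rating_sum`; only the "rating" tables from the post are reproduced, and characters missing from both tables count as zero, which is an assumption.

```python
LATIN = {
    "lowercase": {"rating": {
        'a': 4, 'b': 2, 'c': 1, 'd': 2, 'e': 2, 'f': 2, 'g': 4, 'h': 2,
        'i': 1, 'j': 2, 'k': 3, 'l': 1, 'm': 2, 'n': 1, 'o': 1, 'p': 2,
        'q': 2, 'r': 3, 's': 3, 't': 2, 'u': 1, 'v': 2, 'w': 2, 'x': 2,
        'y': 3, 'z': 2}},
    "uppercase": {"rating": {
        'A': 2, 'B': 3, 'C': 1, 'D': 1, 'E': 1, 'F': 1, 'G': 3, 'H': 1,
        'I': 1, 'J': 1, 'K': 2, 'L': 1, 'M': 2, 'N': 2, 'O': 1, 'P': 2,
        'Q': 2, 'R': 3, 'S': 2, 'T': 1, 'U': 1, 'V': 1, 'W': 2, 'X': 1,
        'Y': 2, 'Z': 1}},
}


def rating_sum(word, script=LATIN):
    """Sum per-letter ratings, using the table that matches each case."""
    total = 0
    for ch in word:
        case = "uppercase" if ch.isupper() else "lowercase"
        total += script[case]["rating"].get(ch, 0)   # unknown chars: 0
    return total


print(rating_sum("Hamburg"), rating_sum("hamburg"))
```

Passing the data in as an argument means another script's table could be swapped in without touching the function.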

    If anyone is interested in helping me with other scripts, I'd love to hear from you. 

  • t does not belong to ‘ascender’.
  • PabloImpallari
    PabloImpallari Posts: 806
    edited October 2022

    Depending on whether the word responds positively or negatively to a condition its score increases or decreases.
    ... Do you have any others idea of condition? ...

    Hi Hugo!! Awesome idea. Love it!

    I have an idea for a new condition in your list: "Kerning"
    Some letter pairs have a natural tendency to need kerning while others do not (for example: AV, LV, LT, PJ, Av, Vo, vo, ke, xc, etc.).

    It can be useful for us to purposely avoid these letter pair combinations in the early development stages (when we have not kerned the font yet), since they can mislead our spacing/rhythm decisions.
    In early stages we may want the occurrence of these pairs to lower the score.

    On the other hand, in later stages when we work on kerning we may want the opposite: to avoid letters with no kerning pairs and find words with many kerned pairs, so we may want to increase the score.

    So, we may want to choose how kerning scores from these 3 options:
    1) Ignore kerning pairs (as it is now)
    2) No kerning pairs (decrease score, to avoid rhythm contamination)
    3) Many kerning pairs (increase score, to check for kerning consistency)
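The three options could be expressed as a mode switch on a score adjustment. A sketch only: the function name `kerning_adjustment`, the `mode` names and the `weight` value are assumptions, and the pair list is just the handful of examples mentioned in this post, not a real kerning survey.

```python
# Example pairs quoted in the post; a real list would be much longer.
KERN_PAIRS = {"AV", "LV", "LT", "PJ", "Av", "Vo", "vo", "ke", "xc"}


def kerning_adjustment(word, mode="ignore", weight=2):
    """Return the score change for a word under the chosen kerning mode."""
    # Count adjacent letter pairs that appear in the kerning list.
    pairs = sum(1 for a, b in zip(word, word[1:]) if a + b in KERN_PAIRS)
    if mode == "ignore":          # option 1: kerning plays no role
        return 0
    if mode == "avoid":           # option 2: penalise kerning-prone pairs
        return -weight * pairs
    if mode == "favour":          # option 3: reward kerning-prone pairs
        return weight * pairs
    raise ValueError(f"unknown mode: {mode!r}")


print(kerning_adjustment("Avoke", mode="avoid"))
```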

    The problem is that there is no "standard" kerning pairs list for all typeface designs, since the pairs vary with each typeface. To solve that, I made a Python script (1) that summarizes the most common pairs across 1000 great fonts, and compiled the results in my Font Testing Page a few years ago. You can see the resulting pairs here (they are shown in the context of control letters, like HH or OO, but I hope you can easily extract the pairs to create a list you can use):
    http://www.cyreal.org/Font-Testing-Page/index-latin-02.php (navigate to "Minimal Kerning Pairs" tab)

    (1) https://github.com/impallari/Impallari-Fontlab-Macros/blob/master/IMP Kerning/21 Analize Kerning all fonts.py

    By the way, if you need a Spanish dictionary, mine is here for you to grab:
    https://github.com/impallari/Font-Testing-Page/tree/master/includes/dictionaries

    Also, if you are curious, I have another "proof of concept" idea for a tool that shows kerning in an easy and intuitive way, for both kerned and unkerned pairs, in the context of long words here: https://github.com/impallari/Contextual-Kerning-Tool
    Feel free to make it happen as a glyphs plugin if you like it.

    Again, many many congrats for your awesome algorithm. I love it!!!

  • I have an idea for a new condition in your list: "Kerning"
    Because some letter pairs have a natural tendency for the need of kerning while others not. (for example: AV,LV, LT, PJ, Av, Vo, vo, ke, xc etc)

    This is a really relevant idea, I will add it. Thanks, Pablo!
    I think I will use the Relevant Kerning list made by Andre Fuchs.

    I will also add an extra filter in my Context Manager Plugin, to filter words with potential kerning.

    I will soon make a repo with my algorithm.