Type specimen word/phrase generator?
Stephen Coles
While working on my book, I needed words or phrases that contained the most distinctive glyphs from each typeface while meeting width and language requirements, along with other criteria. Finding these phrases can be fun, but it can also be much more difficult and time-consuming than one might think, particularly with unfamiliar languages. I got some help from Scrabble tools and a few friends (Frank Grießhammer, Tânia Raposo, Miguel Sousa, Laura Serra) and ended up with a pretty good list, but the experience made me wish for a tool to assist with future projects.
Of course, there are a few fine tools developed to help type designers test and proof their fonts. These include:
- adhesiontext by Miguel Sousa
- Just Another Test Text Generator by Tim Ahrens
- Wordlist Maker by Matthew Butterick
- word-o-mat (RoboFont extension) by Nina Stössinger
- Typable by Ondrej Jób
- Select language(s)
- Define key characters: ______ (case sensitive)
- Allow characters other than key characters: yes/no
- Allow proper nouns: yes/no
- Allow words, phrases, or both
- Sort results: alphabetically, by word length (# of letters), or by # of key characters contained
e.g. Input: gaoesf
Output: flagpoles flages poles foes lop …
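The wish list above can be sketched in a few lines of Python. This is only a rough illustration, not an existing tool: the function name `filter_words`, its parameters, and the tiny sample list are all made up here, and a real version would load a proper dictionary for each selected language.

```python
def filter_words(words, key_chars, allow_other=True, allow_proper=False):
    """Return words that contain key characters, best matches first."""
    keys = set(key_chars)  # case sensitive, as in the wish list
    results = []
    for word in words:
        # Crude proper-noun test (assumption): an initial capital.
        if not allow_proper and word[:1].isupper():
            continue
        letters = set(word)
        hits = len(letters & keys)  # distinct key characters in the word
        if hits == 0:
            continue
        if not allow_other and not letters <= keys:
            continue
        results.append((hits, word))
    # Sort by number of key characters contained (descending), then A-Z.
    results.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in results]


# Hypothetical sample list; a real tool would read a large word list.
SAMPLE_WORDS = ["flagpoles", "flags", "poles", "foes", "lop", "Oslo", "zzz"]

print(filter_words(SAMPLE_WORDS, "gaoesf"))
# ['flagpoles', 'flags', 'foes', 'poles', 'lop']
```

Sorting alphabetically or by word length would only need a different sort key; phrases and per-language dictionaries are left out of this sketch.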
Comments
-
Bravo!
-
I'd like to be able to input some select caps along with lowercase and have the output do initial caps on words that start with them. E.g.:
input: HOEFTLhamburgefonstiv
output: rotgut art ion is gotten Hue man in a rumbustious overstriving Toosie brig a Fearnought gent Fine Fibroneuroma Ha unifarious aures Of unsnib This Tour attaint reinsist unmoor bumf tubbier Hah romaines sag banns revisitation internists as us Out roe veneers Ferrivorous stir Tubae a venerative gemmating Tee Or gig atrematous Fanga Entomion Firethorn Fetes so mob at nan Hot anaerobiosis some raga Engrammes gonangia Fuse is...
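One possible reading of this request, sketched in Python. The function name `apply_initial_caps` and the sample words are hypothetical, and this version capitalizes every eligible word, whereas the sample output above mixes capitalized and lowercase forms, so a real tool would presumably capitalize only some of them.

```python
def apply_initial_caps(words, key_chars):
    """Keep words spelled from the key characters, capitalizing where possible.

    Uppercase keys are allowed only as the first letter of a word;
    lowercase keys are allowed anywhere.
    """
    caps = {c for c in key_chars if c.isupper()}
    lower = {c for c in key_chars if c.islower()}
    out = []
    for word in words:
        if not word:
            continue
        first, rest = word[0], word[1:]
        # Everything after the first letter must come from the lowercase keys.
        if not set(rest) <= lower:
            continue
        if first.upper() in caps:
            out.append(first.upper() + rest)  # e.g. "hue" -> "Hue"
        elif first in lower:
            out.append(word)
    return out


# Hypothetical lowercase word list.
words = ["hue", "man", "tubbier", "xylophone", "loan"]
print(apply_initial_caps(words, "HOEFTLhamburgefonstiv"))
# ['Hue', 'man', 'Tubbier', 'Loan']
```

Note that "loan" only qualifies because L is among the caps: there is no lowercase l in the input, so the word can appear only with an initial capital.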
I don't think any of those tools do this yet (correct me if I'm wrong). Maybe stewf's "proper nouns" option would be getting at this already.
-
The code for my wordlist generator is available on GitHub, under the MIT license.
FWIW, this would be an excellent project for anyone who wants to learn a little about programming. It’s simple and teaches useful essentials (input / processing / output). I wrote mine in a combination of Perl & PHP (seemed like a good idea at the time) but you could do it equally well in Python or Racket or whatever pleases you. The basic idea (start with a giant list of words, filter them down to the ones you want) remains the same.
-
Phrases get a little bit more complicated, but Markov chains could be a fun way of generating reasonable phrases from a base word list (a Python library: https://pypi.python.org/pypi/PyMarkovChain/). You'd also potentially be able to choose what kind of phrases: tweets, Dickens, New York Times, etc.
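To make the Markov idea concrete, here is a minimal first-order chain in plain Python. It does not use the PyMarkovChain library linked above; the function names, the `allowed` parameter, and the toy corpus are assumptions for illustration only, and a useful generator would need a far larger corpus.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate_phrase(chain, start, max_words=8, allowed=None, rng=random):
    """Walk the chain from `start`, optionally restricted to `allowed` letters."""
    phrase = [start]
    while len(phrase) < max_words:
        candidates = chain.get(phrase[-1], [])
        if allowed is not None:
            # Keep only words spelled entirely from the allowed characters.
            candidates = [w for w in candidates if set(w) <= allowed]
        if not candidates:
            break
        phrase.append(rng.choice(candidates))
    return " ".join(phrase)


# Toy corpus (assumption); swap in tweets, Dickens, etc. for other flavors.
corpus = "the quick fox saw the quick dog and the lazy dog"
chain = build_chain(corpus)
print(generate_phrase(chain, "the", max_words=5, rng=random.Random(0)))
```

Training on a different source text is what would change the flavor of the phrases; the chain-walking code stays the same.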
-
Tim's is also libre, in Python, GPLv2, at https://github.com/justanotherfoundry/text-generator
There's also http://libretext.org/ which is PHP, Affero GPLv3, at https://github.com/garethsprice/libretext/
And there's also http://www.impallari.com/testing/ which has a Tools section with test text generators; it's PHP, MIT licensed, at https://github.com/impallari/font-testing-page/
I'm working on http://www.testmyfont.com (probably 100% JS, Apache) which I hope will provide family/style management, test text generation, sample test texts, comparison tools, and a question/answer UI.
-
Hey Stephen, your post got me thinking, so I started playing with some code.
Instead of sorting by # of letters, I'm making a tool that enables sorting by the total sum of the advance-width of all the glyphs included in each word (this is different for each font).
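The idea of summing advance widths can be sketched as follows. This is not the code of the tool being described; the function names and the advance values are invented for illustration, and a real tool would read the advances from the font itself.

```python
def word_width(word, advances):
    """Sum the advance widths (in font units) of each character in the word."""
    return sum(advances[ch] for ch in word)


def group_by_width(words, advances):
    """Group words by total advance width, narrowest first."""
    groups = {}
    for word in words:
        # Skip words containing characters the font data does not cover.
        if all(ch in advances for ch in word):
            groups.setdefault(word_width(word, advances), []).append(word)
    return dict(sorted(groups.items()))


# Hypothetical advance widths; every font will have different values.
ADVANCES = {"a": 500, "i": 250, "l": 250, "m": 800, "n": 550, "o": 550}

print(group_by_width(["nil", "oil", "lion", "moon"], ADVANCES))
# {1050: ['nil', 'oil'], 1600: ['lion'], 2450: ['moon']}
```

Words that land in the same group set to the same width in that font, as long as kerning is ignored.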
Attached two examples where all the words have the same advance width total sum, but the glyph count increases +1 in each row.
In the Encode sample, all the words have a total advance-width sum of 4189 units.
In the Libre Caslon Display sample, all have 2675.
-
Playing a little bit more...
Here each word advance width follows a Lucas progression with the next row.
Things get more interesting.
-
Ok, last one...
Claus's Playfair Display, all words have the same advance width across the family.
They are all set at 72pt.
No scaling, no tracking, no nothing... just typed as they are, Old-School style!
Making this last one was so fast and so easy that it feels like cheating...
-
If you're merely summing advance widths, you are cheating, and your proof demonstrates the flaw in this approach.
-
Yes Matthew, we also need to take kerning into account. But it's a lot more difficult to code that feature.
Already noted on my To-do list for next versions. In the meantime, we can simply pick words that are a few units larger or shorter, to compensate for kerning pairs.
All in all, it's not bad for an initial version. It can already save a lot of time as it is now.
Will record a mini tutorial on how to use the tool and post shortly.
-
we can simply pick words that are a few units larger or shorter, to compensate for kerning pairs.
Ah, but doing this accurately requires knowing the size of the kerning pairs, and if you know that … then you could’ve compensated for them at the outset. (And we still haven’t gotten to GPOS/GSUB adjustments, etc.)
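As a rough illustration of compensating at the outset: once the pair values are known, they can simply be added to the advance sum. The function name and all numbers below are hypothetical, and this sketch handles only flat pair kerning; it does not model other GPOS/GSUB behavior, which is where a shaping engine would be needed.

```python
def kerned_width(word, advances, kerning):
    """Advance-width sum plus the kerning value of each adjacent pair."""
    total = sum(advances[ch] for ch in word)
    # Pairs missing from the kerning table contribute nothing.
    total += sum(kerning.get((left, right), 0)
                 for left, right in zip(word, word[1:]))
    return total


# Hypothetical font data in font units.
ADVANCES = {"A": 700, "V": 680, "a": 500}
KERNING = {("A", "V"): -80, ("V", "A"): -80, ("V", "a"): -60}

print(kerned_width("AVA", ADVANCES, KERNING))  # 2080 - 160 = 1920
print(kerned_width("Va", ADVANCES, KERNING))   # 1180 - 60 = 1120
```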
If you like your approach, keep it. Having gone down a similar road with my text generator, I came to feel that the advance-widths shortcut doesn’t solve a problem, but rather defers it.
More recently I’ve been working with Pango and Harfbuzz (open-source text-layout engines) to get precise measurements of styled text. Ideally you would render the text with the same engine that you use to measure it. But short of that, it’s still a more accurate approach to text measurement. And not really “more difficult to code,” as most of the heavy lifting has already been done.
This relates to Stephen’s original question: existing text-generation tools (mine included) all rely on convenient-but-naive assumptions that limit their functionality. A genuinely better tool would probably want to be less naive.
-
Kerning feature added.
Sample just typed as is, no scaling, no tracking.
Early 1900s ATF style.
-
Nice work, Pablo.
-
Really nice. If I could feed it my own word lists that would be awesome.
-
Very cool, Pablo. Where can I try it?
-
Instead of sorting by # of letters, I'm making a tool that enables sorting by the total sum of the advance-width of all the glyphs included in each word (this is different for each font).
God, I love this. Genius stuff.
-
Stephen and all:
The first version of the Specimen-Helper tool is ready!
I will write a short tutorial on how to use it and record a short screen cast tomorrow.
For those of you who are brave enough to explore it without instructions, it is already available at http://www.impallari.com/testing/specimen-helper.php (although I recommend waiting for the tutorial, since it may look confusing at first glance).
You will also need this FontLab macro to grab some data for the advance widths and kerning values. It will generate 2 text files that you can copy and paste into the tool.
https://github.com/impallari/Impallari-Fontlab-Macros/blob/master/IMP Specimens/Get Data for Specimen.py
-
Quick and dirty intro
https://www.youtube.com/watch?v=kXYcDNrA5uI&feature=youtu.be
-
It's great. You're a legend, Impallari.
-
Phrases get a little bit more complicated, but Markov chains could be a fun way of generating reasonable phrases from a base word list (a Python library: https://pypi.python.org/pypi/PyMarkovChain/). You'd also potentially be able to choose what kind of phrases: tweets, Dickens, New York Times, etc.
I love that idea. I think OurType must be using Markov text generators for their text samples.
-
I tried this a few years ago - https://code.google.com/p/telaro/wiki/Home - but with limited success. Would love to see it done!
-
I did a few PHP Markov experiments a while ago, and they worked acceptably, but I ended up abandoning them because to train the model well you need a really, really huge seed corpus.
-
Maybe for 'adhesion' the limited number of words means you could develop a comprehensive corpus?
-
Not sure if it is helpful, but I watched a talk by Darius Kazemi (tinysubversions) and he mentioned that he used the Wordnik API, which returned bigrams, as part of one of his Twitter bots…
-
Pablo’s tool is super handy, especially for its width-fitting capabilities, but I’m still waiting for something that includes all the features in my challenge. Litscape’s Contain Minimally is still the best way I’ve found to get a list of words that include (but are not limited to) a specific set of letters, and then sort that list by length.
-
Frank Grießhammer just introduced me to Word Matcher, which can do the same thing, though with different source dictionaries and without the sorting functions.
-
For testing your font online, there is a script by John Harrington that replaces the font of any website with yours. Choose your favourite site and see how your typeface would look there: https://github.com/misemefein/font-tester
-
I need some python help to improve the macro that collects the data for the specimen tool:
https://github.com/impallari/Impallari-Fontlab-Macros/issues
Issue1: If the font has classes, it reports only the main pairs.
The script should "expand" the kerning table and report ALL the pairs.
This does affect the results of the specimen tool, so I consider it an important issue.
Issue2: It reports duplicated kerning pairs.
This is not a big problem since it doesn't affect the results of the specimen tool.
If anyone can help, thanks in advance.
https://github.com/impallari/Impallari-Fontlab-Macros/blob/master/IMP Specimens/Get Data for Specimen.py
In the meantime, to avoid issue 1:
Run the macro on a copy of your font having the kerning manually expanded.
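For anyone wanting to help, here is a rough, FontLab-independent sketch of what fixing both issues could look like. It is not the macro's code and uses no FontLab API: the function name and the data layout (pairs reported on key glyphs, plus dictionaries mapping each key glyph to its class members) are assumptions made for illustration, and class exceptions are not handled.

```python
def expand_kerning(pairs, left_classes, right_classes):
    """Expand key-glyph kerning pairs to all class members, without duplicates.

    pairs: iterable of (left_key, right_key, value)
    left_classes / right_classes: key glyph name -> list of member glyphs
    """
    flat = {}
    for left, right, value in pairs:
        # A glyph that belongs to no class stands for itself only.
        for l in left_classes.get(left, [left]):
            for r in right_classes.get(right, [right]):
                # setdefault keeps the first value seen, so repeated
                # pairs (Issue2) collapse into a single entry.
                flat.setdefault((l, r), value)
    return flat


# Hypothetical data: the A/V pair is reported twice, as in Issue2.
PAIRS = [("A", "V", -80), ("A", "V", -80), ("T", "o", -50)]
LEFT = {"A": ["A", "Aacute"]}
RIGHT = {"V": ["V", "W"], "o": ["o", "e"]}

for (l, r), value in expand_kerning(PAIRS, LEFT, RIGHT).items():
    print(l, r, value)
```

With the sample data this reports six distinct pairs: four expanded from A/V and two from T/o.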