Type specimen word/phrase generator?

Stephen Coles · July 2014

While working on my book, I needed words or phrases that contained the most distinctive glyphs from each typeface while meeting width and language requirements, along with other criteria. Finding these phrases can be fun, but it can also be much more difficult and time-consuming than one might think, particularly with unfamiliar languages. I got some help from Scrabble tools and a few friends (Frank Grießhammer, Tânia Raposo, Miguel Sousa, Laura Serra) and ended up with a pretty good list, but the experience made me wish for a tool to assist with future projects.

Of course, there are a few fine tools developed to help type designers test and proof their fonts. These include:

adhesiontext by Miguel Sousa
Just Another Test Text Generator by Tim Ahrens
Wordlist Maker by Matthew Butterick
word-o-mat (RoboFont extension) by Nina Stössinger
Typable by Ondrej Jób

Beyond font testing, these tools can also be useful for creating word lists for typeface showings and specimens. But because they are mostly intended for font development, they miss a few features that would aid in showing off a finished font. Here’s what I’d love to see in a single tool:

Select language(s)
Define key characters: ______ (case sensitive)
Allow characters other than key characters: yes/no
Allow proper nouns: yes/no
Allow words, phrases, or both
Sort results: alphabetically, by word length (# of letters), or by # of key characters contained
e.g. Input: gaoesf
Output: flagpoles flags poles foes lop …
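
As a rough illustration of the feature list above, here is a minimal Python sketch. It is not taken from any of the tools mentioned. The function name, its parameters, and the "initial capital means proper noun" rule are all assumptions made for the example, and the word list would come from whatever dictionary file you have for the chosen language.

```python
def find_specimen_words(words, key_chars, allow_other=True,
                        allow_proper=False, sort_by="key_count"):
    """Filter a word list by key characters (case sensitive) and sort it.

    Hypothetical helper: sort_by is "alpha", "length", or "key_count".
    """
    keys = set(key_chars)
    results = []
    for word in words:
        # Assumption: a word with an initial capital is a proper noun.
        if not allow_proper and word[:1].isupper():
            continue
        letters = set(word)
        hits = letters & keys  # distinct key characters found in the word
        if not hits:
            continue
        if not allow_other and not letters <= keys:
            continue
        results.append((word, len(hits)))

    if sort_by == "alpha":
        results.sort(key=lambda r: r[0])
    elif sort_by == "length":
        results.sort(key=lambda r: (-len(r[0]), r[0]))
    else:  # most key characters first, ties broken alphabetically
        results.sort(key=lambda r: (-r[1], r[0]))
    return [word for word, _ in results]


sample = ["flagpoles", "flags", "poles", "foes", "lop", "Oslo", "sage"]
print(find_specimen_words(sample, "gaoesf"))
```

Phrases would need a second source (a phrase list or a generator), which this sketch leaves out.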

I have a feeling I'm not the only one who would use this. Anyone who makes type specimens could benefit. Perhaps one of those tools mentioned above could be quickly amended to include these features. (I’m ashamed to say I don't have the coding knowledge to know how difficult that would be.) I’m happy to pitch in a few bucks if that will help. Maybe others would too.

Chris Lozos · July 2014

Bravo!

Craig Eliason · July 2014

I'd like to be able to input some select caps along with lowercase and have the output do initial caps on words that start with them. E.g.:

input: HOEFTLhamburgefonstiv

output: rotgut art ion is gotten Hue man in a rumbustious overstriving Toosie brig a Fearnought gent Fine Fibroneuroma Ha unifarious aures Of unsnib This Tour attaint reinsist unmoor bumf tubbier Hah romaines sag banns revisitation internists as us Out roe veneers Ferrivorous stir Tubae a venerative gemmating Tee Or gig atrematous Fanga Entomion Firethorn Fetes so mob at nan Hot anaerobiosis some raga Engrammes gonangia Fuse is...

I don't think any of those tools do this yet (correct me if I'm wrong). Maybe stewf's "proper nouns" option would be getting at this already.
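
One possible way to sketch this in Python, assuming the source word list is all lowercase: for each word, check which spellings (as-is, or with an initial cap) can be set using only the input characters. The function name is made up for the example and does not come from any of the tools above.

```python
def initial_cap_variants(word, key_chars):
    """Return the spellings of a lowercase word that use only key_chars.

    Hypothetical helper: tries the word as-is and with an initial capital.
    """
    keys = set(key_chars)
    variants = []
    if word and set(word) <= keys:
        variants.append(word)
    capped = word[:1].upper() + word[1:]
    if capped != word and set(capped) <= keys:
        variants.append(capped)
    return variants


keys = "HOEFTLhamburgefonstiv"
for w in ["hue", "is", "lot", "dog"]:
    print(w, initial_cap_variants(w, keys))
```

With that input, "lot" only qualifies as "Lot" because the lowercase l is not among the input characters, while "is" can never be capitalized because I is missing.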

Matthew Butterick · July 2014

The code for my wordlist generator is available on GitHub, under the MIT license.

FWIW, this would be an excellent project for anyone who wants to learn a little about programming. It’s simple and teaches useful essentials (input / processing / output). I wrote mine in a combination of Perl & PHP (seemed like a good idea at the time) but you could do it equally well in Python or Racket or whatever pleases you. The basic idea (start with a giant list of words, filter them down to the ones you want) remains the same.

Jack Jennings · July 2014

Phrases get a little more complicated, but Markov chains could be a fun way of generating reasonable phrases from a base word list (a Python library: https://pypi.python.org/pypi/PyMarkovChain/). You'd also potentially be able to choose what kind of phrases to generate: tweets, Dickens, New York Times, etc.
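
For anyone curious what that involves, here is a bare-bones word-level Markov chain using only the Python standard library. It does not use PyMarkovChain or its API; the function names and the tiny corpus are placeholders for the example, and a real corpus (tweets, Dickens, etc.) would be far larger.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the corpus."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate_phrase(chain, start, max_words=5, rng=None):
    """Walk the chain from `start`, picking a random follower each step."""
    rng = rng or random.Random()
    phrase = [start]
    while len(phrase) < max_words:
        followers = chain.get(phrase[-1])
        if not followers:  # dead end: the word never had a successor
            break
        phrase.append(rng.choice(followers))
    return " ".join(phrase)


corpus = "the quick fox saw the quick dog"  # placeholder corpus
chain = build_chain(corpus)
print(generate_phrase(chain, "the", max_words=4))
```

The generated phrases could then be run through the same key-character filter as single words.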

Dave Crossland · July 2014

Tim's is also libre, in Python, GPLv2, at https://github.com/justanotherfoundry/text-generator

There's also http://libretext.org/ which is PHP, Affero GPLv3, at https://github.com/garethsprice/libretext/

And there's also http://www.impallari.com/testing/ which has a Tools section with test text generators; it's PHP, MIT licensed, at https://github.com/impallari/font-testing-page/

I'm working on http://www.testmyfont.com (probably 100% JS, Apache) which I hope will provide family/style management, test text generation, sample test texts, comparison tools, and a question/answer UI.

PabloImpallari · August 2014

Hey Stephen, your post got me thinking, so I started playing with some code.

Instead of sorting by # of letters, I'm making a tool that enables sorting by the total sum of the advance-width of all the glyphs included in each word (this is different for each font).

Attached are two examples where all the words have the same total advance width, but the glyph count increases by one in each row.

In the Encode sample, all the words have a total advance width of 4189 units.
In the Libre Caslon Display sample, all have 2675.
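
The basic idea can be sketched in a few lines of Python. This is not the code behind the samples: the function names are invented, and the advance widths below are made-up numbers, not values from Encode or Libre Caslon Display. Kerning is ignored here.

```python
def advance_width(word, advances):
    """Sum the advance widths (in font units) of the glyphs in a word."""
    return sum(advances[ch] for ch in word)


def words_with_width(words, advances, target, tolerance=0):
    """Return words whose summed advance width is within tolerance of target,
    shortest glyph count first."""
    matches = []
    for word in words:
        if any(ch not in advances for ch in word):
            continue  # skip words the font data cannot measure
        if abs(advance_width(word, advances) - target) <= tolerance:
            matches.append(word)
    return sorted(matches, key=len)


# Made-up advance widths for illustration only.
advances = {"i": 250, "l": 250, "m": 750, "n": 500, "o": 500}
words = ["mill", "moon", "ono", "nil", "lion"]
print(words_with_width(words, advances, target=1500))
```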

PabloImpallari · August 2014

Playing a little bit more...
Here each word's advance width follows a Lucas progression from one row to the next.
Things get more interesting

PabloImpallari · August 2014

Ok, last one...
Claus's Playfair Display, all words have the same advance width across the family.
They are all set at 72pt.
No scaling, no tracking, no nothing... just typed as they are, Old-School style!
Making this last one was so fast and so easy that it feels like cheating...

Matthew Butterick · August 2014

If you're merely summing advance widths, you are cheating, and your proof demonstrates the flaw in this approach.

PabloImpallari · August 2014

Yes Matthew, we also need to take kerning into account. But it's a lot more difficult to code that feature.
Already noted on my To-do list for next versions. In the meantime, we can simply pick words that are a few units larger or shorter, to compensate for kerning pairs.
All in all, it's not bad for an initial version. It can already save a lot of time as it is now.
Will record a mini tutorial on how to use the tool and post shortly.

Matthew Butterick · August 2014

we can simply pick words that are a few units larger or shorter, to compensate for kerning pairs.

Ah, but doing this accurately requires knowing the size of the kerning pairs, and if you know that … then you could’ve compensated for them at the outset. (And we still haven’t gotten to GPOS/GSUB adjustments, etc.)
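
To make the point concrete, here is a minimal Python sketch of compensating at the outset: add the pair adjustment for each adjacent pair to the summed advances. The function name and the numbers are invented for the example, and it assumes kerning is available as a flat table of glyph pairs. It still ignores anything contextual in GPOS/GSUB.

```python
def kerned_width(word, advances, kerning):
    """Summed advance widths plus the kerning of each adjacent pair.

    `kerning` maps (left, right) pairs to adjustments in font units;
    pairs that are not listed count as zero.
    """
    total = sum(advances[ch] for ch in word)
    for left, right in zip(word, word[1:]):
        total += kerning.get((left, right), 0)
    return total


# Made-up values for illustration only.
advances = {"A": 700, "V": 680, "T": 600, "o": 500}
kerning = {("A", "V"): -80, ("T", "o"): -60}
print(kerned_width("AVA", advances, kerning))
```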

If you like your approach, keep it. Having gone down a similar road with my text generator, I came to feel that the advance-widths shortcut doesn’t solve a problem, but rather defers it.

More recently I’ve been working with Pango and Harfbuzz (open-source text-layout engines) to get precise measurements of styled text. Ideally you would render the text with the same engine that you use to measure it. But short of that, it’s still a more accurate approach to text measurement. And not really “more difficult to code,” as most of the heavy lifting has already been done.

This relates to Stephen’s original question: existing text-generation tools (mine included) all rely on convenient-but-naive assumptions that limit their functionality. A genuinely better tool would probably want to be less naive.

PabloImpallari · August 2014

Kerning feature added.
Sample just typed as is, no scaling, no tracking.
Early-1900s ATF style.

James Puckett · August 2014

Nice work, Pablo.

Thomas Phinney · August 2014

Really nice. If I could feed it my own word lists that would be awesome.

Stephen Coles · August 2014

Very cool, Pablo. Where can I try it?

Max Phillips · August 2014

Instead of sorting by # of letters, I'm making a tool that enables sorting by the total sum of the advance-width of all the glyphs included in each word (this is different for each font).

God, I love this. Genius stuff.

PabloImpallari · September 2014

Stephen and all:
The first version of the Specimen-Helper tool is ready!

I will write a short tutorial on how to use it and record a short screen cast tomorrow.
For those of you who are brave enough to explore it without any guidance, you can already try it at http://www.impallari.com/testing/specimen-helper.php (although I recommend waiting for the tutorial, since it may look confusing at first glance)

You will also need this FontLab macro to grab some data for the advance widths and kerning values. It will generate 2 text files that you can copy and paste into the tool.
https://github.com/impallari/Impallari-Fontlab-Macros/blob/master/IMP Specimens/Get Data for Specimen.py

PabloImpallari · September 2014

Quick and dirty intro

https://www.youtube.com/watch?v=kXYcDNrA5uI&feature=youtu.be

Andres Torresi · November 2014

It's great. You're a legend, Impallari.

WH Typefaces · November 2014

Phrases get a little more complicated, but Markov chains could be a fun way of generating reasonable phrases from a base word list (a Python library: https://pypi.python.org/pypi/PyMarkovChain/). You'd also potentially be able to choose what kind of phrases to generate: tweets, Dickens, New York Times, etc.

I love that idea. I think OurType must be using Markov text generators for their text samples.

Dave Crossland · November 2014

Phrases get a little more complicated, but Markov chains could be a fun way of generating reasonable phrases from a base word list (a Python library: https://pypi.python.org/pypi/PyMarkovChain/). You'd also potentially be able to choose what kind of phrases to generate: tweets, Dickens, New York Times, etc.
I love that idea,

I tried this a few years ago - https://code.google.com/p/telaro/wiki/Home - but with limited success. Would love to see it done!

PabloImpallari · November 2014

I did a few PHP Markov experiments a while ago, and they worked acceptably... but I ended up abandoning them because, to train the generator well, you need a really huge huge huge seed corpus.

Dave Crossland · November 2014

Maybe for 'adhesion' the limited number of words means you could develop a comprehensive corpus?

WH Typefaces · November 2014

Not sure if it is helpful, but I watched a talk by Darius Kazemi (tinysubversions) and he mentioned he used the Wordnik API, which returned bigrams, as part of one of his Twitter bots…

Stephen Coles · October 2015

Pablo’s tool is super handy, especially for its width-fitting capabilities, but I’m still waiting for something that includes all the features in my challenge. Litscape’s Contain Minimally is still the best way I’ve found to get a list of words that include (but are not limited to) a specific set of letters, and then sort that list by length.

Stephen Coles · October 2015

Frank Grießhammer just introduced me to Word Matcher, which can do the same thing, though with different source dictionaries and without the sorting functions.

María Ramos · November 2015

For testing your font online, this is a script by John Harrington that replaces the font of any website with yours. Choose your favourite site and see how your typeface would look there: https://github.com/misemefein/font-tester

PabloImpallari · July 2016

I need some Python help to improve the macro that collects the data for the specimen tool:

https://github.com/impallari/Impallari-Fontlab-Macros/issues

Issue 1: If the font has kerning classes, it reports only the main pairs.
The script should "expand" the kerning table and report ALL the pairs.
This does affect the results of the specimen tool, so I consider it an important issue.

Issue 2: It reports duplicated kerning pairs.
This is not a big problem, since it doesn't affect the results of the specimen tool.

If anyone can help, thanks in advance.
https://github.com/impallari/Impallari-Fontlab-Macros/blob/master/IMP Specimens/Get Data for Specimen.py

In the meantime, to avoid issue 1:
Run the macro on a copy of your font having the kerning manually expanded.
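
For anyone picking this up, the expansion and de-duplication could look roughly like the Python sketch below. It does not use the FontLab API and is not a patch for the macro: the data layout (class names prefixed with "@", kerning as a dict of pairs) and the function name are assumptions for illustration, so the macro would first need to collect its data into a similar shape.

```python
def expand_class_kerning(pairs, classes):
    """Expand class-based kerning into a flat table of glyph pairs.

    pairs:   {(left, right): value}, where left/right is a glyph name
             or a class name (hypothetical "@" prefix)
    classes: {class_name: [glyph names]}
    Keying the result by pair means each pair is reported only once.
    """
    def specificity(item):
        (left, right), _ = item
        # 0 = class/class, 1 = mixed, 2 = glyph/glyph
        return (left not in classes) + (right not in classes)

    flat = {}
    # Expand class pairs first so glyph-level exceptions overwrite them.
    for (left, right), value in sorted(pairs.items(), key=specificity):
        for l in classes.get(left, [left]):
            for r in classes.get(right, [right]):
                flat[(l, r)] = value
    return flat


classes = {"@A": ["A", "Aacute"], "@V": ["V", "W"]}
pairs = {("@A", "@V"): -80, ("A", "W"): -60, ("T", "o"): -50}
for pair, value in sorted(expand_class_kerning(pairs, classes).items()):
    print(pair, value)
```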
