Type specimen word/phrase generator?

Stephen Coles
Stephen Coles Posts: 1,008
edited October 2023 in Technique and Theory
While working on my book, I needed words or phrases that contained the most distinctive glyphs from each typeface while meeting width and language requirements, along with other criteria. Finding these phrases can be fun, but it can also be much more difficult and time-consuming than one might think, particularly with unfamiliar languages. I got some help from Scrabble tools and a few friends (Frank Grießhammer, Tânia Raposo, Miguel Sousa, Laura Serra) and ended up with a pretty good list, but the experience made me wish for a tool to assist with future projects.

Of course, there are a few fine tools developed to help type designers test and proof their fonts. Beyond font testing, these tools can also be useful for creating word lists for typeface showings and specimens. But because they are mostly intended for font development, they miss a few features that would aid in showing off a finished font. Here’s what I’d love to see in a single tool:
  • Select language(s)
  • Define key characters: ______ (case sensitive)
  • Allow characters other than key characters: yes/no
  • Allow proper nouns: yes/no
  • Allow words, phrases, or both
  • Sort results: alphabetically, by word length (# of letters), or by # of key characters contained
    e.g. Input: gaoesf
    Output: flagpoles flages poles foes lop …
I have a feeling I'm not the only one who would use this. Anyone who makes type specimens could benefit. Perhaps one of those tools mentioned above could be quickly amended to include these features. (I’m ashamed to say I don't have the coding knowledge to know how difficult that would be.) I’m happy to pitch in a few bucks if that will help. Maybe others would too.
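The wish list above could be sketched roughly as follows. This is only a minimal illustration, assuming a plain in-memory word list instead of a real per-language dictionary; the function names, parameters, and sample words are hypothetical and not taken from any of the existing tools. Language selection and phrase support are left out.

```python
def count_key_chars(word, key_chars):
    """Number of distinct key characters (case sensitive) present in word."""
    return len(set(word) & set(key_chars))


def find_specimen_words(words, key_chars, allow_other_chars=True,
                        allow_proper_nouns=False, sort_by="key_chars"):
    """Filter and sort candidate words for a specimen.

    Assumption: a capitalized entry in the word list is a proper noun.
    """
    keys = set(key_chars)
    results = []
    for word in words:
        if not allow_proper_nouns and word[:1].isupper():
            continue  # skip proper nouns
        if not allow_other_chars and not set(word) <= keys:
            continue  # word uses a character outside the key set
        if count_key_chars(word, keys) == 0:
            continue  # word shows none of the key characters
        results.append(word)

    if sort_by == "alphabetical":
        return sorted(results)
    if sort_by == "length":
        # Longest words first, ties broken alphabetically.
        return sorted(results, key=lambda w: (-len(w), w))
    # Default: most key characters first, ties broken alphabetically.
    return sorted(results, key=lambda w: (-count_key_chars(w, keys), w))


# Hypothetical sample list; a real tool would load a dictionary file.
sample = ["flagpoles", "poles", "foes", "lop", "Oslo", "safe", "tin"]
print(find_specimen_words(sample, "gaoesf"))
# ['flagpoles', 'foes', 'safe', 'poles', 'lop']
```

Sorting by the count of distinct key characters puts the words that show off the most requested glyphs at the top, which seems closest to the intent of the example.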

Comments

  • Chris Lozos
    Chris Lozos Posts: 1,458
    Bravo!
  • Craig Eliason
    Craig Eliason Posts: 1,440
    I'd like to be able to input some select caps along with lowercase and have the output do initial caps on words that start with them. E.g.:

    input: HOEFTLhamburgefonstiv

    output: rotgut art ion is gotten Hue man in a rumbustious overstriving Toosie brig a Fearnought gent Fine Fibroneuroma Ha unifarious aures Of unsnib This Tour attaint reinsist unmoor bumf tubbier Hah romaines sag banns revisitation internists as us Out roe veneers Ferrivorous stir Tubae a venerative gemmating Tee Or gig atrematous Fanga Entomion Firethorn Fetes so mob at nan Hot anaerobiosis some raga Engrammes gonangia Fuse is...

    I don't think any of those tools do this yet (correct me if I'm wrong). Maybe stewf's "proper nouns" option would be getting at this already.
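One way this mixed-case behaviour might work, as a rough sketch: treat the input as the set of glyphs that exist so far, and for each lowercase dictionary word report every spelling (plain or with an initial cap) that uses only those glyphs. The function name `usable_forms` and the sample words are hypothetical.

```python
def usable_forms(word, allowed):
    """Return the spellings of a lowercase word that use only allowed glyphs."""
    allowed = set(allowed)
    forms = []
    if set(word) <= allowed:
        forms.append(word)
    # Try the same word with an initial cap.
    capitalized = word[:1].upper() + word[1:]
    if capitalized != word and set(capitalized) <= allowed:
        forms.append(capitalized)
    return forms


allowed = "HOEFTLhamburgefonstiv"
for word in ["rotgut", "hue", "lot", "tee", "quartz"]:
    print(word, usable_forms(word, allowed))
# rotgut ['rotgut']
# hue ['hue', 'Hue']
# lot ['Lot']
# tee ['tee', 'Tee']
# quartz []
```

Note that "lot" only works capitalized here, because the input has a cap L but no lowercase l.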
  • Matthew Butterick
    Matthew Butterick Posts: 143
    edited July 2014
    The code for my wordlist generator is available on GitHub, under the MIT license.

    FWIW, this would be an excellent project for anyone who wants to learn a little about programming. It’s simple and teaches useful essentials (input / processing / output). I wrote mine in a combination of Perl & PHP (seemed like a good idea at the time) but you could do it equally well in Python or Racket or whatever pleases you. The basic idea — start with giant list of words, filter them down to the ones you want — remains the same.
  • Phrases get a little more complicated, but Markov chains could be a fun way of generating reasonable phrases from a base word list (a Python library: https://pypi.python.org/pypi/PyMarkovChain/). You'd also potentially be able to choose what kind of phrases: tweets, Dickens, New York Times, etc. :)
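To illustrate the Markov chain idea, here is a minimal word-level sketch using only the Python standard library. It does not use the PyMarkovChain library linked above, and the tiny corpus is a made-up placeholder; a useful generator would need a far larger training text.

```python
import random
from collections import defaultdict


def build_chain(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    chain = defaultdict(list)
    for current, following in zip(words, words[1:]):
        chain[current].append(following)
    return chain


def generate_phrase(chain, start, max_words, rng=random):
    """Walk the chain from start, stopping at max_words or a dead end."""
    phrase = [start]
    while len(phrase) < max_words:
        followers = chain.get(phrase[-1])
        if not followers:
            break  # no recorded follower for the last word
        phrase.append(rng.choice(followers))
    return " ".join(phrase)


# Placeholder corpus; swap in tweets, Dickens, etc. to change the flavour.
corpus = "the quick fox saw the lazy dog and the quick dog saw the fox"
chain = build_chain(corpus)
print(generate_phrase(chain, "the", 6, random.Random(1)))
```

Every adjacent word pair in the output also occurs somewhere in the corpus, which is why the result tends to read as plausible text.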
  • Dave Crossland
    Dave Crossland Posts: 1,431
    edited July 2014
    Tim's is also libre, in Python, GPLv2, at https://github.com/justanotherfoundry/text-generator

    There's also http://libretext.org/ which is PHP, Affero GPLv3, at https://github.com/garethsprice/libretext/

    And there's also http://www.impallari.com/testing/ which has a Tools section with test text generators; it's PHP, MIT licensed, at https://github.com/impallari/font-testing-page/

    I'm working on http://www.testmyfont.com (probably 100% JS, Apache) which I hope will provide family/style management, test text generation, sample test texts, comparison tools, and a question/answer UI.
  • PabloImpallari
    PabloImpallari Posts: 806
    edited August 2014
    Playing a little bit more...
    Here each word's advance width follows a Lucas progression from one row to the next.
    Things get more interesting :)

  • If you're merely summing advance widths, you are cheating, and your proof demonstrates the flaw in this approach.
  • PabloImpallari
    PabloImpallari Posts: 806
    edited August 2014
    Yes Matthew, we also need to take kerning into account. But it's a lot more difficult to code that feature.
    It's already on my to-do list for the next versions. In the meantime, we can simply pick words that are a few units larger or shorter, to compensate for kerning pairs.
    All in all, it's not bad for an initial version. It can already save a lot of time as it is now.
    Will record a mini tutorial on how to use the tool and post shortly.
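For reference, the difference between the two measurements can be sketched like this. It is only an illustration, not the code of the tool: the function name and all numbers are hypothetical, and it assumes one glyph per character and flat pair kerning, so ligatures and contextual positioning are not accounted for.

```python
def word_width(word, advances, kerning=None):
    """Sum glyph advance widths, plus pair kerning when a table is given."""
    width = sum(advances[ch] for ch in word)
    if kerning:
        # Add the adjustment for every adjacent pair that has one.
        for left, right in zip(word, word[1:]):
            width += kerning.get((left, right), 0)
    return width


# Hypothetical values in font units; a real font supplies its own.
advances = {"A": 680, "V": 660, "T": 600, "o": 560}
kerning = {("A", "V"): -80, ("V", "A"): -80, ("T", "o"): -90}

for word in ["AVA", "To"]:
    print(word, word_width(word, advances), word_width(word, advances, kerning))
# AVA 2020 1860
# To 1160 1070
```

With these made-up values, "AVA" comes out 160 units narrower once kerning is applied, which is more than "a few units" of slack.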
  • we can simply pick words that are a few units larger or shorter, to compensate for kerning pairs.
    Ah, but doing this accurately requires knowing the size of the kerning pairs, and if you know that … then you could’ve compensated for them at the outset. (And we still haven’t gotten to GPOS/GSUB adjustments, etc.)

    If you like your approach, keep it. Having gone down a similar road with my text generator, I came to feel that the advance-widths shortcut doesn’t solve a problem, but rather defers it.

    More recently I’ve been working with Pango and HarfBuzz (open-source text-layout engines) to get precise measurements of styled text. Ideally you would render the text with the same engine that you use to measure it. But short of that, it’s still a more accurate approach to text measurement. And not really “more difficult to code,” as most of the heavy lifting has already been done.

    This relates to Stephen’s original question: existing text-generation tools (mine included) all rely on convenient-but-naive assumptions that limit their functionality. A genuinely better tool would probably want to be less naive.
  • PabloImpallari
    PabloImpallari Posts: 806
    edited August 2014
    Kerning feature added.
    Sample just typed as is, no scaling, no tracking.
    Early-1900s ATF style.
  • James Puckett
    James Puckett Posts: 1,998
    Nice work, Pablo.
  • Really nice. If I could feed it my own word lists that would be awesome.
  • Stephen Coles
    Stephen Coles Posts: 1,008
    Very cool, Pablo. Where can I try it?
  • Max Phillips
    Max Phillips Posts: 474
    edited August 2014
    Instead of sorting by # of letters, I'm making a tool that enables sorting by the total sum of the advance-width of all the glyphs included in each word (this is different for each font).
    God, I love this. Genius stuff.
  • Stephen and all:
    The first version of the Specimen-Helper tool is ready!

    I will write a short tutorial on how to use it and record a short screen cast tomorrow.
    For those of you who are brave enough to explore it without any guidance, you can already try it at http://www.impallari.com/testing/specimen-helper.php (although I recommend waiting for the tutorial, since it may look confusing at first glance)

    You will also need this FontLab macro to grab some data for the advance widths and kerning values. It will generate 2 text files that you can copy and paste into the tool.
    https://github.com/impallari/Impallari-Fontlab-Macros/blob/master/IMP Specimens/Get Data for Specimen.py
  • It's brilliant. You're a legend, Impallari.
  • Phrases get a little more complicated, but Markov chains could be a fun way of generating reasonable phrases from a base word list (a Python library: https://pypi.python.org/pypi/PyMarkovChain/). You'd also potentially be able to choose what kind of phrases: tweets, Dickens, New York Times, etc. :)
    I love that idea, I think OurType must be using Markov text generators for their text samples.
  • Phrases get a little more complicated, but Markov chains could be a fun way of generating reasonable phrases from a base word list (a Python library: https://pypi.python.org/pypi/PyMarkovChain/). You'd also potentially be able to choose what kind of phrases: tweets, Dickens, New York Times, etc. :)
    I love that idea,
    I tried this a few years ago - https://code.google.com/p/telaro/wiki/Home - but with limited success. Would love to see it done! :)
  • I did a few PHP Markov experiments a while ago. They worked acceptably, but I ended up abandoning them because to train the model well, you need a really huge huge huge seed corpus.
  • Maybe for 'adhesion' the limited number of words means you could develop a comprehensive corpus?
  • Not sure if it is helpful, but I watched a talk by Darius Kazemi (tinysubversions) and he mentioned he used the Wordnik API, which returned bigrams, as part of one of his Twitter bots…
  • Stephen Coles
    Stephen Coles Posts: 1,008
    edited October 2015
    Pablo’s tool is super handy, especially for its width-fitting capabilities, but I’m still waiting for something that includes all the features in my challenge. Litscape’s Contain Minimally is still the best way I’ve found to get a list of words that include (but are not limited to) a specific set of letters, and then sort that list by length.
  • Stephen Coles
    Stephen Coles Posts: 1,008
    edited October 2015
    Frank Grießhammer just introduced me to Word Matcher, which can do the same thing, though with different source dictionaries and without the sorting functions.
  • For testing your font online, this is a script by John Harrington that replaces the font of any website with yours. Choose your favourite site and see how your typeface would look there: https://github.com/misemefein/font-tester
  • PabloImpallari
    PabloImpallari Posts: 806
    edited July 2016
    I need some Python help to improve the macro that collects the data for the specimen tool:

    https://github.com/impallari/Impallari-Fontlab-Macros/issues

    Issue 1: If the font has classes, it reports only the main pairs.
    The script should "expand" the kerning table and report ALL the pairs.
    This does affect the results of the specimen tool, so I consider it an important issue.

    Issue 2: It reports duplicated kerning pairs.
    This is not a big problem since it doesn't affect the results of the specimen tool.

    If anyone can help, thanks in advance.
    https://github.com/impallari/Impallari-Fontlab-Macros/blob/master/IMP Specimens/Get Data for Specimen.py

    In the meantime, to avoid issue 1:
    Run the macro on a copy of your font having the kerning manually expanded.
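As a starting point for anyone picking this up, the two issues could be approached along these lines. This is a hedged sketch in plain Python that does not touch the FontLab API: the function name and the data shapes (a list of pair triples keyed by each class's main glyph, plus separate left and right class dictionaries) are assumptions, and the macro would need to build them from the font first. Glyph-specific exceptions that override a class pair are not modelled.

```python
def expand_class_kerning(pairs, left_classes, right_classes):
    """Expand class-based kerning into a flat {(left, right): value} dict.

    pairs:         list of (left_key, right_key, value) triples, where a key
                   is a class's main glyph or a glyph that has no class.
    left_classes:  {main_glyph: [every glyph in that left-side class]}
    right_classes: {main_glyph: [every glyph in that right-side class]}
    """
    flat = {}
    for left_key, right_key, value in pairs:
        for left in left_classes.get(left_key, [left_key]):
            for right in right_classes.get(right_key, [right_key]):
                # A dict holds each pair once, so duplicates collapse;
                # setdefault keeps the first value seen for a pair.
                flat.setdefault((left, right), value)
    return flat


# Hypothetical data: note the duplicated ("A", "V") pair.
pairs = [("A", "V", -80), ("A", "V", -80), ("T", "o", -90)]
left_classes = {"A": ["A", "Aacute"]}
right_classes = {"o": ["o", "odieresis"]}

flat = expand_class_kerning(pairs, left_classes, right_classes)
for pair, value in sorted(flat.items()):
    print(pair, value)
# ('A', 'V') -80
# ('Aacute', 'V') -80
# ('T', 'o') -90
# ('T', 'odieresis') -90
```

Using a dictionary keyed by the glyph pair handles both issues in one pass: every class member gets its own entry, and a repeated pair cannot appear twice.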