Letter-fitting and spacing research

Nathan Willis · May 2021

Hi everyone,
I'm a PhD student at the University of Reading, and I've got an online research project running that I'd like to bring to the forum's attention, because getting data from trained & experienced type people is so valuable.

It's a survey web app, anonymous, and it asks you to look at some type specimens and mark anything that looks like a letter-fitting problem (using the highlight tool built into the specimen). This means "fitting problem" in the broadest possible sense: doesn't matter if the source of the trouble is the default side bearings, a kern, a ligature, or even something about the glyphs themselves. It's just an exercise in "what looks wrong in the final output", you might say.

My research is about letter-fitting algorithms, and this is essentially a way of putting an algorithm (or several) to the test by having people look at & respond to its results in a font with actual, readable text. So the more people, the stronger the analysis of what happens!

You can find it at http://Letter.fit ... it's live right now, but certainly let me know in a comment or DM if it appears to vanish. You DO have to be at least 18 years old to complete the survey; that's a legal requirement imposed on the University.

The app will show you up to five specimens. They are all independent of each other, though, so if something happens and you have to leave after fewer, that's totally fine and there's no harm done. It also asks some generic background questions, but it doesn't ask for any personal info and it doesn't store any or track you.

The intent is that people can look through five specimens at a comfortable, not-feeling-rushed pace and do the entire thing in under a half hour, while still feeling like they looked and saw what they wanted to. But obviously everyone is different; you can go as fast or as slow as you like.

One other thing to add: the site's been up for a while (in fact, you may have seen me blather on about it if you stumbled into one of my last two ATypI talks), but although data-collection has been good so far, I wanted to post something here since I'm really trying to ramp up the promotional machine, to get a wider audience of type- and design-interested people to take a crack at it. And the general public as well.

SO it would be immensely valuable if some people are kind enough to share the link to the site with other interested parties (as many as you can ... without invoking their ire, of course). Industry folk, aficionados, students (who are at least 18 years old), meet-ups or forums: if you can share the link in a way that's respectful of the rules/norms of the relevant community, I would greatly appreciate it.

(There are Twitter Card and OpenGraph bits sprinkled in, so hopefully sharing it online looks a bit nicer. If those seem broken, I hope somebody will give me a poke.)

Anyway, guess that's it. Hope you can give it a run-through!

Thanks,
Nate

Nathan Willis · May 2021

(Argh. Not that the above wasn't excessively long to begin with, but I thought of a couple of additions to put out there before I forget...)

Since this is a type-specific forum, I suppose somebody might be interested in a bit more detail as to what is happening. The text in the specimens is randomized and each font you see is randomized. The fonts may have their original letter fitting or they may have fitting that's been adjusted by an algorithm. There's a variety of typographic styles and design-space variation in the test pool, but they're all "body-text" fonts, and it is all shown at "text" sizes (for my experimental definition of those terms anyway). Texts are all in English, for now.

Side note, because of the randomization, it *is* possible to take the survey multiple times and not see repeats. However, I've been tacitly downplaying that for a couple of reasons, not telling people "please come back", etc.

First, although combinatorically you're extremely unlikely to encounter the same font+specimen+version combo twice, there is still a decent chance you'd see the same text block more than once, and overdoing that can be a problem (either via highway-hypnosis or subconsciously tempting you to look at the same letter pairs that stood out to you on the previous run, which is the opposite of the desired effect).

Second and more importantly, though, is fatigue. I know looking at specimens is fun (and I hope it is for you, too), but in practice if people go through many many iterations of the test app, the reliability of what they highlight will suffer.

All that to say that if you are interested in going through the survey more than once, that's great enthusiasm and we're happy to have your data. But don't binge on it; make that a "come back to it a different day" sort of thing.
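To put a rough number on the "decent chance" of seeing a text block again, here is a minimal sketch. It assumes (hypothetically; the thread gives no real figures) that each session draws five distinct text blocks uniformly at random from a fixed pool, and that two sessions are independent. The pool sizes are made up for illustration, and `repeat_probability` is not part of the survey app.

```python
from math import comb

def repeat_probability(pool_size, per_session=5):
    """Chance that a second session shares at least one text block with the first.

    Assumes each session draws `per_session` distinct blocks uniformly at
    random from `pool_size` blocks (an assumption, not the app's known design).
    """
    if pool_size < per_session:
        raise ValueError("pool must hold at least one session's worth of blocks")
    # Probability that all blocks in session two avoid the blocks of session one.
    no_overlap = comb(pool_size - per_session, per_session) / comb(pool_size, per_session)
    return 1 - no_overlap

if __name__ == "__main__":
    # Hypothetical pool sizes, purely for illustration.
    for pool in (20, 50, 100):
        print(pool, round(repeat_probability(pool), 3))
```

Under these assumptions the chance of a repeat falls as the pool grows, but it stays noticeable for modest pools, which is consistent with the caution above.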

Also, please do feel free to ask any questions here — but I can't/shouldn't/won't disclose info about the exact nature of the test material & algorithms, since that would spoil the cupcake....

Scott-Martin Kosofsky · May 2021

Nathan, I tried to be charitable and give it a shot, but all I can say to you is this:
Quousque tandem abutere, Nathana, patientia nostra?

Nick Shinn · May 2021

Why not just ask respondents to identify three options: (a) “metrics” kerning, (b) no kerning, or (3) “Optical Kerning”—and which is preferable?

Or just show three versions of the same paragraph, and ask which is preferable?

However, I thought the accepted standard for body text readability was speed, and perhaps comprehension, not taste in appearance.

Nathan Willis · May 2021

Nick Shinn said:

Why not just ask respondents to identify three options: (a) “metrics” kerning, (b) no kerning, or (3) “Optical Kerning”—and which is preferable?

Or just show three versions of the same paragraph, and ask which is preferable?

However, I thought the accepted standard for body text readability was speed, and perhaps comprehension, not taste in appearance.

On the first bit, the answer is that (a) and (3) seem to be the Adobe InDesign options, which are not what I'm testing; in fact it is not just kerning and the algorithms under scrutiny are (potentially, though not always) allowed to modify the default side bearings.

On the second, indeed there are quite a few possible ways that one *could* test letter fitting. The short answer is that a whole-paragraph, good-or-bad response just isn't granular enough for what I'm needing to do. That is, it's important to know not just that "something" looked wrong, but to know which glyphs people think looked wrong.

There are a lot of trade-offs in designing any such testing scheme, of course. (If you're really, really curious about the test-design aspect, I did actually present a talk on that topic for Type@Cooper's Lubalin lecture series last September. I could probably scare up a link to it, but it is a rather deep dive and might not be of such broad interest to get into in this thread.)

Thomas Phinney · May 2021

Scare up a link!

John Hudson · May 2021

I tried, and got about halfway through the second page before I gave up. I flagged what I thought were a few missing cap-to-lowercase kerning pairs in the first page, but as I went on I found myself considering the extent to which some of what I was looking at was so dependent on device resolution, rendering engine, pixel rounding, that I decided I had no way of knowing what I was actually responding to: font spacing, or some other factor?

Nick Shinn · May 2021

I started, but ran into an old problem in judging the work of others (which I find impossible)—namely, “That’s not how I’d do it, but if that’s the way this person chooses to, so be it.”

In other words, I don’t believe in right and wrong in the nuances of type design, certainly not for small amounts of letter fitting that have no measurable effect on immersive reading. If that’s how it comes out in toto, as the result of the type designer’s decisions, or the font software or layout engine designer’s, or the typographer’s or whatever, it is what it is.

What are you hoping to discover?

konrad ritter · May 2021

Mr. Kosofsky, I believe the vocative of 'Nathan' is 'Nathane.'

Quousque abutere, Nathane...!

Claudio Piccinini · May 2021

I quite agree with Nick. When I tried it, I also gave up almost immediately, as judging type onscreen is always deceiving for me: there are too many factors, and depending on the monitor/screen quality you could end up in a trap of "hyperprecision" or, on the contrary, see "wrong" things where it’s just a rendering problem.

Scott-Martin Kosofsky · May 2021

Quite right, Mr. Ritter. That was my instinct, though I was haunted by Cicero's "Catalina" for Lucius Sergius Catilina (who would have been a good fit for today's Republican Party). Oh, the things one forgets after fifty years! "In the vocative, if the name ends in 'ius,' then change to 'ii'; all others change to 'e'." Are names ending in "a" an exception? Tell me if you know. Anyway, it's a good quotation for occasions such as these.

@John Hudson, I couldn't agree more.

Nick Shinn · May 2021

“Aldus” was none other than Aldo Manuzio.
My favorite typographical Latinization is however “Birminghamiae”.
But I’m disappointed Johannis couldn’t come up with anything for his surname, “Baskervillius” for instance?

Image: https://us.v-cdn.net/5019405/uploads/editor/7v/xtk6w9kl4617.jpg

Craig Eliason · May 2021

I thought we discouraged Latinization around here!

...But back to the topic. Speaking frankly I think it's a worrisome result of the experiment's setup that folks on this thread, who are at once some of the most patient type reviewers and possessors of the most valuable opinions you could consult on the topic, can't get through the test. I too gave up on it after two pages or so--it just takes too long. Is it too late to rescale the instrument?

Nathan Willis · May 2021

Thomas Phinney said:

Scare up a link!

Scared up! Here: http://coopertype.org/event/thinking_about_talking_about_spacing#vimeo

Like I alluded to earlier, this is a rather lengthy dive into the test-design process itself. So it might be informative for people in this thread who have concerns that I may not have thought through the impacts of intervening variables. Accounting for these effects across the full pool of respondents is part of the testing process.

Nathan Willis · May 2021

John Hudson said:

I tried, and got about halfway through the second page before I gave up. I flagged what I thought were a few missing cap-to-lowercase kerning pairs in the first page, but as I went on I found myself considering the extent to which some of what I was looking at was so dependent on device resolution, rendering engine, pixel rounding, that I decided I had no way of knowing what I was actually responding to: font spacing, or some other factor?

Certainly it's true that there can be a lot of factors differing between your encounter of a particular specimen/font combination and the next person's encounter with that same combination. But the fact that those variables exist is not a reason to do no testing; it just means that the test design and the analysis have to take them into account — and that, when looking at the results, I as the researcher have to be conscious of how they restrict what is and isn't possible to conclude about the data.

The technical factors you mention, fortunately, are the sort of thing that the web app itself can help record. More importantly, the fact that there are lots of versions of different browsers and OSes in the wild is part of what makes it vital to get large numbers of eyes and as much variety in the respondents as possible. Across all those permutations, the skew introduced by any individual factor averages out, and the sample gets closer and closer to the makeup of the total population of people.

But, as for being a confounding effect, the differences between screens and browsers are like the differences between age groups, typographic training, language fluency, and the other human factors that the app asks about. Noting the browser-window size and browser/OS version lets us, during analysis, actually look and see whether or not people on different platforms mark different things as looking wrong and, more importantly, see what specifically they mark differently. When that occurs, having the data lets us delve deeper and look for what the causes might be.

Fortunately, there is a lot MORE information about the actual changes between browser releases and OS releases than there is about (e.g.) what people will self-report as being their typographic experience level. So we can, for example, look and see "oh, starting with this:____ version of Chrome, people using Windows start marking significantly more straight-profile–to–straight-profile letters as being too close than do people on Macs", then try and determine if there was a change in the rendering or layout coinciding with that release. That's actually a better situation to be in than when comparing "do people who say that they're a type designer make the same marks as people who say they are not a type designer" and trying to conclude something significant about type designers as a population. But, of course, everyone wants to know if type designers and non–type-designers agree or disagree, so the app asks.
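As a rough illustration of the kind of platform comparison described above, here is a minimal sketch of a two-proportion z-test. It is not the project's actual analysis (which isn't disclosed in this thread); the function name and every count in the example are hypothetical, and a real analysis would likely need a model that handles many factors at once.

```python
import math

def two_proportion_z(marked_a, shown_a, marked_b, shown_b):
    """Compare how often a letter pair is marked on platform A vs. platform B.

    marked_* = number of respondents who highlighted the pair,
    shown_*  = number of respondents who were shown it.
    Returns (z statistic, two-sided p-value) under a normal approximation.
    """
    p_a = marked_a / shown_a
    p_b = marked_b / shown_b
    # Pooled proportion under the null hypothesis of no platform difference.
    pooled = (marked_a + marked_b) / (shown_a + shown_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / shown_a + 1 / shown_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical counts: Windows respondents vs. Mac respondents.
    z, p = two_proportion_z(60, 100, 40, 100)
    print(f"z = {z:.3f}, p = {p:.4f}")
```

A small p-value here would only flag a platform difference worth investigating; it says nothing by itself about whether a rendering change caused it.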

As for the specificity of the conclusions that can be drawn, in the context of all the variables, I suppose I would again say that I do bring that up in the Lubalin video, but in brief: it's never possible to say "data says this is correct fitting", but it is possible to say "when we show a whoooole bunch of people this sample and that sample, they either do or don't respond the same way." To a certain extent at present, because the thesis isn't done, this part boils down to me saying you either trust me (and my supervisors and assessors) to know what we're testing or you don't. Or, on the flip side, you could also wait and if you see me making grandiose, unsupportable claims later down the road, leap out and call me on it.

Last but not least, if I was to play devil's advocate for a brief minute ... whether or not any one individual likes or dislikes what they see when they're shopping for a commercial font is also dependent on the exact setup of the device they look at the sample on (as well as on their training, the physical environment, and so on). Consequently, type designers and foundries do lots of testing, over as many combinations as they can afford to. The same is true here.

Scott-Martin Kosofsky · May 2021

Nathan, you write:

As for the specificity of the conclusions that can be drawn, in the context of all the variables, I suppose I would again say that I do bring that up in the Lubalin video, but in brief: it's never possible to say "data says this is correct fitting", but it is possible to say "when we show a whoooole bunch of people this sample and that sample, they either do or don't respond the same way." To a certain extent at present, because the thesis isn't done, this part boils down to me saying you either trust me (and my supervisors and assessors) to know what we're testing or you don't. Or, on the flip side, you could also wait and if you see me making grandiose, unsupportable claims later down the road, leap out and call me on it.

What you’re left with is a one-liner: “Some people observe this and other people observe that, though it must be understood that their evaluations were in no way uniform or equivalent, given that their viewing hardware and software varied greatly.” I don’t see how any conclusion drawn from such a data set could be useful. I see that one of your dissertation supervisors has written widely about the effect of design on the presentation of uncertain information, such as people’s perception of climatic issues, but type design is not at all like that. Type design begins with a point of view, the desire to make something specific. Otherwise, why bother?

Nathan Willis · May 2021

Scott-Martin Kosofsky said:

What you’re left with is a one-liner: “Some people observe this and other people observe that, though it must be understood that their evaluations were in no way uniform or equivalent, given that their viewing hardware and software varied greatly.”

Sorry, but that's not how statistical modeling works. You make observations, then you analyze them, look for patterns, test them, and only then do you make statements about how the data behaves and the variables are or are not related.

Trying to decipher the remainder of this comment, but it's quite muddy; it sounds like you're fearful of some theoretical conclusion someone might state, based on as-yet unknown data. But it's not really possible to reply to that without any specifics. If you want to have a substantive discussion, then that would need to start at the beginning; with what, exactly, you presume the data is going to show.

Nathan Willis · May 2021

Nick Shinn said:

I started, but ran into an old problem in judging the work of others (which I find impossible)—namely, “That’s not how I’d do it, but if that’s the way this person chooses to, so be it.”

In other words, I don’t believe in right and wrong in the nuances of type design, certainly not for small amounts of letter fitting that have no measurable effect on immersive reading. If that’s how it comes out in toto, as the result of the type designer’s decisions, or the font software or layout engine designer’s, or the typographer’s or whatever, it is what it is.

It sort of sounds like you might be trying to take on a bigger question than the task itself asks you to. By which I mean, this app is not "give feedback to a type designer about their fitting".

This is strictly a question of when you, as an individual, look at Sample X and at Sample X', where the only thing that is different between them is the fitting stored in the font, what do you notice? Since you don't get told (for the obvious, clinical–trial-y reasons) which is X and which is X', the question is phrased as "what looks incorrect". That's it.

And yes, I agree on the binaryness point; this survey is also not getting into the matter of "this fitting is right and this fitting is wrong".

Nick Shinn · May 2021

It sort of sounds like you might be trying to take on a bigger question than the task itself asks you to.

As I said, you asked for something I found impossible, not too small. As a type designer, I assumed that I was looking for bad kern pairs, and I would have had to examine the font as a whole to determine what those were, because kerning is supposed to take into account the totality of possible glyph combinations. And even then, it’s the type designer’s aesthetic—who am I to say they are incorrect? If you’d asked me to manually kern a paragraph of text à la Shinn, I might have done that.

If I’m going to use an algorithm to help space a font, I’d like to be able to adjust its parameters as I adjust glyph shapes, as an interactive design process, to optimize the type design—glyphs and spacing as a dynamic, integral, designed system. That’s how I use class kerning.

Any kerning algorithms that are subsequently applied to my finished work, I detest on principle. Like horizontal scaling, faux bold and faux italics, in that respect. I suppose this is rather old-fashioned of me, in our post-modern world, but I will cling to my authorship.

Scott-Martin Kosofsky · May 2021

No, no, Nathan. My criticism has all to do with your method and nothing to do with being “fearful” (an odd choice of word) of your conclusions. I shall explain:

• It seems to me that the vast number of viewing variables of viewing hardware and software, as described above by John Hudson, makes it impossible to obtain a meaningful sampling.

• Who will benefit from this research? You never say. Because you assert that it “doesn't matter if the source of the trouble is the default side bearings, a kern, a ligature, or even something about the glyphs themselves,” I wonder whether this study has anything to do with type design, per se. In the online survey, you divide the viewers into two groups: type designers and not type designers. “Not type designers” is a lot of people, and you might, at least, ask them whether or not the Latin alphabet is their native script, or whether or not they are engaged in typography as a professional or profession-related activity. That might add a kernel of interest.

• Given these open-ended parameters, it appears that your work may have more to do with the field of Cognitive Psychology than type design. Every now and then, researchers from that field have dabbled in issues involving type design, such as the group from Australia who, about five years ago, claimed to have developed a font that improved memory and alleviated perceptual difficulties amongst people with dyslexia. It was much discussed on this board, where it didn’t get much love—rightly so, in my opinion.

John Hudson · May 2021

Hi Nate. Thinking about this some more, I think the main problem with your test is that it is simply too arduous. There is too much text to look at. Presenting text in long paragraph blocks makes the task of identifying pair spacing problems difficult (there are good reasons why font makers do kerning while looking at discrete strings and individual words, not long blocks of text). The test was hard work, and no one was paying me, so I gave up.
