JL's magic list of GREP styles for body text!

Joshua Langman · August 2024

Greetings, TypeDrawers community!

Because I make extensive use of InDesign's GREP styles to achieve typographic sophistication and finesse in running body copy, and because I'm in the midst of a project that involves a great deal of GREP engineering, I thought I'd share some of my commonly used GREP expressions with the group.

Each expression below is a GREP query that can be incorporated into a paragraph style in InDesign, so that it applies to all body text in a document. (In the paragraph style options window, go to the tab called "GREP Styles.") These are all expressions that I habitually or commonly apply to all body text in, say, a book or journal. Not all of them will be relevant for all kinds of text, but there's nothing out of the ordinary here. These are the basics that I use to get body text to behave the way I like.

I'm sharing these because they might be useful to other typographers, but also because type designers might be interested in seeing the kinds of things I expect a typeface for body text to do. Perhaps some of my GREP styles will suggest ways that type designers could incorporate some of these features into their fonts themselves.

Here's the list:

Expression:

-~_

Apply: Some percentage of horizontal scaling, e.g., 80%

Result: Shorten em dashes to the percentage of width specified; useful for fonts in which the em dash occupies the full em square, which often looks too wide in text.

Expression:

(?<=\.)~S(?=\.)

Apply: 50% horizontal scaling

Result: Condense nonbreaking spaces within ellipses to your preferred width; at 50%, this effectively creates something like an 8/em (eighth-em) nonbreaking space. Adjust the scaling to set the dot spacing in ellipses to taste.

Expression:

\u\u+

Apply: OpenType “all small caps” case

Result: Automatically set acronyms like NASA in small caps.

Expression:

\u+(?=\u)

Apply: some amount of letterspacing (tracking)

Result: Apply letterspacing to all terms matched by the expression above, except for the last letter (to avoid extra space after the term).

Expression:

(?<=\u)\u(?=[.,])

Apply: same amount of letterspacing as above

Result: Add letterspacing to the last letter in small-caps acronyms, only if followed by a period or comma (to avoid the punctuation seeming too tight against the small caps).
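If you want to check what these three expressions select before building the styles, the following Python sketch runs them with the standard re module. It is only an approximation: Python has no equivalent of InDesign's \u (any uppercase letter), so the sketch assumes ASCII [A-Z], and it simply reports the matched text rather than applying any formatting.

```python
import re

# Assumption: InDesign's \u (any uppercase letter) is approximated by ASCII [A-Z].
UPPER = "[A-Z]"

small_caps = re.compile(f"{UPPER}{UPPER}+")                           # \u\u+
track_all_but_last = re.compile(f"{UPPER}+(?={UPPER})")               # \u+(?=\u)
track_last_before_punct = re.compile(f"(?<={UPPER}){UPPER}(?=[.,])")  # (?<=\u)\u(?=[.,])


def matched(pattern, text):
    """Return every run of text the GREP style would format."""
    return [m.group() for m in pattern.finditer(text)]


sample = "NASA briefed the FBI, then the CIA. I agreed."
print(matched(small_caps, sample))               # ['NASA', 'FBI', 'CIA']
print(matched(track_all_but_last, sample))       # ['NAS', 'FB', 'CI']
print(matched(track_last_before_punct, sample))  # ['I', 'A']
```

Note how the second expression stops one letter short of each acronym, and how the third picks up that last letter only when a period or comma follows.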

[Note: more sophisticated styles for automating the application of small caps and letterspacing are included below.]

Expression:

(?<!\u)(USSR|USA|UAE|US|UK|AL|AK|AZ|AR|CA|CO|CT|DE|DC|FL|GA|HI|ID|IL|IN|IA|KS|KY|LA|ME|MD|MA|MI|MN|MS|MO|MT|NE|NV|NH|NJ|NM|NY|NC|ND|OH|OK|OR|PA|PR|RI|SC|SD|TN|TX|UT|VT|VA|WA|WV|WI|WY)(?!\u)

Apply: full caps, no letterspacing

Result: Override the auto small caps for postal abbreviations and other geographic abbreviations, like USA, which should remain in full caps.

Expression:

[(){}[\]]

Apply: roman (not italic)

Result: Force all delimiters () [] {} to remain upright (not italic) in all contexts. Traditional in classical typography.

[Note: the following three queries could be combined, but are separated because often the amount of space desired before punctuation will differ depending on the mark.]

Expression:

.(?=[:;])

Apply: a small amount of letterspacing (tracking)

Result: Add a little extra space before colons and semicolons. Traditional in classical typography.

Expression:

.(?=\?)

Apply: a small amount of letterspacing (tracking)

Result: Add a little extra space before question marks. Traditional in classical typography.

Expression:

.(?=!)

Apply: a small amount of letterspacing (tracking)

Result: Add a little extra space before exclamation points. Traditional in classical typography.

[Note: the following two queries should be used together.]

Expression:

[([{]

Apply: a small amount of letterspacing (tracking)

Result: Add a little extra space after opening delimiters ( [ {.

Expression:

.(?=[)\]}])

Apply: same amount of letterspacing as above

Result: Add a little extra space before closing delimiters ) ] }.
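Since InDesign's tracking adds space after each tracked character (which is also why the acronym styles skip the last letter), these expressions target the character just before the mark, or the opening delimiter itself. The Python sketch below simulates this by inserting a thin space (U+2009, an arbitrary stand-in for the tracking value) after each matched character. It merges the three punctuation queries into a single class, which the earlier note says is possible, and uses Python's re module rather than InDesign's engine.

```python
import re

THIN_SPACE = "\u2009"  # visible stand-in for the space tracking would add


def simulate_tracking(text, pattern):
    """Insert a thin space after every character matched by `pattern`.

    This mimics InDesign tracking, which adds space to the right of each
    tracked character; no real formatting is applied.
    """
    return re.sub(pattern,
                  lambda m: "".join(ch + THIN_SPACE for ch in m.group()),
                  text)


BEFORE_PUNCT = r".(?=[:;?!])"     # the three punctuation queries, combined
AFTER_OPENING = r"[(\[{]"         # [([{]
BEFORE_CLOSING = r".(?=[)\]}])"   # .(?=[)\]}])

text = "Why? Note: see (p. 4)."
for pattern in (BEFORE_PUNCT, AFTER_OPENING, BEFORE_CLOSING):
    text = simulate_tracking(text, pattern)
print(text.replace(THIN_SPACE, "|"))  # Why|? Note|: see (|p. 4|).
```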

Expression:

~}(?=\d)

Apply: a little bit of letterspacing

Result: Adds a bit of spacing between a closing quotation mark and a superscript note marker immediately following.

Expression:

\x{0026}

Apply: italic

Result: Italicizes all ampersands, regardless of context (per Bringhurst and others).

Expression:

~_(?=~_)

Apply: a little bit of negative tracking (depending on font)

Result: In fonts where em dashes do not connect seamlessly, this removes the space between consecutive em dashes. Useful for bibliographies.

[Note: the following two queries should be used together. They make use of an ingenious solution for preventing unwanted hyphenation that I found on an Adobe discussion board.]

Expression:

\w\w+(?=-)

Apply: InDesign’s “no language” option in character settings

Result: Forces already hyphenated words like “self-esteem” to break only at the existing hyphen and nowhere else.

Expression:

(?<=-)\w\w+

Apply: InDesign’s “no language” option in character settings

Result: Complements the previous expression by covering the part of the word after the hyphen; together, the two force already hyphenated words like “self-esteem” to break only at the existing hyphen.

Expression:

(?<=\d\d\d)\d(?=\l[,;)\]])

Apply: a little bit of letterspacing

Result: A smidgen of extra space before the letters in bibliographic citations like “1996a” and “2024b.”

Expression:

(?<=\d\d\d\d)\l(?=[,;)\]])

Apply: italic

Result: Automatically italicize the letters in expressions such as those above: 1996a.
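The sketch below runs both citation expressions through Python's re module to confirm that they touch only the last digit of the year and the letter after it, while leaving an ordinary number like "1000s" alone. InDesign's \l is approximated as ASCII [a-z] (an assumption), and each match is reported with its character index.

```python
import re

# InDesign's \l (any lowercase letter) approximated as ASCII [a-z].
TRACK_LAST_DIGIT = re.compile(r"(?<=\d\d\d)\d(?=[a-z][,;)\]])")
ITALIC_LETTER = re.compile(r"(?<=\d\d\d\d)[a-z](?=[,;)\]])")


def matches(pattern, text):
    """List (index, character) for every match the style would format."""
    return [(m.start(), m.group()) for m in pattern.finditer(text)]


text = "(Smith 1996a; Jones 2024b) and 1000s more"
print(matches(TRACK_LAST_DIGIT, text))  # [(10, '6'), (23, '4')]
print(matches(ITALIC_LETTER, text))     # [(11, 'a'), (24, 'b')]
```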

Expression:

(page|pages|p\.|pp\.|figure|figures|fig\.|figs\.|table|tables|chapter|chapters)\K\x{20}(?=\d)

Apply: InDesign’s “no break” option in character settings

Result: Disallows line breaks in the middle of expressions like “chapter 4,” “pp. 21–22,” “fig. 8,” etc.

Expression:

~2

Apply: a small amount of negative baseline shift

Result: Shifts the copyright symbol down to be vertically centered on the height of oldstyle figures.

Expression:

~m~m~m~m

Apply: some decorative “strikethrough” style, e.g., a double rule

Result: A handy way to make ornamental section-break rules; just type 4 em quads to make the rule. Adjust as needed for length.

Expression:

(?<!\w)[ABCDF][+~=](?!\w)

Apply: OpenType “full caps” case (case-sensitive forms)

Result: Apply “full caps” to plus and minus signs in grades like A+, B–, to center the signs vertically on the capitals.

[The following two expressions should be used together.]

Expression:

\b\u+-?\d+\b

Apply: OpenType “all small caps”

Result: Apply small caps to the letters in terms like 3-D, AK-47, 221B, A1, A-1.

Expression:

\b\d+-?\u+\b

Apply: OpenType “all small caps”

Result: Same as above (complementary expression).
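A quick Python check (again assuming ASCII [A-Z] for InDesign's \u) shows how the two expressions split the work: the first catches terms that start with letters, the second terms that start with digits. Both match the whole term, digits included.

```python
import re

LETTERS_FIRST = re.compile(r"\b[A-Z]+-?\d+\b")  # \b\u+-?\d+\b
DIGITS_FIRST = re.compile(r"\b\d+-?[A-Z]+\b")   # \b\d+-?\u+\b

text = "A B-52, an AK-47, a 3-D film, 221B Baker Street, A1 and A-1."
print(LETTERS_FIRST.findall(text))  # ['B-52', 'AK-47', 'A1', 'A-1']
print(DIGITS_FIRST.findall(text))   # ['3-D', '221B']
```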

Expression:

\<[-\u\x{20}'!?,:;][-\u\x{20}'!?,:;]+\>

Apply: OpenType “all small caps”

Result: Apply small caps to extended phrases or sentences in all caps, as in characters shouting. Includes internal punctuation within such phrases.

Expression:

\<[-\u\x{20}'!?,:;][-\u\x{20}'!?,:;]+(?=[-\u\x{20}'!?,:;]\>)

Apply: some amount of letterspacing (tracking)

Result: Applies letterspacing to all sequences matched by the above query, except for the last letter (to prevent unwanted space from appearing after the phrase).

Expression:

\d

Apply: “normal” case and OpenType proportional oldstyle figures

Result: Forces all digits to appear as oldstyle figures. Useful when the font includes special “small cap figures” and you wish to use OSF instead, even in small-cap contexts. Do not include in styles for full-caps headings.

Expression:

(?<=\[).+?(?=\])

Apply: italic

Result: Italicizes phrases within brackets, as for stage directions in scripts and interview transcripts.
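Several expressions in this list rely on InDesign-only tokens (~_, ~S, ~=, \u, \l, \x{HHHH}, \<, \>), so they will not compile as-is in most other regex engines. For anyone who wants to test them outside InDesign, here is a rough Python sketch of a translator. The Unicode mapping for the tilde tokens is an assumption and covers only tokens that appear in this thread; \u and \l become ASCII-only ranges; \< and \> both become \b, which is looser than InDesign's start-of-word and end-of-word anchors; \K is rejected because Python's re module does not support it; and property classes such as \p{Ps} are passed through untouched, so re will refuse to compile them.

```python
import re

# Assumed Unicode equivalents for the InDesign tilde tokens used in this thread.
TILDE_TOKENS = {
    "~_": "\u2014",  # em dash
    "~=": "\u2013",  # en dash
    "~S": "\u00a0",  # nonbreaking space
    "~m": "\u2003",  # em space
    "~2": "\u00a9",  # copyright sign
    "~}": "\u201d",  # right double quotation mark
}

HEX_ESCAPE = re.compile(r"\\x\{([0-9A-Fa-f]+)\}")  # InDesign's \x{HHHH}


def to_python_regex(grep):
    """Rewrite a subset of InDesign GREP syntax into Python `re` syntax."""
    out, i, in_class = [], 0, False
    while i < len(grep):
        pair = grep[i:i + 2]
        if pair in ("\\u", "\\l"):
            # Any uppercase / lowercase letter, approximated as ASCII only.
            rng = "A-Z" if pair == "\\u" else "a-z"
            out.append(rng if in_class else f"[{rng}]")
            i += 2
        elif pair in ("\\<", "\\>"):
            out.append(r"\b")  # looser than InDesign's word-start/word-end
            i += 2
        elif pair == "\\K":
            raise ValueError("\\K is not supported by Python's re module")
        elif (m := HEX_ESCAPE.match(grep, i)):
            out.append(f"\\U{int(m.group(1), 16):08x}")
            i = m.end()
        elif pair.startswith("\\"):
            out.append(pair)  # other escapes (\d, \w, \., ...) pass through
            i += 2
        elif pair in TILDE_TOKENS:
            out.append(TILDE_TOKENS[pair])  # none of these are regex-special
            i += 2
        else:
            # Track whether we are inside [...] so \u becomes A-Z, not [A-Z].
            if grep[i] == "[" and not in_class:
                in_class = True
            elif grep[i] == "]" and in_class:
                in_class = False
            out.append(grep[i])
            i += 1
    return "".join(out)


print(to_python_regex(r"\u\u+"))  # [A-Z][A-Z]+
grades = re.compile(to_python_regex(r"(?<!\w)[ABCDF][+~=](?!\w)"))
print(grades.findall("Grades: A+, B\u2013, C."))  # ['A+', 'B–']
```

With this, single-token expressions such as the ellipsis space, the em dash pair, or the grade signs can be tried on sample strings; anything using \K or \p{…} still needs translating by hand.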

That's all for now; I may return to add more at some point. I'll leave the community with a couple of questions:

1. Do other typographers use GREP styles similarly? Does anyone have any particularly useful or clever expressions that I've overlooked? If so, please share.

2. Is anyone else interested in seeing some of the results that I achieve with GREP baked into more fonts via stylistic sets or other OpenType features?

Many thanks,

Josh

Joshua Langman · August 2024

A few more notes about GREP styles:

First, I should mention that I've received a great deal of assistance from Peter Kahrel's book GREP in InDesign, which is highly recommended for anyone starting out with GREP styles. I still consider myself a beginner, but I've picked up the basics quickly thanks to Kahrel. He spends only one chapter discussing GREP styles specifically, but most of the content is relevant to them nonetheless.

I should also explain, if it's not obvious, why the expressions for adjusting space between characters use letterspacing (tracking) rather than kerning. The answer is simply that kerning cannot be recorded into a character style in InDesign, so this is the only option.

Also, a note for anyone starting out: the order of GREP styles matters, especially when multiple expressions target the same or similar runs of text. Later queries override previous queries if they contain mutually exclusive instructions. Otherwise, their instructions simply pile on.
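The ordering rule can be modeled in a few lines of Python, which may help when planning a stack of styles. This is a simplified model, not InDesign's actual implementation: each rule is a regex plus a dictionary of attributes, rules are applied in order, and a later rule overwrites only the attributes it shares with an earlier one. The rules are a hypothetical, cut-down stack (ASCII [A-Z] for \u, only a handful of postal codes) based on the small-caps, letterspacing, and postal-abbreviation expressions in the first list.

```python
import re

# Hypothetical, cut-down stack of GREP styles: (pattern, attributes).
RULES = [
    (r"[A-Z][A-Z]+", {"case": "all small caps"}),   # \u\u+
    (r"[A-Z]+(?=[A-Z])", {"tracking": 50}),         # \u+(?=\u)
    (r"(?<![A-Z])(?:USA|US|UK|NY|CA)(?![A-Z])",     # a few postal codes
     {"case": "full caps", "tracking": 0}),
]


def layer_styles(text, rules):
    """Return one attribute dict per character after applying rules in order.

    Later rules overwrite attributes they share with earlier rules;
    attributes they do not mention are left in place ("pile on").
    """
    styles = [{} for _ in text]
    for pattern, attrs in rules:
        for m in re.finditer(pattern, text):
            for i in range(m.start(), m.end()):
                styles[i].update(attrs)
    return styles


text = "NASA flew to NY."
styles = layer_styles(text, RULES)
print(styles[0])   # N of NASA: {'case': 'all small caps', 'tracking': 50}
print(styles[3])   # final A of NASA: {'case': 'all small caps'}
print(styles[13])  # N of NY: {'case': 'full caps', 'tracking': 0}
```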

Oh, and here's a question for the community: I am not well-versed in CSS or web typography. Is there a way to achieve GREP styles on the web? I am working with a web developer on a project and would like to ask him to incorporate GREP styles or their equivalent into the CSS code. Some brief searching suggests this may be possible, but it does not appear to be common. Has anyone tried this?

LeMo aka PatternMan aka Frank E Blokland · August 2024

Hi Joshua, thanks for posting the regular expressions! I reckon quite a few members of this forum use GREP: it is awesome after all. I use it a lot in scripts for font production here, because it fits well with modifying the various text files of the IKARUS-based file formats, such as for finding certain numbers to apply calculations. However, I also use it in (scripts for) InDesign. For more complex GREP to recognize certain structures in, for example, metrics and feature files, I consult ChatGPT from time to time. Chattie has a tendency to be a bit verbose, but suggesting some simpler patterns often helps to keep things more compact. FYI, on this page, which I have set up to discuss systematic structures in typography (including the application of GREP) with my students, you will find a link to a relatively simple AppleScript.

In my GREP examples I sometimes use POSIX and, as you probably know, there is an interesting difference between ‘\u\u+’ and ‘[[:upper:]]+’, because the latter also captures a single uppercase letter. For handling acronyms, one could argue that your expression is the better one, but by using word boundaries, the outcome is essentially the same. However, the effect of ‘\u\u+’ and ‘\b[[:upper:]]+\b’ on, for example, ‘ABc’ will be different, of course. That is also the kind of stuff I discuss with my students. And then they can suggest ‘\b\u\u+\b’ or ‘\b\u+\b’ as an alternative. That said, the relevant POSIX notation generally works with locales, meaning it can match uppercase characters in different languages and character sets. Additionally, in some engines, ‘\u’ matches only ASCII uppercase letters (A–Z).
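Frank's comparison is easy to reproduce. Python's re module supports neither InDesign's \u nor POSIX [[:upper:]], so the sketch below uses ASCII [A-Z] for both; that substitution is an assumption, and the third sample ("ÉCOLE") shows the ASCII-only limitation he mentions.

```python
import re

# ASCII [A-Z] stands in for both InDesign's \u and POSIX [[:upper:]].
PATTERNS = {
    r"\u\u+": r"[A-Z][A-Z]+",
    r"[[:upper:]]+": r"[A-Z]+",
    r"\b[[:upper:]]+\b": r"\b[A-Z]+\b",
    r"\b\u\u+\b": r"\b[A-Z][A-Z]+\b",
}
SAMPLES = ["I saw NASA", "ABc", "\u00c9COLE"]  # the last one is "ÉCOLE"

results = {
    original: [re.findall(pattern, s) for s in SAMPLES]
    for original, pattern in PATTERNS.items()
}
for original, found in results.items():
    print(f"{original:20} {found}")
```

Only the two patterns without word boundaries find anything in "ABc" or "ÉCOLE", and in the latter case they still miss the accented capital.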

Since I reckon there are better experts on this forum in the field of GREP and related scripting than I am, there is most probably some information that can be added here.

LeMo aka PatternMan aka Frank E Blokland · August 2024

This might be a bit off topic and I certainly do not mean to hijack Joshua’s interesting thread on the application of GREP in typography. My apologies if it feels that way. However, in my previous post I mentioned my use of GREP in scripts for adjusting metrics and features files. I am not sure if anyone is interested, but for the sake of completeness (and for what it is worth), I will post an example here.

To add some mark-to-base support in the DTL fonts I created a workflow that is driven by an AppleScript. The script extracts information for the automatic calculation of the positions of the diacritics from our proprietary metrics and also AFM files. Some GREP is applied for this. Our file system does not directly support composites (these are created during TTF generation) and mark positioning. That is why I set up this workflow.

Checking the positioning of the diacritics is done in OTM, which, as probably most of you know, has HarfBuzz under the hood. Then the mark-to-base features are invoked during (batch) font generation. It is, of course, not rocket science and nothing special compared to what the programmers of the DTL and any other font tools do. But it is fun to make and nice to control, partly thanks to GREP.

Joshua Langman · August 2024

Hi all,

Frank, no need to apologize — I appreciate your contributions and insights!

I've been continuing to tinker with GREP styles for my current project. I don't have time to share all of what I'm working on right now, but I'm especially pleased with these expressions I just wrote:

(\p{Ps}|\p{Pi})(\w|[,.:;'!?/&-]|\x{20}){1,5}

(\w|[,.:;'!?/&-]|\x{20}){1,5}(\p{Pe}|\p{Pf})

I apply "no break" to the matched text. What this does is solve the problem of opening a parenthesis or quotation just before the end of a line or closing one just after the start of a line. I've always found it disconcerting to have a line that ends, for instance, like this:

He said, "I

or that starts like this:

is.)

These GREP styles avoid this by not allowing a break within the first or last five characters inside of a parenthesis or quotation.
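To see which runs receive "no break", the sketch below reproduces the two expressions in Python. Python's standard re module has no \p{…} property classes, so the sketch substitutes small explicit sets for \p{Ps}/\p{Pe} (opening/closing brackets) and \p{Pi}/\p{Pf} (initial/final quotation marks); those sets are an assumption and far smaller than the real Unicode categories.

```python
import re

# Assumed stand-ins for the Unicode categories in the GREP; the real
# \p{Ps}, \p{Pi}, \p{Pe} and \p{Pf} classes contain many more characters.
OPENERS = "([{\u201c\u2018\u00ab"    # ( [ { “ ‘ «
CLOSERS = ")]}\u201d\u2019\u00bb"    # ) ] } ” ’ »
INNER = r"(?:\w|[,.:;'!?/&-]|\x20)"  # (\w|[,.:;'!?/&-]|\x{20})

AFTER_OPENING = re.compile(f"[{re.escape(OPENERS)}]{INNER}{{1,5}}")
BEFORE_CLOSING = re.compile(f"{INNER}{{1,5}}[{re.escape(CLOSERS)}]")

text = "He said, \u201cI do,\u201d and left (as it is)."
print(AFTER_OPENING.findall(text))   # ['“I do,', '(as it']
print(BEFORE_CLOSING.findall(text))  # ['I do,”', 'it is)']
```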

More to come when I have more time. Just wanted to share this one.

Josh

pereelmagne · August 2024

Many thanks for your contributions. I would like to add a comment about the ‘no-language’ parameter in InDesign. When we apply this parameter, there can be typographical consequences:

When typing or pasting into a segment set to no language, typographer’s quotes are not applied.
In URLs and email addresses, line breaks can occur after full stops, slashes, hyphens, and @.
Kerning can revert to the default.
Parameters about ligatures may become inactive.
Parameters about small caps may become inactive.

For all these reasons, the no-language parameter is ideal in the following types of text:

code (any programming language)
technical and scientific notations
URLs and email addresses
phonetic transcriptions
some initialisms and acronyms

Joshua Langman · September 2024

Hello, comrades!

I've been doing a lot of GREP engineering for my current project, and I wanted to share my current list of GREP styles that I'm including in the basic body text style. This is for scholarly publishing. I've copied the following from the specification document I'm currently writing. The numbers under "apply" below refer to style sheets, but in most cases it should be obvious what styling is being applied.

Some of these expressions improve on ones I listed above; others are new. I remain interested in any feedback from the community. Do other folks use GREP in this way? Any other recommended expressions for body text? Other thoughts?

Josh

match

(?<=\.)~S(?=\.)

apply 941

result Condense nonbreaking spaces within ellipses to 50% width.

match

\u\u+

apply 911

result Set two or more consecutive capitals as small caps.

match

\u+(?=\u)

apply 923

result Add 50 units of letterspacing between consecutive small caps.

match

(?<=\u)\u(?=[.,\l])

apply 923

result Add 50 units of letterspacing between a run of small caps and an immediately following period, comma, or lowercase letter.

match

\l(?=\u\u)

apply 923

result Add 50 units of letterspacing between a lowercase letter and an immediately following run of small caps.

match

\b\u+-?\d+-?\u*\b

apply 911

result Set capital letters in mixed alphanumeric expressions, with or without hyphens, as small caps. (Examples: B-52, AK-47, A1, 221B)

match

\b\d+-?\u+-?\d*\b

apply 911

result Same as above (complementary expression).

match

\<[-\u\x{20}',][-\u\x{20}',]+\>

apply 911

result Set full-capital expressions containing spaces or punctuation in small caps.

match

\<[-\u\x{20}',][-\u\x{20}',]+(?=[-\u\x{20}',]\>)

apply 923

result Add 50 units of letterspacing to full-capital expressions containing spaces or punctuation set as small caps.

match

.(?=[:;?!])

apply 921

result Add 30 units of letterspacing before colons, semicolons, question marks, and exclamation points.

match

:(?=\d)

apply 921

result Add 30 units of letterspacing between a colon and an immediately following digit (as in journal issue numbers).

match

[(){}[\]]

apply 901

result Set parentheses, brackets, and braces invariably in roman, regardless of their context.

match

[([{]

apply 923

result Add 50 units of letterspacing after opening delimiters.

match

.(?=[)\]}])

apply 923

result Add 50 units of letterspacing before closing delimiters.

match

\x{0026}

apply 942

result Set ampersands invariably in italic and in “normal” case (not small caps), regardless of their context.

match

~_(?=~_)

apply 925

result Subtract 10 units of letterspacing between consecutive em dashes, to form an unbroken rule.

match

(?<=\d\d\d)\d(?=[abcdefgh][.,;)\]])

apply 922

result Add 40 units of letterspacing between a four-digit year and an immediately following lowercase letter (as in citations).

match

(?<=\d\d\d\d)[abcdefgh](?=[.,;)\]])

apply 902

result Italicize a lowercase letter immediately following a four-digit year.

match

(?<=\<\d\d\d\d[abcdefgh])[.,]

apply 902

result Italicize a period or comma after a four-digit year followed by a lowercase letter.

match

\<\d+\Kn(?=\d+\>)

apply 902

result Italicize the letter n in note references. (Example: page 123n4)

match

*(?<=\[).+?(?=\])

apply 902

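result Italicize text within square brackets. [For drama and transcript styles only; the leading asterisk deliberately invalidates the expression so that it has no effect in the body style.]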

match

~m~m~m

apply 944

result Apply the “redaction” style to three consecutive em quads, creating a black bar.

match

(?<!\w)[ABCDF][+~=](?!\w)

apply 914

result Apply OT “full caps” to a plus or minus sign after a capital (as in academic grades), shifting the math sign to center on the capital height.

match

\d

apply 945

result Force all digits to set as OSF (overriding the font’s built-in “small-cap” digits). This style overrides the preceding but not the following GREP styles.

match

\<[\u\l]+-[\u\l]+-?[\u\l]*\>

apply 932

result Force hyphenated words to break across lines only at their existing hyphens.

match

(?i)\<(?:page|pages|p\.|pp\.|figure|figures|fig\.|figs\.|table|tables|chapter|chapters|vol\.|vols\.|volume|volumes)\K\x{20}(?=\d+\>)

apply 931

result Disallow breaking at the space in such expressions as “chapter 1.”

match

\<\d\d?\K\x{20}(?=(?:January|February|March|April|May|June|July|August|September|October|November|December)\>)

apply 931

result Disallow breaking at the space between a day and a month in dates (in the European format).

match

\d+\x{2032}\x{20}\d+\x{2033}

apply 931

result Disallow breaking numerical expressions that use prime marks (as in heights).

match

(?<=\d)~=\d{1,2}\>

apply 931

result Disallow breaking numerical ranges at the en dash when the second element consists of fewer than three digits.

match

(?:\p{Ps}|\p{Pi})(?:\w|[,.:;'!?/+=%–-]|\x{26}|\x{20}){1,4}

apply 931

result Disallow ending a line with an opening delimiter or quotation mark followed by fewer than five characters, including punctuation and spaces.

match

(?:\w|[,.:;'!?/+=%–-]|\x{26}|\x{20}){1,4}(?:\p{Pe}|~})

apply 931

result Disallow beginning a line with a closing delimiter or quotation mark preceded by fewer than five characters, including punctuation and spaces.

match

.{1,5}$

apply 931

result Disallow ending a paragraph with fewer than six characters on the final line.

match

*[\x{0400}-\x{045F}]+

apply 934

result Apply Russian hyphenation rules to text in Cyrillic. [Disabled by default.]

match

*[\x{0374}-\x{03D7}]+

apply 933

result Apply Greek hyphenation rules to text in Greek. [Disabled by default.]

match

\<(?:https?\://|www\.)(?:\S|\n)+\>[/?]?

apply 946

result Apply special styling to URLs: no small caps; proportional lining figures; roman ampersands; no custom letterspacing; straight quotation marks; no hyphenation. (URLs are broken by a custom script.)

match

~2

apply 943

result Baseline-shift the copyright symbol to center vertically on oldstyle numerals in metadata notes.

match

\x{2713}

apply 965

result Apply special font to the checkmark character, as used in graphs.

match

[\x{2190}-\x{2199}]

apply 953

result Apply special font to arrow characters.
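Python's standard re module has no \K and allows only fixed-width lookbehind, so the variable-length prefixes in the label and note-reference rules cannot simply become (?<=...). A common workaround, sketched below, captures the prefix in a group and acts only on what follows it. The nonbreaking space (U+00A0) merely stands in for InDesign's "no break" attribute, and \< and \> are approximated with \b, which is safe here because the matched text begins and ends with word characters.

```python
import re

NBSP = "\u00a0"

# Adapted from the label rule above; \< and \> become \b, and \K is
# emulated by capturing the label in group 1 and rebuilding it.
LABEL_SPACE = re.compile(
    r"(?i)\b(page|pages|p\.|pp\.|figure|figures|fig\.|figs\.|table|tables|"
    r"chapter|chapters|vol\.|vols\.|volume|volumes)\x20(?=\d+\b)"
)

# Adapted from \<\d+\Kn(?=\d+\>): group 1 is the "n" that gets italicized.
NOTE_N = re.compile(r"\b\d+(n)(?=\d+\b)")


def tie_label_spaces(text):
    """Swap only the space after the label for a nonbreaking space."""
    return LABEL_SPACE.sub(lambda m: m.group(1) + NBSP, text)


def note_n_positions(text):
    """Return the index of each 'n' the italic style would target."""
    return [m.start(1) for m in NOTE_N.finditer(text)]


print(tie_label_spaces("See Chapter 4 and pp. 21\u201322.").replace(NBSP, "_"))
# See Chapter_4 and pp._21–22.
print(note_n_positions("As argued at 123n4 and 45n12."))  # [16, 25]
```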

# # #

John Nolan · September 2024

Thanks Joshua. These look very useful.

pereelmagne · January 2

Expression:
-~_
Apply: Some percentage of horizontal scaling, e.g., 80%
Result: Shorten em dashes to the percentage of width specified; useful for fonts in which the em dash occupies the full em square, which often looks too wide in text.

Sorry, but I don't understand the first character in this expression. What is it (a regular hyphen?) and what is its purpose? If you want to capture em dashes, code ~_ should be enough.

Joshua Langman · January 3

Ah, I think you are correct! I may have had that extra character in there to intentionally invalidate the expression, and then pasted it into the forum without realizing.

By the way, the style guide for my current project, which I shared in the "SS.I case study" thread, has more recent and possibly more refined versions of some of my GREP expressions, though several of them are quite project-specific.
