37. Text Substitutions

Text substitution elements replace characters, markup, attribute references, and macros with converter-specific styles and values. When Asciidoctor processes a document, it uses a set of six text substitution elements. The processor runs the text substitution elements in the following order.

  1. Special characters

  2. Quotes

  3. Attribute references

  4. Replacements

  5. Inline macros

  6. Post replacements

In turn, these substitutions are organized into composite value groups. The table below shows which substitution elements are included in each group.

Substitution groups
Group      Special characters  Quotes  Attributes  Replacements  Macros  Post replacements
Header
None
Normal
Pass
Verbatim

By default, the normal substitution group is applied to most block and inline elements. However, there are a few exceptions.

The header substitution group is applied to metadata lines (author and revision information) in the document header. This group is also applied to the values of attribute entries, regardless of whether those entries are defined in the header or elsewhere in the document. In the header, only special characters and attribute references are replaced. In attribute entries, you can also use the inline pass macro.

Fenced, literal, listing, and source blocks are processed with the verbatim substitution group. Only special characters are replaced in these blocks; callouts are also processed.

The pass substitution group can only be applied to passthrough elements. Attribute references and macros are replaced in passthroughs.

The none substitution group is applied to comment blocks.

The title substitution group

The title substitution group includes the same text substitutions as the normal group. However, the order in which the substitutions are executed differs slightly: attribute references are replaced after inline macros instead of before replacements. Text substitutions are applied to titles in the following sequence:

  1. Special characters

  2. Quotes

  3. Replacements

  4. Inline macros

  5. Attribute references

  6. Post replacements
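
The difference between the two orderings can be checked mechanically. The following is a minimal Python sketch written for illustration only; it is not part of Asciidoctor (which is implemented in Ruby), and the helper name `moved_steps` is hypothetical. The step names are the subs values listed later in this section.

```python
# Substitution order for the normal and title groups, as listed in this section.
NORMAL = ["specialcharacters", "quotes", "attributes",
          "replacements", "macros", "post_replacements"]
TITLE = ["specialcharacters", "quotes", "replacements",
         "macros", "attributes", "post_replacements"]


def moved_steps(reference, other):
    """Return the steps whose position differs between the two orderings."""
    return [name for name in reference
            if reference.index(name) != other.index(name)]


# Both groups contain the same six steps.
print(sorted(NORMAL) == sorted(TITLE))  # True
# Only the middle three steps change position.
print(moved_steps(NORMAL, TITLE))  # ['attributes', 'replacements', 'macros']
```

In practice this means that, in a title, an attribute value is inserted after quotes, replacements, and macros have already run.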

37.1. Special Characters

When applicable, the first text substitution to occur is the replacement of any special characters. This process is handled by the specialchars element. The specialchars element searches for three characters (<, >, &) and replaces them with their named character references.

  • The less than symbol, <, is replaced with the named character reference &lt;.

  • The greater than symbol, >, is replaced with the named character reference &gt;.

  • An ampersand, &, is replaced with the named character reference &amp;.

By default, the special characters substitution occurs on all inline and block elements except for comments and certain passthroughs. The substitution of special characters can be controlled on blocks using the subs attribute and on inline elements using the passthrough macro.

Special character substitution precedes attribute substitution, so you will need to manually escape any attributes containing special characters that you set in the CLI or API. For example, on the command line, type -a toc-title="Sections, Tables &amp; Figures" instead of -a toc-title="Sections, Tables & Figures".
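
As an illustration, the special characters substitution can be approximated in a few lines of Python. This is a sketch under stated assumptions, not Asciidoctor's own code: the function name `sub_specialchars` is hypothetical, and a single-pass regular expression is simply one way to avoid escaping the ampersand of a reference that was just produced. The same function can be used to pre-escape an attribute value before passing it to the CLI or API.

```python
import re

# The three special characters and their named character references.
SPECIAL_CHARS = {"<": "&lt;", ">": "&gt;", "&": "&amp;"}


def sub_specialchars(text):
    """Replace <, > and & in one pass over the text."""
    return re.sub(r"[<>&]", lambda m: SPECIAL_CHARS[m.group(0)], text)


print(sub_specialchars("if (a < b && b > c)"))
# if (a &lt; b &amp;&amp; b &gt; c)

# Pre-escaping an attribute value such as toc-title:
print(sub_specialchars("Sections, Tables & Figures"))
# Sections, Tables &amp; Figures
```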

37.2. Quotes

The quotes substitution replaces the formatting markup on inline elements.

For example, when a document is converted to HTML, any asterisks enclosing text are replaced with <strong> HTML tags.

Syntax input
Happy werewolves are *really* slobbery.
HTML output
Happy werewolves are <strong>really</strong> slobbery.

The following table shows the HTML markup that is generated by the quotes substitution process.

HTML markup generated from AsciiDoc formatting syntax
Name                  AsciiDoc  HTML
emphasis              _word_    <em>word</em>
strong                *word*    <strong>word</strong>
monospace             `word`    <code>word</code>
superscript           ^word^    <sup>word</sup>
subscript             ~word~    <sub>word</sub>
double curved quotes  "`word`"  &#8220;word&#8221;
single curved quotes  '`word`'  &#8216;word&#8217;
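
The first five rows of the table can be imitated with a handful of regular expressions. The Python sketch below is a deliberate simplification for illustration: the patterns and the name `sub_quotes` are assumptions, and the real rules for where formatting marks may appear are stricter than what is shown here.

```python
import re

# (pattern, replacement) pairs for a handful of formatting marks.
QUOTE_RULES = [
    (re.compile(r"\*(\S(?:.*?\S)?)\*"), r"<strong>\1</strong>"),
    # \b keeps identifiers such as snake_case_name from being emphasized.
    (re.compile(r"\b_(\S(?:.*?\S)?)_\b"), r"<em>\1</em>"),
    (re.compile(r"`(\S(?:.*?\S)?)`"), r"<code>\1</code>"),
    (re.compile(r"\^(\S+?)\^"), r"<sup>\1</sup>"),
    (re.compile(r"~(\S+?)~"), r"<sub>\1</sub>"),
]


def sub_quotes(text):
    """Apply each formatting rule in turn and return the HTML fragment."""
    for pattern, replacement in QUOTE_RULES:
        text = pattern.sub(replacement, text)
    return text


print(sub_quotes("Happy werewolves are *really* slobbery."))
# Happy werewolves are <strong>really</strong> slobbery.
print(sub_quotes("H~2~O and E=mc^2^"))
# H<sub>2</sub>O and E=mc<sup>2</sup>
```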

The quotes substitution occurs on formatted text within title, paragraph, example, quote, sidebar, and verse blocks.

Elements subject to quotes text substitution
Element                quotes substitution
Attribute Entry Value
Comment
Example
Fenced
Header
Literal
Listing
Macro
Open                   Varies
Paragraph
Passthrough
Quote
Sidebar
Source
Special sections
Table                  Varies
Title
Verse

37.3. Attributes

Attribute references are replaced with their values when they’re processed by the attributes substitution.
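
To make the idea concrete, here is a minimal Python sketch of attribute reference replacement. It is an illustration rather than Asciidoctor's implementation: the name `sub_attributes` and the pattern for attribute names are assumptions, and leaving an unknown reference untouched is an assumed policy chosen for the sketch. A reference preceded by a backslash is kept literal and the backslash is dropped.

```python
import re

# Optional leading backslash, then {name}.
ATTR_REF = re.compile(r"(\\)?\{([A-Za-z0-9_][A-Za-z0-9_-]*)\}")


def sub_attributes(text, attrs):
    """Replace {name} references with values taken from attrs."""
    def resolve(match):
        escaped, name = match.group(1), match.group(2)
        if escaped:
            # Escaped reference: drop the backslash, keep the braces.
            return "{" + name + "}"
        if name in attrs:
            return attrs[name]
        # Assumption of this sketch: unknown references are left as written.
        return match.group(0)

    return ATTR_REF.sub(resolve, text)


attrs = {"version": "1.0.0"}
print(sub_attributes("<version>{version}</version>", attrs))
# <version>1.0.0</version>
print(sub_attributes(r"\{version} is kept literal", attrs))
# {version} is kept literal
```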

Elements subject to attributes text substitution
Element                attributes substitution
Attribute Entry Value
Comment
Example
Fenced
Header
Literal
Listing
Macro
Open                   Varies
Paragraph
Passthrough
Quote
Sidebar
Source
Special sections
Table                  Varies
Title
Verse

37.4. Replacements

The replacements substitution processes textual characters such as marks, arrows and dashes and replaces them with the decimal format of their Unicode code point, i.e. their numeric character reference.

Textual symbol replacements
Name                    Syntax  Unicode Replacement  Rendered
Copyright               (C)     &#169;               ©
Registered              (R)     &#174;               ®
Trademark               (TM)    &#8482;              ™
Em dash                 --      &#8212;              —
Ellipsis                ...     &#8230;              …
Single right arrow      ->      &#8594;              →
Double right arrow      =>      &#8658;              ⇒
Single left arrow       <-      &#8592;              ←
Double left arrow       <=      &#8656;              ⇐
Typographic apostrophe  Sam's   Sam&#8217;s          Sam’s

Notes

Em dash: Only replaced if between two word characters, between a word character and a line boundary, or flanked by spaces. When flanked by space characters (e.g., a -- b), the normal spaces are replaced by thin spaces (&#8201;).

Typographic apostrophe: The typewriter apostrophe is replaced with the typographic (aka curly) apostrophe.

The replacements element depends on the substitutions completed by the specialcharacters element. This is important to keep in mind when applying custom substitutions to a block. See the section about applying custom substitutions for more information.
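
The dependency is easiest to see with the arrow replacements. By the time replacements runs, the special characters substitution has already turned > into &gt; and < into &lt;, so the arrow rules have to match those escaped forms. The Python sketch below illustrates this with a subset of the table; it is a simplification with hypothetical function names, and it leaves out the em dash and apostrophe rules.

```python
def sub_specialchars(text):
    """Escape &, < and > (the ampersand first, so references stay intact)."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


# Textual symbols and their numeric character references.
# The arrows are written in the form they have after sub_specialchars.
REPLACEMENTS = [
    ("(C)", "&#169;"),
    ("(R)", "&#174;"),
    ("(TM)", "&#8482;"),
    ("...", "&#8230;"),
    ("-&gt;", "&#8594;"),
    ("=&gt;", "&#8658;"),
    ("&lt;-", "&#8592;"),
    ("&lt;=", "&#8656;"),
]


def sub_replacements(text):
    """Apply each textual replacement in turn."""
    for needle, reference in REPLACEMENTS:
        text = text.replace(needle, reference)
    return text


escaped = sub_specialchars("(C) 2024 ACME -> all rights reserved...")
print(sub_replacements(escaped))
# &#169; 2024 ACME &#8594; all rights reserved&#8230;

# Without the special characters step, the arrow is not recognized.
print(sub_replacements("a -> b"))
# a -> b
```

This is why a custom subs value that includes replacements but omits specialchars leaves arrows unconverted.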

The replacements substitution also recognizes HTML and XML character references as well as decimal and hexadecimal Unicode code points and replaces them with their corresponding decimal numeric character reference.

For example, to produce the § symbol you could write &sect;, &#x00A7;, or &#167;. When the document is processed, replacements will replace the section symbol reference, regardless of whether it is a named character reference or a numeric character reference, with &#167;. In turn, &#167; will display as §.

Anatomy of a character reference

A character reference is a standard sequence of characters that is substituted for a single character when Asciidoctor processes a document. There are two types of character references: named character references and numeric character references.

A named character reference (often called a character entity reference) is a short name that refers to a character (i.e., glyph). To make the reference, the name must be prefixed with an ampersand (&) and end with a semicolon (;).

For example:

  • &dagger; displays as †

  • &euro; displays as €

  • &loz; displays as ◊

Numeric character references are the decimal or hexadecimal Universal Character Set/Unicode code points which refer to a character.

  • The decimal code point references are prefixed with an ampersand (&), followed by a hash (#), and end with a semicolon (;).

  • Hexadecimal code point references are prefixed with an ampersand (&), followed by a hash (#), followed by a lowercase x, and end with a semicolon (;).

For example:

  • &#x2020; or &#8224; displays as †

  • &#x20AC; or &#8364; displays as €

  • &#x25CA; or &#9674; displays as ◊

Developers may be more familiar with using Unicode escape sequences to perform text substitutions. For example, to produce an @ sign using a Unicode escape sequence, you would prefix the hexadecimal Unicode code point with a backslash (\) and an uppercase or lowercase u, i.e. \u0040. However, Asciidoctor doesn’t process and replace Unicode escape sequences at this time.
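
The equivalence of the three reference forms can be verified outside of Asciidoctor. The Python sketch below uses the standard library function html.unescape to decode each form and a small hypothetical helper, `to_decimal_reference`, to build the decimal reference for a character. Python is used here purely for illustration. It also shows the Unicode escape sequence mentioned above, which Python understands but Asciidoctor does not.

```python
import html

# Named, hexadecimal, and decimal references to the same character.
forms = ["&dagger;", "&#x2020;", "&#8224;"]
print([html.unescape(form) for form in forms])
# ['†', '†', '†']


def to_decimal_reference(char):
    """Build the decimal numeric character reference for one character."""
    return "&#{};".format(ord(char))


print(to_decimal_reference("§"))
# &#167;

# A Unicode escape sequence in a Python string literal.
print("\u0040")
# @
```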

Asciidoctor also provides numerous built-in attributes for representing characters and symbols. These attributes and their corresponding output are listed in Predefined attributes for character replacements.

The replacements substitution occurs within title, paragraph, example, quote, sidebar, and verse blocks.

Elements subject to replacements text substitution
Element                replacements substitution
Attribute Entry Value
Comment
Example
Fenced
Header
Literal
Listing
Macro
Open                   Varies
Paragraph
Passthrough
Quote
Sidebar
Source
Special sections
Table                  Varies
Title
Verse

37.5. Macros

Macros are processed by the macros element. The macros substitution replaces each macro with the output defined for it, whether the macro is built in or user-defined.

Elements subject to macros substitution
Element                macros substitution
Attribute Entry Value  inline pass macro only
Comment
Example
Fenced
Header
Literal
Listing
Macro
Open                   Varies
Paragraph
Passthrough
Quote
Sidebar
Source
Special sections
Table                  Varies
Title
Verse

37.6. Post Replacements

The line break character, a plus sign (+) placed at the end of a line after a space, is replaced with a hard line break when the post_replacements substitution runs.
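
As a rough illustration, the substitution can be imitated with one regular expression. The Python sketch below assumes HTML output, where the line break is written as <br>, and the function name `sub_post_replacements` is hypothetical.

```python
import re


def sub_post_replacements(text):
    """Turn a space followed by + at the end of a line into <br>."""
    return re.sub(r" \+$", "<br>", text, flags=re.MULTILINE)


print(sub_post_replacements("Rubies are red, +\nTopazes are blue."))
# Rubies are red,<br>
# Topazes are blue.

# A plus sign elsewhere in the line is left alone.
print(sub_post_replacements("1 + 2 = 3"))
# 1 + 2 = 3
```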

Elements subject to post replacements text substitution
Element                post_replacements substitution
Attribute Entry Value
Comment
Example
Fenced
Header
Literal
Listing
Macro
Open                   Varies
Paragraph
Passthrough
Quote
Sidebar
Source
Special sections
Table                  Varies
Title
Verse

37.7. Applying Substitutions

Specific substitution elements can be applied to any block or paragraph by setting the subs attribute. The subs attribute can be assigned a comma-separated list of the following substitution elements and groups.

none

Disables substitutions

normal

Performs all substitutions except for callouts

verbatim

Replaces special characters and processes callouts

specialchars, specialcharacters

Replaces <, >, and & with their corresponding entities

quotes

Applies text formatting

attributes

Replaces attribute references

replacements

Substitutes textual and character reference replacements

macros

Processes macros

post_replacements

Replaces the line break character (+)

Let’s look at an example where you only want to process special characters, formatting markup, and callouts in a source block. By default, source blocks, like literal blocks, are only subject to the verbatim substitutions. But you can change this behavior by setting the subs attribute in the block’s attribute list.

[source,java,subs="verbatim,quotes"] (1)
----
System.out.println("Hello *bold* text"); (2)
----
1 The subs attribute is set in the attribute list and assigned the verbatim and quotes values.
2 The formatting markup in this line will be replaced when the quotes substitution runs.

System.out.println("Hello bold text");  (1) (2)
1 The verbatim value enables the callouts to be processed.
2 The quotes value enables the text formatting to be processed.

If you are applying the same set of substitutions to numerous blocks, you should consider making them an attribute entry to ensure consistency.

:markup-in-source: verbatim,quotes

[source,java,subs="{markup-in-source}"]
----
System.out.println("Hello *bold* text");
----

Another way to ensure consistency and keep your documents clean and simple is to use the TreeProcessor extension.

37.8. Incremental Substitutions

When you set the subs attribute on a block, you automatically remove all of its default substitutions. For example, if you set subs on a literal block, and assign it a value of attributes, only attributes are substituted. The verbatim substitution will not be applied. To remedy this situation, Asciidoctor provides a syntax to append or remove substitutions instead of replacing them outright.

You can add or remove a substitution from the default substitution list using the plus (+) and minus (-) modifiers. These are known as incremental substitutions.

<substitution>+

Prepends the substitution to the default list.

+<substitution>

Appends the substitution to the default list.

-<substitution>

Removes the substitution from the default list.
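
The three modifiers can be modeled as operations on a list. The Python sketch below is an illustration of the rules just described, not Asciidoctor's code. The function name `resolve_subs` is hypothetical, the default list for verbatim blocks is taken from the description of the verbatim group in this section, and lists that mix plain and incremental entries are deliberately left out of scope.

```python
# Defaults for verbatim blocks: special characters and callouts.
VERBATIM_DEFAULTS = ["specialcharacters", "callouts"]


def resolve_subs(spec, defaults):
    """Resolve a subs value such as 'attributes+' against a default list."""
    entries = [entry.strip() for entry in spec.split(",") if entry.strip()]
    incremental = [entry.startswith(("+", "-")) or entry.endswith("+")
                   for entry in entries]
    if not any(incremental):
        # A plain list replaces the defaults outright.
        return entries
    if not all(incremental):
        raise ValueError("mixed plain and incremental entries are not handled here")

    resolved = list(defaults)
    for entry in entries:
        if entry.endswith("+"):        # <substitution>+ : prepend
            name = entry[:-1]
            resolved = [name] + [s for s in resolved if s != name]
        elif entry.startswith("+"):    # +<substitution> : append
            name = entry[1:]
            resolved = [s for s in resolved if s != name] + [name]
        else:                          # -<substitution> : remove
            name = entry[1:]
            resolved = [s for s in resolved if s != name]
    return resolved


print(resolve_subs("attributes+", VERBATIM_DEFAULTS))
# ['attributes', 'specialcharacters', 'callouts']
print(resolve_subs("+quotes,-callouts", VERBATIM_DEFAULTS))
# ['specialcharacters', 'quotes']
print(resolve_subs("attributes", VERBATIM_DEFAULTS))
# ['attributes']
```

The last call shows the non-incremental case: without a modifier, the default substitutions are discarded.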

For example, you can add the attributes substitution to a listing block’s default substitution list.

Add attributes substitution to a default substitution list
[source,xml,subs="attributes+"]
----
<version>{version}</version>
----

Similarly, you can remove the callouts substitution.

Remove callouts substitution from a default substitution list
[source,xml,subs="-callouts"]
.An illegal XML tag
----
<1>
  content inside "1" tag
</1>
----

You can also specify whether the substitution is placed at the beginning or end of the substitution list. If a + comes before the name of the substitution, then it’s added to the end of the existing list, and if a + comes after the name, it’s added to the beginning of the list.

[source,xml,subs="attributes+,+replacements,-callouts"]
----
<version>{version}</version>
<copyright>(C) ACME</copyright>
<1>
  content inside "1" tag
</1>
----

In the above example, the attributes, then the special characters, and then the replacements substitutions will be applied to the listing block. The callouts substitution is removed.

37.8.1. Applying Substitutions to Inline Elements

Custom substitutions can also be applied to some inline elements, such as the pass macro.

For example, the quotes text substitution value is assigned in the inline pass macro below.

The text pass:q[<u>underline *me*</u>] is underlined and the word "`me`" is bold.

The text underline me is underlined and the word “me” is bold.

37.9. Preventing Substitutions

Asciidoctor provides several approaches for preventing substitutions.

Backslash escaping

To prevent punctuation from being interpreted as formatting markup, precede it with a backslash (\). If the formatting punctuation begins with two characters (e.g., __), you need to precede it with two backslashes (\\). This is also how you can prevent character and attribute references from substitution. When your document is processed, the backslash is removed so it doesn’t display in your output.

\*Stars* will appear as *Stars*, not as bold text.

\&sect; will appear as an entity, not the &sect; symbol.

\\__func__ will appear as __func__, not as emphasized text.

\{two-semicolons} will appear {two-semicolons}, not resolved as ;;.

You can also prevent substitutions with macro and block passthroughs.