Emit a tagged PDF/UA-2 structure tree from semantic content

This recipe creates a tagged Portable Document Format/Universal Accessibility 2 (PDF/UA-2) file. It targets International Organization for Standardization (ISO) 14289-2. NextPDF emits a logical structure tree, marked-content sequences, the catalog language, and document-level identification metadata. That structure supports accessible authoring, but an independent checker decides conformance. The recipe follows examples/31-pdfua2-tagged.php.

composer require nextpdf/core:^3

Put a PDF/UA-2 checker on PATH for verification. This recipe uses veraPDF with the ua2 flavour. You do not need a Pro or Enterprise package to emit the tagged structure.

A tagged PDF carries a parallel logical structure tree alongside the visual content stream. Assistive technology reads the tree instead of the pixel layout, so the structure determines the exposed reading order. ISO 14289-2 sets four requirements here. Real (non-artifact) content must be reachable through that tree (§8.2.2). Structure elements must nest in a defined order (§8.2.3). Every element must resolve to a known structure namespace, either directly or by role mapping (§8.2.4). The natural language of the content is declared at the document level and refined per structure element where it differs (§8.4.4).
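
To make the idea concrete, the following minimal sketch models a structure tree as nested PHP arrays and derives the reading order with a depth-first walk. The array shape and the readingOrder() helper are hypothetical illustrations, not NextPDF's internal model.

```php
<?php
declare(strict_types=1);

// Hypothetical model: each node has a role, optional text, and children.
// This is an illustration only, not how NextPDF stores its structure tree.
$tree = [
    'role' => 'Document',
    'children' => [
        ['role' => 'H1', 'text' => 'Quarterly Accessibility Report'],
        ['role' => 'P', 'text' => 'This document opts into tagged PDF.'],
        ['role' => 'L', 'children' => [
            ['role' => 'LI', 'children' => [
                ['role' => 'LBody', 'text' => 'Headings carry semantic roles.'],
            ]],
            ['role' => 'LI', 'children' => [
                ['role' => 'LBody', 'text' => 'Lists keep their item structure.'],
            ]],
        ]],
    ],
];

/**
 * Depth-first, document-order walk: the order in which a reader of the
 * tree encounters text, regardless of where it sits on the page.
 *
 * @param array<string, mixed> $node
 * @return list<string>
 */
function readingOrder(array $node): array
{
    $lines = [];
    if (isset($node['text'])) {
        $lines[] = $node['role'] . ': ' . $node['text'];
    }
    foreach ($node['children'] ?? [] as $child) {
        $lines = array_merge($lines, readingOrder($child));
    }
    return $lines;
}

echo implode("\n", readingOrder($tree)), "\n";
```

Grouping nodes such as L and LI contribute no text of their own in this model; only the leaves do, which mirrors the container/leaf split described later in this recipe.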

NextPDF models this with a typed ConformanceMode. enableTaggedPdf() sets ConformanceMode::PdfUa2, which (a) makes the Hypertext Markup Language (HTML) pipeline wire a TaggedContentEmitter at parser construction time, (b) sets the catalog MarkInfo /Marked flag that signals a tagged PDF (ISO 32000-2 §14.7), and (c) records the Best Current Practice 47 (BCP 47) language for the catalog Lang entry. The writer also emits the per-page Tabs entry, so the tab order follows the structure order (ISO 32000-2 §14.8).

The strict UA-2 invariants apply only to ConformanceMode::PdfUa2. By design, constructing a strict ConformancePolicy against any other mode throws InvalidConfigException.
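
The rule can be pictured with a small stand-in. The sketch below assumes PHP 8.1 or later; SketchMode and SketchPolicy are invented names that only imitate the described behavior, and the exception type is a generic one rather than NextPDF's InvalidConfigException.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins that mirror the rule described above.
// They are not NextPDF's ConformanceMode or ConformancePolicy.
enum SketchMode
{
    case None;
    case PdfA;
    case PdfUa2;
}

final class SketchPolicy
{
    public function __construct(
        public readonly SketchMode $mode,
        public readonly bool $strict,
    ) {
        // Strict invariants are only meaningful for the UA-2 mode,
        // so any other combination is rejected at construction time.
        if ($strict && $mode !== SketchMode::PdfUa2) {
            throw new InvalidArgumentException(
                'Strict policy requires the PdfUa2 mode, got ' . $mode->name
            );
        }
    }
}

$ok = new SketchPolicy(SketchMode::PdfUa2, strict: true);

try {
    new SketchPolicy(SketchMode::PdfA, strict: true);
    $rejected = false;
} catch (InvalidArgumentException $e) {
    $rejected = true;
}

echo $rejected ? "strict + PdfA rejected\n" : "strict + PdfA accepted\n";
```

Rejecting the combination in the constructor means an invalid policy object can never exist, so later code does not need to re-check the mode.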

The application programming interface (API) surface comes from PHPDoc. Use these main entry points:

  • \NextPDF\Core\Document::createStandalone(): Document
  • Document::enableTaggedPdf(string $lang = 'en', ?ConformancePolicy $policy = null): static
  • Document::setLanguage(string $lang): static
  • \NextPDF\Conformance\ConformancePolicy::strictUa2(): self
  • \NextPDF\Conformance\ConformanceMode::PdfUa2 (the mode set by enableTaggedPdf())
  • Document::beginTag(string $type): static / Document::endTag(): static (manual tagging for non-HTML content)

examples/31-pdfua2-tagged.php
<?php
declare(strict_types=1);

require_once __DIR__ . '/../vendor/autoload.php';

use NextPDF\Core\Document;

$doc = Document::createStandalone();

// Enable tagged mode BEFORE writeHtml(). The HTML pipeline detects the
// mode at parser construction time and wires the tagged-content emitter.
$doc->enableTaggedPdf(lang: 'en');
$doc->setTitle('Quarterly Accessibility Report');
$doc->setLanguage('en');

$doc->addPage();
$doc->writeHtml(<<<'HTML'
<h1>Quarterly Accessibility Report</h1>
<p>This document opts into tagged PDF so assistive technology can expose
a meaningful reading order.</p>
<ul>
<li>Headings carry semantic roles.</li>
<li>Lists keep their item structure.</li>
</ul>
HTML);

$doc->save(__DIR__ . '/output/31-pdfua2-tagged.pdf');
echo "Created: output/31-pdfua2-tagged.pdf\n";

This self-contained program can run in the harness. In production, fail fast on a malformed language tag instead of discovering it only when the external checker runs. Pass ConformancePolicy::strictUa2() to reject an invalid BCP 47 tag at the API boundary, then gate the build on the checker verdict.

<?php
declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Conformance\ConformancePolicy;
use NextPDF\Core\Document;
use NextPDF\Exception\InvalidConfigException;

// Harness side channel if set, otherwise a path next to this script.
$out = getenv('NEXTPDF_COOKBOOK_OUTPUT') ?: (__DIR__ . '/accessible.pdf');

try {
    $doc = Document::createStandalone();

    // Strict UA-2: a malformed BCP 47 tag throws here, not silently at
    // write time. strictUa2() also forces the §8.4.4 Lang validation.
    $doc->enableTaggedPdf(lang: 'en-GB', policy: ConformancePolicy::strictUa2());
    $doc->setTitle('Accessible Annual Report 2026');
    $doc->setLanguage('en-GB');

    $doc->addPage();
    $doc->writeHtml(<<<'HTML'
<h1>Annual Report 2026</h1>
<p>Audited results for the financial year ending March 2026.</p>
<h2>Segment performance</h2>
<table>
<tr><th>Segment</th><th>Revenue</th></tr>
<tr><td>Cloud</td><td>42.1</td></tr>
<tr><td>Services</td><td>18.7</td></tr>
</table>
HTML);

    $doc->save($out);
} catch (InvalidConfigException $e) {
    fwrite(STDERR, "Tagged PDF/UA-2 setup rejected: {$e->getMessage()}\n");
    exit(1);
}

// The gate is the checker, not the library.
$exitCode = 0;
$report = [];
exec('verapdf --flavour ua2 ' . escapeshellarg($out), $report, $exitCode);

if ($exitCode !== 0) {
    fwrite(STDERR, "veraPDF FAILED — output is not PDF/UA-2 conforming\n");
    fwrite(STDERR, implode("\n", $report) . "\n");
    exit(1);
}

echo "veraPDF PASS — accessible.pdf carries a conforming UA-2 structure\n";

On a host where verapdf --flavour ua2 reports a conforming file, expected standard output (STDOUT) is:

veraPDF PASS — accessible.pdf carries a conforming UA-2 structure

If enableTaggedPdf() rejects the language tag, the program exits non-zero after Tagged PDF/UA-2 setup rejected: … on standard error (STDERR). If the checker reports a problem, it exits non-zero after veraPDF FAILED — output is not PDF/UA-2 conforming. The checker gives the verdict: NextPDF emits the structure but does not assert conformance.
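
If you want to catch obviously malformed tags even earlier, for example when reading them from configuration, a rough pre-check is possible. The sketch below is an assumption-laden illustration: looksLikeLanguageTag() is a hypothetical helper that covers only a subset of the BCP 47 (RFC 5646) grammar (language, optional script, optional region, variants) and does not consult the subtag registry. It is not the validation NextPDF performs.

```php
<?php
declare(strict_types=1);

/**
 * Simplified well-formedness check for a language tag.
 * Covers language[-script][-region][-variant...] only; extlang,
 * extensions, private-use and grandfathered tags are out of scope.
 */
function looksLikeLanguageTag(string $tag): bool
{
    $pattern = '/^[a-z]{2,3}'                    // primary language subtag
        . '(?:-[a-z]{4})?'                       // optional script, e.g. Latn
        . '(?:-(?:[a-z]{2}|\d{3}))?'             // optional region, e.g. GB or 419
        . '(?:-(?:[a-z\d]{5,8}|\d[a-z\d]{3}))*'  // optional variants, e.g. 1996
        . '$/iD';

    return preg_match($pattern, $tag) === 1;
}

foreach (['en', 'en-GB', 'zh-Hant-TW', 'es-419', 'en_GB', ''] as $candidate) {
    printf(
        "%-14s %s\n",
        var_export($candidate, true),
        looksLikeLanguageTag($candidate) ? 'plausible' : 'rejected'
    );
}
```

A pre-check like this only shortens the feedback loop. The strict policy and the external checker remain the authoritative gates.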

  • Call order. enableTaggedPdf() after writeHtml() does not retroactively tag content already written. Enable tagged mode first.
  • Strict language gate. Without a policy, an unparseable BCP 47 tag is dropped silently and appears only at checker time. With ConformancePolicy::strictUa2(), the same tag throws InvalidConfigException at the enableTaggedPdf() boundary (ISO 14289-2 §8.4.4 strict path).
  • Idempotent re-enable. If you call enableTaggedPdf() twice, NextPDF updates the language without rebuilding a populated structure tree.
  • Manual tagging. For non-HTML content, wrap items with beginTag() / endTag(). Container roles (Table, TR, L, LI) become grouping elements with no marked content. Leaf roles (P, H1–H6, TD) get marked-content identifiers (MCIDs).
  • Mode exclusivity. A strict ConformancePolicy is valid only with ConformanceMode::PdfUa2. Combining strict UA-2 flags with a PDF/A mode throws InvalidConfigException. Compose a tagged PDF/A deliverable by enabling tagged mode and the PDF/A profile separately.
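
The manual-tagging rule above (containers group, leaves receive MCIDs, every beginTag() needs its endTag()) can be sketched without producing a PDF. TagRecorder below is a hypothetical stand-in that imitates the fluent begin/end pairing. It is not NextPDF's Document class, and the MCID numbering it shows is only an assumption about how identifiers might be handed out.

```php
<?php
declare(strict_types=1);

// Hypothetical recorder that imitates the begin/end pairing described
// above. It writes no PDF and is not part of NextPDF.
final class TagRecorder
{
    private const CONTAINERS = ['Table', 'TR', 'L', 'LI'];

    /** @var list<string> currently open roles, innermost last */
    private array $open = [];

    /** @var list<string> indented trace of every opened element */
    public array $trace = [];

    private int $nextMcid = 0;

    public function beginTag(string $type): static
    {
        $indent = str_repeat('  ', count($this->open));
        if (in_array($type, self::CONTAINERS, true)) {
            // Grouping element: no marked content of its own.
            $this->trace[] = $indent . $type;
        } else {
            // Leaf element: receives the next marked-content identifier.
            $this->trace[] = $indent . $type . ' (MCID ' . $this->nextMcid++ . ')';
        }
        $this->open[] = $type;
        return $this;
    }

    public function endTag(): static
    {
        if (array_pop($this->open) === null) {
            throw new LogicException('endTag() without a matching beginTag()');
        }
        return $this;
    }

    public function isBalanced(): bool
    {
        return $this->open === [];
    }
}

$rec = (new TagRecorder())
    ->beginTag('Table')
        ->beginTag('TR')
            ->beginTag('TD')->endTag()
            ->beginTag('TD')->endTag()
        ->endTag()
    ->endTag();

echo implode("\n", $rec->trace), "\n";
echo $rec->isBalanced() ? "balanced\n" : "unbalanced\n";
```

Checking balance before saving is a cheap way to notice a missing endTag() in hand-written tagging code.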

The structure tree adds one parallel tree of lightweight dictionaries and per-text-run BDC/EMC operators. For a typical report, the overhead is a few percent of output size and stays well within the 2000 ms / 128 MB budget. The semantic reproducibility profile applies because a checker-oriented deliverable is compared by structural abstract syntax tree (AST) plus metadata, not by raw bytes. See the Conformance section.
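
One way to check a workload against that budget is to time it and read the peak memory afterwards. This is a minimal sketch: measureAgainstBudget() is a hypothetical helper, the workload is a placeholder, and memory_get_peak_usage() reports the peak for the whole PHP process, not only for the measured job.

```php
<?php
declare(strict_types=1);

/**
 * Runs $job and compares elapsed wall time and process peak memory
 * with the given limits.
 *
 * @return array{ms: float|int, peakMb: float|int, withinBudget: bool}
 */
function measureAgainstBudget(callable $job, float $maxMs = 2000.0, float $maxMb = 128.0): array
{
    $start = hrtime(true);                      // monotonic clock, nanoseconds
    $job();
    $ms = (hrtime(true) - $start) / 1_000_000;  // nanoseconds to milliseconds
    $peakMb = memory_get_peak_usage(true) / (1024 * 1024);

    return [
        'ms' => $ms,
        'peakMb' => $peakMb,
        'withinBudget' => $ms <= $maxMs && $peakMb <= $maxMb,
    ];
}

// Placeholder workload. In practice this closure would build and save
// the tagged document.
$result = measureAgainstBudget(static function (): void {
    str_repeat('x', 1024);
});

printf(
    "%.1f ms, %.1f MB peak, within budget: %s\n",
    $result['ms'],
    $result['peakMb'],
    $result['withinBudget'] ? 'yes' : 'no'
);
```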

The structure tree carries the same text as the visible content. If the source HTML contains personal data, including personally identifiable information (PII), that data is also reachable through the tree and through ActualText/Alt attributes. Apply the same redaction and minimization before authoring as you would for the visible content. Tagging adds no new exfiltration path, but it makes the text programmatically extractable by design.
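
As an illustration of redacting before authoring, the sketch below masks e-mail-like strings in the source HTML before it would be passed on for writing. redactEmails() is a hypothetical helper and the pattern is deliberately simple. Real PII handling needs a reviewed policy, not a single regular expression.

```php
<?php
declare(strict_types=1);

/**
 * Replaces e-mail-like substrings with a fixed marker.
 * Throws instead of returning the input if the regex engine fails,
 * so unredacted text is never passed on by accident.
 */
function redactEmails(string $html): string
{
    $redacted = preg_replace(
        '/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i',
        '[redacted]',
        $html
    );
    if ($redacted === null) {
        throw new RuntimeException('Redaction failed: ' . preg_last_error_msg());
    }
    return $redacted;
}

$source = '<p>Contact jane.doe@example.com for the audited figures.</p>';
echo redactEmails($source), "\n";
```

Because the redaction happens on the source text, both the visible content and the structure tree are built from the already-minimized input.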

The recipe writes only a fixed progress line to STDOUT. It routes the PDF to the harness side channel (NEXTPDF_COOKBOOK_OUTPUT) or a caller path. Document text is never logged. Keep checker output, which can echo content fragments, out of shared logs.
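
A possible way to keep checker output out of shared logs is to write the full report to a private file and log only a content-free summary. The helper name storeCheckerReport() and the summary format are assumptions made for this sketch.

```php
<?php
declare(strict_types=1);

/**
 * Writes the full checker report to a private file and returns a
 * one-line summary that contains no document text.
 *
 * @param list<string> $reportLines lines captured from the checker
 */
function storeCheckerReport(array $reportLines, int $exitCode, string $dir): string
{
    $path = tempnam($dir, 'verapdf-');
    if ($path === false) {
        throw new RuntimeException('Could not create a report file');
    }
    chmod($path, 0600); // owner read/write only
    file_put_contents($path, implode("\n", $reportLines) . "\n");

    // Only the exit code, line count, and file name reach the shared log.
    return sprintf(
        'checker exit=%d lines=%d report=%s',
        $exitCode,
        count($reportLines),
        basename($path)
    );
}

$summary = storeCheckerReport(
    ['<rule status="failed">Heading text: Annual Report 2026</rule>'],
    1,
    sys_get_temp_dir()
);
echo $summary, "\n";
```

The sample report line is invented for the demonstration and does not reflect veraPDF's actual report format.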

A tagged PDF is not a trust boundary. If your consumer trusts the structure tree for automated processing, it must still validate the file because a hostile producer can emit a structurally well-formed but misleading tree. Treat the structure as an accessibility affordance, not as an integrity or authenticity signal.

This recipe performs no cryptographic operation. Federal Information Processing Standards (FIPS) mode does not change its behavior. No signing or encryption is involved.

| PDF/UA-2 requirement | What NextPDF emits | Clause |
| --- | --- | --- |
| Real content is in the structure tree | StructTreeRoot with per-block StructElem and MCID-linked marked content | ISO 14289-2 §8.2.2 |
| Defined nesting and reading order | Block elements mapped to grouping/leaf roles in document order | ISO 14289-2 §8.2.3 |
| Known structure namespace | Roles in the PDF 2.0 namespace; HTML tags role-mapped where needed | ISO 14289-2 §8.2.4 |
| Document and element language | Catalog Lang from the BCP 47 tag; per-element Lang when it differs | ISO 14289-2 §8.4.4 |
| Non-text content has a text alternative | Alt/ActualText carried on figure/non-text structure elements | ISO 14289-2 §8.5.1 |
| Table relationships | Table/TR/TH/TD roles with header association | ISO 14289-2 §8.2.5.26 |
| Part identification metadata | Document-level identification scheduled at save | ISO 14289-2 §Intro (pdfua2#p17) |

PDF/UA-2 layers accessibility requirements on the ISO 32000-2 tagged-PDF machinery. NextPDF relies on this mapping:

| NextPDF emission | ISO 32000-2 §14 facility | Clause |
| --- | --- | --- |
| Logical structure tree (StructTreeRoot) | Tagged PDF logical structure | §14.7 (iso32000_2_sec14#x1.x38.p13) |
| Catalog MarkInfo << /Marked true >> | Tagged-PDF marker | §14.7 (iso32000_2_sec14#x1.x40.p3) |
| Per-page Tabs entry following structure order | Structural navigation / tab order | §14.8 (iso32000_2_sec14#x1.x50) |

PDF/UA-2 is the PDF-format expression of structure requirements that Web Content Accessibility Guidelines (WCAG) 2.2 states format-independently. The relevant alignment:

| WCAG 2.2 success criterion | PDF/UA-2 mechanism this recipe produces |
| --- | --- |
| 1.3.1 Info and Relationships (Level A) | The structure tree makes headings, lists, and table relationships programmatically determinable (wcag_2_2#x2.x3.x3.x1.p3). |
| 1.3.2 Meaningful Sequence (Level A) | Structure order defines the reading order independent of visual layout. |
| 3.1.1 Language of Page (Level A) | The catalog Lang entry from the BCP 47 tag. |
| 1.1.1 Non-text Content (Level A) | Alt/ActualText on non-text structure elements (ISO 14289-2 §8.5.1). |

This mapping shows where the emitted structure supports a WCAG 2.2 criterion. It is not a WCAG conformance claim. WCAG conformance covers the whole user experience, and an accessibility evaluation determines it, not the producer.

| Statement | Spec | Clause | reference_id |
| --- | --- | --- | --- |
| Real content requires a logical structure. | ISO 14289-2 | §8.2.2 | |
| Structure elements follow a defined nesting and reading order. | ISO 14289-2 | §8.2.3 | |
| Every structure element resolves to a known namespace, directly or by role mapping. | ISO 14289-2 | §8.2.4 | |
| Natural language is declared at the document and structure-element level. | ISO 14289-2 | §8.4.4 | |
| Non-text content carries a text alternative. | ISO 14289-2 | §8.5.1 | |
| Table cells carry row/header/data relationships. | ISO 14289-2 | §8.2.5.26 | |
| The tagged-PDF marker is the catalog MarkInfo /Marked flag. | ISO 32000-2 | §14.7 | |
| Conformance is decided against the part, not asserted by the producer. | ISO 14289-2 | §8.14.2 | |

NextPDF emits the tagged structure that supports accessible authoring. Support is not conformance. This recipe does not assert PDF/UA-2 conformance. An independent checker, such as veraPDF, makes that determination. Run the checker before you state that a file conforms.