Emit a tagged PDF/UA-2 structure tree from semantic content
At a glance
This recipe creates a tagged Portable Document Format/Universal
Accessibility 2 (PDF/UA-2) file. It targets International Organization for
Standardization (ISO) 14289-2. NextPDF emits a logical structure tree,
marked-content sequences, the catalog language, and document-level
identification metadata. That structure supports accessible authoring, but
an independent checker decides conformance. The recipe follows
examples/31-pdfua2-tagged.php.
Install
```shell
composer require nextpdf/core:^3
```

Put a PDF/UA-2 checker on PATH for verification. This recipe uses
veraPDF with the ua2 flavour. You do not need a Pro or Enterprise
package to emit the tagged structure.
Conceptual overview
A tagged PDF carries a parallel logical structure tree alongside the visual content stream. Assistive technology reads the tree instead of the pixel layout, so the structure determines the exposed reading order. ISO 14289-2 sets four requirements here. Real (non-artifact) content must be reachable through that tree (§8.2.2). Structure elements must nest in a defined order (§8.2.3). Every element must resolve to a known structure namespace, either directly or by role mapping (§8.2.4). The natural language of the content is declared at the document level and refined per structure element where it differs (§8.4.4).
NextPDF models this with a typed ConformanceMode. enableTaggedPdf()
sets ConformanceMode::PdfUa2, which (a) makes the Hypertext Markup
Language (HTML) pipeline wire a TaggedContentEmitter at parser
construction time, (b) sets the catalog MarkInfo Marked flag that signals
a tagged PDF (ISO 32000-2 §14.7), and (c) records the Best Current
Practice 47 (BCP 47) language for the catalog Lang entry. The writer
also emits the per-page Tabs entry, so the tab order follows the
structure order (ISO 32000-2 §14.8).
The strict UA-2 invariants apply only to ConformanceMode::PdfUa2. By
design, constructing a strict ConformancePolicy against any other mode
throws InvalidConfigException.
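
To make the reading-order point concrete, here is a minimal sketch, independent of NextPDF, that models a structure tree and derives the reading order by a depth-first walk. The StructNode class and readingOrder() function are hypothetical names used only for illustration. They are not part of the NextPDF API and do not describe its internal model.

```php
<?php

declare(strict_types=1);

// Hypothetical illustration only: not NextPDF's internal model.
final class StructNode
{
    /** @param list<StructNode> $kids */
    public function __construct(
        public readonly string $role,
        public readonly ?string $text = null,
        public readonly array $kids = [],
    ) {
    }
}

/**
 * Depth-first walk in child order. The order of kids in the tree,
 * not the position on the page, decides what is read first.
 *
 * @return list<string>
 */
function readingOrder(StructNode $node): array
{
    $out = [];
    if ($node->text !== null) {
        $out[] = "{$node->role}: {$node->text}";
    }
    foreach ($node->kids as $kid) {
        array_push($out, ...readingOrder($kid));
    }

    return $out;
}

$tree = new StructNode('Document', kids: [
    new StructNode('H1', 'Annual Report 2026'),
    new StructNode('P', 'Audited results.'),
    new StructNode('Table', kids: [
        new StructNode('TR', kids: [
            new StructNode('TD', 'Cloud'),
            new StructNode('TD', '42.1'),
        ]),
    ]),
]);

echo implode("\n", readingOrder($tree)), "\n";
```

Reordering the kids array changes the output even if the visual layout stays the same, which is the property assistive technology depends on.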
API surface
The application programming interface (API) surface comes from PHPDoc. Use these main entry points:
- \NextPDF\Core\Document::createStandalone(): Document
- Document::enableTaggedPdf(string $lang = 'en', ?ConformancePolicy $policy = null): static
- Document::setLanguage(string $lang): static
- \NextPDF\Conformance\ConformancePolicy::strictUa2(): self
- \NextPDF\Conformance\ConformanceMode::PdfUa2 (the mode set by enableTaggedPdf())
- Document::beginTag(string $type): static / Document::endTag(): static (manual tagging for non-HTML content)
Code sample — Quick start
```php
<?php

declare(strict_types=1);

require_once __DIR__ . '/../vendor/autoload.php';

use NextPDF\Core\Document;

$doc = Document::createStandalone();

// Enable tagged mode BEFORE writeHtml(). The HTML pipeline detects the
// mode at parser construction time and wires the tagged-content emitter.
$doc->enableTaggedPdf(lang: 'en');

$doc->setTitle('Quarterly Accessibility Report');
$doc->setLanguage('en');
$doc->addPage();

$doc->writeHtml(<<<'HTML'
<h1>Quarterly Accessibility Report</h1>
<p>This document opts into tagged PDF so assistive technology can expose
a meaningful reading order.</p>
<ul>
  <li>Headings carry semantic roles.</li>
  <li>Lists keep their item structure.</li>
</ul>
HTML);

$doc->save(__DIR__ . '/output/31-pdfua2-tagged.pdf');

echo "Created: output/31-pdfua2-tagged.pdf\n";
```

Code sample — Production
This self-contained program can run in the harness. In production, fail
fast on a malformed language tag instead of discovering it only when the
external checker runs. Pass ConformancePolicy::strictUa2() to reject an
invalid BCP 47 tag at the API boundary, then gate the build on the checker
verdict.
```php
<?php

declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Conformance\ConformancePolicy;
use NextPDF\Core\Document;
use NextPDF\Exception\InvalidConfigException;

$out = getenv('NEXTPDF_COOKBOOK_OUTPUT') ?: (__DIR__ . '/accessible.pdf');

try {
    $doc = Document::createStandalone();

    // Strict UA-2: a malformed BCP 47 tag throws here, not silently at
    // write time. strictUa2() also forces the §8.4.4 Lang validation.
    $doc->enableTaggedPdf(lang: 'en-GB', policy: ConformancePolicy::strictUa2());

    $doc->setTitle('Accessible Annual Report 2026');
    $doc->setLanguage('en-GB');
    $doc->addPage();

    $doc->writeHtml(<<<'HTML'
<h1>Annual Report 2026</h1>
<p>Audited results for the financial year ending March 2026.</p>
<h2>Segment performance</h2>
<table>
  <tr><th>Segment</th><th>Revenue</th></tr>
  <tr><td>Cloud</td><td>42.1</td></tr>
  <tr><td>Services</td><td>18.7</td></tr>
</table>
HTML);

    $doc->save($out);
} catch (InvalidConfigException $e) {
    fwrite(STDERR, "Tagged PDF/UA-2 setup rejected: {$e->getMessage()}\n");
    exit(1);
}

// The gate is the checker, not the library.
$exitCode = 0;
$report = [];
exec('verapdf --flavour ua2 ' . escapeshellarg($out), $report, $exitCode);

if ($exitCode !== 0) {
    fwrite(STDERR, "veraPDF FAILED — output is not PDF/UA-2 conforming\n");
    fwrite(STDERR, implode("\n", $report) . "\n");
    exit(1);
}

echo "veraPDF PASS — accessible.pdf carries a conforming UA-2 structure\n";
```

On a host where verapdf --flavour ua2 reports a conforming file, expected
standard output (STDOUT) is:

```text
veraPDF PASS — accessible.pdf carries a conforming UA-2 structure
```

If enableTaggedPdf() rejects the language tag, the program exits non-zero
after Tagged PDF/UA-2 setup rejected: … on standard error (STDERR). If
the checker reports a problem, it exits non-zero after
veraPDF FAILED — output is not PDF/UA-2 conforming. The checker gives the
verdict: NextPDF emits the structure but does not assert conformance.
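
If the language tag comes from user input or configuration, a cheap pre-check can reject obvious mistakes before the document is even created. The sketch below is a simplified BCP 47 well-formedness test written for this page. It covers only language, optional script, optional region, and optional variants, and it ignores extensions, private-use subtags, and grandfathered tags. The function name isWellFormedLanguageTag() is hypothetical, and the check that strictUa2() performs inside NextPDF may differ.

```php
<?php

declare(strict_types=1);

/**
 * Simplified BCP 47 well-formedness check.
 * Assumption: language[-script][-region][-variant...] only.
 * It tests syntax, not whether the subtags are registered.
 */
function isWellFormedLanguageTag(string $tag): bool
{
    $pattern = '/\A[a-z]{2,3}'                      // primary language, e.g. "en"
        . '(-[a-z]{4})?'                            // optional script, e.g. "Hant"
        . '(-(?:[a-z]{2}|[0-9]{3}))?'               // optional region, e.g. "GB" or "419"
        . '(-(?:[a-z0-9]{5,8}|[0-9][a-z0-9]{3}))*'  // optional variants, e.g. "1996"
        . '\z/i';

    return preg_match($pattern, $tag) === 1;
}

foreach (['en-GB', 'zh-Hant-TW', 'en_GB', 'english'] as $candidate) {
    printf("%-12s %s\n", $candidate, isWellFormedLanguageTag($candidate) ? 'ok' : 'rejected');
}
```

A pre-check like this complements the strict policy rather than replacing it: the policy remains the boundary that throws InvalidConfigException.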
Edge cases & gotchas
- Call order. Calling enableTaggedPdf() after writeHtml() does not retroactively tag content already written. Enable tagged mode first.
- Strict language gate. Without a policy, an unparseable BCP 47 tag is dropped silently and the problem surfaces only at checker time. With ConformancePolicy::strictUa2(), the same tag throws InvalidConfigException at the enableTaggedPdf() boundary (ISO 14289-2 §8.4.4 strict path).
- Idempotent re-enable. If you call enableTaggedPdf() twice, NextPDF updates the language without rebuilding a populated structure tree.
- Manual tagging. For non-HTML content, wrap items with beginTag()/endTag(). Container roles (Table, TR, L, LI) become grouping elements with no marked content. Leaf roles (P, H1–H6, TD) get marked-content identifiers (MCIDs).
- Mode exclusivity. A strict ConformancePolicy is valid only with ConformanceMode::PdfUa2. Combining strict UA-2 flags with a PDF/A mode throws InvalidConfigException. Compose a tagged PDF/A deliverable by enabling tagged mode and the PDF/A profile separately.
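
The container/leaf distinction in the manual-tagging item can be sketched without the library. The example below walks a flat sequence of begin/end events, checks that they balance, and hands out sequential MCIDs to leaf roles only. The role lists are copied from the bullet above. The assignMcids() function and the event format are hypothetical and do not reflect how NextPDF implements beginTag()/endTag().

```php
<?php

declare(strict_types=1);

// Role sets copied from the list above; the library may recognize more roles.
const CONTAINER_ROLES = ['Table', 'TR', 'L', 'LI'];
const LEAF_ROLES = ['P', 'H1', 'H2', 'H3', 'H4', 'H5', 'H6', 'TD'];

/**
 * Simplification: one MCID counter for the whole sequence. In a real PDF,
 * MCIDs are scoped to the content stream they appear in.
 *
 * @param list<array{0: string, 1?: string}> $events ['begin', role] or ['end']
 * @return list<array{role: string, depth: int, mcid: ?int}>
 */
function assignMcids(array $events): array
{
    $stack = [];
    $elements = [];
    $nextMcid = 0;

    foreach ($events as $event) {
        if ($event[0] === 'begin') {
            $role = $event[1];
            $isLeaf = in_array($role, LEAF_ROLES, true);
            if (!$isLeaf && !in_array($role, CONTAINER_ROLES, true)) {
                throw new InvalidArgumentException("Unknown role: {$role}");
            }
            $elements[] = [
                'role' => $role,
                'depth' => count($stack),
                'mcid' => $isLeaf ? $nextMcid++ : null, // containers get no marked content
            ];
            $stack[] = $role;
        } elseif (array_pop($stack) === null) {
            throw new LogicException('end event without a matching begin event');
        }
    }

    if ($stack !== []) {
        throw new LogicException('Unclosed tag: ' . end($stack));
    }

    return $elements;
}

$elements = assignMcids([
    ['begin', 'Table'], ['begin', 'TR'],
    ['begin', 'TD'], ['end'],
    ['begin', 'TD'], ['end'],
    ['end'], ['end'],
]);

foreach ($elements as $el) {
    printf("%s%s mcid=%s\n", str_repeat('  ', $el['depth']), $el['role'], $el['mcid'] ?? 'none');
}
```

The balance check is the part worth carrying into real code: an unclosed beginTag() is easy to miss and yields a tree that a checker is likely to reject.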
Performance
The structure tree adds one parallel tree of lightweight dictionaries and
per-text-run BDC/EMC operators. For a typical report, the overhead is a
few percent of output size and stays well within the 2000 ms / 128 MB
budget. The semantic reproducibility profile applies because a
checker-oriented deliverable is compared by structural abstract syntax tree
(AST) plus metadata, not by raw bytes. See the Conformance section.
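
One way to check a run against the 2000 ms / 128 MB budget is to wrap the authoring calls in a small measurement helper. The sketch below uses only PHP built-ins. The measureAgainstBudget() function is a hypothetical name, and the closure is a stand-in workload, not the recipe itself.

```php
<?php

declare(strict_types=1);

/**
 * Runs $work and compares wall time and peak memory with a budget.
 * Note: memory_get_peak_usage() reports the peak for the whole process,
 * so measure in a fresh process for a meaningful number.
 *
 * @return array{ms: float, peakMb: float, ok: bool}
 */
function measureAgainstBudget(callable $work, float $maxMs = 2000.0, float $maxMb = 128.0): array
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    $work();
    $ms = (hrtime(true) - $start) / 1e6;
    $peakMb = memory_get_peak_usage(true) / (1024 * 1024);

    return [
        'ms' => $ms,
        'peakMb' => $peakMb,
        'ok' => $ms <= $maxMs && $peakMb <= $maxMb,
    ];
}

$result = measureAgainstBudget(static function (): void {
    // Stand-in workload. In the recipe this would wrap writeHtml() and save().
    str_repeat('x', 1024 * 1024);
});

printf(
    "%.1f ms, %.1f MB peak, within budget: %s\n",
    $result['ms'],
    $result['peakMb'],
    $result['ok'] ? 'yes' : 'no',
);
```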
Security notes
Data Residency & PII Mitigations
The structure tree carries the same text as the visible content. If the
source HTML contains personal data, including personally identifiable
information (PII), that data is also reachable through the tree and through
ActualText/Alt attributes. Apply the same redaction and minimization
before authoring as you would for the visible content. Tagging adds no new
exfiltration path, but it makes the text programmatically extractable by
design.
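
As a minimal illustration of redacting before authoring, the sketch below masks email addresses in the source HTML so that neither the visible text nor the attribute values that may feed Alt/ActualText carry them. The pattern is deliberately simple and catches only common address shapes. Real PII handling needs a broader policy. The redactEmails() function is a hypothetical helper, not a NextPDF feature.

```php
<?php

declare(strict_types=1);

/**
 * Masks email-like strings anywhere in the HTML, including attribute values.
 * Assumption: a simple pattern is acceptable for the illustration.
 */
function redactEmails(string $html, string $mask = '[redacted]'): string
{
    $result = preg_replace('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $mask, $html);
    if ($result === null) {
        throw new RuntimeException('Redaction failed: ' . preg_last_error_msg());
    }

    return $result;
}

$source = '<p>Contact jane.doe@example.com today.</p><img src="p.png" alt="Badge of j.doe@example.org">';

// Redact first, then hand the result to the authoring step.
echo redactEmails($source), "\n";
```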
Safe Telemetry & Log Scrubbing
The recipe writes only a fixed progress line to STDOUT. It routes the PDF
to the harness side channel (NEXTPDF_COOKBOOK_OUTPUT) or a caller path.
Document text is never logged. Keep checker output, which can echo content
fragments, out of shared logs.
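
One way to keep checker output out of shared logs is to log a content-free summary and store the full report somewhere private. The sketch below builds such a summary from the exit code and the line count only. The summarizeCheckerOutput() function is a hypothetical helper, and the sample report lines are invented for the illustration.

```php
<?php

declare(strict_types=1);

/**
 * Builds a log line that carries no document text: only the exit code
 * and the number of report lines.
 *
 * @param list<string> $reportLines Raw checker output, which may echo content.
 */
function summarizeCheckerOutput(array $reportLines, int $exitCode): string
{
    return sprintf('checker exit=%d lines=%d', $exitCode, count($reportLines));
}

// Invented sample output for the illustration.
$report = [
    'rule failed near text: Audited results for the financial year',
    'rule failed near text: Segment performance',
];

// Shared log gets the summary; the raw report stays in a private location.
echo summarizeCheckerOutput($report, 1), "\n";
```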
Threat model
A tagged PDF is not a trust boundary. If your consumer trusts the structure tree for automated processing, it must still validate the file because a hostile producer can emit a structurally well-formed but misleading tree. Treat the structure as an accessibility affordance, not as an integrity or authenticity signal.
FIPS-mode behavior
This recipe performs no cryptographic operation. Federal Information Processing Standards (FIPS) mode does not change its behavior. No signing or encryption is involved.
PDF/UA-2 mapping
| PDF/UA-2 requirement | What NextPDF emits | Clause |
|---|---|---|
| Real content is in the structure tree | StructTreeRoot with per-block StructElem and MCID-linked marked content | ISO 14289-2 §8.2.2 |
| Defined nesting and reading order | Block elements mapped to grouping/leaf roles in document order | ISO 14289-2 §8.2.3 |
| Known structure namespace | Roles in the PDF 2.0 namespace; HTML tags role-mapped where needed | ISO 14289-2 §8.2.4 |
| Document and element language | Catalog Lang from the BCP 47 tag; per-element Lang when it differs | ISO 14289-2 §8.4.4 |
| Non-text content has a text alternative | Alt/ActualText carried on figure/non-text structure elements | ISO 14289-2 §8.5.1 |
| Table relationships | Table/TR/TH/TD roles with header association | ISO 14289-2 §8.2.5.26 |
| Part identification metadata | Document-level identification scheduled at save | ISO 14289-2 §Intro (pdfua2#p17) |
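
The text-alternative row depends on the source supplying one. As a pre-authoring check, the sketch below lists img tags in the source HTML that lack a non-empty alt attribute. It assumes, without confirmation from the NextPDF documentation, that the alt attribute is the source of the emitted Alt entry. It also uses a regular expression rather than a full HTML parser, so treat it as a lint aid only. The imagesMissingAlt() function is a hypothetical helper.

```php
<?php

declare(strict_types=1);

/**
 * Returns every <img> tag without a non-empty alt attribute.
 * An empty alt is reported too: decide deliberately whether such an
 * image is decorative before leaving it without a text alternative.
 *
 * @return list<string>
 */
function imagesMissingAlt(string $html): array
{
    preg_match_all('/<img\b[^>]*>/i', $html, $matches);

    return array_values(array_filter(
        $matches[0],
        static fn (string $tag): bool
            => preg_match('/\salt\s*=\s*("[^"]+"|\'[^\']+\')/i', $tag) !== 1,
    ));
}

$html = '<img src="a.png" alt="Revenue chart"><img src="b.png"><img src="c.png" alt="">';

foreach (imagesMissingAlt($html) as $tag) {
    echo "Missing text alternative: {$tag}\n";
}
```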
Tag → ISO 32000-2 §14 cross-ref
PDF/UA-2 layers accessibility requirements on the ISO 32000-2 tagged-PDF machinery. NextPDF relies on this mapping:
| NextPDF emission | ISO 32000-2 §14 facility | Clause |
|---|---|---|
| Logical structure tree (StructTreeRoot) | Tagged PDF logical structure | §14.7 (iso32000_2_sec14#x1.x38.p13) |
| Catalog MarkInfo << /Marked true >> | Tagged-PDF marker | §14.7 (iso32000_2_sec14#x1.x40.p3) |
| Per-page Tabs entry following structure order | Structural navigation / tab order | §14.8 (iso32000_2_sec14#x1.x50) |
WCAG 2.2 mapping
PDF/UA-2 is the PDF-format expression of structure requirements that Web Content Accessibility Guidelines (WCAG) 2.2 states format-independently. The relevant alignment:
| WCAG 2.2 success criterion | PDF/UA-2 mechanism this recipe produces |
|---|---|
| 1.3.1 Info and Relationships (Level A) | The structure tree makes headings, lists, and table relationships programmatically determinable (wcag_2_2#x2.x3.x3.x1.p3). |
| 1.3.2 Meaningful Sequence (Level A) | Structure order defines the reading order independent of visual layout. |
| 3.1.1 Language of Page (Level A) | The catalog Lang entry from the BCP 47 tag. |
| 1.1.1 Non-text Content (Level A) | Alt/ActualText on non-text structure elements (ISO 14289-2 §8.5.1). |
This mapping shows where the emitted structure supports a WCAG 2.2 criterion. It is not a WCAG conformance claim. WCAG conformance covers the whole user experience, and an accessibility evaluation determines it, not the producer.
Conformance
| Statement | Spec | Clause | reference_id |
|---|---|---|---|
| Real content requires a logical structure. | ISO 14289-2 | §8.2.2 | |
| Structure elements follow a defined nesting and reading order. | ISO 14289-2 | §8.2.3 | |
| Every structure element resolves to a known namespace, directly or by role mapping. | ISO 14289-2 | §8.2.4 | |
| Natural language is declared at the document and structure-element level. | ISO 14289-2 | §8.4.4 | |
| Non-text content carries a text alternative. | ISO 14289-2 | §8.5.1 | |
| Table cells carry row/header/data relationships. | ISO 14289-2 | §8.2.5.26 | |
| The tagged-PDF marker is the catalog MarkInfo Marked flag. | ISO 32000-2 | §14.7 | |
| Conformance is decided against the part, not asserted by the producer. | ISO 14289-2 | §8.14.2 | |
NextPDF emits the tagged structure that supports accessible authoring. Support is not conformance. This recipe does not assert PDF/UA-2 conformance. An independent checker, such as veraPDF, makes that determination. Run the checker before you state that a file conforms.