HTML: HTML+CSS to PDF rendering subsystem

The HTML subsystem converts HyperText Markup Language (HTML) and Cascading Style Sheets (CSS) into Portable Document Format (PDF) content streams in one forward pass. It is the engine’s largest and highest-risk subsystem, with 324 files under src/Html/.

composer require nextpdf/core:^3

The HTML subsystem is a single-pass streaming HTML+CSS-to-PDF renderer. Its public surface is one method: Document::writeHtml(). Internally, HtmlParser tokenizes the input, resolves styles, computes layout, and emits PDF operators without retaining a document tree.

Be clear about the scope. This subsystem is not a retained-document renderer. It does not hold an element graph, re-lay out content already written, or let input mutate after parsing starts. It implements a curated subset of CSS at fixed specification pins. Two Architecture Decision Records (ADRs) govern it. ADR-001 defines the single-pass streaming model and its caps. ADR-010 defines the four-layer contract (CSS parsing, style state, layout, paint), plus the paged-media and measurement adjuncts.

HtmlParser is rated critical risk in the module manifest. Five files carry documented danger-zone annotations: the HtmlParser orchestrator (streaming tokenizer, more than 1,000 lines of code), HtmlStyleState (more than 100 CSS property fields with a stack inheritance model), HtmlBlockHandler (block dispatch coupled to style state), FlexLayoutEngine (full flex measurement and layout), and TableParser (colspan/rowspan pagination across page breaks). Treat changes here as plan-mode work.

Use this page as the entry point. See pipeline for the stage sequence, css-resolver for cascade and specificity, layer-contracts-adr010 for the layer boundaries, and streaming-constraints-adr001 for the no-retained-tree model and its caps.

writeHtml() renders right-to-left (RTL) content. Set the CSS direction: rtl property on the body, a table, or any element. The engine resolves the visual order with the Unicode Bidirectional Algorithm (UAX #9) through the typography layer’s bidirectional engine; see Typography for the BidiEngine details. Mixed Latin, Arabic, and numeric content is ordered correctly, and a number after Arabic keeps its digits left to right.

Arabic also takes contextual shaping: the engine selects the initial, medial, final, or isolated form of each letter and applies the Lam-Alef ligature. Shaping needs a registered font whose character map covers the Arabic Presentation Forms-B block; a Latin-only face, including the standard-14 fonts, cannot draw Arabic. In tables, each cell is reordered and shaped on its own and aligns to the start (right) edge under direction: rtl. RTL applies to Arabic, Hebrew, Persian, and Urdu; Hebrew is reordered but not shaped.

Set direction with the CSS direction property; the HTML dir attribute does not map to it. Two behaviors are not yet applied: horizontal alignment of non-table block and inline text, and text-align: justify. For a runnable Arabic invoice and the full list of current limitations, see Render right-to-left Arabic HTML.
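The RTL rules above can be sketched as a small helper that builds a table with direction: rtl set in CSS rather than through the dir attribute. The function name rtlTable() is hypothetical and not part of the library; it only produces an HTML string to pass to writeHtml(). Font registration is not shown because its API does not appear on this page.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of NextPDF): builds an RTL table.
 * Direction is set through the CSS direction property, because the
 * HTML dir attribute does not map to it.
 *
 * @param list<list<string>> $rows cell text, one inner list per row
 */
function rtlTable(array $rows): string
{
    $html = '<table style="direction: rtl;" border="1" cellpadding="5">';
    foreach ($rows as $cells) {
        $html .= '<tr>';
        foreach ($cells as $cell) {
            // Escape cell text so user data cannot inject markup.
            $html .= '<td>' . htmlspecialchars($cell, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8') . '</td>';
        }
        $html .= '</tr>';
    }
    return $html . '</table>';
}

// Usage, assuming $doc is a Document with a page added and an
// Arabic-capable font already registered:
// $doc->writeHtml(rtlTable([['فاتورة', '123']]));
```

A table is used here because table cells are the case where start-edge alignment is applied today; a plain paragraph would be reordered and shaped but not right-aligned.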

| Symbol | Location | Role |
| --- | --- | --- |
| Document::writeHtml(string $html): static | src/Core/Concerns/HasTextOutput.php | Public entry point. Renders HTML at the current cursor. |
| Document::createStandalone(): self | src/Core/Document.php | Construct a standalone document. |
| HtmlParser::parse(string $html): HtmlRenderResult | src/Html/HtmlParser.php | Internal orchestrator. |
| HtmlRenderResult | src/Html/HtmlRenderResult.php | Immutable result: stream, end cursor, and used fonts. |
| DefaultHtmlSecurityPolicy | src/Html/DefaultHtmlSecurityPolicy.php | Default tag, attribute, CSS, and Uniform Resource Locator (URL) policy. |
| HtmlSecurityPolicyInterface | src/Contracts/HtmlSecurityPolicyInterface.php | Policy contract for custom policies. |

Source: examples/08-html-basic.php.

<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Core\Document;
$doc = Document::createStandalone();
$doc->setTitle('HTML Basic');
$doc->addPage();
$doc->writeHtml('<h1 style="color:#1E3A8A;">HTML Rendering</h1><p>Direct to PDF.</p>');
$doc->save(__DIR__ . '/output/08-html-basic.pdf');

This sample shows a table report with an embedded style block, modeled on examples/09-html-table.php.

<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Core\Document;
use NextPDF\Exception\HtmlParsingException;
function renderInventory(string $rowsHtml, string $out): void
{
    $doc = Document::createStandalone();
    $doc->setTitle('Inventory');
    $doc->addPage();

    // Embedded style block followed by the table markup.
    $html = '<style>table { width: 100%; } '
        . 'th { background-color: #1E3A8A; color: #FFFFFF; }</style>'
        . '<table border="1" cellpadding="5">' . $rowsHtml . '</table>';

    try {
        $doc->writeHtml($html);
    } catch (HtmlParsingException $e) {
        // Input cap, element cap (50,000), or nesting cap (100). Do not retry.
        throw $e;
    }

    $doc->save($out);
}
  • Curated CSS subset. Support is pinned per module. Check the CSS support matrix before you rely on a property.
  • Hard caps throw. The caps on input size (10 MB), element count (50,000), and nesting depth (100 levels) each throw HtmlParsingException. See streaming constraints.
  • No re-layout. The renderer writes output once in document order; late styles cannot change earlier output.
  • :has() is gated behind the css.has experimental feature.
  • Critical-risk subsystem. Five files are marked as danger zones. Use plan mode for changes under src/Html/.

Single-pass streaming constraints (ADR-001)

The renderer keeps no document tree and runs one forward pass. The element, nesting, and input caps are hard limits. For full detail and the worker-safety contract, see streaming constraints (ADR-001).
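Because the caps are hard limits, a caller may want to estimate them before calling writeHtml(). The sketch below is a rough pre-flight check, not the engine's own counting: preflightHtml() is a hypothetical name, the regex tokenizer is approximate (it ignores comments and attribute values containing ">"), and reading 10 MB as 10 × 1024 × 1024 bytes is an assumption.

```php
<?php
declare(strict_types=1);

// Cap values quoted on this page; the exact byte reading of 10 MB is assumed.
const MAX_INPUT_BYTES = 10 * 1024 * 1024;
const MAX_ELEMENTS = 50000;
const MAX_NESTING = 100;

// HTML void elements never open a nesting level.
const VOID_TAGS = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img',
    'input', 'link', 'meta', 'source', 'track', 'wbr'];

/**
 * Hypothetical pre-flight estimate. Returns a list of likely cap violations,
 * or an empty list when the input looks safe.
 *
 * @return list<string>
 */
function preflightHtml(string $html): array
{
    $problems = [];
    if (strlen($html) > MAX_INPUT_BYTES) {
        $problems[] = 'input larger than 10 MB';
    }

    $depth = 0;
    $maxDepth = 0;
    $elements = 0;
    preg_match_all('~<(/?)([a-zA-Z][a-zA-Z0-9]*)\b[^>]*?(/?)>~', $html, $tags, PREG_SET_ORDER);
    foreach ($tags as [, $closing, $name, $selfClosing]) {
        if ($closing === '/') {
            $depth = max(0, $depth - 1);   // closing tag leaves one level
            continue;
        }
        $elements++;
        $maxDepth = max($maxDepth, $depth + 1);
        if ($selfClosing !== '/' && !in_array(strtolower($name), VOID_TAGS, true)) {
            $depth++;                      // only container tags open a level
        }
    }

    if ($elements > MAX_ELEMENTS) {
        $problems[] = 'more than 50,000 elements';
    }
    if ($maxDepth > MAX_NESTING) {
        $problems[] = 'nesting deeper than 100 levels';
    }
    return $problems;
}
```

A non-empty result is a hint to split or simplify the input. The parser remains the authority, so callers should still catch HtmlParsingException.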

CSS parsing, style state, layout, and paint are separated into four layers with one-directional contracts, plus paged-media and measurement adjuncts. For full detail, see layer contracts (ADR-010).

Style-state and cursor memory is O(nesting depth), not O(element count). The per-page performance_budget is peak_mb: 64. The 50,000-element cap is the hard ceiling; split larger inputs across multiple writeHtml() calls. For details, see streaming constraints.
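One way to split a large table across multiple writeHtml() calls is to batch its rows and emit one complete table per batch. This is a minimal sketch under assumptions: batchTableRows() is a hypothetical helper, each entry in $rowsHtml is taken to be one complete row, and the header row is repeated in every batch so each call is self-contained.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not part of NextPDF): groups table rows into
 * batches and wraps each batch in its own complete <table>.
 *
 * @param list<string> $rowsHtml      each entry is one complete <tr>...</tr>
 * @param string       $headerRowHtml header row repeated in every batch
 * @return list<string> one complete table per batch
 */
function batchTableRows(array $rowsHtml, string $headerRowHtml, int $rowsPerBatch): array
{
    if ($rowsPerBatch < 1) {
        throw new InvalidArgumentException('rowsPerBatch must be at least 1');
    }

    $tables = [];
    foreach (array_chunk($rowsHtml, $rowsPerBatch) as $chunk) {
        $tables[] = '<table border="1" cellpadding="5">'
            . $headerRowHtml
            . implode('', $chunk)
            . '</table>';
    }
    return $tables;
}

// Usage, assuming $doc is a Document with a page added:
// foreach (batchTableRows($rows, '<tr><th>Item</th></tr>', 2000) as $table) {
//     $doc->writeHtml($table);
// }
```

Choose the batch size so that rows × cells per row stays well under the element cap. Whether a style block from an earlier call still applies to later calls is not stated on this page, so repeating any needed styles in each batch is the safer assumption.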

Traversal is O(token count). Table column sizing adds a bounded per-table row scan. The optional :has() pre-scan adds one bounded token-list pass. The HTML render-pipeline performance benchmark enforces a 5% regression gate, merged in pull request (PR) #564. The per-page performance_budget (wall_ms: 1500, peak_mb: 64) is the operational ceiling.
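To compare a render against these budget figures in application code, a rough measurement wrapper can help. The sketch below is not the project's benchmark: measureAgainstBudget() is a hypothetical name, and memory_get_peak_usage() reports the peak for the whole PHP process, not for a single page, so the result is only an upper-bound hint.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: times a render callback and compares it with the
 * budget figures quoted above (wall_ms: 1500, peak_mb: 64).
 *
 * @param callable():void $render
 * @return array{wall_ms: float, peak_mb: float, within_budget: bool}
 */
function measureAgainstBudget(
    callable $render,
    float $wallMsBudget = 1500.0,
    float $peakMbBudget = 64.0
): array {
    $start = hrtime(true);                        // monotonic clock, nanoseconds
    $render();
    $wallMs = (hrtime(true) - $start) / 1e6;      // nanoseconds to milliseconds

    // Process-wide peak, so earlier allocations are included.
    $peakMb = memory_get_peak_usage(true) / 1048576.0;

    return [
        'wall_ms' => $wallMs,
        'peak_mb' => $peakMb,
        'within_budget' => $wallMs <= $wallMsBudget && $peakMb <= $peakMbBudget,
    ];
}

// Usage, assuming $doc and $html exist:
// $result = measureAgainstBudget(static fn () => $doc->writeHtml($html));
```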

DefaultHtmlSecurityPolicy enforces an allowlist of tags, attributes, CSS properties, and URL schemes, plus a 10 MB input ceiling and a 100-level nesting ceiling, independently of the parser. The CSS property allowlist is the security ceiling. The runtime support table is a separate capability ceiling. Implement HtmlSecurityPolicyInterface to supply a stricter policy. DefaultExternalResourcePolicy governs external resource fetching separately.

In href and image src values, the URL allowlist rejects backslash-rooted paths (\…) and Universal Naming Convention (UNC) paths (\\host\share), alongside the existing protocol-relative (//) rejection and the http(s)-or-relative-only allowlist. Neither path form carries a Uniform Resource Identifier (URI) scheme, so backslashes are normalized to forward slashes before the check. A Windows absolute-path local-file include or a Server Message Block (SMB) share fetch therefore cannot fall through the “no scheme, therefore relative” branch.
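The decision order described above can be illustrated with a small standalone check. This is a sketch of the rules as this page states them, not the code in DefaultHtmlSecurityPolicy: isUrlAllowedSketch() is a hypothetical name, and the exact order of checks inside the real policy is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical illustration of the URL rules described above.
 * Returns true only for http(s) URLs and plain relative paths.
 */
function isUrlAllowedSketch(string $url): bool
{
    // Backslash-rooted (\path) and UNC (\\host\share) values are rejected.
    if (str_starts_with($url, '\\')) {
        return false;
    }

    // Normalize backslashes so later checks see one separator style.
    $normalized = str_replace('\\', '/', $url);

    // Protocol-relative URLs inherit a scheme implicitly; reject them.
    if (str_starts_with($normalized, '//')) {
        return false;
    }

    // A leading "name:" is treated as a scheme; only http and https pass.
    // A Windows drive letter such as "C:" also lands here and is rejected.
    if (preg_match('/^([a-zA-Z][a-zA-Z0-9+.\-]*):/', $normalized, $m) === 1) {
        return in_array(strtolower($m[1]), ['http', 'https'], true);
    }

    // No scheme and not rooted at // or \: treat as a relative path.
    return true;
}
```

For example, isUrlAllowedSketch('images/logo.png') passes as a relative path, while '\\\\host\\share\\logo.png' (a UNC path written as a PHP string literal) is rejected before any fetch could be attempted.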

CSS support matrix excerpt (Verified-only rows)

This page does not restate per-property support. The CSS support matrix is the single authority for verified status per World Wide Web Consortium (W3C) module, including which modules are Verified versus Claimed.

The subsystem implements a curated CSS subset at fixed specification pins. Behavioral spec mappings for the cascade are documented with clause and chunk identifiers on css-resolver. Per-module conformance status appears in the CSS support matrix.

Enterprise capability. Premium widens CSS coverage (advanced print and additional modules) on the identical single-pass pipeline. The architecture, caps, and layer contracts stay the same across editions. See the CSS support matrix.