The HTML pipeline

Spec: CSS Cascade 5, §6.1 | Spec: CSS Display 3, §2 | Evidence: Code-backed

NextPDF renders HTML and CSS to PDF inside your PHP process: no browser, no subprocess by default. This page explains the layered stages that conversion moves through, what the CSS engine actually covers, and the case where delegating to a real browser renderer is the honest choice.

“HTML to PDF” sounds like one operation. It is actually a cascade, a box model, a layout pass, and a paint pass. Each is a well-specified problem with its own failure modes. An engine that fuses them into one procedure is fragile. A change to color parsing can move a box, and the only way to know is to render and look.

The in-process model has a real advantage: no browser to install, no sandbox to operate, and no process boundary to marshal across. But it only pays off if the conversion is decomposed cleanly enough to test each concern on its own. The architecture is what makes “render HTML in PHP” trustworthy rather than merely possible.

  • HTML/CSS conversion runs in-process via writeHtml(). The result is native PDF content, not an image of a page.
  • It is single-pass and streaming. The tokenizer produces a token list. The parser consumes it left to right, and no full DOM tree is retained (ADR-001). Hard caps bound element count and nesting depth.
  • The engine is organized as explicit layers: CSS parsing and applicators, style state, layout and formatting, paint, and paged media — with strict rules about which layer may do what (ADR-010).
  • The CSS engine covers the cascade, the box model, and common layout (block, inline, tables, floats, and more) — substantial, but a defined subset of what a modern browser implements.
  • When you need exact browser fidelity for arbitrary modern CSS, NextPDF can delegate to a headless browser renderer through an optional extension — a deliberate, network-isolated seam, not the default path.

The conversion is a sequence of stages, each consuming the previous stage’s typed output.

  1. Tokenize: HTML becomes an ordered token list, with no retained DOM tree.
  2. Resolve CSS: styles are parsed; the cascade and applicators compute typed values.
  3. Style state: a push/pop style stack carries computed values per nesting level.
  4. Layout: block, inline, table, and float geometry is computed; no paint happens here.
  5. Paint: borders, backgrounds, text, and decorations emit PDF operators.
  6. Paged media: page-break and @page rules are applied as the cursor crosses page bounds.

Figure: The in-process HTML pipeline: a single left-to-right pass over a token stream, with CSS resolution, style state, layout, and paint as separate layers, and paged-media breaks applied as the cursor advances.
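
To make stages 1 and 3 concrete, here is a minimal, self-contained sketch of a flat token list feeding a push/pop style stack. It is an illustration only, not NextPDF's code: the function names `tokenize()` and `styleRuns()`, the token shape, and the hard-coded per-tag overrides are all assumptions made for this example, and attributes, comments, and void elements are ignored.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: none of these names come from NextPDF.

/**
 * Stage 1: turn markup into a flat, ordered token list (no tree).
 * Attributes, comments, and void elements are ignored to keep this short.
 *
 * @return list<array{type: string, value: string}>
 */
function tokenize(string $html): array
{
    $tokens = [];
    // Alternative 1 matches a tag (groups 1 and 2); alternative 2 matches text (group 3).
    preg_match_all('~<(/?)([a-z][a-z0-9]*)[^>]*>|([^<]+)~i', $html, $matches, PREG_SET_ORDER);
    foreach ($matches as $m) {
        if (isset($m[3]) && $m[3] !== '') {
            $tokens[] = ['type' => 'text', 'value' => $m[3]];
        } else {
            $tokens[] = ['type' => $m[1] === '/' ? 'close' : 'open', 'value' => strtolower($m[2])];
        }
    }
    return $tokens;
}

/**
 * Stages 2 and 3, heavily simplified: one computed style per open nesting
 * level. A child starts from a copy of its parent's style (inheritance)
 * and applies a hard-coded per-tag override in place of a real cascade.
 *
 * @param list<array{type: string, value: string}> $tokens
 * @return list<array{text: string, style: array<string, string>}>
 */
function styleRuns(array $tokens): array
{
    $overrides = [
        'strong' => ['font-weight' => 'bold'],
        'em'     => ['font-style' => 'italic'],
    ];
    $stack = [['font-weight' => 'normal', 'font-style' => 'normal']];
    $runs = [];
    foreach ($tokens as $token) {
        switch ($token['type']) {
            case 'open':
                // Push: inherit from the parent level, then override.
                $stack[] = array_merge(end($stack), $overrides[$token['value']] ?? []);
                break;
            case 'close':
                // Pop, but never discard the root style.
                if (count($stack) > 1) {
                    array_pop($stack);
                }
                break;
            default:
                // Text is styled by whatever is on top of the stack right now.
                $runs[] = ['text' => $token['value'], 'style' => end($stack)];
        }
    }
    return $runs;
}
```

Calling `styleRuns(tokenize('<p>a <strong>b <em>c</em></strong> d</p>'))` yields four text runs; the run for `c` carries both `font-weight: bold` and `font-style: italic`, and the run for ` d` is back to the root style. At no point does the sketch hold more than one style entry per open nesting level.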

Two architectural rules make this more than a flow.

Layers have contracts. CSS text is read only inside applicator classes. Layout code computes geometry but emits no paint operators. Paint code reads an immutable computed-style snapshot, never the mutable layout-tracking state. Paged-media code triggers breaks but delegates page decoration to the paint layer. These boundaries are enforced (ADR-010). That is why a new CSS property is a new applicator, rather than a change that ripples through the parser, the layout dispatch, and the painter at once.
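
One way to picture two of these contracts is with types. The sketch below assumes PHP 8.1 or later (for `readonly` properties) and uses invented names (`ComputedStyle`, `Box`, `layoutBlocks()`, `paintBorder()`); it is not NextPDF's class layout. Layout returns geometry only, and paint receives an immutable style snapshot and is the only place PDF operators appear.

```php
<?php
declare(strict_types=1);

// Hypothetical names throughout; this is not NextPDF's class layout.

/** Immutable computed-style snapshot: the only style input paint may read. */
final class ComputedStyle
{
    public function __construct(
        public readonly float $borderWidth,
        public readonly string $borderColorRgb, // PDF RGB operands, e.g. "0 0 1"
    ) {}
}

/** Geometry produced by layout. It carries numbers only, never PDF operators. */
final class Box
{
    public function __construct(
        public readonly float $x,
        public readonly float $y,
        public readonly float $width,
        public readonly float $height,
    ) {}
}

/**
 * Layout layer: stacks blocks vertically and returns geometry. No paint.
 * Simplification: y grows downward here; a real engine would still have to
 * map this to PDF's bottom-left origin.
 *
 * @param list<float> $heights
 * @return list<Box>
 */
function layoutBlocks(array $heights, float $contentWidth): array
{
    $boxes = [];
    $cursorY = 0.0;
    foreach ($heights as $height) {
        $boxes[] = new Box(0.0, $cursorY, $contentWidth, $height);
        $cursorY += $height;
    }
    return $boxes;
}

/** Paint layer: turns a Box plus a style snapshot into content-stream operators. */
function paintBorder(Box $box, ComputedStyle $style): string
{
    // RG = stroke colour, w = line width, re = rectangle path, S = stroke.
    return sprintf(
        '%s RG %.2F w %.2F %.2F %.2F %.2F re S',
        $style->borderColorRgb,
        $style->borderWidth,
        $box->x,
        $box->y,
        $box->width,
        $box->height
    );
}
```

With `layoutBlocks([10.0, 20.0], 100.0)` and a style of `new ComputedStyle(1.0, '0 0 1')`, painting the second box returns `0 0 1 RG 1.00 w 0.00 10.00 100.00 20.00 re S`. Any attempt by paint code to assign to `$style->borderWidth` throws an `Error`, which is the kind of boundary the contract describes.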

There is no DOM. The pipeline is single-pass and streaming by decision (ADR-001): at most one style state per nesting level plus the active cursor, not one object per element. A few operations genuinely need look-ahead — table column sizing, :has(), :last-child. These are handled by bounded pre-scan index structures over the flat token list, not by retaining a tree. Element count and nesting depth are hard-capped, so a pathological input fails fast instead of exhausting memory.
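
The look-ahead idea can be sketched as a single bounded pre-scan. The example below answers the `:last-child` question for every element in a flat token list without building a tree, and fails fast when a cap is exceeded. The function name `indexLastChild()`, the token shape, and the default cap values are assumptions for illustration; the real caps and index structures are defined in the core repository.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch; the cap defaults and names are illustrative only.

/**
 * One pre-scan over a flat token list. For each 'open' token it records
 * whether that element is the last element child of its parent. Working
 * state is one slot per nesting depth, not one object per element.
 *
 * @param list<array{type: string, value: string}> $tokens
 * @return array<int, bool> token index of each open tag => is last child
 */
function indexLastChild(array $tokens, int $maxElements = 10000, int $maxDepth = 64): array
{
    $isLast = [];
    $latestChildAtDepth = [0 => null]; // depth => token index of newest child seen
    $depth = 0;
    $elements = 0;

    foreach ($tokens as $i => $token) {
        if ($token['type'] === 'open') {
            if (++$elements > $maxElements) {
                throw new LengthException('element cap exceeded');
            }
            // A new sibling arrived, so the previous sibling was not last.
            if ($latestChildAtDepth[$depth] !== null) {
                $isLast[$latestChildAtDepth[$depth]] = false;
            }
            $latestChildAtDepth[$depth] = $i;
            $isLast[$i] = true; // provisional until a later sibling appears

            if (++$depth > $maxDepth) {
                throw new LengthException('nesting cap exceeded');
            }
            $latestChildAtDepth[$depth] = null; // the new element has no children yet
        } elseif ($token['type'] === 'close' && $depth > 0) {
            $depth--;
        }
    }
    return $isLast;
}
```

For the token list of `<ul><li>a</li><li>b</li></ul>` (open tags at indexes 0, 1, and 4), the result is `[0 => true, 1 => false, 4 => true]`: only the second `li` is a last child. Three nested open tags with `$maxDepth` set to 2 raise a `LengthException` before any further input is read.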

The CSS engine resolves real CSS semantics, not a lookalike. Competing declarations are reduced to one value per property by origin, importance, layer, specificity, and order — the actual cascade. Layout follows the box model. A box’s type and the formatting context it establishes decide how it and its in-flow siblings are placed. The engine’s source is organized around exactly these concerns (cascade, box/display, flex, float, tables, fragmentation). That is why you can reason about its behavior against the specifications, rather than discover it empirically.
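
The cascade ordering named above can be written as a sort key. This is a simplified sketch, not NextPDF's implementation: the function names and the declaration array shape are invented here, layers are plain integers in declaration order, and animations, transitions, shadow-DOM context, and the style-attribute step are left out.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch of cascade sorting; not NextPDF's implementation.

/**
 * Build a sortable precedence key for one declaration. A higher key wins.
 *
 * @param array{origin: string, important: bool, layer: ?int,
 *              specificity: array{int, int, int}, order: int, value: string} $d
 * @return list<int>
 */
function precedenceKey(array $d): array
{
    // Normal declarations: user-agent < user < author.
    // Important declarations outrank all normal ones and reverse that order.
    $normal    = ['user-agent' => 1, 'user' => 2, 'author' => 3];
    $important = ['author' => 4, 'user' => 5, 'user-agent' => 6];
    $originRank = $d['important'] ? $important[$d['origin']] : $normal[$d['origin']];

    // Unlayered styles (layer = null) act as an implicit final layer.
    // Later layers win for normal declarations; the order flips for important ones.
    $layer = $d['layer'] ?? PHP_INT_MAX;
    $layerRank = $d['important'] ? -$layer : $layer;

    // Then specificity (a, b, c), then order of appearance.
    return [$originRank, $layerRank, ...$d['specificity'], $d['order']];
}

/**
 * Reduce competing declarations for one property to the winning value.
 *
 * @param list<array> $declarations
 */
function cascadedValue(array $declarations): string
{
    if ($declarations === []) {
        throw new InvalidArgumentException('no declarations to cascade');
    }
    // Equal-length arrays compare element by element, left to right.
    usort($declarations, fn (array $a, array $b): int => precedenceKey($a) <=> precedenceKey($b));
    return end($declarations)['value'];
}
```

Given two normal, unlayered author declarations, an ID selector (`[1, 0, 0]`) beats a later class selector (`[0, 1, 0]`) because specificity is compared before order. Adding an `!important` declaration with lower specificity still wins, because origin and importance are compared first.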

This page is marked Evidence: Code-backed. The stages and rules map to the core repository:

  • The in-process entry point is writeHtml(string $html): static in src/Core/Concerns/HasTextOutput.php.
  • The single-pass, no-retained-DOM design with element and nesting caps is ADR-001 and the tokenizer/parser/style-stack code in src/Html/.
  • The layered engine contract — CSS parsing/applicators, style state, layout, paint, paged media — is ADR-010, reflected in the src/Html/ layout (for example Cascade/, Css/, Flex/, Float/, Fragmentation/, and the applicator classes).
  • The browser-delegation seam is writeHtmlChrome() in the same file, documented as requiring the optional renderer extension plus a Chrome/Chromium binary.

The standards anchor the coverage claim honestly. The cascade reduces competing declarations to a single value per property (by origin, importance, layer, specificity, and order) per Spec: CSS Cascade 5, §6.1, and in-flow placement follows box and formatting-context rules per Spec: CSS Display 3, §2. Equally important is the boundary: a feature query exists precisely because not every processor supports every feature, per Spec: CSS Conditional 5, §2. NextPDF’s CSS engine is a defined, specification-aligned subset, and stating that plainly is part of the contract.

In-process rendering is one call. The output is selectable PDF text, not a rasterized page:

<?php
declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Core\Document;

// Create a document and add the page that will receive the HTML content.
$doc = Document::createStandalone();
$doc->setTitle('HTML Basic');
$doc->addPage();

// Nowdoc (quoted 'HTML' label): the markup is taken literally, with no PHP interpolation.
$html = <<<'HTML'
<h1 style="color: #1E3A8A;">HTML Rendering in NextPDF</h1>
<p>NextPDF renders <strong>HTML and CSS</strong> directly into PDF pages,
<em>in-process</em>.</p>
<ul>
<li>Headings, paragraphs, bold and italic</li>
<li>Lists, tables, inline styles</li>
</ul>
HTML;

// Render in-process, then write the finished PDF next to this script.
$doc->writeHtml($html);
$doc->save(__DIR__ . '/html-basic.pdf');

If the same document required arbitrary modern CSS at exact browser fidelity, the call would instead be writeHtmlChrome($html) — same document, different rendering path, and a deliberate dependency on the optional browser renderer.

The recurring misconception is that an HTML-to-PDF engine is “basically a browser.” It is not, and it does not claim to be. A browser is a vast, continuously updated implementation of the entire web platform. NextPDF’s in-process engine is a specification-aligned subset focused on document layout. The honest mental model is “a competent print-document CSS engine,” not “Chrome in PHP.” When you genuinely need the full platform, that is what writeHtmlChrome() is for. It is a separate, opt-in path with its own operational footprint, not a silent fallback.

A second misconception is that the browser path merely means “render the page over the network.” It is the opposite by construction. The delegation seam always renders with subresource network access blocked (no remote images, fonts, stylesheets, or frames), so it cannot become an outbound-request vector. Pixel fidelity, yes; an open network egress, no.

This page explains the pipeline’s shape and the in-process / browser choice. It is not a CSS support matrix. Which exact properties, modules, and selectors the in-process engine covers is defined by the code and its conformance tests, not by this overview. That coverage evolves. The browser-delegation path requires an optional extension and a Chrome/Chromium binary. Its setup, operational characteristics, and the internal layout of that extension are out of scope here and documented with that package. “In-process” describes the default writeHtml() path. It is not a claim that every rendering path avoids a subprocess. The architectural claims are accurate as of this page’s review date. The authoritative sources are src/Html/, ADR-001, and ADR-010 in the core repository.

The in-process CSS engine is a Core capability. The browser-delegation seam is an optional extension, surfaced here only at the capability level:

HTML rendering paths: edition availability

Edition    | Availability
Core       | Core provides the in-process HTML/CSS engine (writeHtml).
Pro        | The browser-delegation path is an optional add-on extension, independent of edition tier.
Enterprise | The browser-delegation path is an optional add-on extension, independent of edition tier.

  • In-process rendering — converting HTML/CSS to PDF inside the PHP process, with no browser or default subprocess (writeHtml()).
  • Single-pass / streaming — consuming a token stream left to right without retaining a full DOM tree (ADR-001).
  • Cascade — the CSS process that resolves competing declarations into one value per property by origin, importance, layer, specificity, and order.
  • Formatting context — the layout environment a box establishes that governs how its in-flow contents are placed.
  • Engine layer contract — the enforced rule set (ADR-010) defining what the parsing, style, layout, paint, and paged-media layers may each do.
  • Browser-delegation seam — the optional writeHtmlChrome() path that renders via a headless browser with subresource network access blocked.