The pipeline model

Spec: ISO 32000-2, §7.5 Evidence: Code-backed

A NextPDF document is not produced in one opaque step. It moves through a small number of explicit stages: a facade that records intent, a content layer that turns intent into a model, and a writer that serializes that model into a conforming PDF. This page explains that shape and why it is the right shape for the engine.

The PDF file format is itself a layered structure (a header, a body of objects, a cross-reference table, and a trailer), and a writer has to assemble all of it consistently. If the engine that builds it is a single tangled procedure, every change puts every output at risk. The only way to gain confidence is then to render whole documents and inspect them by eye, which is slow, comes late in the cycle, and proves little.

An explicit pipeline turns that around. Each stage has one job and a typed boundary, so you can reason about a change and test it at the stage it touches, not only at the end of the file. The architecture is a testability and extensibility decision before it is anything else.

  • The public entry point is a Document facade. It is a fluent, use-once, worker-safe builder that records what you want, not how it is serialized.
  • The facade delegates to roughly two dozen focused concern traits (text output, drawing, pages, security, navigation, and so on) — one responsibility each, not one giant class.
  • Content arrives by one of two paths: direct drawing (graphics primitives) or the HTML/CSS engine. Both produce the same internal document model.
  • A dedicated PDF writer serializes that model, choosing a PDF 1.4 / 1.7 / 2.0 strategy. Responsibility for producing valid file structure lives here and nowhere else.
  • Long-lived state (font and image registries) is process-scoped and shared; per-request state (the document) is created fresh and never reused. The boundary is explicit, which is what makes worker runtimes safe.

The cleanest way to see the model is to follow a document from call to bytes.

  1. Document facade: fluent, use-once builder; records intent via concern traits.
  2. Content production: direct drawing or the HTML/CSS engine; both build one document model.
  3. Document model: accumulated pages, content, and resources held as typed state.
  4. PDF writer: serializes the model; selects a PDF 1.4 / 1.7 / 2.0 strategy.
  5. Conforming PDF: header, object body, cross-reference table, trailer.
A document's path through NextPDF: each stage has one responsibility and a typed boundary, so it can be reasoned about and tested in isolation.
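
The flow above can be sketched in a few lines of PHP. This is an illustration of the shape only, not NextPDF's internal code: SketchModel, drawText, htmlToModel, and serializeSketch are hypothetical names, and the toy "HTML" path merely strips tags where a real engine would perform layout.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the document model: typed, accumulated state.
final class SketchModel
{
    /** @var list<array{page: int, op: string, text: string}> */
    public array $marks = [];
    public int $pageCount = 0;
}

// Content path 1 (hypothetical): direct drawing appends a mark to the model.
function drawText(SketchModel $model, string $text): void
{
    $model->marks[] = ['page' => $model->pageCount, 'op' => 'text', 'text' => $text];
}

// Content path 2 (hypothetical): a toy HTML path that strips tags and
// records the same kind of mark in the same model.
function htmlToModel(SketchModel $model, string $html): void
{
    $model->marks[] = ['page' => $model->pageCount, 'op' => 'text', 'text' => trim(strip_tags($html))];
}

// Writer stage (hypothetical): the only function that turns the model into output.
function serializeSketch(SketchModel $model): string
{
    $lines = array_map(
        static fn (array $m): string => sprintf('p%d %s "%s"', $m['page'], $m['op'], $m['text']),
        $model->marks
    );
    return implode("\n", $lines);
}

$model = new SketchModel();
$model->pageCount = 1;
drawText($model, 'Summary');                          // direct drawing path
htmlToModel($model, '<p>Generated in-process.</p>');  // HTML path
$output = serializeSketch($model);                    // single serialization point
echo $output, "\n";
```

Because both paths write into the same typed model, the serializer can be tested against a hand-built model without running either content path.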

Three design choices make this more than a diagram.

The facade is composed, not monolithic. Document does not implement every feature itself; it delegates each area to a dedicated concern trait — text output, drawing, pages, security, typography, navigation, transactions, and so on. A new document method belongs in the trait that owns its area, not on the facade itself. The class you call stays small, and the responsibilities stay separated.
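
As a rough illustration of that composition, the sketch below builds a small facade from two traits. All names (SketchPages, SketchMetadata, SketchDocument) are hypothetical, and the method chaining is an assumption made for brevity; the actual trait inventory and signatures are defined by src/Core/Concerns/, not by this example.

```php
<?php
declare(strict_types=1);

// Hypothetical concern trait: owns page handling only.
trait SketchPages
{
    private int $pages = 0;

    public function addPage(): static
    {
        $this->pages++;
        return $this;
    }

    public function pageCount(): int
    {
        return $this->pages;
    }
}

// Hypothetical concern trait: owns metadata only.
trait SketchMetadata
{
    private string $title = '';

    public function setTitle(string $title): static
    {
        $this->title = $title;
        return $this;
    }

    public function title(): string
    {
        return $this->title;
    }
}

// The facade composes the traits and adds no feature logic of its own.
final class SketchDocument
{
    use SketchPages;
    use SketchMetadata;
}

$doc = (new SketchDocument())->setTitle('Quarterly Report')->addPage()->addPage();
echo $doc->title(), ': ', $doc->pageCount(), " pages\n";
```

A new page-related method would be added to the pages trait in this sketch, leaving the facade class itself unchanged.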

The writer owns file structure exclusively. Content production decides what marks and objects exist; the writer decides how they become a valid PDF file, including which version strategy applies. That separation is enforced as an architectural rule: layout and content code do not emit final file structure, and the writer does not make layout decisions. The benefit is that “is the output a valid PDF?” has exactly one place to be tested.
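
One way to picture the version strategy is a small strategy interface that the writer consults. This is a minimal sketch under stated assumptions: SketchVersionStrategy, the three SketchPdf classes, sketchStrategyFor, and SketchWriter are hypothetical names, and only the version-dependent header line is modelled, so the output is deliberately not a complete PDF.

```php
<?php
declare(strict_types=1);

// Hypothetical strategy contract: each PDF version supplies its own header line.
interface SketchVersionStrategy
{
    public function header(): string;
}

final class SketchPdf14 implements SketchVersionStrategy
{
    public function header(): string
    {
        return "%PDF-1.4\n";
    }
}

final class SketchPdf17 implements SketchVersionStrategy
{
    public function header(): string
    {
        return "%PDF-1.7\n";
    }
}

final class SketchPdf20 implements SketchVersionStrategy
{
    public function header(): string
    {
        return "%PDF-2.0\n";
    }
}

// Hypothetical selection rule: map a version string to a strategy.
function sketchStrategyFor(string $version): SketchVersionStrategy
{
    return match ($version) {
        '1.4' => new SketchPdf14(),
        '1.7' => new SketchPdf17(),
        '2.0' => new SketchPdf20(),
        default => throw new InvalidArgumentException("Unsupported version: $version"),
    };
}

// Hypothetical writer: the only place that emits file-level structure.
final class SketchWriter
{
    public function __construct(private SketchVersionStrategy $strategy)
    {
    }

    /** @param list<string> $bodyLines content produced upstream, opaque to the writer */
    public function write(array $bodyLines): string
    {
        // Cross-reference table and trailer are omitted in this sketch.
        return $this->strategy->header() . implode("\n", $bodyLines) . "\n";
    }
}

$bytes = (new SketchWriter(sketchStrategyFor('2.0')))->write(['% body elided in this sketch']);
echo $bytes;
```

Content code in this sketch never sees the header string, which mirrors the rule that layout and content code do not emit final file structure.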

The lifetime boundary is part of the model, not an afterthought. Font and image registries live for the life of the process and are shared across requests; the document, its rendering context, and the writer are created per request and disposed. In a worker runtime that distinction is the difference between safe reuse and cross-request corruption. For that reason it is stated in the architecture, not left to discipline.
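
The boundary can be illustrated with a simulated worker loop. The classes below are hypothetical stand-ins (SketchFontRegistry, SketchRequestDocument), not the shipped FontRegistry or Document, and the "font load" is a placeholder string; the point is only which object outlives a request and which does not.

```php
<?php
declare(strict_types=1);

// Hypothetical process-lifetime registry: loaded once, shared by every request.
final class SketchFontRegistry
{
    /** @var array<string, string> */
    private array $fonts = [];
    public int $loads = 0;

    public function get(string $name): string
    {
        if (!isset($this->fonts[$name])) {
            $this->loads++;                          // the expensive load happens once
            $this->fonts[$name] = "font-data:$name"; // placeholder for real font data
        }
        return $this->fonts[$name];
    }
}

// Hypothetical per-request document: holds request state and is never reused.
final class SketchRequestDocument
{
    /** @var list<string> */
    public array $lines = [];

    public function __construct(private SketchFontRegistry $fonts)
    {
    }

    public function write(string $font, string $text): void
    {
        $this->lines[] = $this->fonts->get($font) . '|' . $text;
    }
}

// Worker loop stand-in: the registry outlives requests, the document does not.
$registry = new SketchFontRegistry();
$results = [];
foreach (['alice', 'bob'] as $requestUser) {
    $doc = new SketchRequestDocument($registry); // created fresh per request
    $doc->write('helvetica', "Hello $requestUser");
    $results[] = $doc->lines;
    unset($doc);                                 // disposed with the request
}

echo 'font loads: ', $registry->loads, "\n";
```

The second request reuses the cached font but starts with an empty document, so nothing written for the first request can leak into it.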

This page is Evidence: Code-backed. The stages map to real structure in the core repository:

  • The facade and its delegation are src/Core/Document.php plus the concern traits in src/Core/Concerns/ (text output, output, drawing, pages, security, typography, navigation, transactions, and more — each a single responsibility).
  • The two content paths are the HTML/CSS engine (src/Html/) and direct drawing (src/Graphics/), both feeding the internal model.
  • Serialization and PDF version strategy live in src/Writer/ (PdfWriter.php, with explicit PDF 1.4 / 1.7 / 2.0 strategy classes).
  • The process-lifetime vs per-request boundary is the worker-safe design recorded in the architecture overview and exercised by the shipped worker-factory example, which shares a FontRegistry and ImageRegistry across requests while creating each Document fresh.

The destination is fixed by the format. The writer’s output must be a header, an object body, a cross-reference table, and a trailer per Spec: ISO 32000-2, §7.5. Concentrating that obligation in one stage is what lets the rest of the engine stay focused on content instead of on assembling file structure.
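
To make those four parts concrete, the following sketch assembles a one-page, empty PDF by hand. It is not how NextPDF's writer is implemented; it assumes a classic (non-stream) cross-reference table, a PDF 1.7 header, and a US Letter page size chosen purely for illustration.

```php
<?php
declare(strict_types=1);

// Object bodies for a one-page, empty document (object numbers 1 to 3).
$objects = [
    1 => '<< /Type /Catalog /Pages 2 0 R >>',
    2 => '<< /Type /Pages /Kids [3 0 R] /Count 1 >>',
    3 => '<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] /Resources << >> >>',
];

// 1. Header.
$pdf = "%PDF-1.7\n";

// 2. Body: record each object's byte offset as it is appended.
$offsets = [];
foreach ($objects as $number => $body) {
    $offsets[$number] = strlen($pdf);
    $pdf .= "$number 0 obj\n$body\nendobj\n";
}

// 3. Cross-reference table: the free entry for object 0, then one
//    20-byte entry per object holding its offset and generation number.
$xrefOffset = strlen($pdf);
$size = count($objects) + 1;
$pdf .= "xref\n0 $size\n";
$pdf .= "0000000000 65535 f \n";
foreach ($offsets as $offset) {
    $pdf .= sprintf("%010d 00000 n \n", $offset);
}

// 4. Trailer, then the pointer back to the cross-reference table.
$pdf .= "trailer\n<< /Size $size /Root 1 0 R >>\n";
$pdf .= "startxref\n$xrefOffset\n%%EOF\n";

echo strlen($pdf), " bytes\n";
```

Every offset in the table depends on the exact bytes written before it, which is why this bookkeeping is easier to keep correct when a single stage owns it.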

The facade’s job is to make intent read like intent. The content path and the writer stay invisible at the call site:

<?php
declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Core\Document;

$doc = Document::createStandalone();             // facade
$doc->setTitle('Quarterly Report');              // metadata concern
$doc->addPage();                                 // pages concern
$doc->setFont('helvetica', 'B', 16);             // typography concern
$doc->cell(0, 12, 'Summary', newLine: true);     // text-output concern
$doc->writeHtml('<p>Generated in-process.</p>'); // HTML content path
$doc->save(__DIR__ . '/report.pdf');             // writer stage

Each call lands in a different concern. Two different content paths feed the same model. Exactly one stage — save() — turns the model into file bytes. Nothing at the call site needs to know how the cross-reference table is built.

The frequent misreading is that “pipeline” implies a streaming push API you wire stage by stage, like a Unix pipe. It does not. The pipeline here is an architectural decomposition: stages with single responsibilities and typed boundaries. You still program against a fluent facade. The stages are how the engine is built and tested, not a transport you assemble by hand.

A related mistake is assuming the facade is the engine. It is the entry point. The real work is distributed across concern traits, two content paths, and a writer. That distribution is precisely why one feature change does not put every output at risk.

This page describes the shape of the pipeline, not the internal API of any single stage. The exact concern-trait inventory, writer strategy selection rules, and content-model fields are defined by the code and the reference, not by this explanation. The precise trait count is an implementation detail that can change without changing the model. This page does not cover the HTML engine’s internal stages (a separate topic) or the streaming and memory behavior of the writer (also separate). The structural claims are accurate as of this page’s review date; the authoritative source is the core repository’s src/Core/, src/Html/, src/Graphics/, and src/Writer/.

The pipeline model is identical across editions; editions add capabilities within stages, not new stages:

Pipeline model: edition availability

Edition    | Availability
Core       | Core implements the full facade → content → writer pipeline.
Pro        | Pro adds capabilities within existing stages, not new stages.
Enterprise | Enterprise adds capabilities within existing stages, not new stages.

  • Facade — the public Document entry point: a fluent, use-once builder that records intent and delegates to concern traits.
  • Concern trait — a focused PHP trait the facade composes, each owning a single feature area (text output, drawing, pages, security, and so on).
  • Content path — one of the two ways content enters the model: direct drawing or the HTML/CSS engine.
  • Document model — the engine’s internal, typed accumulation of pages, content, and resources before serialization.
  • Writer stage — the component that serializes the model into a valid PDF, selecting a PDF 1.4 / 1.7 / 2.0 strategy.
  • Worker-safe — designed so process-lifetime state is shared safely while per-request state is created fresh and never reused.