
ContentStream: PDF content-stream emitter

The ContentStream module emits Portable Document Format (PDF) marked-content operators. It opens and closes structure tags and artifacts, tracks nesting depth, and returns the operator buffer.

composer require nextpdf/core:^3

ContentStreamBuilder is the module’s single class. It builds the marked-content layer of a page content stream. A content stream encodes page content as a sequence of operators (ISO 32000-2 §8). The builder wraps that content in marked-content operators; it does not generate the content itself.

append() adds raw operator bytes verbatim. The builder does not escape this input. You own its validity. Use this boundary when the HTML pipeline and the Graphics module need to interleave their own operators.

beginTag() opens a structure-tagged sequence. It emits a BDC operator with an MCID property list, per ISO 32000-2 §14.6. endTag() emits the matching EMC operator. The builder counts nesting depth. If you call endTag() with no open sequence, it throws PageLayoutException instead of writing an unbalanced EMC.
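The balance rule described above can be sketched with a small stand-in class. This is not the NextPDF implementation: TagBalanceSketch, its method bodies, the exact byte layout, and the use of LogicException in place of PageLayoutException are all assumptions made for illustration. Only the general operator form, a tag name followed by an MCID property list and BDC, then EMC, comes from ISO 32000-2 §14.6.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in, not the NextPDF class. It mirrors only the
// documented behaviour: BDC with an MCID property list, EMC, depth guard.
final class TagBalanceSketch
{
    private string $buffer = '';
    private int $depth = 0;

    public function beginTag(string $structType, int $mcid): void
    {
        // Marked-content form: /Tag <</MCID n>> BDC
        $this->buffer .= sprintf("/%s <</MCID %d>> BDC\n", $structType, $mcid);
        $this->depth++;
    }

    public function append(string $raw): void
    {
        // Written verbatim, with no escaping.
        $this->buffer .= $raw;
    }

    public function endTag(): void
    {
        if ($this->depth === 0) {
            // The real builder is documented to throw PageLayoutException;
            // LogicException stands in for it here.
            throw new LogicException('endTag() called with no open sequence.');
        }
        $this->buffer .= "EMC\n";
        $this->depth--;
    }

    public function depth(): int
    {
        return $this->depth;
    }

    public function buffer(): string
    {
        return $this->buffer;
    }
}

$sketch = new TagBalanceSketch();
$sketch->beginTag('P', 0);
$sketch->append("BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n");
$sketch->endTag();
```

Refusing the unmatched endTag() before writing anything keeps a stray EMC out of the buffer, which is the property the paragraph above describes.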

beginArtifact() opens an artifact sequence. Use artifacts for pagination decoration — headers, footers, page numbers, and rules — that must stay out of the structure tree, per ISO 32000-2 §14.8.2.2. The subtype is one of four ISO values: Pagination, Layout, Page, or Background. Prefer the typed ArtifactSubtype enum. The string overload is validated against the enum, so a non-standard value fails immediately.
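A minimal sketch of how a string overload might be validated against an enum follows. ArtifactSubtypeSketch and beginArtifactSketch() are hypothetical names, the enum's backing values are assumed, and the exception type the real builder raises is not stated in this page, so InvalidArgumentException is a placeholder.

```php
<?php
declare(strict_types=1);

// Hypothetical enum for illustration. The real one is
// NextPDF\Accessibility\ArtifactSubtype; its backing values are assumed.
enum ArtifactSubtypeSketch: string
{
    case Pagination = 'Pagination';
    case Layout = 'Layout';
    case Page = 'Page';
    case Background = 'Background';
}

// Returns the opening operator for an artifact sequence.
function beginArtifactSketch(ArtifactSubtypeSketch|string $type): string
{
    if (is_string($type)) {
        // tryFrom() returns null for any value that is not a case,
        // so a non-standard subtype fails here, at the call.
        $type = ArtifactSubtypeSketch::tryFrom($type)
            ?? throw new InvalidArgumentException("Unknown artifact subtype: {$type}");
    }

    return sprintf("/Artifact <</Type /%s>> BDC\n", $type->value);
}

$fromEnum = beginArtifactSketch(ArtifactSubtypeSketch::Pagination);
$fromString = beginArtifactSketch('Layout');
```

Passing the enum case skips validation entirely, which is one reason to prefer the typed form.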

relabelTag() rewrites a previously emitted tag in place. finish() returns the full buffer and throws if marked content is unbalanced. drain() returns the buffer so far without the balance check, for incremental streaming. peek() returns the buffer without consuming it. reset() clears the state.
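The difference between finish(), drain(), and peek() can be illustrated with a stand-in. FlushSketch is hypothetical, and two details are assumptions rather than documented facts: that drain() empties the buffer after returning it, and that finish() consumes the buffer the same way once the balance check passes.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in that models only the flush semantics.
final class FlushSketch
{
    private string $buffer = '';
    private int $depth = 0;

    public function open(string $operator): void
    {
        $this->buffer .= $operator;
        $this->depth++;
    }

    public function close(): void
    {
        $this->buffer .= "EMC\n";
        $this->depth--;
    }

    // Returns the buffer and leaves it in place.
    public function peek(): string
    {
        return $this->buffer;
    }

    // Returns the buffer so far with no balance check.
    // Assumption: the buffer is emptied afterwards.
    public function drain(): string
    {
        $chunk = $this->buffer;
        $this->buffer = '';

        return $chunk;
    }

    // Returns the buffer only when every sequence has been closed.
    public function finish(): string
    {
        if ($this->depth !== 0) {
            throw new LogicException("Unbalanced marked content: depth {$this->depth}.");
        }

        return $this->drain();
    }
}

$stream = new FlushSketch();
$stream->open("/P <</MCID 0>> BDC\n");
$peeked = $stream->peek();   // buffer still intact
$first = $stream->drain();   // allowed even though a sequence is open
$stream->close();
$rest = $stream->finish();   // depth is back to zero, so this succeeds
```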

Method                   Signature                                              Role
append()                 append(string $raw): void                              Adds raw operator bytes verbatim (no escaping)
beginTag()               beginTag(string $structType, int $mcid): void          Opens a BDC structure sequence
endTag()                 endTag(): void                                         Closes the innermost sequence with EMC
beginArtifact()          beginArtifact(ArtifactSubtype|string $type): void      Opens an artifact sequence
endArtifact()            endArtifact(): void                                    Closes the innermost artifact
getMarkedContentDepth()  getMarkedContentDepth(): int                           Returns current nesting depth
relabelTag()             relabelTag(string $old, string $new, int $mcid): void  Rewrites an emitted tag in place
finish()                 finish(): string                                       Returns the full buffer; throws if unbalanced
drain()                  drain(): string                                        Returns the buffer without the balance check
peek()                   peek(): string                                         Returns the buffer without consuming it
reset()                  reset(): void                                          Clears all state

Run composer docs:generate-api-php -- --module=ContentStream to generate the full PHPDoc table.

<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\ContentStream\ContentStreamBuilder;
$builder = new ContentStreamBuilder();
$builder->beginTag('P', mcid: 0);
$builder->append("BT /F1 12 Tf 72 720 Td (Hello) Tj ET\n");
$builder->endTag();
$pageContent = $builder->finish();

Use this pattern to wrap a heading in a structure tag and a footer in an artifact, then flush the buffer incrementally with drain(). Because drain() skips the balance check, the example verifies the nesting depth before flushing.

<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Accessibility\ArtifactSubtype;
use NextPDF\ContentStream\ContentStreamBuilder;
$builder = new ContentStreamBuilder();
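// Placeholder operator strings; real input comes from your layout code.
$titleOperators = "BT /F1 18 Tf 72 720 Td (Title) Tj ET\n";
$footerOperators = "BT /F1 9 Tf 72 36 Td (Page 1) Tj ET\n";
// Tag the heading so it enters the structure tree.
$builder->beginTag('H1', mcid: 0);
$builder->append($titleOperators);
$builder->endTag();
// Mark the footer as an artifact so it stays out of the structure tree.
$builder->beginArtifact(ArtifactSubtype::Pagination);
$builder->append($footerOperators);
$builder->endArtifact();
// drain() does not check balance, so check the depth by hand first.
if ($builder->getMarkedContentDepth() !== 0) {
    throw new RuntimeException('Unbalanced marked content before flush.');
}
$chunk = $builder->drain();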
  • append() does not escape input. Pass only valid operator bytes. The builder trusts the caller.
  • endTag() and endArtifact() throw on underflow. Never close a sequence that is not open.
  • finish() checks balance and throws when depth is not zero. drain() does not check. Use drain() only for incremental streaming.
  • The depth counter does not distinguish tags from artifacts. EMC closes the innermost sequence of either kind. Nest sequences in strict order.
  • The string overload of beginArtifact() is validated against the enum. A non-standard subtype fails at the call, not in the output.
  • relabelTag() rewrites an emitted tag. Use the same mcid you used to emit it.

Each operation is an O(1) string append, except relabelTag(), which performs an O(buffer) rewrite. The module holds one string buffer and one integer depth counter. It performs no parsing and allocates only the buffer. The reference workload budget is 1500 ms wall-clock time and 64 MB peak memory, and this module stays far below it.
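To see why a relabel costs time proportional to the buffer, here is one way an in-place rewrite could be sketched. relabelTagSketch() is a hypothetical free function, and the assumption that the emitted tag has the exact form "/Tag <</MCID n>> BDC" is mine; the real relabelTag() may locate and rewrite the tag differently.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: swap the structure type of one emitted tag.
// str_replace() scans the whole buffer, hence the O(buffer) cost.
function relabelTagSketch(string $buffer, string $old, string $new, int $mcid): string
{
    // Matching on the full operator, including the MCID and BDC,
    // keeps MCID 1 from matching MCID 10 and leaves other tags alone.
    $from = sprintf('/%s <</MCID %d>> BDC', $old, $mcid);
    $to = sprintf('/%s <</MCID %d>> BDC', $new, $mcid);

    return str_replace($from, $to, $buffer);
}

$page = "/P <</MCID 0>> BDC\nEMC\n/P <</MCID 1>> BDC\nEMC\n";
$relabelled = relabelTagSketch($page, 'P', 'H1', 1);
```

The MCID is part of the search key, which is why the caller has to pass the same value that was used when the tag was emitted.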

append() is the trust boundary. The builder writes bytes verbatim, so upstream code must escape any string that reaches a literal-string operator. The canonical escaper is PdfStringEscaper::escapeLiteral() (ADR-015). Never pass unescaped user text through append(). The balance checks in endTag(), endArtifact(), and finish() prevent a malformed marked-content tree from reaching the Writer. See /modules/core/security/ for the document threat model.
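As an illustration of what escaping before append() involves, the sketch below escapes the three characters that can break a PDF literal string: the backslash and both parentheses. escapeLiteralSketch() is a hypothetical helper, not PdfStringEscaper::escapeLiteral(), whose actual behaviour this page does not describe. It handles syntax only and does nothing about font encoding.

```php
<?php
declare(strict_types=1);

// Hypothetical helper, not the library's escaper.
// strtr() with an array replaces in a single pass, so the backslashes
// it inserts are never escaped a second time.
function escapeLiteralSketch(string $text): string
{
    return strtr($text, [
        '\\' => '\\\\',
        '(' => '\\(',
        ')' => '\\)',
    ]);
}

// Untrusted text containing characters that would otherwise
// terminate or corrupt the literal string.
$userText = 'Total (net) \\ gross';
$operators = sprintf(
    "BT /F1 12 Tf 72 720 Td (%s) Tj ET\n",
    escapeLiteralSketch($userText)
);
```

Only the escaped result should be interpolated into the operator string that is later handed to append().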

The module emits marked-content operator structures consistent with ISO 32000-2: BDC/EMC pairs with an MCID property list per §14.6, and artifact sequences per §14.8.2.2. These are implementation facts. The evidence is src/ContentStream/ContentStreamBuilder.php, the src/Accessibility/ArtifactSubtype.php enum, and tests/Unit/ContentStream/ContentStreamBuilderMarkedContentBalanceCoverageTest plus ContentStreamBuilderRelabelTagInvariantTest. They are not a claim of end-to-end PDF/UA-2 or PDF 2.0 conformance. An external oracle validates the tagged-PDF structure that these operators participate in: tests/Integration/Accessibility/VeraPdfUa2GoldenTest checks a generated fixture against veraPDF for the PDF/UA-2 profile. That oracle test skips when the veraPDF binary is absent, so it is an opt-in gate. State that this module “produces marked-content structures; PDF/UA-2 conformance is validated by veraPDF” rather than asserting unqualified conformance.