Custom layout engines and layout-time text interception

NextPDF does not expose a pluggable layout-engine interface. Use the public layout-extension contract, TextPreprocessorInterface, to hook text at layout time. Content lifecycle events let you observe what layout produces.

composer require nextpdf/core:^3

The layout pipeline is internal. It covers glyph layout, font subsetting, ToUnicode CMap output, and the structure tree. NextPDF does not let you replace it. Stable byte output and tagged-PDF conformance depend on one controlled build.

NextPDF does expose the point before layout: TextPreprocessorInterface. An implementation receives raw text and returns a segmented result before that text enters glyph layout, font subsetting, the ToUnicode CMap, or the structure tree. Use this supported path to change text content without touching the layout engine.

The source PHPDoc sets a hard rule: an implementation must not change how layout works. It must not add layout-affecting characters such as line feed, carriage return, or tab, and it must preserve logical reading order. The preprocessor declares a content substitution; it makes no layout decisions. If an implementation breaks this rule, stable output and accessibility both break.
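The character part of this rule can be checked mechanically in your own tests. The sketch below is a minimal illustration, not part of NextPDF: both function names are hypothetical, and it checks only the three characters named above (line feed, carriage return, tab), which the PHPDoc gives as examples rather than as a complete list.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not a NextPDF API): reports whether a string
 * contains a line feed, carriage return, or tab.
 */
function containsLayoutAffectingCharacters(string $text): bool
{
    // strpbrk() returns false when none of the listed characters occur.
    return \strpbrk($text, "\n\r\t") !== false;
}

/**
 * Hypothetical helper: returns the output unchanged, or throws when it
 * would violate the character rule.
 */
function assertLayoutNeutral(string $output): string
{
    if (containsLayoutAffectingCharacters($output)) {
        throw new \UnexpectedValueException(
            'Preprocessor output contains a line feed, carriage return, or tab.'
        );
    }

    return $output;
}
```

One way to use this is to pass the replacement text through assertLayoutNeutral() in a unit test for each preprocessor, so a violation fails the build instead of reaching layout.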

To observe the result of layout without changing it, use the content lifecycle events described in Action triggers and event listeners. ContentRenderedEvent fires after content is drawn to a page. FontLoadedEvent fires once per font family and style.

NextPDF\Contracts\TextPreprocessorInterface (stable, since 1.9.0):

| Method | Returns | Purpose |
| --- | --- | --- |
| process(string $text) | TextPreprocessResult | Transform raw text before the render pipeline, and return a segmented result with redaction metadata. |

The returned NextPDF\Contracts\TextPreprocessResult is a frozen value object. Its constructor signature and public properties are stable and do not change in a minor or patch release. New methods may be added.

The small preprocessor below masks a fixed token. It adds no layout-affecting characters and keeps reading order.

<?php
declare(strict_types=1);

use NextPDF\Contracts\TextPreprocessorInterface;
use NextPDF\Contracts\TextPreprocessResult;
use NextPDF\Contracts\TextSegment;

final class TokenMaskingPreprocessor implements TextPreprocessorInterface
{
    public function process(string $text): TextPreprocessResult
    {
        $masked = \str_replace('SECRET-TOKEN', '••••••••••••', $text);

        return new TextPreprocessResult([
            new TextSegment($masked, redacted: $masked !== $text),
        ]);
    }
}

A production preprocessor keeps matching rules in one place. It fails closed on a bad pattern and never logs the original text.

<?php
declare(strict_types=1);

use NextPDF\Contracts\TextPreprocessorInterface;
use NextPDF\Contracts\TextPreprocessResult;
use NextPDF\Contracts\TextSegment;
use Psr\Log\LoggerInterface;

final class PatternRedactionPreprocessor implements TextPreprocessorInterface
{
    /**
     * @param non-empty-string $pattern A valid PCRE pattern for sensitive spans
     */
    public function __construct(
        private readonly string $pattern,
        private readonly LoggerInterface $logger,
    ) {}

    public function process(string $text): TextPreprocessResult
    {
        $result = \preg_replace($this->pattern, '[REDACTED]', $text);

        if ($result === null) {
            // Fail closed: never emit unredacted text on a pattern error.
            $this->logger->error('Redaction pattern failed; substituting empty text');

            return new TextPreprocessResult([new TextSegment('', redacted: true)]);
        }

        return new TextPreprocessResult([
            new TextSegment($result, redacted: $result !== $text),
        ]);
    }
}

  • No layout replacement. You cannot replace box layout, line breaking, or pagination through this contract. Plugging in a third-party layout engine is out of scope by design.
  • Rule enforcement. If you add \n, \r, or \t in process(), you corrupt layout and break stable output. The engine trusts you to follow this rule; it does not re-check your output for layout-affecting characters.
  • Reading order. If you reorder segments, you break tagged-PDF reading order and PDF/UA conformance.
  • One responsibility. The preprocessor declares a content substitution. Use lifecycle events to observe, and keep side effects out of process().
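
The reading-order point above can be made concrete with a sketch that splits text into segments strictly in source order, so nothing is ever reordered. This is an illustration under stated assumptions, not the NextPDF implementation: segmentInReadingOrder() is a hypothetical helper, and each segment is a plain array standing in for a segment object. Mapping these arrays onto TextSegment instances is left to you, since this page does not document the full TextSegment constructor.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not a NextPDF API): splits text around pattern
 * matches and returns the pieces in their original order.
 *
 * @param non-empty-string $pattern A valid PCRE pattern for sensitive spans
 * @return list<array{text: string, redacted: bool}>
 */
function segmentInReadingOrder(string $pattern, string $text, string $mask = '[REDACTED]'): array
{
    $found = \preg_match_all($pattern, $text, $matches, PREG_OFFSET_CAPTURE);

    if ($found === false) {
        // Fail closed: on a pattern error, emit no original text at all.
        return [['text' => '', 'redacted' => true]];
    }

    $segments = [];
    $cursor = 0; // Byte offset of the first character not yet emitted.

    foreach ($matches[0] as [$match, $offset]) {
        if ($match === '') {
            continue; // Ignore empty matches; they redact nothing.
        }
        if ($offset > $cursor) {
            // Untouched text between the previous match and this one.
            $segments[] = ['text' => \substr($text, $cursor, $offset - $cursor), 'redacted' => false];
        }
        $segments[] = ['text' => $mask, 'redacted' => true];
        $cursor = $offset + \strlen($match);
    }

    if ($cursor < \strlen($text)) {
        // Trailing text after the last match.
        $segments[] = ['text' => \substr($text, $cursor), 'redacted' => false];
    }

    return $segments;
}
```

Because the loop only moves forward through the byte offsets that preg_match_all() reports, the segments always appear in the same order as the source text.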

process() runs once per text run on the layout hot path. Keep memory use low. Compile patterns once in the constructor, not on each call. Content lifecycle events cost nothing when no listener is bound.
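
PHP has no explicit compile step for PCRE patterns; it caches compiled patterns internally. In practice, "compile once" therefore means building and validating the pattern string once, at construction. The sketch below shows one way to do that. ValidatedPattern is a hypothetical class, not part of NextPDF, and it assumes PHP 8.1 or later for readonly properties.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical value object (not a NextPDF API): validates a PCRE pattern
 * once, at construction, so process() never meets a malformed pattern.
 */
final class ValidatedPattern
{
    /** @var non-empty-string */
    public readonly string $pattern;

    public function __construct(string $pattern)
    {
        // preg_match() returns false when the pattern does not compile.
        // The @ operator suppresses the warning PHP raises in that case.
        if ($pattern === '' || @\preg_match($pattern, '') === false) {
            throw new \InvalidArgumentException('Invalid redaction pattern.');
        }

        $this->pattern = $pattern;
    }
}
```

A preprocessor could accept a ValidatedPattern in its constructor instead of a raw string. A bad pattern then fails at startup rather than on the layout hot path.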

Use TextPreprocessorInterface to remove sensitive content before it reaches the content stream, font subsets, or metadata. Because it runs before font subsetting and ToUnicode CMap generation, the glyphs of redacted text never enter the file. Treat a preprocessor failure as fail-closed: emit empty or masked text, never the original.

This page makes no normative signing or archival claims. The reading-order rule aligns the contract with tagged-PDF needs. The accessibility reference covers tag-level conformance.

NextPDF Pro provides production text-preprocessing strategies, including personally identifiable information (PII) redaction tuned for common document types. In Core, you write TextPreprocessorInterface yourself, or you use a verified paid-edition build through the same public contract.

The published glossary gives the canonical definitions of text preprocessor and extension point.