Validate conformance: in-process pre-check plus the external oracle
At a glance
Use this recipe to run NextPDF’s pure-PHP, in-process conformance validators
as a fast structural pre-check, then send the authoritative conformance
decision to an independent validator. In-process checks are necessary,
not sufficient: a clean result is a structural fact, not a conformance
verdict. The recipe uses examples/33-validate-conformance.php and its
tests/Cookbook/Php/ValidateConformanceRecipeTest.php harness.
Install
composer require nextpdf/core:^3
The in-process validators need no external toolchain. For the authoritative
gate step, you need an external validator on PATH. The example uses
veraPDF. You do not need a Pro or Enterprise package.
Conceptual overview
NextPDF includes in-process validators under
\NextPDF\Compliance\Validator. They verify specific normative
invariants without starting an external process:
- PdfRValidator runs ISO 23504-1 (PDF/R-1) §5/§6 byte-stream checks: the file-header allowlist, generation-0 objects, the §6.5.7 page-content operator allowlist (q/Q/cm/Do only), and the §6.4.3 Info-dict key allowlist. It returns a flat PdfRValidationFinding[]; an empty list means every gated §6 check passed.
- ArlingtonValidator runs the PDF Association’s machine-readable Arlington grammar in report-only mode. It never gates the build, and it records the pinned grammar commit SHA on every finding so audit consumers can correlate against a known upstream snapshot.
These checks are deliberately scoped. They catch drift between an emission contract and a spec, but they do not establish ISO conformance for a profile such as PDF/A-4 or PDF/UA-2. An independent validator makes that determination, and its verdict is the build gate (ISO 19005-4 §6.7.3 makes this explicit for PDF/A). The recipe keeps the boundary clear: it runs the pre-check in process, then prints and runs the external-oracle command that decides.
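To make the idea of a scoped structural check concrete, here is a toy sketch of an operator allowlist scan in the spirit of the §6.5.7 check. It is not NextPDF's implementation: the function name findUnexpectedTokens is hypothetical, and it assumes an already-decoded content stream whose operands are only numbers and names, which is far simpler than real PDF tokenization.

```php
<?php
declare(strict_types=1);

/**
 * Toy allowlist scan (hypothetical helper, not part of NextPDF).
 * Assumes a decoded content stream whose operands are numbers or /Names only.
 *
 * @return list<string> tokens that are neither operands nor allowed operators
 */
function findUnexpectedTokens(string $contentStream): array
{
    $allowedOperators = ['q', 'Q', 'cm', 'Do'];
    $tokens = preg_split('/\s+/', $contentStream, -1, PREG_SPLIT_NO_EMPTY);
    if ($tokens === false) {
        // Never turn a scan failure into a silent "no findings" result.
        throw new RuntimeException('Could not tokenize the content stream.');
    }

    $unexpected = [];
    foreach ($tokens as $token) {
        $isNumber = preg_match('/^[+-]?(\d+\.?\d*|\.\d+)$/', $token) === 1;
        $isName = str_starts_with($token, '/');
        if ($isNumber || $isName || in_array($token, $allowedOperators, true)) {
            continue;
        }
        $unexpected[] = $token;
    }

    return $unexpected;
}

// An image-only page yields no unexpected tokens; a text page yields several.
var_dump(findUnexpectedTokens('q 612 0 0 792 0 0 cm /Im0 Do Q') === []);
var_dump(findUnexpectedTokens('BT /F1 12 Tf (Hello) Tj ET'));
```

Even a perfect version of this scan would establish only that one invariant holds. It says nothing about fonts, metadata, color, or any other requirement of a full conformance profile, which is why the verdict belongs to the external validator.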
The diagram below shows the two-stage gate. One rule governs the flow: only the external oracle’s verdict may be reported as conformance.
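The two-stage flow can also be sketched as a small decision function. The enum GateOutcome and the function decideGate are hypothetical names used only for illustration; the point of the sketch is that no code path reports conformance without consulting the oracle.

```php
<?php
declare(strict_types=1);

/** Hypothetical outcome labels for the two-stage gate. */
enum GateOutcome: string
{
    case RejectedByPreCheck = 'rejected-by-pre-check';
    case RejectedByOracle = 'rejected-by-oracle';
    case ConformingPerOracle = 'conforming-per-oracle';
}

/**
 * @param int            $preCheckErrorCount number of blocking in-process findings
 * @param callable():int $runOracle          runs the external validator and returns its exit code
 */
function decideGate(int $preCheckErrorCount, callable $runOracle): GateOutcome
{
    if ($preCheckErrorCount > 0) {
        // Fail fast: the oracle is not even started.
        return GateOutcome::RejectedByPreCheck;
    }

    // A clean pre-check only earns the right to ask the oracle.
    return $runOracle() === 0
        ? GateOutcome::ConformingPerOracle
        : GateOutcome::RejectedByOracle;
}

// Stand-in oracle that always exits 0; a real one would shell out to the validator.
echo decideGate(0, static fn (): int => 0)->value, "\n";
```

Passing the oracle as a callable keeps the decision logic testable without an external binary on PATH.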
API surface
The API surface is generated from PHPDoc. Use these main entry points:
- \NextPDF\Compliance\Validator\PdfRValidator::validate(string $pdfBytes): list<PdfRValidationFinding>
- \NextPDF\Compliance\Validator\PdfRValidationFinding (readonly: clause, severity, message)
- \NextPDF\Compliance\Validator\ArlingtonValidator::validateReportOnly(string $pdfPath): list<ArlingtonFinding>
- \NextPDF\Core\Document::output(?string $filename, OutputDestination $dest): string (OutputDestination::String for raw bytes)
Code sample — Quick start
<?php
declare(strict_types=1);
require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Compliance\Validator\PdfRValidator;
use NextPDF\Contracts\OutputDestination;
use NextPDF\Core\Document;

$doc = Document::createStandalone();
$doc->addPage();
$doc->setFont('helvetica', '', 12);
$doc->cell(0, 10, 'Document under conformance review.', newLine: true);

$bytes = $doc->output(dest: OutputDestination::String);
$findings = (new PdfRValidator())->validate($bytes);

// A finding list is a structural fact, not a conformance verdict.
echo $findings === []
    ? "No in-process PDF/R-1 findings (necessary, not sufficient).\n"
    : count($findings) . " in-process finding(s); not a conformance verdict.\n";

Code sample — Production
In production, treat the in-process validator as a cheap gate that fails fast on obvious structural drift. Then run the external oracle as the authoritative conformance decision. Only the oracle’s verdict may be reported as conformance.

// $doc is the document built in the quick start; $out is the path the PDF is saved to.
$bytes = $doc->output(dest: OutputDestination::String);
$doc->save($out);

// 1. In-process pre-check: necessary, not sufficient.
$findings = (new PdfRValidator())->validate($bytes);
foreach ($findings as $finding) {
    fwrite(STDERR, sprintf("[%s] §%s — %s\n", $finding->severity, $finding->clause, $finding->message));
}

// 2. The authoritative gate: the external validator decides.
$exitCode = 0;
$report = [];
exec('verapdf --flavour 4 ' . escapeshellarg($out), $report, $exitCode);

if ($exitCode !== 0) {
    fwrite(STDERR, "veraPDF FAILED — not reported conforming\n");
    fwrite(STDERR, implode("\n", $report) . "\n");
    exit(1);
}

echo "veraPDF PASS — the validator reports the file conforming\n";

Run the example with php examples/33-validate-conformance.php. It builds
an ordinary PDF and prints the in-process findings. An ordinary PDF is
expected to produce PDF/R-1 findings; that outcome is the teaching point.
The example then prints the authoritative external-oracle command.
Edge cases & gotchas
- Necessary, not sufficient. An empty PdfRValidator finding list means the gated §6 checks passed, nothing more. It is not a PDF/A-4 or PDF/UA-2 conformance claim. Never report conformance from an in-process result alone.
- An ordinary PDF fails PDF/R-1 by design. PDF/R-1 is an image-only raster profile; an ordinary text PDF legitimately produces §6.5.7 and §6.4.3 findings. The example shows this on purpose to make the point: in-process output is a structural fact, not a verdict.
- Arlington is report-only. ArlingtonValidator::validateReportOnly() never throws and never gates. In grammar-only mode, it emits one info finding proving the pinned grammar SHA loaded; it returns an empty list when the grammar is not materialized. Do not build a pass/fail gate on it; it is a cross-check artifact.
- Bytes vs. file. PdfRValidator::validate() takes the raw byte string (OutputDestination::String); the external oracle needs a file path. Save the file with save() for the oracle step.
- Empty input. Passing an empty or header-less string to PdfRValidator::validate() returns a §6.2.2 error finding rather than throwing. Check the finding list; do not assume an exception.
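The bytes-versus-file gotcha can be illustrated with plain PHP. The recipe itself uses save(); the sketch below covers the case where only a byte string is at hand and a path-based tool must read it. The helper name writeBytesToTempFile is hypothetical, and the sample bytes are a placeholder rather than a valid PDF.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: persist in-memory PDF bytes so a path-based tool can read them.
 */
function writeBytesToTempFile(string $pdfBytes): string
{
    $path = tempnam(sys_get_temp_dir(), 'pdf');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }
    if (file_put_contents($path, $pdfBytes) !== strlen($pdfBytes)) {
        unlink($path);
        throw new RuntimeException('Short write while persisting PDF bytes.');
    }

    return $path;
}

$bytes = "%PDF-2.0\n%%EOF\n"; // placeholder bytes, not a valid PDF
$path = writeBytesToTempFile($bytes);
try {
    // Hand $path to the external validator here.
    echo filesize($path), " bytes written to a temporary file\n";
} finally {
    unlink($path); // do not leave document content behind in the temp directory
}
```

Removing the file in a finally block keeps document content from lingering in a shared temporary directory if the validator step throws.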
Performance
The in-process validators use single-pass regular-expression and byte scans over the PDF. They are fast and allocate little memory for typical documents, and they stay inside the 2000 ms / 128 MB budget. When present, the external oracle dominates wall time, but it runs out of process. The semantic reproducibility profile applies. The example’s value is its observable validation behavior, and the harness checks that behavior through a structural abstract syntax tree (AST) plus metadata comparison.
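One way to check a step against a time and memory budget like the one above is a small measuring wrapper. This is a minimal sketch, not the recipe's harness: the function name measure is hypothetical, and the workload is a stand-in byte scan rather than a NextPDF validator call.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical wrapper: time a callable and report the process peak memory.
 *
 * @param callable():mixed $task
 * @return array{result: mixed, elapsedMs: float, peakMb: float}
 */
function measure(callable $task): array
{
    $start = hrtime(true); // monotonic clock, nanoseconds
    $result = $task();
    $elapsedMs = (hrtime(true) - $start) / 1e6;

    return [
        'result' => $result,
        'elapsedMs' => $elapsedMs,
        // Peak for the whole process so far, not only for $task.
        'peakMb' => memory_get_peak_usage(true) / 1048576.0,
    ];
}

// Stand-in workload: a single-pass scan over a buffer of roughly 1 MB.
$buffer = str_repeat("q 1 0 0 1 0 0 cm /Im0 Do Q\n", 40_000);
$stats = measure(static fn (): int => substr_count($buffer, ' Do'));

$withinBudget = $stats['elapsedMs'] <= 2000.0 && $stats['peakMb'] <= 128.0;
printf(
    "%.2f ms, %.1f MB peak, within budget: %s\n",
    $stats['elapsedMs'],
    $stats['peakMb'],
    $withinBudget ? 'yes' : 'no'
);
```

Because the external oracle runs out of process, a wrapper like this captures its wall time but not its memory use.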
Security notes
Data Residency & PII Mitigations
The validators read the document bytes in process, and nothing leaves the process. The external oracle, however, receives the file. If you run a hosted validator, document content leaves your boundary. For sensitive content, prefer a local validator binary, or redact before validating.
Safe Telemetry & Log Scrubbing
Findings can quote object paths and operator fragments. The example writes findings to STDERR and a fixed progress line to STDOUT. For sensitive documents, keep finding logs out of shared sinks. Never log the raw PDF bytes.
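If some signal must still reach a shared log, one option is to log counts per clause and severity and drop the message text entirely. The sketch below is an assumption-laden illustration: FindingStandIn is a stand-in that mirrors the three fields listed under API surface (it is not the real NextPDF class), severity is assumed to be a string, and summarizeForSharedLog is a hypothetical helper.

```php
<?php
declare(strict_types=1);

/** Stand-in with the three documented fields; not the real NextPDF class. */
final class FindingStandIn
{
    public function __construct(
        public readonly string $clause,
        public readonly string $severity,
        public readonly string $message,
    ) {
    }
}

/**
 * Hypothetical scrubber: count findings per severity and clause.
 * The message text, which may quote document content, is never copied.
 *
 * @param list<FindingStandIn> $findings
 * @return array<string, int>
 */
function summarizeForSharedLog(array $findings): array
{
    $counts = [];
    foreach ($findings as $finding) {
        $key = sprintf('%s §%s', $finding->severity, $finding->clause);
        $counts[$key] = ($counts[$key] ?? 0) + 1;
    }
    ksort($counts);

    return $counts;
}

$findings = [
    new FindingStandIn('6.5.7', 'error', 'Operator "Tj" found in page content'),
    new FindingStandIn('6.5.7', 'error', 'Operator "Tf" found in page content'),
    new FindingStandIn('6.4.3', 'error', 'Unexpected Info key'),
];

foreach (summarizeForSharedLog($findings) as $key => $count) {
    fwrite(STDERR, sprintf("%s: %d finding(s)\n", $key, $count));
}
```

The full findings, including messages, can still go to a restricted sink for whoever needs to debug the document.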
Threat model
A clean in-process result is not an integrity or authenticity signal. A hostile producer can craft a file that passes the scoped in-process checks yet fails the full validator, or that is well-formed but misleading. Treat the in-process pass as a fast filter, never as trust.
FIPS-mode behavior
This recipe performs no cryptographic operation. Federal Information Processing Standards (FIPS) mode does not change its behavior. No signing, encryption, or digest of trust material occurs.
Conformance
| Statement | Spec | Clause | reference_id |
|---|---|---|---|
| PDF/R-1 page content uses only the q/Q/cm/Do operator allowlist. | ISO 23504-1 | §6.5.7 | |
| PDF/R-1 pages are image-only raster content. | ISO 23504-1 | §6.5.5 | |
| PDF/R-1 constrains the document information dictionary keys. | ISO 23504-1 | §6.4.4 | |
| The Arlington grammar is a machine-readable object-model cross-check. | Arlington PDF Model | grammar | |
| A validator, not the producer, decides conformance. | ISO 19005-4 | §6.7.3 | |
NextPDF’s in-process validators verify specific normative invariants. Support is not conformance; validation is not certification. A clean in-process result does not establish ISO conformance; an independent validator (for example veraPDF) makes that determination. Use its verdict as the build gate.