
Validate conformance: in-process pre-check plus the external oracle

Use this recipe to run NextPDF’s pure-PHP, in-process conformance validators as a fast structural pre-check, then leave the authoritative conformance decision to an independent validator. In-process checks are necessary, not sufficient: a clean result is a structural fact, not a conformance verdict. The recipe uses examples/33-validate-conformance.php and its tests/Cookbook/Php/ValidateConformanceRecipeTest.php harness.

composer require nextpdf/core:^3

The in-process validators need no external toolchain. For the authoritative gate step, you need an external validator on PATH. The example uses veraPDF. You do not need a Pro or Enterprise package.

NextPDF includes in-process validators under \NextPDF\Compliance\Validator. They verify specific normative invariants without starting an external process:

  • PdfRValidator — runs ISO 23504-1 (PDF/R-1) §5/§6 byte-stream checks: the file-header allowlist, generation-0 objects, the §6.5.7 page-content operator allowlist (q/Q/cm/Do only), and the §6.4.3 Info-dict key allowlist. It returns a flat PdfRValidationFinding[]; an empty list means every gated §6 check passed.
  • ArlingtonValidator — runs the PDF Association’s machine-readable Arlington grammar in report-only mode. It never gates the build, and it records the pinned grammar commit SHA on every finding so audit consumers can correlate against a known upstream snapshot.

These checks are deliberately scoped. They catch drift between an emission contract and a spec, but they do not establish ISO conformance for a profile such as PDF/A-4 or PDF/UA-2. An independent validator makes that determination, and its verdict is the build gate (ISO 19005-4 §6.7.3 makes this explicit for PDF/A). The recipe keeps the boundary clear: it runs the pre-check in process, then prints and runs the external-oracle command that decides.

The diagram below shows the two-stage gate. One rule governs the flow: only the external oracle’s verdict may be reported as conformance.

[Diagram: two-stage conformance gate]

  • Produced PDF bytes → in-process pre-check (PdfRValidator / Arlington) → structural drift?
  • Findings → fail fast (cheap reject).
  • Clean → NOT a conformance verdict (necessary, not sufficient; never report as conformance) → independent external validator (the authoritative oracle) → oracle verdict.
  • Pass → may report the file conforming.
  • Fail → not conforming; do not ship.
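The two-stage rule can be sketched as a small decision function. This is a minimal sketch, not part of NextPDF: the GateOutcome enum and decideGate() are hypothetical names, and the sketch assumes that the pre-check targets the profile you actually emit and that the external validator signals a pass with exit code 0.

```php
<?php
declare(strict_types=1);

// Hypothetical names for illustration; none of these are part of NextPDF.
enum GateOutcome: string
{
    case RejectedByPreCheck = 'rejected-by-pre-check';
    case Undecided = 'undecided';
    case NotConforming = 'not-conforming';
    case Conforming = 'conforming';
}

/**
 * @param int      $preCheckFindingCount number of in-process findings
 * @param int|null $oracleExitCode       external validator exit code, or null if it has not run
 */
function decideGate(int $preCheckFindingCount, ?int $oracleExitCode): GateOutcome
{
    // Stage 1: any finding is a cheap reject.
    if ($preCheckFindingCount > 0) {
        return GateOutcome::RejectedByPreCheck;
    }
    // A clean pre-check alone is never a verdict.
    if ($oracleExitCode === null) {
        return GateOutcome::Undecided;
    }
    // Stage 2: only the oracle decides (assumption: exit code 0 means pass).
    return $oracleExitCode === 0 ? GateOutcome::Conforming : GateOutcome::NotConforming;
}

echo decideGate(0, null)->value, "\n"; // prints "undecided"
```

The Undecided case is the point of the design: a clean in-process result with no oracle run has no path to Conforming.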

The API surface is generated from PHPDoc. Use these main entry points:

  • \NextPDF\Compliance\Validator\PdfRValidator::validate(string $pdfBytes): list<PdfRValidationFinding>
  • \NextPDF\Compliance\Validator\PdfRValidationFinding (readonly: clause, severity, message)
  • \NextPDF\Compliance\Validator\ArlingtonValidator::validateReportOnly(string $pdfPath): list<ArlingtonFinding>
  • \NextPDF\Core\Document::output(?string $filename, OutputDestination $dest): string (OutputDestination::String for raw bytes)
<?php
declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Compliance\Validator\PdfRValidator;
use NextPDF\Contracts\OutputDestination;
use NextPDF\Core\Document;

$doc = Document::createStandalone();
$doc->addPage();
$doc->setFont('helvetica', '', 12);
$doc->cell(0, 10, 'Document under conformance review.', newLine: true);

$bytes = $doc->output(dest: OutputDestination::String);
$findings = (new PdfRValidator())->validate($bytes);

// A finding list is a structural fact, not a conformance verdict.
echo $findings === []
    ? "No in-process PDF/R-1 findings (necessary, not sufficient).\n"
    : count($findings) . " in-process finding(s); not a conformance verdict.\n";
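To fail fast on the finding list, it can help to count findings by severity first. The sketch below is self-contained and does not call NextPDF: FindingStub is a hypothetical stand-in that mirrors the three documented fields (clause, severity, message), the string property types and the 'error' / 'info' severity labels are assumptions, and the sample messages are invented for illustration.

```php
<?php
declare(strict_types=1);

// Stand-in with the three documented fields; the property types are an assumption.
final class FindingStub
{
    public function __construct(
        public readonly string $clause,
        public readonly string $severity,
        public readonly string $message,
    ) {
    }
}

/**
 * @param list<FindingStub> $findings
 * @return array<string, int> finding count keyed by severity, sorted by key
 */
function countBySeverity(array $findings): array
{
    $counts = [];
    foreach ($findings as $finding) {
        $counts[$finding->severity] = ($counts[$finding->severity] ?? 0) + 1;
    }
    ksort($counts);

    return $counts;
}

// Illustrative sample data, not real validator output.
$sample = [
    new FindingStub('6.5.7', 'error', 'operator outside the allowlist'),
    new FindingStub('6.4.3', 'error', 'Info key outside the allowlist'),
    new FindingStub('grammar', 'info', 'pinned grammar loaded'),
];

$counts = countBySeverity($sample);
$shouldFailFast = ($counts['error'] ?? 0) > 0; // still not a conformance verdict either way
```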

In production, treat the in-process validator as a cheap gate that fails fast on obvious structural drift. Then run the external oracle as the authoritative conformance decision. Only the oracle’s verdict may be reported as conformance.

examples/33-validate-conformance.php (gate core)
// Excerpt: $doc and $out (the output path) are set up earlier in the example.
$bytes = $doc->output(dest: OutputDestination::String);
$doc->save($out);

// 1. In-process pre-check: necessary, not sufficient.
$findings = (new PdfRValidator())->validate($bytes);
foreach ($findings as $finding) {
    fwrite(STDERR, sprintf("[%s] §%s — %s\n",
        $finding->severity, $finding->clause, $finding->message));
}

// 2. The authoritative gate: the external validator decides.
$exitCode = 0;
$report = [];
exec('verapdf --flavour 4 ' . escapeshellarg($out), $report, $exitCode);
if ($exitCode !== 0) {
    fwrite(STDERR, "veraPDF FAILED — not reported conforming\n");
    fwrite(STDERR, implode("\n", $report) . "\n");
    exit(1);
}
echo "veraPDF PASS — the validator reports the file conforming\n";
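The gate above treats every non-zero exit code as a failed validation, which also covers the case where the validator is missing from PATH. A minimal sketch of separating those cases follows. Both helpers are hypothetical and not part of NextPDF or veraPDF; the 126/127 codes are the POSIX shell convention for "not executable" and "command not found", and the validator's own non-zero exit codes are not modeled beyond "not a pass".

```php
<?php
declare(strict_types=1);

// Hypothetical helpers for illustration; not part of NextPDF or veraPDF.

/** Builds the validator command line with every variable part shell-escaped. */
function buildOracleCommand(string $flavour, string $pdfPath, string $binary = 'verapdf'): string
{
    return sprintf(
        '%s --flavour %s %s',
        escapeshellcmd($binary),
        escapeshellarg($flavour),
        escapeshellarg($pdfPath),
    );
}

/** Maps a shell exit code to 'pass', 'unavailable', or 'fail'. */
function classifyOracleExit(int $exitCode): string
{
    if ($exitCode === 0) {
        return 'pass';
    }
    // POSIX shells report 127 for "command not found" and 126 for "not executable".
    if ($exitCode === 126 || $exitCode === 127) {
        return 'unavailable';
    }

    return 'fail';
}

echo classifyOracleExit(127), "\n"; // prints "unavailable"
```

Either non-pass outcome still blocks the build; the distinction only makes the failure message more useful, since "the oracle did not run" is a tooling problem, not a statement about the file.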

Run the example with php examples/33-validate-conformance.php. It builds an ordinary PDF and prints the in-process findings. An ordinary PDF is expected to produce PDF/R-1 findings; that outcome is the teaching point. The example then prints the authoritative external-oracle command.

  • Necessary, not sufficient. An empty PdfRValidator finding list means the gated §6 checks passed, nothing more. It is not a PDF/A-4 or PDF/UA-2 conformance claim. Never report conformance from an in-process result alone.
  • An ordinary PDF fails PDF/R-1 by design. PDF/R-1 is an image-only raster profile; an ordinary text PDF legitimately produces §6.5.7 and §6.4.3 findings. The example shows this on purpose to make the point: in-process output is a structural fact, not a verdict.
  • Arlington is report-only. ArlingtonValidator::validateReportOnly() never throws and never gates. In grammar-only mode, it emits one info finding proving the pinned grammar SHA loaded; it returns an empty list when the grammar is not materialized. Do not build a pass/fail gate on it — it is a cross-check artifact.
  • Bytes vs. file. PdfRValidator::validate() takes the raw byte string (OutputDestination::String); the external oracle needs a file path. Save the file with save() for the oracle step.
  • Empty input. Passing an empty or header-less string to PdfRValidator::validate() returns a §6.2.2 error finding rather than throwing. Check the finding list; do not assume an exception.
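For the bytes-versus-file point, the recipe itself uses save(). When only the byte string is at hand, a temporary file is one way to give a path-based validator something to read. The helper below is a hypothetical sketch in plain PHP, not a NextPDF API; the callback is where the external validator would be invoked.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper, not part of NextPDF: writes $bytes to a temporary file,
 * passes the path to $useFile, and removes the file afterwards.
 */
function withTemporaryPdf(string $bytes, callable $useFile): mixed
{
    $path = tempnam(sys_get_temp_dir(), 'gate');
    if ($path === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }

    try {
        if (file_put_contents($path, $bytes) !== strlen($bytes)) {
            throw new RuntimeException('Could not write the PDF bytes.');
        }

        return $useFile($path);
    } finally {
        // Remove the file even when the callback throws.
        @unlink($path);
    }
}

// Usage: the callback receives the path; here it only reads the size back.
$size = withTemporaryPdf("%PDF-2.0\n", static fn (string $path): int|false => filesize($path));
```

The finally block keeps document content from lingering in the temp directory after a failed validation run.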

The in-process validators use single-pass regular-expression and byte scans over the PDF. They are fast and allocate little for typical documents, and they stay inside the 2000 ms / 128 MB budget. When present, the external oracle dominates wall time, but it runs out of process. The semantic reproducibility profile applies. The example’s value is its observable validation behavior, and the harness checks that behavior through a structural abstract syntax tree (AST) plus metadata comparison.
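A budget like this can be checked with a small timing wrapper. The sketch below is an assumption-laden illustration, not the recipe harness: measureRun() and withinBudget() are hypothetical names, 128 MB is read as 128 × 1024 × 1024 bytes, and memory_get_peak_usage() reports the process-wide peak, so it includes allocations made before the measured call.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: runs $work and reports elapsed wall time plus the
 * process-wide peak memory (which includes allocations made before $work).
 *
 * @return array{result: mixed, elapsedMs: float, peakBytes: int}
 */
function measureRun(callable $work): array
{
    $startNs = hrtime(true);
    $result = $work();
    $elapsedMs = (hrtime(true) - $startNs) / 1_000_000;

    return [
        'result' => $result,
        'elapsedMs' => (float) $elapsedMs,
        'peakBytes' => memory_get_peak_usage(true),
    ];
}

/** Assumption: the 128 MB limit is interpreted as 128 * 1024 * 1024 bytes. */
function withinBudget(
    float $elapsedMs,
    int $peakBytes,
    float $maxMs = 2000.0,
    int $maxBytes = 128 * 1024 * 1024,
): bool {
    return $elapsedMs <= $maxMs && $peakBytes <= $maxBytes;
}

// Usage: wrap the validator call in the closure; a trivial workload stands in here.
$run = measureRun(static fn (): int => strlen(str_repeat('x', 1024)));
echo withinBudget($run['elapsedMs'], $run['peakBytes']) ? "within budget\n" : "over budget\n";
```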

The validators read the document bytes in process, and nothing leaves the process. The external oracle, however, receives the file. If you run a hosted validator, document content leaves your boundary. For sensitive content, prefer a local validator binary, or redact before validating.

Findings can quote object paths and operator fragments. The example writes findings to STDERR and a fixed progress line to STDOUT. For sensitive documents, keep finding logs out of shared sinks. Never log the raw PDF bytes.
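One way to keep quoted document content out of a shared sink is to log only the clause and severity there and keep the full message in a restricted log. The helper below is a hypothetical sketch, not a NextPDF API; it assumes both fields are available as strings.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: formats a finding for a shared log sink using only the
 * clause and severity, leaving out the message (which may quote document content).
 */
function sharedSinkLine(string $clause, string $severity): string
{
    // Keep only characters expected in a clause number or a severity label.
    $safeClause = preg_replace('/[^0-9A-Za-z.]/', '', $clause) ?? '';
    $safeSeverity = preg_replace('/[^a-z]/', '', strtolower($severity)) ?? '';

    return sprintf('[%s] §%s', $safeSeverity, $safeClause);
}

echo sharedSinkLine('6.5.7', 'error'), "\n"; // prints "[error] §6.5.7"
```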

A clean in-process result is not an integrity or authenticity signal. A hostile producer can craft a file that passes the scoped in-process checks yet fails the full validator, or that is well-formed but misleading. Treat the in-process pass as a fast filter, never as trust.

This recipe performs no cryptographic operation. Federal Information Processing Standards (FIPS) mode does not change its behavior. No signing, encryption, or digest of trust material occurs.

| Statement | Spec | Clause | reference_id |
| --- | --- | --- | --- |
| PDF/R-1 page content uses only the q/Q/cm/Do operator allowlist. | ISO 23504-1 | §6.5.7 | |
| PDF/R-1 pages are image-only raster content. | ISO 23504-1 | §6.5.5 | |
| PDF/R-1 constrains the document information dictionary keys. | ISO 23504-1 | §6.4.3 | |
| The Arlington grammar is a machine-readable object-model cross-check. | Arlington PDF Model | grammar | |
| A validator, not the producer, decides conformance. | ISO 19005-4 | §6.7.3 | |

NextPDF’s in-process validators verify specific normative invariants. Support is not conformance; validation is not certification. A clean in-process result does not establish ISO conformance; an independent validator (for example veraPDF) makes that determination. Use its verdict as the build gate.