
Compliance: PDF/R-1 validator, Arlington grammar, lifecycle tools

NextPDF\Compliance ships byte-stream validators and a grammar cross-check that read a finished Portable Document Format (PDF) file and report where it differs from a normative contract. A zero-finding result means only that the file did not diverge from the clauses the validator implements. It is not a blanket certificate.

```shell
composer require nextpdf/core:^3
```

The module has three parts.

PdfRValidator validates a candidate ISO 23504-1:2020 (PDF/R-1) byte stream. It operates on raw bytes, not on the writer’s internal state, so it catches drift between what the writer intends to emit and what the specification requires. It runs as the final check. The implemented clause set is the v5.1.0 cluster: §5 version-identification comment, §6.2.2/§6.2.3 header allowlist, §6.2.4 generation-number-0 requirement and object-stream prohibition, §6.5.7 content-stream operator allowlist (q, Q, cm, Do only), §6.6.1 image XObject key allowlist, §6.4.3 Info-dictionary key allowlist, and §6.3 Catalog key allowlist. §6.7 incremental updates and §6.8 encryption are explicitly out of scope for this initial cluster and are declared as such in claims.json. The validator does not stop at the first finding: it collects every divergence in one pass, so you see the full diff.
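To make the §6.5.7 rule concrete, here is a minimal PHP sketch of an operator-allowlist check over an already-decoded content stream. The function name disallowedOperators() is hypothetical and not part of NextPDF's API, and it uses a whitespace token scan rather than the regex walk the real validator performs. It assumes whitespace-separated tokens and no string literals, which a q/cm/Do image page satisfies. Like PdfRValidator, it collects every violation instead of stopping at the first.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch, not NextPDF's implementation: list every operator in a
 * decoded content stream that falls outside the PDF/R-1 §6.5.7 allowlist.
 * Assumes whitespace-separated tokens and no string literals.
 *
 * @return list<string> disallowed operators, in order of appearance
 */
function disallowedOperators(string $content): array
{
    $allowed = ['q' => true, 'Q' => true, 'cm' => true, 'Do' => true];
    // Bare keywords that are operands, not operators.
    $keywordOperands = ['true' => true, 'false' => true, 'null' => true];
    $violations = [];
    // Split on the six PDF whitespace bytes: NUL, HT, LF, FF, CR, SP.
    $tokens = preg_split('/[\x00\t\n\f\r ]+/', $content, -1, PREG_SPLIT_NO_EMPTY) ?: [];
    foreach ($tokens as $token) {
        // Operators are bare keywords; numbers (612, -0.5) and names (/Im1) are operands.
        $isKeyword = preg_match('/^(?:[A-Za-z][A-Za-z0-9*]*|\'|")$/', $token) === 1;
        if ($isKeyword && !isset($keywordOperands[$token]) && !isset($allowed[$token])) {
            $violations[] = $token; // keep scanning: collect all violations in one pass
        }
    }
    return $violations;
}
```

For example, `disallowedOperators('q 612 0 0 792 0 0 cm /Im1 Do Q')` returns an empty list, while `disallowedOperators('q 1 0 0 RG 0 0 m 10 10 l S Q')` returns `['RG', 'm', 'l', 'S']`.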

PdfRConformancePolicy is an immutable policy for the recommended-but-informative bands around PDF/R-1. The normative §6 floor is never configurable. The policy controls only the §A.5 implementation-limit recommendations, the §6.6.1 multi-strip discouragement, and the §A.6 Extensible Metadata Platform (XMP) requirement for downstream PDF/A re-classification.

ArlingtonValidator runs the upstream PDF Association Arlington PDF model in report-only mode. It is advisory throughout the current cycle: validateReportOnly() never throws. It falls back through three modes. When the reference checker binary is available, it parses structured findings. When only the pinned grammar is available, it emits one info finding that proves the grammar pin loaded. When the grammar is unavailable, it returns an empty list. A WaiverRegistry lets the orchestrator suppress known-acceptable disagreements while preserving the audit trail.
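The three-mode fallback can be pictured as a small decision function. This PHP sketch is hypothetical: arlingtonReportOnly(), the callable standing in for the reference-checker adapter, the array-shaped findings, and the 'grammar-loaded' rule id are illustrative assumptions, not NextPDF's API. It treats a checker error as no findings, matching the adapter behavior this page describes for timeouts and execution errors.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of the three-mode fallback; not NextPDF's code.
 * $checker stands in for the reference-checker adapter and returns
 * structured findings; $grammarSha is null when the grammar pin did not load.
 *
 * @param (callable(string): list<array{rule: string, severity: string}>)|null $checker
 * @return list<array{rule: string, severity: string, grammarSha: ?string}>
 */
function arlingtonReportOnly(string $pdfPath, ?callable $checker, ?string $grammarSha): array
{
    if ($checker !== null) {
        try {
            $findings = $checker($pdfPath);
        } catch (Throwable) {
            // Report-only: an execution error yields no findings, never an exception.
            return [];
        }
        // Mode 1: structured findings, each tagged with the grammar SHA for provenance.
        return array_map(
            fn (array $f): array => $f + ['grammarSha' => $grammarSha],
            $findings,
        );
    }
    if ($grammarSha !== null) {
        // Mode 2 (grammar only): a single info finding proving the pin loaded.
        return [['rule' => 'grammar-loaded', 'severity' => 'info', 'grammarSha' => $grammarSha]];
    }
    // Mode 3 (grammar unavailable): an empty list.
    return [];
}
```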

This module follows the same honesty rule as the Cascading Style Sheets (CSS) support matrix and the conformance module. A clause is Verified only when a passing test exists and the normative clause is cited. A clause that is implemented but lacks a dedicated passing fixture is Claimed. Out-of-scope clauses are stated explicitly rather than left ambiguous. A zero-finding PdfRValidator result asserts only the clauses it checks; it makes no claim about §6.7 or §6.8, which it does not implement.
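The honesty rule reduces to a small decision function. This PHP sketch is hypothetical (clauseStatus() is not a NextPDF API); the labels mirror the status column used on this page. It treats any in-scope clause that is not Verified as Claimed, which is an assumption for the case the rule does not spell out (a passing test without a cited clause).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical encoding of the honesty rule; not an API exposed by NextPDF.
 */
function clauseStatus(bool $inScope, bool $hasPassingTest, bool $clauseCited): string
{
    if (!$inScope) {
        // Declared out of scope explicitly, never left ambiguous.
        return 'Explicit non-coverage';
    }
    if ($hasPassingTest && $clauseCited) {
        return 'Verified';
    }
    // Implemented, but without a dedicated passing fixture (or citation).
    return 'Claimed';
}
```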

| Type | Kind | Key members |
|---|---|---|
| `NextPDF\Compliance\Validator\PdfRValidator` | `final class` | `validate(string $pdfBytes): list<PdfRValidationFinding>` |
| `NextPDF\Compliance\Validator\PdfRValidationFinding` | `final readonly class` | `string $clause`, `'error'\|'warning'\|'info' $severity`, `string $message` |
| `NextPDF\Compliance\Profile\PdfRConformancePolicy` | `final readonly class` | `__construct(bool $enforceA5ImplementationLimits = true, bool $rejectMultiStripPages = false, bool $requireXmpForA6Compatibility = false)`; `lax()`, `strictArchival()` |
| `NextPDF\Compliance\Validator\ArlingtonValidator` | `final class` | `validateReportOnly(string $pdfPath): list<ArlingtonFinding>` |
| `NextPDF\Compliance\Validator\WaiverRegistry` | `final class` | `isWaived(string $validator, string $ruleId, string $scopeKey): bool` |
```php
<?php
declare(strict_types=1);

use NextPDF\Compliance\Validator\PdfRValidator;

require __DIR__ . '/vendor/autoload.php';

$bytes = file_get_contents('candidate.pdf');
if ($bytes === false) {
    // file_get_contents() returns false on failure; validate() requires a string.
    fwrite(STDERR, "Cannot read candidate.pdf\n");
    exit(1);
}

$validator = new PdfRValidator();
$findings = $validator->validate($bytes);

if ($findings === []) {
    // Zero divergences from the clauses PdfRValidator implements.
    // This is NOT a PDF/R-1 certificate: §6.7 and §6.8 are not checked.
    echo "No PDF/R-1 §6 divergences detected (implemented clause set).\n";
} else {
    foreach ($findings as $f) {
        echo "[{$f->severity}] §{$f->clause}: {$f->message}\n";
    }
}
```
```php
<?php
declare(strict_types=1);

use NextPDF\Compliance\Validator\ArlingtonGrammarLoader;
use NextPDF\Compliance\Validator\ArlingtonValidator;
use NextPDF\Compliance\Validator\WaiverRegistry;

require __DIR__ . '/vendor/autoload.php';

$validator = new ArlingtonValidator(
    waivers: new WaiverRegistry(/* loaded waiver entries */),
    grammar: new ArlingtonGrammarLoader(/* pinned submodule path */),
    adapter: null, // grammar-only mode when the reference checker is absent
);

// Advisory by contract: never throws on findings.
$findings = $validator->validateReportOnly('artifact.pdf');

foreach ($findings as $finding) {
    // Each finding pins the Arlington grammar commit SHA for provenance.
    // logger() stands for your application's logging helper (for example, a PSR-3 logger).
    logger()->info('arlington', [
        'rule' => $finding->ruleId,
        'severity' => $finding->severity,
        'grammarSha' => $finding->grammarSha,
    ]);
}
```
  • PdfRValidator is regex-based, not a full parser. It targets the deterministic output of NextPDF\Writer\PdfRWriter. Use it as a drift detector for that writer, not as a general-purpose PDF validator.
  • Zero findings ≠ full PDF/R-1 conformance. §6.7 (incremental updates) and §6.8 (encryption) are not implemented in the v5.1.0 cluster and are declared out of scope in claims.json. Treat a clean result as “no divergence on the implemented clause set”, and nothing more.
  • Arlington is advisory. In the current cycle, validateReportOnly() never fails the build. Continuous integration (CI) consumes the artifact but does not gate on it.
  • PDF/A International Color Consortium (ICC) validation is not here. The ISO 19005-4:2020 §6.2.2 OutputIntent ICC validation lives in the Enterprise PdfAManager (nextpdf/pro), not in Core’s Compliance module. Core’s PDF/A surface is the ConformanceMode discriminator only.
  • Waivers preserve the audit trail. A waived rule is suppressed from the finding list, but the waiver entry remains the record of why.
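The waiver behavior can be sketched as a partition: waived findings leave the report but are kept alongside their justification. This PHP sketch is hypothetical; applyWaivers() and the array shapes are illustrative assumptions, not WaiverRegistry's internals. It keys waivers on the same (validator, ruleId, scopeKey) triple that WaiverRegistry::isWaived() takes.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch of waiver handling; the array shapes are assumptions.
 *
 * @param list<array{rule: string, scope: string}> $findings
 * @param array<string, string> $waivers map of "validator|rule|scope" => reason
 * @return array{reported: list<array>, waived: list<array>}
 */
function applyWaivers(string $validator, array $findings, array $waivers): array
{
    $reported = [];
    $waived = [];
    foreach ($findings as $finding) {
        $key = $validator . '|' . $finding['rule'] . '|' . $finding['scope'];
        if (isset($waivers[$key])) {
            // Suppressed from the report, but retained with its justification.
            $waived[] = $finding + ['reason' => $waivers[$key]];
        } else {
            $reported[] = $finding;
        }
    }
    return ['reported' => $reported, 'waived' => $waived];
}
```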

PdfRValidator::validate() is a single linear pass with bounded regex walks over the byte stream. Cost scales with document size and stays well inside the module budget. In grammar-only mode, ArlingtonValidator is O(grammar-rule-count) for the load-proof finding. The reference-checker path runs as a subprocess and is bounded by the upstream tool, not by NextPDF. It is an out-of-band CI step.

These validators read untrusted PDF bytes. PdfRValidator strips parenthesized and hex literals before key extraction, so a crafted Creator string cannot inject a false /Name key (ISO 32000-1:2008 §7.3.4.2 escape handling). The Arlington adapter runs the upstream checker as a bounded subprocess. It treats a timeout or execution error as “no findings” rather than trusting partial output. See the project threat model for the PDF-parsing attack surface.
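A minimal PHP sketch of the stripping step, assuming the input is a dictionary fragment without comments or stream data. stripStringLiterals() and extractNames() are hypothetical names, not NextPDF's API. The scanner follows the ISO 32000-1 §7.3.4.2 rules that parentheses nest inside a literal string and a backslash escapes the following byte, and it keeps `<<` dictionary openers distinct from hex strings.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical sketch, not NextPDF's implementation: blank out literal
 * strings "(...)" and hex strings "<...>" so name-like text inside them
 * cannot be mistaken for a dictionary key.
 */
function stripStringLiterals(string $bytes): string
{
    $out = '';
    $len = strlen($bytes);
    $i = 0;
    while ($i < $len) {
        $c = $bytes[$i];
        if ($c === '(') {
            // Literal string: parentheses nest, and a backslash escapes the next byte.
            $depth = 0;
            while ($i < $len) {
                $ch = $bytes[$i];
                if ($ch === '\\') {
                    $i += 2; // skip the backslash and the byte it escapes
                    continue;
                }
                $i++;
                if ($ch === '(') {
                    $depth++;
                } elseif ($ch === ')' && --$depth === 0) {
                    break;
                }
            }
            $out .= '()';
        } elseif ($c === '<' && ($bytes[$i + 1] ?? '') === '<') {
            $out .= '<<'; // dictionary opener, not a hex string
            $i += 2;
        } elseif ($c === '<') {
            // Hex string: everything up to the next '>'.
            $end = strpos($bytes, '>', $i + 1);
            $i = $end === false ? $len : $end + 1;
            $out .= '<>';
        } else {
            $out .= $c;
            $i++;
        }
    }
    return $out;
}

/** @return list<string> name tokens (without the slash) found outside strings */
function extractNames(string $bytes): array
{
    preg_match_all('~/([^\x00\t\n\f\r ()<>\[\]{}/%]+)~', stripStringLiterals($bytes), $m);
    return $m[1];
}
```

For example, `extractNames('<< /Creator (x /Evil) >>')` returns only `['Creator']`: the `/Evil` text inside the Creator string never reaches key extraction.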

| Standard | Clause | What the Compliance module does | Status |
|---|---|---|---|
| ISO 23504-1:2020 (PDF/R-1) | §6.5.7 | PdfRValidator enforces the {q, Q, cm, Do} content-stream operator allowlist | Verified (unit, standards-profile, and integration tests pass) |
| ISO 23504-1:2020 (PDF/R-1) | §6.4.3 | PdfRValidator enforces the Info-dictionary key allowlist | Verified (test-backed) |
| ISO 23504-1:2020 (PDF/R-1) | §6.7, §6.8 | Not implemented in the v5.1.0 cluster | Explicit non-coverage (declared in claims.json) |
| ISO 32000-2:2020 (PDF 2.0) | §7.5.2 | Catalog-key allowlist walk | Claimed (regex walk; structural) |
| ISO 19005-4:2020 (PDF/A-4) | §6.7.3 | Identification-schema awareness via the Conformance module | Cross-reference (see /specifications/pdfa4/) |

Support is not conformance. A PdfRValidator run that returns no findings proves only that the input did not diverge from the §6 clauses that the validator implements. It does not assert that the file is a conforming PDF/R-1 file: §6.7 and §6.8 are not checked. The Arlington cross-check is advisory and never asserts conformance. For PDF/A-4, veraPDF is the authoritative validator and runs out of band. See the conformance module for the veraPDF oracle and its opt-in gating.

Citations are paraphrased from the NextPDF compliance corpus. The full 64-character reference_id digests are recorded in the page front-matter and in _normative-evidence-conf.md.