Inspect: PDF introspection and preflight
At a glance
The Inspect module reads an existing Portable Document Format (PDF) file and reports what it contains: a complexity score, font and image audits, conformance hints, and risk flags. A preflight policy turns the report into a pass/fail decision, so you can gate a document before it enters a pipeline.
Stability: experimental. Introspection is still evolving. The
InspectResult shape, the risk-flag set, and the optional accelerated inspection path may change between minor versions. Use it for diagnostics and gating; do not depend on its result shape for long-lived contracts yet.
Install
composer require nextpdf/core:^3
Conceptual overview
Use Inspector as the entry point. It implements InspectorInterface and exposes
one method: inspect(string $pdfData, InspectConfig $config = new InspectConfig()): InspectResult.
It is read-only: it parses a PDF and characterizes it; it does not modify the
document.
InspectResult is the structured report. It includes the complexity score,
audits, hints, and a set of RiskFlags. Use hasRisks() /
hasRisk(RiskFlag $flag) to branch on a specific risk instead of parsing free
text. ComplexityScore exposes a numeric score and a category() band.
FontAuditEntry and ImageAuditEntry describe embedded resources.
ComplianceHint flags likely conformance problems. InspectIssue records a
specific finding. InspectDepth sets inspection depth, and toSpectrumDepth()
maps that depth to the accelerated path when the Spectrum sidecar is available.
Inspection runs without the sidecar. The sidecar changes performance only, not
the contract. InspectResponseParser builds an InspectResult from a raw
response (for example, the accelerated path’s response) with an optional trace
ID.
PreflightPolicy is the decision layer. evaluate(InspectResult $result)
applies a configured policy to a result and returns the findings as an array; an
empty array means the document passed. The whole module is marked @since 2.2.0.
API surface
| Type | Key members | Role |
|---|---|---|
| Inspector | inspect(string $pdfData, InspectConfig $config): InspectResult | Read-only PDF inspector (@since 2.2.0) |
| InspectResult | hasRisks(), hasRisk(RiskFlag), score/audit accessors | Structured inspection report (@since 2.2.0) |
| ComplexityScore | category() | Numeric complexity score + band (@since 2.2.0) |
| FontAuditEntry / ImageAuditEntry | resource accessors | Embedded-resource audits (@since 2.2.0) |
| ComplianceHint / InspectIssue | finding accessors | Conformance hints and findings (@since 2.2.0) |
| InspectDepth (enum) | toSpectrumDepth() | Inspection depth → accelerated path (@since 2.2.0) |
| PreflightPolicy | evaluate(InspectResult): array, toArray() | Pass/fail preflight decision (@since 2.2.0) |
| InspectResponseParser | parse(array, InspectConfig, ?string $traceId): InspectResult | Builds a result from a raw response (@since 2.2.0) |
Run composer docs:generate-api-php -- --module=Inspect to generate the full PHPDoc
table.
Code sample: Quick start
Source: examples/34-inspect-layout-boxes.php demonstrates reading page
geometry. This example inspects an arbitrary PDF’s risk profile:
<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Inspect\Inspector;
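// file_get_contents() returns false on failure, so check before inspecting.
$pdfData = file_get_contents('/srv/in/incoming.pdf');
if ($pdfData === false) {
    throw new RuntimeException('Could not read /srv/in/incoming.pdf.');
}
$result = (new Inspector())->inspect($pdfData);
if ($result->hasRisks()) {
    echo "Complexity: {$result->complexityScore()->category()}; risks present.\n";
}
Code sample: Production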
Gate an incoming PDF through a preflight policy and reject it when the policy reports any finding.
<?php
declare(strict_types=1);
require_once __DIR__ . '/../vendor/autoload.php';
use NextPDF\Inspect\Inspector;
use NextPDF\Inspect\PreflightPolicy;
use Psr\Log\LoggerInterface;

final readonly class IngestPreflight
{
    public function __construct(
        private Inspector $inspector,
        private PreflightPolicy $policy,
        private LoggerInterface $logger,
    ) {}

    public function accept(string $pdfData): bool
    {
        $result = $this->inspector->inspect($pdfData);
        $verdict = $this->policy->evaluate($result);

        if ($verdict !== []) {
            $this->logger->warning('PDF rejected at preflight.', ['findings' => $verdict]);

            return false;
        }

        return true;
    }
}
Edge cases & gotchas
- inspect() is read-only. It never modifies or repairs the input; do not expect it to return a “fixed” document.
- hasRisk(RiskFlag) is the precise check. Branching on hasRisks() alone treats every risk identically; usually, you want a specific flag.
- InspectDepth controls cost. A deep inspection of a large PDF is significantly slower; use the shallowest depth that answers your question.
- The Spectrum-accelerated path changes performance, not the result contract. Code against InspectResult, not the accelerated response shape.
- PreflightPolicy::evaluate() returns the findings; it does not throw. An empty result is a pass; act on the return value.
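The difference between checking for any risk and checking for a specific flag can be sketched without the library. This is a minimal, self-contained illustration: DemoRiskFlag and routeFor() are hypothetical stand-ins, and the enum cases are invented for the example; they are not the real RiskFlag set.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for RiskFlag; the real enum's cases may differ.
enum DemoRiskFlag: string
{
    case EmbeddedScript = 'embedded_script';
    case MissingFonts = 'missing_fonts';
}

/**
 * Chooses a route from a list of flags.
 *
 * @param list<DemoRiskFlag> $flags
 */
function routeFor(array $flags): string
{
    // No flags at all: the equivalent of hasRisks() returning false.
    if ($flags === []) {
        return 'accept';
    }

    // A specific flag gets its own branch: the equivalent of hasRisk($flag).
    if (in_array(DemoRiskFlag::EmbeddedScript, $flags, true)) {
        return 'quarantine';
    }

    // Every other risk falls through to a generic route.
    return 'manual-review';
}
```

With the real API, the same branching would presumably call hasRisks() and hasRisk(RiskFlag $flag) on the InspectResult instead of inspecting a list.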
Performance
Inspection cost scales with document size and the chosen InspectDepth.
The performance_budget of 1500 ms wall-clock time / 64 MB peak memory describes
the reference workload. Shallow inspection is fast; a deep audit of a large PDF
can approach that budget. The Spectrum path offloads heavy parsing when
available.
The reproducibility profile is structural: a result can include a trace ID and
timing. Two runs can differ in those fields, while the findings remain stable for
the same input.
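One way to compare two runs under this profile is to drop the run-specific fields before comparing. The sketch below works on plain arrays and makes several assumptions: stableView() is a hypothetical helper, the key names traceId and timingMs are invented, and the sample reports are made-up data rather than real Inspect output.

```php
<?php
declare(strict_types=1);

/**
 * Returns the report without its run-specific fields.
 * The default key names are hypothetical, not the real result shape.
 *
 * @param array<string, mixed> $report
 * @param list<string> $volatileKeys
 * @return array<string, mixed>
 */
function stableView(array $report, array $volatileKeys = ['traceId', 'timingMs']): array
{
    // array_flip() turns the key names into array keys so they can be subtracted.
    return array_diff_key($report, array_flip($volatileKeys));
}

// Made-up sample data for two runs over the same input.
$runA = ['traceId' => 'a1', 'timingMs' => 412, 'risks' => ['encrypted'], 'score' => 37];
$runB = ['traceId' => 'b7', 'timingMs' => 388, 'risks' => ['encrypted'], 'score' => 37];

var_dump(stableView($runA) === stableView($runB)); // bool(true)
```

Comparing the stable view keeps regression checks from failing on trace IDs or timing alone.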
Security notes
Inspector::inspect() parses untrusted PDF bytes; that is its purpose, so
treat the input as hostile. Run inspection in a constrained worker for
user-supplied documents, and bound input size upstream. A deliberately complex
PDF is a denial-of-service vector regardless of depth. The result describes the
document but does not sanitize it; a “low risk” verdict is a heuristic, not a
safety guarantee. Treat extracted strings and metadata as untrusted. See the
engine threat model in /modules/core/security/.
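Bounding input size upstream can be as simple as a guard that runs before any bytes reach the inspector. The following is a minimal sketch under stated assumptions: isAcceptablePdfInput() is a hypothetical helper, and the 20 MiB ceiling is an illustrative value, not a NextPDF default.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical upstream guard: rejects empty or oversized input and input
 * that does not start with the PDF header, before any parser sees it.
 */
function isAcceptablePdfInput(string $pdfData, int $maxBytes = 20 * 1024 * 1024): bool
{
    // strlen() counts bytes, which is the unit that matters for a size bound.
    if ($pdfData === '' || strlen($pdfData) > $maxBytes) {
        return false;
    }

    // A conforming PDF file starts with the "%PDF-" header marker.
    return str_starts_with($pdfData, '%PDF-');
}
```

The header check is deliberately strict: some readers tolerate leading bytes before the header, so relax it if your sources need that. The guard only reduces exposure; it does not replace running inspection in a constrained worker.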
Conformance
This module reports conformance hints; it does not provide the authoritative
conformance verdict. ComplianceHint flags likely problems heuristically. For
PDF/A, PDF/UA, and International Organization for Standardization (ISO) 32000-2,
the authoritative conformance verdict comes from the reference validators driven
by /modules/core/cli/ and the oracle and golden suites described in
/modules/core/conformance/. Do not treat a clean Inspect result as
conformance certification.
See also
- Cli module: drives the authoritative external validators.
- Conformance overview: the authoritative verdict and golden suites.
- Accelerator module: the optional accelerated inspection path.
- Engine security model