Inspect: PDF introspection and preflight

At a glance

The Inspect module reads an existing Portable Document Format (PDF) file and reports what it contains: a complexity score, font and image audits, conformance hints, and risk flags. A preflight policy turns the report into a pass/fail decision, so you can gate a document before it enters a pipeline.

Stability: experimental. Introspection is still evolving. The InspectResult shape, the risk-flag set, and the optional accelerated inspection path may change between minor versions. Use it for diagnostics and gating; do not depend on its result shape for long-lived contracts yet.

Install

composer require nextpdf/core:^3

Conceptual overview

Use Inspector as the entry point. It implements InspectorInterface and exposes one method: inspect(string $pdfData, InspectConfig $config = new InspectConfig()): InspectResult. It is read-only: it parses a PDF and characterizes it; it does not modify the document.

InspectResult is the structured report. It includes the complexity score, audits, hints, and a set of RiskFlags. Use hasRisks() / hasRisk(RiskFlag $flag) to branch on a specific risk instead of parsing free text. ComplexityScore exposes a numeric score and a category() band. FontAuditEntry and ImageAuditEntry describe embedded resources. ComplianceHint flags likely conformance problems. InspectIssue records a specific finding.

InspectDepth sets inspection depth, and toSpectrumDepth() maps that depth to the accelerated path when the Spectrum sidecar is available. Inspection runs without the sidecar. The sidecar changes performance only, not the contract. InspectResponseParser builds an InspectResult from a raw response (for example, the accelerated path’s response) with an optional trace ID.

PreflightPolicy is the decision layer. evaluate(InspectResult $result) applies the configured policy to a result and returns the findings as an array; an empty array means the document passed. The whole module is @since 2.2.0.

API surface

Type	Key members	Role
`Inspector`	`inspect(string $pdfData, InspectConfig $config): InspectResult`	Read-only PDF inspector (`@since 2.2.0`)
`InspectResult`	`hasRisks()`, `hasRisk(RiskFlag)`, score/audit accessors	Structured inspection report (`@since 2.2.0`)
`ComplexityScore`	`category()`	Numeric complexity score + band (`@since 2.2.0`)
`FontAuditEntry` / `ImageAuditEntry`	resource accessors	Embedded-resource audits (`@since 2.2.0`)
`ComplianceHint` / `InspectIssue`	finding accessors	Conformance hints and findings (`@since 2.2.0`)
`InspectDepth` (enum)	`toSpectrumDepth()`	Inspection depth → accelerated path (`@since 2.2.0`)
`PreflightPolicy`	`evaluate(InspectResult): array`, `toArray()`	Pass/fail preflight decision (`@since 2.2.0`)
`InspectResponseParser`	`parse(array, InspectConfig, ?string $traceId): InspectResult`	Builds a result from a raw response (`@since 2.2.0`)

Run composer docs:generate-api-php -- --module=Inspect to generate the full PHPDoc table.

Code sample — Quick start

Source: examples/34-inspect-layout-boxes.php demonstrates reading page geometry. The example here instead inspects the risk profile of an arbitrary PDF:

<?php

declare(strict_types=1);

require_once __DIR__ . '/../vendor/autoload.php';

use NextPDF\Inspect\Inspector;

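// file_get_contents() returns false on failure; under strict_types,
// passing false to inspect() would raise a TypeError, so check first.
$pdfData = file_get_contents('/srv/in/incoming.pdf');
if ($pdfData === false) {
    fwrite(STDERR, "Could not read the input PDF.\n");
    exit(1);
}

$result = (new Inspector())->inspect($pdfData);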

if ($result->hasRisks()) {
    echo "Complexity: {$result->complexityScore()->category()}; risks present.\n";
}

Code sample — Production

Gate an incoming PDF through a preflight policy and reject it when the policy reports any finding.

<?php

declare(strict_types=1);

require_once __DIR__ . '/../vendor/autoload.php';

use NextPDF\Inspect\Inspector;
use NextPDF\Inspect\PreflightPolicy;
use Psr\Log\LoggerInterface;

final readonly class IngestPreflight
{
    public function __construct(
        private Inspector $inspector,
        private PreflightPolicy $policy,
        private LoggerInterface $logger,
    ) {}

    public function accept(string $pdfData): bool
    {
        $result = $this->inspector->inspect($pdfData);
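        // evaluate() returns the policy findings and does not throw.
        $verdict = $this->policy->evaluate($result);

        // An empty array is a pass; anything else is a rejection.
        if ($verdict !== []) {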
            $this->logger->warning('PDF rejected at preflight.', ['findings' => $verdict]);

            return false;
        }

        return true;
    }
}

Edge cases & gotchas

inspect() is read-only. It never modifies or repairs the input; do not expect it to return a “fixed” document.
hasRisk(RiskFlag) is the precise check. Branching on hasRisks() alone treats every risk identically; usually, you want a specific flag.
InspectDepth controls cost. A deep inspection of a large PDF is significantly slower than a shallow one; use the shallowest depth that answers your question.
The Spectrum-accelerated path changes performance, not the result contract. Code against InspectResult, not the accelerated response shape.
PreflightPolicy::evaluate() returns the findings; it does not throw. An empty result is a pass; act on the return value.
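Because evaluate() reports findings through its return value rather than by throwing, a caller that prefers fail-fast behavior has to convert the result itself. The following is a minimal sketch of that conversion. The helper name requireCleanPreflight() is hypothetical and not part of the module, and the shape of an individual finding is not assumed: the array is serialized as-is for the message.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the Inspect module): turns a non-empty
 * findings array into an exception for callers that prefer fail-fast code.
 *
 * @param array<array-key, mixed> $findings Return value of PreflightPolicy::evaluate().
 *
 * @throws RuntimeException When at least one finding is present.
 */
function requireCleanPreflight(array $findings): void
{
    if ($findings === []) {
        return; // Empty array: the policy passed.
    }

    // The structure of a finding is not assumed here; it is serialized as-is.
    $detail = json_encode($findings, JSON_PARTIAL_OUTPUT_ON_ERROR | JSON_UNESCAPED_SLASHES);

    throw new RuntimeException(
        sprintf('Preflight failed with %d finding(s): %s', count($findings), $detail)
    );
}
```

A caller would pass the policy result straight through, for example requireCleanPreflight($policy->evaluate($result)), and catch RuntimeException at the ingestion boundary.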

Performance

Inspection cost scales with document size and the chosen InspectDepth. The performance_budget for the reference workload is 1500 ms wall time and 64 MB peak memory. Shallow inspection is fast; a deep audit of a large PDF can approach that budget. The Spectrum path offloads heavy parsing when available. The reproducibility profile is structural: a result can include a trace ID and timing, so two runs can differ in those fields while the findings remain stable for the same input.
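To see how your own documents compare with the 1500 ms / 64 MB figures, you can time a call and read the peak memory afterwards. This is a rough sketch, not the module's benchmark harness: measureAgainstBudget() is a hypothetical helper, the peak is process-wide (it includes whatever the script allocated before the call on PHP versions without memory_reset_peak_usage()), and the reference workload itself is not reproduced here.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: runs $task once and compares wall time and peak
 * memory with a budget. Defaults mirror the documented 1500 ms / 64 MB.
 *
 * @param callable(): mixed $task
 *
 * @return array{wall_ms: float, peak_mb: float, within_budget: bool}
 */
function measureAgainstBudget(callable $task, float $maxWallMs = 1500.0, float $maxPeakMb = 64.0): array
{
    // memory_reset_peak_usage() exists from PHP 8.2; without it the peak
    // also reflects allocations made before this call.
    if (function_exists('memory_reset_peak_usage')) {
        memory_reset_peak_usage();
    }

    $start = hrtime(true); // Monotonic clock, nanoseconds.
    $task();
    $wallMs = (hrtime(true) - $start) / 1e6;
    $peakMb = memory_get_peak_usage() / (1024.0 * 1024.0);

    return [
        'wall_ms' => $wallMs,
        'peak_mb' => $peakMb,
        'within_budget' => $wallMs <= $maxWallMs && $peakMb <= $maxPeakMb,
    ];
}
```

For inspection, the task would be a closure such as fn () => $inspector->inspect($pdfData). A single run is noisy, so treat one measurement as an indication rather than a benchmark.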

Security notes

Inspector::inspect() parses untrusted PDF bytes; that is its purpose, so treat the input as hostile. Run inspection in a constrained worker for user-supplied documents, and bound input size upstream. A deliberately complex PDF is a denial-of-service vector regardless of depth. The result describes the document but does not sanitize it; a “low risk” verdict is a heuristic, not a safety guarantee. Treat extracted strings and metadata as untrusted. See the engine threat model in /modules/core/security/.
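Bounding input size upstream can be as simple as a check that runs before the bytes reach the inspector. The sketch below is one possible guard, not something the module ships: the function name rejectReason(), the 20 MB default, and the 1024-byte header window are assumptions chosen for illustration. It only filters obviously unsuitable input and does not replace a constrained worker.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical upstream guard: rejects input that is empty, too large, or
 * does not look like a PDF, before it is handed to Inspector::inspect().
 *
 * @return string|null Null when the input may proceed, otherwise the reason.
 */
function rejectReason(string $pdfData, int $maxBytes = 20 * 1024 * 1024): ?string
{
    $size = strlen($pdfData);

    if ($size === 0) {
        return 'empty input';
    }

    if ($size > $maxBytes) {
        return "input exceeds {$maxBytes} bytes";
    }

    // Look for the header marker near the start of the file. The window
    // size is an assumption; tighten it if you require the marker at byte 0.
    if (!str_contains(substr($pdfData, 0, 1024), '%PDF-')) {
        return 'missing %PDF- header';
    }

    return null;
}
```

A caller would run this first and skip inspection when it returns a reason. Passing the check says nothing about whether the document is safe; it only keeps oversized or non-PDF payloads away from the parser.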

Conformance

This module reports conformance hints; it does not issue a conformance verdict. ComplianceHint flags likely problems heuristically. For PDF/A, PDF/UA, and International Organization for Standardization (ISO) 32000-2, the authoritative verdict comes from the reference validators driven by /modules/core/cli/ and the oracle and golden suites described in /modules/core/conformance/. Do not treat a clean Inspect result as conformance certification.