Parse and inspect a PDF for structural facts
At a glance
This recipe uses the Core inspector Quick fallback to read structural facts from a Portable Document Format (PDF) file. You get the version, page count, encryption flag, signature flag, attachment flag, file size, and risk flags. Quick runs entirely in process, with no Spectrum sidecar and no network access. Use it for fast triage, not as a validator.
Install
```shell
composer require nextpdf/core:^3
```

Conceptual overview
A PDF file records its version in the file header (ISO 32000-2 §7.5.2).
The trailer carries a file identifier (/ID), an array of two byte strings
(ISO 32000-2 §7.5.5).
When a signature is present, a signature dictionary stores Distinguished Encoding
Rules (DER)-encoded Cryptographic Message Syntax (CMS) SignedData in its Contents
entry (ISO 32000-2 §12.8.1).
The Quick fallback uses a bounded scan of the document bytes to derive the
version, a page-count estimate, and the encryption, signature, and attachment
presence flags.
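The following plain-PHP sketch makes the header and trailer facts concrete by reading them from raw bytes. It is not the NextPDF implementation. pdfHeaderVersion() and pdfTrailerId() are hypothetical helpers. The 1024-byte header window is an assumption, and the sketch handles only hex-encoded /ID strings. It also ignores the catalog /Version entry, which can declare a later version than the header (ISO 32000-2 §7.7.2).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not NextPDF API): read "x.y" from the "%PDF-x.y"
 * header comment. Assumption: the header sits within the first 1024 bytes.
 */
function pdfHeaderVersion(string $pdf): ?string
{
    $head = substr($pdf, 0, 1024);
    if (preg_match('/%PDF-(\d\.\d)/', $head, $m) === 1) {
        return $m[1];
    }
    return null; // no recognizable header
}

/**
 * Hypothetical helper (not NextPDF API): return the two byte strings of a
 * trailer /ID array, assuming both are written as hex strings <...>.
 * Literal-string IDs (...) are not handled by this sketch.
 */
function pdfTrailerId(string $pdf): ?array
{
    $pattern = '/\/ID\s*\[\s*<([0-9A-Fa-f\s]*)>\s*<([0-9A-Fa-f\s]*)>\s*\]/';
    if (!preg_match_all($pattern, $pdf, $matches, PREG_SET_ORDER)) {
        return null; // no match (0) or regex error (false)
    }
    // With incremental updates the last trailer written is the newest,
    // so this sketch takes the last match.
    $last = $matches[count($matches) - 1];
    return [
        // Hex strings may contain whitespace; strip it and normalize case.
        strtoupper((string) preg_replace('/\s+/', '', $last[1])),
        strtoupper((string) preg_replace('/\s+/', '', $last[2])),
    ];
}

$sample = "%PDF-1.7\n1 0 obj << /Type /Catalog >> endobj\n"
    . "trailer << /Root 1 0 R /ID [<0a1b2c> <3D4E5F>] >>\n%%EOF\n";

echo pdfHeaderVersion($sample), "\n";           // 1.7
echo implode(' ', pdfTrailerId($sample)), "\n"; // 0A1B2C 3D4E5F
```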
API surface
Create new Inspector(), then call
->inspect(string $pdfData, InspectConfig::quick()). It returns an
InspectResult with $pdfVersion, $pageCount, $isEncrypted, $hasSigned,
$hasAttachments, $fileSizeBytes, $riskFlags, and the hasRisks() helper.
Code sample — Quick start
```php
<?php
declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Inspect\InspectConfig;
use NextPDF\Inspect\Inspector;

$pdf = file_get_contents(__DIR__ . '/document.pdf');
if ($pdf === false) {
    // file_get_contents() returns false on failure; inspect() expects a string.
    throw new RuntimeException('Cannot read document.pdf');
}

$result = (new Inspector())->inspect($pdf, InspectConfig::quick());

printf(
    "v%s, %d page(s), encrypted=%s, signed=%s\n",
    $result->pdfVersion ?? '?',
    $result->pageCount,
    $result->isEncrypted ? 'yes' : 'no',
    $result->hasSigned ? 'yes' : 'no',
);
```

Code sample — Production
This self-contained program runs in the cookbook harness. It mirrors
examples/39-parse-and-inspect-pdf.php: it builds a small multi-page PDF in
memory, reads its structural facts with the Quick fallback, and routes on
those facts, never on a trust verdict. The routing branch is illustrative.
Replace it with your own pipeline, verifier queue, and quarantine.
```php
<?php
declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Core\Document;
use NextPDF\Inspect\InspectConfig;
use NextPDF\Inspect\Inspector;

// A self-contained input so the program runs with no external file.
$doc = Document::createStandalone();
$doc->setTitle('Parse-and-inspect demo');
$doc->setAuthor('NextPDF Cookbook');
$doc->addPage();
$doc->setFont('helvetica', '', 12);
$doc->cell(0, 10, 'Page one of the parse-and-inspect demonstration.', newLine: true);
$doc->addPage();
$doc->cell(0, 10, 'Page two.', newLine: true);
$pdf = $doc->getPdfData();

$result = (new Inspector())->inspect($pdf, InspectConfig::quick());

echo 'PDF version : ' . ($result->pdfVersion ?? 'unknown') . "\n";
echo 'Pages : ' . $result->pageCount . "\n";
echo 'Encrypted : ' . ($result->isEncrypted ? 'yes' : 'no') . "\n";
echo 'Signed : ' . ($result->hasSigned ? 'yes' : 'no') . "\n";
echo 'Attachments : ' . ($result->hasAttachments ? 'yes' : 'no') . "\n";
echo 'File size : ' . $result->fileSizeBytes . " bytes\n";
echo 'Risk flags : ' . ($result->hasRisks() ? count($result->riskFlags) : 0) . "\n";

// Route on structural facts, not trust verdicts. Replace these calls with
// your own pipeline / verifier queue / quarantine.
if ($result->isEncrypted) {
    // $pipeline->decryptThenContinue($pdf);
    echo "Route: decrypt-then-continue\n";
} elseif ($result->hasSigned) {
    // $verifierQueue->enqueue($pdf); // see the signature-inspect recipe
    echo "Route: enqueue for cryptographic verification\n";
} elseif ($result->hasRisks()) {
    // $quarantine->hold($pdf, $result->riskFlags);
    echo "Route: quarantine (risk flags present)\n";
} else {
    // $pipeline->continue($pdf);
    echo "Route: continue (no risks, unsigned, unencrypted)\n";
}

// The harness sets NEXTPDF_COOKBOOK_OUTPUT and runs this script under the
// semantic profile; emit the document to the side-channel.
$out = getenv('NEXTPDF_COOKBOOK_OUTPUT');
file_put_contents($out !== false && $out !== '' ? $out : __DIR__ . '/inspected.pdf', $pdf);
```

Expected standard output (STDOUT) (version and size depend on the build; the demo PDF is unencrypted, unsigned, and risk-free):

```text
PDF version : <version>
Pages : 2
Encrypted : no
Signed : no
Attachments : no
File size : <n> bytes
Risk flags : 0
Route: continue (no risks, unsigned, unencrypted)
```

Edge cases & gotchas
- Quick is triage, not validation. It reports what is present and what is absent. It does not verify signatures, decrypt content, or assert conformance. Treat the result as routing input.
- Page count is an estimate. The Quick fallback counts page-object markers, so a deliberately malformed object graph can skew the count. Use the Spectrum-backed depths when you need an exact count.
- Standard/Full need the sidecar. new InspectConfig() (Standard depth) and InspectConfig::full() require the Spectrum sidecar. They throw INSPECT-SIDECAR-001 when it is unavailable and do not silently degrade to Quick.
- Empty input. Passing an empty string throws an inspect exception with the message “PDF data must not be empty”.
- Encryption flag scope. The flag reflects an /Encrypt trailer entry. The inspector does not decrypt a flagged file.
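A naive marker scan reproduces the page-count caveat easily. The sketch below assumes one way such a scan might work. It is not the Quick fallback's actual code, and estimatePageCount() and hasEncryptEntry() are hypothetical names. The scan counts /Type /Page markers, so an orphan page object that the page tree never references still inflates the total. A raw byte scan like this one also misses page objects stored inside compressed object streams, so the estimate can undershoot as well.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not NextPDF API): count "/Type /Page" markers.
 * The (?![A-Za-z]) lookahead keeps "/Type /Pages" (page tree nodes) out.
 */
function estimatePageCount(string $pdf): int
{
    return (int) preg_match_all('/\/Type\s*\/Page(?![A-Za-z])/', $pdf);
}

/**
 * Hypothetical helper: true when an /Encrypt key appears in the bytes.
 * The lookahead excludes names such as /EncryptMetadata.
 */
function hasEncryptEntry(string $pdf): bool
{
    return preg_match('/\/Encrypt(?![A-Za-z])/', $pdf) === 1;
}

$pdf = "%PDF-1.7\n"
    . "1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj\n"
    . "2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >> endobj\n"
    . "3 0 obj << /Type /Page /Parent 2 0 R >> endobj\n"
    . "4 0 obj << /Type /Page /Parent 2 0 R >> endobj\n"
    // Orphan page object: not listed in /Kids, yet the marker scan counts it.
    . "5 0 obj << /Type /Page >> endobj\n"
    . "trailer << /Root 1 0 R >>\n%%EOF\n";

echo estimatePageCount($pdf), "\n"; // 3, although the page tree declares /Count 2
echo hasEncryptEntry($pdf) ? 'encrypted' : 'not encrypted', "\n"; // not encrypted
```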
Performance
The Quick fallback uses a bounded scan, not a full parse. Use it to pre-route high volumes of incoming files before heavier processing.
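One way to bound a pre-filter is to read only the ends of a file. The header sits at the start, and the trailer, startxref, and %%EOF marker sit at the end (ISO 32000-2 §7.5.5). The plain-PHP sketch below does that before any heavier step. readHeadAndTail() is a hypothetical helper, and the 1 KiB and 4 KiB windows are assumptions, not NextPDF's limits. Note that inspect() itself receives the whole document as a string.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not NextPDF API): read at most $headBytes from the
 * start and $tailBytes from the end of a file, regardless of its size.
 */
function readHeadAndTail(string $path, int $headBytes = 1024, int $tailBytes = 4096): array
{
    clearstatcache(true, $path); // avoid a stale cached size
    $size = filesize($path);
    if ($size === false) {
        throw new RuntimeException("Cannot stat $path");
    }
    $fh = fopen($path, 'rb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        // fread() rejects a zero length, so guard the empty-file case.
        $head = $size > 0 ? (string) fread($fh, min($headBytes, $size)) : '';
        $tailStart = max(0, $size - $tailBytes);
        fseek($fh, $tailStart);
        $tail = $size > 0 ? (string) fread($fh, $size - $tailStart) : '';
    } finally {
        fclose($fh);
    }
    return ['size' => $size, 'head' => $head, 'tail' => $tail];
}

// Usage: a 10,030-byte stand-in file; only 5,120 bytes are read.
$path = tempnam(sys_get_temp_dir(), 'pdf');
file_put_contents($path, "%PDF-1.7\n" . str_repeat('x', 10000) . "\nstartxref\n123\n%%EOF\n");
$parts = readHeadAndTail($path);
unlink($path);

echo strlen($parts['head']) + strlen($parts['tail']), "\n"; // 5120
```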
Security notes
The inspector runs in process and reads only structural markers. No document bytes leave the host, and no document text is extracted. A risk flag, such as embedded JavaScript, is an advisory routing signal. It is not an assertion that the file is safe or unsafe.
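The sketch below shows why a risk flag stays advisory. It derives hypothetical flags from name markers such as /JavaScript and /JS. The flag names and marker list are assumptions, not the NextPDF riskFlags vocabulary. A hit only means "route for review". For example, /OpenAction often just opens a page view. A byte-level scan also misses names obfuscated with #xx escapes (ISO 32000-2 §7.3.5) and markers inside compressed object streams, so a clean result does not prove a file safe either.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not NextPDF API): map name markers to advisory flags.
 * Returned flags follow the order of $markers.
 */
function advisoryRiskFlags(string $pdf): array
{
    $markers = [
        'javascript'    => '/\/(JavaScript|JS)(?![A-Za-z])/',
        'launch-action' => '/\/Launch(?![A-Za-z])/',
        'open-action'   => '/\/OpenAction(?![A-Za-z])/',
    ];
    $flags = [];
    foreach ($markers as $flag => $pattern) {
        if (preg_match($pattern, $pdf) === 1) {
            $flags[] = $flag;
        }
    }
    return $flags;
}

$sample = "1 0 obj << /Type /Catalog /OpenAction 2 0 R >> endobj\n"
    . "2 0 obj << /S /JavaScript /JS (app.alert('hi');) >> endobj\n";

echo implode(', ', advisoryRiskFlags($sample)), "\n"; // javascript, open-action
```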
Conformance
| Statement | Spec | Clause | reference_id |
|---|---|---|---|
| The file header records the PDF version. | ISO 32000-2 | §7.5.2 | |
| The trailer /ID is a file identifier of two byte strings. | ISO 32000-2 | §7.5.5 | |
| A signature dictionary's Contents entry holds DER-encoded CMS SignedData. | ISO 32000-2 | §12.8.1 | |
This recipe reports structural facts only. It does not assert that the file is valid, safe, or conformant.
Commercial context
Standard and Full inspection depths run through the Spectrum sidecar. They add richer object, font, and image analysis. The Quick fallback documented here is Core and offline.