Merge external PDFs or append pages from existing documents
At a glance
You have several PDF files on disk, and you need one PDF. This recipe combines
existing documents end to end with the Core merge surface,
NextPDF\Document\PdfMerger. You pass raw PDF byte strings. The merger
renumbers every object to avoid collisions, builds one page tree and one
cross-reference table, and returns a NextPDF\Document\MergeResult that you can
write to disk or stream to a client.
The same surface covers the three tasks you need most often:
- Merge an ordered list of PDFs into one document.
- Append a second PDF after a base PDF.
- Prepend pages by putting the new document first in the input order.
The merge runs in process, without a headless browser or a network call. You
need Core installed (composer require nextpdf/core:^3) and two or more
readable PDF files.
Install
```shell
composer require nextpdf/core:^3
```
Conceptual overview
A PDF organizes pages in a page tree whose root is a /Pages node, and it
locates each indirect object through a cross-reference table. When you combine
two source documents, their object numbers overlap. Both files almost always
contain an object 1 0 obj, a /Catalog, and a /Pages node. If you only
concatenate the bytes, you produce a corrupt file because the references no
longer point to the objects they identify.
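To see the overlap concretely, here is a minimal plain-PHP sketch that does not use NextPDF at all. It lists the object numbers declared as N G obj in two tiny hand-written PDF bodies and prints the ones they share. The objectNumbers() helper is hypothetical, and its regex is a simplification: it ignores object streams, compressed cross-reference data, and bytes inside content streams.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the object numbers declared as "N G obj".
 * Simplified: it does not parse object streams or compressed xref data.
 *
 * @return list<int>
 */
function objectNumbers(string $pdf): array
{
    preg_match_all('/(?<![0-9])(\d+)\s+\d+\s+obj\b/', $pdf, $matches);

    return array_values(array_unique(array_map('intval', $matches[1])));
}

// Two hand-written fragments (not complete PDFs); both number from object 1.
$first = "%PDF-1.7\n"
    . "1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj\n"
    . "2 0 obj << /Type /Pages /Kids [3 0 R] /Count 1 >> endobj\n"
    . "3 0 obj << /Type /Page /Parent 2 0 R >> endobj\n";
$second = "%PDF-1.7\n"
    . "1 0 obj << /Type /Catalog /Pages 2 0 R >> endobj\n"
    . "2 0 obj << /Type /Pages /Kids [3 0 R 4 0 R] /Count 2 >> endobj\n"
    . "3 0 obj << /Type /Page /Parent 2 0 R >> endobj\n"
    . "4 0 obj << /Type /Page /Parent 2 0 R >> endobj\n";

$collisions = array_values(array_intersect(objectNumbers($first), objectNumbers($second)));
printf("Colliding object numbers: %s\n", implode(', ', $collisions));
// Colliding object numbers: 1, 2, 3
```

Each of those numbers would be declared twice in a byte-level concatenation, so a reference such as 2 0 R no longer identifies a single object.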
PdfMerger resolves the overlap. It extracts the page objects from each input,
renumbers every object into one address space, rewrites each page’s /Parent
reference to point to a single merged /Pages node, and emits one catalog, one
page tree, and one trailer. The output is a structurally fresh document, not a
stapled concatenation.
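The renumbering idea can be sketched in a few lines of plain PHP. This illustrates the technique under simple assumptions and is not PdfMerger's implementation: renumber() is a hypothetical helper that shifts every N G obj declaration and N G R reference by a fixed offset, and the second step assumes the merged /Pages node is object 2. A real merger must also handle object streams, cross-reference offsets, and bytes inside strings or streams that only look like references.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: shift every object number in a PDF fragment by $offset,
 * rewriting both "N G obj" declarations and "N G R" references.
 */
function renumber(string $fragment, int $offset): string
{
    return (string) preg_replace_callback(
        '/(?<![0-9])(\d+)(\s+)(\d+)(\s+)(obj|R)\b/',
        static fn (array $m): string => ((int) $m[1] + $offset) . $m[2] . $m[3] . $m[4] . $m[5],
        $fragment,
    );
}

// A page object from the second source, which reused numbers 1..3.
$secondSourcePage = '3 0 obj << /Type /Page /Parent 2 0 R >> endobj';

// The first source already occupies objects 1..3, so shift the second by 3.
$shifted = renumber($secondSourcePage, 3);

// Point /Parent at the single merged /Pages node (assumed here to be object 2).
$reparented = (string) preg_replace('#/Parent\s+\d+\s+\d+\s+R\b#', '/Parent 2 0 R', $shifted);

echo $shifted, "\n";    // 6 0 obj << /Type /Page /Parent 5 0 R >> endobj
echo $reparented, "\n"; // 6 0 obj << /Type /Page /Parent 2 0 R >> endobj
```

Applying one offset to every object in a source keeps that source's internal references consistent; only the /Parent link then needs to change so that every page hangs off the one merged page tree.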
The ordering rule is simple: pages appear in the same order as their source files in the input list. To append, put the base document first. To prepend, put the new document first. There is no separate prepend method because input order is the only control you need.
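Because input order is the only control, you can predict where each source lands before you merge. This sketch uses a hypothetical outputRanges() helper that turns per-source page counts, in input order, into 1-based output page ranges. Appending or prepending only changes the order of the counts you pass in.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: given each source's page count in input order, return
 * the 1-based [first, last] output page range that source occupies.
 *
 * @param list<int> $pageCounts
 * @return list<array{int, int}>
 */
function outputRanges(array $pageCounts): array
{
    $ranges = [];
    $next = 1;
    foreach ($pageCounts as $count) {
        $ranges[] = [$next, $next + $count - 1];
        $next += $count;
    }

    return $ranges;
}

// Cover (1 page), Body (2 pages), Appendix (1 page), in that input order.
foreach (outputRanges([1, 2, 1]) as $i => [$first, $last]) {
    printf("source %d -> pages %d-%d\n", $i + 1, $first, $last);
}
```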
API surface
new NextPDF\Document\PdfMerger() exposes two methods.
- merge(list<string> $pdfFiles, int $maxFiles = 100, int $maxTotalBytes = 200_000_000): MergeResult combines an ordered list of raw PDF byte strings. The two bound parameters cap the file count and the total input size. Both defaults are generous; tighten them for each workload.
- append(string $basePdf, string $appendPdf): MergeResult is a convenience wrapper that merges exactly two documents in order. It is equivalent to merge([$basePdf, $appendPdf]).
Both methods return a NextPDF\Document\MergeResult, a readonly object that
carries $pdfData (the merged bytes), $totalPages, $sourceCount,
$mergedSize, and the isValid() helper that confirms the output begins with
the %PDF header.
Inputs are raw byte strings, not file paths. Read the file yourself with
file_get_contents(), or pull the bytes from object storage. This keeps the
merger free of filesystem assumptions and lets you merge documents that never
touch disk.
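The quick start passes the result of file_get_contents() straight through, and that function returns false for an unreadable file. A minimal sketch of a stricter reader follows. readPdfBytes() is a hypothetical helper, and its %PDF prefix check only mirrors the merger's own header check, so a bad file fails with the offending path in the message.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: read one PDF from disk as raw bytes, or fail with the
 * offending path instead of handing false or a non-PDF string to the merger.
 */
function readPdfBytes(string $path): string
{
    if (!is_file($path) || !is_readable($path)) {
        throw new RuntimeException(sprintf('Cannot read "%s".', $path));
    }

    $bytes = file_get_contents($path);

    // Same prefix the merger checks; this also rejects an empty file.
    if ($bytes === false || !str_starts_with($bytes, '%PDF')) {
        throw new RuntimeException(sprintf('"%s" does not begin with %%PDF.', $path));
    }

    return $bytes;
}

// Usage with a throwaway file; in practice, pass your own server-controlled paths.
$tmp = tempnam(sys_get_temp_dir(), 'pdf');
if ($tmp === false) {
    throw new RuntimeException('Could not create a temporary file.');
}
file_put_contents($tmp, "%PDF-1.7\n%%EOF\n");
$bytes = readPdfBytes($tmp);
unlink($tmp);
```

Building the input list with array_map('readPdfBytes', $paths) then gives merge() a list of byte strings, or stops at the first bad path.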
If you need to import a single page from an external PDF as a reusable Form
XObject (for example, to stamp a letterhead page behind generated content), use
the cross-package importer contract
NextPDF\Contracts\ImportedFormObjectInterface, implemented by importers such
as nextpdf/artisan. For whole-document and whole-page composition, use the
PdfMerger surface documented here.
Code sample — Quick start
This sample reads three files and writes the merged output. It leaves out error handling to show the call shape; the production sample below adds the full guards.
```php
<?php
declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Document\PdfMerger;

$merger = new PdfMerger();

// Input order is output order: cover, then body, then appendix.
$result = $merger->merge([
    file_get_contents(__DIR__ . '/cover.pdf'),
    file_get_contents(__DIR__ . '/body.pdf'),
    file_get_contents(__DIR__ . '/appendix.pdf'),
]);

file_put_contents(__DIR__ . '/combined.pdf', $result->pdfData);
printf("Merged %d source(s) into %d page(s).\n", $result->sourceCount, $result->totalPages);
```
Code sample — Production
This self-contained program builds three small documents in memory, so it runs
without an external file. It merges them, validates the result, and writes the
output. It catches the two exceptions the merge surface raises and rethrows each
one with context instead of swallowing it. Replace the in-memory inputs with
your own file_get_contents() reads or object-storage fetches, and wire the
output to your response or storage layer.
```php
<?php
declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Core\Document;
use NextPDF\Document\MergeResult;
use NextPDF\Document\PdfMerger;
use NextPDF\Exception\PageLayoutException;
use NextPDF\Exception\WriterException;

/**
 * Build a tiny labelled PDF so the program is self-contained.
 *
 * In your own code, replace calls to this helper with reads of the external
 * PDFs you want to combine, for example file_get_contents($path).
 */
function buildSample(string $label, int $pages): string
{
    $doc = Document::createStandalone();
    $doc->setTitle($label);

    for ($page = 1; $page <= $pages; $page++) {
        $doc->addPage();
        $doc->setFont('helvetica', '', 12);
        $doc->cell(0, 10, sprintf('%s - page %d', $label, $page), newLine: true);
    }

    return $doc->getPdfData();
}

// Validate the input set before touching the merger. An empty set is a
// configuration error, not an empty success.
/** @var list<string> $sources Raw PDF byte strings, in output order. */
$sources = [
    buildSample('Cover', 1),    // first in the list -> first in the output (prepend position)
    buildSample('Body', 2),
    buildSample('Appendix', 1), // last in the list -> appended after the body
];

if ($sources === []) {
    throw new RuntimeException('No source PDFs supplied to merge.');
}

$merger = new PdfMerger();

try {
    // Bound the merge deliberately: at most 50 files, 100 MB total input.
    $result = $merger->merge($sources, maxFiles: 50, maxTotalBytes: 100_000_000);
} catch (PageLayoutException $e) {
    // Raised when the list is empty or an input does not begin with %PDF.
    throw new RuntimeException(
        sprintf('Merge rejected an input: %s', $e->getConstraint()),
        previous: $e,
    );
} catch (WriterException $e) {
    // Raised when the total input size exceeds the configured byte cap.
    throw new RuntimeException(
        sprintf('Merge exceeded its size budget at stage "%s".', $e->getWriterState()),
        previous: $e,
    );
}

if (!$result->isValid()) {
    throw new RuntimeException('Merged output failed its structural header check.');
}

emitResult($result);

/**
 * Write the merged document to the cookbook side-channel, or to a default file.
 */
function emitResult(MergeResult $result): void
{
    printf(
        "Merged %d source(s) into %d page(s), %d bytes.\n",
        $result->sourceCount,
        $result->totalPages,
        $result->mergedSize,
    );

    $out = getenv('NEXTPDF_COOKBOOK_OUTPUT');
    $path = $out !== false && $out !== '' ? $out : __DIR__ . '/combined.pdf';

    if (file_put_contents($path, $result->pdfData) === false) {
        throw new RuntimeException(sprintf('Could not write merged PDF to "%s".', $path));
    }
}
```
Expected standard output (the page total is the sum of the source page counts, and the byte size depends on the build):
```text
Merged 3 source(s) into 4 page(s), <n> bytes.
```
Edge cases & gotchas
- Inputs are bytes, not paths. merge() takes raw PDF strings. Read the file with file_get_contents() first. Passing a path string makes the input fail the %PDF header check and raises PageLayoutException.
- Order is output order. Pages land in the order their source files appear in the list. There is no prepend method: put the new document first to prepend, or last to append.
- Empty list is an error. An empty $pdfFiles raises PageLayoutException, not an empty result. Validate the set before you call the merger.
- Every input is validated up front. Each entry must be non-empty and begin with %PDF. The first failing input raises PageLayoutException with the violated constraint, and nothing is merged.
- Bounds raise rather than truncate. Exceeding maxFiles raises through the internal resource guard, and exceeding maxTotalBytes raises WriterException. The merger never silently drops files or clips bytes, so tune both bounds for your workload.
- Output is structurally fresh, not byte-stable. The merged document carries a new catalog, page tree, and trailer. Two runs over the same inputs are structurally equal but not guaranteed to be byte-identical, which is why this recipe declares a structural reproducibility profile.
- Page-level annotations and shared resources. The merge composes page objects into one tree. Document-level structures that live outside the page objects in a source file are not carried across. When you need to import a single page as a reusable graphic with its resources, use the ImportedFormObjectInterface path through an importer such as nextpdf/artisan.
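You can enforce the same rules yourself before calling the merger, for example to reject an upload batch with your own error message. The sketch below is hypothetical: preflight() mirrors the documented rules and default bounds, throws InvalidArgumentException rather than the NextPDF exceptions, and the order of its checks is an assumption, not the merger's documented order.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical pre-flight check mirroring the documented merge rules and the
 * default bounds. Throws InvalidArgumentException, not the NextPDF exceptions.
 *
 * @param list<string> $pdfFiles Raw PDF byte strings, in output order.
 */
function preflight(array $pdfFiles, int $maxFiles = 100, int $maxTotalBytes = 200_000_000): void
{
    if ($pdfFiles === []) {
        throw new InvalidArgumentException('No source PDFs supplied.');
    }
    if (count($pdfFiles) > $maxFiles) {
        throw new InvalidArgumentException(sprintf('%d files exceed maxFiles=%d.', count($pdfFiles), $maxFiles));
    }

    $total = 0;
    foreach ($pdfFiles as $index => $bytes) {
        // str_starts_with() is false for '' too, so this also rejects empty entries.
        if (!str_starts_with($bytes, '%PDF')) {
            throw new InvalidArgumentException(sprintf('Input #%d does not begin with %%PDF.', $index));
        }
        $total += strlen($bytes);
    }

    if ($total > $maxTotalBytes) {
        throw new InvalidArgumentException(sprintf('%d bytes exceed maxTotalBytes=%d.', $total, $maxTotalBytes));
    }
}

// Passes: two non-empty inputs that start with the PDF header.
preflight(["%PDF-1.7\n", "%PDF-2.0\n"]);
```

Call it with the same maxFiles and maxTotalBytes values you pass to merge() so both layers agree on the limits.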
Performance
Merging is linear in the total page count. Parsing and object renumbering, not
the merger’s own bookkeeping, dominate the work. Peak memory tracks the total
input bytes because every source is held in memory as a string while the output
is assembled. The maxTotalBytes guard keeps that peak bounded. For high-volume
pipelines, set maxFiles and maxTotalBytes to the smallest values your
workload needs, so a malformed or oversized batch fails fast instead of
exhausting memory. A typical small merge stays within a budget of 1500 ms of
wall-clock time and 64 MB of peak memory.
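To check your own workload against that budget, a minimal timing sketch follows. measure() is a hypothetical helper. hrtime() gives a monotonic clock reading, and memory_get_peak_usage(true) reports the process-wide peak since the script started, so run the measurement in a fresh process for a meaningful number. The 64 MB limit is read as 64 MiB here.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: run $work once and report elapsed wall-clock time in
 * milliseconds plus the process-wide peak memory reported by PHP.
 *
 * @return array{result: mixed, ms: float, peakBytes: int}
 */
function measure(callable $work): array
{
    $start = hrtime(true);               // monotonic clock, nanoseconds
    $result = $work();
    $ms = (hrtime(true) - $start) / 1e6; // nanoseconds -> milliseconds

    return ['result' => $result, 'ms' => $ms, 'peakBytes' => memory_get_peak_usage(true)];
}

// Stand-in workload; in a real pipeline, wrap the merge call instead, for
// example measure(static fn () => $merger->merge($sources)).
$stats = measure(static fn (): int => strlen(str_repeat('%PDF', 1000)));

// Compare against a 1500 ms wall-clock and 64 MiB peak budget.
$withinBudget = $stats['ms'] <= 1500.0 && $stats['peakBytes'] <= 64 * 1024 * 1024;
printf("%.2f ms, %d bytes peak, within budget: %s\n", $stats['ms'], $stats['peakBytes'], $withinBudget ? 'yes' : 'no');
```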
Security notes
The merge runs in process; no document bytes leave the host, and no network call is made. Treat every external PDF as untrusted input:
- Keep the bounds tight. maxFiles and maxTotalBytes are your first line of defense against denial-of-service input. For any surface that accepts uploads, set them to your real ceiling, not the generous defaults.
- Validate before you trust. A successful merge means the bytes were combined, not that the inputs are safe. Run untrusted inputs through the Core inspector first. See Parse and inspect a PDF for a bounded triage scan that flags encryption, signatures, and risk markers before heavier processing.
- Never interpolate user input into a path. This recipe writes to a fixed path or the cookbook side-channel. Derive output paths from server-controlled values, never from a request field, to avoid path traversal.
- No secrets in the document. Do not embed credentials, tokens, or internal identifiers in a merged document that you return to a client.
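One way to keep the output path server-controlled is to generate the file name yourself. In this sketch, outputPath() is a hypothetical helper and /var/app/merged is a placeholder directory. The job ID comes from random_bytes(), and anything that is not exactly 32 lowercase hex characters is rejected, so ../ sequences never reach the filesystem.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build an output path from a server-controlled directory
 * and a server-generated job ID. Nothing from the request reaches the path.
 */
function outputPath(string $outputDir, string $jobId): string
{
    // \z anchors at the true end, so a trailing newline cannot slip through.
    if (preg_match('/^[a-f0-9]{32}\z/', $jobId) !== 1) {
        throw new InvalidArgumentException('Job ID must be 32 lowercase hex characters.');
    }

    return rtrim($outputDir, '/') . '/' . $jobId . '.pdf';
}

$jobId = bin2hex(random_bytes(16)); // generated server-side, never taken from input
$path = outputPath('/var/app/merged', $jobId);
```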
Conformance
This recipe makes no normative standards claim of its own. It composes existing
documents through the Core merge surface and validates the result with the
MergeResult::isValid() header check. The page-tree model that PdfMerger
rebuilds is the PDF 2.0 page-tree structure described in the
/modules/core/document/ reference. For a structural read of any input or
output document, including version, page count, encryption, and signature flags,
use the Core inspector documented in
Parse and inspect a PDF.
See also
- Document module reference: the full split, merge, and document-part surface.
- Parse and inspect a PDF: triage untrusted inputs before you merge them.
- Exception-aware error handling: the NextPDF exception hierarchy behind PageLayoutException and WriterException.
- Build a multi-page document: author the pages you then combine.