Merge external PDFs or append pages from existing documents

At a glance

You have several PDF files on disk, and you need one PDF. This recipe combines existing documents end to end with the Core merge surface, NextPDF\Document\PdfMerger. You pass raw PDF byte strings. The merger renumbers every object to avoid collisions, builds one page tree and one cross-reference table, and returns a NextPDF\Document\MergeResult that you can write to disk or stream to a client.

The same surface covers the three tasks you need most often:

Merge an ordered list of PDFs into one document.
Append a second PDF after a base PDF.
Prepend pages by putting the new document first in the input order.

The merge runs in process, without a headless browser or a network call. You need Core installed (composer require nextpdf/core:^3) and two or more readable PDF files.

Install

composer require nextpdf/core:^3

Conceptual overview

A PDF organizes pages in a page tree whose root is a /Pages node, and it locates each indirect object through a cross-reference table. When you combine two source documents, their object numbers overlap: both files almost always contain an object numbered 1 (1 0 obj), a /Catalog, and a /Pages node. If you only concatenate the bytes, you produce a corrupt file because the references and cross-reference offsets no longer point to the objects they identify.

PdfMerger resolves the overlap. It extracts the page objects from each input, renumbers every object into one address space, rewrites each page’s /Parent reference to point to a single merged /Pages node, and emits one catalog, one page tree, and one trailer. The output is a structurally fresh document, not a stapled concatenation.
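
The renumbering idea can be sketched with a toy model. This is not PdfMerger's implementation. It assumes each source is reduced to a map from object number to the object numbers it references, and the function name renumberIntoOneSpace is hypothetical. It shows only why shifting each source by an offset removes collisions while keeping references intact.

```php
<?php

declare(strict_types=1);

/**
 * Toy model of renumbering (an illustration, not the library's code).
 * Each source maps an object number to the object numbers it references.
 *
 * @param list<array<int, list<int>>> $sources
 * @return array<int, list<int>> one merged map with unique object numbers
 */
function renumberIntoOneSpace(array $sources): array
{
    $merged = [];
    $offset = 0;

    foreach ($sources as $objects) {
        if ($objects === []) {
            continue; // nothing to renumber in an empty source
        }

        foreach ($objects as $number => $references) {
            // Shift the object and every reference it holds by the same
            // offset, so each reference still points at the same target.
            $merged[$number + $offset] = array_map(
                static fn (int $ref): int => $ref + $offset,
                $references,
            );
        }

        // The next source starts after the highest number used so far.
        $offset = max(array_keys($merged));
    }

    return $merged;
}

// Both sources use object numbers 1..3, as two real PDFs typically would.
$first  = [1 => [2], 2 => [3], 3 => []];
$second = [1 => [2], 2 => [3], 3 => []];

$merged = renumberIntoOneSpace([$first, $second]);
```

After the call, the second source occupies object numbers 4 to 6, and its object 4 references object 5 rather than the first document's object 2.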

The ordering rule is simple: pages appear in the same order as their source files in the input list. To append, put the base document first. To prepend, put the new document first. There is no separate prepend method because input order is the only control you need.
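
As a small illustration of the ordering rule, the helper below builds the input list for either case. The name mergeOrder is hypothetical and not part of NextPDF, and the byte strings are placeholders; in real code you would pass the returned list to PdfMerger::merge().

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: returns the input list for an append or a prepend.
 *
 * @return list<string> raw PDF byte strings in output order
 */
function mergeOrder(string $basePdf, string $extraPdf, bool $prepend = false): array
{
    // Input order is output order, so a prepend is just the extra document first.
    return $prepend ? [$extraPdf, $basePdf] : [$basePdf, $extraPdf];
}

// Placeholder strings stand in for real PDF bytes.
$appendOrder  = mergeOrder('%PDF-base', '%PDF-extra');
$prependOrder = mergeOrder('%PDF-base', '%PDF-extra', prepend: true);
```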

API surface

new NextPDF\Document\PdfMerger() exposes two methods.

merge(list<string> $pdfFiles, int $maxFiles = 100, int $maxTotalBytes = 200_000_000): MergeResult combines an ordered list of raw PDF byte strings. The two bound parameters cap the file count and total input size. Both default to safe production values; tighten them for each workload.
append(string $basePdf, string $appendPdf): MergeResult is a convenience wrapper that merges exactly two documents in order. It is equivalent to merge([$basePdf, $appendPdf]).

Both methods return a NextPDF\Document\MergeResult, a readonly object that carries $pdfData (the merged bytes), $totalPages, $sourceCount, $mergedSize, and the isValid() helper that confirms the output begins with the %PDF header.

Inputs are raw byte strings, not file paths. Read the file yourself with file_get_contents(), or pull the bytes from object storage. This keeps the merger free of filesystem assumptions and lets you merge documents that never touch disk.
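
Because file_get_contents() returns false on failure, it helps to wrap the read so that a missing file or a non-PDF file fails before it reaches the merger. The following is a minimal sketch; readPdfBytes is a hypothetical helper, not part of NextPDF, and its header test only mirrors the %PDF rule this recipe describes.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: read a PDF from disk and fail loudly instead of
 * handing false or non-PDF bytes to the merger.
 */
function readPdfBytes(string $path): string
{
    if (!is_file($path) || !is_readable($path)) {
        throw new RuntimeException(sprintf('PDF not readable: "%s".', $path));
    }

    $bytes = file_get_contents($path);
    if ($bytes === false) {
        throw new RuntimeException(sprintf('Could not read "%s".', $path));
    }

    // Same header rule the recipe describes for merge inputs.
    if (!str_starts_with($bytes, '%PDF')) {
        throw new RuntimeException(sprintf('"%s" does not begin with %%PDF.', $path));
    }

    return $bytes;
}
```

You could then build the input list with array_map('readPdfBytes', $paths), where $paths is your own ordered list of file paths.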

If you need to import a single page from an external PDF as a reusable Form XObject, for example to stamp a letterhead page behind generated content, use the cross-package importer contract NextPDF\Contracts\ImportedFormObjectInterface, implemented by importers such as nextpdf/artisan. For whole-document and whole-page composition, use the PdfMerger surface documented here.

Code sample — Quick start

This sample reads two files and writes the merged output. It leaves out error handling to show the call shape; the production sample below adds the full guards.

<?php

declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Document\PdfMerger;

$merger = new PdfMerger();

$result = $merger->merge([
    file_get_contents(__DIR__ . '/cover.pdf'),
    file_get_contents(__DIR__ . '/body.pdf'),
    file_get_contents(__DIR__ . '/appendix.pdf'),
]);

file_put_contents(__DIR__ . '/combined.pdf', $result->pdfData);

printf("Merged %d source(s) into %d page(s).\n", $result->sourceCount, $result->totalPages);

Code sample — Production

This self-contained program builds two small documents in memory, so it runs without an external file. It merges them, validates the result, and writes the output. It catches the two exceptions the merge surface raises and rethrows each one with context instead of swallowing it. Replace the in-memory inputs with your own file_get_contents() reads or object-storage fetches, and wire the output to your response or storage layer.

<?php

declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Core\Document;
use NextPDF\Document\MergeResult;
use NextPDF\Document\PdfMerger;
use NextPDF\Exception\PageLayoutException;
use NextPDF\Exception\WriterException;

/**
 * Build a tiny labelled PDF so the program is self-contained.
 *
 * In your own code, replace calls to this helper with reads of the external
 * PDFs you want to combine, for example file_get_contents($path).
 */
function buildSample(string $label, int $pages): string
{
    $doc = Document::createStandalone();
    $doc->setTitle($label);

    for ($page = 1; $page <= $pages; $page++) {
        $doc->addPage();
        $doc->setFont('helvetica', '', 12);
        $doc->cell(0, 10, sprintf('%s - page %d', $label, $page), newLine: true);
    }

    return $doc->getPdfData();
}

// Validate the input set before touching the merger. An empty set is a
// configuration error, not an empty success.
/** @var list<string> $sources Raw PDF byte strings, in output order. */
$sources = [
    buildSample('Cover', 1),     // first in the list -> first in the output (prepend position)
    buildSample('Body', 2),
    buildSample('Appendix', 1),  // last in the list -> appended after the body
];

if ($sources === []) {
    throw new RuntimeException('No source PDFs supplied to merge.');
}

$merger = new PdfMerger();

try {
    // Bound the merge deliberately: at most 50 files, 100 MB total input.
    $result = $merger->merge($sources, maxFiles: 50, maxTotalBytes: 100_000_000);
} catch (PageLayoutException $e) {
    // Raised when the list is empty or an input does not begin with %PDF.
    throw new RuntimeException(
        sprintf('Merge rejected an input: %s', $e->getConstraint()),
        previous: $e,
    );
} catch (WriterException $e) {
    // Raised when the total input size exceeds the configured byte cap.
    throw new RuntimeException(
        sprintf('Merge exceeded its size budget at stage "%s".', $e->getWriterState()),
        previous: $e,
    );
}

if (!$result->isValid()) {
    throw new RuntimeException('Merged output failed its structural header check.');
}

emitResult($result);

/**
 * Write the merged document to the cookbook side-channel, or to a default file.
 */
function emitResult(MergeResult $result): void
{
    printf(
        "Merged %d source(s) into %d page(s), %d bytes.\n",
        $result->sourceCount,
        $result->totalPages,
        $result->mergedSize,
    );

    $out = getenv('NEXTPDF_COOKBOOK_OUTPUT');
    $path = $out !== false && $out !== '' ? $out : __DIR__ . '/combined.pdf';

    if (file_put_contents($path, $result->pdfData) === false) {
        throw new RuntimeException(sprintf('Could not write merged PDF to "%s".', $path));
    }
}

Expected standard output (the page total is the sum of the source page counts, and the byte size depends on the build):

Merged 3 source(s) into 4 page(s), <n> bytes.

Edge cases & gotchas

Inputs are bytes, not paths. merge() takes raw PDF strings. Read the file with file_get_contents() first. Passing a path string makes the input fail the %PDF header check and raises PageLayoutException.
Order is output order. Pages land in the order their source files appear in the list. There is no prepend method: put the new document first to prepend, or last to append.
Empty list is an error. An empty $pdfFiles raises PageLayoutException, not an empty result. Validate the set before you call the merger.
Every input is validated up front. Each entry must be non-empty and begin with %PDF. The first failing input raises PageLayoutException with the violated constraint, and nothing is merged.
Bounds raise rather than truncate. Exceeding maxFiles raises through the internal resource guard, and exceeding maxTotalBytes raises WriterException. The merger never silently drops files or clips bytes, so tune both bounds for your workload.
Output is structurally fresh, not byte-stable. The merged document carries a new catalog, page tree, and trailer. Two runs over the same inputs are structurally equal, but not guaranteed to be byte-identical. That is why this recipe declares a structural reproducibility profile.
Page-level annotations and shared resources. The merge composes page objects into one tree. Document-level structures that live outside the page objects in a source file are not carried across. When you need to import a single page as a reusable graphic with its resources, use the ImportedFormObjectInterface path through an importer such as nextpdf/artisan.
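
The input rules in this list can be checked before the merger runs, which lets you report a clearer error to your own callers. The sketch below is an assumption-laden pre-flight check: preflightMergeInputs is a hypothetical helper, it throws the standard InvalidArgumentException rather than the library's exceptions, and it does not replace the merger's own validation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical pre-flight check that mirrors the rules listed above.
 *
 * @param array<int, mixed> $pdfFiles expected to hold raw PDF byte strings
 * @return int total input size in bytes
 */
function preflightMergeInputs(array $pdfFiles, int $maxFiles, int $maxTotalBytes): int
{
    if ($pdfFiles === []) {
        throw new InvalidArgumentException('No source PDFs supplied.');
    }

    if (count($pdfFiles) > $maxFiles) {
        throw new InvalidArgumentException(
            sprintf('%d inputs exceed the cap of %d files.', count($pdfFiles), $maxFiles),
        );
    }

    $totalBytes = 0;
    foreach ($pdfFiles as $index => $bytes) {
        // Rejects false from a failed read, empty strings, and path strings.
        if (!is_string($bytes) || !str_starts_with($bytes, '%PDF')) {
            throw new InvalidArgumentException(
                sprintf('Input #%d does not begin with %%PDF.', $index),
            );
        }
        $totalBytes += strlen($bytes);
    }

    if ($totalBytes > $maxTotalBytes) {
        throw new InvalidArgumentException(
            sprintf('%d input bytes exceed the cap of %d.', $totalBytes, $maxTotalBytes),
        );
    }

    return $totalBytes;
}
```

Calling it with the same bounds you pass to merge() keeps the two checks aligned.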

Performance

Merging is linear in the total page count. Parsing and object renumbering, not the merger’s own bookkeeping, dominate the work. Peak memory tracks the total input bytes because every source is held in memory as a string while the output is assembled. The maxTotalBytes guard keeps that peak bounded. For high-volume pipelines, set maxFiles and maxTotalBytes to the smallest values your workload needs, so a malformed or oversized batch fails fast instead of exhausting memory. A typical small merge fits within a budget of 1500 ms wall-clock time and 64 MB peak memory.
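
To compare your own workload against a time and memory budget, you can wrap the merge call in a small timer. This is a rough sketch: measure is a hypothetical helper, the workload shown is a stand-in rather than a real merge, and memory_get_peak_usage() reports the peak for the whole process, not for the wrapped call alone.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: run a callable and report wall time and peak memory.
 *
 * @return array{value: mixed, ms: float, peakBytes: int}
 */
function measure(callable $work): array
{
    $start = hrtime(true);           // monotonic clock, in nanoseconds
    $value = $work();
    $elapsedMs = (hrtime(true) - $start) / 1e6;

    return [
        'value'     => $value,
        'ms'        => $elapsedMs,
        // Process-wide peak, so earlier allocations are included.
        'peakBytes' => memory_get_peak_usage(true),
    ];
}

// Stand-in workload; in the recipe this would be the merge call.
$run = measure(static fn (): string => str_repeat('%PDF', 1000));

// Budget figures taken from the paragraph above.
$withinBudget = $run['ms'] <= 1500.0 && $run['peakBytes'] <= 64 * 1024 * 1024;
```

In the recipe you would pass a closure that calls the merger, for example static fn () => $merger->merge($sources), and log $run['ms'] and $run['peakBytes'] alongside the source count.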

Security notes

The merge runs in process; no document bytes leave the host, and no network call is made. Treat every external PDF as untrusted input:

Keep the bounds tight. maxFiles and maxTotalBytes are your first line of defense against denial-of-service input. For any surface that accepts uploads, set them to your real ceiling, not the generous defaults.
Validate before you trust. A successful merge means the bytes were combined, not that the inputs are safe. Run untrusted inputs through the Core inspector first. See Parse and inspect a PDF for a bounded triage scan that flags encryption, signatures, and risk markers before heavier processing.
Never interpolate user input into a path. This recipe writes to a fixed path or the cookbook side-channel. Derive output paths from server-controlled values, never from a request field, to avoid path traversal.
No secrets in the document. Do not embed credentials, tokens, or internal identifiers in a merged document that you return to a client.
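
One way to keep request data out of output paths is to generate the file name on the server. The sketch below assumes a writable, server-configured base directory; safeOutputPath is a hypothetical helper and the merged- prefix is an arbitrary choice.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: build an output path from server-controlled values only.
 */
function safeOutputPath(string $baseDir): string
{
    // Resolve symlinks and relative segments once, on a value you configure.
    $dir = realpath($baseDir);
    if ($dir === false || !is_dir($dir) || !is_writable($dir)) {
        throw new RuntimeException(sprintf('Output directory unusable: "%s".', $baseDir));
    }

    // A random name means no request field ever reaches the filesystem.
    return $dir . DIRECTORY_SEPARATOR . 'merged-' . bin2hex(random_bytes(16)) . '.pdf';
}

$path = safeOutputPath(sys_get_temp_dir());
```

If clients need to refer to the file later, store the generated name against your own record identifier instead of accepting a file name from the request.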

Conformance

This recipe makes no normative standards claim of its own. It composes existing documents through the Core merge surface and validates the result with the MergeResult::isValid() header check. The page-tree model that PdfMerger rebuilds is the PDF 2.0 page-tree structure described in the /modules/core/document/ reference. For a structural read of any input or output document, including version, page count, encryption, and signature flags, use the Core inspector documented in Parse and inspect a PDF.