
Reduce PDF file size with compression and subsetting

You want the smallest PDF the content allows, with no loss of fidelity. NextPDF gives you two controls for file size, and both are on by default:

  • Stream compression. The writer wraps every page content stream and every embedded font program in a FlateDecode (zlib) stream. The NextPDF\Core\Config flag compress stores this setting. Change it with the withCompress() wither when you build a streaming document.
  • Font subsetting. When you embed a TrueType or CFF font, the writer rebuilds the font program so it carries only the glyphs the document uses, then FlateDecode-compresses the result. This happens automatically. There is no flag to set and no call to make. A 20,000-glyph CJK face that contributes a few hundred glyphs to a document embeds at a fraction of its disk size.

One point of honesty up front: NextPDF Core does not expose image resampling, an image-quality knob, an object-stream toggle, or a resource-deduplication setting. The two controls above are the only size controls. The rest of this recipe shows you how to use them correctly and what each one does not do.

Prerequisites: a Core install (composer require nextpdf/core:^3) and, for the subsetting path, a font file you are licensed to embed.

composer require nextpdf/core:^3

A PDF is a tree of objects. The largest objects are usually content streams (the drawing operators for each page) and font programs (the embedded glyph outlines). Both compress well, so the most effective size control is to FlateDecode-compress them. FlateDecode is the PDF 2.0 name for a zlib-wrapped DEFLATE stream (ISO 32000-2:2020 §7.4.4), and it is the filter NextPDF emits.

The writer pins the DEFLATE compression level at 9, the zlib maximum, through NextPDF\Writer\PinnedZlibCompressor. Level 9 trades a little extra CPU for the smallest stream. Pinning the level also keeps output deterministic, because the zlib header carries a compression-level hint and a drifting level would change the bytes. You do not choose the level. The engine fixes it so that two runs over the same input produce byte-identical streams.
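The determinism argument can be checked with PHP's own zlib extension. The sketch below does not call NextPDF\Writer\PinnedZlibCompressor; it assumes ext-zlib is loaded and uses gzcompress() on a synthetic stand-in for a page content stream, purely to show that a fixed level gives repeatable bytes and that the two-byte zlib header changes with the level.

```php
<?php
declare(strict_types=1);

// Stand-in for a page content stream: repetitive PDF drawing operators.
// This payload is illustrative and was not produced by NextPDF.
$stream = str_repeat("BT /F1 12 Tf 72 720 Td (Invoice 2026) Tj ET\n", 200);

// gzcompress() emits the zlib container format that FlateDecode expects.
$first  = (string) gzcompress($stream, 9);
$second = (string) gzcompress($stream, 9);
$level6 = (string) gzcompress($stream, 6);

// Same input and same level: the compressed bytes are identical.
$deterministic = ($first === $second);

// The second header byte carries the level hint, so it differs by level.
$headerLevel9 = bin2hex(substr($first, 0, 2));
$headerLevel6 = bin2hex(substr($level6, 0, 2));

printf(
    "raw=%d compressed=%d deterministic=%s\n",
    strlen($stream),
    strlen($first),
    $deterministic ? 'yes' : 'no',
);
printf("header at level 9: %s, at level 6: %s\n", $headerLevel9, $headerLevel6);
```

If the level drifted between runs, the header byte alone would already make the two outputs differ, which is why a pinned level is a precondition for byte-identical streams.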

The second lever is font subsetting. A font file on disk carries every glyph the typeface defines, but a document that prints “Invoice 2026” needs only a few of them. NextPDF\Typography\FontSubsetter (for TrueType) and NextPDF\Typography\CffSubsetter (for CFF / OpenType) walk the codepoints the document actually rendered, resolve composite-glyph dependencies, and rebuild only the required font tables. They emit a valid subset font binary with a deterministic six-letter subset-prefix tag (ISO 32000-2:2020 §9.9). The writer applies this whenever an embedded font’s used-glyph set is known, then FlateDecode-compresses the subset. If subsetting a particular face would save less than ten percent, the subsetter returns the original program instead, because the rebuild cost is not worth a marginal gain.
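The ten-percent floor is simple arithmetic on two byte counts. The sketch below is a hypothetical helper, not the code inside FontSubsetter or CffSubsetter; the function name, the parameter names, and the choice to compare raw program lengths are assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Pick the font program to embed under a minimum-saving floor.
 *
 * Hypothetical helper: returns the original program when the subset saves
 * less than $minSaving (a fraction of the original size), else the subset.
 */
function chooseFontProgram(string $original, string $subset, float $minSaving = 0.10): string
{
    $originalLength = strlen($original);
    if ($originalLength === 0) {
        return $original;
    }

    $saving = ($originalLength - strlen($subset)) / $originalLength;

    return $saving < $minSaving ? $original : $subset;
}

// Synthetic byte strings stand in for font programs.
$full = str_repeat('F', 1000);
$marginal = str_repeat('S', 950); // 5 percent smaller: below the floor
$useful = str_repeat('S', 300);   // 70 percent smaller: above the floor

printf("marginal subset kept: %s\n", chooseFontProgram($full, $marginal) === $marginal ? 'yes' : 'no');
printf("useful subset kept: %s\n", chooseFontProgram($full, $useful) === $useful ? 'yes' : 'no');
```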

The takeaway: you keep PDFs small by leaving compression on (the default) and by embedding real font files (so subsetting has something to shrink), not by tuning a long list of options.

The only size knob you set is on the configuration object.

NextPDF\Core\Config is an immutable, final readonly value object with typed wither methods. The size-related member is:

  • compress (bool, default true): enables FlateDecode compression. Change it with withCompress(bool $compress): self, which returns a new Config with the flag changed and every other field preserved.
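The wither contract can be sketched with a stand-in class. SketchConfig below is hypothetical and is not NextPDF\Core\Config; it mirrors only two of the members this recipe names, to show what "returns a new Config with every other field preserved" means in practice.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical stand-in for an immutable config value object.
 * Not the NextPDF class: it exists only to illustrate the wither pattern.
 */
final class SketchConfig
{
    public function __construct(
        public readonly bool $compress = true,
        public readonly int $imageCacheBytes = 52_428_800,
    ) {
    }

    /** Return a copy with the compress flag changed and the rest preserved. */
    public function withCompress(bool $compress): self
    {
        return new self($compress, $this->imageCacheBytes);
    }

    /** Return a copy with the image cache ceiling changed. */
    public function withImageCacheBytes(int $bytes): self
    {
        return new self($this->compress, $bytes);
    }
}

$base = (new SketchConfig())->withImageCacheBytes(16 * 1024 * 1024);
$debug = $base->withCompress(false);

// The original instance is untouched; the copy differs in one field only.
printf("base compress=%s, debug compress=%s\n", var_export($base->compress, true), var_export($debug->compress, true));
printf("cache ceiling preserved: %s\n", $debug->imageCacheBytes === $base->imageCacheBytes ? 'yes' : 'no');
```

Because each wither returns a new object, a shared default configuration can be handed to several builds without one build changing the flag for another.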

Attach a Config to a document at construction time:

  • NextPDF\Core\Document::createStandalone(?Config $config = null): self builds a document with ephemeral registries for a CLI script or a short-lived process, applying your Config.

Two members shape what the size levers have to work with, but neither is itself a compression control:

  • imageCacheBytes (int, default 52_428_800) caps the in-memory image cache, and withImageCacheBytes(int $bytes): self changes it. This bounds peak memory during a build. It does not resample, recompress, or otherwise shrink the images you embed — it is a memory ceiling, not an output-size control.
  • fontsDirectory (string) and withFontsDirectory(string $dir): self set the default search path for font files, which feeds the subsetting path.

Font work happens through the typography surface on the document:

  • setFont(string $family, string $style = '', float $size = 12.0): static selects a face. When the family resolves to an embeddable font file, the writer records the codepoints you render so it can subset that face at save time.
  • addFontDirectory(string $directory): static registers an additional directory to search for font files.

Output is the standard trio: getPdfData(): string returns the bytes, save(string $path): void writes them atomically, and output(?string $filename, OutputDestination $dest): string handles HTTP delivery.

Subsetting has no public method and no flag. It is an emergent property of embedding a font and rendering text. The writer drives FontSubsetter / CffSubsetter for you inside NextPDF\Writer\PdfFontWriter.

This example builds a document with compression explicitly enabled and an embedded, subsetted font, then writes the bytes. It omits error handling to keep the call shape clear. The production sample below adds the full guards.

<?php
declare(strict_types=1);
require_once __DIR__ . '/vendor/autoload.php';
use NextPDF\Core\Config;
use NextPDF\Core\Document;
// compress defaults to true; setting it explicitly documents intent.
$config = (new Config())->withCompress(true);
$doc = Document::createStandalone($config);
$doc->addFontDirectory(__DIR__ . '/fonts');
$doc->addPage();
// Selecting an embeddable face records the glyphs used, so the writer
// subsets this font automatically when the document is built.
$doc->setFont('LiberationSans', '', 12);
$doc->cell(0, 10, 'Invoice 2026 - subsetted, compressed output.', newLine: true);
$pdf = $doc->getPdfData();
file_put_contents(__DIR__ . '/small.pdf', $pdf);
printf("Wrote %d bytes.\n", strlen($pdf));

This is a self-contained program. It builds a document with compression on, embeds a font from a directory you control, renders text so the subsetter has a used-glyph set, and writes the result to disk with file_put_contents(). It catches the most specific NextPDF exceptions the build path raises, then rethrows each one with context rather than swallowing it. Point NEXTPDF_FONT_DIR at a directory that holds a TrueType or CFF face you are licensed to embed; the program validates the path before it embeds.

<?php
declare(strict_types=1);
require_once __DIR__ . '/vendor/autoload.php';
use NextPDF\Core\Config;
use NextPDF\Core\Document;
use NextPDF\Exception\CompressionException;
use NextPDF\Exception\InvalidConfigException;
/**
 * Resolve and validate the font directory from a server-controlled source.
 *
 * Reading the directory from the environment keeps the path off the request
 * surface. The function rejects a missing or unreadable directory so the
 * embedding path never runs against untrusted or absent input.
 */
function resolveFontDirectory(): string
{
    $configured = getenv('NEXTPDF_FONT_DIR');
    $dir = $configured !== false && $configured !== '' ? $configured : __DIR__ . '/fonts';
    $real = realpath($dir);
    if ($real === false || !is_dir($real) || !is_readable($real)) {
        throw new RuntimeException(sprintf('Font directory "%s" is not a readable directory.', $dir));
    }

    return $real;
}
/**
 * Build a compressed, font-subsetted document and return its bytes.
 *
 * @param non-empty-string $fontDirectory Validated directory of embeddable fonts.
 *
 * @return string Raw PDF bytes.
 */
function buildCompactPdf(string $fontDirectory): string
{
    // compress is true by default; pin it so the intent is explicit and the
    // streaming writer path honours it regardless of any wrapper defaults.
    $config = (new Config())
        ->withCompress(true)
        ->withFontsDirectory($fontDirectory)
        // Bound the image cache so a build cannot exhaust memory. This is a
        // memory ceiling, not an output-size control.
        ->withImageCacheBytes(16 * 1024 * 1024);

    $doc = Document::createStandalone($config);
    $doc->addFontDirectory($fontDirectory);
    $doc->addPage();

    // Rendering with an embeddable face records the used codepoints, which the
    // writer turns into a font subset at build time.
    $doc->setFont('LiberationSans', '', 12);
    $doc->cell(0, 10, 'Invoice 2026', newLine: true);
    $doc->cell(0, 10, 'Compressed streams plus an automatic font subset.', newLine: true);

    // getPdfData() triggers the build: page streams and the subset font program
    // are FlateDecode-compressed before the bytes are returned.
    return $doc->getPdfData();
}
try {
    $fontDirectory = resolveFontDirectory();
    $pdf = buildCompactPdf($fontDirectory);
} catch (CompressionException $e) {
    // Raised if the zlib encoder hard-fails while compressing a stream.
    throw new RuntimeException(
        sprintf('Compression failed for a %s stream.', $e->getAlgorithm()),
        previous: $e,
    );
} catch (InvalidConfigException $e) {
    // Raised by the output path for an invalid destination configuration.
    throw new RuntimeException(
        sprintf('Output configuration "%s" was rejected.', $e->getConfigKey()),
        previous: $e,
    );
}

$out = getenv('NEXTPDF_COOKBOOK_OUTPUT');
$path = $out !== false && $out !== '' ? $out : __DIR__ . '/small.pdf';
if (file_put_contents($path, $pdf) === false) {
    throw new RuntimeException(sprintf('Could not write PDF to "%s".', $path));
}

printf("Wrote %d bytes to %s.\n", strlen($pdf), $path);

Expected STDOUT (the byte count depends on the font and the build):

Wrote <n> bytes to <path>.

  • Compression is on by default. A fresh Config has compress set to true. You rarely need withCompress() at all. Set it explicitly only to document intent, or to opt out for a debugging build where you want to read the raw streams.
  • Turning compression off makes files larger, not smaller. withCompress(false) is a diagnostic aid for inspecting uncompressed streams. It is never a size optimization. Ship with compression on.
  • Subsetting needs a real embedded font. The Base14 standard fonts (Helvetica, Times, Courier, and their relatives) are referenced by name and carry no embedded program in a plain document, so there is nothing to subset. Subsetting only shrinks faces you embed from a font file.
  • Subsetting is automatic and silent. There is no flag, no method, and no confirmation. If you embedded a font and rendered text with it, the writer subsetted it. The embedded program carries a six-letter subset-prefix tag (for example ABCDEF+LiberationSans) so a reader can tell a subset from a full embed.
  • A small saving keeps the full font. When a subset would save less than ten percent of the program size, the subsetter returns the original. This is a deliberate floor: the rebuild cost is not worth a marginal gain. Embedding a face that is already tiny, or rendering nearly all of its glyphs, can land in this case.
  • imageCacheBytes is not an image size knob. It caps memory, not output bytes. NextPDF Core embeds the image data you give it; there is no resampling, downsampling, or re-encoding step. If you need smaller images, resize and re-encode them before you embed them.
  • No object-stream or dedup setting exists. NextPDF Core does not expose a toggle for PDF 2.0 object streams or for resource deduplication. Do not look for one — the size levers are stream compression and font subsetting.
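The subset-prefix tag mentioned in the list above gives you a quick check when you inspect a font name taken from a PDF. The sketch below is a hypothetical helper that is not part of NextPDF; it assumes you already have the font name as a string and only tests for the six-uppercase-letter tag followed by a plus sign.

```php
<?php
declare(strict_types=1);

/**
 * Report whether a PDF font name carries a subset tag: six uppercase
 * letters, a plus sign, then the name of the face.
 */
function isSubsetFontName(string $baseFont): bool
{
    return preg_match('/^[A-Z]{6}\+.+$/', $baseFont) === 1;
}

/** Strip the subset tag, if present, and return the underlying face name. */
function stripSubsetTag(string $baseFont): string
{
    // The tag is six letters plus the "+" separator: seven bytes in total.
    return isSubsetFontName($baseFont) ? substr($baseFont, 7) : $baseFont;
}

foreach (['ABCDEF+LiberationSans', 'Helvetica'] as $name) {
    printf("%s: %s\n", $name, isSubsetFontName($name) ? 'subset' : 'not a subset');
}
```

A Base14 name such as Helvetica never matches, which is consistent with the point that a face referenced by name has no embedded program to subset.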

Compression at level 9 is the dominant CPU cost of writing a stream. It trades a few percent of build time for the smallest output. The cost is linear in the uncompressed byte count, so the page count and the amount of embedded font data set the budget. Subsetting adds a one-time pass per embedded face that parses the font’s table directory, resolves the used-glyph closure, and rebuilds the required tables. For a large CJK face, this is the more expensive of the two levers, but it runs once per font, not once per page. The ten-percent saving floor exists partly to keep that pass off the hot path when it would not pay off. A small document with one embedded subset sits comfortably inside a 1500 ms wall and a 96 MB peak budget. Bound imageCacheBytes to your real ceiling so a build that embeds many images fails fast on memory rather than swapping.
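To check a build against the wall-time and peak-memory figures above, you can wrap it in a small measurement helper. This is a minimal sketch: measureBuild() and withinBudget() are hypothetical names, the workload is a gzcompress() call standing in for a real document build, and the 96 MB ceiling is interpreted as 96 × 1024 × 1024 bytes, which is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Run a build callable and report output size, wall time, and peak memory.
 *
 * @param callable(): string $build Hypothetical stand-in for a document build.
 *
 * @return array{bytes: int, elapsedMs: float, peakBytes: int}
 */
function measureBuild(callable $build): array
{
    $start = hrtime(true);
    $pdf = $build();
    $elapsedMs = (hrtime(true) - $start) / 1e6; // nanoseconds to milliseconds

    return [
        'bytes' => strlen($pdf),
        'elapsedMs' => $elapsedMs,
        'peakBytes' => memory_get_peak_usage(true),
    ];
}

/** Compare a measurement with a wall-time and peak-memory budget. */
function withinBudget(array $m, float $maxMs = 1500.0, int $maxPeakBytes = 96 * 1024 * 1024): bool
{
    return $m['elapsedMs'] <= $maxMs && $m['peakBytes'] <= $maxPeakBytes;
}

// Stand-in workload: level 9 compression of a synthetic content stream.
$measurement = measureBuild(
    static fn (): string => (string) gzcompress(str_repeat("0 0 m 10 10 l S\n", 5000), 9),
);

printf(
    "bytes=%d elapsed=%.2f ms peak=%d within budget=%s\n",
    $measurement['bytes'],
    $measurement['elapsedMs'],
    $measurement['peakBytes'],
    withinBudget($measurement) ? 'yes' : 'no',
);
```

In a real check you would pass a closure that builds the document and returns getPdfData(), then fail the job when withinBudget() returns false.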

The build runs in process; no document bytes leave the host and no network call is made. Treat any externally supplied font or image as untrusted input:

  • Validate the font directory. The production sample reads the font path from a server-controlled environment variable and rejects a missing or unreadable directory before embedding. Never derive a font path from a request field.
  • Embed only fonts you are licensed to redistribute. A subset is still an embedded font program. Confirm the license permits embedding before you ship a document that carries the face.
  • Malformed fonts raise, they do not silently corrupt. A font file that fails to parse raises NextPDF\Exception\FontParsingException, and a hard zlib failure raises NextPDF\Exception\CompressionException. Catch the most specific exception and act on it. Never wrap the build in an empty catch.
  • Never interpolate user input into the output path. The sample writes to a fixed path or to a path read from a server-controlled environment variable. When you write with save() instead, its atomic writer rejects stream wrappers and null bytes. Derive output paths from server-controlled values to avoid path traversal.
  • No secrets in the document. Do not embed credentials, tokens, or internal identifiers in a generated document you return to a client.
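The output-path rule in the list above can be enforced before any bytes are written. The sketch below is a hypothetical guard, not the atomic writer inside save(); the function name and the rule set (no null bytes, no stream wrappers, bare file name only, existing base directory) are assumptions chosen to illustrate the idea.

```php
<?php
declare(strict_types=1);

/**
 * Join a server-controlled base directory with a bare file name.
 *
 * Hypothetical guard: rejects null bytes, stream wrappers such as php://,
 * and any name that contains a directory component.
 */
function resolveOutputPath(string $baseDir, string $fileName): string
{
    if (str_contains($fileName, "\0")) {
        throw new InvalidArgumentException('Output name contains a null byte.');
    }
    if (preg_match('#^[a-z][a-z0-9+.\-]*://#i', $fileName) === 1) {
        throw new InvalidArgumentException('Stream wrappers are not allowed.');
    }
    if ($fileName === '' || $fileName === '.' || $fileName === '..' || $fileName !== basename($fileName)) {
        throw new InvalidArgumentException('Output name must be a bare file name.');
    }

    $realBase = realpath($baseDir);
    if ($realBase === false || !is_dir($realBase)) {
        throw new RuntimeException('Base directory does not exist.');
    }

    return $realBase . DIRECTORY_SEPARATOR . $fileName;
}

// The system temp directory stands in for a server-controlled output directory.
printf("%s\n", resolveOutputPath(sys_get_temp_dir(), 'small.pdf'));
```

Keeping the base directory server-controlled and accepting only a bare file name means a value such as ../../etc/passwd or php://output is refused before the write is attempted.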

This recipe makes no normative standards claim of its own. The mechanisms it uses are defined by the PDF 2.0 specification: FlateDecode stream compression (ISO 32000-2:2020 §7.4.4) and font subset naming with a six-character subset prefix (ISO 32000-2:2020 §9.9). NextPDF emits both as part of its standard write path; you do not configure them beyond the compress flag. The structural reproducibility profile this page declares reflects that the writer pins the DEFLATE level, so the compressed streams are deterministic, while document-level identifiers may still vary between runs unless you also configure deterministic settings. For the embedding mechanics behind subsetting, see the embed-and-subset recipe linked below.