Reduce PDF file size with compression and subsetting
At a glance
You want the smallest PDF the content allows, with no loss of fidelity. NextPDF gives you two controls for file size, and both are on by default:
- Stream compression. The writer wraps every page content stream and every embedded font program in a FlateDecode (zlib) stream. The NextPDF\Core\Config flag compress stores this setting. Change it with the withCompress() wither when you build a streaming document.
- Font subsetting. When you embed a TrueType or CFF font, the writer rebuilds the font program so it carries only the glyphs the document uses, then FlateDecode-compresses the result. This happens automatically. There is no flag to set and no call to make. A 20,000-glyph CJK face that contributes a few hundred glyphs to a document embeds at a fraction of its disk size.
One point of honesty up front: NextPDF Core does not expose image resampling, an image-quality knob, an object-stream toggle, or a resource-deduplication setting. The two controls above are the only size controls. The rest of this recipe shows you how to use them correctly and what each one does not do.
Prerequisites: a Core install (composer require nextpdf/core:^3) and, for the
subsetting path, a font file you are licensed to embed.
Install
composer require nextpdf/core:^3
Conceptual overview
A PDF is a tree of objects. The largest objects are usually content streams (the drawing operators for each page) and font programs (the embedded glyph outlines). Both compress well, so the most effective size control is to FlateDecode-compress them. FlateDecode is the PDF 2.0 name for a zlib-wrapped DEFLATE stream (ISO 32000-2:2020 §7.4.4), and it is the filter NextPDF emits.
The writer pins the DEFLATE compression level at 9, the zlib maximum, through
NextPDF\Writer\PinnedZlibCompressor. Level 9 trades a little extra CPU for the
smallest stream. Pinning the level also keeps output deterministic, because the zlib
header encodes a level hint and a drifting level would change the bytes. You do not
choose the level. The engine fixes it so that two runs over the same input
produce byte-identical streams.
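Both points can be checked with PHP's own zlib extension, independent of NextPDF. The sketch below is not the writer's code: it assumes the zlib extension is loaded, uses a made-up content stream, and calls gzcompress() directly rather than going through PinnedZlibCompressor.

```php
<?php
declare(strict_types=1);

// Hypothetical page content: repetitive drawing operators, as in a real content stream.
$raw = str_repeat("BT /F1 12 Tf 72 720 Td (Invoice 2026) Tj ET\n", 200);

// gzcompress() emits the zlib format (RFC 1950) that a FlateDecode stream carries.
$first  = (string) gzcompress($raw, 9);
$second = (string) gzcompress($raw, 9);

// Byte 0 is the CMF byte; the top two bits of byte 1 (FLEVEL) hint at the
// compression level the encoder used, so a different level changes the header.
$flevel = ord($first[1]) >> 6;

printf(
    "raw=%d compressed=%d flevel=%d identical=%s\n",
    strlen($raw),
    strlen($first),
    $flevel,
    $first === $second ? 'yes' : 'no'
);
```

Running the same call twice at the same level yields the same bytes, which is the property the pinned level protects. Compressing at level 1 instead produces a different FLEVEL value in the second header byte.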
The second lever is font subsetting. A font file on disk carries every glyph the
typeface defines, but a document that prints “Invoice 2026” needs only a few of
them. NextPDF\Typography\FontSubsetter (for TrueType) and
NextPDF\Typography\CffSubsetter (for CFF / OpenType) walk the codepoints the
document actually rendered, resolve composite-glyph dependencies, and rebuild only
the required font tables. They emit a valid subset font binary with a
deterministic six-letter subset-prefix tag (ISO 32000-2:2020 §9.9). The writer
applies this whenever an embedded font’s used-glyph set is known, then
FlateDecode-compresses the subset. If subsetting a particular face would save less
than ten percent, the subsetter returns the original program instead, because the
rebuild cost is not worth a marginal gain.
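The two decisions in that paragraph, a deterministic tag and a ten-percent floor, can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the hash-based tag derivation is an assumption, not the algorithm FontSubsetter or CffSubsetter uses.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical: derive a stable six-letter tag from the set of used glyph IDs.
 *
 * @param list<int> $glyphIds
 */
function sketchSubsetTag(array $glyphIds): string
{
    $ids = array_values(array_unique($glyphIds));
    sort($ids, SORT_NUMERIC);
    // Hash the canonical (sorted, de-duplicated) list so input order does not matter.
    $digest = hash('sha256', implode(',', $ids), true);
    $tag = '';
    for ($i = 0; $i < 6; $i++) {
        $tag .= chr(65 + (ord($digest[$i]) % 26)); // map each byte to A-Z
    }

    return $tag;
}

/** Hypothetical: keep the subset only when it saves at least ten percent. */
function sketchKeepSubset(int $originalBytes, int $subsetBytes): bool
{
    // Integer form of (original - subset) / original >= 0.10, avoiding float rounding.
    return $originalBytes > 0 && ($originalBytes - $subsetBytes) * 10 >= $originalBytes;
}

$tag = sketchSubsetTag([36, 81, 89, 82, 76, 70, 72]);
echo $tag . "+LiberationSans\n";
```

Deriving the tag from the glyph set, rather than from a random source, is one way to keep the font name stable across runs over the same input.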
The takeaway: you keep PDFs small by leaving compression on (the default) and by embedding real font files (so subsetting has something to shrink), not by tuning a long list of options.
API surface
The only size knob you set is on the configuration object.
NextPDF\Core\Config is an immutable, final readonly value object with typed
wither methods. The size-related member is:
- compress (bool, default true) enables FlateDecode compression. Change it with withCompress(bool $compress): self, which returns a new Config with the flag changed and every other field preserved.
Attach a Config to a document at construction time:
- NextPDF\Core\Document::createStandalone(?Config $config = null): self builds a document with ephemeral registries for a CLI script or a short-lived process, applying your Config.
Two members shape what the size levers have to work with, but neither is itself a compression control:
- imageCacheBytes (int, default 52_428_800) caps the in-memory image cache, and withImageCacheBytes(int $bytes): self changes it. This bounds peak memory during a build. It does not resample, recompress, or otherwise shrink the images you embed. It is a memory ceiling, not an output-size control.
- fontsDirectory (string) and withFontsDirectory(string $dir): self set the default search path for font files, which feeds the subsetting path.
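For readers new to the wither pattern, the sketch below shows how an immutable value object returns a changed copy instead of mutating itself. SketchConfig is a hypothetical stand-in written for this page; it is not NextPDF\Core\Config, and it models only the two fields named above.

```php
<?php
declare(strict_types=1);

/** Hypothetical stand-in for an immutable config; not NextPDF\Core\Config itself. */
final class SketchConfig
{
    public function __construct(
        public readonly bool $compress = true,
        public readonly int $imageCacheBytes = 52_428_800,
    ) {
    }

    public function withCompress(bool $compress): self
    {
        // Build a new instance; every other field is copied unchanged.
        return new self($compress, $this->imageCacheBytes);
    }

    public function withImageCacheBytes(int $bytes): self
    {
        return new self($this->compress, $bytes);
    }
}

$base  = new SketchConfig();
$debug = $base->withCompress(false)->withImageCacheBytes(16 * 1024 * 1024);

var_dump($base->compress);  // the original instance is untouched
var_dump($debug->compress); // the copy carries the change
```

Because each wither returns a fresh object, a shared default configuration can be handed to several builds without one build's overrides leaking into another.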
Font work happens through the typography surface on the document:
- setFont(string $family, string $style = '', float $size = 12.0): static selects a face. When the family resolves to an embeddable font file, the writer records the codepoints you render so it can subset that face at save time.
- addFontDirectory(string $directory): static registers an additional directory to search for font files.
Output is the standard trio: getPdfData(): string returns the bytes,
save(string $path): void writes them atomically, and
output(?string $filename, OutputDestination $dest): string handles HTTP
delivery.
Subsetting has no public method and no flag. It is an emergent property of
embedding a font and rendering text. The writer drives
FontSubsetter / CffSubsetter for you inside NextPDF\Writer\PdfFontWriter.
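To make "emergent" concrete, the sketch below shows one way a writer could accumulate the used-character set as text is rendered. SketchGlyphUsage is a hypothetical class for illustration; how PdfFontWriter tracks usage internally is not documented here.

```php
<?php
declare(strict_types=1);

/** Hypothetical recorder: collects the distinct characters each face has rendered. */
final class SketchGlyphUsage
{
    /** @var array<string, array<array-key, true>> face name => set of characters */
    private array $used = [];

    public function record(string $face, string $text): void
    {
        // Split on UTF-8 character boundaries; PCRE's /u flag needs no extra extension.
        $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
        foreach ($chars ?: [] as $char) {
            $this->used[$face][$char] = true;
        }
    }

    public function count(string $face): int
    {
        return count($this->used[$face] ?? []);
    }
}

$usage = new SketchGlyphUsage();
$usage->record('LiberationSans', 'Invoice 2026');
printf("%d distinct characters\n", $usage->count('LiberationSans'));
```

The string "Invoice 2026" has twelve characters but only eleven distinct ones, because the digit 2 repeats. A subset built from that set would need glyphs for those eleven characters plus whatever the font format requires.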
Code sample — Quick start
This example builds a document with compression explicitly enabled and an embedded, subsetted font, then writes the bytes. It omits error handling to keep the call shape clear. The production sample below adds the full guards.
<?php
declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Core\Config;
use NextPDF\Core\Document;

// compress defaults to true; setting it explicitly documents intent.
$config = (new Config())->withCompress(true);

$doc = Document::createStandalone($config);
$doc->addFontDirectory(__DIR__ . '/fonts');
$doc->addPage();

// Selecting an embeddable face records the glyphs used, so the writer
// subsets this font automatically when the document is built.
$doc->setFont('LiberationSans', '', 12);
$doc->cell(0, 10, 'Invoice 2026 - subsetted, compressed output.', newLine: true);

$pdf = $doc->getPdfData();

file_put_contents(__DIR__ . '/small.pdf', $pdf);

printf("Wrote %d bytes.\n", strlen($pdf));
Code sample — Production
This is a self-contained program. It builds a document with compression on,
embeds a font from a directory you control, renders text so the subsetter has a
used-glyph set, and writes the result to disk. It catches the most specific
NextPDF exceptions the build and save paths raise, then rethrows each one with
context rather than swallowing it. Point NEXTPDF_FONT_DIR at a directory
that holds a TrueType or CFF face you are licensed to embed; the program validates
the path before it embeds.
<?php
declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use NextPDF\Core\Config;
use NextPDF\Core\Document;
use NextPDF\Exception\CompressionException;
use NextPDF\Exception\InvalidConfigException;

/**
 * Resolve and validate the font directory from a server-controlled source.
 *
 * Reading the directory from the environment keeps the path off the request
 * surface. The function rejects a missing or unreadable directory so the
 * embedding path never runs against untrusted or absent input.
 */
function resolveFontDirectory(): string
{
    $configured = getenv('NEXTPDF_FONT_DIR');
    $dir = $configured !== false && $configured !== '' ? $configured : __DIR__ . '/fonts';

    $real = realpath($dir);
    if ($real === false || !is_dir($real) || !is_readable($real)) {
        throw new RuntimeException(sprintf('Font directory "%s" is not a readable directory.', $dir));
    }

    return $real;
}

/**
 * Build a compressed, font-subsetted document and return its bytes.
 *
 * @param non-empty-string $fontDirectory Validated directory of embeddable fonts.
 *
 * @return string Raw PDF bytes.
 */
function buildCompactPdf(string $fontDirectory): string
{
    // compress is true by default; pin it so the intent is explicit and the
    // streaming writer path honours it regardless of any wrapper defaults.
    $config = (new Config())
        ->withCompress(true)
        ->withFontsDirectory($fontDirectory)
        // Bound the image cache so a build cannot exhaust memory. This is a
        // memory ceiling, not an output-size control.
        ->withImageCacheBytes(16 * 1024 * 1024);

    $doc = Document::createStandalone($config);
    $doc->addFontDirectory($fontDirectory);
    $doc->addPage();

    // Rendering with an embeddable face records the used codepoints, which the
    // writer turns into a font subset at build time.
    $doc->setFont('LiberationSans', '', 12);
    $doc->cell(0, 10, 'Invoice 2026', newLine: true);
    $doc->cell(0, 10, 'Compressed streams plus an automatic font subset.', newLine: true);

    // getPdfData() triggers the build: page streams and the subset font program
    // are FlateDecode-compressed before the bytes are returned.
    return $doc->getPdfData();
}

try {
    $fontDirectory = resolveFontDirectory();
    $pdf = buildCompactPdf($fontDirectory);
} catch (CompressionException $e) {
    // Raised if the zlib encoder hard-fails while compressing a stream.
    throw new RuntimeException(
        sprintf('Compression failed for a %s stream.', $e->getAlgorithm()),
        previous: $e,
    );
} catch (InvalidConfigException $e) {
    // Raised by the output path for an invalid destination configuration.
    throw new RuntimeException(
        sprintf('Output configuration "%s" was rejected.', $e->getConfigKey()),
        previous: $e,
    );
}

$out = getenv('NEXTPDF_COOKBOOK_OUTPUT');
$path = $out !== false && $out !== '' ? $out : __DIR__ . '/small.pdf';

if (file_put_contents($path, $pdf) === false) {
    throw new RuntimeException(sprintf('Could not write PDF to "%s".', $path));
}

printf("Wrote %d bytes to %s.\n", strlen($pdf), $path);
Expected STDOUT (the byte count depends on the font and the build):
Wrote <n> bytes to <path>.
Edge cases & gotchas
- Compression is on by default. A fresh Config has compress set to true. You rarely need withCompress() at all. Set it explicitly only to document intent, or to opt out for a debugging build where you want to read the raw streams.
- Turning compression off makes files larger, not smaller. withCompress(false) is a diagnostic aid for inspecting uncompressed streams. It is never a size optimization. Ship with compression on.
- Subsetting needs a real embedded font. The Base14 standard fonts (Helvetica, Times, Courier, and their relatives) are referenced by name and carry no embedded program in a plain document, so there is nothing to subset. Subsetting only shrinks faces you embed from a font file.
- Subsetting is automatic and silent. There is no flag, no method, and no confirmation. If you embedded a font and rendered text with it, the writer subsetted it. The embedded program carries a six-letter subset-prefix tag (for example ABCDEF+LiberationSans) so a reader can tell a subset from a full embed.
- A small saving keeps the full font. When a subset would save less than ten percent of the program size, the subsetter returns the original. This is a deliberate floor: the rebuild cost is not worth a marginal gain. Embedding a face that is already tiny, or rendering nearly all of its glyphs, can land in this case.
- imageCacheBytes is not an image size knob. It caps memory, not output bytes. NextPDF Core embeds the image data you give it; there is no resampling, downsampling, or re-encoding step. If you need smaller images, resize and re-encode them before you embed them.
- No object-stream or dedup setting exists. NextPDF Core does not expose a toggle for PDF 2.0 object streams or for resource deduplication. Do not look for one. The size levers are stream compression and font subsetting.
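Since subsetting is silent, the subset-prefix tag is the practical way to confirm it happened. The sketch below checks a font name for that tag. It assumes you have already obtained the font's BaseFont name with a PDF inspection tool of your choice; the function name is hypothetical and extracting the name from the file is out of scope.

```php
<?php
declare(strict_types=1);

/** Returns true when a BaseFont name carries a six-uppercase-letter subset tag. */
function sketchHasSubsetTag(string $baseFont): bool
{
    // Six uppercase letters, a plus sign, then at least one character of font name.
    return preg_match('/^[A-Z]{6}\+./', $baseFont) === 1;
}

foreach (['ABCDEF+LiberationSans', 'LiberationSans', 'Helvetica'] as $name) {
    printf("%-24s %s\n", $name, sketchHasSubsetTag($name) ? 'subset' : 'no subset tag');
}
```

A name without the tag is what you would expect for a Base14 face, or for an embedded face where the ten-percent floor kept the full program.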
Performance
Compression at level 9 is the dominant CPU cost of writing a stream. It trades a
few percent of build time for the smallest output. The cost is linear in the
uncompressed byte count, so the page count and the amount of embedded font data
set the budget. Subsetting adds a one-time pass per embedded face that parses the
font’s table directory, resolves the used-glyph closure, and rebuilds the
required tables. For a large CJK face, this is the more expensive of the two
levers, but it runs once per font, not once per page. The ten-percent saving
floor exists partly to keep that pass off the hot path when it would not pay off.
A small document with one embedded subset sits comfortably inside a 1500 ms
wall-clock budget and a 96 MB peak-memory budget. Bound imageCacheBytes to your
real ceiling so a build that embeds many images fails fast on memory rather than swapping.
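If you want to check a build against those budgets yourself, a small timing wrapper is enough. The sketch below is a hypothetical helper, not part of NextPDF: it takes any callable that returns PDF bytes, and the demonstration uses a zlib call on synthetic data as a stand-in for a real build.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: run a build callable and compare it with a time and memory budget.
 *
 * @param callable(): string $build Returns the PDF bytes.
 *
 * @return array{bytes: int, ms: float|int, peak: int, ok: bool}
 */
function sketchMeasureBuild(
    callable $build,
    float $maxMs = 1500.0,
    int $maxPeakBytes = 96 * 1024 * 1024
): array {
    $start = hrtime(true);
    $pdf = $build();
    $ms = (hrtime(true) - $start) / 1_000_000; // nanoseconds to milliseconds
    // Peak for the whole process so far, not only for this call.
    $peak = memory_get_peak_usage(true);

    return [
        'bytes' => strlen($pdf),
        'ms' => $ms,
        'peak' => $peak,
        'ok' => $ms <= $maxMs && $peak <= $maxPeakBytes,
    ];
}

// Stand-in build: compress a synthetic stream instead of calling NextPDF.
$report = sketchMeasureBuild(
    static fn (): string => (string) gzcompress(str_repeat("0 0 m 10 10 l S\n", 5000), 9)
);

printf(
    "%d bytes in %.1f ms, peak %d bytes, within budget: %s\n",
    $report['bytes'],
    $report['ms'],
    $report['peak'],
    $report['ok'] ? 'yes' : 'no'
);
```

In a real check you would pass a closure that builds the document and returns getPdfData(). Because the peak figure is process-wide, run the measurement in a fresh process for a clean reading.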
Security notes
The build runs in process; no document bytes leave the host and no network call is made. Treat any externally supplied font or image as untrusted input:
- Validate the font directory. The production sample reads the font path from a server-controlled environment variable and rejects a missing or unreadable directory before embedding. Never derive a font path from a request field.
- Embed only fonts you are licensed to redistribute. A subset is still an embedded font program. Confirm the license permits embedding before you ship a document that carries the face.
- Malformed fonts raise; they do not silently corrupt. A font file that fails to parse raises NextPDF\Exception\FontParsingException, and a hard zlib failure raises NextPDF\Exception\CompressionException. Catch the most specific exception and act on it. Never wrap the build in an empty catch.
- Never interpolate user input into the output path. The sample writes to a fixed path or to a path taken from a server-controlled environment variable. When you write through save(), its atomic writer rejects stream wrappers and null bytes. Derive output paths from server-controlled values to avoid path traversal.
- No secrets in the document. Do not embed credentials, tokens, or internal identifiers in a generated document you return to a client.
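When you write bytes yourself with file_put_contents() rather than through save(), you can apply similar checks by hand. The guard below is a hypothetical sketch, not the logic inside save(): the function name, the base-directory parameter, and the exact rules are assumptions chosen to illustrate the null-byte, stream-wrapper, and traversal checks.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guard: accept an output path only if it is a plain file path
 * whose directory resolves inside $baseDir. Returns the normalized path.
 */
function sketchAssertSafeOutputPath(string $path, string $baseDir): string
{
    if (str_contains($path, "\0")) {
        throw new InvalidArgumentException('Output path contains a null byte.');
    }
    // Reject scheme-style prefixes such as php://, phar://, or http://.
    if (preg_match('#^[a-z][a-z0-9+.\-]*://#i', $path) === 1) {
        throw new InvalidArgumentException('Stream wrappers are not allowed.');
    }

    $base = realpath($baseDir);
    $dir = realpath(dirname($path));
    if ($base === false || $dir === false) {
        throw new InvalidArgumentException('Output directory does not exist.');
    }

    // Compare with a trailing separator so /srv/out-evil does not match /srv/out.
    $prefix = rtrim($base, DIRECTORY_SEPARATOR) . DIRECTORY_SEPARATOR;
    if (!str_starts_with($dir . DIRECTORY_SEPARATOR, $prefix)) {
        throw new InvalidArgumentException('Output path escapes the base directory.');
    }

    return $dir . DIRECTORY_SEPARATOR . basename($path);
}

$base = sys_get_temp_dir();
echo sketchAssertSafeOutputPath($base . '/small.pdf', $base), "\n";
```

Resolving the directory with realpath() before the prefix comparison is what defeats "../" segments and symlinked parents: the check runs against where the file would really land, not against the string as written.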
Conformance
This recipe makes no normative standards claim of its own. The mechanisms it uses
are defined by the PDF 2.0 specification: FlateDecode stream compression
(ISO 32000-2:2020 §7.4.4) and font subset naming with a six-character subset
prefix (ISO 32000-2:2020 §9.9). NextPDF emits both as part of its standard write
path; you do not configure them beyond the compress flag. The structural
reproducibility profile this page declares reflects that the writer pins the
DEFLATE level, so the compressed streams are deterministic, while document-level
identifiers may still vary between runs unless you also configure deterministic
settings. For the embedding mechanics behind subsetting, see the embed-and-subset
recipe linked below.
See also
- Embed and subset a TrueType font: register a face, render with it, and inspect the embedded subset tag.
- Compose text and fonts: the broader text and font composition surface that feeds the subsetting path.
- Configuration module reference: the full Config value object, its withers, and their defaults.
- Exception-aware error handling: the NextPDF exception hierarchy behind CompressionException, FontParsingException, and InvalidConfigException.