Convert Office documents to PDF with Gotenberg

The Gotenberg bridge converts an Office document to PDF. It sends the document to a Gotenberg microservice over HTTPS and returns the PDF bytes. You describe the service with an immutable GotenbergConfig, wire a PSR-18 client and PSR-17 factories into GotenbergBridge, check service health, and convert a file from disk or bytes in memory. This guide covers extension-based format detection, the health probe, the typed failure contract, and the hand-off to NextPDF post-processing.

Prerequisites, stated up front:

  • NextPDF core and nextpdf/gotenberg are installed.
  • A Gotenberg service is reachable over HTTPS. The bridge rejects a plain http:// URL before any request leaves the process.
  • A PSR-18 client and PSR-17 request and stream factories are installed. For DNS and TLS pinning, you also supply a PSR-17 response factory.
  • The input is one of the six recognized Office formats: .docx, .xlsx, .pptx, .odt, .ods, or .odp. The bridge rejects any other extension with a ValueError.

This is a how-to guide. For a complete, runnable program, read the Gotenberg quickstart.

Install the bridge, a PSR-18 client, and PSR-17 factories.

Terminal window
composer require nextpdf/gotenberg guzzlehttp/guzzle

Run a Gotenberg service that is reachable over HTTPS. Source any bearer token from a secrets manager or an injected environment value. The bridge never reads environment variables and never constructs an HTTP client; you supply both.

GotenbergBridge::convertFile() takes a path on disk. It canonicalizes the path to block traversal, maps the file extension to a supported format, checks the size and filename, and sends a multipart request to <apiUrl>/forms/libreoffice/convert. convertString() follows the same path for bytes you already hold; it uses the original filename so the extension can be detected.

Format detection uses the extension. The bridge maps .docx, .xlsx, .pptx, .odt, .ods, and .odp to their formats and rejects anything else with a ValueError before any network traffic. The result object exposes the detected source format as an enum value.
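
The extension-only rule above can be sketched as follows. This is a minimal illustration, not the bridge's code: the OfficeFormat enum and detectOfficeFormat() are hypothetical names, and lowercasing the extension before the lookup is an assumption of this sketch.

```php
<?php
declare(strict_types=1);

// Hypothetical enum for illustration; the bridge ships its own format enum.
enum OfficeFormat: string
{
    case Docx = 'docx';
    case Xlsx = 'xlsx';
    case Pptx = 'pptx';
    case Odt = 'odt';
    case Ods = 'ods';
    case Odp = 'odp';
}

/**
 * Map a filename to a format by extension alone, or throw ValueError.
 * Lowercasing the extension is an assumption of this sketch.
 */
function detectOfficeFormat(string $filename): OfficeFormat
{
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));

    return OfficeFormat::tryFrom($extension)
        ?? throw new ValueError("Unsupported extension: '{$extension}'");
}
```

Because the lookup never opens the file, it is cheap enough to run before any I/O, and it says nothing about whether the contents match the name.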

The bridge makes one synchronous HTTP round-trip wrapped in validation. It does not retry, queue, cache, or rate-limit; those controls belong in the application around the bridge. Treat each conversion as a remote call to a service you operate but do not control in-process, and design for its latency and failure modes.

The bridge reports failures as typed exceptions and never returns a partial or unvalidated result:

  • A non-200 status, a Content-Type without application/pdf, or a body that does not begin with %PDF raises GotenbergConvertException. The bridge returns a result only when all three checks pass.
  • A PSR-18 client failure, including a network failure or timeout, is wrapped as GotenbergConvertException with the original exception as its cause.
  • Validation failures (non-HTTPS URL, private or reserved address, oversized input, unsafe filename) raise RuntimeException before any network traffic.
  • An unrecognized file extension raises ValueError before any network traffic.

// Configuration (final readonly):
new GotenbergConfig(
    string $apiUrl,                          // required, must be HTTPS
    int $timeout = 30,                       // hard transfer timeout, seconds
    int $maxFileSize = 52_428_800,           // 50 MiB
    string $apiKey = '',                     // #[SensitiveParameter]; Bearer when non-empty
    list<string> $pinnedPublicKeys = [],     // sha256/<base64>
    list<string> $backupPublicKeys = [],
)
GotenbergConfig::fromArray(array $config): self
GotenbergConfig::isValid(): bool

// The bridge:
new GotenbergBridge(
    GotenbergConfig $config,
    ClientInterface $httpClient,                            // PSR-18
    RequestFactoryInterface $requestFactory,                // PSR-17
    StreamFactoryInterface $streamFactory,                  // PSR-17
    ?LoggerInterface $logger = null,                        // PSR-3
    ?HtmlSecurityPolicyInterface $htmlSecurityPolicy = null,
    ?ResponseFactoryInterface $responseFactory = null,      // enables pinned transport
)
GotenbergBridge::isAvailable(): bool
GotenbergBridge::convertFile(string $path): GotenbergConvertResult
GotenbergBridge::convertString(string $bytes, string $originalFilename): GotenbergConvertResult

The result object exposes pdfData, the sourceFormat enum, isValid() (true when the body is non-empty and starts with %PDF), and size(). For the full field reference, the fromArray() key map, and the transport selection rules, see the Gotenberg configuration page linked under See also.
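
The three response checks in the failure contract (status 200, a Content-Type containing application/pdf, a body beginning with %PDF) can be sketched as one predicate. isAcceptablePdfResponse() is a hypothetical helper for illustration; the bridge runs its own checks internally, and the case-insensitive Content-Type match here is an assumption.

```php
<?php
declare(strict_types=1);

/**
 * Return true only when all three documented checks pass.
 * Hypothetical helper; not part of the bridge's API.
 */
function isAcceptablePdfResponse(int $status, string $contentType, string $body): bool
{
    return $status === 200
        // Substring match so parameters such as "; charset=..." are tolerated.
        && stripos($contentType, 'application/pdf') !== false
        // Every PDF file starts with the "%PDF" marker.
        && str_starts_with($body, '%PDF');
}
```

A predicate like this is useful in tests that stub the HTTP client, where you want to confirm which stubbed responses should surface as GotenbergConvertException.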

Describe the service, wire the bridge, probe it, and convert one file.

convert-quickstart.php
<?php
declare(strict_types=1);

require __DIR__ . '/vendor/autoload.php';

use GuzzleHttp\Client;
use GuzzleHttp\Psr7\HttpFactory;
use NextPDF\Gotenberg\GotenbergBridge;
use NextPDF\Gotenberg\GotenbergConfig;
use NextPDF\Gotenberg\GotenbergConvertException;

$config = new GotenbergConfig(
    apiUrl: 'https://gotenberg.example.com',
    timeout: 60,
    // The application reads the token; the bridge never reads the environment.
    apiKey: getenv('GOTENBERG_TOKEN') ?: '',
);

// Guzzle 7 is shown because the install step requires it; any PSR-18 client
// and PSR-17 factories work. HttpFactory implements all three factories.
$httpClient = new Client();
$requestFactory = $streamFactory = $responseFactory = new HttpFactory();

$bridge = new GotenbergBridge(
    config: $config,
    httpClient: $httpClient,           // your PSR-18 client
    requestFactory: $requestFactory,   // your PSR-17 factory
    streamFactory: $streamFactory,     // your PSR-17 factory
    responseFactory: $responseFactory, // enables the pinned transport
);

// Probe before converting. The probe validates the URL with no network
// traffic, then sends a HEAD to <apiUrl>/health.
if (!$bridge->isAvailable()) {
    throw new RuntimeException('Gotenberg is not reachable.');
}

try {
    $result = $bridge->convertFile('/path/to/report.docx');
} catch (GotenbergConvertException $exception) {
    // Transport failure, non-200, wrong Content-Type, or non-PDF body.
    throw $exception;
}

if (!$result->isValid()) {
    throw new RuntimeException('Result is not a valid PDF.');
}

file_put_contents('/path/to/report.pdf', $result->pdfData);

The class is NextPDF\Gotenberg\GotenbergConfig; the use statements in the example show the exact namespace your code must import. isAvailable() never throws: it returns false for an empty, non-HTTPS, or private-address URL and for any network error. Any status below 500 from /health counts as available.

A production conversion catches each failure type separately, retries only transport-level and 502/503/504 failures, and bounds concurrency on the caller side. The catch blocks below cover every failure type in the contract above.

OfficeConverter.php
<?php
declare(strict_types=1);

// Example namespace; replace it with your application's own.
namespace App\Conversion;

use NextPDF\Gotenberg\GotenbergBridge;
use NextPDF\Gotenberg\GotenbergConvertException;
use Psr\Log\LoggerInterface;
use RuntimeException;
use ValueError;

final readonly class OfficeConverter
{
    public function __construct(
        private GotenbergBridge $bridge,
        private LoggerInterface $logger,
    ) {}

    public function convert(string $path): string
    {
        try {
            $result = $this->bridge->convertFile($path);
        } catch (GotenbergConvertException $exception) {
            // Transport, non-200, wrong Content-Type, or non-PDF body.
            // Retry only on transport-level or 502/503/504 causes, with
            // bounded exponential backoff and jitter; never retry blindly.
            $this->logger->error('gotenberg.convert.failed', [
                'path' => basename($path),
                'exception' => $exception::class,
            ]);
            throw $exception;
        } catch (ValueError $exception) {
            // Extension is not one of the six recognized Office formats.
            $this->logger->warning('gotenberg.convert.unsupported_format', [
                'path' => basename($path),
            ]);
            throw $exception;
        } catch (RuntimeException $exception) {
            // Non-HTTPS URL, private address, oversized input, or unsafe name.
            $this->logger->error('gotenberg.convert.rejected', [
                'path' => basename($path),
                'exception' => $exception::class,
            ]);
            throw $exception;
        }

        if (!$result->isValid()) {
            throw new RuntimeException('Gotenberg returned an invalid PDF body.');
        }

        return $result->pdfData;
    }
}

Retry only on a transport-level GotenbergConvertException (a wrapped PSR-18 client exception) and on idempotent server errors (502, 503, 504). A 400-class response usually means the input is wrong, so a retry fails the same way. Cap total attempts and total wall time. Bound the number of in-flight conversions to the capacity your Gotenberg deployment sustains. The bridge itself is stateless and safe to use from many workers, but the service has finite conversion capacity.
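
The retry policy above can be sketched as a small caller-side helper: capped attempts, a total wall-time budget, and exponential backoff with full jitter. Everything in this block is hypothetical and lives outside the bridge. The retryable decision is left to a callback because this guide does not document how the exception exposes the HTTP status.

```php
<?php
declare(strict_types=1);

/**
 * Run $operation, retrying only when $isRetryable allows it.
 * Hypothetical helper; nothing here is part of the bridge.
 *
 * @param callable(): mixed         $operation
 * @param callable(Throwable): bool $isRetryable
 * @param ?callable(int): void      $sleepMs     injectable so tests need not sleep
 */
function retryWithBackoff(
    callable $operation,
    callable $isRetryable,
    int $maxAttempts = 3,
    int $baseDelayMs = 200,
    int $maxDelayMs = 5_000,
    float $maxTotalSeconds = 120.0,
    ?callable $sleepMs = null,
): mixed {
    $sleepMs ??= static function (int $ms): void {
        usleep($ms * 1000);
    };
    $deadline = microtime(true) + $maxTotalSeconds;

    for ($attempt = 1; ; $attempt++) {
        try {
            return $operation();
        } catch (Throwable $exception) {
            $outOfBudget = $attempt >= $maxAttempts || microtime(true) >= $deadline;
            if ($outOfBudget || !$isRetryable($exception)) {
                throw $exception;
            }
            // Full jitter: a random delay up to the capped exponential step.
            $ceiling = min($maxDelayMs, $baseDelayMs * 2 ** ($attempt - 1));
            $sleepMs(random_int(0, $ceiling));
        }
    }
}
```

A classifier for the bridge might treat a GotenbergConvertException whose getPrevious() is a Psr\Http\Client\ClientExceptionInterface as transport-level. That is an assumption about how the cause is attached; confirm it against the exception's reference before relying on it.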

  • Format detection is by extension. A .docx renamed to .txt is rejected with ValueError; a .txt renamed to .docx is sent to Gotenberg and fails there. When accepting uploads, trust the real format, not the name.
  • fromArray() is forgiving by design. It silently substitutes defaults for malformed input. Validate the source array in your boot path so a missing URL surfaces early as a configuration error, not as a per-conversion exception.
  • The size cap is enforced in-process. maxFileSize (default 50 MiB) is checked before the request is sent, so an oversized file never consumes service capacity. Lower the cap to match what your documents need; a smaller cap is a cheaper denial-of-service control.
  • The probe is not free. Call isAvailable() from a readiness or health endpoint, not before every conversion. Running it per conversion doubles your request rate against the service for no benefit.
  • No in-process caching. If the same document is converted repeatedly, cache the resulting PDF in your application, keyed by a content hash of the input.
  • renderTimeMs is yours to set. The result’s timing field is 0.0 unless your integration measures and sets it. Time the call yourself if you need the number.
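
For the first point in the list above, one coarse content check is available before the bytes reach the bridge: all six recognized formats (OOXML and OpenDocument) are ZIP containers, so a genuine file starts with the ZIP local-file-header signature. The sketch below is a hypothetical helper. It is a necessary check, not a sufficient one, since any ZIP archive passes it.

```php
<?php
declare(strict_types=1);

/**
 * Coarse pre-check for uploads: OOXML and OpenDocument files are ZIP
 * containers, so their first four bytes are "PK\x03\x04".
 * Hypothetical helper; this does not prove the file is a valid document.
 */
function looksLikeZipContainer(string $bytes): bool
{
    return str_starts_with($bytes, "PK\x03\x04");
}
```

With this in place, a .txt renamed to .docx is rejected in your application instead of consuming a conversion slot on the service.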

For the duration of the request, a conversion holds one connection and one LibreOffice worker on the Gotenberg side, and Office conversion takes time. Set timeout from measured conversion latency for your real documents, with headroom. Keep it below any upstream gateway or PHP max_execution_time, so the bridge times out first and you get a typed exception instead of a killed process. Bound concurrency with a queue, semaphore, or worker pool sized to the service capacity. There is no in-process cache; add one in your application if you convert the same input repeatedly.
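
An application-side cache keyed by a content hash could look like the sketch below. ContentHashPdfCache is a hypothetical class with a filesystem backend chosen for brevity; it assumes the directory already exists and is writable, and it does no eviction.

```php
<?php
declare(strict_types=1);

// Hypothetical application-side cache; not part of the bridge.
final class ContentHashPdfCache
{
    public function __construct(private readonly string $directory) {}

    /**
     * Return the cached PDF for these input bytes, converting on a miss.
     *
     * @param callable(string): string $convert receives input bytes, returns PDF bytes
     */
    public function remember(string $inputBytes, callable $convert): string
    {
        // Identical input bytes always map to the same cache file.
        $path = $this->directory . '/' . hash('sha256', $inputBytes) . '.pdf';

        if (is_file($path)) {
            $cached = file_get_contents($path);
            if ($cached !== false) {
                return $cached;
            }
        }

        $pdf = $convert($inputBytes);

        // Write to a temporary name, then rename, so readers never see a partial file.
        $temp = $path . '.' . bin2hex(random_bytes(6)) . '.tmp';
        if (file_put_contents($temp, $pdf) !== false) {
            rename($temp, $path);
        }

        return $pdf;
    }
}
```

With the bridge, the callback would be along the lines of fn (string $bytes) => $bridge->convertString($bytes, 'report.docx')->pdfData, so a repeated document costs one hash and one file read instead of a conversion.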

  • HTTPS and address screening before sending. The bridge rejects a non-HTTPS URL and a destination that resolves into private or reserved address space before any request leaves the process. Each retried call re-runs that validation, so a retry cannot bypass the SSRF guard.
  • Pinned transport on request. When you supply a response factory and pins (or there is a resolved IP set), the bridge binds the connection to the resolved addresses, enforces SPKI pinning, verifies peer and host, applies the timeout, and disables redirect following. Configure a backup pin before a certificate rotation.
  • Do not trust the declared content type of an upload. When accepting user uploads, validate the real file type yourself; the extension-to-format map is a routing decision, not an authenticity check.
  • Secrets are redacted and immutable. apiKey carries #[SensitiveParameter], and the config is final readonly. Source the token from a secrets manager; never commit it. The logged conversion entry carries the URL, filename, format, and content length — never the file contents or the token.
  • Never write an empty catch block. Each example catches the specific type and logs with context.
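
The HTTPS and address screening in the first point above can be illustrated with PHP's built-in IP filter. This is a sketch of the idea only: assertSafeDestination() is a hypothetical function, the resolved addresses are passed in so the example needs no DNS, and the bridge's real screening may cover more cases than these two filter flags.

```php
<?php
declare(strict_types=1);

/**
 * Reject a destination that is not HTTPS or that resolves into private or
 * reserved address space. Hypothetical helper; not the bridge's code.
 *
 * @param list<string> $resolvedIps addresses the hostname resolved to
 */
function assertSafeDestination(string $url, array $resolvedIps): void
{
    if (strtolower((string) parse_url($url, PHP_URL_SCHEME)) !== 'https') {
        throw new RuntimeException('Destination must use HTTPS.');
    }
    if ($resolvedIps === []) {
        throw new RuntimeException('Destination did not resolve.');
    }
    foreach ($resolvedIps as $ip) {
        // The flags fail private ranges (such as 10.0.0.0/8) and reserved
        // ranges (such as 127.0.0.0/8 and 169.254.0.0/16).
        $public = filter_var(
            $ip,
            FILTER_VALIDATE_IP,
            FILTER_FLAG_NO_PRIV_RANGE | FILTER_FLAG_NO_RES_RANGE,
        );
        if ($public === false) {
            throw new RuntimeException("Destination resolves to a non-public address: {$ip}");
        }
    }
}
```

Checking every resolved address, not just the first, matters because a hostname can resolve to a mix of public and internal addresses.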

For the full security and deployment model, see the Gotenberg security and operations page. The PSR-18 transport contract and the do-not-trust-content-type guidance are pinned to their clauses on the upstream production-usage page.

This guide makes no normative standards claim of its own. The bridge’s PSR-18 transport behaviour (a client raises only when it cannot send or parse a response; a 4xx/5xx is a normal return value), the file-upload validation guidance, and the TLS-pinning model are pinned to PSR-18, OWASP, and RFC 7469 on the upstream Gotenberg production-usage and configuration pages. This cookbook page restates the usage and defers those citations to those pages. The bridge produces PDF bytes and stops. Signing, PDF/A profiles, and watermarking are NextPDF post-processing concerns and a commercial-edition capability, not part of this bridge.