Batch generation over Connect with progress tracking
At a glance
Run a list of documents to completion from one client process over
NextPDF Connect, the engine’s standalone HTTP service distribution. This
recipe submits each render request to the async-job endpoint
POST /api/v1/jobs, polls each job with GET /api/v1/jobs/{id} until it
reaches a terminal state, reads the status and progress fields the
server reports for each job, and downloads every completed PDF from
GET /api/v1/jobs/{id}/result.
The job lifecycle is fixed and small. A job is pending, then running,
then exactly one terminal state: completed, failed, or cancelled.
The status response carries a progress integer from 0 to 100 when the
server tracks it, and a Retry-After header on every non-terminal poll
that tells you when to send the next request. Key each submission with an
Idempotency-Key so a retried submit returns the same job instead of
starting a second render.
This recipe uses the wire-level path. It calls the REST surface directly and does not assume a language-specific software development kit (SDK), so you can port the same flow to any HTTP client.
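The lifecycle above is small enough to capture in one type. The following is a minimal sketch, assuming PHP 8.1 or later; the `JobStatus` enum and its `isTerminal()` method are hypothetical names for illustration and are not part of Connect.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: the five job states named above, with the
 * terminal/non-terminal split kept in one place.
 */
enum JobStatus: string
{
    case Pending = 'pending';
    case Running = 'running';
    case Completed = 'completed';
    case Failed = 'failed';
    case Cancelled = 'cancelled';

    /** True for the three states after which the job no longer changes. */
    public function isTerminal(): bool
    {
        return match ($this) {
            self::Completed, self::Failed, self::Cancelled => true,
            self::Pending, self::Running => false,
        };
    }
}

// tryFrom() yields null for an unrecognized status string instead of throwing.
$status = JobStatus::tryFrom('running');
var_dump($status?->isTerminal()); // bool(false)
```

Using `tryFrom()` rather than `from()` lets a client treat an unexpected status string as unknown instead of crashing mid-batch.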
Install
The server side uses the standard Connect distribution:
composer require nextpdf/server
The PHP client in the production sample below uses a Hypertext Transfer Protocol (HTTP) client and message factories that conform to PSR-18 and PSR-17. The two packages below provide the interfaces only; pair them with the concrete implementation your project already standardizes on:
composer require psr/http-client psr/http-factory
Conceptual overview
The async-job surface separates submission from retrieval. You do not hold one long HTTP connection open per document. Instead you submit a job, receive an identifier, and poll a cheap status endpoint until the job finishes. That shape makes a batch manageable: the client tracks N independent jobs at once without N blocked connections.
Three endpoints carry the flow:
- POST /api/v1/jobs accepts the same render request body as the synchronous /api/v1/render endpoint: a page_size, an orientation, and an ordered operations array. It returns 201 Created for a new job, or 200 OK when an Idempotency-Key matches a job you already submitted.
- GET /api/v1/jobs/{id} returns the current job record. For a non-terminal job it also sets a Retry-After header (the server uses a 2-second interval) and a poll_url field. Honor the header instead of polling in a tight loop.
- GET /api/v1/jobs/{id}/result streams the finished PDF as application/pdf. It returns 409 Conflict if the job has not reached completed, so call it only once the status poll confirms the terminal state.
Every successful response shares one envelope: a data object with the
job fields, and a meta object with the request_id, timestamp,
duration_ms, and api_version. The job fields you read live under
data: data.status, data.progress, data.job_id, and on a completed
job data.result_url.
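To make the envelope concrete, here is a minimal sketch that separates `data` from `meta` before reading any job field. The `splitEnvelope()` function is a hypothetical helper, and the sample body's values (including the `request_id` and `api_version` strings) are made up for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: split a response body into its data and meta objects.
 *
 * @return array{data: array<string, mixed>, meta: array<string, mixed>}
 *
 * @throws JsonException            When the body is not valid JSON
 * @throws UnexpectedValueException When the data object is missing
 */
function splitEnvelope(string $json): array
{
    $decoded = json_decode($json, true, 32, JSON_THROW_ON_ERROR);

    if (!is_array($decoded) || !is_array($decoded['data'] ?? null)) {
        throw new UnexpectedValueException('Response is missing the data object.');
    }

    // meta is informational here, so a missing meta object becomes an empty array.
    $meta = is_array($decoded['meta'] ?? null) ? $decoded['meta'] : [];

    return ['data' => $decoded['data'], 'meta' => $meta];
}

// Illustrative body; every value is invented for this example.
$body = '{"data":{"job_id":"job_0123456789abcdef01234567","status":"running","progress":40},'
      . '"meta":{"request_id":"req_example","api_version":"v1"}}';

$envelope = splitEnvelope($body);
echo $envelope['data']['status'], ' ', $envelope['data']['progress'], "\n"; // running 40
```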
One caveat for the current release: the server processes a submitted job
inline before it answers the POST. In practice, the submit response may
already carry a terminal status, and the result may be ready on the
first poll. The polling-and-progress contract documented here is the
stable Application Programming Interface (API) shape. The server keeps it
unchanged as the processing backend moves to a queued worker pool, so a
client that polls is correct today and stays correct after that change.
Write the poll loop. Do not assume the first response is non-terminal, and
do not assume it is terminal either.
API surface
The server OpenAPI document and the JobHandler routing define the
Connect async-job REST surface:
- POST /api/v1/jobs: submit a render job. Optional Idempotency-Key request header. Body is a render request (operations is required and must hold at least one operation). Responses: 201 new, 200 idempotent replay, 422 invalid body, 409 idempotency conflict, 429 rate limited.
- GET /api/v1/jobs/{id}: poll status. Response 200 with the job record; Retry-After header present while non-terminal; 404 if the job does not exist or belongs to another client.
- GET /api/v1/jobs/{id}/result: download the PDF. 200 application/pdf when completed; 409 when not yet completed; 404 if unknown.
- DELETE /api/v1/jobs/{id}: cancel a pending or running job, or delete a completed one (204).
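The submit responses listed above each call for a different client reaction. The sketch below shows one possible mapping; `submitAction()` and its action names are hypothetical and describe a client policy, not anything the Connect API defines.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map a POST /api/v1/jobs status code to a client action.
 * The action names are illustrative only.
 */
function submitAction(int $httpStatus): string
{
    return match ($httpStatus) {
        201 => 'track-new-job',        // a new job was created
        200 => 'track-existing-job',   // idempotent replay of an earlier submit
        409 => 'fix-idempotency-key',  // key conflicts with an earlier submit
        422 => 'fix-request-body',     // the render request is invalid
        429 => 'back-off-and-retry',   // rate limited
        default => 'stop-and-report',  // anything the list above does not cover
    };
}

foreach ([201, 200, 422, 429, 500] as $code) {
    printf("%d -> %s\n", $code, submitAction($code));
}
```

The production sample in this recipe takes the stricter route and stops on anything other than 201 or 200; a finer-grained policy like this one is a design choice, not a requirement.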
The job record under data carries these fields, exactly as the server
serializes them.
- job_id: the identifier (a job_ prefix and 24 hexadecimal characters).
- status: one of pending, running, completed, failed, cancelled. The first two are non-terminal; the last three are terminal.
- created_at, and once set, started_at and completed_at: ISO-8601 timestamps.
- progress: an integer 0 to 100, present only when the server tracks it for the job; absent (treat as unknown) otherwise.
- error: a message string, present only on a failed job.
- result_url: present only on a completed job; the path to the result download.
- poll_url: present only while the job is non-terminal.
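Because the identifier has a fixed shape, a client can check it before building a URL from it. This is a minimal sketch; `looksLikeJobId()` is a hypothetical guard, and accepting both upper and lower case hexadecimal digits is an assumption, since the text above does not state the case.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guard: accept only identifiers shaped like "job_" followed by
 * 24 hexadecimal characters. Accepting both letter cases is an assumption.
 */
function looksLikeJobId(string $candidate): bool
{
    // \A and \z anchor the whole string, so a trailing newline is rejected too.
    return preg_match('/\Ajob_[0-9a-fA-F]{24}\z/', $candidate) === 1;
}

var_dump(looksLikeJobId('job_0123456789abcdef01234567')); // bool(true)
var_dump(looksLikeJobId('job_../../etc/passwd'));         // bool(false)
```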
Authentication is a bearer token in the Authorization header:
Authorization: Bearer npk_live_{kid}_{secret}.
Code sample — Quick start
This drives one job end to end at the wire level so you can see the three
calls and the fields they return. It submits, polls once, and downloads.
The production sample below adds the batch loop, the Retry-After wait,
and full error handling.
# 1. Submit an async render job. Capture the job_id from data.job_id.
curl -sS -X POST "$NEXTPDF_CONNECT_URL/api/v1/jobs" \
  -H "Authorization: Bearer $NEXTPDF_CONNECT_TOKEN" \
  -H 'Content-Type: application/json' \
  -H "Idempotency-Key: invoice-2026-04-0001" \
  -d '{"page_size":"A4","orientation":"portrait","operations":[{"type":"add_text","text":"Invoice 0001"}]}'

# 2. Poll status. Read data.status and data.progress; honor Retry-After.
curl -sS "$NEXTPDF_CONNECT_URL/api/v1/jobs/job_0123456789abcdef01234567" \
  -H "Authorization: Bearer $NEXTPDF_CONNECT_TOKEN"

# 3. Once data.status is "completed", download the PDF binary.
curl -sS "$NEXTPDF_CONNECT_URL/api/v1/jobs/job_0123456789abcdef01234567/result" \
  -H "Authorization: Bearer $NEXTPDF_CONNECT_TOKEN" \
  -o invoice-0001.pdf
Code sample — Production
This self-contained client submits a batch of render requests, caps how
many jobs are in flight at once, polls each job on the cadence the server
sets through Retry-After, reports the progress value the server
returns, downloads every completed PDF, and records failures. It uses a
PSR-18 HTTP client and PSR-17 factories, the transport contract the Connect
recipes standardize on. It also catches the most specific exception each
call can raise: Psr\Http\Client\ClientExceptionInterface for a transport
failure, and a typed BatchJobException for a server response that stops
the batch from continuing. No catch block is empty. Each one logs and
re-raises, or records a defined outcome.
Replace the in-line $documents list with your own inputs. Inject your
project’s concrete HTTP client and factories where the constructor expects
the PSR interfaces.
<?php
declare(strict_types=1);
require_once __DIR__ . '/vendor/autoload.php';

use Psr\Http\Client\ClientExceptionInterface;
use Psr\Http\Client\ClientInterface;
use Psr\Http\Message\RequestFactoryInterface;
use Psr\Http\Message\StreamFactoryInterface;

/**
 * Raised when a Connect job response prevents the batch from proceeding.
 *
 * Distinct from the PSR-18 transport exception: this means the request was
 * delivered and the server answered, but the answer is one the batch
 * cannot act on (a non-success status code, a body that is not the
 * expected envelope, or a job that never reaches a terminal state).
 */
final class BatchJobException extends RuntimeException
{
}

/**
 * Drives a batch of async render jobs over the NextPDF Connect REST surface.
 *
 * The client submits each render request, polls every job on the cadence
 * the server requests through Retry-After, and downloads each completed
 * PDF. It enforces bounded concurrency so a large batch never opens more
 * in-flight jobs than the host should track at once.
 */
final readonly class ConnectBatchRunner
{
    /**
     * @param non-empty-string $baseUrl     Connect base URL, no trailing slash
     * @param non-empty-string $bearerToken Connect API key (npk_live_...)
     * @param positive-int     $maxInFlight Concurrent jobs cap
     * @param positive-int     $maxPolls    Per-job poll attempts before giving up
     */
    public function __construct(
        private ClientInterface $httpClient,
        private RequestFactoryInterface $requestFactory,
        private StreamFactoryInterface $streamFactory,
        private string $baseUrl,
        private string $bearerToken,
        private int $maxInFlight = 8,
        private int $maxPolls = 150,
    ) {
    }

    /**
     * Render every document in the batch and write each completed PDF.
     *
     * @param array<non-empty-string, array<string, mixed>> $documents
     *     Map of stable document key to render request body. The key
     *     doubles as the Idempotency-Key, so a re-run of the same batch
     *     does not duplicate server-side work.
     * @param non-empty-string $outputDir Directory for the written PDFs
     *
     * @throws BatchJobException When the batch cannot proceed at all
     * @throws ClientExceptionInterface When the transport cannot send a request
     *
     * @return array<non-empty-string, string> Map of document key to a
     *     human-readable outcome line
     */
    public function run(array $documents, string $outputDir): array
    {
        $this->assertWritableDir($outputDir);

        $outcomes = [];

        // Process in bounded windows so the in-flight job count never
        // exceeds the configured cap, regardless of batch size.
        foreach (array_chunk($documents, $this->maxInFlight, preserve_keys: true) as $window) {
            $jobIds = [];

            foreach ($window as $key => $body) {
                $jobIds[$key] = $this->submit($key, $body);
            }

            foreach ($jobIds as $key => $jobId) {
                $record = $this->pollToTerminal($jobId);
                $outcomes[$key] = $this->finish($key, $record, $outputDir);
            }
        }

        return $outcomes;
    }

    /**
     * Submit one render job and return its identifier.
     *
     * @param non-empty-string $idempotencyKey Stable per-document key
     * @param array<string, mixed> $body Render request body
     *
     * @throws BatchJobException
     * @throws ClientExceptionInterface
     *
     * @return non-empty-string The job_id from data.job_id
     */
    private function submit(string $idempotencyKey, array $body): string
    {
        $request = $this->requestFactory
            ->createRequest('POST', $this->baseUrl . '/api/v1/jobs')
            ->withHeader('Authorization', 'Bearer ' . $this->bearerToken)
            ->withHeader('Content-Type', 'application/json')
            ->withHeader('Idempotency-Key', $idempotencyKey)
            ->withBody($this->streamFactory->createStream($this->encode($body)));

        $response = $this->httpClient->sendRequest($request);
        $status = $response->getStatusCode();

        // 201 new job; 200 idempotent replay. Anything else stops the batch.
        if ($status !== 201 && $status !== 200) {
            throw new BatchJobException(
                sprintf('Submit for "%s" returned HTTP %d.', $idempotencyKey, $status),
            );
        }

        $data = $this->decodeData($response->getBody()->__toString());
        $jobId = $data['job_id'] ?? null;

        if (!is_string($jobId) || $jobId === '') {
            throw new BatchJobException(
                sprintf('Submit for "%s" returned no job_id.', $idempotencyKey),
            );
        }

        return $jobId;
    }

    /**
     * Poll one job until it reaches a terminal state.
     *
     * Honors the Retry-After header on every non-terminal poll. Gives up
     * after maxPolls attempts and throws a BatchJobException, so a stuck
     * job stops the batch rather than blocking forever.
     *
     * @param non-empty-string $jobId
     *
     * @throws BatchJobException
     * @throws ClientExceptionInterface
     *
     * @return array<string, mixed> The terminal job record (data object)
     */
    private function pollToTerminal(string $jobId): array
    {
        $url = $this->baseUrl . '/api/v1/jobs/' . rawurlencode($jobId);

        for ($attempt = 0; $attempt < $this->maxPolls; $attempt++) {
            $request = $this->requestFactory
                ->createRequest('GET', $url)
                ->withHeader('Authorization', 'Bearer ' . $this->bearerToken);

            $response = $this->httpClient->sendRequest($request);
            $status = $response->getStatusCode();

            if ($status !== 200) {
                throw new BatchJobException(
                    sprintf('Poll for job "%s" returned HTTP %d.', $jobId, $status),
                );
            }

            $data = $this->decodeData($response->getBody()->__toString());
            $jobStatus = is_string($data['status'] ?? null) ? $data['status'] : 'unknown';
            $progress = is_int($data['progress'] ?? null) ? $data['progress'] : null;

            $this->logProgress($jobId, $jobStatus, $progress);

            // Terminal states: completed, failed, cancelled.
            if (in_array($jobStatus, ['completed', 'failed', 'cancelled'], strict: true)) {
                return $data;
            }

            // Non-terminal: wait the interval the server asked for.
            $this->waitRetryAfter($response->getHeaderLine('Retry-After'));
        }

        throw new BatchJobException(
            sprintf('Job "%s" did not finish within %d polls.', $jobId, $this->maxPolls),
        );
    }

    /**
     * Act on a terminal job record: download a completed PDF, or report.
     *
     * @param non-empty-string $key Document key
     * @param array<string, mixed> $record Terminal job record (data object)
     * @param non-empty-string $outputDir Where to write the PDF
     *
     * @throws BatchJobException
     * @throws ClientExceptionInterface
     *
     * @return string A human-readable outcome line
     */
    private function finish(string $key, array $record, string $outputDir): string
    {
        $jobStatus = is_string($record['status'] ?? null) ? $record['status'] : 'unknown';
        $jobId = is_string($record['job_id'] ?? null) ? $record['job_id'] : '';

        if ($jobStatus !== 'completed') {
            // A failed job carries an error message; surface it, do not swallow.
            $error = is_string($record['error'] ?? null) ? $record['error'] : 'no detail';

            return sprintf('%s -> %s (%s)', $key, $jobStatus, $error);
        }

        $path = rtrim($outputDir, '/\\') . DIRECTORY_SEPARATOR . $key . '.pdf';
        $this->download($jobId, $path);

        return sprintf('%s -> completed, written to %s', $key, $path);
    }

    /**
     * Download a completed job result and write it to a caller-controlled path.
     *
     * @param non-empty-string $jobId
     * @param non-empty-string $path Caller-controlled output path
     *
     * @throws BatchJobException
     * @throws ClientExceptionInterface
     */
    private function download(string $jobId, string $path): void
    {
        $request = $this->requestFactory
            ->createRequest('GET', $this->baseUrl . '/api/v1/jobs/' . rawurlencode($jobId) . '/result')
            ->withHeader('Authorization', 'Bearer ' . $this->bearerToken);

        $response = $this->httpClient->sendRequest($request);

        if ($response->getStatusCode() !== 200) {
            throw new BatchJobException(
                sprintf('Result download for job "%s" returned HTTP %d.', $jobId, $response->getStatusCode()),
            );
        }

        $bytes = $response->getBody()->__toString();

        // A 200 status alone does not prove the body is a PDF; check the marker.
        if (!str_starts_with($bytes, '%PDF')) {
            throw new BatchJobException(
                sprintf('Result for job "%s" is not a PDF.', $jobId),
            );
        }

        if (file_put_contents($path, $bytes) === false) {
            throw new BatchJobException(sprintf('Could not write result to "%s".', $path));
        }
    }

    /**
     * Sleep for the server-requested interval, with a safe floor and ceiling.
     */
    private function waitRetryAfter(string $retryAfter): void
    {
        $seconds = ctype_digit($retryAfter) ? (int) $retryAfter : 2;
        // Clamp to a sane band so a hostile header cannot stall or busy-loop.
        $seconds = max(1, min(30, $seconds));
        sleep($seconds);
    }

    /**
     * Emit a progress line. Replace with your logger.
     */
    private function logProgress(string $jobId, string $jobStatus, ?int $progress): void
    {
        $pct = $progress === null ? 'n/a' : $progress . '%';
        fwrite(STDERR, sprintf("[%s] status=%s progress=%s\n", $jobId, $jobStatus, $pct));
    }

    /**
     * Decode a response envelope and return its data object.
     *
     * @throws BatchJobException When the body is not the expected envelope
     *
     * @return array<string, mixed>
     */
    private function decodeData(string $json): array
    {
        try {
            /** @var mixed $decoded */
            $decoded = json_decode($json, true, 32, JSON_THROW_ON_ERROR);
        } catch (JsonException $e) {
            throw new BatchJobException('Response body is not valid JSON.', previous: $e);
        }

        if (!is_array($decoded) || !isset($decoded['data']) || !is_array($decoded['data'])) {
            throw new BatchJobException('Response is missing the data envelope.');
        }

        /** @var array<string, mixed> $data */
        $data = $decoded['data'];

        return $data;
    }

    /**
     * @param array<string, mixed> $body
     *
     * @throws BatchJobException
     */
    private function encode(array $body): string
    {
        try {
            return json_encode($body, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES);
        } catch (JsonException $e) {
            throw new BatchJobException('Render request body is not encodable.', previous: $e);
        }
    }

    /**
     * @param non-empty-string $dir
     *
     * @throws BatchJobException
     */
    private function assertWritableDir(string $dir): void
    {
        if (!is_dir($dir) || !is_writable($dir)) {
            throw new BatchJobException(sprintf('Output directory "%s" is not writable.', $dir));
        }
    }
}

// ---------------------------------------------------------------------------
// Wiring. Provide your project's concrete PSR-18 client and PSR-17 factories.
// ---------------------------------------------------------------------------

/** @var ClientInterface $httpClient */
/** @var RequestFactoryInterface $requestFactory */
/** @var StreamFactoryInterface $streamFactory */

$baseUrl = getenv('NEXTPDF_CONNECT_URL');
$token = getenv('NEXTPDF_CONNECT_TOKEN');

if ($baseUrl === false || $baseUrl === '' || $token === false || $token === '') {
    fwrite(STDERR, "Set NEXTPDF_CONNECT_URL and NEXTPDF_CONNECT_TOKEN.\n");
    exit(2);
}

/** @var array<non-empty-string, array<string, mixed>> $documents */
$documents = [
    'invoice-0001' => [
        'page_size' => 'A4',
        'orientation' => 'portrait',
        'operations' => [
            ['type' => 'add_text', 'text' => 'Invoice 0001'],
        ],
    ],
    'invoice-0002' => [
        'page_size' => 'A4',
        'orientation' => 'portrait',
        'operations' => [
            ['type' => 'add_text', 'text' => 'Invoice 0002'],
        ],
    ],
];

$runner = new ConnectBatchRunner(
    httpClient: $httpClient,
    requestFactory: $requestFactory,
    streamFactory: $streamFactory,
    baseUrl: rtrim($baseUrl, '/'),
    bearerToken: $token,
    maxInFlight: 8,
);

try {
    $outcomes = $runner->run($documents, getenv('NEXTPDF_COOKBOOK_OUTPUT') ?: sys_get_temp_dir());
} catch (BatchJobException $e) {
    fwrite(STDERR, 'Batch stopped: ' . $e->getMessage() . "\n");
    exit(1);
} catch (ClientExceptionInterface $e) {
    fwrite(STDERR, 'Transport failure: ' . $e->getMessage() . "\n");
    exit(1);
}

foreach ($outcomes as $line) {
    echo $line, "\n";
}
Expected STDOUT is one line per document. Paths depend on your output directory:
invoice-0001 -> completed, written to /tmp/invoice-0001.pdf
invoice-0002 -> completed, written to /tmp/invoice-0002.pdf
Edge cases & gotchas
- Read job fields under data, not at the top level. Every successful response is wrapped in a { "data": ..., "meta": ... } envelope. data.status and data.progress are the fields you act on; meta carries request_id for support correlation.
- progress can be absent. The server includes progress only when it tracks it for that job. Treat a missing field as “unknown”, not as zero, and drive your loop off status, which is always present.
- Submission may already be terminal. In the current release the server renders inline before answering the POST, so the submit response can carry status: completed and the result may be ready on the first poll. Your poll loop must accept a terminal state on attempt zero rather than insist on a pending first.
- Honor Retry-After. Non-terminal status responses set Retry-After (a 2-second interval). Polling faster wastes requests and invites a 429. Clamp the value to a sane band rather than trust it blindly.
- /result before completion is a 409. Call the result endpoint only after the status poll shows completed. A 409 Conflict means the job is not done; it is not a transport error.
- Idempotency-Key prevents duplicate work. A retried submit with the same key returns the original job (200 instead of 201). Use a stable per-document key so a network retry never starts a second render. A reused key with a different body is a 409 conflict.
- Jobs are owner-scoped. A job submitted under one API key is invisible to another; a cross-owner GET returns 404, not 403. Poll with the same credential you submitted with.
- A failed job carries an error message. Read data.error on a terminal failed status and record it. Do not retry blindly.
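One way to keep an Idempotency-Key stable across retries, while avoiding the 409 that a reused key with a different body produces, is to derive the key from both the document identifier and the request body. The sketch below is an assumption-laden illustration: `idempotencyKeyFor()` is a hypothetical helper, and it assumes the server accepts keys of this length and alphabet.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: build an Idempotency-Key from a document identifier
 * and a short digest of the render request body.
 *
 * @param array<string, mixed> $body Render request body
 *
 * @throws JsonException When the body cannot be encoded
 */
function idempotencyKeyFor(string $documentId, array $body): string
{
    // Assumes the body is built the same way on every run, so its JSON
    // encoding (including key order) is identical for identical content.
    $digest = hash('sha256', json_encode($body, JSON_THROW_ON_ERROR));

    return $documentId . '-' . substr($digest, 0, 16);
}

$body = [
    'page_size' => 'A4',
    'orientation' => 'portrait',
    'operations' => [['type' => 'add_text', 'text' => 'Invoice 0001']],
];

$first = idempotencyKeyFor('invoice-0001', $body);
$retry = idempotencyKeyFor('invoice-0001', $body);
var_dump($first === $retry); // bool(true): a retry reuses the same key
```

The trade-off is that an edited body yields a new key and therefore a new render. If a document key must map to exactly one job regardless of content, use the plain document key, as the production sample does.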
Performance
The cost of a batch is the sum of the renders plus the polling overhead.
Two levers control the client side. First, bound concurrency: the
maxInFlight cap fixes how many jobs are tracked at once, which keeps the
client’s open-request count and memory flat regardless of batch size. Set
it to match the server’s worker count, not higher; more in-flight jobs
than workers only lengthens each job’s queue wait. Second, respect the
poll interval: each poll is a cheap status read, but a tight loop increases
request volume and can trigger the rate limiter. The server’s 2-second
Retry-After is the right default, and the runner clamps the header value
to a 1-to-30-second band so a missing or out-of-range value can neither
busy-loop nor stall the window.
For very large batches, process in windows (the runner uses array_chunk)
rather than submit everything up front. That bounds both the client’s
tracked state and the server’s queue depth, so a malformed or oversized
batch fails inside one window instead of after thousands of submissions.
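The windowing and the poll budget are easy to check with small numbers. This sketch uses an illustrative five-document batch with placeholder bodies and a deliberately small cap of 2; the 150-poll budget and the 2-second and 30-second figures come from the runner's defaults described above.

```php
<?php
declare(strict_types=1);

// Illustrative batch: five document keys, each mapped to a placeholder body.
$documents = [
    'invoice-0001' => [], 'invoice-0002' => [], 'invoice-0003' => [],
    'invoice-0004' => [], 'invoice-0005' => [],
];

$maxInFlight = 2; // deliberately small so the windows are easy to read

// preserve_keys keeps the document keys; without it array_chunk reindexes
// each window from 0 and the keys are lost.
$windows = array_chunk($documents, $maxInFlight, preserve_keys: true);

foreach ($windows as $i => $window) {
    printf("window %d: %s\n", $i, implode(', ', array_keys($window)));
}
// window 0: invoice-0001, invoice-0002
// window 1: invoice-0003, invoice-0004
// window 2: invoice-0005

// Upper bound on the wait for one job: every poll is followed by one sleep.
$maxPolls = 150;
printf("poll budget: %d s at 2 s, %d s at the 30 s ceiling\n", $maxPolls * 2, $maxPolls * 30);
// poll budget: 300 s at 2 s, 4500 s at the 30 s ceiling
```

The second figure is worth noting when you size `maxPolls`: with the clamp ceiling at 30 seconds, the default budget can hold one job for up to 75 minutes.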
Security notes
- Keep the bearer token out of logs and URLs. The API key travels in the Authorization header only. Never place it in a query string, a log line, or a written artifact. The runner logs the job_id and status, never the credential.
- Derive output paths from caller-controlled keys. The runner builds each output path from the document key your code chose, joined to a fixed output directory, never from a value in a server response. Do not interpolate a job field into a filesystem path, which would open a path traversal.
- Validate the downloaded bytes. The runner checks a 200 from /result for the %PDF header before it writes the file. A successful download status is not on its own proof the body is a PDF.
- Treat the result as untrusted until inspected. A completed job means the server rendered bytes, not that those bytes are safe to forward. Run results through a structural inspection step before you hand them to a client or downstream system.
- Use a least-privilege key. The async-job surface is core-tier rendering. Issue the batch a key scoped to exactly the operations it needs, and rotate it on the schedule your secret-management policy sets.
- Bound the poll budget. maxPolls stops a stuck job from holding the client forever. When the budget runs out, the runner raises a BatchJobException instead of blocking, so one stuck job cannot tie up the process indefinitely.
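The path-derivation rule above can be enforced with a small guard. This is a minimal sketch rather than the runner's own code: `outputPathFor()` is a hypothetical helper, and the allowed key alphabet is an assumption you would adjust to your own document keys.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guard: build an output path from a caller-chosen document
 * key, rejecting any key that is not a plain file-name stem.
 *
 * @throws InvalidArgumentException When the key could escape the directory
 */
function outputPathFor(string $outputDir, string $key): string
{
    // The allowed alphabet is an assumption; widen it to suit your keys.
    // No slash or backslash can pass, and ".." is rejected explicitly.
    if (preg_match('/\A[A-Za-z0-9][A-Za-z0-9._-]*\z/', $key) !== 1 || str_contains($key, '..')) {
        throw new InvalidArgumentException(sprintf('Unsafe document key "%s".', $key));
    }

    return rtrim($outputDir, '/\\') . DIRECTORY_SEPARATOR . $key . '.pdf';
}

echo outputPathFor('/tmp/out/', 'invoice-0001'), "\n";

try {
    outputPathFor('/tmp/out/', '../etc/passwd');
} catch (InvalidArgumentException $e) {
    echo $e->getMessage(), "\n"; // Unsafe document key "../etc/passwd".
}
```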
Conformance
This recipe makes no normative standards claim. It consumes the NextPDF
Connect async-job REST endpoints (POST /api/v1/jobs,
GET /api/v1/jobs/{id}, GET /api/v1/jobs/{id}/result) and reads the job
record fields the server defines (status, progress, error,
result_url, poll_url). The %PDF header check on a downloaded result
confirms only that the response begins with the PDF marker; it is not a
validity or conformance determination. For a standards check across a set
of documents, use the Enterprise batch compliance tool. See
Batch standards check over Connect,
a different surface from the rendering jobs covered here.
See also
- Hello world over Connect: the smallest single render before you batch.
- Connect recipe conventions: the transport, authentication, and conformance contract every Connect recipe shares.
- Exception-aware error handling over Connect: how the server reports errors and how a client should react.
- Batch standards check over Connect: the Enterprise compliance surface, distinct from these render jobs.