
Batch generation over Connect with progress tracking

Run a list of documents to completion from one client process over NextPDF Connect, the engine’s standalone HTTP service distribution. This recipe submits each render request to the async-job endpoint POST /api/v1/jobs, polls each job with GET /api/v1/jobs/{id} until it reaches a terminal state, reads the status and progress fields the server reports for each job, and downloads every completed PDF from GET /api/v1/jobs/{id}/result.

The job lifecycle is fixed and small. A job starts as pending, moves to running, and ends in exactly one terminal state: completed, failed, or cancelled. The status response carries a progress integer from 0 to 100 when the server tracks it, and every non-terminal poll returns a Retry-After header that tells you when to send the next request. Key each submission with an Idempotency-Key so a retried submit returns the same job instead of starting a second render.
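The lifecycle above can be modeled as a small enum, so the terminal check lives in one place. This is a minimal sketch: the `JobStatus` name and its `isTerminal()` method are hypothetical helpers, not part of Connect. Only the five status strings come from the job record.

```php
<?php
declare(strict_types=1);

// Hypothetical helper: the enum and method names are illustrative only.
enum JobStatus: string
{
    case Pending = 'pending';
    case Running = 'running';
    case Completed = 'completed';
    case Failed = 'failed';
    case Cancelled = 'cancelled';

    /** True once the job can no longer change state. */
    public function isTerminal(): bool
    {
        return match ($this) {
            self::Completed, self::Failed, self::Cancelled => true,
            self::Pending, self::Running => false,
        };
    }
}

// tryFrom() returns null for a value outside the documented set, so an
// unexpected status is handled explicitly instead of being guessed at.
$status = JobStatus::tryFrom('running');
var_dump($status?->isTerminal()); // bool(false)
```

Because the `match` has no default arm, adding a sixth case later forces a decision about whether it is terminal.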

This recipe uses the wire-level path. It calls the REST surface directly and does not assume a language-specific software development kit (SDK), so you can port the same flow to any HTTP client.

The server side uses the standard Connect distribution:

composer require nextpdf/server

The PHP client in the production sample below uses an HTTP client and message factories that conform to PSR-18 and PSR-17. The packages below provide the interfaces only. You also need a concrete implementation of each, so use the one your project already standardizes on:

composer require psr/http-client psr/http-factory

The async-job surface separates submission from retrieval. You do not hold one long HTTP connection open per document. Instead you submit a job, receive an identifier, and poll a cheap status endpoint until the job finishes. That shape makes a batch manageable: the client tracks N independent jobs at once without N blocked connections.

Three endpoints carry the flow:

  • POST /api/v1/jobs accepts the same render request body as the synchronous /api/v1/render endpoint: a page_size, an orientation, and an ordered operations array. It returns 201 Created for a new job, or 200 OK when an Idempotency-Key matches a job you already submitted.
  • GET /api/v1/jobs/{id} returns the current job record. For a non-terminal job it also sets a Retry-After header (the server uses a 2-second interval) and a poll_url field. Honor the header instead of polling in a tight loop.
  • GET /api/v1/jobs/{id}/result streams the finished PDF as application/pdf. It returns 409 Conflict if the job has not reached completed, so call it only once the status poll confirms the terminal state.

Every successful response shares one envelope: a data object with the job fields, and a meta object with the request_id, timestamp, duration_ms, and api_version. The job fields you read live under data: data.status, data.progress, data.job_id, and on a completed job data.result_url.
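To make the envelope concrete, the sketch below decodes an illustrative status response and reads the fields named above. Every value in the JSON is invented for the example, including the formats of `request_id`, `timestamp`, and `api_version`. Only the key names come from this recipe.

```php
<?php
declare(strict_types=1);

// Illustrative envelope only: all values are made up for this example.
$json = <<<'JSON'
{
  "data": {"job_id": "job_0123456789abcdef01234567", "status": "running", "progress": 40},
  "meta": {"request_id": "req_example", "timestamp": "2026-04-01T12:00:00Z", "duration_ms": 3, "api_version": "v1"}
}
JSON;

$envelope = json_decode($json, true, 32, JSON_THROW_ON_ERROR);

// Job fields live under data; meta carries request metadata.
$data = $envelope['data'];
$meta = $envelope['meta'];

// progress is optional, so a missing key means "unknown", not zero.
$progress = $data['progress'] ?? null;
$label = $progress === null ? 'unknown' : $progress . '%';

printf("%s is %s (%s), request %s\n", $data['job_id'], $data['status'], $label, $meta['request_id']);
// job_0123456789abcdef01234567 is running (40%), request req_example
```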

One caveat for the current release: the server processes a submitted job inline before it answers the POST. In practice, the submit response may already carry a terminal status, and the result may be ready on the first poll. The polling-and-progress contract documented here is the stable Application Programming Interface (API) shape. The server keeps it unchanged as the processing backend moves to a queued worker pool, so a client that polls is correct today and stays correct after that change. Write the poll loop. Do not assume the first response is non-terminal, and do not assume it is terminal either.

The server OpenAPI document and the JobHandler routing define the Connect async-job REST surface:

  • POST /api/v1/jobs: submit a render job. Optional Idempotency-Key request header. Body is a render request (operations is required and must hold at least one operation). Responses: 201 new, 200 idempotent replay, 422 invalid body, 409 idempotency conflict, 429 rate limited.
  • GET /api/v1/jobs/{id}: poll status. Response 200 with the job record; Retry-After header present while non-terminal; 404 if the job does not exist or belongs to another client.
  • GET /api/v1/jobs/{id}/result: download the PDF. 200 application/pdf when completed; 409 when not yet completed; 404 if unknown.
  • DELETE /api/v1/jobs/{id}: cancel a pending or running job, or delete a completed one (204).
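The submit status codes listed above map naturally onto a small classifier. This is a sketch under assumptions: the function name and the returned labels are hypothetical, and the retry guidance in the comments is one reasonable client policy, not something the server mandates.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: translate a POST /api/v1/jobs status code into a label
 * the batch can branch on. The labels are illustrative, not server-defined.
 */
function classifySubmitStatus(int $httpStatus): string
{
    return match ($httpStatus) {
        201 => 'created',              // new job
        200 => 'replayed',             // Idempotency-Key matched an earlier submit
        409 => 'idempotency_conflict', // same key, different body: fix the input
        422 => 'invalid_body',         // fix the request; a retry will not help
        429 => 'rate_limited',         // back off before sending again
        default => 'unexpected',
    };
}

foreach ([201, 200, 422, 500] as $code) {
    printf("%d -> %s\n", $code, classifySubmitStatus($code));
}
// 201 -> created
// 200 -> replayed
// 422 -> invalid_body
// 500 -> unexpected
```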

The job record under data carries these fields, exactly as the server serializes them.

  • job_id: the identifier (a job_ prefix and 24 hexadecimal characters).
  • status: one of pending, running, completed, failed, cancelled. The first two are non-terminal; the last three are terminal.
  • created_at, and once set, started_at and completed_at: ISO-8601 timestamps.
  • progress: an integer 0 to 100, present only when the server tracks it for the job; absent (treat as unknown) otherwise.
  • error: a message string, present only on a failed job.
  • result_url: present only on a completed job; the path to the result download.
  • poll_url: present only while the job is non-terminal.
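One way to consume the record is to map it onto a typed value object once, at the boundary. The sketch below assumes a hypothetical `JobRecord` class. The `job_id` pattern follows the description above but accepts either letter case, which is an assumption, and the sample `result_url` value is invented.

```php
<?php
declare(strict_types=1);

// Hypothetical value object for the data object of a job response.
final readonly class JobRecord
{
    private const STATUSES = ['pending', 'running', 'completed', 'failed', 'cancelled'];

    public function __construct(
        public string $jobId,
        public string $status,
        public ?int $progress,
        public ?string $error,
        public ?string $resultUrl,
        public ?string $pollUrl,
    ) {}

    /** @param array<string, mixed> $data */
    public static function fromData(array $data): self
    {
        $jobId = $data['job_id'] ?? null;
        if (!is_string($jobId) || preg_match('/\Ajob_[0-9a-f]{24}\z/i', $jobId) !== 1) {
            throw new UnexpectedValueException('job_id is missing or malformed.');
        }
        $status = $data['status'] ?? null;
        if (!in_array($status, self::STATUSES, true)) {
            throw new UnexpectedValueException('status is missing or not a documented value.');
        }
        $progress = $data['progress'] ?? null;
        if (!is_int($progress) || $progress < 0 || $progress > 100) {
            $progress = null; // absent or out of range: treat as unknown
        }
        return new self(
            $jobId,
            $status,
            $progress,
            is_string($data['error'] ?? null) ? $data['error'] : null,
            is_string($data['result_url'] ?? null) ? $data['result_url'] : null,
            is_string($data['poll_url'] ?? null) ? $data['poll_url'] : null,
        );
    }
}

$record = JobRecord::fromData([
    'job_id' => 'job_0123456789abcdef01234567',
    'status' => 'completed',
    'result_url' => '/api/v1/jobs/job_0123456789abcdef01234567/result', // invented value
]);
echo $record->status, ' ', $record->progress ?? 'unknown', "\n"; // completed unknown
```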

Authentication is a bearer token in the Authorization header: Authorization: Bearer npk_live_{kid}_{secret}.

The following commands drive one job end to end at the wire level so you can see the three calls and the fields they return. They submit, poll once, and download. The production sample below adds the batch loop, the Retry-After wait, and full error handling.

# 1. Submit an async render job. Capture the job_id from data.job_id.
curl -sS -X POST "$NEXTPDF_CONNECT_URL/api/v1/jobs" \
  -H "Authorization: Bearer $NEXTPDF_CONNECT_TOKEN" \
  -H 'Content-Type: application/json' \
  -H "Idempotency-Key: invoice-2026-04-0001" \
  -d '{"page_size":"A4","orientation":"portrait","operations":[{"type":"add_text","text":"Invoice 0001"}]}'
# 2. Poll status, replacing the placeholder job ID with the job_id from step 1.
#    Read data.status and data.progress; honor Retry-After.
curl -sS "$NEXTPDF_CONNECT_URL/api/v1/jobs/job_0123456789abcdef01234567" \
  -H "Authorization: Bearer $NEXTPDF_CONNECT_TOKEN"
# 3. Once data.status is "completed", download the PDF binary.
curl -sS "$NEXTPDF_CONNECT_URL/api/v1/jobs/job_0123456789abcdef01234567/result" \
  -H "Authorization: Bearer $NEXTPDF_CONNECT_TOKEN" \
  -o invoice-0001.pdf

This self-contained client submits a batch of render requests, caps how many jobs are in flight at once, polls each job on the cadence the server sets through Retry-After, reports the progress value the server returns, downloads every completed PDF, and records failures. It uses a PSR-18 HTTP client and PSR-17 factories, the transport contract the Connect recipes standardize on. It also catches the most specific exception each call can raise: Psr\Http\Client\ClientExceptionInterface for a transport failure, and a typed BatchJobException for a server response that stops the batch from continuing. No catch block is empty. Each one logs and re-raises, or records a defined outcome.

Replace the inline $documents list with your own inputs. Inject your project’s concrete HTTP client and factories where the constructor expects the PSR interfaces.

<?php

declare(strict_types=1);

require_once __DIR__ . '/vendor/autoload.php';

use Psr\Http\Client\ClientExceptionInterface;
use Psr\Http\Client\ClientInterface;
use Psr\Http\Message\RequestFactoryInterface;
use Psr\Http\Message\StreamFactoryInterface;

/**
 * Raised when a Connect job response prevents the batch from proceeding.
 *
 * Distinct from the PSR-18 transport exception: this means the request was
 * delivered and the server answered, but the answer is one the batch
 * cannot act on (a non-success status code, or a job that ended in a
 * terminal failure).
 */
final class BatchJobException extends RuntimeException
{
}
/**
 * Drives a batch of async render jobs over the NextPDF Connect REST surface.
 *
 * The client submits each render request, polls every job on the cadence
 * the server requests through Retry-After, and downloads each completed
 * PDF. It enforces bounded concurrency so a large batch never opens more
 * in-flight jobs than the host should track at once.
 */
final readonly class ConnectBatchRunner
{
    /**
     * @param non-empty-string $baseUrl     Connect base URL, no trailing slash
     * @param non-empty-string $bearerToken Connect API key (npk_live_...)
     * @param positive-int     $maxInFlight Concurrent jobs cap
     * @param positive-int     $maxPolls    Per-job poll attempts before giving up
     */
    public function __construct(
        private ClientInterface $httpClient,
        private RequestFactoryInterface $requestFactory,
        private StreamFactoryInterface $streamFactory,
        private string $baseUrl,
        private string $bearerToken,
        private int $maxInFlight = 8,
        private int $maxPolls = 150,
    ) {}
    /**
     * Render every document in the batch and write each completed PDF.
     *
     * A job that fails after submission (poll timeout, unexpected status,
     * or a bad download) is recorded as an outcome and does not stop the
     * remaining jobs.
     *
     * @param array<non-empty-string, array<string, mixed>> $documents
     *     Map of stable document key to render request body. The key
     *     doubles as the Idempotency-Key, so a re-run of the same batch
     *     does not duplicate server-side work.
     * @param non-empty-string $outputDir Directory for the written PDFs
     *
     * @throws BatchJobException When the batch cannot proceed at all
     * @throws ClientExceptionInterface When the transport cannot send a request
     *
     * @return array<non-empty-string, string> Map of document key to a
     *     human-readable outcome line
     */
    public function run(array $documents, string $outputDir): array
    {
        $this->assertWritableDir($outputDir);
        $outcomes = [];
        // Process in bounded windows so the in-flight job count never
        // exceeds the configured cap, regardless of batch size.
        foreach (array_chunk($documents, $this->maxInFlight, preserve_keys: true) as $window) {
            $jobIds = [];
            foreach ($window as $key => $body) {
                $jobIds[$key] = $this->submit($key, $body);
            }
            foreach ($jobIds as $key => $jobId) {
                try {
                    $record = $this->pollToTerminal($jobId);
                    $outcomes[$key] = $this->finish($key, $record, $outputDir);
                } catch (BatchJobException $e) {
                    // Record the per-job failure and keep going, so one
                    // stuck or broken job does not abort the whole batch.
                    $outcomes[$key] = sprintf('%s -> error (%s)', $key, $e->getMessage());
                }
            }
        }
        return $outcomes;
    }
    /**
     * Submit one render job and return its identifier.
     *
     * @param non-empty-string     $idempotencyKey Stable per-document key
     * @param array<string, mixed> $body           Render request body
     *
     * @throws BatchJobException
     * @throws ClientExceptionInterface
     *
     * @return non-empty-string The job_id from data.job_id
     */
    private function submit(string $idempotencyKey, array $body): string
    {
        $request = $this->requestFactory
            ->createRequest('POST', $this->baseUrl . '/api/v1/jobs')
            ->withHeader('Authorization', 'Bearer ' . $this->bearerToken)
            ->withHeader('Content-Type', 'application/json')
            ->withHeader('Idempotency-Key', $idempotencyKey)
            ->withBody($this->streamFactory->createStream($this->encode($body)));
        $response = $this->httpClient->sendRequest($request);
        $status = $response->getStatusCode();
        // 201 new job; 200 idempotent replay. Anything else stops the batch.
        if ($status !== 201 && $status !== 200) {
            throw new BatchJobException(
                sprintf('Submit for "%s" returned HTTP %d.', $idempotencyKey, $status),
            );
        }
        $data = $this->decodeData($response->getBody()->__toString());
        $jobId = $data['job_id'] ?? null;
        if (!is_string($jobId) || $jobId === '') {
            throw new BatchJobException(
                sprintf('Submit for "%s" returned no job_id.', $idempotencyKey),
            );
        }
        return $jobId;
    }
    /**
     * Poll one job until it reaches a terminal state.
     *
     * Honours the Retry-After header on every non-terminal poll. Gives up
     * after maxPolls attempts and reports the wait as a failure so the
     * batch records it rather than blocking forever.
     *
     * @param non-empty-string $jobId
     *
     * @throws BatchJobException
     * @throws ClientExceptionInterface
     *
     * @return array<string, mixed> The terminal job record (data object)
     */
    private function pollToTerminal(string $jobId): array
    {
        $url = $this->baseUrl . '/api/v1/jobs/' . rawurlencode($jobId);
        for ($attempt = 0; $attempt < $this->maxPolls; $attempt++) {
            $request = $this->requestFactory
                ->createRequest('GET', $url)
                ->withHeader('Authorization', 'Bearer ' . $this->bearerToken);
            $response = $this->httpClient->sendRequest($request);
            $status = $response->getStatusCode();
            if ($status !== 200) {
                throw new BatchJobException(
                    sprintf('Poll for job "%s" returned HTTP %d.', $jobId, $status),
                );
            }
            $data = $this->decodeData($response->getBody()->__toString());
            $jobStatus = is_string($data['status'] ?? null) ? $data['status'] : 'unknown';
            $progress = is_int($data['progress'] ?? null) ? $data['progress'] : null;
            $this->logProgress($jobId, $jobStatus, $progress);
            // Terminal states: completed, failed, cancelled.
            if (in_array($jobStatus, ['completed', 'failed', 'cancelled'], strict: true)) {
                return $data;
            }
            // Non-terminal: wait the interval the server asked for.
            $this->waitRetryAfter($response->getHeaderLine('Retry-After'));
        }
        throw new BatchJobException(
            sprintf('Job "%s" did not finish within %d polls.', $jobId, $this->maxPolls),
        );
    }
    /**
     * Act on a terminal job record: download a completed PDF, or report.
     *
     * @param non-empty-string     $key       Document key
     * @param array<string, mixed> $record    Terminal job record (data object)
     * @param non-empty-string     $outputDir Where to write the PDF
     *
     * @throws BatchJobException
     * @throws ClientExceptionInterface
     *
     * @return string A human-readable outcome line
     */
    private function finish(string $key, array $record, string $outputDir): string
    {
        $jobStatus = is_string($record['status'] ?? null) ? $record['status'] : 'unknown';
        $jobId = is_string($record['job_id'] ?? null) ? $record['job_id'] : '';
        if ($jobStatus !== 'completed') {
            // A failed job carries an error message; surface it, do not swallow.
            $error = is_string($record['error'] ?? null) ? $record['error'] : 'no detail';
            return sprintf('%s -> %s (%s)', $key, $jobStatus, $error);
        }
        $path = rtrim($outputDir, '/\\') . DIRECTORY_SEPARATOR . $key . '.pdf';
        $this->download($jobId, $path);
        return sprintf('%s -> completed, written to %s', $key, $path);
    }
    /**
     * Download a completed job result and write it to a caller-derived path.
     *
     * @param non-empty-string $jobId
     * @param non-empty-string $path  Caller-controlled output path
     *
     * @throws BatchJobException
     * @throws ClientExceptionInterface
     */
    private function download(string $jobId, string $path): void
    {
        $request = $this->requestFactory
            ->createRequest('GET', $this->baseUrl . '/api/v1/jobs/' . rawurlencode($jobId) . '/result')
            ->withHeader('Authorization', 'Bearer ' . $this->bearerToken);
        $response = $this->httpClient->sendRequest($request);
        if ($response->getStatusCode() !== 200) {
            throw new BatchJobException(
                sprintf('Result download for job "%s" returned HTTP %d.', $jobId, $response->getStatusCode()),
            );
        }
        $bytes = $response->getBody()->__toString();
        // Check only that the body begins with the PDF marker before writing.
        if (!str_starts_with($bytes, '%PDF')) {
            throw new BatchJobException(
                sprintf('Result for job "%s" is not a PDF.', $jobId),
            );
        }
        if (file_put_contents($path, $bytes) === false) {
            throw new BatchJobException(sprintf('Could not write result to "%s".', $path));
        }
    }
    /**
     * Sleep for the server-requested interval, with a safe floor and ceiling.
     */
    private function waitRetryAfter(string $retryAfter): void
    {
        $seconds = ctype_digit($retryAfter) ? (int) $retryAfter : 2;
        // Clamp to a sane band so a hostile header cannot stall or busy-loop.
        $seconds = max(1, min(30, $seconds));
        sleep($seconds);
    }
    /**
     * Emit a progress line. Replace with your logger.
     */
    private function logProgress(string $jobId, string $jobStatus, ?int $progress): void
    {
        $pct = $progress === null ? 'n/a' : $progress . '%';
        fwrite(STDERR, sprintf("[%s] status=%s progress=%s\n", $jobId, $jobStatus, $pct));
    }
    /**
     * Decode a response envelope and return its data object.
     *
     * @throws BatchJobException When the body is not the expected envelope
     *
     * @return array<string, mixed>
     */
    private function decodeData(string $json): array
    {
        try {
            /** @var mixed $decoded */
            $decoded = json_decode($json, true, 32, JSON_THROW_ON_ERROR);
        } catch (JsonException $e) {
            throw new BatchJobException('Response body is not valid JSON.', previous: $e);
        }
        if (!is_array($decoded) || !isset($decoded['data']) || !is_array($decoded['data'])) {
            throw new BatchJobException('Response is missing the data envelope.');
        }
        /** @var array<string, mixed> $data */
        $data = $decoded['data'];
        return $data;
    }
    /**
     * @param array<string, mixed> $body
     *
     * @throws BatchJobException
     */
    private function encode(array $body): string
    {
        try {
            return json_encode($body, JSON_THROW_ON_ERROR | JSON_UNESCAPED_SLASHES);
        } catch (JsonException $e) {
            throw new BatchJobException('Render request body is not encodable.', previous: $e);
        }
    }
    /**
     * @param non-empty-string $dir
     *
     * @throws BatchJobException
     */
    private function assertWritableDir(string $dir): void
    {
        if (!is_dir($dir) || !is_writable($dir)) {
            throw new BatchJobException(sprintf('Output directory "%s" is not writable.', $dir));
        }
    }
}
// ---------------------------------------------------------------------------
// Wiring. Provide your project's concrete PSR-18 client and PSR-17 factories.
// ---------------------------------------------------------------------------
/** @var ClientInterface $httpClient */
/** @var RequestFactoryInterface $requestFactory */
/** @var StreamFactoryInterface $streamFactory */
$baseUrl = getenv('NEXTPDF_CONNECT_URL');
$token = getenv('NEXTPDF_CONNECT_TOKEN');
if ($baseUrl === false || $baseUrl === '' || $token === false || $token === '') {
    fwrite(STDERR, "Set NEXTPDF_CONNECT_URL and NEXTPDF_CONNECT_TOKEN.\n");
    exit(2);
}

/** @var array<non-empty-string, array<string, mixed>> $documents */
$documents = [
    'invoice-0001' => [
        'page_size' => 'A4',
        'orientation' => 'portrait',
        'operations' => [
            ['type' => 'add_text', 'text' => 'Invoice 0001'],
        ],
    ],
    'invoice-0002' => [
        'page_size' => 'A4',
        'orientation' => 'portrait',
        'operations' => [
            ['type' => 'add_text', 'text' => 'Invoice 0002'],
        ],
    ],
];

$runner = new ConnectBatchRunner(
    httpClient: $httpClient,
    requestFactory: $requestFactory,
    streamFactory: $streamFactory,
    baseUrl: rtrim($baseUrl, '/'),
    bearerToken: $token,
    maxInFlight: 8,
);

try {
    $outcomes = $runner->run($documents, getenv('NEXTPDF_COOKBOOK_OUTPUT') ?: sys_get_temp_dir());
} catch (BatchJobException $e) {
    fwrite(STDERR, 'Batch stopped: ' . $e->getMessage() . "\n");
    exit(1);
} catch (ClientExceptionInterface $e) {
    fwrite(STDERR, 'Transport failure: ' . $e->getMessage() . "\n");
    exit(1);
}

foreach ($outcomes as $line) {
    echo $line, "\n";
}

Expected STDOUT is one line per document. Paths depend on your output directory:

invoice-0001 -> completed, written to /tmp/invoice-0001.pdf
invoice-0002 -> completed, written to /tmp/invoice-0002.pdf

  • Read job fields under data, not at the top level. Every successful response is wrapped in a { "data": ..., "meta": ... } envelope. data.status and data.progress are the fields you act on; meta carries request_id for support correlation.
  • progress can be absent. The server includes progress only when it tracks it for that job. Treat a missing field as “unknown”, not as zero, and drive your loop off status, which is always present.
  • Submission may already be terminal. In the current release the server renders inline before answering the POST, so the submit response can carry status: completed and the result may be ready on the first poll. Your poll loop must accept a terminal state on attempt zero rather than insist on a pending first.
  • Honor Retry-After. Non-terminal status responses set Retry-After (a 2-second interval). Polling faster wastes requests and invites a 429. Clamp the value to a sane band rather than trust it blindly.
  • /result before completion is a 409. Call the result endpoint only after the status poll shows completed. A 409 Conflict means the job is not done; it is not a transport error.
  • Idempotency-Key prevents duplicate work. A retried submit with the same key returns the original job (200 instead of 201). Use a stable per-document key so a network retry never starts a second render. A reused key with a different body is a 409 conflict.
  • Jobs are owner-scoped. A job submitted under one API key is invisible to another; a cross-owner GET returns 404, not 403. Poll with the same credential you submitted with.
  • A failed job carries an error message. Read data.error on a terminal failed status and record it. Do not retry blindly.
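The Retry-After advice above can be sketched as a pure function that is easy to test without sleeping. HTTP allows Retry-After to be either a number of seconds or an HTTP-date. This recipe only documents the seconds form for Connect, so the date branch is a defensive assumption. The function name and its defaults are hypothetical and mirror the 2-second fallback and the 1-to-30-second band the runner uses.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: turn a Retry-After header line into a clamped wait.
 *
 * @param string $header Raw header line ('' when the header is absent)
 * @param int    $now    Current Unix time, injected so the result is testable
 */
function retryAfterSeconds(string $header, int $now, int $default = 2, int $min = 1, int $max = 30): int
{
    $header = trim($header);
    if ($header !== '' && ctype_digit($header)) {
        $seconds = (int) $header;            // delay-seconds form
    } elseif ($header !== '' && ($when = strtotime($header)) !== false) {
        $seconds = $when - $now;             // HTTP-date form
    } else {
        $seconds = $default;                 // absent or unparseable
    }
    // Clamp so a hostile or broken value can neither busy-loop nor stall.
    return max($min, min($max, $seconds));
}

$now = strtotime('Wed, 21 Oct 2015 07:27:50 GMT');
echo retryAfterSeconds('2', $now), "\n";                             // 2
echo retryAfterSeconds('Wed, 21 Oct 2015 07:28:00 GMT', $now), "\n"; // 10
echo retryAfterSeconds('', $now), "\n";                              // 2
echo retryAfterSeconds('86400', $now), "\n";                         // 30
```

Note that `strtotime` accepts more than HTTP-dates, so the clamp, not the parser, is what bounds the wait.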

The cost of a batch is the sum of the renders plus the polling overhead. Two levers control the client side. First, bound concurrency: the maxInFlight cap fixes how many jobs are tracked at once, which keeps the client’s open-request count and memory flat regardless of batch size. Set it to match the server’s worker count, not higher; more in-flight jobs than workers only lengthens each job’s queue wait. Second, respect the poll interval: each poll is a cheap status read, but a tight loop increases request volume and triggers the rate limiter. The server’s 2-second Retry-After is the right default, and the runner clamps the value to a 1-to-30-second band so a malformed or hostile header cannot make the client busy-loop or stall the window.
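The poll budget follows from simple arithmetic. The runner in this recipe polls the jobs of a window one after another and sleeps after every non-terminal poll, so the longest it can wait is roughly polls times interval times jobs. The sketch below uses a hypothetical function name and ignores request latency.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: upper bound on time spent sleeping between polls,
 * assuming jobs in a window are polled sequentially and none ever finishes.
 */
function worstCasePollSeconds(int $maxPolls, int $intervalSeconds, int $jobsInWindow = 1): int
{
    return $maxPolls * $intervalSeconds * $jobsInWindow;
}

echo worstCasePollSeconds(150, 2), "\n";     // 300: one stuck job at the 2-second default
echo worstCasePollSeconds(150, 2, 8), "\n";  // 2400: a full window of 8 stuck jobs
echo worstCasePollSeconds(150, 30, 8), "\n"; // 36000: same window at the 30-second ceiling
```

If those bounds are too generous for your workload, lower maxPolls rather than the poll interval.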

For very large batches, process in windows (the runner uses array_chunk) rather than submit everything up front. That bounds both the client’s tracked state and the server’s queue depth, so a malformed or oversized batch fails inside one window instead of after thousands of submissions.
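The windowing step can be seen in isolation. This sketch shows how `array_chunk` with `preserve_keys: true` keeps each document key attached to its body. The three documents are placeholders. One PHP detail to watch: a key that is a plain decimal integer string, such as '1001', is stored as the integer 1001, so cast keys back to strings before passing them to code that runs under strict types.

```php
<?php
declare(strict_types=1);

// Placeholder documents; only the keys matter for this illustration.
$documents = [
    'invoice-0001' => ['operations' => [['type' => 'add_text', 'text' => 'Invoice 0001']]],
    'invoice-0002' => ['operations' => [['type' => 'add_text', 'text' => 'Invoice 0002']]],
    'invoice-0003' => ['operations' => [['type' => 'add_text', 'text' => 'Invoice 0003']]],
];

// Windows of at most two jobs; preserve_keys keeps the document keys.
$windows = array_chunk($documents, 2, preserve_keys: true);

foreach ($windows as $index => $window) {
    // Cast defensively: integer-like keys come back as int, not string.
    $keys = array_map('strval', array_keys($window));
    printf("window %d: %s\n", $index, implode(', ', $keys));
}
// window 0: invoice-0001, invoice-0002
// window 1: invoice-0003
```

Without `preserve_keys`, each window is reindexed from 0 and the document keys are lost.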

  • Keep the bearer token out of logs and URLs. The API key travels in the Authorization header only. Never place it in a query string, a log line, or a written artifact. The runner logs the job_id and status, never the credential.
  • Derive output paths from caller-controlled keys. The runner builds each output path from the document key your code chose, joined to a fixed output directory, never from a value in a server response. Do not interpolate a job field into a filesystem path, which would open a path traversal.
  • Validate the downloaded bytes. Before it writes the file, the runner checks that a 200 response from /result begins with the %PDF marker. A successful download status is not on its own proof the body is a PDF.
  • Treat the result as untrusted until inspected. A completed job means the server rendered bytes, not that those bytes are safe to forward. Run results through a structural inspection step before you hand them to a client or downstream system.
  • Use a least-privilege key. The async-job surface is core-tier rendering. Issue the batch a key scoped to exactly the operations it needs, and rotate it on the schedule your secret-management policy sets.
  • Bound the poll budget. maxPolls stops a stuck job from holding the client forever. The batch records the timeout as an outcome rather than blocking, which keeps one bad job from denying service to the rest.
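The output-path rule above is only as safe as the document keys themselves. If keys can come from user input, a guard like the following sketch can reject anything that is not a plain filename. The function name and the allow-list pattern are assumptions chosen for illustration, so adapt the pattern to your own key format.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical guard: build the PDF path from a document key, refusing keys
 * that could escape the output directory.
 */
function safeOutputPath(string $outputDir, string $key): string
{
    // Allow letters, digits, dot, underscore, and hyphen; no separators.
    $plain = preg_match('/\A[A-Za-z0-9][A-Za-z0-9._-]*\z/', $key) === 1;
    if (!$plain || str_contains($key, '..')) {
        throw new InvalidArgumentException(sprintf('Unsafe document key "%s".', $key));
    }
    return rtrim($outputDir, '/\\') . DIRECTORY_SEPARATOR . $key . '.pdf';
}

echo safeOutputPath('/tmp/out/', 'invoice-0001'), "\n";

try {
    safeOutputPath('/tmp/out', '../etc/passwd');
} catch (InvalidArgumentException $e) {
    echo 'rejected: ', $e->getMessage(), "\n";
}
```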

This recipe makes no normative standards claim. It consumes the NextPDF Connect async-job REST endpoints (POST /api/v1/jobs, GET /api/v1/jobs/{id}, GET /api/v1/jobs/{id}/result) and reads the job record fields the server defines (status, progress, error, result_url, poll_url). The %PDF header check on a downloaded result confirms only that the response begins with the PDF marker; it is not a validity or conformance determination. For a standards check across a set of documents, use the Enterprise batch compliance tool. See Batch standards check over Connect, a different surface from the rendering jobs covered here.