
Python SDK Quickstart

Install the SDK from PyPI:

pip install nextpdf

Create a client that points to your NextPDF Connect endpoint, then extract cited text from a PDF:

from nextpdf import NextPDF

client = NextPDF(base_url="http://localhost:8080", api_key="your-key")

# Read the PDF as bytes and extract text blocks with their citations.
with open("document.pdf", "rb") as file:
    blocks = client.ast.extract_cited_text(file.read())

# Print each block's page, confidence, and first 100 characters.
for block in blocks:
    page = block.citation.page_index
    confidence = block.citation.confidence
    print(f"[page {page}, confidence {confidence:.2f}] {block.text[:100]}")

If your endpoint does not require an API key, you can omit api_key.

CLI and AI agent workflows can read connection settings from environment variables:

export NEXTPDF_BASE_URL=http://localhost:8080
export NEXTPDF_API_KEY=your-key

In Windows PowerShell:

$env:NEXTPDF_BASE_URL = "http://localhost:8080"
$env:NEXTPDF_API_KEY = "your-key"
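
The source does not say whether the Python client reads these variables on its own. As a minimal sketch, assuming it does not, a script can read them explicitly. The helper name connection_settings and the localhost fallback are illustrative assumptions, not part of the SDK:

```python
import os
from typing import Mapping, Optional


def connection_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build client keyword arguments from NEXTPDF_* environment variables.

    Hypothetical helper for illustration; it is not provided by the SDK.
    """
    env = os.environ if env is None else env
    # Assumption: fall back to the localhost URL used in this guide's examples.
    settings = {"base_url": env.get("NEXTPDF_BASE_URL", "http://localhost:8080")}
    api_key = env.get("NEXTPDF_API_KEY")
    if api_key:
        # Pass api_key only when it is set; endpoints without auth can omit it.
        settings["api_key"] = api_key
    return settings


if __name__ == "__main__":
    print(connection_settings())
```

The resulting dictionary can then be unpacked into the constructor shown earlier, for example NextPDF(**connection_settings()).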

Wrap the extraction call in a try block to catch SDK and API exceptions:

from nextpdf import NextPDF
from nextpdf.models.errors import NextPDFAPIError, NextPDFError, QuotaExceededError

client = NextPDF(base_url="http://localhost:8080", api_key="your-key")

try:
    with open("document.pdf", "rb") as file:
        blocks = client.ast.extract_cited_text(file.read())
except QuotaExceededError as error:
    print(f"Rate limit hit: {error}")
except NextPDFAPIError as error:
    print(f"API error {error.status_code}: {error}")
except NextPDFError as error:
    print(f"SDK error: {error}")
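
The order of the except clauses above suggests the exceptions go from most specific to most general. The sketch below uses local stand-in classes to show why that order would matter. The inheritance chain (QuotaExceededError under NextPDFAPIError under NextPDFError), the constructor signature, and the status codes are assumptions made for illustration; the real classes live in nextpdf.models.errors and may differ:

```python
# Stand-in classes for illustration only. The inheritance chain and the
# constructor arguments are assumptions, not the SDK's confirmed design.
class NextPDFError(Exception):
    pass


class NextPDFAPIError(NextPDFError):
    def __init__(self, message: str, status_code: int) -> None:
        super().__init__(message)
        self.status_code = status_code


class QuotaExceededError(NextPDFAPIError):
    pass


def describe(error: Exception) -> str:
    """Route an error through the same except order as the example above."""
    try:
        raise error
    except QuotaExceededError as e:
        # Most specific first; otherwise the next clause would catch it.
        return f"Rate limit hit: {e}"
    except NextPDFAPIError as e:
        return f"API error {e.status_code}: {e}"
    except NextPDFError as e:
        # Most general last: anything else raised by the SDK.
        return f"SDK error: {e}"


if __name__ == "__main__":
    print(describe(QuotaExceededError("slow down", 429)))
    print(describe(NextPDFAPIError("bad request", 400)))
    print(describe(NextPDFError("connection failed")))
```

Under these assumptions, moving the NextPDFError clause to the top would catch every error and the more specific messages would never be printed.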

For PDFs larger than 100 MB, use the CLI instead. Results are then streamed, so the extracted blocks never have to be loaded into memory all at once.
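
A script that handles files of varying size could check the size up front before choosing between the SDK and the CLI. This is a minimal sketch: the helper names are hypothetical, and reading "100 MB" as 100 * 1024 * 1024 bytes is an assumption, since the source does not say whether it means decimal or binary megabytes:

```python
import os

# Assumption: "100 MB" is interpreted as 100 * 1024 * 1024 bytes.
CLI_THRESHOLD_BYTES = 100 * 1024 * 1024


def should_use_cli(size_bytes: int) -> bool:
    """Return True when a file is larger than the 100 MB guideline."""
    return size_bytes > CLI_THRESHOLD_BYTES


def pick_tool(path: str) -> str:
    """Return "cli" or "sdk" for the PDF at path, based on its size on disk."""
    return "cli" if should_use_cli(os.path.getsize(path)) else "sdk"


if __name__ == "__main__":
    # Hypothetical file name, matching the examples in this guide.
    print(pick_tool("document.pdf"))
```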