pdf-toolkit


0x42453b63c33b5639e6046f74f87b28012d156e9f3a6c0c5f0feb1c2e6376fe97

Everything for working with PDF files: read/extract text and tables, merge and split, rotate pages, add watermarks, fill forms, encrypt/decrypt, extract images, and OCR scanned PDFs to make them searchable.

pdf extraction forms ocr documents

Skill body

PDF Processing Guide

Essential PDF operations using Python libraries and CLI tools.

Quick start

from pypdf import PdfReader, PdfWriter

reader = PdfReader("document.pdf")
# Separate pages with newlines so the end of one page doesn't run into the next.
text = "\n".join(page.extract_text() for page in reader.pages)

Common operations

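Merge / split with PdfWriter: append pages from one or more readers, or write a page range out to a new file.
Rotate pages with page.rotate(90); the angle must be a multiple of 90.
Forms: read field names with reader.get_fields(), then fill them with writer.update_page_form_field_values(...).
Encrypt with writer.encrypt(password); decrypt with reader.decrypt(password).
OCR scanned PDFs with ocrmypdf (ocrmypdf in.pdf out.pdf) to add a searchable text layer.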

Tables

For documents that mix text-based and scanned tables, detect table regions on each page, normalize rows and columns, merge split cells, and emit structured JSON of the form { page, rows[] }.
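The normalize, merge and emit steps can be sketched in plain Python. This assumes table detection has already happened upstream (for example with a library such as pdfplumber, whose extract_tables() yields rows whose cells are strings or None), and it uses a hypothetical heuristic for split cells: a row whose first cell is empty is treated as wrapped text continuing the row above. The function names are illustrative only.

```python
import json
from typing import Optional

Cell = Optional[str]


def normalize_rows(rows: list[list[Cell]]) -> list[list[str]]:
    """Pad every row to the same width, turn None into "" and collapse whitespace."""
    width = max((len(r) for r in rows), default=0)
    out = []
    for row in rows:
        cells = [" ".join((c or "").split()) for c in row]
        cells += [""] * (width - len(cells))
        out.append(cells)
    return out


def merge_continuation_rows(rows: list[list[str]]) -> list[list[str]]:
    """Hypothetical heuristic: a row with an empty first cell continues the
    previous row (a cell whose text wrapped), so append its cells to that row."""
    merged: list[list[str]] = []
    for row in rows:
        if merged and row and row[0] == "":
            prev = merged[-1]
            for i, cell in enumerate(row):
                if cell:
                    prev[i] = f"{prev[i]} {cell}".strip()
        else:
            merged.append(list(row))
    return merged


def table_to_json(page: int, raw_rows: list[list[Cell]]) -> str:
    """Emit one table as {"page": ..., "rows": [...]}."""
    rows = merge_continuation_rows(normalize_rows(raw_rows))
    return json.dumps({"page": page, "rows": rows})
```

For example, a raw row [None, None, "next week"] under ["Widget", "2", "ships"] is folded into ["Widget", "2", "ships next week"]. Real tables may need a different continuation rule (for instance keyed on row spacing), so treat this heuristic as a starting point.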
