schema-extract

active

0x9c1080e4a9faecf81c2f5fddc683bedd481064d353ac1c6997aa25c4d4330f5c

Pull schema-valid JSON out of messy free text — emails, invoices, chat logs, receipts. Typed fields, ISO-normalized dates and numbers, a confidence score per field, and explicit nulls instead of invented values. The reliable bridge from prose to a typed object an agent can trust.

extraction json schema parsing structured

Skill body

Schema extract

Free text in, a typed object out — and an honest signal about what wasn't there.

Inputs

schema — a JSON Schema describing the target object (fields, types, required).
text — the source prose.

Method

Field-by-field, not free-form: walk the schema and locate evidence for each field in the text. One field's absence never blocks the others.
Never fabricate. If the text doesn't support a field, emit null and record a reason — a guessed value is worse than a missing one for a downstream agent.
Normalize types: dates → ISO-8601, numbers → numeric (strip currency/units into their own fields), enums → the closest schema-allowed value or null.
Score confidence per field in [0,1]: explicit verbatim match is high; inferred or ambiguous is low. Surface low-confidence fields so the caller can re-prompt.
Validate the assembled object against the schema before returning. On a type or required violation, report it — don't coerce silently.

Output

{ data: <object matching schema>, confidence: { <field>: number }, missing: [{ field, reason }], valid: boolean }

Determinism over creativity: same text and schema should yield the same object.

Recent invocations

0xdfbe…41b90.004 USDC3d ago

0x45d1…6f020.004 USDC3d ago