Cookbook: extract structured fields from a lab report PDF
Turn a PDF lab report into a typed JSON record in 60 lines.
End-to-end recipe: a clinic uploads a scanned or typed lab report PDF, you read the text out of it (running OCR first for scans), and OHM Studio gives you structured JSON ready to push into your EMR. No template matching, no regex, just one API call.
Studio side
- + New API → Blank API (lab reports vary too much to start from a clinical visit template).
- Add three fields under a single section:
  - tests: investigation-list, Required.
  - interpretation: textarea.
  - referenceRangesNoted: boolean.
- Prompt: leave the OHM Clinical Foundation Block on. Add this user prompt:
  You are extracting structured data from a clinical lab report. For each test, capture the canonical analyte name (use LOINC-friendly wording), the value with units, and any flag (H, L, critical). Put doctor notes in interpretation. Set referenceRangesNoted: true only if the document explicitly lists reference ranges.
- Publish as lab-extract.
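
The three fields configured above suggest a record shape along these lines. This is a sketch, not a published OHM schema: the per-test keys (name, code, notes) are taken from the sample response later in this recipe, and LabTest, LabRecord, and isLabRecord are hypothetical names introduced here for illustration.

```typescript
// Hypothetical types mirroring the three Studio fields. The per-test keys
// are inferred from the sample response, not from an official schema.
interface LabTest {
  name: string;
  code?: string;
  notes?: string;
}

interface LabRecord {
  tests: LabTest[];               // investigation-list, Required
  interpretation?: string;        // textarea
  referenceRangesNoted?: boolean; // boolean
}

function isOptional(value: unknown, type: "string" | "boolean"): boolean {
  return value === undefined || typeof value === type;
}

function isLabTest(value: unknown): value is LabTest {
  if (typeof value !== "object" || value === null) return false;
  const t = value as Record<string, unknown>;
  return (
    typeof t.name === "string" &&
    isOptional(t.code, "string") &&
    isOptional(t.notes, "string")
  );
}

// Runtime check that an unknown payload has the expected record shape.
function isLabRecord(value: unknown): value is LabRecord {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  return (
    Array.isArray(r.tests) &&
    r.tests.every(isLabTest) &&
    isOptional(r.interpretation, "string") &&
    isOptional(r.referenceRangesNoted, "boolean")
  );
}
```

A guard like this can sit between the extract call and your EMR write, so a malformed payload is rejected before it is stored.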
Server (Node + pdf-parse)
import { OHM } from "@ohm_studio/sdk";
import pdf from "pdf-parse";

const ohm = new OHM({ apiKey: process.env.OHM_API_KEY! });

export async function extractLabReport(pdfBytes: Buffer) {
  // Step 1: pull text out of the PDF. This works for typed reports; scanned
  // ones need an OCR step first (pair with tesseract.js or a cloud OCR).
  const { text } = await pdf(pdfBytes);

  // Step 2: let OHM extract the structured JSON in one call.
  const { data } = await ohm.extract({
    apiSlug: "lab-extract",
    text,
  });

  return data;
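}

import express from "express";
import { extractLabReport } from "./lab";

const app = express();

// Assumes an Express app. express.raw() buffers the upload so req.body is a
// Buffer of PDF bytes; the limit is raised because the default is 100kb.
app.post("/upload-lab", express.raw({ type: "application/pdf", limit: "20mb" }), async (req, res) => {
  const data = await extractLabReport(req.body as Buffer);
  res.json(data);
});

What you get back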
{
  "tests": [
    { "name": "Hemoglobin", "code": "718-7", "notes": "12.4 g/dL (L)" },
    { "name": "Total Leukocyte Count", "code": "6690-2", "notes": "11,200 /µL (H)" },
    { "name": "Platelet Count", "code": "777-3", "notes": "245,000 /µL" }
  ],
  "interpretation": "Mild anemia. Mild leukocytosis — likely reactive. Recommend repeat in 2 weeks.",
  "referenceRangesNoted": true
}

Production tips
- OCR scanned PDFs before passing the text to OHM. Tesseract works fine for scans of typed reports; for handwritten chits, use a vision-model OCR.
- Cache by PDF SHA-256 — labs often re-upload the same file. Skip the extract call when you've already processed it.
- Validate with Zod on top of the SDK response if you want runtime type safety beyond TypeScript inference.
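
The caching tip above could be sketched roughly as follows. It assumes an in-memory Map is acceptable (a real deployment would more likely use Redis or a database table), and sha256Hex and withPdfCache are hypothetical helper names. The extractor is passed in as a parameter, so the sketch does not depend on any particular SDK call.

```typescript
import { createHash } from "node:crypto";

// Any function that turns PDF bytes into a result, e.g. extractLabReport.
type Extractor<T> = (pdfBytes: Buffer) => Promise<T>;

// Hex-encoded SHA-256 digest of the raw PDF bytes, used as the cache key.
function sha256Hex(bytes: Buffer): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// Wraps an extractor so identical PDFs trigger only one extract call.
// The in-memory Map is an assumption made to keep the sketch short.
function withPdfCache<T>(
  extract: Extractor<T>,
  cache: Map<string, Promise<T>> = new Map(),
): Extractor<T> {
  return (pdfBytes) => {
    const key = sha256Hex(pdfBytes);
    const hit = cache.get(key);
    if (hit) return hit;

    // Store the pending promise so concurrent uploads of the same file
    // share one call; evict on failure so errors are not cached.
    const pending = extract(pdfBytes).catch((err) => {
      cache.delete(key);
      throw err;
    });
    cache.set(key, pending);
    return pending;
  };
}
```

Usage would look like `const cachedExtract = withPdfCache(extractLabReport);`, with the route handler calling `cachedExtract(buf)` instead of `extractLabReport(buf)`.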
The structured JSON returned by lab-extract slots cleanly into FHIR
Observation resources — tests[].code is your LOINC code, notes carries
the value + flag, and interpretation becomes the Observation.note.
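
As an illustration of that mapping, here is a minimal sketch that turns one tests[] entry into a FHIR R4 Observation. Several things are assumptions rather than OHM behaviour: the notes string is model-generated free text, so the "number unit (flag)" pattern is parsed on a best-effort basis with a valueString fallback; status is hard-coded to "final"; only the H and L flags are mapped to interpretation codes; and parseNotes and toObservation are hypothetical names.

```typescript
// Hypothetical input shape, inferred from the sample response above.
interface LabTest { name: string; code?: string; notes?: string }
interface ParsedNotes { value: number; unit: string; flag?: string }

const LOINC = "http://loinc.org";
const INTERPRETATION =
  "http://terminology.hl7.org/CodeSystem/v3-ObservationInterpretation";

// Best-effort parse of strings like "11,200 /µL (H)". Returns null when
// the notes do not follow the "<number> <unit> (<flag>)" pattern.
function parseNotes(notes: string): ParsedNotes | null {
  const m = /^\s*(-?[\d,]*\.?\d+)\s*([^()]*?)\s*(?:\(([^)]+)\))?\s*$/.exec(notes);
  if (!m) return null;
  const value = Number(m[1].replace(/,/g, "")); // drop thousands separators
  if (!Number.isFinite(value)) return null;
  return { value, unit: m[2], flag: m[3] };
}

// Builds a plain-object Observation; reportNote is the record-level
// interpretation text, attached as Observation.note when provided.
function toObservation(test: LabTest, reportNote?: string): Record<string, unknown> {
  const obs: Record<string, unknown> = {
    resourceType: "Observation",
    status: "final", // assumption: the report is already signed off
    code: test.code
      ? { coding: [{ system: LOINC, code: test.code, display: test.name }], text: test.name }
      : { text: test.name },
  };

  const parsed = test.notes ? parseNotes(test.notes) : null;
  if (parsed) {
    obs.valueQuantity = { value: parsed.value, unit: parsed.unit };
    if (parsed.flag === "H" || parsed.flag === "L") {
      obs.interpretation = [{ coding: [{ system: INTERPRETATION, code: parsed.flag }] }];
    }
  } else if (test.notes) {
    obs.valueString = test.notes; // keep unparseable notes verbatim
  }

  if (reportNote) obs.note = [{ text: reportNote }];
  return obs;
}
```

Mapping a whole record is then `data.tests.map((t) => toObservation(t, data.interpretation))`. Validate the result against your FHIR server's profile before relying on it.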