Cookbook: extract structured fields from a lab report PDF

End-to-end recipe: a clinic uploads a scanned-or-typed lab report PDF, you read the text out of it, and OHM Studio gives you structured JSON ready to push into your EMR — no template-matching, no regex, just one API call.

Studio side

+ New API → Blank API (lab reports vary too much to start from a clinical visit template).
Add three fields under a single section:
- tests — investigation-list — Required.
- interpretation — textarea.
- referenceRangesNoted — boolean.
Prompt: leave the OHM Clinical Foundation Block on. Add this user prompt:

You are extracting structured data from a clinical lab report. For each test, capture the canonical analyte name (use LOINC-friendly wording), the value with units, and any flag (H, L, critical). Put doctor notes in interpretation. Set referenceRangesNoted: true only if the document explicitly lists reference ranges.
Publish as lab-extract.

Server (Node + pdf-parse)

server/lab.ts

import { OHM } from "@ohm_studio/sdk";
import pdf from "pdf-parse";

const ohm = new OHM({ apiKey: process.env.OHM_API_KEY! });

export async function extractLabReport(pdfBytes: Buffer) {
  // Step 1 — pull text out of the PDF (works for typed reports; OCR step
  // needed for scanned ones — pair with tesseract.js or a cloud OCR).
  const { text } = await pdf(pdfBytes);

  // Step 2 — let OHM extract the structured JSON in one call.
  const { data } = await ohm.extract({
    apiSlug: "lab-extract",
    text,
  });

  return data;
}

server/route.ts

import { extractLabReport } from "./lab";

app.post("/upload-lab", async (req, res) => {
  const buf = Buffer.from(await req.arrayBuffer());
  const data = await extractLabReport(buf);
  res.json(data);
});

What you get back

{
  "tests": [
    { "name": "Hemoglobin", "code": "718-7", "notes": "12.4 g/dL (L)" },
    { "name": "Total Leukocyte Count", "code": "6690-2", "notes": "11,200 /µL (H)" },
    { "name": "Platelet Count", "code": "777-3", "notes": "245,000 /µL" }
  ],
  "interpretation": "Mild anemia. Mild leukocytosis — likely reactive. Recommend repeat in 2 weeks.",
  "referenceRangesNoted": true
}

Production tips

OCR scanned PDFs before passing to OHM. Tesseract works fine for typed reports; for handwritten chits use a vision-model OCR.
Cache by PDF SHA-256 — labs often re-upload the same file. Skip the extract call when you've already processed it.
Validate with Zod on top of the SDK response if you want runtime type safety beyond TypeScript inference.

The structured JSON returned by lab-extract slots cleanly into FHIR Observation resources — tests[].code is your LOINC code, notes carries the value + flag, and interpretation becomes the Observation.note.