OHM Studio

Cookbook: extract structured fields from a lab report PDF

Turn a PDF lab report into a typed JSON record in 60 lines.

End-to-end recipe: a clinic uploads a scanned-or-typed lab report PDF, you read the text out of it, and OHM Studio gives you structured JSON ready to push into your EMR — no template-matching, no regex, just one API call.

Studio side

  1. + New API → Blank API (lab reports vary too much to start from a clinical visit template).
  2. Add three fields under a single section:
    • tests (type: investigation-list). Required.
    • interpretation (type: textarea).
    • referenceRangesNoted (type: boolean).
  3. Prompt: leave the OHM Clinical Foundation Block on. Add this user prompt:

    You are extracting structured data from a clinical lab report. For each test, capture the canonical analyte name (use LOINC-friendly wording), the value with units, and any flag (H, L, critical). Put doctor notes in interpretation. Set referenceRangesNoted: true only if the document explicitly lists reference ranges.

  4. Publish as lab-extract.

Server (Node + pdf-parse)

server/lab.ts
import { OHM } from "@ohm_studio/sdk";
import pdf from "pdf-parse";

const ohm = new OHM({ apiKey: process.env.OHM_API_KEY! });

export async function extractLabReport(pdfBytes: Buffer) {
  // Step 1 — pull text out of the PDF (works for typed reports; OCR step
  // needed for scanned ones — pair with tesseract.js or a cloud OCR).
  const { text } = await pdf(pdfBytes);

  // Step 2 — let OHM extract the structured JSON in one call.
  const { data } = await ohm.extract({
    apiSlug: "lab-extract",
    text,
  });

  return data;
}
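
server/route.ts
import express from "express";
import { extractLabReport } from "./lab";

const app = express();

// Assumes an Express app. express.raw() delivers the uploaded PDF as a Buffer
// in req.body; the 20mb limit is only an example value.
app.post("/upload-lab", express.raw({ type: "application/pdf", limit: "20mb" }), async (req, res) => {
  try {
    res.json(await extractLabReport(req.body as Buffer));
  } catch {
    res.status(500).json({ error: "Lab report extraction failed" });
  }
});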
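
The comment in extractLabReport notes that scanned PDFs need an OCR step. One possible way to decide when to run it is to check whether pdf-parse returned any meaningful text. The sketch below is only a heuristic: needsOcr is a hypothetical helper, not part of the OHM SDK or pdf-parse, and the 40-character threshold is an arbitrary assumption you would tune against your own reports.

```typescript
// Hypothetical helper: guesses whether the text layer of a PDF is too sparse
// to trust, which usually means the pages are scanned images.
function needsOcr(text: string, minChars: number = 40): boolean {
  // Strip all whitespace (spaces, newlines, form feeds) so that empty
  // scanned pages do not count as content.
  const meaningful = text.replace(/\s+/g, "");
  return meaningful.length < minChars;
}
```

When needsOcr(text) returns true, you would route the file through your OCR step and pass the OCR output to OHM instead of the pdf-parse text.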

What you get back

{
  "tests": [
    { "name": "Hemoglobin", "code": "718-7", "notes": "12.4 g/dL (L)" },
    { "name": "Total Leukocyte Count", "code": "6690-2", "notes": "11,200 /µL (H)" },
    { "name": "Platelet Count", "code": "777-3", "notes": "245,000 /µL" }
  ],
  "interpretation": "Mild anemia. Mild leukocytosis — likely reactive. Recommend repeat in 2 weeks.",
  "referenceRangesNoted": true
}
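
The response shape above can be written down as a TypeScript type with a small runtime guard. This is a sketch inferred from the sample response and the Studio field setup, not a type exported by the SDK: LabTest, LabExtract, and the guard functions are illustrative names, and treating the two non-required fields as possibly absent is an assumption.

```typescript
// Shape inferred from the sample response; the SDK may define its own types.
interface LabTest {
  name: string;
  code: string;
  notes: string;
}

interface LabExtract {
  tests: LabTest[];               // marked Required in Studio
  interpretation?: string;        // assumption: may be absent
  referenceRangesNoted?: boolean; // assumption: may be absent
}

function isLabTest(value: unknown): value is LabTest {
  if (typeof value !== "object" || value === null) return false;
  const t = value as Record<string, unknown>;
  return (
    typeof t.name === "string" &&
    typeof t.code === "string" &&
    typeof t.notes === "string"
  );
}

function isLabExtract(value: unknown): value is LabExtract {
  if (typeof value !== "object" || value === null) return false;
  const r = value as Record<string, unknown>;
  return (
    Array.isArray(r.tests) &&
    r.tests.every(isLabTest) &&
    (r.interpretation === undefined || typeof r.interpretation === "string") &&
    (r.referenceRangesNoted === undefined || typeof r.referenceRangesNoted === "boolean")
  );
}
```

Calling isLabExtract(data) before writing to your EMR narrows the value to LabExtract and rejects responses that do not match the expected fields.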

Production tips

  • OCR scanned PDFs before passing to OHM. Tesseract works fine for typed reports; for handwritten chits use a vision-model OCR.
  • Cache by PDF SHA-256 — labs often re-upload the same file. Skip the extract call when you've already processed it.
  • Validate with Zod on top of the SDK response if you want runtime type safety beyond TypeScript inference.
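
The caching tip above could be sketched as follows. This is a minimal in-memory version for illustration: extractWithCache and sha256Hex are hypothetical helpers, and a real deployment would more likely keep the hashes in Redis or a database table so the cache survives restarts.

```typescript
import { createHash } from "node:crypto";

// In-memory cache keyed by the hex SHA-256 of the PDF bytes.
// Assumption: a process-local Map is enough for the example.
const extractCache = new Map<string, unknown>();

function sha256Hex(bytes: Buffer): string {
  return createHash("sha256").update(bytes).digest("hex");
}

async function extractWithCache<T>(
  pdfBytes: Buffer,
  extract: (bytes: Buffer) => Promise<T>,
): Promise<T> {
  const key = sha256Hex(pdfBytes);
  if (extractCache.has(key)) {
    // Same bytes seen before: skip the extract call entirely.
    return extractCache.get(key) as T;
  }
  const result = await extract(pdfBytes);
  extractCache.set(key, result);
  return result;
}
```

In the upload route you would then call extractWithCache(buf, extractLabReport) instead of calling extractLabReport directly. Note that two identical uploads arriving at the same moment would both miss the cache in this simple version.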

The structured JSON returned by lab-extract slots cleanly into FHIR Observation resources — tests[].code is your LOINC code, notes carries the value + flag, and interpretation becomes the Observation.note.
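
As a rough illustration of that mapping, the sketch below turns one tests[] entry into a minimal FHIR Observation. It rests on several assumptions: toObservation and FhirObservation are hypothetical names, notes is assumed to look like "12.4 g/dL (L)" (number, unit, optional flag in parentheses), status is hard-coded to "final", and the flag is stored as free text rather than a coded interpretation. Notes that do not fit the pattern fall back to valueString.

```typescript
interface LabTest {
  name: string;
  code: string;
  notes: string;
}

// Minimal subset of a FHIR Observation, enough for this example only.
interface FhirObservation {
  resourceType: "Observation";
  status: "final";
  code: { coding: { system: string; code: string; display: string }[] };
  valueQuantity?: { value: number; unit: string };
  valueString?: string;
  interpretation?: { text: string }[];
  note?: { text: string }[];
}

// Assumed notes format: "<number> <unit> (<flag>)", flag optional.
const NOTES_PATTERN = /^(\d[\d.,]*)\s*([^()]*?)\s*(?:\(([^)]+)\))?$/;

function toObservation(test: LabTest, reportNote?: string): FhirObservation {
  const obs: FhirObservation = {
    resourceType: "Observation",
    status: "final", // assumption: only finalized reports are uploaded
    code: { coding: [{ system: "http://loinc.org", code: test.code, display: test.name }] },
  };

  const match = NOTES_PATTERN.exec(test.notes.trim());
  // Commas are treated as thousands separators (assumption).
  const value = match ? Number((match[1] ?? "").replace(/,/g, "")) : NaN;

  if (match && Number.isFinite(value)) {
    obs.valueQuantity = { value, unit: match[2] ?? "" };
    const flag = match[3];
    if (flag) obs.interpretation = [{ text: flag }];
  } else {
    // Could not parse a numeric value: keep the raw text.
    obs.valueString = test.notes;
  }

  if (reportNote) obs.note = [{ text: reportNote }];
  return obs;
}
```

A caller could build the full set with data.tests.map((t) => toObservation(t, data.interpretation)), then add subject, effective date, and other fields your EMR requires.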