Changelog
Release notes for the OHM Studio platform, SDKs, and docs.
All notable changes to OHM Studio's APIs, SDKs, and docs are listed here. We follow Semantic Versioning — breaking changes bump the major version, additive features bump the minor, fixes bump the patch.
The SDK packages live on npm:
| Package | Short name | Latest | Install |
|---|---|---|---|
| @ohm_studio/sdk | OHM SDK | 0.12.0 | npm install @ohm_studio/sdk |
| @ohm_studio/sdk-react-native | OHM RN SDK | 0.12.0 | npm install @ohm_studio/sdk-react-native |
| @ohm_studio/sdk-core | core | 0.10.0 | (transitive, pulled in by the wrappers) |
| @ohm_studio/cli | Studio CLI | 0.1.0 | npm install -D @ohm_studio/cli |
v0.12.0 — Hospital-deployed architecture
Release date: 2026-05-18
Affects: @ohm_studio/sdk@0.12.0, @ohm_studio/sdk-react-native@0.12.0, @ohm_studio/sdk-core@0.10.0.
This is not a breaking change for working SDK code, but the operational model has changed and you should update your configuration.
What changed at the platform level
OHM now ships as two separately-deployed images:
- Hospital orchestrator: runs at each hospital (api.<hospital>.example). Holds PHI, runs auth, audit, queue.
- OHM Engine: runs only at OHM (api.ohm-engine.in). Holds the STT/LLM vendors, prompts, extraction logic, full reference data.
When you call the SDK, requests go to the hospital orchestrator (just like before). The hospital then forwards AI work to the engine internally. Customers never see the engine.
What changes for SDK users
The baseUrl you pass should be the hospital's API URL, not a single global SaaS endpoint. Each hospital has its own:
const ohm = new OHM({
apiKey: process.env.OHM_API_KEY!,
baseUrl: process.env.OHM_API_URL!, // e.g. https://api.kauvery.example
});
https://api.ohm.doctor continues to work as the default; it's OHM's own demo hospital deployment. But for any production integration, set baseUrl to the actual hospital you're connecting to. Ask the hospital admin if unsure.
Response shapes
Every endpoint that returned tokensUsed / inputTokens / outputTokens now also surfaces these in a consistent usage block alongside the result. Existing fields stay for backward compatibility; new code should read the usage block.
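A minimal sketch of reading token counts so the same code works against older and newer servers. The field names inside the usage block are an assumption (the note above only says the three counters are surfaced there), as is the fallback of summing input and output tokens; check the API reference for the exact shape.

```typescript
// Hypothetical response shape: the `usage` field names are assumed to mirror
// the legacy top-level fields. Verify against the API reference.
interface UsageBlock {
  tokensUsed?: number;
  inputTokens?: number;
  outputTokens?: number;
}

interface ExtractLikeResponse extends UsageBlock {
  usage?: UsageBlock;
}

// Prefer the new usage block; fall back to the legacy top-level fields.
function readUsage(res: ExtractLikeResponse): Required<UsageBlock> {
  const u: UsageBlock = res.usage ?? {};
  const inputTokens = u.inputTokens ?? res.inputTokens ?? 0;
  const outputTokens = u.outputTokens ?? res.outputTokens ?? 0;
  // Assumption: when no total is reported, input + output is a fair stand-in.
  const tokensUsed = u.tokensUsed ?? res.tokensUsed ?? inputTokens + outputTokens;
  return { tokensUsed, inputTokens, outputTokens };
}
```

Keeping the fallback in one helper means call sites do not need to know which server release they are talking to.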
Vendor-neutral errors
All upstream failures are now mapped to vendor-neutral codes: ENGINE_VENDOR_UNAVAILABLE, ENGINE_RATE_LIMITED, ENGINE_TIMEOUT. Upstream provider names no longer appear in any error path, so your error-handling code should match on these codes rather than on provider-name strings.
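As an illustration, error handling can branch on the three codes alone. The sketch below assumes the code arrives on a `code` property of the error object, and the retry policy it encodes is a hypothetical choice, not documented SDK behaviour.

```typescript
// The three codes come from the release notes; the `code` property and the
// retry decisions below are assumptions for illustration.
const ENGINE_CODES = [
  "ENGINE_VENDOR_UNAVAILABLE",
  "ENGINE_RATE_LIMITED",
  "ENGINE_TIMEOUT",
] as const;

type EngineCode = (typeof ENGINE_CODES)[number];

function isEngineCode(value: unknown): value is EngineCode {
  return typeof value === "string" && (ENGINE_CODES as readonly string[]).includes(value);
}

type FailureAction = "retry-later" | "retry-now" | "unknown";

// Decide what to do with a failed call based only on the vendor-neutral code.
function classifyEngineFailure(err: { code?: unknown }): FailureAction {
  if (!isEngineCode(err.code)) return "unknown";
  if (err.code === "ENGINE_RATE_LIMITED") return "retry-later"; // back off first
  return "retry-now"; // unavailable or timed out: treated here as transient
}
```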
v0.11.1 — Full validation pass
Release date: 2026-05-18
Affects: @ohm_studio/sdk@0.11.1, @ohm_studio/sdk-react-native@0.11.1, @ohm_studio/sdk-core@0.9.1. Pair with the matching server release.
Plugs the gaps that v0.11.0 left open. Three classes of fixes:
Hard LLM timeout on every Studio surface
v0.11.0 added a 240-sec timeout to extract. v0.11.1 brings the same protection to every other LLM-using Studio service so no customer-facing endpoint can hang:
| Service | Endpoint(s) | Ceiling |
|---|---|---|
| StudioExtractService | /extract, /audio/extract, streaming, async jobs, playground | 240s |
| StudioInsightsService | /insights, playground insights tab | 240s |
| StudioSummarizeService | /summarize | 120s |
| StudioAiAssistService | /ai-assist (Studio UI prompt drafting) | 120s |
A hung upstream connection now surfaces a clean OHMServerError after the ceiling expires; the customer retries instead of waiting.
chunked / chunkCount propagated through ALL audio paths
In v0.11.0 these flags only appeared on the sync transcribe / extract responses. v0.11.1 adds them to:
- StreamChunk(type: "transcript"): streaming consumers see the flag the instant transcription completes.
- JobDetail: async-job pollers see it in the final terminal-state response.
- Sync audio extract response: was missing the field even though transcribe had it; now consistent.
// Streaming
for await (const chunk of ohm.audio.extractStream({ apiSlug, file })) {
if (chunk.type === "transcript" && chunk.chunked) {
toast.warn(`Processed in ${chunk.chunkCount} chunks.`);
}
}
// Async jobs
const result = await ohm.audio.jobs.poll(jobId);
if (result.chunked) {
toast.warn(`Processed in ${result.chunkCount} chunks.`);
}
Schema migration
StudioExtractionJob gains two nullable columns:
chunked Boolean? @default(false)
chunkCount Int?
Purely additive: prisma db push --skip-generate --accept-data-loss=false (which runs automatically on api container boot) applies it without locking the table.
Migration
npm install @ohm_studio/sdk@^0.11.1
# or:
npm install @ohm_studio/sdk-react-native@^0.11.1
No code changes required.
v0.11.0 — Long-audio chunking, file-size pre-flight, LLM hard-timeout
Release date: 2026-05-18
Affects: @ohm_studio/sdk@0.11.0, @ohm_studio/sdk-react-native@0.11.0, @ohm_studio/sdk-core@0.9.0. Pair with the matching server release.
Hardening sweep so hour-plus consultations and overweight files don't silently fail. Every change is backward-compatible — drop the new SDK in and the new behavior is automatic.
Server: long-audio chunking on the Studio extract path
The STT provider has a documented per-file ceiling around 1 hour. Until this release, anything longer was silently truncated at minute 60 — the second half of a 90-min consultation just disappeared from the transcript with no error, no warning.
Now, the server ffprobes the upload, and if it's longer than 55 min,
splits into ≤55-min chunks using ffmpeg -c copy (stream-copy, no
re-encode, ~instant), submits each chunk separately, and merges the
transcripts. Same path covers /api/studio/v1/audio/transcribe and
every /api/studio/v1/audio/extract/:apiSlug* endpoint.
The response surfaces a chunked: true + chunkCount: N pair so SDK
consumers can show a tiny warning ("This recording was processed in N
chunks — a sentence spanning a chunk boundary may be missing 2–3
words"). Otherwise everything looks the same to the caller.
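To make the chunk arithmetic concrete, here is a small sketch of how a recording could be divided under the 55-minute ceiling. The greedy strategy (fill each chunk, let the last one take the remainder) is an assumption; the notes do not say where the server places boundaries.

```typescript
const MAX_CHUNK_SECONDS = 55 * 60; // the 55-minute ceiling from the notes

interface ChunkSpan {
  startSec: number;
  endSec: number;
}

// Greedy split: fill each chunk up to the ceiling, the last chunk takes the rest.
// The server's actual boundary placement is not documented; this is an assumption.
function planChunks(durationSec: number, maxChunkSec: number = MAX_CHUNK_SECONDS): ChunkSpan[] {
  if (durationSec <= maxChunkSec) return [{ startSec: 0, endSec: durationSec }];
  const spans: ChunkSpan[] = [];
  for (let start = 0; start < durationSec; start += maxChunkSec) {
    spans.push({ startSec: start, endSec: Math.min(start + maxChunkSec, durationSec) });
  }
  return spans;
}

// A 90-minute consultation yields two spans: 0-55 min and 55-90 min.
const plan = planChunks(90 * 60);
const flags = { chunked: plan.length > 1, chunkCount: plan.length };
```

Under this scheme a recording of 55 minutes or less is never chunked, which matches the "longer than 55 min" condition above.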
Server: 240-sec hard timeout on the LLM extraction
A hung upstream connection used to freeze the entire extract response
indefinitely. Now wrapped in Promise.race against a 240-sec timer —
plenty of room for p99 latency on ~50k-char transcripts; anything
longer is genuinely broken and the caller gets a clean error to retry.
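The Promise.race pattern described above can be sketched as follows. `withHardTimeout` and `HardTimeoutError` are hypothetical names for illustration, not the server's code. Note that racing only stops the caller from waiting; it does not cancel the underlying work.

```typescript
class HardTimeoutError extends Error {
  constructor(ms: number) {
    super(`Operation exceeded ${ms} ms`);
    this.name = "HardTimeoutError";
  }
}

// Race the work against a timer; whichever settles first wins.
// The timer is always cleared so it cannot keep the process alive.
function withHardTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => reject(new HardTimeoutError(ms)), ms);
  });
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) clearTimeout(timer);
  });
}
```

A caller would wrap the extraction promise, for example `withHardTimeout(runExtraction(), 240_000)`, where `runExtraction` stands in for whatever produces the result.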
SDK: chunked + chunkCount on transcribe / extract results
const { transcript, chunked, chunkCount } = await ohm.audio.transcribe({ file });
if (chunked) {
toast.warn(
`Long recording — processed in ${chunkCount} chunks. ` +
`Sentences across chunk boundaries may be missing 2–3 words.`
);
}
AudioExtractResult extends AudioTranscribeResult, so these fields also appear on ohm.audio.extract({...}) responses.
SDK: pre-upload file-size guard (500 MB hard cap)
A multi-GB mis-attached file (someone dropped a recorded lecture or a
video) used to crawl across the wire before failing server-side. The
SDK now reads the file's size / byteLength BEFORE constructing the
multipart body, and throws OHMValidationError synchronously if it
exceeds 500 MB (~2 hours of 16 kHz mono WAV — far above any realistic
clinical encounter).
try {
await ohm.audio.extract({ apiSlug: "opd", file: hugeFile });
} catch (err) {
if (err instanceof OHMValidationError) {
// "Audio file too large (812.3 MB). Maximum is 500 MB."
}
}
Applies to: audio.transcribe, audio.extract, audio.extractStream, audio.jobs.create. The cap is a generous safety net; if your clinical use case actually needs files larger than 500 MB, open an issue and we'll discuss.
Migration
npm install @ohm_studio/sdk@^0.11.0
# or:
npm install @ohm_studio/sdk-react-native@^0.11.0
No code changes required. Optional: surface a chunk-boundary warning when result.chunked === true.
v0.10.0 — Total-deadline + auto-idempotency + bulk + warmUp + hooks
Release date: 2026-05-11
Affects: @ohm_studio/sdk@0.10.0, @ohm_studio/sdk-react-native@0.10.0, @ohm_studio/sdk-core@0.8.0. No server changes.
A 15-point reliability + performance sweep. Every customer-visible addition is backward-compatible: drop the new SDK in without code changes and you inherit the wins automatically.
Reliability — P0
totalTimeoutMs — bounded worst-case latency
Without it, a chatty upstream + 3 retries could keep a request open for
3 × timeoutMs + Σbackoff (~3 minutes with defaults). With it, the SDK
throws OHMTimeoutError as soon as the budget is exhausted — even
mid-retry, even mid-sleep.
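The deadline math above works out as follows. This is a worked sketch with hypothetical numbers (3 attempts, 30 s per attempt, 1 s and 2 s of backoff); the SDK's real defaults are not restated here.

```typescript
// Worst case without a total budget: every attempt runs to its per-attempt
// timeout, and every backoff sleep between attempts is taken in full.
function worstCaseMs(attempts: number, timeoutMs: number, backoffsMs: number[]): number {
  const sleeps = backoffsMs.slice(0, Math.max(attempts - 1, 0));
  return attempts * timeoutMs + sleeps.reduce((sum, ms) => sum + ms, 0);
}

// With totalTimeoutMs set, the bound is the smaller of the two figures.
function boundedWorstCaseMs(unboundedMs: number, totalTimeoutMs?: number): number {
  return totalTimeoutMs === undefined ? unboundedMs : Math.min(unboundedMs, totalTimeoutMs);
}

// Hypothetical numbers, not SDK defaults.
const unbounded = worstCaseMs(3, 30_000, [1_000, 2_000]); // 3 * 30 s + 3 s = 93_000 ms
const bounded = boundedWorstCaseMs(unbounded, 60_000);    // capped at 60_000 ms
```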
const ohm = new OHM({
apiKey,
timeoutMs: 30_000, // per-attempt
totalTimeoutMs: 60_000, // total wall-clock — NEW
maxRetries: 2,
});
Auto Idempotency-Key on every unsafe method
POST / PATCH / PUT / DELETE now get an auto-generated UUID v4 in
Idempotency-Key when the caller doesn't supply one. Eliminates
duplicate-write bugs from mobile retries — the server short-circuits
same-key calls within 24 h to the cached response.
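The three cases (auto, explicit, opt-out) can be pictured with a small resolver. This is a sketch of the decision logic under stated assumptions, not the SDK's source; `resolveIdempotencyKey` is a hypothetical name.

```typescript
import { randomUUID } from "node:crypto";

const UNSAFE_METHODS = new Set(["POST", "PATCH", "PUT", "DELETE"]);

// undefined -> auto-generate on unsafe methods, string -> use as-is,
// null -> caller opted out. `disableAuto` mirrors a global opt-out.
function resolveIdempotencyKey(
  method: string,
  supplied: string | null | undefined,
  disableAuto: boolean = false,
): string | undefined {
  if (typeof supplied === "string") return supplied;
  if (supplied === null || disableAuto) return undefined;
  return UNSAFE_METHODS.has(method.toUpperCase()) ? randomUUID() : undefined;
}
```

`randomUUID()` from Node's crypto module produces a version 4 UUID, which matches the key format described above.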
await ohm.extract({ apiSlug, text }); // auto-keyed
await ohm.extract({ apiSlug, text, idempotencyKey: "visit_42" }); // explicit
await ohm.extract({ apiSlug, text, idempotencyKey: null }); // opt-out
new OHM({ apiKey, disableAutoIdempotency: true }); // disable globally
withOverrides({ ... }) — per-call timeout / retry tuning
const slow = ohm.withOverrides({ timeoutMs: 5 * 60_000, maxRetries: 1 });
await slow.audio.extract({ apiSlug, file: hourLongAudio });
OHMError.responseHeaders + responseBody
Every server-originated error now carries the raw HTTP headers and body of the failed response. Debug "the server returned 502 but what cache header was on it?" tickets without a second round-trip.
catch (e) {
if (e instanceof OHMError) {
console.log(e.responseHeaders); // { "cf-cache-status": "MISS", ... }
console.log(e.responseBody); // server's error envelope
}
}
Speed — P1
ohm.warmUp() — drops cold-start latency by ~300 ms
const ohm = new OHM({ apiKey });
void ohm.warmUp(); // fire-and-forget at app boot
// ... first real call now ~150 ms instead of ~500 ms
Automatic keepalive: true on small JSON POSTs
extract, summarize, insights (anything ≤ 60 KB body) now passes
keepalive: true to fetch. Saves ~30 ms on every call after the first
by reusing the TCP socket. Multipart audio uploads skip this — browser
caps keepalive bodies at 64 KB. No opt-in, no code change.
enableHttp2() — opt-in HTTP/2 multiplexing on Node
import { enableHttp2 } from "@ohm_studio/sdk/http2";
enableHttp2(); // call once at process start
Saves 50–100 ms on parallel calls. Node-only (browsers + RN already use the platform's H2 stack). Silently no-ops elsewhere.
streamBufferMs reserved option
Forward-compatible knob for delta-streaming (when we ship transcript.delta chunks). No behavior change today; once delta chunks ship, set streamBufferMs: 50 to coalesce them.
Developer experience — P2
Lifecycle hooks { onRequest, onResponse, onError }
Cleaner than the old onUsage for non-trivial observability — you tap
into individual phases instead of getting one combined event.
const ohm = new OHM({
apiKey,
hooks: {
onRequest: ({ method, url, attempt }) => log.info("→", method, url),
onResponse: ({ status, latencyMs, requestId }) =>
log.info("←", status, latencyMs + "ms", requestId),
onError: ({ error, attempt, willRetry }) =>
log.warn(error.name, { attempt, willRetry }),
},
});
All hooks are fire-and-forget: exceptions are caught and never affect the request. onUsage continues to work for backwards compatibility.
User-Agent with runtime info (Node only)
User-Agent: ohm-sdk/0.8.0 (node/22.16; darwin x64) — helps your
server logs identify which client / Node / OS is misbehaving without
asking the customer. Browsers + RN skip the header (forbidden).
ohm.extractBulk([...]) — batched concurrent extract
const results = await ohm.extractBulk(
transcripts.map((t) => ({ apiSlug: "opd-clinic", text: t })),
{ concurrency: 8, onProgress: (done, total) => console.log(`${done}/${total}`) },
);
const ok = results.filter((r) => r.ok);
const err = results.filter((r) => !r.ok); // partial failures don't fail the batch
jobs.poll exponential backoff
Polling interval now grows 1.5× per attempt, capped at maxIntervalMs
(default 30 s). Protects the worker from a chatty client when a job
stays PROCESSING for 10+ minutes. Same first-poll latency; smarter
after.
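A short sketch of the resulting schedule. The 1.5× growth and the 30 s cap come from the note above; the 3000 ms starting interval is a hypothetical value chosen for the example.

```typescript
// Interval before poll number `attempt` (0-based): grows 1.5x per attempt
// and is capped at maxIntervalMs. baseMs = 3000 is an illustrative value.
function pollIntervalMs(attempt: number, baseMs: number = 3_000, maxIntervalMs: number = 30_000): number {
  let interval = Math.min(baseMs, maxIntervalMs);
  for (let i = 0; i < attempt; i++) {
    interval = Math.min(interval * 1.5, maxIntervalMs);
  }
  return interval;
}

const schedule = Array.from({ length: 8 }, (_unused, i) => pollIntervalMs(i));
// 3000, 4500, 6750, 10125, 15187.5, 22781.25, 30000, 30000
```

With these numbers the cap is reached on the seventh poll, so a job that stays PROCESSING for many minutes is polled at most twice per minute.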
Documentation
- NEW /sdk/reliability: retry policy, deadline math, idempotency semantics, full error class table, what we DON'T retry.
- NEW /sdk/performance: per-endpoint p50/p95, sync vs streaming vs async decision matrix, warmUp() pattern, enableHttp2() hint, bulk extraction, performance anti-patterns.
- /versions compatibility matrix expanded with the v0.10 row.
Migration from 0.9 → 0.10
Zero breaking changes. Every new field is optional. To pick up the wins:
// Before
const ohm = new OHM({ apiKey, timeoutMs: 60_000 });
// After — three new lines, zero behavior loss
const ohm = new OHM({
apiKey,
timeoutMs: 60_000,
totalTimeoutMs: 120_000, // NEW — bounded worst-case
});
void ohm.warmUp(); // NEW — drops first-call latency
// enableHttp2() if you're on Node and fanning out parallel calls
v0.9.0 — Granular error classes + zero-config defaults + async-job probe
Release date: 2026-05-11
Affects: @ohm_studio/sdk@0.9.0, @ohm_studio/sdk-react-native@0.9.0, @ohm_studio/sdk-core@0.7.0. Server: new probe + comprehensive config-reference doc.
Reliability + DX hardening. Backwards-compatible.
Four new error classes
Customers can now pattern-match HTTP failure modes precisely:
import {
OHMTimeoutError, // 408 / 504 / client-side timeout
OHMNetworkError, // DNS / TCP / TLS / dropped connection
OHMNotFoundError, // 404 — slug not published, job purged, …
OHMQuotaExceededError, // 402 / 429-with-quota — distinct from rate limit
} from "@ohm_studio/sdk";
OHMNotFoundError carries availableSlugs[] when the server can offer alternatives (powers a customer-side picker without a second round-trip).
OHMQuotaExceededError carries resetAt (ISO-8601) + quotaKind
("tokens" | "audio_seconds" | "calls" | "storage") so customers can
show "you'll be able to extract again at HH:MM" messaging or trigger
upgrade-plan modals.
OHMTimeoutError distinguishes deadline-exceeded from OHMAbortError
(user cancellation). OHMNetworkError is the canonical "you're
offline → queue it locally" signal — pair with OhmQueue on RN.
Stable error-code constants exported as OHM_ERROR_CODES:
const codes = OHM_ERROR_CODES;
// { ABORTED: "aborted", AUTH_ERROR: "auth_error", ... }
The class hierarchy may evolve; the codes don't. Use them for log analytics, alerting rules, and customer-side error analytics.
New "Configuration" docs page
Single source of truth at docs.ohm.doctor/configuration listing every server env var, every per-API toggle in Studio, every SDK option, and every default. The TL;DR: hospitals don't have to configure anything to start. The page is the menu when they want to customise.
Async-job end-to-end probe
apps/api/scripts/probes/async-jobs.ts — verifies the v0.8.0
async-extraction pipeline end-to-end:
- Enqueue + idempotency replay (same key returns same jobId)
- Worker claim + processing + terminal state
- Cancel from QUEUED → CANCELLED
- Webhook delivery to a local mock receiver with HMAC verification
- delivery-id presence + payload-shape check
Run via pnpm exec tsx scripts/probes/async-jobs.ts. Use as a
pre-deploy gate.
Upgrade
npm install @ohm_studio/sdk@0.9.0
npm install @ohm_studio/sdk-react-native@0.9.0
Existing 0.8.0 callers continue to work unchanged. Errors that previously surfaced as OHMServerError may now surface as the more specific subclass. They're all still OHMError, so a generic catch still works.
v0.8.0 — Async-extraction jobs (long recordings, webhook callbacks)
Release date: 2026-05-10
Affects: @ohm_studio/sdk@0.8.0, @ohm_studio/sdk-react-native@0.8.0, @ohm_studio/sdk-core@0.6.0. Server: new studio_extraction_jobs table + worker.
The synchronous extract surface (HTTP POST, hold connection open until done) breaks down for audio over ~30 minutes — proxies kill the connection, mobile apps can't keep it alive when backgrounded. v0.8.0 adds an async-job pattern hospitals expect for long-recording workflows: submit, poll OR webhook, done.
New SDK surface — ohm.audio.jobs.{create,get,cancel,poll}
// Submit (~100ms — just an upload)
const { jobId } = await ohm.audio.jobs.create({
apiSlug: "long-consult",
file: bigBlob,
webhookUrl: "https://your-backend/ohm-callback", // optional
patientHash: sha256(`abha:${patient.abhaId}`),
recordedById: currentUser.id,
});
// Poll (client-side)
const result = await ohm.audio.jobs.poll(jobId, {
intervalMs: 3000,
maxWaitMs: 30 * 60_000,
onProgress: (j) => setProgress(j.workerProgress),
});
// OR convenience: submit + poll in one call
const result = await ohm.audio.extractAsync({ apiSlug, file, ... });
What's signed and retried
Webhooks fire on every terminal state (extraction.job.completed or
extraction.job.failed). HMAC-SHA256 signed with a per-job secret.
Stripe-style retry schedule on 4xx/5xx: 5min → 30min → 2h → 5h →
10h → 24h → 24h, ~3 days total. Dead-letter after 7 attempts.
Headers on every webhook delivery:
- X-OHM-Event: event name
- X-OHM-Delivery-Id: UUID v4 unique per attempt (idempotency on your side)
- X-OHM-Signature: sha256=<hex>, the HMAC of the JSON body
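A receiver can check the signature header along these lines. This is a minimal sketch using Node's crypto module; it assumes the HMAC is computed over the exact raw request body and keyed with the per-job secret. How the secret reaches your backend is not covered here, so treat `secret` as a placeholder and consult the Async extraction reference.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// Verify an X-OHM-Signature header of the form "sha256=<hex>".
// Assumption: the signing input is the raw body bytes exactly as received.
function verifyOhmSignature(rawBody: string, header: string, secret: string): boolean {
  const prefix = "sha256=";
  if (!header.startsWith(prefix)) return false;
  const received = Buffer.from(header.slice(prefix.length), "hex");
  const expected = createHmac("sha256", secret).update(rawBody, "utf8").digest();
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

Verify against the raw body before any JSON parsing: re-serialising a parsed object can change whitespace or key order and break the comparison.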
When to use which
| Audio length | Mode |
|---|---|
| 0–10 min | sync (ohm.audio.extract) |
| 10–30 min | streaming (ohm.audio.extractStream) |
| 30 min – 1 hr | async polling |
| > 1 hr or mobile background | async webhook |
| Bulk replay (10 000 historical) | async webhook + idempotency keys |
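The table can be encoded as a small chooser. `chooseMode` is a hypothetical helper, and how it treats a recording of exactly 10, 30, or 60 minutes is an assumption (the table's ranges share their endpoints).

```typescript
type ExtractMode = "sync" | "streaming" | "async-polling" | "async-webhook";

interface ModeHints {
  mobileBackground?: boolean; // app may be backgrounded during the call
  bulkReplay?: boolean;       // replaying a large historical batch
}

// Boundaries follow the table; exact boundary values fall to the simpler mode.
function chooseMode(audioMinutes: number, hints: ModeHints = {}): ExtractMode {
  if (hints.mobileBackground || hints.bulkReplay || audioMinutes > 60) return "async-webhook";
  if (audioMinutes > 30) return "async-polling";
  if (audioMinutes > 10) return "streaming";
  return "sync";
}
```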
Server: in-process worker
DB-backed queue with FOR UPDATE SKIP LOCKED claim — multi-instance
safe. Run any number of API pods on the same DB; each picks up a
different job. Worker disabled per-pod via STUDIO_JOBS_WORKER_DISABLED
env if you want read-only replicas.
Migration applied non-destructively
CREATE TYPE "StudioJobStatus" AS ENUM ('QUEUED','PROCESSING','COMPLETED','FAILED','CANCELLED');
CREATE TABLE "studio_extraction_jobs" (...);
-- 4 indexes including org/patientHash/createdAt for audit search.
Existing StudioInvocation rows untouched. Async-job completions mirror to StudioInvocation so per-API Logs panels include async traffic in their token / audio-second roll-ups.
Upgrade
npm install @ohm_studio/sdk@0.8.0
npm install @ohm_studio/sdk-react-native@0.8.0
Existing 0.7.0 callers continue to work unchanged. New methods are opt-in; sync extract is unchanged.
See Async extraction for the full reference, including a webhook receiver template and sync/streaming/async comparison matrix.
v0.7.0 — Hospital-readiness pack: audit, idempotency, PHI, alerts, offline queue
Release date: 2026-05-10
Affects: @ohm_studio/sdk@0.7.0, @ohm_studio/sdk-react-native@0.7.0, @ohm_studio/sdk-core@0.5.0. SDK-core minor bump because of new exports.
Backwards-compatible — every new field / method is opt-in. Existing 0.6.0 callers continue to compile and behave identically.
New methods
- ohm.apis.get(slug): full schema detail for one API (description, publishedSchema, publishedSystemPrompt, publishedInputs). Use for dynamic playground UIs / runtime validation.
- ohm.invocations.searchByPatient({ patientHash, sinceDays?, limit? }): patient-level audit search. Returns metadata-only invocation rows (timing, tokens, recordedById). Transcripts and extracted JSON are never returned via this surface.
New audit fields on every method
patientHash, recordedById, idempotencyKey accepted on:
- ohm.extract
- ohm.audio.transcribe
- ohm.audio.extract
- ohm.audio.extractStream
- ohm.insights
Idempotency-Key is sent as an HTTP header automatically (Stripe /
Twilio convention). Same key in same org returns cached response for
24 h — protects mobile retries from duplicate chart entries.
New helpers
- restoreTokens(data, phiTokenMap): restore PHI when an API has Studio's "Redact PHI before extraction" toggle on. The server returns phiTokenMap; this helper deep-clones data, swapping tokens like [PATIENT_1] back to original strings.
- OhmQueue (RN only): offline queue. Persists failed extractions to AsyncStorage; replays on flush(). Bring-your-own storage adapter via makeAsyncStorageAdapter(AsyncStorage).
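To show what the token restoration does, here is an illustrative re-implementation. It is not the SDK's source: `restoreTokensSketch` is a hypothetical name, and the map is assumed to be keyed by token (for example "[PATIENT_1]") with the original string as the value.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Deep-clone `data`, replacing every token found in `phiTokenMap` with its
// original string. The input object is never mutated.
function restoreTokensSketch(data: Json, phiTokenMap: Record<string, string>): Json {
  if (typeof data === "string") {
    let restored = data;
    for (const [token, original] of Object.entries(phiTokenMap)) {
      restored = restored.split(token).join(original); // replace all occurrences
    }
    return restored;
  }
  if (Array.isArray(data)) {
    return data.map((item) => restoreTokensSketch(item, phiTokenMap));
  }
  if (data !== null && typeof data === "object") {
    const clone: { [key: string]: Json } = {};
    for (const [key, value] of Object.entries(data)) {
      clone[key] = restoreTokensSketch(value, phiTokenMap);
    }
    return clone;
  }
  return data; // numbers, booleans, null pass through unchanged
}
```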
Server-side improvements (no SDK change required)
- PHI redaction: opt-in per-API toggle in Studio Settings. Server scrubs patient names (after honorifics), ABHA / Aadhaar / phone / MRN / UHID / IPD identifiers from the transcript before the LLM call.
- Critical-value alerts: _alerts array emitted on every extraction with vitals: SpO₂ < 90, BP ≥ 180/120, HR < 40 or > 150, Temp ≥ 104°F, Pain ≥ 8/10, etc. Centralised threshold logic.
- List-type recovery: when the LLM drops items from chained drug / diagnosis / lab dictation, regex recovery picks up the missed entries against built-in dictionaries.
- Auto-retry on transient LLM failures: 2 retries with exponential backoff. Permanent shape errors bypass retry.
- FHIR R4 mappers for doctor-note (Composition with OPConsultRecord profile) and nurse-shift (Composition + Procedure bundle). ABDM-ready, kept inert until ABHA gateway integration.
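The critical-value thresholds listed above can be written down as a single check. This is a sketch only: the field names and alert strings are hypothetical, and "BP ≥ 180/120" is read here as "systolic ≥ 180 or diastolic ≥ 120", which is an assumption.

```typescript
interface VitalsReading {
  spo2?: number;      // percent
  systolic?: number;  // mmHg
  diastolic?: number; // mmHg
  hr?: number;        // beats per minute
  tempF?: number;     // degrees Fahrenheit
  pain?: number;      // 0-10 scale
}

// Thresholds are the ones listed in the notes; names and the BP reading
// (either component over its limit) are assumptions.
function criticalAlerts(v: VitalsReading): string[] {
  const alerts: string[] = [];
  if (v.spo2 !== undefined && v.spo2 < 90) alerts.push("spo2_low");
  const systolicHigh = v.systolic !== undefined && v.systolic >= 180;
  const diastolicHigh = v.diastolic !== undefined && v.diastolic >= 120;
  if (systolicHigh || diastolicHigh) alerts.push("bp_high");
  if (v.hr !== undefined && (v.hr < 40 || v.hr > 150)) alerts.push("hr_out_of_range");
  if (v.tempF !== undefined && v.tempF >= 104) alerts.push("temp_high");
  if (v.pain !== undefined && v.pain >= 8) alerts.push("pain_severe");
  return alerts;
}
```

Missing vitals produce no alert, so a partial reading never triggers a false positive.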
Migrations applied non-destructively
-- v0.6 (audit fields)
ALTER TABLE studio_invocations
ADD COLUMN idempotencyKey VARCHAR(128),
ADD COLUMN patientHash VARCHAR(128),
ADD COLUMN recordedById TEXT,
ALTER COLUMN apiId DROP NOT NULL;
-- v0.7 (PHI redaction toggle)
ALTER TABLE studio_apis
ADD COLUMN "redactPHI" BOOLEAN NOT NULL DEFAULT false;
Both additive; existing rows unchanged.
Upgrade
npm install @ohm_studio/sdk@0.7.0
npm install @ohm_studio/sdk-react-native@0.7.0
Server 2026-05-10 — Extraction reliability: messy clinical dictation
Release date: 2026-05-10
Affects: API server only (apps/api). No SDK update required.
Real-world clinical dictation rarely follows the textbook "label number, label number, label number" cadence. Speakers slip in connector words ("temperature is 99", "saturation was 88"), chain mentions across non-schema vitals ("saturation 88, temperature 99, weight 105"), or state partial blood pressure ("BP 120" with no diastolic). The extraction layer would intermittently drop fields in these patterns.
What changed: added a deterministic regex-based recovery pass
(recoverFlatVitals) that runs on every Studio extraction with a
flat numeric-vitals schema. For each vital the LLM dropped, we
re-scan the transcript for the label + number with any connector word
(is, was, of, at, =, comma, dash) and inject the value.
The LLM's judgment still wins for fields it actually emitted —
recovery only fills gaps.
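The recovery idea can be sketched as a gap-filling pass. This is an illustration rather than the body of recoverFlatVitals: the regular expression, the `labels` map, and the function name are all hypothetical.

```typescript
// Connector words from the notes: is, was, of, at, "=", comma, dash.
// Plain whitespace between label and number is also accepted.
const CONNECTOR = String.raw`(?:\s*(?:is|was|of|at|=|,|-)\s*|\s+)`;

function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Fill only the fields the LLM left out; never overwrite an emitted value.
// `labels` maps a schema field to the spoken labels to look for.
function recoverMissingVitals(
  transcript: string,
  extracted: Record<string, number | undefined>,
  labels: Record<string, string[]>,
): Record<string, number | undefined> {
  const result: Record<string, number | undefined> = { ...extracted };
  for (const [field, spoken] of Object.entries(labels)) {
    if (result[field] !== undefined) continue; // the LLM's value wins
    const alternatives = spoken.map(escapeRegExp).join("|");
    const pattern = new RegExp(`\\b(?:${alternatives})${CONNECTOR}(\\d+(?:\\.\\d+)?)`, "i");
    const match = pattern.exec(transcript);
    if (match && match[1] !== undefined) result[field] = Number(match[1]);
  }
  return result;
}
```

For the dictation "saturation was 88, temperature is 99, weight 105" with only a saturation value emitted by the LLM, this pass would add temperature and weight and leave saturation untouched.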
Pain-score gets a smarter reconciler:
- Explicit number stated → use it.
- "No pain" / "pain free" → 0.
- LLM emitted 0 but the speaker never said "pain" / "NRS" / "score" → drop the field. Catches the textbook default-normal hallucination.
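The three reconciler rules can be expressed as an ordered check. The regular expressions below are illustrative assumptions, not the server's patterns, and `reconcilePainScore` is a hypothetical name.

```typescript
// Rules, in order:
// 1. an explicitly stated number wins;
// 2. "no pain" / "pain free" means 0;
// 3. a 0 from the LLM with no pain-related word in the transcript is treated
//    as a default-normal hallucination and dropped.
function reconcilePainScore(transcript: string, llmValue: number | undefined): number | undefined {
  const stated = /\bpain(?:\s+score)?\s*(?:is|was|of|at|=|,|-)?\s*(10|[0-9])\b/i.exec(transcript);
  if (stated && stated[1] !== undefined) return Number(stated[1]);
  if (/\b(?:no pain|pain[- ]free)\b/i.test(transcript)) return 0;
  const mentioned = /\b(?:pain|nrs|score)\b/i.test(transcript);
  if (llmValue === 0 && !mentioned) return undefined;
  return llmValue;
}
```

In any case the rules do not cover, the LLM's value is passed through unchanged.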
What customers see: higher field-completion rate on natural-language dictations, especially the "BP 120" partial-systolic case and the "temperature is 99" connector-word case. No code change required in your app: same response shape, just more populated.
Also tightened the inpatient-vitals schema's system prompt + per-field helpText to match. This is visible only to customers who repaste the updated schema from examples/hospital-integration/. The server-side recovery runs regardless of which prompt version you've published.
Studio 2026-05-10 — Builder JSON tab auto-classifies full schemas
Release date: 2026-05-10
Affects: studio.ohm.doctor only. No SDK or API change.
Customers pasting a full Studio schema JSON (the *.studio.json files
shipped in the public examples) into the Builder → JSON tab used to
hit "Top level must be an array of sections" — the textarea only
accepted a bare sections array. Now the tab accepts both:
- A bare sections array (canonical Builder shape — unchanged).
- A full schema object: sections populates the Builder, plus systemPrompt lands in the Prompt tab, inputs lands in the Inputs tab, and any insightsSchema / insightsPrompt / insightsEnabled lands in the Insights tab. One paste fills every tab.
Any field absent from the pasted JSON is left alone — partial pastes don't wipe existing state. Toast confirms what was applied: "Imported full schema · 1 section + prompt + 2 inputs".
v0.6.0 — Cancellation, upload progress, slug discovery, CLI codegen
Release date: 2026-05-10
Affects: @ohm_studio/sdk@0.6.0, @ohm_studio/sdk-react-native@0.6.0, @ohm_studio/sdk-core@0.4.0. New: @ohm_studio/cli@0.1.0.
Five additive items, zero breaking changes. Existing call sites continue to compile and behave identically; new options are opt-in.
signal: AbortSignal on every method
Every SDK method now accepts a signal?: AbortSignal:
const controller = new AbortController();
const promise = ohm.extract({
apiSlug: "opd-clinic",
text: transcript,
signal: controller.signal,
});
// User clicked "Cancel"
controller.abort();
Aborts surface as a typed OHMAbortError (code: "aborted", status: 0) so you can distinguish cancellation from genuine errors. The caller-supplied signal is bridged into the SDK's internal timeout controller: either source trips the same fetch abort, and an already-aborted signal short-circuits before any work starts.
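The bridging behaviour can be sketched with the standard AbortController API. This illustrates the pattern only and is not the SDK's internals; `bridgeSignal` is a hypothetical helper.

```typescript
interface BridgedSignal {
  signal: AbortSignal;
  dispose: () => void; // clear the timer and listener once the request settles
}

// Bridge a caller-supplied signal into an internal controller that also
// enforces a timeout. Either source aborts the same controller.
function bridgeSignal(callerSignal: AbortSignal | undefined, timeoutMs: number): BridgedSignal {
  const internal = new AbortController();
  if (callerSignal?.aborted) {
    internal.abort(); // already aborted: short-circuit before any work starts
    return { signal: internal.signal, dispose: () => {} };
  }
  const onCallerAbort = (): void => internal.abort();
  callerSignal?.addEventListener("abort", onCallerAbort, { once: true });
  const timer = setTimeout(() => internal.abort(), timeoutMs);
  const dispose = (): void => {
    clearTimeout(timer);
    callerSignal?.removeEventListener("abort", onCallerAbort);
  };
  return { signal: internal.signal, dispose };
}
```

The returned signal is what would be handed to fetch; calling dispose() afterwards prevents a stale timer from firing on a finished request.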
React hooks auto-abort on unmount
useOhmExtract, useOhmAudioExtract, useOhmSummarize, and
useRecorder all attach an internal AbortController and call
abort() on unmount and on the next mutateAsync(...) call. A user
navigating away mid-extract never debits a half-finished call. Each
hook also exposes a cancel() method for explicit Cancel-button
flows.
onProgress for audio uploads
audio.transcribe and audio.extract accept an onProgress callback
that receives { loaded, total, percent } while the file is uploading:
await ohm.audio.extract({
apiSlug: "opd-clinic",
file: audioBlob,
onProgress: (e) => setUploadPct(e.percent),
});
When onProgress is set, the SDK routes through XMLHttpRequest (which exposes upload progress on both browser and React Native); when it isn't, the existing fetch path stays unchanged with zero overhead. No-op for callers who don't pass the callback.
ohm.apis.list() for slug discovery
Enumerate the published Studio APIs the credential can see, without hard-coding slugs:
const apis = await ohm.apis.list(); // PUBLISHED only
const drafts = await ohm.apis.list({ status: "DRAFT" });
Returns ApiSummary[]: { slug, name, status, version, updatedAt }. Powered by the new GET /api/studio/v1/apis endpoint, which accepts either an API key (returns project-scoped APIs) or a Studio user JWT (returns organisation-wide APIs). Useful for typeahead pickers, codegen pipelines, and admin dashboards.
@ohm_studio/cli — codegen for typed data
New companion package. Generates TypeScript interfaces from your
published Studio API schemas, so ohm.extract<MyApiData>(...) is
fully typed against the schema you designed in Studio:
npm install -D @ohm_studio/cli
export OHM_API_KEY=ohms_live_xxx
ohm-studio pull opd-clinic # → ./ohm-types/opd-clinic.ts
ohm-studio pull-all --out src/ohm # every published API
ohm-studio list # what's published?
Pre-build hook to keep types fresh:
{
"scripts": {
"predev": "ohm-studio pull-all --out src/ohm",
"prebuild": "ohm-studio pull-all --out src/ohm"
}
}
Covers every Studio field type: text, textarea, rich-text, date, number, boolean, choice (typed enums), multi-choice, vitals-block, diagnosis-list, medication-list, allergy-list, investigation-list, referral-list, procedure-list, code-list, repeater (nested item types).
Server: vitals-block extraction reliability fix
The internal extraction schema for vitals-block was reshaped so the
clinical engine reliably emits every vital the speaker mentioned —
previously some multi-field readings dropped HR / RR / BP under load.
The new shape is post-processed back to the canonical
vitals.bp.{systolic, diastolic} form before any consumer sees
the data, so the 15+ existing call sites (doctor app, Visit feature,
FHIR mappers) continue to read vitals.bp.systolic unchanged.
Server-side only — no client code change required.
The Studio extraction stack was also upgraded to a higher-tier clinical-grade model. Verified extraction quality on the same Indian-English dictation that previously dropped fields:
| Probe | Before | After |
|---|---|---|
| vitals-block (7 vitals) | 4/9 | 9/9 |
| Vitals (flat 7-field) | 7/7 | 7/7 |
| Doctor note (content + plan) | 2/2 | 2/2 |
| Nurse-shift (SOAP+timeline) | passes | passes |
Upgrade
npm install @ohm_studio/sdk@0.6.0
npm install @ohm_studio/sdk-react-native@0.6.0
No code changes required. To opt in to the new features, see the sections above and the API reference.
Server 2026-05-10 — LLM stack migrated to Vercel AI SDK doc-canonical pattern
Release date: 2026-05-10
Affects: API server only (apps/api). No SDK update required.
Internal cleanup — same HTTP request/response shapes for every customer
endpoint (/extract, /audio/extract, /audio/transcribe, /summarize,
/insights). Customers using @ohm_studio/sdk@0.5.3 get this for free.
What changed: the API server's structured-extraction path was reworked to follow the canonical structured-output pattern recommended by the underlying inference SDK we use end-to-end:
const result = await generateText({
model: wrapLanguageModel({ model, middleware: extractJsonMiddleware() }),
output: Output.object({ schema, name, description }),
...
});
- The new middleware handles models that wrap JSON in markdown code fences. Eliminates the "Structured extraction failed after 3 attempts: No object generated" errors we'd see intermittently.
- System content is now routed through the dedicated option instead of interleaved into messages[], which improves prompt-injection isolation.
- Removed ~150 lines of manual workaround code: 3-attempt retry loop, temperature sweep, mode juggling, regex JSON extraction, manual fence stripping, fallback parsers, and a dead generateStructured method that had zero callers.
What customers see: higher first-attempt success rate, fewer 502 retries, identical response shapes. No code change required in your app.
v0.5.3 — README polish (translate-mode note)
Release date: 2026-05-10
Patch release — README only, no behavioural change.
The npm package READMEs now call out that audio.transcribe and
audio.extract always return an English transcript regardless of the
spoken language — the server runs OHM's STT layer in translate mode,
so a Tamil / Hindi / Telugu / Bengali / code-mixed consult comes back as
clean English text. This was already the runtime behaviour shipped in
v0.5.2; v0.5.3 just surfaces it on the npm package landing page.
Same code, drop-in upgrade.
v0.5.2 — End-to-end English transcript pipeline · foundation@v3
Release date: 2026-05-10
Server-side hardening — no SDK API changes, drop-in upgrade. End-to-end extraction now scores 100% across English, Hindi, Tamil, and Telugu OPD recordings on our internal benchmark.
audio.transcribe returns English regardless of source language
Studio's /api/studio/v1/audio/transcribe (and the audio path inside
audio.extract) now runs OHM's STT layer in translate mode. The
transcript comes back in English no matter what the speaker spoke —
Tamil, Hindi, Telugu, Bengali, or any code-mixed combination. This matches
the Visit/doctor app pipeline; downstream extraction prompts only ever see
clean English clinical text.
Cost: ~150 ms extra on already-English audio (the translate layer runs as a no-op).
foundation@v3 — English-first, speaker-neutral
The OHM Clinical Foundation Block was rewritten to match the new pipeline assumption:
- English-first — dropped the multi-script interpretation rules (Devanagari / Tamil / Telugu / Bengali / Gujarati examples) that are now handled at the STT layer. The prompt is leaner; the LLM has fewer conflicting instructions.
- Speaker-neutral — replaces every "the doctor" with "clinician / nurse / resident / student / patient / family member". Anyone in the room can speak; clinical facts get extracted regardless of role.
- Fahrenheit window rule — most clinicians say "temp 99" or "temperature 101.2" without saying "Fahrenheit". The prompt now codifies: 90–110 = °F (convert), 33–43 = °C (use as-is), 43–90 = ambiguous (omit).
- Past-history vs active comorbidity — explicit routing: if the patient is on a current chronic medication for a named condition, the condition is an active diagnosis. "Known diabetic on Metformin" → both Diabetes (active) and Metformin (continuing).
- Named-investigation extraction — when a test is named (CBC, ECG, CK-MB, cardiac troponin, lipid profile, 2D Echo, CT brain, sputum culture, HbA1c …), it goes into the investigation list as a separate entry, even if the result is dictated inline.
Already-published Studio APIs are unaffected — they continue to use the
prompt snapshot they were published with. Republish to pick up v3.
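The Fahrenheit window rule listed above can be sketched as a small pure function. This is an illustration only: the name `normaliseTempToCelsius` is hypothetical, and the sketch assumes inclusive bounds where a reading of exactly 43 counts as °C and exactly 90 counts as °F, since the changelog does not say how the shared endpoints are resolved.

```typescript
// Hypothetical helper illustrating the Fahrenheit window rule.
// Assumption: bounds are inclusive; 43 is treated as °C and 90 as °F.
function normaliseTempToCelsius(value: number): number | null {
  if (!Number.isFinite(value)) return null;
  if (value >= 33 && value <= 43) return value; // already °C, use as-is
  if (value >= 90 && value <= 110) {
    const celsius = ((value - 32) * 5) / 9; // °F to °C
    return Math.round(celsius * 10) / 10; // keep one decimal place
  }
  return null; // outside both windows: omit the value
}
```

Returning `null` for everything outside the two windows keeps the "omit when ambiguous" behaviour explicit for the caller.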
hardenVitals() post-processor
Some LLMs drop temp when forced to do °F→°C math under tight schema
constraints, or hallucinate a height value by copying the weight number.
A small post-processor in extract.service now:
- recovers `temp` from the transcript if the LLM dropped it (regex around "temperature / temp / fever" + the same Fahrenheit-window conversion the prompt specifies),
- deletes hallucinated `height` when no anchor word ("height / cm / metres / feet / inches / tall") appears in the transcript.
Tightened the Zod schema for the vitals-block field type:
- `temp` description now leads with the conversion instruction.
- `height` lower bound moved from 25 cm to 50 cm, which eliminates overlap with Fahrenheit values (94–108) so out-of-band routing is impossible at the schema level too.
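The two post-processing steps can be sketched roughly as follows. This is not the code in extract.service: the interface, the regexes, and the name `hardenVitalsSketch` are assumptions made for illustration, and the anchor check only matches whole words (so "170cm" without a space would not count as an anchor).

```typescript
// Sketch only: names and regexes are assumptions, not the production code.
interface VitalsSketch {
  temp?: number; // °C
  height?: number; // cm
  weight?: number; // kg
}

// A 2-3 digit number shortly after "temperature", "temp", or "fever".
const TEMP_RE = /\b(?:temperature|temp|fever)\b\D{0,20}?(\d{2,3}(?:\.\d+)?)/i;
// Any whole word that anchors a height mention in the transcript.
const HEIGHT_ANCHOR_RE = /\b(?:height|cm|metres|feet|inches|tall)\b/i;

function hardenVitalsSketch(vitals: VitalsSketch, transcript: string): VitalsSketch {
  const out: VitalsSketch = { ...vitals };

  // 1. Recover a dropped temp, applying the same window rule as the prompt.
  if (out.temp === undefined) {
    const match = TEMP_RE.exec(transcript);
    if (match) {
      const raw = parseFloat(match[1]);
      if (raw >= 33 && raw <= 43) {
        out.temp = raw; // already °C
      } else if (raw >= 90 && raw <= 110) {
        out.temp = Math.round((((raw - 32) * 5) / 9) * 10) / 10; // °F to °C
      }
    }
  }

  // 2. Drop a height value that the transcript never anchors.
  if (out.height !== undefined && !HEIGHT_ANCHOR_RE.test(transcript)) {
    delete out.height;
  }
  return out;
}
```

The function copies its input rather than mutating it, so the raw LLM output stays available for logging or comparison.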
Extraction quality (internal benchmark)
End-to-end test: 5-minute synthesised consults in 4 languages, run through
synthesised TTS → OHM's translate-mode STT → production
StudioExtractService:
| Script | Language | Score |
|---|---|---|
| OPD fever / dengue | English | 100% |
| OPD cough / pneumonia | Hindi | 100% |
| OPD migraine | Tamil | 100% |
| OPD STEMI / chest pain | Telugu | 100% |
Vitals (5/5), diagnoses, medications, investigations, and negation handling all hit ceiling on every script.
v0.5.1 — Repository URL & metadata polish
Release date: 2026-05-10
Patch release — metadata only, no behavioural change.
- `repository.url` on every SDK package now points at the public examples repo at github.com/open-holistic-medicine/ohm-sdk. The npmjs.com "Repository" link on each package page now resolves cleanly.
- The public examples repo holds clone-and-run versions of all four examples (Node CLI, Next.js server action, Expo mic recorder, bare React Native) wired against the published SDKs.
No code changes — safe drop-in upgrade.
v0.5.0 — Speaker mode (doctor / doctor + patient)
Release date: 2026-05-10
The Studio Playground audio tab and audio.transcribe / audio.extract
endpoints now accept exactly two speaker modes — doctor (default,
single-speaker dictation) or doctor_patient (two-speaker
conversation).
SpeakerMode end to end
- New `SpeakerMode` type and `SPEAKER_MODES` constant exported from `@ohm_studio/sdk` and `@ohm_studio/sdk-react-native`. Use the constant to render a picker:

      import { SPEAKER_MODES, type SpeakerMode } from "@ohm_studio/sdk";

      <select onChange={(e) => setMode(e.target.value as SpeakerMode)}>
        {SPEAKER_MODES.map((m) => (
          <option key={m.code} value={m.code}>{m.label}</option>
        ))}
      </select>

- `useRecorder({ apiSlug, speakerMode: "doctor_patient" })` and the imperative `ohm.audio.extract({ ..., speakerMode })` / `ohm.audio.transcribe({ ..., speakerMode })` thread the mode to the server.
- Studio Playground gained a card-button picker above the language dropdown: exactly two intentional choices.
Server-side language code mapping
ohm.audio.transcribe({ language }) now accepts every customer-facing
form — "auto", ISO short codes ("en", "hi", "ta", ...), and the
provider-shaped xx-IN codes. The server normalises before calling STT
and returns a clear 400 for anything outside the 23 supported
languages.
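A minimal sketch of the normalisation described above, under stated assumptions: the supported set below is an illustrative subset (the real list of 23 languages is not reproduced here), the server is assumed to normalise to the ISO short code, and `normaliseLanguage` is a hypothetical name.

```typescript
// Illustrative subset only; the real server supports 23 languages.
const SUPPORTED_LANGUAGES = new Set(["en", "hi", "ta", "te", "bn", "gu"]);

type NormalisedLanguage =
  | { ok: true; language: string } // "auto" or an ISO short code
  | { ok: false; status: 400; message: string };

function normaliseLanguage(input: string): NormalisedLanguage {
  const value = input.trim().toLowerCase();
  if (value === "auto") return { ok: true, language: "auto" };
  // Accept both "hi" and the provider-shaped "hi-IN".
  const short = value.endsWith("-in") ? value.slice(0, -3) : value;
  if (SUPPORTED_LANGUAGES.has(short)) return { ok: true, language: short };
  return { ok: false, status: 400, message: `Unsupported language: ${input}` };
}
```

Doing this once at the edge means the STT call only ever sees one code shape, and callers get the 400 before any audio is processed.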
Studio Playground UX
- Audio file upload (in addition to mic record). 100 MB cap.
- Language dropdown (auto-detect default; English label + native script for each entry).
- Audio source badge (mic vs upload, with filename and size).
- The previous paste-a-test-mode-key footer is gone: every project now gets an auto-minted default Playground key on creation. Users no longer manage keys to use the Playground.
Default Playground key (auto-minted)
- `Organization` projects now ship with a `Playground (default)` key marked `isPlaygroundDefault`. Test-mode only (`ohms_test_*`).
- Cannot be revoked from the UI: the SDK service refuses with a 400 and a clear message.
- Can be rotated via `POST /api/studio/v1/projects/:id/playground-key/rotate` (or the "Rotate" button in the Keys page).
v0.4.0 — WebM duration fix · IndexedDB recovery · BareRecorder · 85 tests
Release date: 2026-05-10
Web (@ohm_studio/sdk@0.4.0)
- WebM duration metadata patch: Chrome's `MediaRecorder` produces WebM files with broken duration in the EBML header (`<audio>` shows `Infinity` duration, seek breaks, some upload pipelines reject the file). The `Recorder` now lazy-loads `fix-webm-duration` on `stop()` and patches the header with the actual recorded duration before returning the Blob. Non-WebM blobs and patcher failures fall through unchanged.
- Crash-safe IndexedDB persistence: opt in via `useRecorder({ persist: true })` or use `saveRecording / getRecording / listRecordings / removeRecording / clearRecordings` directly. Stores the Blob to IndexedDB before extraction so a tab crash, browser close, or network drop mid-upload doesn't lose the consult. New `usePendingRecordings()` React hook surfaces unsent recordings on mount for one-click retry. Backed by `idb-keyval` (~600 B).
- `useNetworkStatus()` hook: live `online` / `offline` state from `navigator.onLine` + window events. Gate uploads on flaky hospital wifi without writing the listener boilerplate.
- Studio Playground dogfooding: Studio's own Playground now uses `Recorder` from this SDK with VU meter, 8s silence auto-stop, 10-min cap, and wake-lock. Same code path customers ship.
- Next.js example refreshed: `examples/nextjs-server-action` now uses the modern `Recorder` API.
React Native (@ohm_studio/sdk-react-native@0.4.0)
- `BareRecorder`: first-class adapter for `react-native-audio-recorder-player` (bare RN, no Expo). Same lifecycle, state machine, error codes, level metering, silence auto-stop, and max-duration cap as `ExpoRecorder`. Configures the native module with a clinical preset (16 kHz mono AAC at 32 kbps, Android `VOICE_RECOGNITION` source).
- `expo-audio` (Expo SDK 54+): documented as a first-class path. Drive `useAudioRecorder` directly and pass `{ uri, name, type }` to `ohm.audio.extract`. Code snippet on the React Native page.
- Expo example uses `useRecorder`: `examples/expo-mic-recorder` now uses the hook with auto-extract.
Studio app reliability
- Tab error boundaries: every tab in the API builder (Builder, Prompt, Inputs, Insights, Playground, API call, Logs, Versions, Settings) is wrapped in `react-error-boundary`. A render-time crash in one tab shows a "Reset tab" panel; the rest of the page, including unsaved drafts, keeps working.
Test coverage
85 tests across both SDK packages. All scenarios covered:
| Browser SDK (60 tests) | React Native (25 tests) |
|---|---|
| Codec cascade — webm/opus, mp4/aac, ogg/opus, fallbacks | ExpoRecorder permission flow + iOS audio session |
| All 7 error codes (Permission/NoMic/Busy/OverConstrained/Lost/NotSupported/Unknown) | Pause/resume + NotSupported on older SDKs |
| State machine (transitions, double-start, pause-while-idle, etc.) | Silence auto-stop + speech recovery |
| Stop returns Blob with correct mime; tracks released | Max-duration cap |
| Cancel cleanup; idle no-op | dB→linear math (peak/mid/floor); NaN guard |
| Device-lost mid-record fires onDeviceLost + auto-cancel | Keep-awake activate/deactivate |
| Level metering + silence auto-stop + reset | BareRecorder start/stop returns RNFile |
| Max-duration cap | BareRecorder pause/resume + NotSupported when missing |
| Wake lock acquired on demand; survives denial | Constructor accepts class or instance |
| Duration tracking excludes paused time | RecorderError shape (code/name/cause) |
| Timeslice chunks emit periodically | |
| useRecorder lifecycle + auto-extract via apiSlug | |
| IndexedDB persist round-trip + custom id + ordering + delete + clear | |
| useNetworkStatus flips on online/offline events | |
| usePendingRecordings loading + non-empty + empty paths | |
Run with pnpm --filter @ohm_studio/sdk test and
pnpm --filter @ohm_studio/sdk-react-native test.
v0.3.0 — Recorder upgrade + useRecorder() hook
Release date: 2026-05-10
Affects: @ohm_studio/sdk and @ohm_studio/sdk-react-native.
The browser Recorder is now production-grade for clinical use. Drop-in
compatible — existing new Recorder().start() / .stop() code continues to
work.
Browser compatibility
- Codec cascade: `MediaRecorder.isTypeSupported()` picks the best of `audio/webm;codecs=opus` → `audio/mp4;codecs=mp4a.40.2` → `audio/ogg;codecs=opus` → `audio/mp4` → `audio/webm`. Fixes recording on iOS Safari and older Firefox builds where WebM/Opus isn't available.
- Clinical defaults: by default `getUserMedia` is requested with 16 kHz mono, `echoCancellation`, `noiseSuppression`, `autoGainControl`. Pass `clinicalDefaults: false` to opt out.
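The codec cascade amounts to "first supported entry wins". A minimal sketch, assuming the preference order listed above; `pickMimeType` and `CODEC_CASCADE` are hypothetical names, and the support check is injected so the function also runs outside a browser.

```typescript
// Candidate MIME types in the preference order listed above.
const CODEC_CASCADE = [
  "audio/webm;codecs=opus",
  "audio/mp4;codecs=mp4a.40.2",
  "audio/ogg;codecs=opus",
  "audio/mp4",
  "audio/webm",
] as const;

// `isSupported` is injected; in a browser you would pass
// (t) => MediaRecorder.isTypeSupported(t).
function pickMimeType(isSupported: (mimeType: string) => boolean): string | undefined {
  return CODEC_CASCADE.find((candidate) => isSupported(candidate));
}
```

Returning `undefined` when nothing matches lets the caller fall back to the browser's default container instead of failing outright.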
New features
- Pause / resume: `rec.pause()` / `rec.resume()`; duration tracking excludes paused time.
- VU level metering: `onLevel: (rms) => …` emits linear RMS 0–1 via an `AudioContext` `AnalyserNode`. Wire it to a meter UI.
- Silence auto-stop: `silenceAutoStop: { ms, threshold }` auto-stops after sustained silence.
- Wake Lock: `wakeLock: true` keeps tablets/phones awake during long consults.
- Tab-hidden pause: `pauseOnHidden: true` pauses when the tab loses focus.
- Streaming chunks: `timesliceMs` + `onChunk` for chunked uploads.
- Hard duration cap: `maxDurationMs` auto-stops after N ms.
- Permissions preflight: `Recorder.probePermission()` returns `"granted" | "denied" | "prompt" | "unknown"` without prompting.
- Microphone enumeration: `Recorder.listMicrophones()` + `deviceId` option for picker UI.
- Mic disconnect detection: `onDeviceLost` fires when the track ends mid-record.
- Typed errors: `RecorderError` with `code: "PermissionDenied" | "NoMicrophone" | "MicrophoneBusy" | "OverConstrained" | "DeviceLost" | "NotSupported" | "InvalidState"`.
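Silence auto-stop from the list above boils down to a timer that resets on speech. The class below is a hypothetical sketch, not the SDK's implementation; it assumes `threshold` is compared against the same linear RMS value that `onLevel` reports.

```typescript
// Hypothetical detector: reports true once the level has stayed below
// `threshold` for at least `ms` milliseconds.
class SilenceDetector {
  private silentSince: number | null = null;
  private readonly ms: number;
  private readonly threshold: number;

  constructor(ms: number, threshold: number) {
    this.ms = ms;
    this.threshold = threshold;
  }

  // Feed one level sample (linear RMS, 0-1) with its timestamp in ms.
  push(level: number, nowMs: number): boolean {
    if (level >= this.threshold) {
      this.silentSince = null; // speech resets the timer
      return false;
    }
    if (this.silentSince === null) this.silentSince = nowMs;
    return nowMs - this.silentSince >= this.ms;
  }
}
```

Passing the timestamp in, rather than reading a clock inside `push`, keeps the detector deterministic and easy to unit test.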
useRecorder() React hook
A new one-call hook in @ohm_studio/sdk/react:
const r = useRecorder({
apiSlug: "visit-extract", // optional — auto-extracts on stop
silenceAutoStop: { ms: 6000 },
maxDurationMs: 15 * 60_000,
wakeLock: true,
});
// r.start, r.stop, r.pause, r.resume,
// r.state, r.level, r.durationSec, r.transcript, r.data

Full reference: Browser Recorder.
React Native parity
@ohm_studio/sdk-react-native@0.3.0 got the same upgrade:
- Clinical recording preset by default: 16 kHz mono AAC at 32 kbps, replaces the old `HIGH_QUALITY` preset (which was overkill for clinical speech and produced 3-4× larger files).
- iOS audio session: automatically configured so recording works in silent mode (`playsInSilentModeIOS: true, allowsRecordingIOS: true`).
- Pause / resume: `await rec.pause()` / `await rec.resume()` on Expo SDK 50+.
- dB → linear level metering: Expo emits `metering` in dBFS; SDK linearises to 0–1 for the `onLevel` / hook `r.level` field.
- Silence auto-stop: `silenceAutoStop: { ms, thresholdDb }`.
- Hard duration cap: `maxDurationMs`.
- Optional keep-awake hooks: pass `{ activate, deactivate }` to wire `expo-keep-awake` without us depending on it.
- `useRecorder()` hook: same shape as web (`r.start`, `r.stop`, `r.level`, `r.durationSec`, `r.transcript`, `r.data` with auto-extract).
- Typed `RecorderError` with codes `PermissionDenied | NoMicrophone | InvalidState | Interrupted | NotSupported | Unknown`.
- `probePermission()`: non-prompting permission check.
Drop-in compatible — existing new ExpoRecorder(Audio).start()/.stop()
code keeps working; the new options are all opt-in.
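For readers wiring their own meter, the dB → linear step mentioned in the list above could look like the sketch below. It assumes a plain amplitude conversion, 10^(dB/20), clamped to 0–1; the SDK may well use a different curve or noise floor, and `dbfsToLinear` is a hypothetical name.

```typescript
// Assumption: plain amplitude conversion clamped to 0-1.
function dbfsToLinear(db: number): number {
  if (Number.isNaN(db)) return 0; // NaN guard
  const linear = Math.pow(10, db / 20); // 0 dBFS gives 1, -20 dBFS gives 0.1
  return Math.min(1, Math.max(0, linear));
}
```

The clamp covers readings above 0 dBFS, and negative infinity (digital silence) naturally maps to 0.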
v0.2.0 — Streaming, Recorder, mock mode
Release date: 2026-05-09
Streaming
- `ohm.audio.extract.stream(...)`: returns an `AsyncIterable` of `StreamChunk` events so UI can render the transcript first, then update fields when the extraction LLM call completes. Backed by Server-Sent Events; the SDK parses the event stream automatically.
- New backend endpoint: `POST /api/studio/v1/audio/extract/:slug/stream`
- Chunks: `{ type: "transcript", transcript, language? }`, `{ type: "data", data, apiSlug }`, `{ type: "done", latencyMs }`, `{ type: "error", message, code? }`.
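One way to consume the chunk shapes listed above is a small reducer that folds each chunk into UI state. The sketch below is an assumption-laden illustration: the field types (for example `code` as a string, `data` as `unknown`) are guesses, and `ConsultView`, `applyChunk`, and `collect` are hypothetical names rather than SDK exports.

```typescript
// Chunk shapes as listed above; field types are assumptions.
type StreamChunk =
  | { type: "transcript"; transcript: string; language?: string }
  | { type: "data"; data: unknown; apiSlug: string }
  | { type: "done"; latencyMs: number }
  | { type: "error"; message: string; code?: string };

interface ConsultView {
  transcript?: string;
  data?: unknown;
  finished: boolean;
  error?: string;
}

// Pure reducer: fold one chunk into the UI state.
function applyChunk(view: ConsultView, chunk: StreamChunk): ConsultView {
  switch (chunk.type) {
    case "transcript":
      return { ...view, transcript: chunk.transcript };
    case "data":
      return { ...view, data: chunk.data };
    case "done":
      return { ...view, finished: true };
    case "error":
      return { ...view, error: chunk.message, finished: true };
  }
}

// Drive the reducer from any AsyncIterable of chunks.
async function collect(stream: AsyncIterable<StreamChunk>): Promise<ConsultView> {
  let view: ConsultView = { finished: false };
  for await (const chunk of stream) view = applyChunk(view, chunk);
  return view;
}
```

Because the transcript chunk arrives before the data chunk, a UI built on `applyChunk` can show the transcript immediately and fill in structured fields when they land.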
Recorder utilities
- `Recorder` from `@ohm_studio/sdk`: browser `MediaRecorder` wrapper. `await rec.start()` / `await rec.stop()` returns a `Blob` ready for `ohm.audio.extract`.
- `isRecordingSupported()` for SSR-safe detection.
- `ExpoRecorder` from `@ohm_studio/sdk-react-native`: thin Expo-AV adapter. Pass the `Audio` namespace from `expo-av`; we don't take a hard dep so bare-RN customers can stay with `react-native-audio-recorder-player`.
Mock mode
- `new OHM({ mock: true })` returns deterministic canned data for every method without hitting the network. Override per method via `mockResponses: { extract, transcribe, audioExtract, summarize, insights }`. Streaming variant emits the same canned chunks. Useful for unit tests, Storybook, and local preview builds.
Examples
New examples/ folder with three runnable samples:
- `node-cli/`: CLI summarizer using `ohm.summarize`.
- `nextjs-server-action/`: browser uploader + Next.js `'use server'` action calling `ohm.audio.extract`. Live key stays on the server.
- `expo-mic-recorder/`: Expo app using `ExpoRecorder` + `ohm.audio.extract`.
Tooling
- Removed the `${NODE_AUTH_TOKEN}` reference from the committed `.npmrc` to silence pnpm's "Failed to replace env in config" warning on every install. Token is now passed to publish via `--//registry…:_authToken` per-invocation.
v0.1.0 — Initial public release
Release date: 2026-05-09
Studio platform
- Multi-tenant developer platform at studio.ohm.doctor
  - Projects, APIs, keys, versions, invocations, usage, audit log
  - Org-scoped tenant isolation for projects / APIs / keys / invocations
  - Default landing page is a Dashboard with 24-hour activity KPIs
- API Builder
  - 16 field types, including 7 medical primitives (vitals, diagnoses, medications, allergies, investigations, referrals, procedures)
  - Visual builder + JSON view, two-way sync
  - Drag-and-drop section + field reordering, inline edit, delete
  - Field palette with categories (Basic, Clinical primitives)
- Prompt tab
  - Editable system prompt with the OHM Clinical Foundation Block (vital sanity, negation, code-mix, narrative formatting) prepended by default
  - Per-API opt-out with required reason (audit-logged)
- Inputs tab — declare HTTP-time variables your callers pass
- Insights tab — toggle the second-pass insights extraction with its own schema + prompt
- Playground tab — text or browser-mic audio, JWT-authed, runs the draft spec without debiting customer keys
- API call tab — copy-paste cURL, JS / Node, React hook, React Native, Python (cURL fallback) snippets
- Logs tab — invocations table (status, latency, tokens, error) plus 24-hour and 30-day stat cards
- Versions tab — list of published snapshots with current marker
- Settings tab — rate limit, payload retention, foundation opt-out, archive
- Keys page — mint test-mode + live-mode keys, one-time plaintext reveal, last-used timestamp & IP, revoke
- AI assistant — six modes (chat / suggest fields / improve prompt / find edge cases / generate test transcript / diagnose extraction). Cost borne by OHM, never debited to customer quota.
Public extraction API (under /api/studio/v1)
- `POST /audio/transcribe`: multipart audio → transcript
- `POST /extract/:apiSlug`: text → structured JSON
- `POST /audio/extract/:apiSlug`: audio → transcript → structured JSON in one call
- `POST /summarize`: text → summary (4 styles: patient / handover / executive / progress-note)
- `POST /insights/:apiSlug`: transcript → specialty insights
- API-key auth, bcrypt-hashed at rest, per-key rate limit
- Bundle-key safeguard: live keys blocked in React Native unless `acknowledgeBundledKey: true` is passed
- Sanitised vendor-neutral error messages: provider names never leak into HTTP responses or SDK error stacks
SDKs
- `@ohm_studio/sdk` (OHM SDK): works in browser, Node 18+, Next.js (server actions, route handlers, edge runtime)
  - Methods: `extract`, `summarize`, `insights`, `audio.transcribe`, `audio.extract`
  - React hooks via `@ohm_studio/sdk/react`: `useOhmExtract`, `useOhmAudioExtract`, `useOhmSummarize`, `<OhmProvider>`
- `@ohm_studio/sdk-react-native` (OHM RN SDK): RN-shaped multipart uploads (`{ uri, name, type }`), native fetch, hooks subentry
- Shared `@ohm_studio/sdk-core`: http client, errors, retry, types
- Bundle size: `@ohm_studio/sdk` ~6 KB packed, RN ~5 KB packed
- All three packages built dual ESM+CJS+`.d.ts`, dist-only
Docs
- New site at docs.ohm.doctor (Fumadocs / Next.js / MDX)
- Pages: Quickstart, Your first audio API, Authentication, Templates & schemas, Field types, Prompts & Foundation, JS SDK, React hooks, React Native SDK, API reference, RN key handling, Compliance, Cookbook (triage form on a tablet), Changelog
- Static icon map (12 lucide icons) — sidebar tree icons render correctly under Turbopack ESM bundling
Roadmap
These ship in subsequent minor versions:
- 0.2: streaming primitive (`ohm.audio.extract.stream(...)` returning AsyncIterable for live transcript + field updates)
- 0.2: Recorder utility (web `MediaRecorder` + Expo / bare RN wrappers) so customers don't have to set up their own
- 0.2: `new OHM({ mock: true })` returns canned responses for tests
- 0.3: Webhook callbacks for async extractions (schema reserved in v0.1)
- 0.3: Custom domain mapping per project
- 0.4: Marketplace / template sharing across orgs
- 0.4: Native Python SDK