Browser Recorder
Clinical-grade audio capture in the browser — codec cascade, VU meter, wake-lock, silence detection, useRecorder() hook. Available in @ohm_studio/sdk@0.3+.
@ohm_studio/sdk ships a Recorder class and a useRecorder() React hook
that handle everything you'd otherwise glue together yourself: codec
selection across iOS Safari / Firefox / Chromium, microphone permission,
VU level metering, silence auto-stop, wake-lock, mic-disconnect detection,
and pause/resume.
Designed for clinical consults: defaults are 16 kHz mono with echo cancellation, noise suppression, and auto-gain on, the input format most clinical speech-to-text (STT) models expect.
The hook (recommended)
"use client";
import { useRecorder, OhmProvider } from "@ohm_studio/sdk/react";
function VisitRecorder() {
const r = useRecorder({
apiSlug: "visit-extract", // auto-extract on stop (optional)
silenceAutoStop: { ms: 6000 }, // stop after 6s silence
maxDurationMs: 15 * 60_000, // hard cap at 15 minutes
wakeLock: true, // keep the screen awake
});
return (
<button onClick={r.isRecording ? r.stop : r.start} disabled={r.extracting}>
{r.extracting ? "Extracting…" :
r.isRecording ? `Stop (${r.durationSec.toFixed(0)}s)` :
"Record"}
</button>
);
}

The hook returns a state object that updates as recording progresses:
| Field | Type | What it is |
|---|---|---|
| state | "idle" \| "starting" \| "recording" \| "paused" \| "stopping" | Lifecycle. |
| isRecording, isPaused | boolean | Convenience flags. |
| level | number | Linear RMS 0–1, ~60 Hz updates. Wire to a VU meter. |
| durationSec | number | Live duration (excludes paused time), ticks every 250 ms. |
| blob | Blob \| null | Final audio after stop(). |
| error | RecorderError \| null | Recorder-side error. |
| extracting | boolean | True while auto-extract is in flight. |
| transcript, data | string \| null, T \| null | Filled if apiSlug was passed. |
| extractError | Error \| null | Auto-extract failure. |
| start, stop, pause, resume, cancel, reset | () => … | Imperative controls. |
<OhmProvider client={ohm}> needs to wrap a component that calls useRecorder
only when you set apiSlug, because the hook needs the client for auto-extract.
Without apiSlug, the hook works without a provider.
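The level field is a linear RMS value, while meters tend to read more naturally on a decibel scale. A minimal sketch of that mapping follows; rmsToDb, meterPercent, and the -60 dBFS floor are illustrative assumptions, not SDK exports.

```typescript
// Hypothetical helpers (not part of the SDK): map the hook's linear RMS
// `level` (0–1) onto a decibel-scaled meter.

const FLOOR_DB = -60; // assumed noise floor; anything quieter reads as 0%

/** Convert linear RMS (0–1) to dBFS, clamped to [FLOOR_DB, 0]. */
function rmsToDb(rms: number): number {
  if (!(rms > 0)) return FLOOR_DB; // covers 0, negative values, and NaN
  const db = 20 * Math.log10(rms);
  return Math.min(0, Math.max(FLOOR_DB, db));
}

/** Map RMS to a 0–100 meter fill percentage on the dB scale. */
function meterPercent(rms: number): number {
  return ((rmsToDb(rms) - FLOOR_DB) / -FLOOR_DB) * 100;
}
```

In a component, the result could drive a bar width, for example a style of width: meterPercent(r.level) + "%".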
The class (framework-free)
import { Recorder } from "@ohm_studio/sdk";
const rec = new Recorder({
silenceAutoStop: { ms: 6000 },
maxDurationMs: 15 * 60_000,
wakeLock: true,
onLevel: (rms) => setVu(rms),
onError: (e) => toast(e.message),
});
await rec.start();
// …user records…
const blob = await rec.stop(); // ready for ohm.audio.extract({ file: blob })

Options
interface RecorderOptions {
mimeType?: string; // override codec; default = auto-cascade
audioBitsPerSecond?: number; // default 32_000
deviceId?: string; // pick a specific microphone
audioConstraints?: MediaTrackConstraints;
clinicalDefaults?: boolean; // default true (16 kHz mono, EC/NS/AGC)
timesliceMs?: number; // emit dataavailable every N ms (for streaming)
maxDurationMs?: number;
silenceAutoStop?: { ms?: number; threshold?: number };
wakeLock?: boolean;
pauseOnHidden?: boolean; // pause when tab becomes hidden
onStateChange?: (state: "idle" | "starting" | "recording" | "paused" | "stopping") => void;
onLevel?: (rms: number) => void; // linear RMS 0–1
onChunk?: (blob: Blob) => void; // streaming uploads
onError?: (err: RecorderError) => void;
onDeviceLost?: () => void;
}

Methods
| Call | What it does |
|---|---|
| await rec.start() | Request mic, begin recording. Throws RecorderError. |
| await rec.stop() | Stop, clean up, return final Blob. |
| await rec.stopAfter(ms) | Convenience: stop in N ms. |
| rec.pause() / rec.resume() | Pause/resume; duration tracking excludes paused time. |
| rec.cancel() | Abort; no Blob returned. |
| rec.getDuration() | Milliseconds recorded so far (live). |
| rec.mimeType | Final MIME type the browser is producing. |
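The silenceAutoStop option (ms, threshold) listed under Options stops a recording after a quiet stretch. One plausible way to build that kind of check on top of level samples is sketched here; SilenceDetector and the 0.02 threshold in the usage note are assumptions, and the SDK's internal logic may differ.

```typescript
// Illustrative only: one way a silence auto-stop check could work.
// The SDK's real implementation and default threshold are not shown here.

interface SilenceOptions {
  ms: number;        // how long the level must stay quiet before stopping
  threshold: number; // linear RMS below which a sample counts as silence
}

class SilenceDetector {
  private quietSince: number | null = null;
  private readonly opts: SilenceOptions;

  constructor(opts: SilenceOptions) {
    this.opts = opts;
  }

  /** Feed one level sample; returns true once silence has lasted `ms`. */
  push(rms: number, nowMs: number): boolean {
    if (rms >= this.opts.threshold) {
      this.quietSince = null; // any speech resets the timer
      return false;
    }
    if (this.quietSince === null) this.quietSince = nowMs;
    return nowMs - this.quietSince >= this.opts.ms;
  }
}
```

With the class API, this could be fed from onLevel, calling rec.stop() the first time push(rms, performance.now()) returns true for a detector built with { ms: 6000, threshold: 0.02 }.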
Static helpers
Recorder.isSupported(); // boolean
Recorder.getSupportedMimeType(); // best codec for this browser
await Recorder.probePermission(); // "granted" | "denied" | "prompt" | "unknown"
await Recorder.listMicrophones(); // [{ deviceId, label, groupId }]

Codec cascade
The default mimeType is the first candidate that MediaRecorder.isTypeSupported()
accepts. The table shows what that resolves to per browser; most apps don't need to override it.
| Browser | Picked codec |
|---|---|
| Chromium (desktop, Android) | audio/webm;codecs=opus |
| Firefox | audio/webm;codecs=opus (or audio/ogg;codecs=opus on older builds) |
| Safari (macOS 14.4+) | audio/mp4;codecs=mp4a.40.2 |
| iOS Safari (14.3+) | audio/mp4 |
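A cascade like this boils down to "first supported candidate wins". The sketch below shows the idea with an injected predicate so it can run outside a browser; the candidate order is an assumption inferred from the table above, and pickMimeType is a hypothetical helper rather than the SDK's Recorder.getSupportedMimeType().

```typescript
// Sketch of a codec cascade. The candidate order is an assumption inferred
// from the table above, not the SDK's actual list.
const CANDIDATES: readonly string[] = [
  "audio/webm;codecs=opus",
  "audio/ogg;codecs=opus",
  "audio/mp4;codecs=mp4a.40.2",
  "audio/mp4",
];

/** Return the first candidate the predicate accepts, or null if none match. */
function pickMimeType(
  isSupported: (mime: string) => boolean,
  candidates: readonly string[] = CANDIDATES,
): string | null {
  for (const mime of candidates) {
    if (isSupported(mime)) return mime;
  }
  return null;
}
```

In a browser, the predicate would be (m) => MediaRecorder.isTypeSupported(m); a null result corresponds to the case where no listed codec is available.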
iOS gotcha
iOS Safari requires recording to start in response to a user gesture
(button click, touch). Don't call start() from a useEffect — the
permission prompt won't appear and you'll get NotAllowedError.
Typed errors
Every recorder failure surfaces as RecorderError with a code:
import { RecorderError } from "@ohm_studio/sdk";
try {
await rec.start();
} catch (e) {
if (e instanceof RecorderError) {
switch (e.code) {
case "PermissionDenied":
return showPrompt("Please allow microphone access in browser settings.");
case "NoMicrophone":
return showPrompt("No microphone detected. Plug one in and try again.");
case "MicrophoneBusy":
return showPrompt("Another app is using the microphone.");
case "OverConstrained":
return showPrompt("This device doesn't support our audio settings.");
case "DeviceLost":
return showPrompt("Microphone was disconnected mid-recording.");
case "NotSupported":
return showFallbackUploader();
default:
return showPrompt("Recording failed: " + e.message);
}
}
}

Streaming uploads (advanced)
Pass timesliceMs and an onChunk handler to stream audio as it's
captured. Pair this with the audio.extractStream
endpoint to cut perceived latency roughly in half.
const rec = new Recorder({
timesliceMs: 250, // chunk every 250 ms
onChunk: (blob) => uploadChunk(blob),
});
await rec.start();

Microphone picker
const [mics, setMics] = useState<MicrophoneInfo[]>([]);
useEffect(() => {
Recorder.probePermission().then((p) => {
if (p === "granted") Recorder.listMicrophones().then(setMics);
});
}, []);
return (
<select onChange={(e) => setDeviceId(e.target.value)}>
{mics.map((m) => <option key={m.deviceId} value={m.deviceId}>{m.label}</option>)}
</select>
);
// Then:
const rec = new Recorder({ deviceId });

MicrophoneInfo.label is empty until the user has granted permission at
least once. That's a browser security rule, not an SDK choice.
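If you list devices before permission has been granted, a placeholder keeps the picker readable. A small sketch, assuming a MicLike shape that mirrors the { deviceId, label, groupId } objects from listMicrophones(); displayLabel and its wording are hypothetical.

```typescript
// Hypothetical helper: `MicLike` mirrors the { deviceId, label, groupId }
// shape documented for listMicrophones(); the fallback text is an assumption.
interface MicLike {
  deviceId: string;
  label: string;
  groupId: string;
}

/** Use the browser-provided label, or a numbered placeholder when it is empty. */
function displayLabel(mic: MicLike, index: number): string {
  const label = mic.label.trim();
  return label !== "" ? label : `Microphone ${index + 1}`;
}
```

In the picker, this would replace m.label inside the option element, passing the map index as the second argument.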
Crash-safe persistence — never lose a consult
In clinical apps, losing a recording is a critical failure: a doctor records 7 minutes, the browser tab crashes during upload, and the audio is gone. To guard against this, the SDK ships an opt-in IndexedDB persistence layer.
One-flag opt-in via useRecorder
const r = useRecorder({
apiSlug: "visit-extract",
persist: { metadata: { visitId, doctorId } }, // ← one flag
});

When persist is set, the recorded Blob is written to IndexedDB before
extraction. If extraction succeeds, the persisted copy is removed. If
the tab crashes, the browser closes, or the network drops mid-upload, the
recording survives in IndexedDB and can be recovered on next mount.
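The ordering matters more than the storage engine: save first, extract second, delete only on success. The sketch below illustrates that ordering with an in-memory Map standing in for IndexedDB; RecordingStore, MemoryStore, and extractWithPersistence are illustrative names, not the SDK's implementation.

```typescript
// Sketch of the save-before-extract ordering, using an in-memory Map in
// place of IndexedDB. All names are illustrative, not SDK exports.

interface RecordingStore<T> {
  save(item: T): Promise<string>; // resolves to the new record's id
  remove(id: string): Promise<void>;
}

class MemoryStore<T> implements RecordingStore<T> {
  readonly items = new Map<string, T>();
  private nextId = 1;

  async save(item: T): Promise<string> {
    const id = `rec-${this.nextId++}`;
    this.items.set(id, item);
    return id;
  }

  async remove(id: string): Promise<void> {
    this.items.delete(id);
  }
}

/** Persist first, then extract; delete the copy only after extraction succeeds. */
async function extractWithPersistence<T, R>(
  recording: T,
  store: RecordingStore<T>,
  extract: (recording: T) => Promise<R>,
): Promise<R> {
  const id = await store.save(recording);  // the copy exists from here on
  const result = await extract(recording); // if this throws, the copy stays
  await store.remove(id);
  return result;
}
```

Because the removal runs only after extract resolves, a thrown error or a crash mid-upload leaves the stored copy in place for a later retry.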
Recovery flow
import { usePendingRecordings } from "@ohm_studio/sdk/react";
import { removeRecording } from "@ohm_studio/sdk";
function RecoveryDialog() {
const { pending, refresh, dismiss } = usePendingRecordings();
if (pending.length === 0) return null;
return (
<dialog open>
<p>We found {pending.length} unsent recording(s) from earlier.</p>
{pending.map((p) => (
<div key={p.id}>
<span>Visit {(p.metadata as any)?.visitId} · saved {timeAgo(p.savedAt)}</span>
<button
onClick={async () => {
await ohm.audio.extract({ apiSlug: "visit-extract", file: p.blob });
await removeRecording(p.id);
dismiss(p.id);
}}
>
Retry upload
</button>
<button onClick={async () => { await removeRecording(p.id); dismiss(p.id); }}>
Discard
</button>
</div>
))}
</dialog>
);
}

Imperative API
import {
saveRecording,
getRecording,
listRecordings,
removeRecording,
clearRecordings,
isPersistenceSupported,
} from "@ohm_studio/sdk";
// Save
const id = await saveRecording(blob, { metadata: { visitId } });
// Recover
const pending = await listRecordings(); // newest first
// Cleanup
await removeRecording(id); // single
await clearRecordings(); // all (use on logout)

Backed by idb-keyval (~600 bytes). All APIs are SSR-safe and gracefully
no-op when IndexedDB isn't available (Node, very old browsers, some
Firefox private modes).
Speaker mode
OHM ships exactly two intentional modes — pick the one that matches your recording context:
import { SPEAKER_MODES } from "@ohm_studio/sdk";
<select onChange={(e) => setMode(e.target.value as SpeakerMode)}>
{SPEAKER_MODES.map((m) => (
<option key={m.code} value={m.code}>{m.label}</option>
))}
</select>
const r = useRecorder({
apiSlug: "opd-clinic",
speakerMode: mode, // "doctor" (default) | "doctor_patient"
});
// or imperative:
await ohm.audio.extract({ apiSlug, file, speakerMode: "doctor_patient" });
await ohm.audio.transcribe({ file, speakerMode: "doctor" });

| Mode | When to use |
|---|---|
| doctor (default) | Single-speaker dictation. Nurse-station vitals, dictated notes, ward rounds, post-op summaries: any flow where only the clinician talks into the mic. |
| doctor_patient | Two-speaker conversation. The full visit recording. Hints the extractor to attribute history-of-illness to the patient and assessment / plan to the doctor. |
We deliberately don't expose multi-speaker variants (3+ speakers, panel
discussions, group rounds). Clinical extraction is tuned for the two
shapes above; for anything else, run audio.transcribe first
and feed the transcript into your own LLM call.
Language picker
The SDK ships a SUPPORTED_LANGUAGES constant so you don't hard-code
the list (we expand it as the pipeline gains coverage):
import { SUPPORTED_LANGUAGES } from "@ohm_studio/sdk";
<select onChange={(e) => setLang(e.target.value)}>
{SUPPORTED_LANGUAGES.map((l) => (
<option key={l.code} value={l.code}>
{l.label}{l.nativeLabel ? ` · ${l.nativeLabel}` : ""}
</option>
))}
</select>

Pass the chosen code to extract / transcribe / useRecorder:
const r = useRecorder({
apiSlug: "opd-clinic",
extractLanguage: lang, // "auto" | "en" | "hi" | "ta" | …
});
// or for the imperative API:
await ohm.audio.extract({ apiSlug, file, language: lang });
await ohm.audio.transcribe({ file, language: lang });

Supported codes (auto-detect + 23 languages): auto, en, hi,
ta, te, mr, bn, kn, ml, gu, pa, ur, or, as, ne,
kok, ks, sd, sa, sat, mni, brx, mai, doi. Region-tagged codes in the
provider's format (en-IN, hi-IN, …) are also accepted; the server normalises both forms.
isSupportedLanguageCode(code) is exported as a helper for validation.
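For client-side validation of tags such as en-IN, one option is to reduce the tag to its base code and check it against the list above. This is only a sketch: the server performs the real normalisation, and toBaseCode is a hypothetical helper, separate from the SDK's isSupportedLanguageCode.

```typescript
// Illustrative client-side check; the server does the real normalisation.
// The code list is copied from the documentation above; `toBaseCode` is a
// hypothetical helper, not the SDK's isSupportedLanguageCode.
const CODES: ReadonlySet<string> = new Set([
  "auto", "en", "hi", "ta", "te", "mr", "bn", "kn", "ml", "gu", "pa", "ur",
  "or", "as", "ne", "kok", "ks", "sd", "sa", "sat", "mni", "brx", "mai", "doi",
]);

/** Reduce a tag like "hi-IN" to its base code ("hi"); null if unsupported. */
function toBaseCode(tag: string): string | null {
  const [base = ""] = tag.trim().toLowerCase().split("-");
  return CODES.has(base) ? base : null;
}
```

A null result is a cue to fall back to "auto" or to surface a validation message before calling extract or transcribe.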
Network-aware UX
import { useNetworkStatus } from "@ohm_studio/sdk/react";
const { online } = useNetworkStatus();
// gate uploads, show "Offline: we'll send when you're back" UI

Backed by navigator.onLine and the online/offline events. Zero dependencies.
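A "send when you're back" flow needs somewhere to hold uploads while offline. The sketch below is one simple shape for that; UploadQueue is a hypothetical helper, not an SDK export, and it does not guard against two flush calls running at once.

```typescript
// Sketch of "send when you're back": hold uploads while offline and flush
// them in order on reconnect. `UploadQueue` is hypothetical, not an SDK export.
class UploadQueue<T> {
  private pending: T[] = [];
  private readonly send: (item: T) => Promise<void>;

  constructor(send: (item: T) => Promise<void>) {
    this.send = send;
  }

  /** Send immediately when online; otherwise hold the item for a later flush. */
  async submit(item: T, online: boolean): Promise<void> {
    if (online) {
      await this.send(item);
    } else {
      this.pending.push(item);
    }
  }

  /** Call when connectivity returns; sends held items oldest first. */
  async flush(): Promise<void> {
    while (this.pending.length > 0) {
      const item = this.pending[0] as T;
      await this.send(item); // if this throws, the item stays queued
      this.pending.shift();
    }
  }

  get size(): number {
    return this.pending.length;
  }
}
```

With useNetworkStatus, submit would receive the current online value, and an effect keyed on online could call flush when it turns true. Note that navigator.onLine only reports a network connection, not that the server is reachable, so send should still handle failures.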
What's not in the box
We deliberately don't bundle these — most apps don't need them, and the ones that do tend to want their own:
- Custom WAV encoder: long consults blow memory; we throw a RecorderError with code NotSupported instead, so you can show an "upgrade your browser" UI.
- Recovery UI for persisted recordings: the SDK stores the audio (see Crash-safe persistence above), but the retry and discard flow is yours to build; recovery semantics are app-specific.
- Real-time captions: use the streaming endpoint instead; it transcribes and extracts in one pass.
React Native
Use @ohm_studio/sdk-react-native for mobile —
it ships an ExpoRecorder and a BareRecorder with the same
start() / stop() shape but native audio session handling.