OHM Studio

Browser Recorder

Clinical-grade audio capture in the browser — codec cascade, VU meter, wake-lock, silence detection, useRecorder() hook. Available in @ohm_studio/sdk@0.3+.

@ohm_studio/sdk ships a Recorder class and a useRecorder() React hook that handle everything you'd otherwise glue together yourself: codec selection across iOS Safari / Firefox / Chromium, microphone permission, VU level metering, silence auto-stop, wake-lock, mic-disconnect detection, and pause/resume.

Designed for clinical consults: the defaults are 16 kHz mono with echo cancellation, noise suppression, and automatic gain control enabled, which is the audio shape clinical STT models typically expect.

"use client";
import { useRecorder, OhmProvider } from "@ohm_studio/sdk/react";

function VisitRecorder() {
  const r = useRecorder({
    apiSlug: "visit-extract",          // auto-extract on stop (optional)
    silenceAutoStop: { ms: 6000 },     // stop after 6s silence
    maxDurationMs: 15 * 60_000,        // hard cap at 15 minutes
    wakeLock: true,                    // keep the screen awake
  });

  return (
    <button onClick={r.isRecording ? r.stop : r.start} disabled={r.extracting}>
      {r.extracting   ? "Extracting…" :
       r.isRecording  ? `Stop (${r.durationSec.toFixed(0)}s)` :
                        "Record"}
    </button>
  );
}

The hook returns a state object that updates as recording progresses:

| Field | Type | What it is |
| --- | --- | --- |
| state | "idle" \| "starting" \| "recording" \| "paused" \| "stopping" | Lifecycle. |
| isRecording, isPaused | boolean | Convenience flags. |
| level | number | Linear RMS 0–1, ~60 Hz updates. Wire to a VU meter. |
| durationSec | number | Live duration (excludes paused time), ticks every 250 ms. |
| blob | Blob \| null | Final audio after stop(). |
| error | RecorderError \| null | Recorder-side error. |
| extracting | boolean | True while auto-extract is in flight. |
| transcript, data | string \| null, T \| null | Filled if apiSlug was passed. |
| extractError | Error \| null | Auto-extract failure. |
| start, stop, pause, resume, cancel, reset | () => … | Imperative controls. |
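
The level field is a linear RMS value, while VU meters usually read more naturally on a decibel scale. A minimal sketch of that conversion follows; rmsToMeterFraction and the -60 dB display floor are illustrative assumptions, not part of the SDK.

```typescript
// Hypothetical helper (not an SDK export): maps linear RMS (0–1) onto a
// 0–1 meter position using a decibel scale.
const FLOOR_DB = -60; // assumed display floor; anything quieter reads as empty

function rmsToMeterFraction(rms: number, floorDb: number = FLOOR_DB): number {
  if (!(rms > 0)) return 0;                 // covers 0, negative values and NaN
  const db = 20 * Math.log10(rms);          // dBFS: 1.0 maps to 0 dB, 0.1 to -20 dB
  const clamped = Math.min(0, Math.max(floorDb, db));
  return (clamped - floorDb) / -floorDb;    // floorDb maps to 0, 0 dB maps to 1
}
```

The result can be bound directly to a meter element, for example value={rmsToMeterFraction(r.level)}.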

Wrap components that call useRecorder in <OhmProvider client={ohm}> only when you set apiSlug, because the hook needs the client for auto-extract. Without apiSlug, the hook works with no provider.

The class (framework-free)

import { Recorder } from "@ohm_studio/sdk";

const rec = new Recorder({
  silenceAutoStop: { ms: 6000 },
  maxDurationMs: 15 * 60_000,
  wakeLock: true,
  onLevel: (rms) => setVu(rms),
  onError: (e)   => toast(e.message),
});

await rec.start();
// …user records…
const blob = await rec.stop();    // ready for ohm.audio.extract({ file: blob })

Options

interface RecorderOptions {
  mimeType?: string;                  // override codec; default = auto-cascade
  audioBitsPerSecond?: number;        // default 32_000
  deviceId?: string;                  // pick a specific microphone
  audioConstraints?: MediaTrackConstraints;
  clinicalDefaults?: boolean;         // default true (16 kHz mono, EC/NS/AGC)
  timesliceMs?: number;               // emit dataavailable every N ms (for streaming)
  maxDurationMs?: number;
  silenceAutoStop?: { ms?: number; threshold?: number };
  wakeLock?: boolean;
  pauseOnHidden?: boolean;            // pause when tab becomes hidden
  onStateChange?: (state) => void;
  onLevel?:       (rms) => void;
  onChunk?:       (blob) => void;     // streaming uploads
  onError?:       (err: RecorderError) => void;
  onDeviceLost?:  () => void;
}
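
The silenceAutoStop option takes a duration and a threshold. One way such a detector can work is to remember the last moment the level exceeded the threshold; the sketch below illustrates that idea and is not the SDK's actual implementation. The class name and the default threshold of 0.01 are assumptions.

```typescript
// Illustrative only: a silence detector driven by RMS level samples.
class SilenceDetector {
  private ms: number;
  private threshold: number;
  private lastLoudAt: number | null = null;

  constructor(ms: number, threshold: number = 0.01 /* assumed default */) {
    this.ms = ms;
    this.threshold = threshold;
  }

  // Feed one level sample with its timestamp in milliseconds. Returns true
  // once the level has stayed at or below the threshold for `ms` milliseconds.
  push(rms: number, now: number): boolean {
    if (this.lastLoudAt === null || rms > this.threshold) {
      this.lastLoudAt = now; // the first sample or audible input resets the timer
      return false;
    }
    return now - this.lastLoudAt >= this.ms;
  }
}
```

Fed from onLevel, a true result would be the point at which a recorder calls stop().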

Methods

| Call | What it does |
| --- | --- |
| await rec.start() | Request mic, begin recording. Throws RecorderError. |
| await rec.stop() | Stop, clean up, return the final Blob. |
| await rec.stopAfter(ms) | Convenience: stop after N ms. |
| rec.pause() / rec.resume() | Pause/resume; duration tracking excludes paused time. |
| rec.cancel() | Abort; no Blob is returned. |
| rec.getDuration() | Milliseconds recorded so far (live). |
| rec.mimeType | Final MIME type the browser is producing. |
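
Duration tracking that excludes paused time can be done by accumulating finished segments and adding the one currently running. A minimal sketch under that assumption, with an injectable clock so it can be tested; DurationTracker is a hypothetical name and not an SDK export.

```typescript
// Illustrative pause-aware stopwatch; not the SDK's internal implementation.
class DurationTracker {
  private accumulatedMs = 0;                  // total of completed segments
  private segmentStart: number | null = null; // null while paused or idle
  private now: () => number;

  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  start(): void {
    this.accumulatedMs = 0;
    this.segmentStart = this.now();
  }

  pause(): void {
    if (this.segmentStart === null) return;   // already paused
    this.accumulatedMs += this.now() - this.segmentStart;
    this.segmentStart = null;
  }

  resume(): void {
    if (this.segmentStart === null) this.segmentStart = this.now();
  }

  // Milliseconds recorded so far, not counting time spent paused.
  getDuration(): number {
    const running = this.segmentStart === null ? 0 : this.now() - this.segmentStart;
    return this.accumulatedMs + running;
  }
}
```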

Static helpers

Recorder.isSupported();             // boolean
Recorder.getSupportedMimeType();    // best codec for this browser
await Recorder.probePermission();   // "granted" | "denied" | "prompt" | "unknown"
await Recorder.listMicrophones();   // [{ deviceId, label, groupId }]

Codec cascade

The default mimeType is the first candidate that MediaRecorder.isTypeSupported() accepts, so the result depends on the browser. Most apps don't need to override it.

| Browser | Picked codec |
| --- | --- |
| Chromium (desktop, Android) | audio/webm;codecs=opus |
| Firefox | audio/webm;codecs=opus (or audio/ogg;codecs=opus on older builds) |
| Safari (macOS 14.4+) | audio/mp4;codecs=mp4a.40.2 |
| iOS Safari (14.3+) | audio/mp4 |
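
A cascade of this kind amounts to taking the first supported entry from an ordered list. The sketch below shows the idea; the candidate order is inferred from the table above rather than taken from the SDK source, and pickMimeType is a hypothetical name. The support check is passed in as a parameter so the function also runs outside a browser.

```typescript
// Assumed candidate order, inferred from the browser table; the SDK's real
// list may differ.
const CANDIDATES: readonly string[] = [
  "audio/webm;codecs=opus",
  "audio/ogg;codecs=opus",
  "audio/mp4;codecs=mp4a.40.2",
  "audio/mp4",
];

// Returns the first candidate the predicate accepts, or undefined if none.
function pickMimeType(
  isSupported: (mime: string) => boolean,
  candidates: readonly string[] = CANDIDATES,
): string | undefined {
  return candidates.find((mime) => isSupported(mime));
}

// In a browser, the predicate would be:
//   pickMimeType((mime) => MediaRecorder.isTypeSupported(mime));
```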

iOS gotcha

iOS Safari requires recording to start in response to a user gesture (button click, touch). Don't call start() from a useEffect — the permission prompt won't appear and you'll get NotAllowedError.

Typed errors

Every recorder failure surfaces as RecorderError with a code:

import { RecorderError } from "@ohm_studio/sdk";

try {
  await rec.start();
} catch (e) {
  if (e instanceof RecorderError) {
    switch (e.code) {
      case "PermissionDenied":
        return showPrompt("Please allow microphone access in browser settings.");
      case "NoMicrophone":
        return showPrompt("No microphone detected. Plug one in and try again.");
      case "MicrophoneBusy":
        return showPrompt("Another app is using the microphone.");
      case "OverConstrained":
        return showPrompt("This device doesn't support our audio settings.");
      case "DeviceLost":
        return showPrompt("Microphone was disconnected mid-recording.");
      case "NotSupported":
        return showFallbackUploader();
      default:
        return showPrompt("Recording failed: " + e.message);
    }
  }
}
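
Several of these codes line up with the error names that getUserMedia() rejects with. If the same vocabulary is needed outside the SDK, for example in a fallback path, a mapping along these lines is plausible. The mapping the SDK uses internally is not documented on this page, so treat the table below as an assumption; "Unknown" is a placeholder for this sketch, not a documented code.

```typescript
// Hypothetical mapping from DOMException names to the codes listed above.
type MicErrorCode =
  | "PermissionDenied"
  | "NoMicrophone"
  | "MicrophoneBusy"
  | "OverConstrained"
  | "Unknown"; // placeholder used only by this sketch

function codeFromDomErrorName(name: string): MicErrorCode {
  switch (name) {
    case "NotAllowedError":      // user or policy denied access
    case "SecurityError":
      return "PermissionDenied";
    case "NotFoundError":        // no matching input device
      return "NoMicrophone";
    case "NotReadableError":     // device exists but cannot be opened
      return "MicrophoneBusy";
    case "OverconstrainedError": // constraints cannot be satisfied
      return "OverConstrained";
    default:
      return "Unknown";
  }
}
```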

Streaming uploads (advanced)

Pass timesliceMs and an onChunk handler to stream audio as it's captured. Pair this with the audio.extractStream endpoint for half the perceived latency.

const rec = new Recorder({
  timesliceMs: 250,                    // chunk every 250 ms
  onChunk: (blob) => uploadChunk(blob),
});
await rec.start();
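
The uploadChunk function above is left to the application. Because a chunk arrives every timesliceMs and uploads can otherwise finish out of order, one option is to tag each chunk with a sequence number and send them one at a time. A minimal sketch; createChunkUploader and the send callback are hypothetical, and the request shape expected by the streaming endpoint is not covered on this page.

```typescript
type SendChunk<T> = (chunk: T, seq: number) => Promise<void>;

// Serialises uploads so chunk N+1 is never sent before chunk N has finished.
function createChunkUploader<T>(send: SendChunk<T>) {
  let nextSeq = 0;
  let failed = false;
  let firstError: unknown;
  let tail: Promise<void> = Promise.resolve();

  return {
    // Queue a chunk; safe to call synchronously from onChunk.
    push(chunk: T): void {
      const seq = nextSeq++;
      tail = tail.then(async () => {
        if (failed) return;              // stop sending after the first failure
        try {
          await send(chunk, seq);
        } catch (err) {
          failed = true;
          firstError = err;
        }
      });
    },
    // Resolves once everything queued so far is sent; rejects with the first error.
    async flush(): Promise<void> {
      await tail;
      if (failed) throw firstError;
    },
  };
}
```

With this in place, onChunk would call uploader.push(blob), and the code that handles stop() would await uploader.flush() before finalising.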

Microphone picker

const [mics, setMics] = useState<MicrophoneInfo[]>([]);
useEffect(() => {
  Recorder.probePermission().then((p) => {
    if (p === "granted") Recorder.listMicrophones().then(setMics);
  });
}, []);

return (
  <select onChange={(e) => setDeviceId(e.target.value)}>
    {mics.map((m) => <option key={m.deviceId} value={m.deviceId}>{m.label}</option>)}
  </select>
);

// Then:
const rec = new Recorder({ deviceId });

MicrophoneInfo.label is empty until the user has granted permission at least once; that's a browser security rule, not an SDK choice.

Crash-safe persistence — never lose a consult

For clinical apps where losing a recording is a critical failure (doctor records 7 minutes, browser tab crashes during upload, audio is gone), the SDK ships an opt-in IndexedDB persistence layer.

One-flag opt-in via useRecorder

const r = useRecorder({
  apiSlug: "visit-extract",
  persist: { metadata: { visitId, doctorId } },  // ← one flag
});

When persist is set, the recorded Blob is written to IndexedDB before extraction. If extraction succeeds, the persisted copy is removed. If the tab crashes, browser closes, or the network drops mid-upload, the recording survives in IndexedDB and can be recovered on next mount.
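
The ordering described above (write first, extract second, delete only on success) can be sketched in a few lines. The store and the extract function are passed in as parameters so the sketch stays self-contained; in an app they would presumably be backed by saveRecording, removeRecording, and ohm.audio.extract. extractWithPersistence and RecordingStore are hypothetical names, not SDK exports.

```typescript
interface RecordingStore<B> {
  save(blob: B): Promise<string>;   // resolves to the persisted id
  remove(id: string): Promise<void>;
}

async function extractWithPersistence<B, R>(
  blob: B,
  store: RecordingStore<B>,
  extract: (blob: B) => Promise<R>,
): Promise<R> {
  const id = await store.save(blob);   // 1. persist before any network call
  const result = await extract(blob);  // 2. if this throws, the saved copy remains
  await store.remove(id);              // 3. clean up only after success
  return result;
}
```

The key property is that a failure in step 2 leaves the recording in the store, where a later recovery pass can find it.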

Recovery flow

import { usePendingRecordings } from "@ohm_studio/sdk/react";
import { removeRecording } from "@ohm_studio/sdk";

function RecoveryDialog() {
  const { pending, refresh, dismiss } = usePendingRecordings();

  if (pending.length === 0) return null;

  return (
    <dialog open>
      <p>We found {pending.length} unsent recording(s) from earlier.</p>
      {pending.map((p) => (
        <div key={p.id}>
          <span>Visit {(p.metadata as any)?.visitId} · saved {timeAgo(p.savedAt)}</span>
          <button
            onClick={async () => {
              await ohm.audio.extract({ apiSlug: "visit-extract", file: p.blob });
              await removeRecording(p.id);
              dismiss(p.id);
            }}
          >
            Retry upload
          </button>
          <button onClick={async () => { await removeRecording(p.id); dismiss(p.id); }}>
            Discard
          </button>
        </div>
      ))}
    </dialog>
  );
}
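
The timeAgo helper used in the dialog is not provided by the SDK. A minimal sketch, assuming savedAt is a Unix timestamp in milliseconds (this page does not state its type):

```typescript
// Hypothetical helper for the dialog above; assumes savedAt is epoch milliseconds.
function timeAgo(savedAt: number, now: number = Date.now()): string {
  const sec = Math.max(0, Math.floor((now - savedAt) / 1000)); // clamp clock skew
  if (sec < 60) return "just now";
  const min = Math.floor(sec / 60);
  if (min < 60) return `${min} min ago`;
  const hr = Math.floor(min / 60);
  if (hr < 24) return `${hr} h ago`;
  return `${Math.floor(hr / 24)} d ago`;
}
```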

Imperative API

import {
  saveRecording,
  getRecording,
  listRecordings,
  removeRecording,
  clearRecordings,
  isPersistenceSupported,
} from "@ohm_studio/sdk";

// Save
const id = await saveRecording(blob, { metadata: { visitId } });

// Recover
const pending = await listRecordings();   // newest first

// Cleanup
await removeRecording(id);                 // single
await clearRecordings();                   // all (use on logout)

Backed by idb-keyval (~600 bytes). All APIs are SSR-safe and gracefully no-op when IndexedDB isn't available (Node, very old browsers, some Firefox private modes).

Speaker mode

OHM ships exactly two intentional modes — pick the one that matches your recording context:

import { SPEAKER_MODES } from "@ohm_studio/sdk";

<select onChange={(e) => setMode(e.target.value as SpeakerMode)}>
  {SPEAKER_MODES.map((m) => (
    <option key={m.code} value={m.code}>{m.label}</option>
  ))}
</select>

const r = useRecorder({
  apiSlug: "opd-clinic",
  speakerMode: mode,                // "doctor" (default) | "doctor_patient"
});

// or imperative:
await ohm.audio.extract({ apiSlug, file, speakerMode: "doctor_patient" });
await ohm.audio.transcribe({ file, speakerMode: "doctor" });

| Mode | When to use |
| --- | --- |
| doctor (default) | Single-speaker dictation. Nurse-station vitals, dictated notes, ward rounds, post-op summaries: any flow where only the clinician talks into the mic. |
| doctor_patient | Two-speaker conversation. The full visit recording. Hints the extractor to attribute history-of-illness to the patient and assessment / plan to the doctor. |

We deliberately don't expose multi-speaker variants (3+, panel discussions, group rounds). Clinical extraction is tuned for the two shapes above; for anything else, call audio.transcribe first and pass the transcript to your own LLM call.

Language picker

The SDK ships a SUPPORTED_LANGUAGES constant so you don't hard-code the list (we expand it as the pipeline gains coverage):

import { SUPPORTED_LANGUAGES } from "@ohm_studio/sdk";

<select onChange={(e) => setLang(e.target.value)}>
  {SUPPORTED_LANGUAGES.map((l) => (
    <option key={l.code} value={l.code}>
      {l.label}{l.nativeLabel ? ` · ${l.nativeLabel}` : ""}
    </option>
  ))}
</select>

Pass the chosen code to extract / transcribe / useRecorder:

const r = useRecorder({
  apiSlug: "opd-clinic",
  extractLanguage: lang,            // "auto" | "en" | "hi" | "ta" | …
});

// or for the imperative API:
await ohm.audio.extract({ apiSlug, file, language: lang });
await ohm.audio.transcribe({ file, language: lang });

Supported codes (auto-detect + 23 languages): auto, en, hi, ta, te, mr, bn, kn, ml, gu, pa, ur, or, as, ne, kok, ks, sd, sa, sat, mni, brx, mai, doi. The provider's shape (en-IN, hi-IN, …) is also accepted — the server normalises both.

isSupportedLanguageCode(code) is exported as a helper for validation.
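
Since both bare codes and region-tagged forms such as hi-IN are accepted, a client-side check only needs to compare the primary subtag. The sketch below uses a locally defined list copied from the paragraph above; in an app, SUPPORTED_LANGUAGES and isSupportedLanguageCode from the SDK remain the source of truth, and normaliseLanguageCode is a hypothetical helper.

```typescript
// Local copy of the codes listed above, for illustration only.
const LANGUAGE_CODES = new Set<string>([
  "auto", "en", "hi", "ta", "te", "mr", "bn", "kn", "ml", "gu", "pa", "ur",
  "or", "as", "ne", "kok", "ks", "sd", "sa", "sat", "mni", "brx", "mai", "doi",
]);

// Reduces "hi-IN" or "HI_in" to "hi"; returns null for unsupported codes.
function normaliseLanguageCode(input: string): string | null {
  const [primary = ""] = input.trim().toLowerCase().split(/[-_]/);
  return LANGUAGE_CODES.has(primary) ? primary : null;
}
```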

Network-aware UX

import { useNetworkStatus } from "@ohm_studio/sdk/react";

const { online } = useNetworkStatus();
// gate uploads, show "Offline — we'll send when you're back" UI

Backed by navigator.onLine + online/offline events. Zero deps.
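
For code outside React, the same two browser signals can be wrapped in a small subscription helper. A sketch; watchOnline is a hypothetical name, and the source parameter stands in for window so the function can be exercised without a browser.

```typescript
// Minimal shape of the event source this sketch needs.
interface OnlineSource {
  addEventListener(type: "online" | "offline", listener: () => void): void;
  removeEventListener(type: "online" | "offline", listener: () => void): void;
}

// Reports the initial state, then every change. Returns an unsubscribe function.
function watchOnline(
  source: OnlineSource,
  initial: boolean,
  onChange: (online: boolean) => void,
): () => void {
  const handleOnline = () => onChange(true);
  const handleOffline = () => onChange(false);
  onChange(initial);
  source.addEventListener("online", handleOnline);
  source.addEventListener("offline", handleOffline);
  return () => {
    source.removeEventListener("online", handleOnline);
    source.removeEventListener("offline", handleOffline);
  };
}
```

In a browser the source would be window and the initial value navigator.onLine.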

What's not in the box

We deliberately don't bundle these — most apps don't need them, and the ones that do tend to want their own:

  • Custom WAV encoder — long consults blow memory; we error on NotSupported instead, so you can show an "upgrade your browser" UI.
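  • Automatic re-upload of persisted recordings: bring your own; recovery semantics are app-specific. (The opt-in persistence layer described above stores and lists recordings; the retry itself is left to your app.)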
  • Real-time captions — use the streaming endpoint instead; it transcribes and extracts in one pass.

React Native

Use @ohm_studio/sdk-react-native for mobile — it ships an ExpoRecorder and a BareRecorder with the same start() / stop() shape but native audio session handling.