Audio runtime — WebAudio wrapper, scene-graph emitters, mixer buses
The framework’s audio surface. WebAudio’s AudioContext +
AudioBufferSourceNode + PannerNode stay underneath; vibesmith
adds a scene-graph-aware wrapper above so consumers attach sound
to nodes the same way they attach scripts. No consumer ships its
own WebAudio plumbing.
Package: @vibesmith/audio-runtime (re-exported by
@vibesmith/runtime in projects opened via the binary).
What the framework owns vs what WebAudio owns
Section titled “What the framework owns vs what WebAudio owns”| Layer | Owned by |
|---|---|
AudioContext lifecycle, sample rate, base latency | WebAudio |
AudioBuffer decoding, scheduling, looping | WebAudio |
PannerNode, GainNode, BiquadFilterNode, ConvolverNode | WebAudio |
AudioListener position + orientation | WebAudio (driven by framework) |
<AudioEmitter> scene-node component | Framework |
| Camera → listener sync | Framework |
| Mixer bus hierarchy (master / music / sfx / dialogue / ambient) | Framework |
| Per-bus volume / mute / solo / lowpass / compression persistence | Framework |
| Asset manifest → buffer cache (content-addressable) | Framework |
| Recipe parameter → audio graph mapping | Framework |
| Scenario capture / replay of emitter state | Framework |
| Autoplay-gesture gate + deferred init | Framework |
| Eject hatch to raw WebAudio nodes | Framework |
WebAudio nodes stay reachable via emitter.raw() —
{ panner, gain, source, lowpass } for the unusual cases the
wrapper doesn’t cover. The framework owns the common shape; it
does not gate access to the primitive.
Sound emitter component
Section titled “Sound emitter component”A scene node carries one or more emitters the same way it carries a script. The emitter is the unit of “this node makes sound.”
import { AudioEmitter, ListenerSync } from '@vibesmith/audio-runtime/r3f';
function Scene() { return ( <> <ListenerSync /> <group position={[5, 0, -3]}> <AudioEmitter source="sfx.footstep.stone" bus="sfx" loop={false} /> <AudioEmitter source="ambient.wind.loop" bus="ambient" loop autoplay /> </group> </> );}The emitter inherits the carrying node’s world position every frame — no manual sync. Removing the node removes the emitter; HMR preserves state via the same Zustand pattern scenarios use for client state.
A node may carry multiple emitters (footsteps + breath + a voice line). Each emitter is independent — its own bus routing, gain, lifecycle. Composition over configuration.
Imperative API (useAudio())
Section titled “Imperative API (useAudio())”Not every sound is scene-graph-shaped. HUD sounds, one-shot notifications, music transitions live above the scene tree.
import { useAudio } from '@vibesmith/audio-runtime';
const audio = useAudio();audio.play('ui.button.confirm', { bus: 'sfx', gainDb: -3 });audio.music.crossfade('music.combat', { durationMs: 2000 });audio.bus('dialogue').onPlay(() => audio.bus('music').duck({ targetDb: -12, attackMs: 200, holdMs: 0, releaseMs: 600 }),);3D positional audio
Section titled “3D positional audio”3D emitters route through a PannerNode whose position matches
the carrying scene node’s world transform. The camera drives the
AudioListener position + orientation via the <ListenerSync />
component.
Three named attenuation models — the WebAudio panner shapes, surfaced verbatim:
| Model | When |
|---|---|
inverse | physical realism; long-range falloff |
linear | gameplay-readable; bounded range |
exponential | dramatic dropoff; ambient zones |
Each carries refDistance, maxDistance, rolloffFactor — the
WebAudio panner parameters. The framework does not invent a
fourth model.
Doppler (opt-in)
Section titled “Doppler (opt-in)”WebAudio dropped automatic doppler in the 2017 spec revision. The framework ships a manual implementation behind a flag:
<AudioEmitter source="vehicle.engine.loop" loop spatialization="3d" doppler={{ enabled: true, velocityFactor: 1 }}/>When enabled, the runtime samples emitter velocity (frame-over-
frame world-position delta) and pitches the source via detune.
Off by default — doppler artefacts in non-vehicular games
outweigh the realism gain.
Occlusion
Section titled “Occlusion”Geometry-based occlusion is a Track P authoring concern, not a
runtime concern. The runtime exposes a per-emitter lowpassHz
so the authoring layer can drive occlusion from above; the
runtime does not raycast on its own.
Mixer bus model
Section titled “Mixer bus model”A fixed, small hierarchy:
master├── music├── sfx├── dialogue└── ambientPer-bus state persists in localStorage under
vibesmith.audio.mixer. The dev shell’s Audio panel surfaces
sliders + mute / solo buttons; consumer code reads + writes via:
audio.bus('music').setGainDb(-6);audio.bus('sfx').mute();audio.bus('dialogue').solo();audio.bus('music').setLowpassHz(8000);audio.bus('master').setCompression({ thresholdDb: -18, ratio: 4, attackMs: 5, releaseMs: 200, makeupDb: 2,});Why these five and not more
Section titled “Why these five and not more”- music vs ambient split because their typical mix behaviour diverges (music ducks under dialogue; ambient does not).
- dialogue is its own bus because it’s the most commonly ducked-against bus, and consumers need a stable name to wire ducking against.
- sfx holds everything transient.
- master is one knob for “everything quieter.”
A sixth bus (e.g., ui) is consumer territory — the runtime
supports custom buses via createBus(name, parent). The
framework’s preset is the five named above.
Ducking
Section titled “Ducking”audio.bus('dialogue').onPlay(() => audio.bus('music').duck({ targetDb: -12, attackMs: 200, holdMs: 0, releaseMs: 600, }),);The implementation is a GainNode automation, not a real
DynamicsCompressorNode — ducking is a curve, not a real-time
compressor, and the curve is cheaper.
Asset shape
Section titled “Asset shape”The audio runtime consumes the asset pipeline’s Stage 6 output (see asset-pipeline reference) — Opus primary + Vorbis fallback, loudness-normalised to −16 LUFS. The buffer cache is manifest-driven, not URL-driven:
import { registerManifest } from '@vibesmith/audio-runtime';
registerManifest({ assets: { 'sfx.footstep.stone': { key: 'sfx.footstep.stone', formats: { primary: 'sfx/footstep_stone.webm', fallback: 'sfx/footstep_stone.ogg', }, durationMs: 380, channels: 1, defaultBus: 'sfx', hash: 'a8b3c7…', }, },}, { assetBase: '/audio/' });Decoded buffers are cached in-memory keyed by hash. Two
manifest keys pointing at the same hash share one decoded
buffer.
Streaming vs decoded
Section titled “Streaming vs decoded”| Asset type | Strategy |
|---|---|
| SFX (< 5 s) | Fully decoded into AudioBuffer |
| Music / ambient loops (> 5 s) | Streamed via MediaElementAudioSourceNode |
| Dialogue | Streamed (variable length, often skipped) |
The 5-second threshold is the default; overridable per-asset via
streaming: true | false.
Browser autoplay policy
Section titled “Browser autoplay policy”The runtime uses a deferred-init pattern: the AudioContext
is created lazily on the first user gesture
(pointerdown / keydown / touchstart), and emitters that
requested autoplay before that gesture are queued and started in
declaration order once the context is live.
import { installAutoplayGate, unlockAudio } from '@vibesmith/audio-runtime';
// Auto: install once at app startupinstallAutoplayGate();
// Or explicit: tie to a "click to start" buttonbutton.addEventListener('click', () => unlockAudio());Tab-visibility resumes the context on return-to-foreground; if Safari refuses, the gesture queue re-arms so the next user interaction unlocks.
Performance budget
Section titled “Performance budget”Per-tier ceilings compose with the adaptive-rendering tier:
| Tier | Concurrent emitters | Decoded buffers | Bus filters |
|---|---|---|---|
| LOW | 16 | 32 | per-bus lowpass only |
| MEDIUM | 32 | 64 | + compressor |
| HIGH | 64 | 128 | + convolution reverb |
| ULTRA | 128 | 256 | + multi-tap delay |
audio.stats() returns
{ activeEmitters, decodedBuffers, contextTimeMs, dropouts } —
read each frame by a performance critic.
Scenario integration
Section titled “Scenario integration”Audio state is part of every scenario snapshot. On launch, bus state restores before any emitter starts; active emitters re-schedule at their recorded offset (a loop captured mid-cycle resumes mid-cycle); applied recipes re-apply in declaration order.
import { captureAudioState, restoreAudioState } from '@vibesmith/audio-runtime';
// On scenario capture: the scenario layer readsconst snapshot = captureAudioState();
// On scenario launch: the scenario layer writesrestoreAudioState(snapshot, { resolveEmitter: (snap) => myEmitterRegistry.get(snap.emitterId),});Determinism: pitch jitter reads from the scenario’s rngSeed —
two launches of the same scenario produce the same audio
sequence.
What this is NOT
Section titled “What this is NOT”- Not a DAW. No sends, returns, arbitrary nesting, plugin
chains, automation curves beyond the bus-level duck / gain.
Consumers wanting that surface eject to raw WebAudio nodes
via
emitter.raw(). - Not a runtime LLM surface. No model touches audio bytes at runtime. AI involvement is at authoring time — picking recipes, adapting parameters, naming buses.
- Not a replacement for a music engine. Interactive music (vertical re-orchestration, horizontal re-sequencing) is a consumer-side authoring concern. The runtime exposes the mixer + ducking primitives the authoring layer needs.
- Not a Web Audio polyfill. Browsers without AudioContext fall back silently — no sound, no errors, no synthetic replacement. The framework targets evergreen browsers per the WebGL constraints baseline.