Skip to content

Audio runtime — WebAudio wrapper, scene-graph emitters, mixer buses

The framework’s audio surface. WebAudio’s AudioContext + AudioBufferSourceNode + PannerNode stay underneath; vibesmith adds a scene-graph-aware wrapper above so consumers attach sound to nodes the same way they attach scripts. No consumer ships its own WebAudio plumbing.

Package: @vibesmith/audio-runtime (re-exported by @vibesmith/runtime in projects opened via the binary).

What the framework owns vs what WebAudio owns

Section titled “What the framework owns vs what WebAudio owns”
LayerOwned by
AudioContext lifecycle, sample rate, base latencyWebAudio
AudioBuffer decoding, scheduling, loopingWebAudio
PannerNode, GainNode, BiquadFilterNode, ConvolverNodeWebAudio
AudioListener position + orientationWebAudio (driven by framework)
<AudioEmitter> scene-node componentFramework
Camera → listener syncFramework
Mixer bus hierarchy (master / music / sfx / dialogue / ambient)Framework
Per-bus volume / mute / solo / lowpass / compression persistenceFramework
Asset manifest → buffer cache (content-addressable)Framework
Recipe parameter → audio graph mappingFramework
Scenario capture / replay of emitter stateFramework
Autoplay-gesture gate + deferred initFramework
Eject hatch to raw WebAudio nodesFramework

WebAudio nodes stay reachable via emitter.raw(){ panner, gain, source, lowpass } for the unusual cases the wrapper doesn’t cover. The framework owns the common shape; it does not gate access to the primitive.

A scene node carries one or more emitters the same way it carries a script. The emitter is the unit of “this node makes sound.”

import { AudioEmitter, ListenerSync } from '@vibesmith/audio-runtime/r3f';
function Scene() {
return (
<>
<ListenerSync />
<group position={[5, 0, -3]}>
<AudioEmitter source="sfx.footstep.stone" bus="sfx" loop={false} />
<AudioEmitter source="ambient.wind.loop" bus="ambient" loop autoplay />
</group>
</>
);
}

The emitter inherits the carrying node’s world position every frame — no manual sync. Removing the node removes the emitter; HMR preserves state via the same Zustand pattern scenarios use for client state.

A node may carry multiple emitters (footsteps + breath + a voice line). Each emitter is independent — its own bus routing, gain, lifecycle. Composition over configuration.

Not every sound is scene-graph-shaped. HUD sounds, one-shot notifications, music transitions live above the scene tree.

import { useAudio } from '@vibesmith/audio-runtime';
const audio = useAudio();
audio.play('ui.button.confirm', { bus: 'sfx', gainDb: -3 });
audio.music.crossfade('music.combat', { durationMs: 2000 });
audio.bus('dialogue').onPlay(() =>
audio.bus('music').duck({ targetDb: -12, attackMs: 200, holdMs: 0, releaseMs: 600 }),
);

3D emitters route through a PannerNode whose position matches the carrying scene node’s world transform. The camera drives the AudioListener position + orientation via the <ListenerSync /> component.

Three named attenuation models — the WebAudio panner shapes, surfaced verbatim:

ModelWhen
inversephysical realism; long-range falloff
lineargameplay-readable; bounded range
exponentialdramatic dropoff; ambient zones

Each carries refDistance, maxDistance, rolloffFactor — the WebAudio panner parameters. The framework does not invent a fourth model.

WebAudio dropped automatic doppler in the 2017 spec revision. The framework ships a manual implementation behind a flag:

<AudioEmitter
source="vehicle.engine.loop"
loop
spatialization="3d"
doppler={{ enabled: true, velocityFactor: 1 }}
/>

When enabled, the runtime samples emitter velocity (frame-over- frame world-position delta) and pitches the source via detune. Off by default — doppler artefacts in non-vehicular games outweigh the realism gain.

Geometry-based occlusion is a Track P authoring concern, not a runtime concern. The runtime exposes a per-emitter lowpassHz so the authoring layer can drive occlusion from above; the runtime does not raycast on its own.

A fixed, small hierarchy:

master
├── music
├── sfx
├── dialogue
└── ambient

Per-bus state persists in localStorage under vibesmith.audio.mixer. The dev shell’s Audio panel surfaces sliders + mute / solo buttons; consumer code reads + writes via:

audio.bus('music').setGainDb(-6);
audio.bus('sfx').mute();
audio.bus('dialogue').solo();
audio.bus('music').setLowpassHz(8000);
audio.bus('master').setCompression({
thresholdDb: -18,
ratio: 4,
attackMs: 5,
releaseMs: 200,
makeupDb: 2,
});
  • music vs ambient split because their typical mix behaviour diverges (music ducks under dialogue; ambient does not).
  • dialogue is its own bus because it’s the most commonly ducked-against bus, and consumers need a stable name to wire ducking against.
  • sfx holds everything transient.
  • master is one knob for “everything quieter.”

A sixth bus (e.g., ui) is consumer territory — the runtime supports custom buses via createBus(name, parent). The framework’s preset is the five named above.

audio.bus('dialogue').onPlay(() =>
audio.bus('music').duck({
targetDb: -12,
attackMs: 200,
holdMs: 0,
releaseMs: 600,
}),
);

The implementation is a GainNode automation, not a real DynamicsCompressorNode — ducking is a curve, not a real-time compressor, and the curve is cheaper.

The audio runtime consumes the asset pipeline’s Stage 6 output (see asset-pipeline reference) — Opus primary + Vorbis fallback, loudness-normalised to −16 LUFS. The buffer cache is manifest-driven, not URL-driven:

import { registerManifest } from '@vibesmith/audio-runtime';
registerManifest({
assets: {
'sfx.footstep.stone': {
key: 'sfx.footstep.stone',
formats: {
primary: 'sfx/footstep_stone.webm',
fallback: 'sfx/footstep_stone.ogg',
},
durationMs: 380,
channels: 1,
defaultBus: 'sfx',
hash: 'a8b3c7…',
},
},
}, { assetBase: '/audio/' });

Decoded buffers are cached in-memory keyed by hash. Two manifest keys pointing at the same hash share one decoded buffer.

Asset typeStrategy
SFX (< 5 s)Fully decoded into AudioBuffer
Music / ambient loops (> 5 s)Streamed via MediaElementAudioSourceNode
DialogueStreamed (variable length, often skipped)

The 5-second threshold is the default; overridable per-asset via streaming: true | false.

The runtime uses a deferred-init pattern: the AudioContext is created lazily on the first user gesture (pointerdown / keydown / touchstart), and emitters that requested autoplay before that gesture are queued and started in declaration order once the context is live.

import { installAutoplayGate, unlockAudio } from '@vibesmith/audio-runtime';
// Auto: install once at app startup
installAutoplayGate();
// Or explicit: tie to a "click to start" button
button.addEventListener('click', () => unlockAudio());

Tab-visibility resumes the context on return-to-foreground; if Safari refuses, the gesture queue re-arms so the next user interaction unlocks.

Per-tier ceilings compose with the adaptive-rendering tier:

TierConcurrent emittersDecoded buffersBus filters
LOW1632per-bus lowpass only
MEDIUM3264+ compressor
HIGH64128+ convolution reverb
ULTRA128256+ multi-tap delay

audio.stats() returns { activeEmitters, decodedBuffers, contextTimeMs, dropouts } — read each frame by a performance critic.

Audio state is part of every scenario snapshot. On launch, bus state restores before any emitter starts; active emitters re-schedule at their recorded offset (a loop captured mid-cycle resumes mid-cycle); applied recipes re-apply in declaration order.

import { captureAudioState, restoreAudioState } from '@vibesmith/audio-runtime';
// On scenario capture: the scenario layer reads
const snapshot = captureAudioState();
// On scenario launch: the scenario layer writes
restoreAudioState(snapshot, {
resolveEmitter: (snap) => myEmitterRegistry.get(snap.emitterId),
});

Determinism: pitch jitter reads from the scenario’s rngSeed — two launches of the same scenario produce the same audio sequence.

  • Not a DAW. No sends, returns, arbitrary nesting, plugin chains, automation curves beyond the bus-level duck / gain. Consumers wanting that surface eject to raw WebAudio nodes via emitter.raw().
  • Not a runtime LLM surface. No model touches audio bytes at runtime. AI involvement is at authoring time — picking recipes, adapting parameters, naming buses.
  • Not a replacement for a music engine. Interactive music (vertical re-orchestration, horizontal re-sequencing) is a consumer-side authoring concern. The runtime exposes the mixer + ducking primitives the authoring layer needs.
  • Not a Web Audio polyfill. Browsers without AudioContext fall back silently — no sound, no errors, no synthetic replacement. The framework targets evergreen browsers per the WebGL constraints baseline.