Building Audia: Engineering Low-Latency Application Audio Tapping & DSP on macOS

Many developers assume that system-level audio routing on macOS requires writing a fragile kernel extension or deploying a complex virtual audio driver. Historically, utilities like Soundflower relied on virtual loopback devices, which added significant scheduling latency and blurred the boundaries between processes.

When developing Audia and its open-source companion audia-tap, our architectural goal was to eliminate this legacy overhead entirely. We built a direct, driverless tapping pipeline that isolates application-specific PCM signals in real time, with end-to-end processing latency under 10 milliseconds.

Here is the underlying system architecture and execution pipeline that makes it possible.

1. Process-Level Interception: Tapping into Core Audio HAL

To extract audio streams directly from individual applications (e.g., Apple Music, Chrome, or Spotify) without kernel modification, the engine combines Apple’s AVAudioEngine with process-identifier (PID) tracking.

Instead of capturing the combined master output, we monitor individual processes. Once a target is selected, the engine builds an audio graph in which an AVAudioSourceNode supplies that process's samples to the rest of the pipeline.

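// Feeds samples captured for one target process into an AVAudioEngine graph
import AVFoundation

public class ProcessTap {
    private let engine = AVAudioEngine()
    private var sourceNode: AVAudioSourceNode?
    private let targetPID: pid_t

    public init(targetPID: pid_t) {
        self.targetPID = targetPID
    }

    public func attachAudioGraph(format: AVAudioFormat) throws {
        let node = AVAudioSourceNode(renderBlock: { (silence, timeOffset, frameCount, audioBufferList) -> OSStatus in
            // Runs on the real-time render thread: no locks, allocations, or blocking calls.
            // Placeholder: copy frameCount frames from the ring buffer into audioBufferList.
            // Until that copy is wired in, zero the buffers and report silence.
            for buffer in UnsafeMutableAudioBufferListPointer(audioBufferList) {
                if let data = buffer.mData { memset(data, 0, Int(buffer.mDataByteSize)) }
            }
            silence.pointee = true
            return noErr
        })
        sourceNode = node
        engine.attach(node)
        engine.connect(node, to: engine.mainMixerNode, format: format)
        try engine.start()
    }
}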

The render block works on the isolated PCM samples directly. Because the whole processing loop stays in user space, it avoids the kernel round trips and the loopback-device hop that add latency to traditional driver-based approaches.
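
To put the sub-10 ms target in perspective, the time one buffer represents is its frame count divided by the sample rate. The sketch below is illustrative arithmetic only: the helper name, the buffer sizes, and the 48 kHz rate are assumptions for the example, not Audia's actual configuration.

```swift
/// Duration, in milliseconds, of one buffer of `frames` frames at `sampleRate` Hz.
/// Hypothetical helper for illustration; not part of Audia.
func bufferDurationMs(frames: Int, sampleRate: Double) -> Double {
    return Double(frames) / sampleRate * 1000.0
}

let assumedSampleRate = 48_000.0  // assumed value for the example
for frames in [128, 256, 512] {
    let ms = bufferDurationMs(frames: frames, sampleRate: assumedSampleRate)
    print("\(frames) frames -> \(ms) ms")
}
```

At 48 kHz, a 512-frame buffer alone lasts about 10.7 ms, so an end-to-end budget under 10 ms leaves room only for small buffers, such as 256 frames (about 5.3 ms).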

2. Bridging Threads: Thread Coordination via Ring Buffers

Real-time audio processing poses a hard concurrency problem: the Core Audio render callback runs on a high-priority, time-constrained thread. If that thread blocks on a lock (such as a standard mutex or semaphore), waits on a memory allocation, or stalls on a slow UI state write, it misses its deadline and audio frames are dropped. The result is audible stuttering or pops.

To address this, RingBuffer.swift implements a fixed-capacity circular buffer. Separate read and write indices wrap around a preallocated array, passing samples between the high-priority render thread and the lower-priority serialization loop:

// Fixed-capacity circular buffer of Float samples
// (simplified: no overflow checks and no cross-thread synchronization)
public final class RingBuffer {
    private var buffer: [Float]
    private var writeIndex = 0
    private var readIndex = 0
    private let capacity: Int
 
    public init(capacity: Int) {
        self.capacity = capacity
        self.buffer = Array(repeating: 0.0, count: capacity)
    }
 
    public func write(_ samples: [Float]) {
        for sample in samples {
            self.buffer[self.writeIndex] = sample
            self.writeIndex = (self.writeIndex + 1) % self.capacity
        }
    }
 
    public func read(count: Int) -> [Float] {
        var output = Array(repeating: Float(0.0), count: count)
        for i in 0..<count {
            output[i] = self.buffer[self.readIndex]
            self.readIndex = (self.readIndex + 1) % self.capacity
        }
        return output
    }
}

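Because the backing array is allocated once, at initialization, the buffer never grows or reallocates while audio is flowing, which keeps the storage itself suitable for real-time use. Note that read(count:) as shown still builds a new output array on every call and the indices are not synchronized across threads, so a render-thread caller needs a variant that copies into preallocated memory and publishes its indices atomically.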
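
One way to keep the read path itself free of allocations is to copy into caller-provided memory and to track how many samples are stored, so that overruns and underruns become explicit. The following is a minimal sketch under those assumptions; `CountingRingBuffer` and its methods are hypothetical names, not the contents of RingBuffer.swift. It is not thread-safe as written: a production single-producer, single-consumer queue would also need atomic acquire/release operations on the indices.

```swift
/// Illustrative ring buffer that reads into caller-provided storage.
/// NOT thread-safe as written; see the note above about atomic indices.
final class CountingRingBuffer {
    private let storage: UnsafeMutablePointer<Float>
    private let capacity: Int
    private var writeIndex = 0
    private var readIndex = 0
    private var count = 0  // number of samples currently stored

    init(capacity: Int) {
        precondition(capacity > 0)
        self.capacity = capacity
        storage = UnsafeMutablePointer<Float>.allocate(capacity: capacity)
        storage.initialize(repeating: 0, count: capacity)
    }

    deinit {
        storage.deinitialize(count: capacity)
        storage.deallocate()
    }

    /// Copies as many samples as fit and returns how many were written.
    @discardableResult
    func write(_ samples: UnsafeBufferPointer<Float>) -> Int {
        let n = min(samples.count, capacity - count)
        for i in 0..<n {
            storage[writeIndex] = samples[i]
            writeIndex = (writeIndex + 1) % capacity
        }
        count += n
        return n
    }

    /// Fills `output` with stored samples, zero-padding on underrun.
    /// Returns how many real samples were copied. Allocates nothing.
    @discardableResult
    func read(into output: UnsafeMutableBufferPointer<Float>) -> Int {
        let n = min(output.count, count)
        for i in 0..<n {
            output[i] = storage[readIndex]
            readIndex = (readIndex + 1) % capacity
        }
        for i in n..<output.count { output[i] = 0 }
        count -= n
        return n
    }
}

// Usage: five samples offered to a four-slot buffer, then six requested back.
let ring = CountingRingBuffer(capacity: 4)
let input: [Float] = [1, 2, 3, 4, 5]
let written = input.withUnsafeBufferPointer { ring.write($0) }
var out = [Float](repeating: -1, count: 6)
let readCount = out.withUnsafeMutableBufferPointer { ring.read(into: $0) }
```

Returning the written and read counts lets the caller decide how to handle a full or empty buffer, instead of silently overwriting unread samples.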

3. Discrete Signal Processing: Direct Form I Biquad Filters

Rather than delegating equalization to an opaque high-level API, Audia runs its own filters directly inside the sample-processing loop.

Inside BiquadFilter.swift, a second-order discrete-time filter processes floating-point samples one at a time. Each band of the multi-band parametric equalizer has its own feedforward coefficients (b0, b1, b2) and feedback coefficients (a1, a2), which place the filter's zeros and poles and are derived from the band's settings and the system sample rate:

// Second-order IIR (biquad) section in Direct Form I.
// Coefficients are assumed to be normalized so that a0 = 1.
public class BiquadFilter {
    public var b0: Float = 1.0, b1: Float = 0.0, b2: Float = 0.0
    public var a1: Float = 0.0, a2: Float = 0.0

    // Delay-line state: the previous two inputs (x1, x2) and outputs (y1, y2)
    private var x1: Float = 0.0, x2: Float = 0.0
    private var y1: Float = 0.0, y2: Float = 0.0

    public func process(_ input: Float) -> Float {
        // y[n] = b0*x[n] + b1*x[n-1] + b2*x[n-2] - a1*y[n-1] - a2*y[n-2]
        let output = (b0 * input) + (b1 * x1) + (b2 * x2) - (a1 * y1) - (a2 * y2)

        // Shift the delay line so this sample becomes history for the next call
        x2 = x1; x1 = input
        y2 = y1; y1 = output

        return output
    }
}
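
The article does not show how the coefficients are derived. One common choice for a parametric EQ band is the peaking-filter formula from the widely used "Audio EQ Cookbook", sketched below. The type and function names are hypothetical, and whether Audia uses this exact formula is an assumption. The results are divided by a0 so they match the normalized form the filter above expects.

```swift
import Foundation

/// Normalized biquad coefficients (a0 already divided out).
struct BiquadCoefficients {
    var b0: Float, b1: Float, b2: Float
    var a1: Float, a2: Float
}

/// Peaking-EQ coefficients for a band centred at `centerHz` with the given
/// gain (dB) and Q, at `sampleRate` Hz. Hypothetical helper for illustration.
func peakingEQ(centerHz: Double, gainDB: Double, q: Double,
               sampleRate: Double) -> BiquadCoefficients {
    let amp = pow(10.0, gainDB / 40.0)                    // linear amplitude factor
    let w0 = 2.0 * Double.pi * centerHz / sampleRate      // normalized angular frequency
    let alpha = sin(w0) / (2.0 * q)
    let cosW0 = cos(w0)

    let a0 = 1.0 + alpha / amp                            // normalization term
    return BiquadCoefficients(
        b0: Float((1.0 + alpha * amp) / a0),
        b1: Float((-2.0 * cosW0) / a0),
        b2: Float((1.0 - alpha * amp) / a0),
        a1: Float((-2.0 * cosW0) / a0),
        a2: Float((1.0 - alpha / amp) / a0)
    )
}

// Usage: a +6 dB boost at 1 kHz, and a 0 dB band for comparison.
let boost = peakingEQ(centerHz: 1_000, gainDB: 6, q: 0.707, sampleRate: 48_000)
let flat = peakingEQ(centerHz: 1_000, gainDB: 0, q: 0.707, sampleRate: 48_000)
print(boost, flat)
```

With a gain of 0 dB the feedforward and feedback coefficients coincide, so the band passes audio through unchanged, which is a convenient sanity check.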

When an application streams a block of PCM samples, every sample passes through this filter. This design gives us full control over gain and over the shape of each equalizer curve, directly within the Swift processing loop.
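
A multi-band equalizer can be pictured as several such sections applied in series to each sample of a block. The sketch below assumes that arrangement; `Biquad` and `processBlock` are hypothetical names, and the filter is redeclared as a small struct only so the example is self-contained.

```swift
/// Minimal Direct Form I biquad, redeclared so this sketch stands alone.
struct Biquad {
    let b0: Float, b1: Float, b2: Float, a1: Float, a2: Float
    private var x1: Float = 0, x2: Float = 0
    private var y1: Float = 0, y2: Float = 0

    init(b0: Float, b1: Float, b2: Float, a1: Float, a2: Float) {
        self.b0 = b0; self.b1 = b1; self.b2 = b2
        self.a1 = a1; self.a2 = a2
    }

    mutating func process(_ input: Float) -> Float {
        let output = b0 * input + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2 = x1; x1 = input
        y2 = y1; y1 = output
        return output
    }
}

/// Runs every sample of `block` through each band in series, in place.
/// No arrays are created inside the loop.
func processBlock(_ block: inout [Float], bands: inout [Biquad]) {
    for i in block.indices {
        var sample = block[i]
        for b in bands.indices {
            sample = bands[b].process(sample)
        }
        block[i] = sample
    }
}

// Usage with two toy bands (coefficients chosen for easy checking, not for sound):
var bands = [
    Biquad(b0: 0.5, b1: 0, b2: 0, a1: 0, a2: 0),  // plain gain of 0.5
    Biquad(b0: 1, b1: 1, b2: 0, a1: 0, a2: 0)     // y[n] = x[n] + x[n-1]
]
var block: [Float] = [1, 0, 0, 0]                 // unit impulse
processBlock(&block, bands: &bands)
print(block)
```

Because each band keeps its own delay-line state, the same `bands` array has to be reused across consecutive blocks so the filters stay continuous at block boundaries.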

4. The Sourcing Evolution: Open-Sourcing Audia-tap

To open this work up to other developers, we decoupled the Core Audio capture layer from our SwiftUI interface and shipped it as audia-tap.

As a standalone command-line tool, audia-tap captures the target application's stream, determines its raw format, and serializes the PCM data to stdout or a local IPC endpoint. This lets developers pipe per-app audio into terminal utilities or local AI inference backends (such as Whisper or local MLX pipelines) in real time, without audio drift or dropouts.
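
On the consuming side, a downstream tool has to turn the raw byte stream back into samples. The sketch below assumes the stream is 32-bit float PCM in little-endian byte order; the article does not specify audia-tap's wire format, so treat that as an assumption. `decodeFloat32LE` is a hypothetical helper.

```swift
/// Decodes little-endian 32-bit float PCM bytes into samples.
/// Trailing bytes that do not form a whole sample are ignored.
/// Assumed wire format; not confirmed for audia-tap.
func decodeFloat32LE(_ bytes: [UInt8]) -> [Float] {
    let sampleCount = bytes.count / 4
    var samples = [Float]()
    samples.reserveCapacity(sampleCount)
    for i in 0..<sampleCount {
        let base = i * 4
        let low = UInt32(bytes[base]) | (UInt32(bytes[base + 1]) << 8)
        let high = (UInt32(bytes[base + 2]) << 16) | (UInt32(bytes[base + 3]) << 24)
        samples.append(Float(bitPattern: low | high))
    }
    return samples
}

// Usage: 1.0 and -2.0 encoded as little-endian floats, plus one stray byte.
let chunk: [UInt8] = [0x00, 0x00, 0x80, 0x3F,
                      0x00, 0x00, 0x00, 0xC0,
                      0x01]
let decoded = decodeFloat32LE(chunk)
print(decoded)
```

In a real consumer the bytes would arrive in chunks from standard input, and any leftover partial sample would be carried over to the next chunk rather than discarded.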
