Voice Synthesis (YAP)

YAP (Yet Another Presenter) provides real-time text-to-speech for your Neko automation sessions, converting chat messages into natural-sounding speech that plays directly in the browser.

What is YAP?

YAP transforms text into speech using advanced AI voice synthesis (F5-TTS) and streams the audio through WebRTC to your Neko browser session. This enables:

  • Voice announcements during automation tasks
  • Interactive conversations through chat commands
  • Live narration of automation steps
  • Multiple voice personas with custom characteristics

Quick Start

1. Prerequisites

Before using YAP, ensure you have:

  • A running Neko server (see Getting Started)
  • GPU environment for optimal performance: nix develop .#gpu
  • Basic voice files in the ./voices directory

2. Start YAP Service

# Connect to local Neko server
export NEKO_URL="http://localhost:8080"
export NEKO_USER="user" 
export NEKO_PASS="password"

uv run src/yap.py

You should see:

[12:34:56] yap INFO - WS connected
[12:34:56] yap INFO - RTC answer sent; audio track live.
[12:34:56] yap INFO - Voices reloaded (1 entries)

3. Test Voice Output

In your Neko browser chat, type:

/yap Hello! I am your voice assistant.

You should hear the text spoken through the browser audio.

Voice Commands

YAP responds to commands in the Neko chat interface:

Immediate Speech

Speak text immediately:

/yap Good morning! Ready to start automation.
/yap The task has been completed successfully.

Streaming Mode

For longer conversations or live narration:

/yap:begin
I'm starting the automation task now...
Navigating to the website...
Filling out the form...
Submitting the data...
/yap:end

In streaming mode, YAP processes text incrementally as you type, enabling natural conversation flow.

Stop/Clear Queue

Cancel current speech and clear the queue:

/yap:stop

Voice Management

Default Voice Setup

YAP needs at least one voice configured. Create a basic setup:

  1. Create voices directory:

    mkdir -p voices
    
  2. Add a reference audio file (see the conversion sketch after these steps):

    # Record or copy a 3-10 second WAV file
    cp your-voice-sample.wav voices/default.wav
    
  3. YAP will auto-create voices.json on first run with default settings.
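
If the sample from step 2 is not already a short mono WAV file, ffmpeg can convert it. This is a minimal sketch: the input filename is a placeholder, and 48 kHz mono matches the default YAP_SR and YAP_AUDIO_CHANNELS settings described under Configuration:

# Convert an existing recording to a 10-second, 48 kHz mono WAV reference
ffmpeg -i your-voice-sample.mp3 -ac 1 -ar 48000 -t 10 voices/default.wav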

Adding New Voices

Add voices through chat commands:

/yap:voice add --spk alice --ref ./voices/alice.wav --ref-text "Hello, my name is Alice" --styles "friendly,calm"

Parameters:

  • --spk: Voice ID/name
  • --ref: Path to reference audio file (WAV format, 3-10 seconds)
  • --ref-text: Transcript of the reference audio
  • --styles: Comma-separated style tags
  • --rate: Speech speed (0.5-2.0, default 1.0)
  • --pitch: Pitch shift in semitones (-12 to +12, default 0.0)
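
For example, the narrator voice from the configuration file shown below could be registered with explicit rate and pitch values:

/yap:voice add --spk narrator --ref ./voices/narrator.wav --ref-text "I will be narrating the automation process." --styles "professional,clear" --rate 0.9 --pitch -0.3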

Switching Voices

Change the active voice and parameters:

/yap:voice set --spk alice
/yap:voice set --spk bob --rate 1.2 --pitch -0.5
/yap:voice set --rate 0.8

Reload Voice Configuration

After manually editing voices/voices.json:

/yap:voice reload

Configuration

Basic Settings

Set these environment variables before starting YAP:

# Connection (choose one method)
export YAP_WS="wss://demo.neko.com/api/ws?token=your_token"
# OR
export NEKO_URL="https://demo.neko.com"
export NEKO_USER="username"
export NEKO_PASS="password"

# Voice directory
export YAP_VOICES_DIR="./voices"
export YAP_SPK_DEFAULT="default"

Audio Quality

# Audio format (recommended settings)
export YAP_SR=48000              # Sample rate (Hz)
export YAP_AUDIO_CHANNELS=1      # Channels (1=mono, 2=stereo)
export YAP_FRAME_MS=20           # WebRTC frame size

# Processing
export YAP_PARALLEL=2            # TTS worker threads
export YAP_MAX_CHARS=350         # Max characters per chunk
export YAP_OVERLAP_MS=30         # Audio crossfade overlap

Performance Tuning

# For faster response (lower quality)
export YAP_MAX_CHARS=200
export YAP_PARALLEL=4

# For better quality (higher latency)  
export YAP_MAX_CHARS=500
export YAP_OVERLAP_MS=50

# Buffer management
export YAP_JITTER_MAX_SEC=6.0    # Audio buffer size

Voice Configuration File

YAP stores voice settings in voices/voices.json:

{
  "default": {
    "ref_audio": "./voices/default.wav",
    "ref_text": "This is my default voice sample.",
    "styles": ["neutral"],
    "params": {
      "rate": 1.0,
      "pitch": 0.0
    }
  },
  "alice": {
    "ref_audio": "./voices/alice.wav", 
    "ref_text": "Hello, my name is Alice and I sound friendly.",
    "styles": ["friendly", "energetic"],
    "params": {
      "rate": 1.1,
      "pitch": 0.2
    }
  },
  "narrator": {
    "ref_audio": "./voices/narrator.wav",
    "ref_text": "I will be narrating the automation process.",
    "styles": ["professional", "clear"],
    "params": {
      "rate": 0.9,
      "pitch": -0.3
    }
  }
}

Voice Parameters

  • ref_audio: Path to reference WAV file (3-10 seconds, clear speech)
  • ref_text: Exact transcript of the reference audio
  • styles: Descriptive tags (friendly, professional, calm, energetic)
  • rate: Speech speed multiplier (0.5=slow, 1.0=normal, 2.0=fast)
  • pitch: Pitch adjustment in semitones (negative=lower, positive=higher)

Usage Scenarios

Automation Announcements

Start YAP alongside your automation agent:

# Terminal 1: Start YAP
uv run src/yap.py

# Terminal 2: Run automation with voice (audio is enabled by default)
uv run src/agent.py --task "Fill out contact form"

The agent can announce progress:

/yap Starting automation task: Fill out contact form
/yap Navigating to the website...
/yap Form submitted successfully!

Interactive Sessions

Use YAP for live interaction during manual control:

# Terminal 1: Start YAP
uv run src/yap.py

# Terminal 2: Manual control
python src/manual.py

Then control both automation and voice through chat:

!click 100 200
/yap Clicked on the submit button
!type "Hello World"
/yap Entered text in the field

Multi-Voice Conversations

Set up different voices for different purposes:

# Set up voices
/yap:voice add --spk system --ref ./voices/system.wav --ref-text "System notification" --rate 0.9
/yap:voice add --spk user --ref ./voices/user.wav --ref-text "User interaction" --rate 1.1

# Use in conversation
/yap:voice set --spk system
/yap System initialization complete.

/yap:voice set --spk user  
/yap Thank you for the update!

Troubleshooting

No Audio Output

Check browser audio permissions:

  1. Click the browser's address bar lock icon
  2. Ensure "Sound" is allowed
  3. Check browser volume settings

Verify WebRTC connection:

  • Open browser developer tools (F12)
  • Go to Console tab
  • Look for WebRTC connection messages
  • Check for audio stream indicators

Test connection:

uv run src/yap.py --healthcheck

Poor Audio Quality

Check reference audio:

  • Use high-quality WAV files (16-bit, 22kHz+)
  • 3-10 second samples with clear speech
  • No background noise or music
  • Single speaker only
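
To confirm a reference file meets these recommendations, ffprobe (bundled with ffmpeg) reports its codec, sample rate, and channel count; the path below is just an example:

ffprobe -hide_banner voices/default.wav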

Adjust processing:

# Increase overlap for smoother transitions
export YAP_OVERLAP_MS=50

# Reduce chunk size for lower latency
export YAP_MAX_CHARS=250

High Latency

Optimize for speed:

# Increase parallel workers
export YAP_PARALLEL=4

# Reduce chunk size
export YAP_MAX_CHARS=200

# Reduce buffer size
export YAP_JITTER_MAX_SEC=3.0

Check GPU usage:

# Monitor GPU usage while YAP is running
nvidia-smi -l 1

Connection Issues

WebSocket connection failed:

# Test the WebSocket endpoint (a complete handshake needs the Sec-WebSocket headers)
curl -i -N \
  -H "Connection: Upgrade" -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==" \
  http://localhost:8080/api/ws

Authentication failed:

# Test REST login
curl -X POST http://localhost:8080/api/login \
  -H "Content-Type: application/json" \
  -d '{"username":"user","password":"password"}'

Check firewall/network:

  • Ensure port 8080 (HTTP) and the WebRTC port range are accessible
  • Verify STUN server connectivity
  • Check for corporate proxy/firewall blocking WebRTC

Debug Mode

Enable detailed logging:

export YAP_LOGLEVEL=DEBUG
export YAP_LOG_FORMAT=json
uv run src/yap.py 2>&1 | jq .

Look for:

  • WebSocket connection status
  • WebRTC negotiation progress
  • TTS processing times
  • Audio buffer status

Advanced Usage

Custom Voice Training

For better voice quality, record multiple reference samples:

  1. Record varied samples:

    # Different emotions/styles
    voices/alice-happy.wav
    voices/alice-serious.wav
    voices/alice-excited.wav
    
  2. Register each sample with /yap:voice add (as shown above), then compare them:

    /yap:voice set --spk alice-happy
    /yap I'm so excited about this automation!
    
    /yap:voice set --spk alice-serious  
    /yap Please review these results carefully.
    

Integration with Automation

Modify automation scripts to include voice feedback:

# In your automation script
import os

import requests

# Reuse the same NEKO_URL the rest of the tooling uses
neko_url = os.environ.get("NEKO_URL", "http://localhost:8080")

def announce(text):
    """Send a voice announcement to YAP via the Neko chat."""
    # Session/authentication handling is omitted here for brevity
    requests.post(f"{neko_url}/api/chat", json={
        "message": f"/yap {text}"
    }, timeout=10)

# Use in automation (agent is your existing automation object)
announce("Starting login process")
agent.click_login_button()
announce("Login successful, proceeding to dashboard")

Docker Deployment

Deploy YAP as a container service:

FROM python:3.9-slim

# Install system dependencies
RUN apt-get update && apt-get install -y ffmpeg

# Install Python dependencies  
COPY requirements.txt .
RUN pip install -r requirements.txt

# Copy application and voices
COPY src/ /app/src/
COPY voices/ /app/voices/
WORKDIR /app

# Configuration
ENV YAP_VOICES_DIR=/app/voices
ENV YAP_SR=48000
ENV YAP_PARALLEL=2

ENTRYPOINT ["python", "src/yap.py"]
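
A possible build-and-run invocation, assuming the image is tagged neko-yap and the container can reach your Neko server (adjust the URL and credentials for your deployment):

docker build -t neko-yap .
docker run --rm \
  -e NEKO_URL="http://neko-server:8080" \
  -e NEKO_USER="user" \
  -e NEKO_PASS="password" \
  neko-yap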

Next Steps