Voice Synthesis (YAP)
YAP (Yet Another Presenter) provides real-time text-to-speech capabilities for your Neko automation sessions. Convert text messages into natural-sounding speech that plays directly in the browser.
What is YAP?
YAP transforms text into speech using advanced AI voice synthesis (F5-TTS) and streams the audio through WebRTC to your Neko browser session. This enables:
- Voice announcements during automation tasks
- Interactive conversations through chat commands
- Live narration of automation steps
- Multiple voice personas with custom characteristics
Quick Start
1. Prerequisites
Before using YAP, ensure you have:
- A running Neko server (see Getting Started)
- GPU environment for optimal performance:
nix develop .#gpu
- Basic voice files in the ./voices directory
2. Start YAP Service
# Connect to local Neko server
export NEKO_URL="http://localhost:8080"
export NEKO_USER="user"
export NEKO_PASS="password"
uv run src/yap.py
You should see:
[12:34:56] yap INFO - WS connected
[12:34:56] yap INFO - RTC answer sent; audio track live.
[12:34:56] yap INFO - Voices reloaded (1 entries)
3. Test Voice Output
In your Neko browser chat, type:
/yap Hello! I am your voice assistant.
You should hear the text spoken through the browser audio.
Voice Commands
YAP responds to commands in the Neko chat interface:
Immediate Speech
Speak text immediately:
/yap Good morning! Ready to start automation.
/yap The task has been completed successfully.
Streaming Mode
For longer conversations or live narration:
/yap:begin
I'm starting the automation task now...
Navigating to the website...
Filling out the form...
Submitting the data...
/yap:end
In streaming mode, YAP processes text incrementally as you type, enabling natural conversation flow.
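Under the hood, incremental processing generally means buffering incoming text and only synthesizing it once a sentence boundary arrives. The sketch below illustrates that idea; the function and regex are illustrative and not part of YAP's actual implementation.

```python
# Illustrative only: a minimal sketch of sentence-boundary chunking for
# streaming TTS. Names here are hypothetical, not YAP's API.
import re

SENTENCE_END = re.compile(r"[.!?](?:\s+|$)")

def split_ready(buffer: str):
    """Return (text that ends on a sentence boundary, unfinished tail)."""
    last = None
    for last in SENTENCE_END.finditer(buffer):
        pass
    if last is None:               # no complete sentence yet, keep buffering
        return "", buffer
    cut = last.end()
    return buffer[:cut].strip(), buffer[cut:]

# Example: the first two sentences can be synthesized immediately,
# while the unfinished third sentence stays in the buffer.
ready, tail = split_ready("Navigating to the website. Filling out the form. Submitting the")
print(ready)  # Navigating to the website. Filling out the form.
print(tail)   # Submitting the
```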
Stop/Clear Queue
Cancel current speech and clear the queue:
/yap:stop
Voice Management
Default Voice Setup
YAP needs at least one voice configured. Create a basic setup:
- Create the voices directory:
  mkdir -p voices
- Add a reference audio file (see the validation sketch below):
  # Record or copy a 3-10 second WAV file
  cp your-voice-sample.wav voices/default.wav
- YAP will auto-create voices.json on first run with default settings.
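Before relying on a reference clip, it can help to sanity-check it against the guidance in this guide (3-10 seconds of clear speech, 22 kHz or higher). The helper below is an illustrative standalone script, not part of YAP, using Python's built-in wave module:

```python
# Illustrative helper (not part of YAP): sanity-check a reference clip.
import wave

def check_reference(path, min_sec=3.0, max_sec=10.0, min_rate=22050):
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        duration = wav.getnframes() / rate
    problems = []
    if not (min_sec <= duration <= max_sec):
        problems.append(f"duration {duration:.1f}s is outside {min_sec}-{max_sec}s")
    if rate < min_rate:
        problems.append(f"sample rate {rate} Hz is below {min_rate} Hz")
    return problems

issues = check_reference("voices/default.wav")
print("Reference clip looks fine" if not issues else issues)
```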
Adding New Voices
Add voices through chat commands:
/yap:voice add --spk alice --ref ./voices/alice.wav --ref-text "Hello, my name is Alice" --styles "friendly,calm"
Parameters:
- --spk: Voice ID/name
- --ref: Path to the reference audio file (WAV format, 3-10 seconds)
- --ref-text: Transcript of the reference audio
- --styles: Comma-separated style tags
- --rate: Speech speed (0.5-2.0, default 1.0)
- --pitch: Pitch shift in semitones (-12 to +12, default 0.0)
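These flags correspond directly to the fields YAP keeps in voices/voices.json (see the full example under Voice Configuration File below). A minimal sketch of that mapping, with a hypothetical helper function:

```python
# Illustrative only: how the /yap:voice add flags map onto a voices.json entry.
import json

def voice_entry(ref, ref_text, styles="", rate=1.0, pitch=0.0):
    return {
        "ref_audio": ref,                               # --ref
        "ref_text": ref_text,                           # --ref-text
        "styles": [s for s in styles.split(",") if s],  # --styles
        "params": {"rate": rate, "pitch": pitch},       # --rate / --pitch
    }

voices = {"alice": voice_entry("./voices/alice.wav",     # --spk is the key
                               "Hello, my name is Alice",
                               styles="friendly,calm")}
print(json.dumps(voices, indent=2))
```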
Switching Voices
Change the active voice and parameters:
/yap:voice set --spk alice
/yap:voice set --spk bob --rate 1.2 --pitch -0.5
/yap:voice set --rate 0.8
Reload Voice Configuration
After manually editing voices/voices.json:
/yap:voice reload
Configuration
Basic Settings
Set these environment variables before starting YAP:
# Connection (choose one method)
export YAP_WS="wss://demo.neko.com/api/ws?token=your_token"
# OR
export NEKO_URL="https://demo.neko.com"
export NEKO_USER="username"
export NEKO_PASS="password"
# Voice directory
export YAP_VOICES_DIR="./voices"
export YAP_SPK_DEFAULT="default"
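If you set NEKO_URL, NEKO_USER, and NEKO_PASS instead of YAP_WS, YAP has to obtain a token and build the WebSocket URL itself. The sketch below shows the general idea using the /api/login call from the Troubleshooting section; the "token" field name is an assumption, so adjust it to whatever your Neko server actually returns:

```python
# Rough sketch: derive a YAP_WS-style URL from NEKO_URL credentials.
import os
import requests

neko_url = os.environ["NEKO_URL"].rstrip("/")
resp = requests.post(f"{neko_url}/api/login", json={
    "username": os.environ["NEKO_USER"],
    "password": os.environ["NEKO_PASS"],
})
resp.raise_for_status()
token = resp.json()["token"]  # assumed field name

ws_url = neko_url.replace("https://", "wss://").replace("http://", "ws://")
print(f"{ws_url}/api/ws?token={token}")
```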
Audio Quality
# Audio format (recommended settings)
export YAP_SR=48000 # Sample rate (Hz)
export YAP_AUDIO_CHANNELS=1 # Channels (1=mono, 2=stereo)
export YAP_FRAME_MS=20 # WebRTC frame size
# Processing
export YAP_PARALLEL=2 # TTS worker threads
export YAP_MAX_CHARS=350 # Max characters per chunk
export YAP_OVERLAP_MS=30 # Audio crossfade overlap
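The sample rate, channel count, and frame size together determine how much audio each WebRTC frame carries. A quick worked example with the recommended values, assuming 16-bit PCM:

```python
# Worked example: PCM per WebRTC frame at the recommended settings.
sr = 48000          # YAP_SR, samples per second
channels = 1        # YAP_AUDIO_CHANNELS
frame_ms = 20       # YAP_FRAME_MS

samples_per_frame = sr * frame_ms // 1000            # 960 samples
bytes_per_frame = samples_per_frame * channels * 2   # 1920 bytes of 16-bit PCM
frames_per_second = 1000 // frame_ms                 # 50 frames per second

print(samples_per_frame, bytes_per_frame, frames_per_second)
```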
Performance Tuning
# For faster response (lower quality)
export YAP_MAX_CHARS=200
export YAP_PARALLEL=4
# For better quality (higher latency)
export YAP_MAX_CHARS=500
export YAP_OVERLAP_MS=50
# Buffer management
export YAP_JITTER_MAX_SEC=6.0 # Audio buffer size
Voice Configuration File
YAP stores voice settings in voices/voices.json:
{
"default": {
"ref_audio": "./voices/default.wav",
"ref_text": "This is my default voice sample.",
"styles": ["neutral"],
"params": {
"rate": 1.0,
"pitch": 0.0
}
},
"alice": {
"ref_audio": "./voices/alice.wav",
"ref_text": "Hello, my name is Alice and I sound friendly.",
"styles": ["friendly", "energetic"],
"params": {
"rate": 1.1,
"pitch": 0.2
}
},
"narrator": {
"ref_audio": "./voices/narrator.wav",
"ref_text": "I will be narrating the automation process.",
"styles": ["professional", "clear"],
"params": {
"rate": 0.9,
"pitch": -0.3
}
}
}
Voice Parameters
- ref_audio: Path to reference WAV file (3-10 seconds, clear speech)
- ref_text: Exact transcript of the reference audio
- styles: Descriptive tags (friendly, professional, calm, energetic)
- rate: Speech speed multiplier (0.5=slow, 1.0=normal, 2.0=fast)
- pitch: Pitch adjustment in semitones (negative=lower, positive=higher)
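Rate is a speed multiplier, and a pitch shift of n semitones corresponds to a frequency ratio of 2^(n/12). A small calculation makes the ranges concrete:

```python
# What the rate and pitch ranges mean in practice.
def pitch_ratio(semitones: float) -> float:
    """Frequency ratio for a pitch shift of n semitones."""
    return 2 ** (semitones / 12)

print(pitch_ratio(12))               # 2.0   -> one octave up
print(pitch_ratio(-12))              # 0.5   -> one octave down
print(round(pitch_ratio(-0.3), 3))   # ~0.983 -> the "narrator" voice above, slightly lower

# rate is a speed multiplier: at rate 0.9, a 10-second utterance
# takes roughly 10 / 0.9 ≈ 11.1 seconds.
print(round(10 / 0.9, 1))
```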
Usage Scenarios
Automation Announcements
Start YAP alongside your automation agent:
# Terminal 1: Start YAP
uv run src/yap.py
# Terminal 2: Run automation with voice (audio is enabled by default)
uv run src/agent.py --task "Fill out contact form"
The agent can announce progress:
/yap Starting automation task: Fill out contact form
/yap Navigating to the website...
/yap Form submitted successfully!
Interactive Sessions
Use YAP for live interaction during manual control:
# Terminal 1: Start YAP
uv run src/yap.py
# Terminal 2: Manual control
python src/manual.py
Then control both automation and voice through chat:
!click 100 200
/yap Clicked on the submit button
!type "Hello World"
/yap Entered text in the field
Multi-Voice Conversations
Set up different voices for different purposes:
# Set up voices
/yap:voice add --spk system --ref ./voices/system.wav --ref-text "System notification" --rate 0.9
/yap:voice add --spk user --ref ./voices/user.wav --ref-text "User interaction" --rate 1.1
# Use in conversation
/yap:voice set --spk system
/yap System initialization complete.
/yap:voice set --spk user
/yap Thank you for the update!
Troubleshooting
No Audio Output
Check browser audio permissions:
- Click the browser's address bar lock icon
- Ensure "Sound" is allowed
- Check browser volume settings
Verify WebRTC connection:
- Open browser developer tools (F12)
- Go to Console tab
- Look for WebRTC connection messages
- Check for audio stream indicators
Test connection:
uv run src/yap.py --healthcheck
Poor Audio Quality
Check reference audio:
- Use high-quality WAV files (16-bit, 22kHz+)
- 3-10 second samples with clear speech
- No background noise or music
- Single speaker only
Adjust processing:
# Increase overlap for smoother transitions
export YAP_OVERLAP_MS=50
# Reduce chunk size for lower latency
export YAP_MAX_CHARS=250
High Latency
Optimize for speed:
# Increase parallel workers
export YAP_PARALLEL=4
# Reduce chunk size
export YAP_MAX_CHARS=200
# Reduce buffer size
export YAP_JITTER_MAX_SEC=3.0
Check GPU usage:
# Monitor GPU usage while YAP is running
nvidia-smi -l 1
Connection Issues
WebSocket connection failed:
# Test WebSocket endpoint
curl -i -N -H "Connection: Upgrade" -H "Upgrade: websocket" \
http://localhost:8080/api/ws
Authentication failed:
# Test REST login
curl -X POST http://localhost:8080/api/login \
-H "Content-Type: application/json" \
-d '{"username":"user","password":"password"}'
Check firewall/network:
- Ensure ports 8080 (HTTP) and WebRTC ports are accessible
- Verify STUN server connectivity
- Check for corporate proxy/firewall blocking WebRTC
Debug Mode
Enable detailed logging:
export YAP_LOGLEVEL=DEBUG
export YAP_LOG_FORMAT=json
uv run src/yap.py 2>&1 | jq .
Look for:
- WebSocket connection status
- WebRTC negotiation progress
- TTS processing times
- Audio buffer status
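Since YAP_LOG_FORMAT=json emits one JSON object per line, you can also filter the stream programmatically instead of eyeballing jq output. The snippet below is an illustrative filter; the "level" and "msg" field names are assumptions, so check them against your actual log output:

```python
# Illustrative filter for JSON log lines piped from YAP on stdin.
import json
import sys

KEYWORDS = ("WS", "RTC", "TTS", "buffer")

for line in sys.stdin:
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        continue  # skip non-JSON lines
    message = record.get("msg", "")
    if any(keyword.lower() in message.lower() for keyword in KEYWORDS):
        print(record.get("level", "?"), message)
```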
Advanced Usage
Custom Voice Training
For better voice quality, record multiple reference samples:
- Record varied samples:
  # Different emotions/styles
  voices/alice-happy.wav
  voices/alice-serious.wav
  voices/alice-excited.wav
- Test the different samples:
  /yap:voice set --spk alice-happy
  /yap I'm so excited about this automation!
  /yap:voice set --spk alice-serious
  /yap Please review these results carefully.
Integration with Automation
Modify automation scripts to include voice feedback:
# In your automation script
import os
import requests

neko_url = os.environ.get("NEKO_URL", "http://localhost:8080")

def announce(text):
    """Send a voice announcement to YAP via the Neko chat."""
    requests.post(f"{neko_url}/api/chat", json={
        "message": f"/yap {text}"
    })
# Use in automation
announce("Starting login process")
agent.click_login_button()
announce("Login successful, proceeding to dashboard")
Docker Deployment
Deploy YAP as a container service:
FROM python:3.9-slim
# Install system dependencies
RUN apt-get update && apt-get install -y ffmpeg
# Install Python dependencies
COPY requirements.txt .
RUN pip install -r requirements.txt
# Copy application and voices
COPY src/ /app/src/
COPY voices/ /app/voices/
WORKDIR /app
# Configuration
ENV YAP_VOICES_DIR=/app/voices
ENV YAP_SR=48000
ENV YAP_PARALLEL=2
ENTRYPOINT ["python", "src/yap.py"]
Next Steps
- Learn about Training Data Capture to improve voice models
- Explore Core Agent for automation integration
- Read TTS Service Technical Details for advanced configuration
- Check Neko Integration for server setup options
Related Guides
- Getting Started - Initial system setup
- Training Data Capture - Data collection for improvements
- Manual Control CLI - Interactive testing
- Architecture Overview - System design