Nix Flake Development Environment
The Neko Agent project uses a Nix flake to provide reproducible, cross-platform development environments, each configured for a specific use case. This document describes the flake's inputs, overlays, development shells, Docker images, and utility apps, along with common usage patterns.
Overview
The flake (flake.nix) is designed around multiple specialized development environments that cater to different aspects of the project:
- AI/ML Development - GPU-accelerated environments with CUDA support
- Documentation - Publishing and development of project documentation
- Container Operations - Docker and Neko server management
- Performance Optimization - CPU-optimized builds with architecture-specific flags
- TEE Deployment - Trusted Execution Environment deployment with attestation
- Registry Management - Multi-registry container deployment support
- Cross-Platform Support - Works on x86_64-Linux and aarch64-Darwin (Apple Silicon)
graph TB
subgraph "Nix Flake Architecture"
Flake[flake.nix]
Inputs[External Inputs]
Overlays[Custom Overlays]
Shells[Development Shells]
Packages[Docker Images]
Apps[Utility Apps]
end
subgraph "External Dependencies"
Nixpkgs[nixpkgs/nixos-unstable]
MLPkgs[nixvital/ml-pkgs]
end
subgraph "Custom Overlays"
WebRTC[WebRTC Stack]
ML[ML Libraries]
Audio[Audio Processing]
Optimization[CPU Optimization]
end
subgraph "Development Shells"
Default[default]
GPU[gpu]
AI[ai]
Neko[neko]
Docs[docs]
CPUOpt[cpu-opt]
GPUOpt[gpu-opt]
TEE[tee]
end
Flake --> Inputs
Flake --> Overlays
Flake --> Shells
Flake --> Packages
Flake --> Apps
Inputs --> Nixpkgs
Inputs --> MLPkgs
Overlays --> WebRTC
Overlays --> ML
Overlays --> Audio
Overlays --> Optimization
Shells --> Default
Shells --> GPU
Shells --> AI
Shells --> Neko
Shells --> Docs
Shells --> CPUOpt
Shells --> GPUOpt
Shells --> TEE
Flake Inputs
External Dependencies
inputs = {
nixpkgs.url = "github:NixOS/nixpkgs/nixos-unstable";
ml-pkgs.url = "github:nixvital/ml-pkgs";
};
| Input | Source | Purpose |
|---|---|---|
| nixpkgs | nixos-unstable | Latest packages and system libraries |
| ml-pkgs | nixvital/ml-pkgs | Specialized ML/AI packages (PyTorch, CUDA) |
Why nixos-unstable?
- Latest packages - Access to newest versions of AI/ML libraries
- CUDA support - Most recent NVIDIA driver and toolkit support (CUDA 12.8)
- Python ecosystem - Up-to-date Python packages for transformers and WebRTC
- Security updates - Timely security patches for all dependencies
Build Metadata and Reproducibility
The flake includes comprehensive build metadata for reproducible builds and attestation:
buildInfo = rec {
timestamp = "${year}-${month}-${day}T${hour}:${minute}:${second}Z";
revision = self.rev or self.dirtyRev or "unknown";
shortRev = builtins.substring 0 8 revision;
version = if (self ? rev) then shortRev else "${shortRev}-dirty";
nixpkgsRev = nixpkgs.rev or "unknown";
imageMetadata = {
"org.opencontainers.image.title" = "Neko Agent";
"org.opencontainers.image.created" = timestamp;
"org.opencontainers.image.revision" = revision;
"dev.neko.build.reproducible" = "true";
};
};
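The version logic in buildInfo is compact, so it can help to trace it outside Nix. The Python sketch below mirrors the revision, shortRev, and version expressions shown above. The function name and the sample revision strings are illustrative assumptions and are not part of the flake.

```python
from typing import Optional


def derive_version(rev: Optional[str], dirty_rev: Optional[str]) -> str:
    """Mirror buildInfo: clean trees get the short revision, others a -dirty suffix."""
    # revision = self.rev or self.dirtyRev or "unknown"
    revision = rev or dirty_rev or "unknown"
    # shortRev = builtins.substring 0 8 revision
    short_rev = revision[:8]
    # version = if (self ? rev) then shortRev else "${shortRev}-dirty"
    return short_rev if rev is not None else f"{short_rev}-dirty"


if __name__ == "__main__":
    # Hypothetical revisions, used only to exercise the three branches.
    print(derive_version("0123456789abcdef0123456789abcdef01234567", None))
    print(derive_version(None, "89abcdef0123456789abcdef0123456789abcdef-dirty"))
    print(derive_version(None, None))
```

Because the short revision is always the first eight characters, a tree with no Git metadata at all reports unknown-dirty under this logic.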
Custom Overlay System
The flake uses a comprehensive overlay system to provide packages not available in standard Nixpkgs:
WebRTC and Media Stack
nekoOverlays = [
(import ./overlays/pylibsrtp.nix) # Secure RTP protocol
(import ./overlays/aioice.nix) # Async ICE implementation
(import ./overlays/aiortc.nix) # WebRTC for Python
# ... more overlays
];
| Overlay | Package | Purpose |
|---|---|---|
| pylibsrtp.nix | pylibsrtp | Secure Real-time Transport Protocol for WebRTC |
| aioice.nix | aioice | Asynchronous ICE (Interactive Connectivity Establishment) |
| aiortc.nix | aiortc | WebRTC implementation for Python with media support |
AI/ML and Audio Processing
| Overlay | Package | Purpose |
|---|---|---|
| streaming.nix | streaming | MosaicML Streaming for training data |
| f5-tts.nix | f5-tts | F5-TTS voice synthesis model |
| vocos.nix | vocos | Neural vocoder for audio generation |
| ema-pytorch.nix | ema-pytorch | Exponential Moving Average for PyTorch |
| transformers-stream-generator.nix | transformers-stream-generator | Streaming text generation |
| bitsandbytes.nix | bitsandbytes | 8-bit optimizers for PyTorch |
Pi-Zero PyTorch Dependencies
The flake includes comprehensive packaging for pi-zero-pytorch and its dependencies:
| Overlay | Package | Purpose |
|---|---|---|
| pi-zero-pytorch/pi-zero-pytorch.nix | pi-zero-pytorch | Main π0 implementation in PyTorch |
| pi-zero-pytorch/einx.nix | einx | Universal tensor operations with Einstein notation |
| pi-zero-pytorch/x-transformers.nix | x-transformers | Transformer architectures library |
| pi-zero-pytorch/rotary-embedding-torch.nix | rotary-embedding-torch | Rotary positional embeddings |
| pi-zero-pytorch/accelerated-scan.nix | accelerated-scan | Accelerated scan operations |
| pi-zero-pytorch/bidirectional-cross-attention.nix | bidirectional-cross-attention | Cross-attention mechanisms |
| pi-zero-pytorch/hl-gauss-pytorch.nix | hl-gauss-pytorch | Gaussian operations for ML |
| pi-zero-pytorch/evolutionary-policy-optimization.nix | evolutionary-policy-optimization | Evolution strategies |
Performance Optimization
| Overlay | Package | Purpose |
|---|---|---|
| cached-path.nix | cached-path | Efficient file caching utilities |
| znver2-flags.nix | nekoZnver2Env | AMD Zen2 CPU optimization flags |
| vmm-cli.nix | vmm-cli | Virtual machine management CLI |
Example Znver2 Optimization:
# Generated environment variables for AMD Zen2 CPUs
export NIX_CFLAGS_COMPILE="-O3 -pipe -march=znver2 -mtune=znver2 -fno-plt"
export RUSTFLAGS="-C target-cpu=znver2 -C target-feature=+sse2,+sse4.2,+avx,+avx2,+fma,+bmi1,+bmi2"
External ML Packages
ml-pkgs.overlays.torch-family # Provides torch-bin, torchvision-bin, etc.
Benefits:
- Pre-compiled binaries - Faster setup without compilation
- CUDA integration - Proper CUDA toolkit linkage
- Consistent versions - Matching PyTorch ecosystem versions
Development Shells
1. Default Shell (default)
Purpose: Basic Python development with CPU-only PyTorch.
Usage:
nix develop
# or
nix develop .#default
Includes:
- Python Environment: PyTorch CPU, Transformers, WebRTC stack
- System Tools: FFmpeg, Git, Curl, Just, pkg-config
- Node.js Ecosystem: Node 20, NPM for AI tools
- AI CLI Tools: OpenAI Codex, Anthropic Claude Code (auto-installed)
Python Packages:
# Core ML/AI
transformers
torch (CPU)
torchvision
pillow
accelerate
# WebRTC and networking
websockets
av (PyAV for video processing)
pylibsrtp
aioice
aiortc
# Data and streaming
streaming (MosaicML)
f5-tts
numpy
scipy
zstandard
xxhash
tqdm
# Monitoring
prometheus-client
When to Use:
- Initial project setup and exploration
- Development on systems without NVIDIA GPUs
- Testing compatibility with CPU-only environments
- CI/CD pipelines where GPU access is unavailable
2. GPU Shell (gpu)
Purpose: GPU-accelerated development with CUDA 12.8 support.
Usage:
nix develop .#gpu
NVIDIA hosts: When running outside NixOS you will typically need nixGL to expose the system GPU. Use:
NIXPKGS_ALLOW_UNFREE=1 nix run --impure github:nix-community/nixGL#nixGLNvidia -- nix develop .#gpu
This wraps the GPU shell with the right OpenGL/EGL libraries from the host driver.
Additional Features over Default:
- CUDA Toolkit 12.8 - Complete CUDA development environment
- cuDNN and NCCL - Optimized neural network and communication libraries
- GPU-enabled PyTorch - Tensor operations on NVIDIA GPUs
- Environment Variables - Automatic CUDA path and library configuration
CUDA Environment Setup:
# Automatically configured
export CUDA_HOME=/nix/store/.../cuda-12.8
export CUDA_PATH=$CUDA_HOME
export PATH=$CUDA_HOME/bin:$PATH
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$LD_LIBRARY_PATH
# GPU control
export NVIDIA_VISIBLE_DEVICES=all
export NVIDIA_DRIVER_CAPABILITIES=compute,utility
export CUDA_MODULE_LOADING=LAZY
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
Verification Commands:
# Check CUDA installation
nvidia-smi
nvcc --version
# Test PyTorch GPU support
python -c "import torch; print(f'CUDA available: {torch.cuda.is_available()}')"
python -c "import torch; print(f'CUDA devices: {torch.cuda.device_count()}')"
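The variables listed under CUDA Environment Setup can also be checked for consistency before any PyTorch import is attempted. The following is a minimal sketch, assuming the layout shown above ($CUDA_HOME/bin on PATH, $CUDA_HOME/lib64 on LD_LIBRARY_PATH); the function name and the messages are hypothetical.

```python
import os
from typing import List, Mapping


def check_cuda_env(env: Mapping[str, str]) -> List[str]:
    """Return human-readable problems with the CUDA variables; empty means consistent."""
    cuda_home = env.get("CUDA_HOME", "")
    if not cuda_home:
        return ["CUDA_HOME is not set (are you inside the gpu shell?)"]
    problems = []
    if env.get("CUDA_PATH") != cuda_home:
        problems.append("CUDA_PATH does not match CUDA_HOME")
    # The shells documented here are Unix-only, so ':' is the list separator.
    if f"{cuda_home}/bin" not in env.get("PATH", "").split(":"):
        problems.append("$CUDA_HOME/bin is missing from PATH")
    if f"{cuda_home}/lib64" not in env.get("LD_LIBRARY_PATH", "").split(":"):
        problems.append("$CUDA_HOME/lib64 is missing from LD_LIBRARY_PATH")
    return problems


if __name__ == "__main__":
    for message in check_cuda_env(os.environ) or ["CUDA environment looks consistent"]:
        print(message)
```

An empty result only says the variables agree with each other; nvidia-smi and the PyTorch checks above are still needed to confirm that the driver and GPU are usable.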
When to Use:
- AI model inference and training
- GPU-accelerated image/video processing
- Development requiring CUDA libraries
- Performance-critical workloads
3. AI Shell (ai)
Purpose: Lightweight environment focused on AI development tools.
Usage:
nix develop .#ai
Includes:
- Core System Tools - FFmpeg, Git, networking utilities
- Node.js Environment - Node 20, NPM
- AI CLI Tools - Automatic installation of OpenAI and Anthropic CLIs
- Minimal Footprint - No heavy ML libraries, faster startup
AI Tools Installed:
# OpenAI Codex CLI
npm install -g @openai/codex
# Anthropic Claude Code CLI
npm install -g @anthropic-ai/claude-code
Environment Setup:
# NPM global packages in project directory
export NPM_CONFIG_PREFIX=$PWD/.npm-global
export PATH=$NPM_CONFIG_PREFIX/bin:$PATH
When to Use:
- AI-assisted development workflows
- Code generation and review tasks
- Integration with AI development services
- Quick environment for AI tool testing
4. Neko Shell (neko)
Purpose: Container and Neko server management.
Usage:
nix develop .#neko
Container Stack:
- Colima - Lightweight Docker runtime for macOS/Linux
- Docker & Docker Compose - Container orchestration
- Docker Buildx - Multi-platform image building
- Networking Tools - curl, jq for API interaction
Custom Scripts:
# Neko service management script
neko-services up # Start Neko server
neko-services down # Stop services
neko-services logs # View container logs
neko-services status # Check service status
neko-services restart # Restart services
neko-services update # Pull latest images and restart
Colima Configuration:
# Automatically configured VM (the vz VM type is macOS-only)
colima start --vm-type vz --cpu 2 --memory 4 \
--mount-type sshfs --mount "~:w"
Docker Environment:
# Automatic Docker socket configuration
export DOCKER_HOST="unix://$HOME/.colima/default/docker.sock"
When to Use:
- Neko server development and testing
- Container image building and deployment
- Docker-based development workflows
- Local testing of production deployments
5. Documentation Shell (docs)
Purpose: Documentation development, building, and publishing.
Usage:
nix develop .#docs
Documentation Stack:
- mdBook - Rust-based documentation generator
- mdBook Extensions:
  - mdbook-mermaid - Diagram support
  - mdbook-linkcheck - Link validation
  - mdbook-toc - Table of contents generation
- Sphinx - Python documentation with reStructuredText support
- Node.js - For additional tooling and preprocessing
Python Documentation Tools:
sphinx # Documentation generator
sphinx-rtd-theme # Read the Docs theme
myst-parser # Markdown support for Sphinx
sphinxcontrib-mermaid # Mermaid diagrams in Sphinx
Available Commands:
# From inside docs/
mdbook serve --open # Development server with live reload
mdbook build # Build static documentation
mdbook test # Test Rust code examples (links are checked by mdbook-linkcheck during mdbook build)
# Sphinx alternative
sphinx-build -b html source build/
When to Use:
- Writing and editing project documentation
- Building documentation for deployment
- Testing documentation changes locally
- Contributing to API reference and guides
6. CPU-Optimized Shell (cpu-opt)
Purpose: Performance-optimized CPU development.
Usage:
nix develop .#cpu-opt
Optimization Features:
- Architecture-Specific Compilation - Znver2 flags for AMD CPUs
- Optimized Python Environment - Performance-tuned package builds
- Compiler Optimizations - -O3, -march=znver2, -mtune=znver2
Generated Optimization Flags (Linux only):
# Compiler flags
export NIX_CFLAGS_COMPILE="-O3 -pipe -march=znver2 -mtune=znver2 -fno-plt"
# Rust flags
export RUSTFLAGS="-C target-cpu=znver2 -C target-feature=+sse2,+sse4.2,+avx,+avx2,+fma,+bmi1,+bmi2 -C link-arg=-Wl,-O1 -C link-arg=--as-needed"
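Binaries built with these flags can crash with an illegal-instruction error on CPUs that lack the targeted features, so a quick host check may be worthwhile. The sketch below compares the features named in RUSTFLAGS with the flags line of /proc/cpuinfo. It is a necessary rather than sufficient test, since -march=znver2 enables more than these seven features; the helper names are hypothetical.

```python
import os
from typing import Iterable, Set

# Features named in the RUSTFLAGS above, spelled the way /proc/cpuinfo reports them
# (Rust's "sse4.2" appears as "sse4_2" in the kernel's flag list).
REQUIRED_FLAGS = {"sse2", "sse4_2", "avx", "avx2", "fma", "bmi1", "bmi2"}


def parse_cpu_flags(cpuinfo_text: str) -> Set[str]:
    """Collect the flags from the first 'flags' line of /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def missing_flags(cpuinfo_text: str, required: Iterable[str] = REQUIRED_FLAGS) -> Set[str]:
    """Return the required flags that the CPU does not advertise."""
    return set(required) - parse_cpu_flags(cpuinfo_text)


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):  # Linux only
    with open("/proc/cpuinfo") as handle:
        missing = missing_flags(handle.read())
    print("missing:", sorted(missing) if missing else "none")
```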
When to Use:
- Performance-critical CPU workloads
- Benchmarking and optimization work
- Production builds targeting specific CPU architectures
- Environments where every bit of CPU performance matters
7. GPU-Optimized Shell (gpu-opt)
Purpose: Maximum performance GPU development with optimizations.
Usage:
nix develop .#gpu-opt
Combined Optimizations:
- All GPU features - CUDA 12.8, cuDNN, NCCL
- CPU optimizations - Znver2 flags for host code
- PyTorch optimizations - Optimized builds with CPU and GPU acceleration
- Memory optimizations - Advanced CUDA memory management
GPU-Specific Optimizations:
# Target specific GPU architecture (configurable)
export TORCH_CUDA_ARCH_LIST=8.6 # RTX 30xx series
# Memory allocation strategy
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
Performance Verification:
# Check optimizations are active
echo $NIX_CFLAGS_COMPILE # Should show znver2 flags
echo $TORCH_CUDA_ARCH_LIST # Should show target GPU architecture
# Benchmark performance
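python -c "
import torch
import time
x = torch.randn(1000, 1000, device='cuda')
torch.mm(x, x)  # warm-up: the first call includes CUDA initialization
torch.cuda.synchronize()
start = time.perf_counter()
torch.mm(x, x)
torch.cuda.synchronize()  # CUDA kernels run asynchronously; wait before stopping the clock
print(f'GPU matrix multiply: {time.perf_counter() - start:.4f}s')
"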
When to Use:
- Maximum performance AI inference
- GPU-accelerated training workloads
- Performance benchmarking and optimization
- Production deployments requiring peak performance
8. TEE Shell (tee)
Purpose: Trusted Execution Environment deployment and attestation.
Usage:
nix develop .#tee
TEE Deployment Stack:
- Phala Cloud CLI - Modern CLI for TEE deployments
- Legacy VMM CLI - Compatible with older dstack systems
- Docker & Docker Compose - Container orchestration
- Bun Runtime - Fast JavaScript runtime
- Reproducible Image Builder - Attestation-ready container building
Available Commands:
# Modern Phala CLI
phala auth login <api-key> # Authenticate with Phala Cloud
phala status # Check authentication status
phala cvms list # List Confidential VMs
phala nodes # List available TEE nodes
# Legacy VMM CLI (if needed)
vmm-cli lsvm # List virtual machines
vmm-cli lsimage # List available images
vmm-cli lsgpu # List available GPUs
# Reproducible builds
nix run .#build-images # Build reproducible images
nix run .#deploy-to-tee # Deploy with attestation metadata
nix run .#verify-attestation # Verify TEE attestation
Multi-Registry Support:
# Deploy to ttl.sh (ephemeral registry)
NEKO_REGISTRY=ttl.sh NEKO_TTL=1h nix run .#deploy-to-tee
nix run .#deploy-to-ttl 24h
# Deploy to GitHub Container Registry
NEKO_REGISTRY=ghcr.io/your-org nix run .#deploy-to-tee
# Deploy to Docker Hub
NEKO_REGISTRY=docker.io/your-org nix run .#deploy-to-tee
# Deploy to local registry
NEKO_REGISTRY=localhost:5000/neko nix run .#deploy-to-tee
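To make the effect of NEKO_REGISTRY and NEKO_TTL concrete, the sketch below shows one way a deploy script might turn them into a pushable image reference. The naming scheme, the defaults, and the function itself are assumptions for illustration; the flake's actual deploy apps may compose names differently. The one external fact relied on is that ttl.sh reads the image tag as the time-to-live.

```python
from typing import Mapping


def image_reference(env: Mapping[str, str], name: str, tag: str = "latest") -> str:
    """Compose an image reference from NEKO_REGISTRY (hypothetical naming scheme)."""
    # Assumed default: fall back to ttl.sh when no registry is configured.
    registry = env.get("NEKO_REGISTRY", "ttl.sh").rstrip("/")
    if registry == "ttl.sh":
        # ttl.sh interprets the tag as the expiry, e.g. ttl.sh/<name>:1h.
        tag = env.get("NEKO_TTL", "1h")
    return f"{registry}/{name}:{tag}"


if __name__ == "__main__":
    print(image_reference({"NEKO_REGISTRY": "ttl.sh", "NEKO_TTL": "1h"}, "neko-agent"))
    print(image_reference({"NEKO_REGISTRY": "ghcr.io/your-org"}, "neko-agent", "v1"))
```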
When to Use:
- Deploying to Trusted Execution Environments
- Creating attestable, reproducible deployments
- Multi-registry container management
- TEE-based inference deployments
- Confidential computing workloads
Docker Images and Packages
The flake builds optimized Docker images for production deployment:
Available Images
The flake builds a generic and an optimized image for each component:
# Build all images
nix run .#build-images
# Agent images
nix build .#neko-agent-docker-generic
nix build .#neko-agent-docker-opt
# Capture images
nix build .#neko-capture-docker-generic
nix build .#neko-capture-docker-opt
# YAP (TTS) images
nix build .#neko-yap-docker-generic
nix build .#neko-yap-docker-opt
# Train images
nix build .#neko-train-docker-generic
nix build .#neko-train-docker-opt
1. Generic CUDA Image (neko-agent-docker-generic)
Target: neko-agent:cuda12.8-generic
Features:
- Portable CUDA - Includes PTX for forward compatibility
- CUDA 12.8 - Full toolkit and libraries
- Python Environment - All dependencies with torch-bin
- Broad GPU Support - Works on any GPU with CUDA compute capability 8.6 or newer
Configuration:
# Environment variables
CUDA_HOME=/nix/store/.../cuda-12.8
LD_LIBRARY_PATH=$CUDA_HOME/lib64:$CUDA_HOME/lib
CUDA_MODULE_LOADING=LAZY
TORCH_CUDA_ARCH_LIST=8.6+PTX # Forward compatibility
Use Cases:
- Multi-GPU deployment environments
- Cloud platforms with varying GPU types
- Development and testing across different hardware
2. Optimized Image (neko-agent-docker-opt)
Target: neko-agent:cuda12.8-sm86-v3
Features:
- Specific GPU targeting - Optimized for RTX 30xx series (sm_86)
- CPU optimizations - Znver2 architecture flags
- Smaller size - No PTX, specific architecture only
- Maximum performance - All available optimizations enabled
Configuration:
# Optimized environment
TORCH_CUDA_ARCH_LIST=8.6 # Specific architecture only
NIX_CFLAGS_COMPILE="-O3 -pipe -march=znver2 -mtune=znver2 -fno-plt"
RUSTFLAGS="-C target-cpu=znver2 ..." # Rust optimizations
Use Cases:
- Production deployments with known hardware
- Performance-critical applications
- Cost-optimized cloud instances
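The difference between TORCH_CUDA_ARCH_LIST=8.6+PTX (generic image) and TORCH_CUDA_ARCH_LIST=8.6 (optimized image) comes down to which GPUs can execute the compiled kernels. The sketch below encodes the general CUDA compatibility rules: native binaries run on the same major compute capability with an equal or higher minor version, and embedded PTX can be JIT-compiled for any equal or newer capability. The function names are hypothetical, and architecture-specific entries such as 9.0a are not handled.

```python
import re
from typing import List, Tuple

Capability = Tuple[int, int]


def parse_arch_list(arch_list: str) -> List[Tuple[Capability, bool]]:
    """Parse entries such as '8.6' or '8.6+PTX' (separated by ';' or spaces)."""
    entries = []
    for item in re.split(r"[;\s]+", arch_list.strip()):
        if not item:
            continue
        version, _, suffix = item.partition("+")
        major, minor = version.split(".")
        entries.append(((int(major), int(minor)), suffix.upper() == "PTX"))
    return entries


def can_run(arch_list: str, device: Capability) -> bool:
    """Check whether kernels built for arch_list can execute on a device capability."""
    for base, has_ptx in parse_arch_list(arch_list):
        # Native binary: same major version, equal or higher minor version.
        if device[0] == base[0] and device[1] >= base[1]:
            return True
        # PTX: JIT-compiled at load time for any equal or newer capability.
        if has_ptx and device >= base:
            return True
    return False


if __name__ == "__main__":
    for arch_list in ("8.6", "8.6+PTX"):
        print(arch_list, "runs on capability 9.0:", can_run(arch_list, (9, 0)))
```

Under these rules the optimized image is limited to 8.x devices at or above 8.6, while the generic image can also start on newer major architectures at the cost of JIT compilation on first use.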
Image Building System
# Helper function for consistent container structure
mkRoot = paths: pkgs.buildEnv {
name = "image-root";
inherit paths;
pathsToLink = [ "/bin" ];
};
# Generic image build
neko-agent-docker-generic = pkgs.dockerTools.buildImage {
name = "neko-agent:cuda12.8-generic";
created = "now";
copyToRoot = mkRoot ([
runnerGeneric
pyEnvGeneric
cuda.cudatoolkit
cuda.cudnn
cuda.nccl
pkgs.bashInteractive
] ++ commonSystemPackages);
config = {
Env = baseEnv ++ [
"CUDA_HOME=${cuda.cudatoolkit}"
"LD_LIBRARY_PATH=${cuda.cudatoolkit}/lib64:${cuda.cudnn}/lib"
"TORCH_CUDA_ARCH_LIST=8.6+PTX"
];
WorkingDir = "/workspace";
Entrypoint = [ "/bin/neko-agent" ];
};
};
Utility Apps
The flake provides utility applications for common tasks:
Documentation Apps
# Build documentation
nix run .#docs-build
# Serve documentation with live reload
nix run .#docs-serve
# Check documentation for issues
nix run .#docs-check
Build and Deployment Apps
# Build all Docker images with attestation metadata
nix run .#build-images
# TEE deployment with multi-registry support
nix run .#deploy-to-tee
nix run .#deploy-to-ttl 24h # Quick ttl.sh deployment
nix run .#push-to-ttl 1h # Just push to ttl.sh
# Attestation verification
nix run .#verify-attestation <app-id> <expected-hash>
Container Registry Apps
# Local registry management
nix run .#start-registry # HTTP registry with auth
nix run .#start-registry-https # HTTPS with Tailscale certs
nix run .#stop-registry
# Public exposure
nix run .#start-tailscale-funnel # Expose via Tailscale Funnel
nix run .#start-cloudflare-tunnel # Expose via Cloudflare Tunnel
Registry Configuration Examples:
# Environment variables for registry customization
NEKO_REGISTRY_PORT=5000
NEKO_REGISTRY_USER=neko
NEKO_REGISTRY_PASSWORD=pushme
NEKO_REGISTRY_DATA_DIR=$PWD/registry-data
NEKO_REGISTRY_AUTH_DIR=$PWD/auth
NEKO_REGISTRY_CERTS_DIR=$PWD/certs
# Tailscale Funnel setup
NEKO_REGISTRY=your-device.tail-scale.ts.net/neko
# Cloudflare Tunnel setup
NEKO_CF_TUNNEL_NAME=neko-registry
NEKO_CF_HOSTNAME=registry.example.com
Common Development Workflows
Initial Setup
# Clone repository
git clone <repo-url>
cd neko-agent
# Enter development environment
nix develop .#gpu # or .#default for CPU-only
# Verify setup
python -c "import torch; print(torch.cuda.is_available())"
AI Development Workflow
# 1. Enter GPU environment
nix develop .#gpu
# 2. Load environment variables (if .env exists)
# Automatically loaded by shell hook
# 3. Test model loading
python -c "
from transformers import Qwen2VLForConditionalGeneration
model = Qwen2VLForConditionalGeneration.from_pretrained('showlab/ShowUI-2B')
print('Model loaded successfully')
"
# 4. Run agent
uv run src/agent.py --task "Navigate to google.com"
Documentation Development
# 1. Enter docs environment
nix develop .#docs
# 2. Start development server
nix run .#docs-serve
# Opens browser to http://localhost:3000
# 3. Edit files in docs/src/
# Changes automatically reload in browser
# 4. Build for deployment
nix run .#docs-build
Container Development
# 1. Enter container environment
nix develop .#neko
# 2. Start Neko server
neko-services up
# 3. Check status
neko-services status
# 4. View logs
neko-services logs neko
# 5. Test connection
curl http://localhost:8080/health
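Containers often need a few seconds before they accept connections, so a single curl right after neko-services up can fail spuriously. The sketch below polls the same health URL until it answers or a deadline passes. It uses only the Python standard library; the function name, timeout, and interval are assumptions.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Poll url until it answers with HTTP 200 or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not reachable yet, or a non-2xx answer; retry below
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A typical call would be wait_for_health("http://localhost:8080/health"), using the URL from the curl command above.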
Performance Optimization
# 1. Use optimized environment
nix develop .#gpu-opt
# 2. Verify optimizations
echo $NIX_CFLAGS_COMPILE
echo $TORCH_CUDA_ARCH_LIST
# 3. Run performance benchmarks
python benchmarks/inference_speed.py
# 4. Build optimized container
nix build .#neko-agent-docker-opt
TEE Deployment Workflow
# 1. Enter TEE environment
nix develop .#tee
# 2. Build reproducible images
nix run .#build-images
# 3. Deploy to TEE (with registry choice)
# Option A: Use ttl.sh for testing
nix run .#deploy-to-ttl 1h
# Option B: Use GitHub Container Registry
NEKO_REGISTRY=ghcr.io/your-org nix run .#deploy-to-tee
# Option C: Use local registry (start it first)
nix run .#start-registry # In another terminal
NEKO_REGISTRY=localhost:5000/neko nix run .#deploy-to-tee
# 4. Verify attestation (inside TEE)
nix run .#verify-attestation <app-id> <compose-hash>
# 5. Check deployment status
phala cvms list # Modern CLI
# or
vmm-cli lsvm # Legacy CLI
Multi-Registry Development
# Setup local registry for testing
nix run .#start-registry
# Push images to multiple registries
docker tag neko-agent:latest localhost:5000/neko/agent:v1
docker push localhost:5000/neko/agent:v1
# Use Tailscale for team access
nix run .#start-tailscale-funnel
# Use Cloudflare for public access
nix run .#start-cloudflare-tunnel
Environment Variables and Configuration
Automatic .env Loading
All development shells automatically load .env files:
# .env file example
NEKO_WS=ws://localhost:8080/api/ws
NEKO_LOGLEVEL=DEBUG
CUDA_VISIBLE_DEVICES=0
TORCH_CUDA_ARCH_LIST=8.6
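For scripts that run outside a development shell, the same file can be read directly. The following is a minimal sketch of a KEY=VALUE parser for files like the example above; it is not the shell hook's implementation, and it deliberately ignores features such as export prefixes, variable expansion, and multi-line values.

```python
from typing import Dict


def parse_dotenv(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Strip one layer of matching quotes, as in KEY="value".
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        values[key.strip()] = value
    return values


if __name__ == "__main__":
    sample = "# .env file example\nNEKO_LOGLEVEL=DEBUG\nCUDA_VISIBLE_DEVICES=0\n"
    print(parse_dotenv(sample))
```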
Common Environment Variables
| Variable | Purpose | Default | Set By |
|---|---|---|---|
| CUDA_HOME | CUDA installation path | Auto-detected | GPU shells |
| CUDA_VISIBLE_DEVICES | GPU selection | Unset (all GPUs visible) | User configurable |
| PYTORCH_CUDA_ALLOC_CONF | Memory strategy | expandable_segments:True | GPU shells |
| NPM_CONFIG_PREFIX | NPM global location | $PWD/.npm-global | All shells |
| NIX_CFLAGS_COMPILE | Compiler optimizations | Znver2 flags | Optimized shells |
Shell-Specific Variables
GPU Shells:
export CUDA_MODULE_LOADING=LAZY
export NVIDIA_DRIVER_CAPABILITIES=compute,utility
export LD_LIBRARY_PATH=$CUDA_HOME/lib64:$CUDA_HOME/lib
Documentation Shell:
# No specific variables, uses standard tool defaults
Container Shell:
export DOCKER_HOST="unix://$HOME/.colima/default/docker.sock"
Cross-Platform Support
Supported Systems
supportedSystems = [ "x86_64-linux" "aarch64-darwin" ];
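Tooling around the flake sometimes needs to know whether the current machine is one of these systems before invoking nix. The sketch below maps Python's platform values to a Nix-style system string and checks it against the list above; the alias table and function names are assumptions.

```python
import platform

SUPPORTED_SYSTEMS = ("x86_64-linux", "aarch64-darwin")  # from supportedSystems above

# platform.machine() spellings that differ from Nix's architecture names.
_ARCH_ALIASES = {"arm64": "aarch64", "amd64": "x86_64"}


def nix_system(machine: str, system: str) -> str:
    """Build a Nix system string such as 'x86_64-linux' from platform values."""
    arch = _ARCH_ALIASES.get(machine.lower(), machine.lower())
    return f"{arch}-{system.lower()}"


def is_supported(machine: str, system: str) -> bool:
    """True when the derived system string is in the flake's supported list."""
    return nix_system(machine, system) in SUPPORTED_SYSTEMS


if __name__ == "__main__":
    current = nix_system(platform.machine(), platform.system())
    verdict = "supported" if current in SUPPORTED_SYSTEMS else "not supported by this flake"
    print(current, verdict)
```

When Nix itself is available, nix eval --impure --raw --expr builtins.currentSystem reports the authoritative value.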
Platform-Specific Features
x86_64-Linux:
- Full GPU support - NVIDIA CUDA, Docker GPU passthrough
- CPU optimizations - Znver2, Intel architecture targeting
- Container building - Docker images with CUDA support
aarch64-Darwin (Apple Silicon):
- Metal Performance Shaders - GPU acceleration via MPS
- Rosetta compatibility - x86_64 dependencies when needed
- Native performance - ARM64-optimized packages
Platform Detection
# Conditional features based on platform
${pkgs.lib.optionalString pkgs.stdenv.isLinux ''
source ${znver2File}
echo "[cpu-opt] Using znver2 flags: $NIX_CFLAGS_COMPILE"
''}
Troubleshooting
Common Issues
CUDA Not Detected:
# Check NVIDIA drivers
nvidia-smi
# Verify CUDA environment
echo $CUDA_HOME
echo $LD_LIBRARY_PATH
# Test PyTorch CUDA
python -c "import torch; print(torch.cuda.is_available())"
Solution: Ensure NVIDIA drivers are installed and compatible with CUDA 12.8.
Docker Issues on macOS:
# Check Colima status
colima status
# Restart if needed
colima stop
colima start --vm-type vz --cpu 2 --memory 4
Slow Package Installation:
# Use binary caches (extra-* settings add to the defaults instead of replacing them)
mkdir -p ~/.config/nix
echo "extra-substituters = https://cuda-maintainers.cachix.org" >> ~/.config/nix/nix.conf
echo "extra-trusted-public-keys = cuda-maintainers.cachix.org-1:0dq3bujKpuEPsCX6U4WylrUDZ9JyUG0VpVZa7CNfq5E=" >> ~/.config/nix/nix.conf
Memory Issues
GPU Memory:
# Monitor GPU memory
nvidia-smi -l 1
# Optimize PyTorch memory
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128,expandable_segments:True
System Memory:
# Check available memory
free -h
# Monitor during development
htop
Performance Issues
Check Optimizations:
# Verify CPU flags
grep -m1 flags /proc/cpuinfo
# Check compiler optimizations
echo $NIX_CFLAGS_COMPILE
# Benchmark inference
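python -c "
import torch
import time
device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.randn(1000, 1000, device=device)
torch.mm(x, x)  # warm-up run
if device == 'cuda': torch.cuda.synchronize()
start = time.perf_counter()
result = torch.mm(x, x)
if device == 'cuda': torch.cuda.synchronize()  # CUDA kernels run asynchronously
print(f'{device} time: {time.perf_counter() - start:.4f}s')
"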
Advanced Usage
Custom Overlays
Create project-specific overlays in overlays/:
# overlays/custom-package.nix
final: prev: {
custom-package = prev.python3Packages.buildPythonPackage {
pname = "custom-package";
version = "1.0.0";
src = prev.fetchFromGitHub {
owner = "owner";
repo = "repo";
rev = "v1.0.0";
sha256 = "...";
};
propagatedBuildInputs = with prev.python3Packages; [
numpy
torch
];
};
}
Custom Development Shells
Add new shells to the flake:
# Add to devShells
experimental = pkgs.mkShell {
buildInputs = commonSystemPackages ++ [
# Custom packages
];
shellHook = ''
echo "Experimental environment loaded"
# Custom setup
'';
};
Environment Specialization
Create environment-specific configurations:
# .env.gpu
CUDA_VISIBLE_DEVICES=0
TORCH_CUDA_ARCH_LIST=8.6
# .env.multi-gpu
CUDA_VISIBLE_DEVICES=0,1,2,3
NCCL_DEBUG=INFO
# Load specific environment
cp .env.gpu .env
nix develop .#gpu
Contributing to the Flake
Adding New Packages
- Create overlay in overlays/new-package.nix
- Add to overlay list in nekoOverlays
- Include in appropriate shells
- Test across platforms
- Update documentation
Testing Changes
# Test specific shell
nix develop .#shell-name --command python -c "import new_package"
# Test all shells
for shell in default gpu ai neko docs cpu-opt gpu-opt tee; do
echo "Testing $shell..."
nix develop .#$shell --command echo "✓ $shell loads successfully"
done
# Test image builds
nix build .#neko-agent-docker-generic
nix build .#neko-agent-docker-opt
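For CI, it can be more convenient to collect results for every shell and report them together rather than reading loop output. The sketch below builds the same nix develop invocation per shell and returns a pass/fail map. The shell list follows the development shells documented above; the function names and the injectable runner are assumptions made so the logic can be exercised without Nix installed.

```python
import subprocess
from typing import Callable, Dict, Iterable, List

SHELLS = ["default", "gpu", "ai", "neko", "docs", "cpu-opt", "gpu-opt", "tee"]


def smoke_command(shell: str) -> List[str]:
    """argv that enters one development shell and exits immediately."""
    return ["nix", "develop", f".#{shell}", "--command", "true"]


def _run(argv: List[str]) -> int:
    """Default runner: execute the command and return its exit code."""
    return subprocess.run(argv).returncode


def run_smoke_tests(
    shells: Iterable[str] = SHELLS,
    runner: Callable[[List[str]], int] = _run,
) -> Dict[str, bool]:
    """Map each shell name to whether it loaded (exit code 0)."""
    return {shell: runner(smoke_command(shell)) == 0 for shell in shells}
```

Calling run_smoke_tests() from the repository root runs all shells; a CI job can then fail if any value in the returned dictionary is False.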
Performance Considerations
- Binary caches - Use Cachix for custom packages
- Layer optimization - Minimize Docker image layers
- Dependency management - Avoid unnecessary dependencies
- Build reproducibility - Pin package versions when needed
Together, these shells, images, and apps give the project one reproducible toolchain from local development through production deployment, consistent across the supported platforms.