Voice design (override selected voice)
Advanced settings
Round-trip test
Runs TTS, then ASR, to test the full pipeline.
Voice Cloning
Record a short sample, then generate speech in your cloned voice.
Speak naturally for 5-15 seconds. For inspiration:
- The quick brown fox jumps over the lazy dog near the riverbank on a warm summer afternoon.
- I believe that technology, when used thoughtfully, can make the world a better place for everyone.
- Den lilla röda stugan vid sjön är ett av mina finaste barndomsminnen från sommaren.
- Jag tror att framtiden tillhör dem som vågar tänka stort och arbeta tillsammans med andra.
Auto-filled from your recording. Edit if needed.
API Documentation
Drop-in replacement for ElevenLabs. Use any ElevenLabs SDK — just change the base URL.
- Get an API key (purchase Lab Access or contact sales@moln.ai)
- Set base_url to https://samtal.moln.ai
- Use your API key with the xi-api-key header
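As a rough illustration of the setup steps above, the sketch below builds (but does not send) a text-to-speech request with the standard library. The base URL and the xi-api-key header come from this page; the `/v1/text-to-speech/{voice_id}` path is an assumption borrowed from the ElevenLabs route layout, and `build_tts_request` is a hypothetical helper name.

```python
import json
import urllib.request

BASE_URL = "https://samtal.moln.ai"  # base URL stated on this page


def build_tts_request(api_key, text, voice_id="auto"):
    """Build a POST request for speech synthesis without sending it."""
    # Assumption: the route mirrors the ElevenLabs path layout.
    url = f"{BASE_URL}/v1/text-to-speech/{voice_id}"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"xi-api-key": api_key, "Content-Type": "application/json"},
    )


req = build_tts_request("YOUR_API_KEY", "Hej världen")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen(req)` would send it; the response body is expected to be audio, so it should be read as bytes.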
voice_id=auto. Params: output_format, num_steps, speed, p_temperature, c_temperature, include_word_timestamps, style
Body: text, language_code (2- or 3-letter, e.g. sv/swe), model_id, style, speed. Use voice_id="auto" to auto-detect language from text.
style and speed can be passed either as query parameters (?style=...&speed=1.2) or inside the JSON body. Body fields take precedence when both are set.
style options: conversational_short (2–3 sentences), conversational_long (5–10 sentences), conversational_verbose (~80% of details). Each style also spells out numbers and dates as words for natural speech. The rewritten text is exposed via the xi-style and xi-spoken-text-b64 response headers.
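The precedence rule and the response headers can be modelled client-side with a small sketch. `resolve_style_and_speed` and `decode_spoken_text` are hypothetical helpers, and the decoding assumes the xi-spoken-text-b64 header carries standard Base64 of UTF-8 text, which this page does not spell out.

```python
import base64


def resolve_style_and_speed(query, body):
    """Return the effective style/speed; body fields win over query parameters."""
    resolved = {}
    for field in ("style", "speed"):
        if field in body:
            resolved[field] = body[field]
        elif field in query:
            resolved[field] = query[field]
    return resolved


def decode_spoken_text(headers):
    """Decode the rewritten text from the response headers, if present."""
    # Assumption: standard Base64 over UTF-8. A plain dict lookup is
    # case-sensitive, so normalise header names first if your client does not.
    encoded = headers.get("xi-spoken-text-b64")
    if encoded is None:
        return None
    return base64.b64decode(encoded).decode("utf-8")


options = resolve_style_and_speed(
    {"style": "conversational_short", "speed": "1.2"},  # query parameters
    {"speed": 0.9},                                      # JSON body
)
print(options)
```

Here the body's speed replaces the query value, while style falls back to the query parameter because the body does not set it.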
Accepts: .wav, .mp3, .m4a, .mp4, .webm, .ogg, .opus, .flac, .aac (transcoded server-side via ffmpeg).
Returns: text, language_code, language_probabilities (top languages with confidence 0–1), words[] with start/end timestamps. Supports 25 European languages.
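A minimal sketch of reading such a transcription response follows. The top-level field names come from the list above; the shape of language_probabilities (a mapping from language code to confidence) and the `text` key inside each word entry are assumptions, and the sample payload is invented for illustration.

```python
def summarize_transcript(response):
    """Return the most likely language and one formatted line per word."""
    probabilities = response.get("language_probabilities", {})
    if probabilities:
        best_language = max(probabilities, key=probabilities.get)
    else:
        best_language = response.get("language_code")
    lines = [
        f"{word['start']:.2f}-{word['end']:.2f}s {word['text']}"
        for word in response.get("words", [])
    ]
    return best_language, lines


# Invented sample payload; real responses may differ in shape.
sample = {
    "text": "Hej världen",
    "language_code": "sv",
    "language_probabilities": {"sv": 0.97, "no": 0.02},
    "words": [
        {"text": "Hej", "start": 0.0, "end": 0.4},
        {"text": "världen", "start": 0.45, "end": 1.1},
    ],
}

language, word_lines = summarize_transcript(sample)
print(language)
for line in word_lines:
    print(line)
```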
49 premade voices across European, Slavic, Asian, and Western languages (TTS supports all; ASR supports 25). Reference voices by ID like spectra-sv-default. Voice-cloning reference audio ships with the API — clients only need the voice ID. Custom cloned/designed voices are also supported.
ElevenLabs-compatible: create-from-file · create-from-text · create-from-url. Use the elevenlabs SDK or call the routes directly.
Inputs: create-from-file (file, optional name) · create-from-text ({text, name}) · create-from-url ({url, name})
ElevenLabs-compatible: accepts both the native flat config and the wrapped conversation_config shape.
Config: name, system_prompt, first_message, language, tts, asr, turn, conversation, knowledge_base_ids, mcp_server_ids, custom_llm.
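For orientation, a flat agent config might look like the sketch below. Only the field names are taken from the list above; the value types, the example values, and the `unknown_fields` helper are assumptions made for illustration, and the wrapped conversation_config shape is not shown.

```python
# Field names as listed on this page.
DOCUMENTED_FIELDS = {
    "name", "system_prompt", "first_message", "language", "tts", "asr",
    "turn", "conversation", "knowledge_base_ids", "mcp_server_ids",
    "custom_llm",
}


def unknown_fields(config):
    """Return config keys that are not in the documented field list."""
    return sorted(set(config) - DOCUMENTED_FIELDS)


# Hypothetical flat config; value shapes are assumptions.
agent_config = {
    "name": "Support agent",
    "system_prompt": "You answer questions about opening hours.",
    "first_message": "Hi! How can I help you?",
    "language": "sv",
    "knowledge_base_ids": [],
    "mcp_server_ids": [],
}

print(unknown_fields(agent_config))
```

Checking keys before sending the config catches typos such as `system_promt` early, rather than relying on how the server treats unrecognised fields.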
Events: user_audio_chunk, user_text, user_transcript, agent_response, audio, interruption, agent_response_metadata, ping/pong
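A client typically routes these events through a small dispatcher. The sketch below assumes each event arrives as a JSON object with a `type` field naming the event and, for transcripts and agent responses, a `text` field; both are assumptions, as is the bare `{"type": "pong"}` reply, since the payload shapes are not given here.

```python
import json


def handle_event(raw_message, transcript):
    """Process one incoming event; return a reply message or None."""
    event = json.loads(raw_message)
    kind = event.get("type")  # assumption: events carry a "type" field
    if kind == "ping":
        # Answer keep-alive pings so the connection stays open.
        return json.dumps({"type": "pong"})
    if kind in ("user_transcript", "agent_response"):
        # Assumption: the spoken text is in a "text" field.
        transcript.append((kind, event.get("text", "")))
    return None


log = []
reply = handle_event('{"type": "ping"}', log)
handle_event('{"type": "agent_response", "text": "Hello"}', log)
print(reply, log)
```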
Knowledge Base
Upload documents for RAG-powered voice agents
Drag & drop a document here, or click to browse
PDF, DOCX, TXT, MD, HTML, CSV, XLSX
MCP Skills
Connect external tools via Model Context Protocol
Custom Voices
Clone voices from audio or design new ones
Upload or record 5-15 seconds of speech. The audio will be trimmed and transcribed automatically.
Create a voice by describing characteristics — no audio needed.
Agentic Dialogues
Create voice agents with knowledge & skills
Lab Access
Fixed-price entry for development & experimentation on samtal.moln.ai
- Personal API key with full access
- All 13 voices + voice cloning
- TTS & ASR endpoints (ElevenLabs-compatible)
- Development & experimentation use
- Key delivered to your email instantly
For B2B invoicing or private hosting plans, contact sales@moln.ai