Samtal
Speech-to-Text
Round-trip test

Runs text-to-speech, then feeds the generated audio to speech recognition, to test the full pipeline.

Voice Cloning

Record a short sample, then generate speech in your cloned voice.

Step 1 — Record reference

Speak naturally for 5-15 seconds. For inspiration:

  • The quick brown fox jumps over the lazy dog near the riverbank on a warm summer afternoon.
  • I believe that technology, when used thoughtfully, can make the world a better place for everyone.
  • Den lilla röda stugan vid sjön är ett av mina finaste barndomsminnen från sommaren.
  • Jag tror att framtiden tillhör dem som vågar tänka stort och arbeta tillsammans med andra.
Step 2 — Verify transcript

Auto-filled from your recording. Edit if needed.

Step 3 — Generate

API Documentation

Drop-in replacement for ElevenLabs. Use any ElevenLabs SDK — just change the base URL.

Quick Start
  1. Get an API key (purchase Lab Access or contact sales@moln.ai)
  2. Set base_url to https://samtal.moln.ai
  3. Use your API key with the xi-api-key header
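
The three steps above can be sketched with Python's standard library alone. This is a minimal illustration, not an official client: the helper names `build_tts_request` and `synthesize` are hypothetical, and the assumption that the response body is raw audio follows from the ElevenLabs compatibility claim rather than from anything stated here.

```python
import json
import urllib.request

BASE_URL = "https://samtal.moln.ai"  # base URL from step 2


def build_tts_request(api_key: str, voice_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/text-to-speech/{voice_id}."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/text-to-speech/{voice_id}",
        data=body,
        method="POST",
        headers={
            "xi-api-key": api_key,  # header name from step 3
            "Content-Type": "application/json",
        },
    )


def synthesize(api_key: str, voice_id: str, text: str) -> bytes:
    """Send the request; assumes the response body is the audio itself."""
    with urllib.request.urlopen(build_tts_request(api_key, voice_id, text)) as resp:
        return resp.read()
```

Calling `synthesize("YOUR_KEY", "spectra-sv-default", "Hej!")` would then return bytes that can be written to a file, provided the key is valid.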
Legend & Categories
Methods: GET (read), POST (create / action), PATCH (update), DELETE (remove), WSS (WebSocket)
Categories: Text-to-Speech, Speech-to-Text, Voices, Knowledge Base, Agents, Voice Conversations (WSS), MCP Skills
Text-to-Speech
POST /v1/text-to-speech/{voice_id}
POST /v1/text-to-speech/{voice_id}/stream
POST /v1/text-to-speech (Samtal extension: auto-detects language and picks a matching voice. Same effect as voice_id=auto.)

Params: output_format, num_steps, speed, p_temperature, c_temperature, include_word_timestamps, style

Body: text, language_code (2- or 3-letter, e.g. sv/swe), model_id, style, speed. Use voice_id="auto" to auto-detect language from text.

style and speed can be passed either as query parameters (?style=...&speed=1.2) or inside the JSON body. Body fields take precedence when both are set.

style options: conversational_short (2–3 sentences), conversational_long (5–10 sentences), conversational_verbose (~80% of details). When a style is set, the text is rewritten before synthesis, with numbers and dates spelled out as words for natural speech. The rewritten text is exposed via the xi-style and xi-spoken-text-b64 response headers.
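
As a rough sketch of the parameter handling described above, the helpers below build a query string and a JSON body and decode the rewritten text from the response headers. The function names are hypothetical. Two things are assumptions: that `auto` can be placed in the path as the voice ID, and that xi-spoken-text-b64 carries standard Base64 of UTF-8 text.

```python
import base64
import json
from urllib.parse import urlencode

TTS_BASE = "https://samtal.moln.ai/v1/text-to-speech"


def tts_url(voice_id, **query):
    """Append only the query parameters that were actually given."""
    qs = urlencode({k: v for k, v in query.items() if v is not None})
    path = f"{TTS_BASE}/{voice_id}"
    return f"{path}?{qs}" if qs else path


def tts_body(text, language_code=None, style=None, speed=None):
    """JSON body; per the docs, body fields win over query parameters."""
    fields = {"text": text, "language_code": language_code,
              "style": style, "speed": speed}
    return json.dumps({k: v for k, v in fields.items() if v is not None}).encode("utf-8")


def spoken_text(headers):
    """Decode xi-spoken-text-b64 (assumed: Base64 of UTF-8 text).

    `headers` is a plain dict here; real HTTP header objects are
    usually case-insensitive.
    """
    raw = headers.get("xi-spoken-text-b64")
    return base64.b64decode(raw).decode("utf-8") if raw else None
```

Putting style in the query and speed in the body (or the reverse) is allowed by the text above; sending both in one place simply avoids relying on the precedence rule.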

Speech-to-Text
POST /v1/speech-to-text

Accepts: .wav, .mp3, .m4a, .mp4, .webm, .ogg, .opus, .flac, .aac (transcoded server-side via ffmpeg).

Returns: text, language_code, language_probabilities (top languages with confidence 0–1), words[] with start/end timestamps. Supports 25 European languages.
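
A hedged sketch of an upload and of reading the word timestamps follows. The multipart field name `file` is an assumption borrowed from ElevenLabs conventions, as are the `start` and `end` key names and the unit (seconds); none of these is confirmed by the text above. Both helper names are hypothetical.

```python
import urllib.request
import uuid

STT_URL = "https://samtal.moln.ai/v1/speech-to-text"


def build_stt_request(api_key, filename, audio, field="file"):
    """Build a multipart/form-data POST carrying one audio file.

    The form field name ("file") is an assumption, not taken from the docs.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        STT_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            "xi-api-key": api_key,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )


def words_between(result, t0, t1):
    """Return the words[] entries that lie entirely inside [t0, t1]."""
    return [w for w in result.get("words", [])
            if w["start"] >= t0 and w["end"] <= t1]
```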

Voices
GET /v1/voices
GET /v1/voices/{voice_id}

49 premade voices across European, Slavic, Asian, and Western languages (TTS supports all; ASR supports 25). Reference voices by ID like spectra-sv-default. Voice-cloning reference audio ships with the API — clients only need the voice ID. Custom cloned/designed voices are also supported.

Knowledge Base

ElevenLabs-compatible: create-from-file · create-from-text · create-from-url. Use the elevenlabs SDK or call the routes directly.

POST /v1/convai/knowledge-base/file — upload file (multipart: file, optional name)
POST /v1/convai/knowledge-base/text — create from text (JSON: {text, name})
POST /v1/convai/knowledge-base/url — create from URL (JSON: {url, name})
GET /v1/convai/knowledge-base — list documents
GET /v1/convai/knowledge-base/{document_id} — get document
DELETE /v1/convai/knowledge-base/{document_id}
POST /v1/convai/knowledge-base/folders (Samtal extension: folder organization)
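
The JSON routes listed above could be called directly along these lines. This sketch only builds the requests; the helper names are hypothetical, and response shapes are not covered because they are not described here.

```python
import json
import urllib.request

KB_URL = "https://samtal.moln.ai/v1/convai/knowledge-base"


def kb_request(api_key, method, suffix="", payload=None):
    """Build a knowledge-base request; JSON-encode the payload if present."""
    headers = {"xi-api-key": api_key}
    data = None
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(KB_URL + suffix, data=data,
                                  method=method, headers=headers)


def create_from_text(api_key, text, name):
    return kb_request(api_key, "POST", "/text", {"text": text, "name": name})


def create_from_url(api_key, url, name):
    return kb_request(api_key, "POST", "/url", {"url": url, "name": name})


def list_documents(api_key):
    return kb_request(api_key, "GET")


def delete_document(api_key, document_id):
    return kb_request(api_key, "DELETE", f"/{document_id}")
```

Each returned object can be passed to `urllib.request.urlopen` to send it.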
Conversational Agents

ElevenLabs-compatible: accepts both the native flat config and the wrapped conversation_config shape.

POST /v1/convai/agents/create
GET /v1/convai/agents
GET /v1/convai/agents/{agent_id}
GET /v1/convai/agents/{agent_id}/versions (Samtal extension: versioning)
PATCH /v1/convai/agents/{agent_id}
DELETE /v1/convai/agents/{agent_id}

Config: name, system_prompt, first_message, language, tts, asr, turn, conversation, knowledge_base_ids, mcp_server_ids, custom_llm.
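
For illustration, a flat agent config could be assembled and checked against the field names listed above before it is posted to /v1/convai/agents/create. The helper `agent_config` is hypothetical, and the value types shown (for example a list of strings for knowledge_base_ids) are assumptions; the wrapped conversation_config shape is not sketched because its layout is not described here.

```python
import json

# Field names taken from the Config line above.
ALLOWED_FIELDS = {
    "name", "system_prompt", "first_message", "language", "tts", "asr",
    "turn", "conversation", "knowledge_base_ids", "mcp_server_ids",
    "custom_llm",
}


def agent_config(name, system_prompt, **extra):
    """Return a flat config dict, rejecting field names not in the list."""
    unknown = set(extra) - ALLOWED_FIELDS
    if unknown:
        raise ValueError(f"unknown config fields: {sorted(unknown)}")
    return {"name": name, "system_prompt": system_prompt, **extra}


config = agent_config(
    "Support",
    "You answer questions about opening hours.",
    first_message="Hej! Hur kan jag hjälpa dig?",
    language="sv",
    knowledge_base_ids=["doc_1"],  # assumed: list of document IDs
)
body = json.dumps(config).encode("utf-8")  # request body for the create route
```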

Voice Conversations (WebSocket)
GET /v1/convai/conversation/get-signed-url
WSS wss://samtal.moln.ai/v1/convai/conversation?agent_id=...&token=...

Events: user_audio_chunk, user_text, user_transcript, agent_response, audio, interruption, agent_response_metadata, ping/pong
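
The connection URL and a keep-alive reply could be sketched as below. The message layout is an assumption modeled on the ElevenLabs conversation protocol (a JSON object with a `type` field, and a ping carrying `ping_event.event_id`); the event list above does not specify it. The function names are hypothetical, and the WebSocket client itself is left out.

```python
import json
from urllib.parse import urlencode

WSS_BASE = "wss://samtal.moln.ai/v1/convai/conversation"


def conversation_url(agent_id, token):
    """Build the WSS URL with agent_id and token as query parameters."""
    return f"{WSS_BASE}?{urlencode({'agent_id': agent_id, 'token': token})}"


def handle_event(raw):
    """Return (event_type, reply_or_None) for one incoming JSON message.

    Assumed layout: {"type": "ping", "ping_event": {"event_id": ...}},
    answered with {"type": "pong", "event_id": ...}.
    """
    event = json.loads(raw)
    kind = event.get("type")
    if kind == "ping":
        event_id = event.get("ping_event", {}).get("event_id")
        return kind, json.dumps({"type": "pong", "event_id": event_id})
    return kind, None
```

A receive loop would call `handle_event` on every text frame and send the reply back whenever it is not `None`.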

MCP Skills
POST /v1/convai/skills — add MCP server
POST /v1/convai/skills/{id}/test — discover tools
POST /v1/convai/skills/{id}/call — call tool

Knowledge Base

Upload documents for RAG-powered voice agents

Drag & drop a document here, or click to browse

PDF, DOCX, TXT, MD, HTML, CSV, XLSX

Query Knowledge Base

MCP Skills

Connect external tools via Model Context Protocol

Custom Voices

Clone voices from audio or design new ones

Clone from Audio

Upload or record 5-15 seconds of speech. The audio will be trimmed and transcribed automatically.

Design a Voice

Create a voice by describing characteristics — no audio needed.

Agentic Dialogues

Create voice agents with knowledge & skills

Samtal

Lab Access

Fixed-price entry for development & experimentation on samtal.moln.ai

990 SEK
incl. 25% VAT (792 SEK excl.) · one-time
  • Personal API key with full access
  • All 13 voices + voice cloning
  • TTS & ASR endpoints (ElevenLabs-compatible)
  • Development & experimentation use
  • Key delivered to your email instantly

For B2B invoicing or private hosting plans, contact sales@moln.ai
