ULTRA Text-to-Speech API

Move beyond standard synthesis. Our High-Definition (HD) Generative Tier offers voices that breathe, pause, and emote naturally.
Use this API from your AI agent via MCP
Works with OpenClaw, Claude Code/Desktop, Cursor, Windsurf, Cline and any MCP-compatible AI client.
Create a skill by wrapping this MCP: https://mcp.zylalabs.com/mcp?apikey=YOUR_ZYLA_API_KEY

🚀 Core Capabilities

1. Hyper-Real "GenAI" Voices

Move beyond standard synthesis. Our High-Definition (HD) Generative Tier offers voices that breathe, pause, and emote naturally.

  • Context-Aware Delivery: The engine analyzes the text to understand if it should whisper a secret, shout a warning, or deliver news with authority.

  • Natural Disfluencies: Capable of inserting realistic human elements like "ums," "uhs," and breaths for conversational agents that sound genuinely spontaneous.

  • Affective Intelligence: Dynamically adjusts emotional weight (joy, sorrow, urgency) based on the sentiment of your script.

2. Director-Level Style Control

Stop relying on rigid code tags. Control the voice using natural language prompts.

  • Prompt-to-Speech: Simply tell the API: *"Read this like a tired storybook narrator"* or *"Speak this quickly and excitedly like a sports commentator."*

  • Granular Pacing: Fine-tune the rhythm of speech down to the millisecond. Stretch pauses for dramatic effect or speed up specific phrases to mimic fast-paced banter.

3. Multi-Speaker "Dialogue" Engine

Generate complex audio scenes with a single API call.

  • Seamless Turn-Taking: Simulate podcasts, interviews, or customer service roleplays where multiple distinct voices interact.

  • Unified Context: The system maintains the tone and flow of the conversation across different speakers, ensuring no jarring transitions.


🌍 Global Reach & Scale

Our infrastructure is designed for global deployment, ensuring your application speaks your customers' language—literally.

Feature: Specification

  • Voice Portfolio: Access 380+ distinct voice personas across all tiers.

  • Language Coverage: Native support for 80+ languages and variants (locales).

  • Regional Accents: Deep support for regional nuances (e.g., 5+ variants of English, 3+ variants of Spanish and French).

  • Studio Tier: Specialized voices recorded by professional voice actors for long-form content (audiobooks/news) to eliminate listener fatigue.

⚡ Technical Specifications

Built for developers who demand reliability and flexibility.

  • Ultra-Low Latency: "Flash" model architecture delivers audio in <300ms, enabling real-time, interruptible voice conversations for AI agents.

  • High-Fidelity Audio:

      • Studio Quality: Up to 48 kHz sample rate.

      • Compressed Output: MP3, for post-production.

  • Input Flexibility: Accepts Plain Text and Natural Language Prompts.

  • Bidirectional Streaming: Playback begins instantly while the rest of the sentence is still being generated.


🎯 Ideal Use Cases

  • Interactive AI Agents: Power customer support bots that sound empathetic and human, not robotic.

  • Content Production: Automate audiobook narration, podcast creation, and video dubbing at a fraction of the cost of a studio.

  • EdTech & E-Learning: Generate dynamic language learning lessons with perfect native pronunciation in 80+ languages.

  • Gaming & VR: Create dynamic NPCs (Non-Player Characters) that can generate unique dialogue on the fly without pre-recorded lines.

API Documentation

Endpoints


Get list of voices

GET https://zylalabs.com/api/11558/ultra+text-to-speech+api/21834/list+of+voices

API EXAMPLE RESPONSE

{
	"data": [
		{
			"gender": "FEMALE",
			"language_code": "en-US",
			"language_name": "English (US)",
			"type": "Premium",
			"voice_id": "en-US-News-L"
		}
	],
	"message": "success",
	"success": true
}

List of voices - CODE SNIPPETS


curl --location --request GET 'https://zylalabs.com/api/11558/ultra+text-to-speech+api/21834/list+of+voices' --header 'Authorization: Bearer YOUR_API_KEY' 
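
For callers who prefer Python to curl, the same request can be sketched with the standard library. This is a minimal sketch and not an official client: the function names `build_voices_request` and `fetch_voices` are hypothetical, while the URL and the Authorization header are copied from the curl snippet above.

```python
import json
import urllib.request

# URL copied from the curl snippet in this section.
VOICES_URL = (
    "https://zylalabs.com/api/11558/ultra+text-to-speech+api/21834/list+of+voices"
)


def build_voices_request(api_key: str) -> urllib.request.Request:
    """Build (but do not send) the GET request for the voice list."""
    return urllib.request.Request(
        VOICES_URL,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_voices(api_key: str, timeout: float = 30.0) -> list:
    """Send the request and return the voice objects found under "data"."""
    request = build_voices_request(api_key)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        body = json.load(resp)
    # The example response carries a boolean "success" flag; treat false as an error.
    if not body.get("success"):
        raise RuntimeError(f"API reported failure: {body.get('message')}")
    return body["data"]
```

Splitting request construction from sending keeps the header logic easy to inspect without spending a call from the plan's quota.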


    

Generate text-to-speech

POST https://zylalabs.com/api/11558/ultra+text-to-speech+api/21835/create+text-to-speech

Create text-to-speech - Endpoint Features

Object: Description

  • Request Body: [Required] JSON

API EXAMPLE RESPONSE

{"data":"https://s3.us-east-1.amazonaws.com/invideo-uploads-us-east-1/speechen-US-News-L17664032245720.mp3","message":"success","success":true}

Create text-to-speech - CODE SNIPPETS


curl --location --request POST 'https://zylalabs.com/api/11558/ultra+text-to-speech+api/21835/create+text-to-speech' \
--header 'Authorization: Bearer YOUR_API_KEY' \
--data-raw '{
	"gender": "FEMALE",
	"language_code": "en-US",
	"language_name": "English (US)",
	"voice_id": "en-US-News-L",
	"text": "Stand by... we have a major development coming into the newsroom right now. After weeks of uncertainty—and hours of intense speculation—the decision has finally been made. The result? It is absolutely not what anyone expected! Sources on the ground are describing the atmosphere as tense... yet strangely hopeful. We are working to confirm the details at this very moment, so please... do not go anywhere."
}'
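
A Python equivalent of the request above might look like the following. Treat it as a sketch under stated assumptions: `build_tts_request` is a hypothetical helper, the body fields mirror the curl example, and the `Content-Type: application/json` header is an addition that the curl snippet does not set.

```python
import json
import urllib.request

# URL copied from the curl snippet in this section.
TTS_URL = (
    "https://zylalabs.com/api/11558/ultra+text-to-speech+api/21835/create+text-to-speech"
)


def build_tts_request(api_key: str, voice: dict, text: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for one synthesis job.

    `voice` is one object from the list-of-voices response.
    """
    payload = {
        "gender": voice["gender"],
        "language_code": voice["language_code"],
        "language_name": voice["language_name"],
        "voice_id": voice["voice_id"],
        "text": text,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: not present in the curl example; declared because the body is JSON.
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Using `json.dumps` avoids the quoting mistakes that are easy to make when the script text itself contains quotes or apostrophes.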

    

API Access Key & Authentication

After signing up, every developer is assigned a personal API access key, a unique combination of letters and digits that grants access to our API endpoints. To authenticate with the ULTRA Text-to-Speech API, include your bearer token in the Authorization header.
Headers
Header: Description

  • Authorization: [Required] Should be "Bearer access_key". See "Your API Access Key" above when you are subscribed.

Simple Transparent Pricing

No long-term commitment. Upgrade, downgrade, or cancel anytime. Free Trial includes up to 50 requests.

🚀 Enterprise

Starts at
$10,000/Year


  • Custom Volume
  • Custom Rate Limit
  • Specialized Customer Support
  • Real-Time API Monitoring

Customer favorite features

  • ✔︎ Only Pay for Successful Requests
  • ✔︎ Free 7-Day Trial
  • ✔︎ Multi-Language Support
  • ✔︎ One API Key, All APIs.
  • ✔︎ Intuitive Dashboard
  • ✔︎ Comprehensive Error Handling
  • ✔︎ Developer-Friendly Docs
  • ✔︎ Postman Integration
  • ✔︎ Secure HTTPS Connections
  • ✔︎ Reliable Uptime

ULTRA Text-to-Speech API FAQs

The GET List of voices endpoint returns a list of available voice personas, including attributes like gender, language code, and voice type. The POST Create text-to-speech endpoint returns a URL link to the generated audio file along with a success message.

For the GET List of voices, key fields include "gender," "language_code," "language_name," "type," and "voice_id." For the POST Create text-to-speech, the key fields are "data" (audio URL), "message," and "success."

The POST Create text-to-speech endpoint accepts parameters such as the text to be converted and optional natural language prompts for voice modulation. Users can customize the delivery style and pacing through these prompts.

The response data for the GET List of voices is organized in a JSON format with an array of voice objects under the "data" key. The POST Create text-to-speech response includes a single object with "data," "message," and "success" keys.

Typical use cases include generating dynamic audio for interactive AI agents, automating audiobook narration, creating engaging educational content, and enhancing gaming experiences with realistic NPC dialogue.

Data accuracy is maintained through a combination of professional voice actor recordings and advanced AI algorithms that ensure high-quality voice synthesis. Continuous updates and user feedback also contribute to improving voice performance.

Users can utilize the returned audio URL from the POST Create text-to-speech response to play or store the generated audio. The voice attributes from the GET List of voices can help users select the most suitable voice for their application.

Users can expect structured JSON responses with clear success indicators. For the GET List of voices, the data will typically include multiple voice options, while the POST Create text-to-speech will return a single audio file link upon successful processing.

Users can customize their voice selection by utilizing the attributes returned in the GET List of voices. They can filter voices based on gender, language, and type to find the most suitable voice persona for their application.
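
The filtering described above happens on the client side, since the documented endpoint takes no filter parameters. A minimal sketch follows, assuming the field names from the example response; `filter_voices` is a hypothetical helper, and the second sample voice is invented purely for illustration.

```python
SAMPLE_VOICES = [
    # Taken from the example response for the list-of-voices endpoint.
    {
        "gender": "FEMALE",
        "language_code": "en-US",
        "language_name": "English (US)",
        "type": "Premium",
        "voice_id": "en-US-News-L",
    },
    # Hypothetical entry, not a real voice; included only to exercise the filter.
    {
        "gender": "MALE",
        "language_code": "es-ES",
        "language_name": "Spanish (Spain)",
        "type": "Standard",
        "voice_id": "es-ES-Example-A",
    },
]


def filter_voices(voices, *, gender=None, language_code=None, voice_type=None):
    """Return the voices matching every criterion that is not None."""

    def matches(voice):
        return (
            (gender is None or voice.get("gender") == gender)
            and (language_code is None or voice.get("language_code") == language_code)
            and (voice_type is None or voice.get("type") == voice_type)
        )

    return [voice for voice in voices if matches(voice)]


english = filter_voices(SAMPLE_VOICES, language_code="en-US", gender="FEMALE")
print([voice["voice_id"] for voice in english])  # prints ['en-US-News-L']
```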

The API supports audio output in MP3 format for the generated text-to-speech audio. This format is suitable for post-production and easy integration into various applications.

The API's Affective Intelligence feature dynamically adjusts the emotional weight of the speech based on the sentiment of the input text, allowing for a more engaging and contextually appropriate delivery.

The "data" field in the POST Create text-to-speech response contains the URL link to the generated audio file. Users can use this link to play or download the audio for their applications.
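
One way to act on that URL is sketched below. Both helper names (`extract_audio_url`, `download_audio`) are hypothetical, and the failure check assumes that an unsuccessful call sets "success" to false or leaves "data" empty, which the documentation does not spell out.

```python
import json
import shutil
import urllib.request


def extract_audio_url(response_text: str) -> str:
    """Return the audio URL from a Create text-to-speech response body."""
    body = json.loads(response_text)
    # Assumption: failures are signalled by success=false or an empty "data" field.
    if not body.get("success") or not body.get("data"):
        raise ValueError(f"no audio URL in response: {body.get('message')!r}")
    return body["data"]


def download_audio(url: str, dest_path: str, timeout: float = 60.0) -> None:
    """Stream the MP3 at `url` into `dest_path` without loading it all into memory."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        with open(dest_path, "wb") as out:
            shutil.copyfileobj(resp, out)
```

Downloading promptly may be prudent, since the page does not say how long the generated file stays available at the returned URL.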

The Multi-Speaker "Dialogue" Engine allows the API to simulate conversations with distinct voices, maintaining unified context and tone, which is essential for creating realistic interactions in podcasts or customer service scenarios.

Natural language prompts enable users to control voice delivery style intuitively, allowing for creative expressions like "speak excitedly" or "read slowly." This flexibility enhances the audio's emotional impact and engagement.

The API offers deep support for regional accents, providing multiple variants for languages like English, Spanish, and French. This ensures that the generated speech resonates with local audiences and enhances relatability.

If users receive an empty response, they should check their input parameters for accuracy and completeness. Ensuring valid text and prompts can help avoid empty results and improve the likelihood of successful audio generation.
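
A pre-flight check along these lines can catch incomplete input before a request is sent. It is only a sketch: the documentation does not state which body fields are mandatory, so the `REQUIRED_FIELDS` tuple is an assumption based on the curl example, and `find_payload_problems` is a hypothetical helper.

```python
# Assumption: the docs do not list mandatory fields; these come from the curl example.
REQUIRED_FIELDS = ("voice_id", "language_code", "text")


def find_payload_problems(payload: dict) -> list:
    """Return human-readable problems; an empty list means the payload looks complete."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = payload.get(field)
        # Reject missing keys, non-string values, and whitespace-only strings.
        if not isinstance(value, str) or not value.strip():
            problems.append(f"{field!r} is missing or empty")
    return problems


payload = {"voice_id": "en-US-News-L", "language_code": "en-US", "text": "   "}
print(find_payload_problems(payload))  # prints ["'text' is missing or empty"]
```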

General FAQs

Zyla API Hub is like a big store for APIs, where you can find thousands of them all in one place. We also offer dedicated support and real-time monitoring of all APIs. Once you sign up, you can pick and choose which APIs you want to use. Just remember, each API needs its own subscription. But if you subscribe to multiple ones, you'll use the same key for all of them, making things easier for you.

Prices are listed in USD (United States Dollar), EUR (Euro), CAD (Canadian Dollar), AUD (Australian Dollar), and GBP (British Pound). We accept all major debit and credit cards. Our payment system uses the latest security technology and is powered by Stripe, one of the world's most reliable payment companies. If you have any trouble paying by card, just contact us at [email protected]


Additionally, if you already have an active subscription in any of these currencies (USD, EUR, CAD, AUD, GBP), that currency will remain for subsequent subscriptions. You can change the currency at any time as long as you don't have any active subscriptions.

The local currency shown on the pricing page is based on the country of your IP address and is provided for reference only. The actual prices are in USD (United States Dollar). When you make a payment, the charge will appear on your card statement in USD, even if you see the equivalent amount in your local currency on our website. This means you cannot pay directly with your local currency.

Occasionally, a bank may decline the charge due to its fraud protection settings. We suggest contacting your bank first to check whether it is blocking our charges. You can also open the Billing Portal and change the card used for the payment. If this does not work and you need further assistance, please contact our team at [email protected]

Prices are determined by a recurring monthly or yearly subscription, depending on the chosen plan.

API calls are deducted from your plan based on successful requests. Each plan comes with a specific number of calls that you can make per month. Only successful calls, indicated by a Status 200 response, will be counted against your total. This ensures that failed or incomplete requests do not impact your monthly quota.

Zyla API Hub works on a recurring monthly subscription system. Your billing cycle starts the day you purchase one of the paid plans and renews on the same day of the following month. Be sure to cancel your subscription beforehand if you want to avoid future charges.

To upgrade your current subscription plan, simply go to the pricing page of the API and select the plan you want to upgrade to. The upgrade will be instant, allowing you to immediately enjoy the features of the new plan. Please note that any remaining calls from your previous plan will not be carried over to the new plan, so be aware of this when upgrading. You will be charged the full amount of the new plan.

To check how many API calls you have left for the current month, refer to the 'X-Zyla-API-Calls-Monthly-Remaining' field in the response header. For example, if your plan allows 1,000 requests per month and you've used 100, this field in the response header will indicate 900 remaining calls.

To see the maximum number of API requests your plan allows, check the 'X-Zyla-RateLimit-Limit' response header. For instance, if your plan includes 1,000 requests per month, this header will display 1,000.

The 'X-Zyla-RateLimit-Reset' header shows the number of seconds until your rate limit resets. This tells you when your request count will start fresh. For example, if it displays 3,600, it means 3,600 seconds are left until the limit resets.
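
The three headers described above can be collected into one structure after each call. The sketch below assumes the header values are plain integers (optionally with thousands separators, as written in the examples); the `Quota` class and `parse_quota` function are hypothetical names.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Quota:
    monthly_limit: int      # from X-Zyla-RateLimit-Limit
    monthly_remaining: int  # from X-Zyla-API-Calls-Monthly-Remaining
    reset_seconds: int      # from X-Zyla-RateLimit-Reset


def _to_int(value: str) -> int:
    # The FAQ writes values like "1,000"; strip separators in case they appear on the wire.
    return int(value.replace(",", "").strip())


def parse_quota(headers: Mapping[str, str]) -> Quota:
    """Read the quota headers, ignoring header-name case as HTTP requires."""
    lowered = {name.lower(): value for name, value in headers.items()}
    return Quota(
        monthly_limit=_to_int(lowered["x-zyla-ratelimit-limit"]),
        monthly_remaining=_to_int(lowered["x-zyla-api-calls-monthly-remaining"]),
        reset_seconds=_to_int(lowered["x-zyla-ratelimit-reset"]),
    )


quota = parse_quota(
    {
        "X-Zyla-RateLimit-Limit": "1000",
        "X-Zyla-API-Calls-Monthly-Remaining": "900",
        "X-Zyla-RateLimit-Reset": "3600",
    }
)
print(quota.monthly_limit - quota.monthly_remaining)  # prints 100
```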

Yes, you can cancel your plan anytime by going to your account and selecting the cancellation option on the Billing page. Please note that upgrades, downgrades, and cancellations take effect immediately. Additionally, upon cancellation, you will no longer have access to the service, even if you have remaining calls left in your quota.

You can contact us through our chat channel to receive immediate assistance. We are always online from 8 am to 5 pm (EST). If you reach us after that time, we will get back to you as soon as possible. Additionally, you can contact us via email at [email protected]

To give you the opportunity to experience our APIs without any commitment, we offer a 7-day free trial that allows you to make up to 50 API calls at no cost. This trial can be used only once, so we recommend applying it to the API that interests you the most. While most of our APIs offer a free trial, some may not. The trial concludes after 7 days or once you've made 50 requests, whichever occurs first. If you reach the 50 request limit during the trial, you will need to "Start Your Paid Plan" to continue making requests. You can find the "Start Your Paid Plan" button in your profile under Subscription -> Choose the API you are subscribed to -> Pricing tab. Alternatively, if you don't cancel your subscription before the 7th day, your free trial will end, and your plan will automatically be billed, granting you access to all the API calls specified in your plan. Please keep this in mind to avoid unwanted charges.

After 7 days, you will be charged the full amount for the plan you were subscribed to during the trial. Therefore, it's important to cancel before the trial period ends. Refund requests for forgetting to cancel on time are not accepted.

When you subscribe to an API free trial, you can make up to 50 API calls. If you wish to make additional API calls beyond this limit, the API will prompt you to "Start Your Paid Plan." You can find the "Start Your Paid Plan" button in your profile under Subscription -> Choose the API you are subscribed to -> Pricing tab.

Payout Orders are processed between the 20th and the 30th of each month. If you submit your request before the 20th, your payment will be processed within this timeframe.

