Audio To Text - Livepeer Docs

POST /audio-to-text

curl --request POST \
  --url https://dream-gateway.livepeer.cloud/audio-to-text \
  --header 'Authorization: Bearer <token>' \
  --header 'Content-Type: multipart/form-data' \
  --form audio='@example-file' \
  --form model_id= \
  --form return_timestamps=true \
  --form 'metadata={}'

{
  "text": "<string>",
  "chunks": [
    {
      "timestamp": [
        "<unknown>"
      ],
      "text": "<string>"
    }
  ]
}

The default Gateway used in this guide is the public Livepeer.cloud Gateway. It is free to use but not intended for production applications. For production, consider the Livepeer Studio Gateway, which requires an API token. Alternatively, you can set up your own Gateway node or partner with one via the ai-video channel on Discord.

The exact parameters, default values, and responses may vary between models, and not every parameter is available for every model. For model-specific parameters, refer to the respective model documentation in the audio-to-text pipeline.

Authorizations

Authorization · string · header · required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
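As a minimal sketch of the header format described above, the following Python snippet prepares (but does not send) a POST request carrying the bearer token. The URL is taken from the curl example on this page; the helper name `build_request` and its arguments are illustrative assumptions, not part of any Livepeer SDK.

```python
import urllib.request

# URL taken from the curl example on this page.
GATEWAY_URL = "https://dream-gateway.livepeer.cloud/audio-to-text"


def build_request(token: str, body: bytes, content_type: str) -> urllib.request.Request:
    """Prepare a POST request with a 'Bearer <token>' Authorization header.

    Hypothetical helper: the caller supplies an already encoded
    multipart/form-data body and its matching Content-Type value.
    Nothing is sent over the network here.
    """
    if not token:
        raise ValueError("an auth token is required")
    return urllib.request.Request(
        GATEWAY_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": content_type,
        },
    )
```

Passing the resulting object to `urllib.request.urlopen` would perform the call; the sketch stops before that so the header construction can be checked on its own.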

Body

multipart/form-data

audio · file · required

Uploaded audio file to be transcribed.

model_id · string · default: ""

Hugging Face model ID used for transcription.

return_timestamps · string · default: true

Return timestamps for the transcribed text. Supported values: 'sentence', 'word', or a string boolean ('true' or 'false'). Default is 'true' ('sentence'). 'false' means no timestamps. 'word' means word-based timestamps.
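Because this field is a string even when it carries a boolean, client code may want to normalize its input before sending. The sketch below maps a Python bool or string onto the four values listed above; the function name `normalize_return_timestamps` is a hypothetical helper, and rejecting unknown values on the client side is an assumption about desired behavior, not something the Gateway is documented to require.

```python
# The four values documented for the return_timestamps field.
ALLOWED_TIMESTAMP_MODES = {"true", "false", "sentence", "word"}


def normalize_return_timestamps(value) -> str:
    """Return the form-field string for a bool or string input.

    Python booleans become the string booleans 'true' / 'false';
    strings are trimmed, lowercased, and checked against the documented set.
    """
    if isinstance(value, bool):
        return "true" if value else "false"
    text = str(value).strip().lower()
    if text not in ALLOWED_TIMESTAMP_MODES:
        raise ValueError(f"unsupported return_timestamps value: {value!r}")
    return text
```

For example, `normalize_return_timestamps(False)` yields `"false"` (no timestamps), and `normalize_return_timestamps("word")` passes through unchanged.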

metadata · string · default: {}

Additional job information to be passed to the pipeline.
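Since `metadata` is typed as a string with a default of `{}`, one plausible reading is that it carries a JSON-encoded object. The sketch below collects the non-file form fields under that assumption; `build_form_fields` is a hypothetical helper, and the JSON encoding of `metadata` is inferred from the default shown here rather than confirmed by this page.

```python
import json


def build_form_fields(model_id: str = "", return_timestamps: str = "true", metadata=None) -> dict:
    """Collect the text fields of the multipart body as strings.

    Defaults mirror the ones documented above. The audio file itself is
    not included; it would be attached separately as the 'audio' part.
    """
    return {
        "model_id": model_id,
        "return_timestamps": return_timestamps,
        # Assumption: metadata is sent as a JSON object serialized to a string.
        "metadata": json.dumps(metadata if metadata is not None else {}),
    }
```

Calling `build_form_fields()` with no arguments reproduces the defaults used in the curl example.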

Response

Successful Response

Response model for text generation.

text · string · required

The generated text.

chunks · Chunk · object[] · required

The generated text chunks.
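The response shape shown in the example (a top-level `text` plus a list of `chunks`, each with `timestamp` and `text`) can be unpacked as in the following sketch. The element type of `timestamp` is not specified on this page; the sample payload assumes a pair of start and end times in seconds, which is an illustrative guess. `Chunk` and `parse_transcription` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class Chunk:
    timestamp: tuple  # element type is unspecified in the docs; kept as-is
    text: str


def parse_transcription(raw: str):
    """Split a JSON response body into the full text and its chunks."""
    payload = json.loads(raw)
    chunks = [Chunk(tuple(c["timestamp"]), c["text"]) for c in payload["chunks"]]
    return payload["text"], chunks


# Made-up sample payload following the documented structure.
sample = '{"text": "hello world", "chunks": [{"timestamp": [0.0, 1.5], "text": "hello world"}]}'
full_text, chunks = parse_transcription(sample)
```

Both `text` and `chunks` are marked required, so the sketch indexes them directly and lets a `KeyError` surface if either is missing.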


Last modified on March 18, 2026
