Version: MVP

Speech-to-text adapter contract

2 min read · Reference · Updated 2026-05-22

What you'll find here

The three records that compose a speech-to-text adapter result, the validation rules, and the redaction boundary.

Audio in, transcript out — never raw payloads in receipts.

Speech-to-text results must not persist raw audio, audio bytes, waveforms, raw payloads, private transcript metadata, credentials, tokens, or private local state. Transcript text, segment text, and metadata all pass through the shared redaction boundary before storage.
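As a rough illustration of what a redaction boundary of this kind might do, the sketch below walks a nested metadata structure, drops any key on a deny-list, and reports the paths it removed. The function name `redact`, the `FORBIDDEN_KEYS` set, and the path notation are all assumptions made for this example; the contract itself does not specify them.

```python
from typing import Any

# Hypothetical deny-list: these key names are illustrative, not part of the contract.
FORBIDDEN_KEYS = {
    "audio_bytes", "raw_audio", "waveform", "raw_payload", "credentials", "token",
}


def redact(value: Any, path: str = "") -> tuple[Any, list[str]]:
    """Return a copy of `value` without forbidden keys, plus the paths removed."""
    redacted_paths: list[str] = []
    if isinstance(value, dict):
        clean: dict = {}
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else key
            if key in FORBIDDEN_KEYS:
                # Drop the value entirely and remember where it was.
                redacted_paths.append(child_path)
                continue
            clean_child, child_paths = redact(child, child_path)
            clean[key] = clean_child
            redacted_paths.extend(child_paths)
        return clean, redacted_paths
    if isinstance(value, list):
        clean_items = []
        for index, item in enumerate(value):
            clean_item, item_paths = redact(item, f"{path}[{index}]")
            clean_items.append(clean_item)
            redacted_paths.extend(item_paths)
        return clean_items, redacted_paths
    # Scalars pass through unchanged in this sketch.
    return value, redacted_paths
```

The returned path list could plausibly feed the result's redacted paths field, although how the real boundary populates that field is not stated here.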

Records

| Record | Captures | Fields |
| --- | --- | --- |
| SpeechToTextInputMetadata | input | Media artifact id · MIME type · duration (ms) · language hint · channel count · redacted metadata. |
| SpeechToTextTranscript | output | Transcript text · language · confidence · transcript segments · redacted metadata · redaction status. |
| SpeechToTextResult | envelope | Result id · task id · adapter id · status (completed / partial / failed) · input metadata · transcript · errors · policy envelope id · evidence ids · receipt ids · redacted paths · creation timestamp. |
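One way to picture the three records is as plain data classes. The sketch below is an assumption-heavy illustration: the snake_case field names, the Python types, which fields are optional, the `TranscriptSegment` helper, and the millisecond unit for segment offsets are all invented for this example and may differ from the actual schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TranscriptSegment:
    # Hypothetical helper type; offsets are assumed optional and in milliseconds.
    text: str
    start_ms: Optional[int] = None
    end_ms: Optional[int] = None


@dataclass
class SpeechToTextInputMetadata:
    media_artifact_id: str
    mime_type: str
    duration_ms: int
    language_hint: Optional[str] = None
    channel_count: Optional[int] = None
    redacted_metadata: dict = field(default_factory=dict)


@dataclass
class SpeechToTextTranscript:
    text: str
    language: str
    confidence: Optional[float] = None
    segments: list[TranscriptSegment] = field(default_factory=list)
    redacted_metadata: dict = field(default_factory=dict)
    redaction_status: str = "redacted"  # assumed default value


@dataclass
class SpeechToTextResult:
    result_id: str
    task_id: str
    adapter_id: str
    status: str  # "completed", "partial", or "failed"
    input_metadata: SpeechToTextInputMetadata
    created_at: str  # creation timestamp; ISO 8601 string assumed
    transcript: Optional[SpeechToTextTranscript] = None
    errors: list[str] = field(default_factory=list)
    policy_envelope_id: str = ""
    evidence_ids: list[str] = field(default_factory=list)
    receipt_ids: list[str] = field(default_factory=list)
    redacted_paths: list[str] = field(default_factory=list)
```

Note that none of these classes has a slot for audio bytes or raw payloads; the input record refers to the media only by artifact id.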

Validation

Completed / partial: require a transcript.

Failed: require at least one error.

All statuses: require policy envelope, evidence, and receipt links.

Segment offsets: when a segment has both a start and an end offset, end ≥ start.
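The rules above could be checked with a small function like the following sketch. It operates on a plain dictionary to stay self-contained. The key names (`status`, `transcript`, `errors`, `policy_envelope_id`, `evidence_ids`, `receipt_ids`, `segments`, `start_ms`, `end_ms`) are assumptions, as is the reading that "links" means a non-empty envelope id plus at least one evidence id and one receipt id.

```python
from typing import Any


def validate_result(result: dict[str, Any]) -> list[str]:
    """Return a list of rule violations; an empty list means the result passes."""
    problems: list[str] = []
    status = result.get("status")
    transcript = result.get("transcript")

    # Completed / partial results need a transcript; failed results need an error.
    if status in ("completed", "partial"):
        if transcript is None:
            problems.append(f"{status} result requires a transcript")
    elif status == "failed":
        if not result.get("errors"):
            problems.append("failed result requires at least one error")
    else:
        problems.append(f"unknown status: {status!r}")

    # Every result must link to its policy envelope, evidence, and receipts.
    if not result.get("policy_envelope_id"):
        problems.append("missing policy envelope id")
    if not result.get("evidence_ids"):
        problems.append("missing evidence ids")
    if not result.get("receipt_ids"):
        problems.append("missing receipt ids")

    # Segment offsets are only compared when both are present.
    for index, segment in enumerate((transcript or {}).get("segments", [])):
        start, end = segment.get("start_ms"), segment.get("end_ms")
        if start is not None and end is not None and end < start:
            problems.append(f"segment {index}: end offset precedes start offset")

    return problems
```

Returning every violation, rather than raising on the first one, is a design choice for this sketch; it lets a caller report all problems in a single pass.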
