Version: MVP

Speech-to-text adapter contract

Reference · 2 min read · Updated 2026-05-22

What you'll find here

The three records that compose a speech-to-text adapter result, the validation rules, and the redaction boundary.

Audio in, transcript out — never raw payloads in receipts.

Speech-to-text results must not persist raw audio, audio bytes, waveforms, raw payloads, private transcript metadata, credentials, tokens, or private local state. Transcript text, segment text, and metadata all pass through the shared redaction boundary before storage.
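As a rough illustration of what such a boundary might do to metadata, the sketch below walks a nested structure, masks disallowed entries, and records which paths were redacted. The key names in `FORBIDDEN_KEYS`, the `[REDACTED]` placeholder, and the function `redact_metadata` are all assumptions made for this example. The contract names categories of disallowed data, not concrete keys, and the real shared boundary may work differently.

```python
from typing import Any

# Hypothetical deny-list: the contract lists categories, not key names.
FORBIDDEN_KEYS = {
    "raw_audio", "audio_bytes", "waveform", "raw_payload",
    "credentials", "token", "local_state",
}
REDACTED = "[REDACTED]"  # placeholder value, assumed for illustration


def redact_metadata(value: Any, path: str = "") -> tuple[Any, list[str]]:
    """Return a redacted copy of `value` plus the paths that were masked."""
    redacted_paths: list[str] = []

    if isinstance(value, dict):
        clean: dict[str, Any] = {}
        for key, child in value.items():
            child_path = f"{path}.{key}" if path else str(key)
            if str(key).lower() in FORBIDDEN_KEYS:
                # Mask the whole subtree under a forbidden key.
                clean[key] = REDACTED
                redacted_paths.append(child_path)
            else:
                clean[key], nested = redact_metadata(child, child_path)
                redacted_paths.extend(nested)
        return clean, redacted_paths

    if isinstance(value, list):
        items = []
        for index, child in enumerate(value):
            item, nested = redact_metadata(child, f"{path}[{index}]")
            items.append(item)
            redacted_paths.extend(nested)
        return items, redacted_paths

    if isinstance(value, (bytes, bytearray)):
        # Raw bytes are masked regardless of the key they sit under.
        return REDACTED, [path]

    return value, redacted_paths
```

The returned path list is the kind of information the `redacted paths` field on the result envelope could carry, so a reader of a stored result can see what was removed without seeing the removed values.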

Records

| Record | Captures | Fields |
| --- | --- | --- |
| SpeechToTextInputMetadata | input | Media artifact id · MIME type · duration (ms) · language hint · channel count · redacted metadata. |
| SpeechToTextTranscript | output | Transcript text · language · confidence · transcript segments · redacted metadata · redaction status. |
| SpeechToTextResult | envelope | Result id · task id · adapter id · status (completed / partial / failed) · input metadata · transcript · errors · policy envelope id · evidence ids · receipt ids · redacted paths · creation timestamp. |
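The three records could be modeled roughly as follows. This is a sketch: the record names and the status values come from the table above, but the attribute names, types, optionality, and the `TranscriptSegment` helper are assumptions chosen for illustration, not the contract's actual schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Any, Optional


class Status(str, Enum):
    COMPLETED = "completed"
    PARTIAL = "partial"
    FAILED = "failed"


@dataclass
class TranscriptSegment:
    # Hypothetical shape: the source only says segments carry text and offsets.
    text: str
    start_ms: Optional[int] = None
    end_ms: Optional[int] = None


@dataclass
class SpeechToTextInputMetadata:
    media_artifact_id: str  # a reference to the audio, never the audio itself
    mime_type: str
    duration_ms: int
    language_hint: Optional[str] = None
    channel_count: Optional[int] = None
    redacted_metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class SpeechToTextTranscript:
    text: str
    language: str
    confidence: float
    redaction_status: str  # allowed values are not given in the source
    segments: list[TranscriptSegment] = field(default_factory=list)
    redacted_metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class SpeechToTextResult:
    result_id: str
    task_id: str
    adapter_id: str
    status: Status
    input_metadata: SpeechToTextInputMetadata
    policy_envelope_id: str
    transcript: Optional[SpeechToTextTranscript] = None
    errors: list[str] = field(default_factory=list)
    evidence_ids: list[str] = field(default_factory=list)
    receipt_ids: list[str] = field(default_factory=list)
    redacted_paths: list[str] = field(default_factory=list)
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )
```

Note that the input record holds only a media artifact id, which matches the rule that raw audio never travels inside the result.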

Validation

| Applies to | Requirement |
| --- | --- |
| Completed / partial | Require a transcript. |
| Failed | Require at least one error. |
| All | Require policy envelope, evidence, and receipt links. |
| Segment offsets | When both present, end ≥ start. |
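The rules above might be checked along these lines. The sketch works on plain dictionaries so that it stands alone; the key names (`status`, `transcript`, `errors`, `policy_envelope_id`, `evidence_ids`, `receipt_ids`, `segments`, `start_ms`, `end_ms`) and the function `validate_result` are hypothetical. It also reads "evidence and receipt links" as "at least one id of each", which is an interpretation rather than something the source states.

```python
from typing import Any, Mapping


def validate_result(result: Mapping[str, Any]) -> list[str]:
    """Return a list of rule violations; an empty list means valid."""
    problems: list[str] = []
    status = result.get("status")
    transcript = result.get("transcript")

    # Completed / partial results require a transcript.
    if status in ("completed", "partial"):
        if transcript is None:
            problems.append(f"{status} result requires a transcript")
    # Failed results require at least one error.
    elif status == "failed":
        if not result.get("errors"):
            problems.append("failed result requires at least one error")
    else:
        # Not a stated rule; added so unknown statuses are not silently accepted.
        problems.append(f"unknown status: {status!r}")

    # All results require policy envelope, evidence, and receipt links.
    if not result.get("policy_envelope_id"):
        problems.append("missing policy envelope id")
    if not result.get("evidence_ids"):
        problems.append("missing evidence ids")
    if not result.get("receipt_ids"):
        problems.append("missing receipt ids")

    # Segment offsets: when both are present, end must not precede start.
    segments = (transcript or {}).get("segments", [])
    for index, segment in enumerate(segments):
        start, end = segment.get("start_ms"), segment.get("end_ms")
        if start is not None and end is not None and end < start:
            problems.append(
                f"segment {index}: end {end} precedes start {start}"
            )

    return problems
```

Collecting every violation instead of raising on the first one is a design choice for this sketch; it lets a caller report all problems with a result in one pass.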

What's next

Reference: Voice posture. The decision that authorizes adapter calls.

Reference: Text-to-speech adapters. The output-direction counterpart.

Reference: Multimodal artifact references. How adapters cite audio without raw payloads.
