Question 1

What caption formats does DeliverCC export, and which one do I use?

Accepted Answer

DeliverCC outputs four formats from a single generation: SRT (the universal subtitle format for YouTube, Vimeo, and social media), VTT (the web video standard for HTML5 players), SCC (Scenarist Closed Captions, for US broadcast TV), and TTML (the Apple Music synced-lyrics dialect: the line-level file that labels send to Apple Music to enable karaoke-style lyric highlighting in the app).
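To illustrate how one set of timings can feed several formats, here is a minimal sketch that serializes the same hypothetical cue list to SRT and WebVTT. The `Cue` type and the function names are assumptions for illustration, not DeliverCC's API. SCC and TTML need considerably more structure and are left out.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio
    text: str


def _timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(cues: list[Cue]) -> str:
    """SRT: numbered blocks, comma before milliseconds, blank line between blocks."""
    blocks = []
    for i, cue in enumerate(cues, start=1):
        blocks.append(
            f"{i}\n{_timestamp(cue.start, ',')} --> {_timestamp(cue.end, ',')}\n{cue.text}\n"
        )
    return "\n".join(blocks)


def to_vtt(cues: list[Cue]) -> str:
    """WebVTT: 'WEBVTT' header, period before milliseconds, no cue numbers needed."""
    blocks = ["WEBVTT\n"]
    for cue in cues:
        blocks.append(
            f"{_timestamp(cue.start, '.')} --> {_timestamp(cue.end, '.')}\n{cue.text}\n"
        )
    return "\n".join(blocks)
```

The only differences between the two outputs are the header, the cue numbering, and the millisecond separator, which is why both can come from the same alignment without re-timing anything.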

Question 2

Does DeliverCC caption dialogue and ad-libs, or only sung lyrics?

Accepted Answer

Both. Paste everything you want captioned (the lyrics, ad-libs, and any spoken dialogue), and DeliverCC aligns all of it. A broadcast caption file has to carry every word, sung and spoken, so captioning the spoken parts is what makes your delivery complete and compliant.

Because the tool aligns the text you provide, anything you want captioned has to be in what you paste. An ad-lib that is not in your lyric sheet will not appear unless you add it.

Question 3

What's the difference between video captions and Apple Music synced lyrics?

Accepted Answer

They go to two different places. Video captions (SRT, VTT, and SCC) ride with your video. They display text over the picture, synced to everything audible, and they work wherever the video plays: SRT and VTT for YouTube, Vimeo, and social, and SCC for US broadcast TV. Apple Music synced lyrics ride with the song instead. They are the lyrics that scroll and highlight line by line inside the Apple Music app while the track plays. Same timed text underneath, two different destinations, and they are not interchangeable. One paints words on a video; the other powers the lyrics view in the streaming app.

TTML (Timed Text Markup Language) is a W3C standard for timed text. DeliverCC emits the Apple Music synced-lyrics dialect of TTML, the line-level format Apple Music uses for lyrics that highlight in time with playback. It is the file your label or distributor submits to Apple, through Transporter or iTunes Connect, to turn on synced lyrics for a release. It is not a general-purpose video-caption TTML file, and it will not work as a video subtitle. For video, use the SRT, VTT, or SCC output.
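For a sense of what "line-level" means structurally, the sketch below builds only the generic TTML skeleton with Python's standard library: a `tt` root in the W3C TTML namespace and one `p` element per lyric line, carrying `begin` and `end` clock times. This is a hypothetical illustration. Apple's lyrics delivery specification requires additional attributes and metadata that are not reproduced here, so this output should not be submitted as-is.

```python
import xml.etree.ElementTree as ET

TTML_NS = "http://www.w3.org/ns/ttml"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"  # serializes as xml:lang


def clock_time(seconds: float) -> str:
    """Format seconds as a TTML clock-time expression, HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def lines_to_ttml(lines: list[tuple[float, float, str]], lang: str = "en") -> str:
    """Build a line-level TTML document: one <p> per (start, end, text) lyric line."""
    ET.register_namespace("", TTML_NS)  # write TTML elements without a prefix
    tt = ET.Element(f"{{{TTML_NS}}}tt", {XML_LANG: lang})
    body = ET.SubElement(tt, f"{{{TTML_NS}}}body")
    div = ET.SubElement(body, f"{{{TTML_NS}}}div")
    for start, end, text in lines:
        p = ET.SubElement(div, f"{{{TTML_NS}}}p", begin=clock_time(start), end=clock_time(end))
        p.text = text
    return ET.tostring(tt, encoding="unicode")
```

Each lyric line gets exactly one timed `p`. That per-line timing is what the Apple Music app uses to highlight the current line, and it is also why the file has no notion of on-screen placement the way a video caption does.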

Question 4

Can I use DeliverCC for Spotify lyrics?

Accepted Answer

Not as a file, because Spotify does not accept one. Spotify's synced lyrics are powered entirely by Musixmatch. The only way to add them is to verify an artist or label account in Musixmatch and sync the lyrics inside Musixmatch's own tool, which then pushes them to Spotify. No tool can hand Spotify a finished lyrics file.

Apple Music is different: it accepts a time-synced TTML lyrics file submitted directly by the rights holder or distributor, which is the file DeliverCC produces. Instagram, Amazon Music, and Tidal run through Musixmatch the same way Spotify does. So DeliverCC covers the destination that takes a file; the Musixmatch-powered services still require manual syncing in Musixmatch's own tool.

Question 5

What languages does DeliverCC support?

Accepted Answer

DeliverCC supports twenty-one alignment languages: English, Spanish, Portuguese, Korean, Japanese, French, German, Italian, Arabic, Danish, Dutch, Finnish, Hindi, Indonesian, Norwegian, Polish, Russian, Swedish, Thai, Turkish, and Chinese. Each language uses the highest-quality alignment model available. For languages written in non-Latin scripts (Korean, Japanese, Arabic, Hindi, Thai, Chinese), lyrics must be provided in the song's native script, not as a romanized transliteration.
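Romanized lyrics for a non-Latin-script song give the aligner text that does not match the language being sung, so a quick pre-flight check can catch them before upload. The sketch below is hypothetical: the helper names, the two-letter language codes, and the 50% threshold are illustrative assumptions, not DeliverCC behavior. It counts how many alphabetic characters have a Unicode name containing "LATIN".

```python
import unicodedata

# Hypothetical codes for the non-Latin-script languages listed above.
NON_LATIN_LANGUAGES = {"ko", "ja", "ar", "hi", "th", "zh"}


def latin_share(text: str) -> float:
    """Fraction of alphabetic characters whose Unicode name marks them as Latin."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0
    latin = sum(1 for ch in letters if "LATIN" in unicodedata.name(ch, ""))
    return latin / len(letters)


def looks_romanized(text: str, language: str, threshold: float = 0.5) -> bool:
    """Flag lyrics for a non-Latin-script language that are mostly Latin letters."""
    return language in NON_LATIN_LANGUAGES and latin_share(text) > threshold
```

A threshold rather than a hard "no Latin letters" rule leaves room for songs that mix scripts, such as an English hook inside Korean or Japanese verses.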

Question 6

Why forced alignment instead of speech-to-text?

Accepted Answer

Music vocals break speech-to-text. Mumbled delivery, ad-libs, harmonies, autotune, and non-lexical sounds all degrade transcription accuracy, to the point where the transcript no longer matches what was actually sung.

DeliverCC takes a different approach. You provide the correct, artist-approved lyrics, and the system aligns them to the audio rather than guessing what was sung. The captions say exactly what the lyric sheet says, with word-level timing accuracy that holds even on the hardest vocal performances.
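The core idea of forced alignment can be shown with a small dynamic program. This is a minimal sketch, not DeliverCC's implementation. It assumes an acoustic model has already produced a log-score for every (audio frame, lyric word) pair. Production aligners typically work on phonemes or characters with a trained model; the score matrix and function name here are hypothetical. The text is fixed: the program only decides where each known word starts and ends, never which words were sung.

```python
import math


def force_align(scores: list[list[float]]) -> list[tuple[int, int]]:
    """Monotonic forced alignment by dynamic programming (Viterbi).

    scores[t][n] is a log-score that audio frame t belongs to lyric word n.
    Words stay in lyric-sheet order, every word gets at least one frame, and
    the result is a (first_frame, last_frame) span for each word.
    """
    num_frames, num_words = len(scores), len(scores[0])
    if num_frames < num_words:
        raise ValueError("need at least one frame per word")
    neg = -math.inf
    best = [[neg] * num_words for _ in range(num_frames)]
    came_from = [[0] * num_words for _ in range(num_frames)]
    best[0][0] = scores[0][0]  # the alignment must start on the first word
    for t in range(1, num_frames):
        for n in range(num_words):
            stay = best[t - 1][n]                          # word n continues
            advance = best[t - 1][n - 1] if n > 0 else neg  # word n begins
            if stay >= advance:
                best[t][n], came_from[t][n] = stay + scores[t][n], n
            else:
                best[t][n], came_from[t][n] = advance + scores[t][n], n - 1
    # Backtrack from the last word on the last frame.
    path = [num_words - 1]
    for t in range(num_frames - 1, 0, -1):
        path.append(came_from[t][path[-1]])
    path.reverse()  # path[t] is the word index assigned to frame t
    spans = []
    for n in range(num_words):
        frames = [t for t, w in enumerate(path) if w == n]
        spans.append((frames[0], frames[-1]))
    return spans
```

Multiplying each span by the model's frame duration turns frame indices into the start and end times that end up in a caption file.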

Question 7

Do I provide the lyrics, or does DeliverCC transcribe them?

Accepted Answer

You provide the lyrics. DeliverCC is built around the lyric sheet being the source of truth, not a transcription. This matches the workflow most labels already use: captions go out matched to the official lyrics, not to whatever an AI thinks it heard in the recording. DeliverCC handles the timing; you control what the words say.

Question 8

How long does a generation take?

Accepted Answer

Typical generation takes 30 to 60 seconds from clicking Generate to captions appearing. The first request on a fresh worker can take around 90 seconds while the infrastructure spins up; subsequent requests on warm workers are consistently faster.

Question 9

Can I edit the alignment manually after generation?

Accepted Answer

Yes. Every generation lands in the timeline editor with a waveform view, draggable block edges, per-block text editing, and full undo and redo. Most songs need zero edits. When edits are needed (usually for ad-libs or intro instrumentals), the fix takes seconds. Edits are baked into the exported caption file in whatever format you select.

Question 10

How does DeliverCC handle ad-libs, mumbled vocals, and producer tags?

Accepted Answer

Forced alignment handles ad-libs, producer tags, and mumbled vocals better than transcription tools. DeliverCC aligns to the lyrics you provide: if your lyric sheet includes the ad-lib, it gets timed alongside the vocal automatically. If your lyric sheet skips it (which is normal for filler "yeah" and "mmm" sounds), the surrounding words still align correctly. If you want to add or remove an ad-lib after generation, the timeline editor lets you edit the text of any block and adjust its timing manually.
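One common way aligners tolerate sounds that are not on the lyric sheet is to put an optional "filler" state between every pair of words, the same idea as the blank state in CTC-style alignment. The hypothetical sketch below shows that mechanism. The fixed `gap_score`, the per-frame score matrix, and the function name are assumptions, and this is not a description of DeliverCC's internals. An unlisted "yeah" lands in a gap instead of being stretched onto the neighbouring words.

```python
import math


def align_with_gaps(scores: list[list[float]], gap_score: float = -2.0) -> list[tuple[int, int]]:
    """Viterbi alignment where frames may fall between lyric words.

    States alternate gap, word0, gap, word1, ..., wordN-1, gap. Even states are
    gaps that absorb unlisted audio at a fixed log-score; odd state 2n+1 is
    word n. Returns a (first_frame, last_frame) span for each word.
    """
    num_frames, num_words = len(scores), len(scores[0])
    if num_frames < num_words:
        raise ValueError("need at least one frame per word")
    num_states = 2 * num_words + 1

    def emit(t: int, s: int) -> float:
        return gap_score if s % 2 == 0 else scores[t][s // 2]

    neg = -math.inf
    best = [[neg] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]
    best[0][0] = emit(0, 0)  # start in the leading gap...
    best[0][1] = emit(0, 1)  # ...or directly on the first word
    for t in range(1, num_frames):
        for s in range(num_states):
            candidates = [p for p in (s, s - 1) if p >= 0]
            if s % 2 == 1 and s >= 3:
                candidates.append(s - 2)  # word to word, skipping the gap
            prev = max(candidates, key=lambda p: best[t - 1][p])
            best[t][s] = best[t - 1][prev] + emit(t, s)
            back[t][s] = prev
    # End on the last word or the trailing gap, whichever scores higher.
    s = max(num_states - 2, num_states - 1, key=lambda q: best[-1][q])
    path = [s]
    for t in range(num_frames - 1, 0, -1):
        s = back[t][s]
        path.append(s)
    path.reverse()
    spans: dict[int, tuple[int, int]] = {}
    for t, s in enumerate(path):
        if s % 2 == 1:
            first, _ = spans.get(s // 2, (t, t))
            spans[s // 2] = (first, t)
    return [spans[n] for n in range(num_words)]
```

Because gap states can be skipped but word states cannot, every word on the sheet still receives time, while audio that matches nothing on the sheet is simply left uncaptioned.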

Question 11

What can I upload, and is there a size limit?

Accepted Answer

DeliverCC accepts standard audio formats (MP3, WAV, FLAC, AAC, M4A, OGG) and video formats (MP4, MOV, M4V, WebM, AVI, MKV). Uploads are capped at 500 MB and 15 minutes in duration. DeliverCC automatically extracts the audio from video uploads. For music video editors: raw video exports can run to many gigabytes, well over the 500 MB cap, so exporting audio-only from your editor is the faster path. A 5-minute MP3 is typically under 10 MB (about 9.6 MB at 256 kbps).
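The size guidance follows from simple arithmetic: a constant-bitrate file takes about bitrate × duration ÷ 8 bytes. This hypothetical pre-flight check applies that formula to the stated caps. It assumes "MB" means 10^6 bytes and ignores container overhead such as WAV headers and MP3 tags. The helper names are illustrative only.

```python
MAX_BYTES = 500 * 1_000_000  # assumption: "500 MB" read as decimal megabytes
MAX_SECONDS = 15 * 60        # 15-minute duration cap


def estimated_bytes(bits_per_second: int, seconds: float) -> float:
    """Constant-bitrate size estimate: bits per second x seconds / 8 bits per byte."""
    return bits_per_second * seconds / 8


def pcm_bits_per_second(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio, as stored in a WAV file."""
    return sample_rate_hz * bit_depth * channels


def fits_upload_limits(size_bytes: float, seconds: float) -> bool:
    """True if a file is within both the size cap and the duration cap."""
    return size_bytes <= MAX_BYTES and seconds <= MAX_SECONDS
```

Worked through, a 5-minute MP3 at 256 kbps is 9.6 MB, and a 15-minute 44.1 kHz/16-bit stereo WAV is about 158.8 MB. A 15-minute 96 kHz/24-bit stereo WAV comes to 518.4 MB, so high-resolution WAV can exceed the size cap before it reaches the duration cap.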

Question 12

What happens to my audio after I generate captions?

Accepted Answer

Audio files are automatically deleted from DeliverCC's storage approximately 14 days after upload, a window that covers the project review and revision phase.

Generated caption files stay in your account until you delete them. Neither your audio nor your lyrics are used to train any model. The full retention policy is in the privacy policy.

Question 13

How does the credit system work?

Accepted Answer

One credit equals one caption generation. You get all four export formats with that single credit, generated from the same alignment data.

Monthly plans grant credits at the start of each billing period and reset each month: Creator gets 5, Studio gets 12, Label gets 30. Pay-as-you-go credits are bought one at a time and never expire. If you run out mid-month, you can buy a Pay-as-you-go credit or upgrade your plan. There are no overage charges and no per-format fees.

Format	What it is	Where to use it
SRT	The universal subtitle format. Plain text, simple timecodes	YouTube, Vimeo, Facebook, Instagram, TikTok, most video editors
VTT	Web video standard. WebVTT format	HTML5 video players, web embeds
SCC	Scenarist Closed Captions. CEA-608 broadcast standard	US broadcast TV (CBS, NBC, ABC, Fox)
TTML	Timed Text Markup Language. Apple Music synced-lyrics dialect (line-level)	Apple Music synced song lyrics. The file that labels send through their distributor to power karaoke-style highlighting in the Apple Music app
