Why it's the default (and where it isn't)
On TikTok, sound-off is the working assumption: the feed context favors silence, the user is multitasking, and the ad has 3 seconds to land without audio. On Instagram Reels, sound-on is more common than on TikTok (~40%) because Stories audio carries over and the user is more likely to be on home WiFi than on the bus. On YouTube Shorts, sound-on is the dominant mode (~75%) because viewers arrive from a sound-on context (long-form YouTube). But the silent 25% on Shorts still includes most mobile-feed-scroll moments, so on-screen text remains the floor.
The Ad Bench weighs sound-off comprehension into native feel on the same 0–100 scale across platforms. The CALIBRATION of what counts as "strong" shifts: a TikTok ad with no captions caps at 50; a Shorts ad with no captions caps closer to 65 (because the higher sound-on rate gives voiceover more lift). On-screen text always helps; on TikTok it's existential.
The three failure modes (universal)
- No captions, voiceover-only. Most common failure mode across all three platforms. Script is great, audio is great, but the muted viewer sees a person mouthing words. They scroll before the hook lands.
- Captions exist but lag. Auto-captions that come in 1–2 seconds late blow the hook window. The viewer needs the words landing in real time from the first frame to read them before scroll instinct kicks in.
- Captions exist but compete. Caption text in a small grey box at the bottom of the frame, fighting with the platform UI. The eye doesn't find it. Compare with sticky text in a high-contrast color, mid-frame, treated as a visual element: that reads instantly.
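The lag failure is the easiest of the three to screen for before an ad ships. A minimal sketch, assuming caption cues are already available as start/end/text records (for example, parsed from a caption export); the `Cue` type, the function names, and the 0.25-second tolerance are illustrative assumptions, not part of The Ad Bench rubric:

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Cue:
    """One caption cue. Times are seconds from the start of the video."""
    start: float
    end: float
    text: str


def first_caption_delay(cues: Sequence[Cue]) -> Optional[float]:
    """Return the start time of the earliest non-empty cue, or None if there is none."""
    starts = [cue.start for cue in cues if cue.text.strip()]
    return min(starts) if starts else None


def hook_is_captioned(cues: Sequence[Cue], max_delay: float = 0.25) -> bool:
    """True if some caption text appears within max_delay seconds of the first frame.

    max_delay is an assumed tolerance; the article only says that a
    1-2 second lag is too late.
    """
    delay = first_caption_delay(cues)
    return delay is not None and delay <= max_delay


on_time = [Cue(0.0, 1.2, "Stop scrolling"), Cue(1.2, 2.8, "This fixes it")]
lagging = [Cue(1.5, 3.0, "Stop scrolling")]

print(hook_is_captioned(on_time))
print(hook_is_captioned(lagging))
```

A check like this only catches timing. Whether the caption competes with the platform UI still has to be judged by eye.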
What strong sound-off design looks like
The test: turn the ad on, mute it, hide the audio waveform, and ask someone who's never seen it to tell you what the product is and what the offer is in 5 seconds. If they can, you're passing. If they can't, the ad has a sound-off comprehension problem regardless of how good the script is.
Concrete tactics that score well in the production-analysis section:
- Sticky on-screen text on every frame. Hook line on frame 1. Product name visible by frame 3. CTA text on the closing frame even when it's also voiced over.
- High-contrast caption styling. White or yellow on a black or dark-magenta background, never grey-on-busy. Each platform's native caption picker is fine; just don't leave it on the default settings.
- Visual proof, not described proof. Don't say "before and after" in voiceover; show the before and after, side-by-side. Sound-off comprehension suffers when the audio describes what the visuals should have shown.
- Treat the soundtrack as a bonus. Trending audio, voiceover, jingles all add lift on top of a sound-off-coherent ad. They don't compensate for one that doesn't read muted. The ad has to work without them.
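"High-contrast" in the list above can be put on a number. The sketch below uses the WCAG 2.x contrast-ratio formula, which was written for web text rather than video captions, so applying it to ad captions is an assumption. The function names are hypothetical, and a real caption sits over moving footage, so the background color here stands in for the caption box or the dominant color behind the text:

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _linearize(channel: int) -> float:
    """Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(rgb: RGB) -> float:
    """Relative luminance of an sRGB color, from 0.0 (black) to 1.0 (white)."""
    r, g, b = (_linearize(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def contrast_ratio(fg: RGB, bg: RGB) -> float:
    """WCAG contrast ratio between two colors, from 1.0 to about 21.0."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


WHITE, YELLOW, BLACK = (255, 255, 255), (255, 255, 0), (0, 0, 0)
GREY, BUSY_GREY = (128, 128, 128), (110, 110, 110)

for name, fg, bg in [
    ("white on black", WHITE, BLACK),
    ("yellow on black", YELLOW, BLACK),
    ("grey on grey", GREY, BUSY_GREY),
]:
    print(f"{name}: {contrast_ratio(fg, bg):.1f}:1")
```

White on black comes out at roughly 21:1, the maximum the formula allows, while grey on a similar grey lands close to 1:1. WCAG's 4.5:1 minimum for normal web text is one possible pass mark, though nothing in the rubric described here specifies a threshold.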
What about voiceover, then?
Voiceover still matters — once a viewer with sound on engages, voice quality drives hold rate and CTA conversion. On Shorts specifically, where sound-on is the dominant context, voiceover quality carries closer-to-equal weight with on-screen text. The point isn't "don't voice your ads." It's "design as if voiceover doesn't exist, then layer voiceover on top."
The Ad Bench rubric weighs voiceover style (creator / professional / AI synthetic) into native feel. A creator-voiced UGC read scores higher native feel than a studio-recorded VO of the same script on every platform. Synthetic AI voices score lowest — the audience clocks them and the trust drops.
Platform-specific calibration
- TikTok (~85% mute): On-screen text is mandatory. Voiceover-only caps the rubric category at 50. The 3-second hook window is decided silently.
- Reels (~60% mute): On-screen text is still required for the silent majority. Music and licensed-library audio carry more weight than on TikTok because the Reels algorithm rewards trending audio more directly. Voiceover-only caps the category at 55.
- Shorts (~25% mute): On-screen text is still the floor for the silent quarter, but voiceover quality matters more here than on TikTok or Reels. A great VO with weak captions can still score above 65 if the audio carries the watch-completion signal. Repeat views, the #1 Shorts algorithm signal, depend on the loop feeling smooth, which sound-off design enables.
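The calibration above amounts to a per-platform ceiling on the category score when an ad has no on-screen text. A minimal sketch of that idea, not The Ad Bench's actual scoring code: the names `PlatformCalibration`, `CALIBRATION`, and `capped_score` are hypothetical, the mute rates are the approximate figures quoted in the list, and the Shorts ceiling of 65 is taken from the earlier "caps closer to 65" wording, so treat it as approximate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlatformCalibration:
    mute_rate: float         # approximate share of views with sound off
    voiceover_only_cap: int  # ceiling on the 0-100 category score without on-screen text


# Figures as quoted in the article; the Shorts cap is approximate.
CALIBRATION = {
    "tiktok": PlatformCalibration(mute_rate=0.85, voiceover_only_cap=50),
    "reels": PlatformCalibration(mute_rate=0.60, voiceover_only_cap=55),
    "shorts": PlatformCalibration(mute_rate=0.25, voiceover_only_cap=65),
}


def capped_score(platform: str, raw_score: float, has_on_screen_text: bool) -> float:
    """Clamp a raw 0-100 category score, applying the platform cap for voiceover-only ads."""
    calibration = CALIBRATION[platform]
    score = max(0.0, min(100.0, raw_score))
    if not has_on_screen_text:
        score = min(score, calibration.voiceover_only_cap)
    return score


for platform in CALIBRATION:
    print(platform, capped_score(platform, raw_score=90, has_on_screen_text=False))
```

The same raw score of 90 is held to a different ceiling on each platform, which is the sense in which the scale stays fixed while the calibration shifts. An ad with captions, even weak ones, is not subject to the cap in this sketch.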