Did they train on my voice? How to check and opt out

You usually cannot prove a model trained on your voice, but you are not without moves. There are four concrete steps: search the surfaces you can, read any hit for exactly what it is worth, opt out so compliant trainers skip you from now on, and keep the clean recordings that prove the work is yours even when training use cannot be proven. None of it is a courtroom result, but together it is still the strongest position a voice actor or speaker can hold today.

Step 1: search what is actually searchable

Start with what a public index can see. Spawning’s “Have I Been Trained” searches LAION-5B for images, so run your headshots, promotional stills, and any video frames of you through it, and log every hit with the file, the date, and a screenshot. A match is evidence that a copy of that image sat in a public training set. For your actual voice recordings there is no equivalent public search, so this step covers your visual surfaces and stops at the edge of audio. That boundary is not a failure of effort on your part; the tool for sound simply does not exist yet, and for a voice professional whose face and voice travel together in the same press kit, a photo hit is often the only searchable thread you have. A dated log is worth far more later than a recollection, so start the record now even if you take no other step.
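
The dated log described above can be kept as a plain CSV file. What follows is a minimal sketch, not a prescribed format: the column names, the `log_hit` helper, and the choice to store a SHA-256 of the searched image are all assumptions made for illustration. The hash ties each row to the exact file you searched, so a later edit or re-export cannot be confused with it.

```python
import csv
import hashlib
from datetime import datetime, timezone
from pathlib import Path

# Column names are illustrative, not a standard; adapt them to your own records.
LOG_FIELDS = ["logged_at_utc", "source_file", "sha256", "dataset", "hit_url", "screenshot"]


def sha256_of(path: Path) -> str:
    """Hash the file in chunks so large stills or video frames do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def log_hit(log_path, source_file, dataset, hit_url, screenshot):
    """Append one dated row describing a dataset hit; create the log if needed."""
    log_path = Path(log_path)
    row = {
        "logged_at_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "source_file": str(source_file),
        "sha256": sha256_of(Path(source_file)),
        "dataset": dataset,
        "hit_url": hit_url,
        "screenshot": str(screenshot),
    }
    # Write the header only once, when the log is new or empty.
    write_header = not log_path.exists() or log_path.stat().st_size == 0
    with log_path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=LOG_FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return row
```

A call such as `log_hit("hits.csv", "headshot_2021.jpg", "LAION-5B", hit_url, "hit_screenshot.png")` adds one row per match; the file names here are placeholders. The timestamp comes from your own machine, so treat it as a personal record rather than an independently witnessed date.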

Step 2: read a hit for what it is worth

A dataset hit is suggestive, not conclusive. It shows presence in one public set, not that a specific model trained on your file, and it says nothing about closed commercial systems. The statistical alternative does not close that gap either. Membership inference was introduced by Shokri, Stronati, Song and Shmatikov at IEEE S&P 2017 as the task of, “given a data record and black-box access to a model, determine if the record was in the model’s training dataset.” It exists for speech too: Tsaprazlis, Lertpetchpun, Feng, Karimireddy and Narayanan show with VoxGuard at ICASSP 2026 that such attacks can “recover gender and accent with near-perfect accuracy even after anonymization.” But a position paper by Zhang, Das, Kamath and Tramèr at IEEE SaTML 2025 is titled, plainly, “Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data,” because those tests lack a provably low false-positive rate on production models. So if anyone sells you an upload-your-voice proof service, ask for its false-positive rate before you believe the result, and treat any number with no stated error rate as a lead rather than a finding.
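
Why the false-positive rate matters so much can be shown with Bayes' rule. The sketch below is a worked illustration, not a model of any real service: the function name and all three input numbers are hypothetical, chosen only to show how the arithmetic behaves.

```python
def probability_hit_is_real(true_positive_rate, false_positive_rate, prior):
    """Bayes' rule: P(recording was in training | the test says yes)."""
    for name, value in (
        ("true_positive_rate", true_positive_rate),
        ("false_positive_rate", false_positive_rate),
        ("prior", prior),
    ):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    flagged_members = true_positive_rate * prior
    flagged_non_members = false_positive_rate * (1.0 - prior)
    total_flagged = flagged_members + flagged_non_members
    if total_flagged == 0.0:
        raise ValueError("the test never says yes under these rates")
    return flagged_members / total_flagged


# Hypothetical inputs: the test catches 90% of true members, wrongly flags 5%
# of non-members, and 1% of the voices submitted were really in training.
example = probability_hit_is_real(0.90, 0.05, 0.01)
print(f"{example:.1%}")  # prints 15.4%
```

With those made-up inputs, a "yes" is right only about 15% of the time. Dropping the false-positive rate to 0.1% while holding the other two inputs fixed lifts that to about 90%, which is why a result with no stated error rate cannot be read either way.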

Step 3: opt out going forward

Opting out does not undo past training, but it is the one lever that changes what happens next. It has two parts. First, register your works with an opt-out list such as Spawning’s Do-Not-Train registry, which participating trainers check before ingesting data. Second, set your own crawl rules with a robots.txt and an ai.txt that name the training crawlers, including GPTBot, Google-Extended, and CCBot, because a compliant bot reads robots.txt before it fetches and skips the disallowed paths. The mechanics of both are walked through in anti-scrape data poisoning and opting out. The limit is built into the design: the Nightshade authors, Shan, Ding, Passananti, Wu, Zheng and Zhao at IEEE S&P 2024, built their poison specifically for the scrapers that “ignore opt-out/do-not-crawl directives,” so opt-out only binds the crawlers that were going to behave anyway. It is necessary and worth doing, but it is not sufficient on its own.
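
For the robots.txt half, a short sketch can confirm that your rules say what you intend. It uses Python's standard `urllib.robotparser` to read a sample robots.txt the way a compliant crawler would. The sample rules, the `blocked_crawlers` helper, and the example.com URL are assumptions for illustration, and ai.txt is left out because its format is not covered here.

```python
from urllib.robotparser import RobotFileParser

# Crawler tokens named in the text above.
TRAINING_CRAWLERS = ("GPTBot", "Google-Extended", "CCBot")

# Sample rules (an assumption, not a recommended policy): block the named
# training crawlers everywhere, leave all other agents unrestricted.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: *
Allow: /
"""


def blocked_crawlers(robots_text, url, agents=TRAINING_CRAWLERS):
    """Return {agent: True if robots_text tells that agent not to fetch url}."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return {agent: not parser.can_fetch(agent, url) for agent in agents}


# Placeholder URL standing in for a demo reel on your own site.
status = blocked_crawlers(ROBOTS_TXT, "https://example.com/demos/reel.mp3")
for agent, blocked in status.items():
    print(f"{agent}: {'blocked' if blocked else 'ALLOWED'}")
```

This checks only what the file asks for. It says nothing about whether a given crawler honors the request, which is the limit the Nightshade authors point to.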

Step 4: protect future recordings and keep the master private

Because training use is so hard to prove, protect what you can and prove what you can. Cloak new voice uploads with a tool like AntiFake from Yu, Zhai and Zhang at ACM CCS 2023, which reports “over 95% protection rate even to unknown black-box models,” and keep your clean recordings off the public web, because protection can be stripped. Fan, Chen, Liu, Zhang and Yu show with De-AntiFake at ICML 2025 that “existing purification methods can neutralize a considerable portion of the protective perturbations”: after purification, a clone reaches a verification score of 0.830 on VoiceGuard-protected speech and 0.762 on AntiFake-protected speech, on a scale where higher means a better clone. So a cloak buys deterrence, not a lock. The uncloaked master is both the file a cloner most wants and your strongest evidence of authorship.
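
One way to keep private masters useful as evidence is a manifest of file hashes, which lets you later show that a file you hold is byte-for-byte the one you recorded without publishing it. This is a minimal sketch under stated assumptions: the `build_manifest` and `save_manifest` helpers, the JSON layout, and the `.wav`/`.flac` patterns are illustrative, not an established format.

```python
import hashlib
import json
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Hash in chunks, since master recordings can be large."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(master_dir, patterns=("*.wav", "*.flac")):
    """List every matching recording under master_dir with its size and hash."""
    master_dir = Path(master_dir)
    entries = []
    for pattern in patterns:
        for path in sorted(master_dir.rglob(pattern)):
            entries.append({
                "file": path.relative_to(master_dir).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": file_sha256(path),
            })
    return {
        # Local clock time: a personal record, not an independent timestamp.
        "created_utc": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "files": entries,
    }


def save_manifest(manifest, out_path):
    """Write the manifest as readable JSON."""
    Path(out_path).write_text(json.dumps(manifest, indent=2), encoding="utf-8")
```

A hash shows that two files are identical, not when either was made. The manifest therefore carries more weight if a copy is lodged somewhere that records the date independently of you; which channel to use is left open here.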

Step	What it gives you	What it cannot give you
Search surfaces	Evidence an image of you was scraped	Anything about your voice recordings, or closed models
Opt out	Fewer future ingests by compliant trainers	Removal from models already trained
Keep provenance	Provable ownership	Proof of training use

The realistic plan is layered and modest. Search the image surfaces you have, read any hit as a lead rather than a verdict, opt out to shape what future models ingest, and bank the clean recordings that survive every argument about training. Work the four steps in order and keep the paper trail, because a dated record of what was scraped and when is the part that holds up long after a specific test result stops meaning much. What you cannot do today is prove, cleanly and publicly, that a given model learned from your voice, so plan around well-managed suspicion rather than proof. For whether you can tell at all, see was my voice used to train AI; for the active-defense pairing, see anti-scrape poisoning and opt-out for your voice.

Sources

Shokri, Stronati, Song, Shmatikov (2017). Membership Inference Attacks against Machine Learning Models. IEEE S&P 2017.
Tsaprazlis, Lertpetchpun, Feng, Karimireddy, Narayanan (2026). VoxGuard: Evaluating User and Attribute Privacy in Speech via Membership Inference Attacks. ICASSP 2026.
Zhang, Das, Kamath, Tramèr (2025). Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data. IEEE SaTML 2025.
Shan, Ding, Passananti, Wu, Zheng, Zhao (2024). Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models. IEEE S&P 2024.
Yu, Zhai, Zhang (2023). AntiFake: Using Adversarial Audio to Prevent Unauthorized Speech Synthesis. ACM CCS 2023.
Fan, Chen, Liu, Zhang, Yu (2025). De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks. ICML 2025.