Did they train on my music? How to check and opt out

You usually cannot prove a model trained on your songs, but you are not without moves. There are four concrete steps: search the surfaces you can, read any hit for exactly what it is worth, opt out so compliant trainers skip you from now on, and protect and document what you release next. None of it amounts to courtroom proof, but together the four steps are still the strongest position a musician can hold today.

Step 1: search the surfaces you can

Start with what is actually searchable, which is your images rather than your audio. Spawning’s “Have I Been Trained” indexes LAION-5B, a public image-text dataset, so it can surface your cover art, promotional stills and music-video frames, but it does not index sound. Run every visual asset tied to a release through it and log each hit with the file name, the date, and a screenshot. A match does not show that a model learned your track, but it does establish that your material was scraped into a public set, which is worth knowing on its own. For the audio itself there is no consumer dataset-search tool, so this step stops at the edge of sound, and that gap is a limit of the tooling, not of your diligence.
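The logging habit described above can be kept in a plain CSV file. The following is a minimal sketch, not a feature of any search tool: the `Hit` record, its field names, and the `append_hit` helper are all hypothetical names chosen for illustration.

```python
import csv
from dataclasses import dataclass, asdict, fields
from pathlib import Path


@dataclass
class Hit:
    """One dataset-search hit (hypothetical record layout)."""
    asset_file: str   # the image you searched with, e.g. "cover.png"
    found_on: str     # ISO date you saw the hit, e.g. "2025-01-02"
    screenshot: str   # path to the screenshot you saved
    note: str = ""    # anything else worth remembering


def append_hit(log_path: Path, hit: Hit) -> None:
    """Append one hit to a CSV log, writing the header row on first use."""
    is_new = not log_path.exists()
    with log_path.open("a", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=[f.name for f in fields(Hit)])
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(hit))
```

Calling `append_hit(Path("hits.csv"), Hit("cover.png", "2025-01-02", "shots/cover.png"))` after each search keeps one dated row per hit, which is easier to hand to a lawyer or a label than a folder of loose screenshots.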

It helps to treat a single release as many separate assets. The master, the stems, the cover art, the press photos, and the video frames are all distinct files, and public search tools mostly see the visual ones. Check every image surface you have, because those are the assets an index can actually find, and accept that the recording itself is the part you cannot yet audit this way.
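One way to see a release as many assets is to sort the release folder by file type. This is a rough sketch under stated assumptions: the extension lists are illustrative rather than exhaustive, and `classify` and `inventory` are hypothetical helper names.

```python
from pathlib import Path

# Assumed extension lists; adjust them to the formats you actually deliver.
IMAGE_EXT = {".jpg", ".jpeg", ".png", ".webp", ".gif", ".tif", ".tiff"}
VIDEO_EXT = {".mp4", ".mov", ".mkv", ".webm"}
AUDIO_EXT = {".wav", ".flac", ".mp3", ".aif", ".aiff", ".m4a", ".ogg"}


def classify(path: Path) -> str:
    """Return 'image', 'video', 'audio' or 'other' from the file extension."""
    ext = path.suffix.lower()
    if ext in IMAGE_EXT:
        return "image"
    if ext in VIDEO_EXT:
        return "video"
    if ext in AUDIO_EXT:
        return "audio"
    return "other"


def inventory(release_dir: Path) -> dict[str, list[Path]]:
    """Group every file under release_dir by asset type."""
    groups: dict[str, list[Path]] = {
        "image": [], "video": [], "audio": [], "other": []
    }
    for path in sorted(release_dir.rglob("*")):
        if path.is_file():
            groups[classify(path)].append(path)
    return groups
```

The "image" group is your search list. Files in the "video" group need stills exported before they can be searched, and the "audio" group is the part that no public index covers yet.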

Step 2: read a hit for what it is worth

A dataset hit is a lead, not a verdict. Membership inference, the general technique behind any claim that your data was used, was introduced by Shokri, Stronati, Song and Shmatikov (IEEE S&P 2017), who framed the problem as, “given a data record and black-box access to a model, determine if the record was in the model’s training dataset.” The catch is that on real systems this cannot be made reliable. A 2025 position paper by Zhang, Das, Kamath and Tramèr (IEEE SaTML 2025) is titled, flatly, “Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data,” because such tests lack a provably low false-positive rate on production models. Mukherjee, Xu, Trivedi, Patowary and Lavista Ferres (PoPETs 2021) showed with privGAN that generative adversarial networks, the family behind several music generators, do leak a membership signal, but that remains a lab demonstration rather than a proof tool you can point at a commercial system. So if a service offers to take an upload of your song and prove training use, ask for its false-positive rate, the model it tested, and the access it had before you trust the answer. There is one narrow exception worth knowing: data extraction. Carlini, Tramèr, Wallace, Jagielski, Herbert-Voss, Lee, Roberts, Brown, Song, Erlingsson, Oprea and Raffel (USENIX Security 2021) demonstrated it by pulling memorized text out of a language model; for music, the equivalent would be a system reproducing your specific recording rather than merely generating in your style. That is rare evidence, not a routine audit path.
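To see why the false-positive rate is the first thing to ask for, it helps to run the arithmetic. The sketch below applies Bayes' rule to a hypothetical detector; the rates and the prior in the usage note are invented for illustration and do not describe any real service or any result from the papers cited here.

```python
def posterior_member(tpr: float, fpr: float, prior: float) -> float:
    """P(track was in training | detector says 'member'), by Bayes' rule.

    tpr   -- true-positive rate: P(flag | track was in training)
    fpr   -- false-positive rate: P(flag | track was not in training)
    prior -- your belief, before testing, that the track was in training
    """
    for name, value in (("tpr", tpr), ("fpr", fpr), ("prior", prior)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    flagged = tpr * prior + fpr * (1.0 - prior)
    if flagged == 0.0:
        raise ValueError("detector never flags anything at these rates")
    return tpr * prior / flagged
```

With made-up numbers, `posterior_member(0.9, 0.05, 0.01)` comes out near 0.15: a detector that sounds accurate still leaves a flagged track far more likely to be a false alarm than a real hit. Drop the false-positive rate to 0.001 and the same call returns about 0.90. The answer swings almost entirely on a number that, per the position paper above, cannot currently be established for production models.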

Step 3: opt out going forward

Opting out changes nothing about past training, but it is the one lever that shapes what future models ingest. It has two parts. First, register your catalogue with a Do-Not-Train list such as Spawning’s registry, which participating trainers query before they ingest data. Second, set crawl rules on any site you control with a robots.txt and an ai.txt that name the training crawlers, including GPTBot, Google-Extended, and CCBot (Common Crawl). A compliant bot reads robots.txt before it fetches and skips the disallowed paths. The limit is built in: Shan, Ding, Passananti, Wu, Zheng and Zhao (IEEE S&P 2024) built Nightshade precisely for the scrapers that “ignore opt-out/do-not-crawl directives,” so opt-out only binds the crawlers that were going to behave. The mechanics of both steps are walked through in anti-scrape data poisoning and opting out.
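The robots.txt half can be drafted and sanity-checked before you upload it. This sketch assumes you want to block the three crawlers named above from the whole site; `build_robots_txt` and `allowed` are hypothetical helpers, and the check uses Python's standard `urllib.robotparser`, which only models how a well-behaved crawler would read the file. The ai.txt format is not shown here.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the paragraph above; extend the list as needed.
AI_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot"]


def build_robots_txt(agents: list[str], disallow: str = "/") -> str:
    """Return robots.txt text with one Disallow block per crawler."""
    blocks = [f"User-agent: {agent}\nDisallow: {disallow}" for agent in agents]
    return "\n\n".join(blocks) + "\n"


def allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Would a compliant crawler with this name be allowed to fetch url?"""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Save the output of `build_robots_txt(AI_CRAWLERS)` as `robots.txt` at the root of your site. Checking it with `allowed(text, "GPTBot", "https://example.com/music/")` should give `False`, while an agent you did not name stays allowed. None of this stops a scraper that never reads the file.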

Step 4: protect and document what you release next

Because training use is so hard to prove after the fact, protect the next release and keep the evidence of ownership. For instrumental music, Meerza, Sun and Liu (IEEE S&P 2025) built HarmonyCloak to make tracks unlearnable by injecting error-minimizing noise, dropping a generator’s target-genre accuracy from 89.2% to 32.3% in their own tests. Treat that as raising the cost of copying, not a guarantee, because cloaks are removable: LightShed (Foerster, Behrouzi, Rieger, Jadliwala, Sadeghi, USENIX Security 2025) detects poisoned images with a 99.98% true-positive rate before stripping the perturbation, and De-AntiFake (Fan, Chen, Liu, Zhang, Yu, ICML 2025) does the parallel job for voice. So cloak new work, but keep the clean master and stems off the public web, since the uncloaked file is both what a scraper most wants and your strongest proof of authorship.
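One low-effort way to document the clean master and stems without publishing them is to record a cryptographic fingerprint of each file. The following is a sketch only: `sha256_file` and `build_manifest` are hypothetical helpers and the manifest layout is an assumption, not a standard.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large masters do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(files: list[Path]) -> dict:
    """Describe each file by name, size and SHA-256 digest."""
    return {
        # Self-reported clock time; it is a note to yourself, not proof of date.
        "created_utc": datetime.now(timezone.utc).isoformat(),
        "files": [
            {"name": p.name, "bytes": p.stat().st_size, "sha256": sha256_file(p)}
            for p in files
        ],
    }
```

Dump the result with `json.dumps(build_manifest(paths), indent=2)` and store it alongside your release paperwork. The digest lets you show later that the file you hold is byte-for-byte the one you fingerprinted, without ever putting the master online. How much weight the date carries depends on where and with whom the manifest was lodged.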

Move	What it gives you	What it cannot give you
Search image surfaces	Evidence a visual asset was in a public set	Anything about your audio or closed models
Opt out	Fewer future ingests by compliant trainers	Removal from models already trained
Cloak and keep the master	Higher copying cost and provable ownership	Proof a model trained on your songs

The realistic plan is layered and modest. Search the image surfaces, read any hit as a lead rather than a result, opt out to shape what future models take, cloak new releases, and bank the clean master that proves the work is yours. What you cannot do today is show, cleanly and publicly, that a given model learned from your songs, so plan around managing suspicion well rather than around proving it. For whether you can tell at all, see was my music used to train AI; for the poison-and-opt-out pairing, see anti-scrape poisoning and opt-out for your music.

Sources

Shokri, Stronati, Song, Shmatikov (2017). Membership Inference Attacks against Machine Learning Models. IEEE S&P 2017.
Zhang, Das, Kamath, Tramèr (2025). Membership Inference Attacks Cannot Prove that a Model Was Trained On Your Data. IEEE SaTML 2025.
Mukherjee, Xu, Trivedi, Patowary, Lavista Ferres (2021). privGAN: Protecting GANs from Membership Inference Attacks at Low Cost to Utility. PoPETs 2021.
Carlini, Tramèr, Wallace, Jagielski, Herbert-Voss, Lee, Roberts, Brown, Song, Erlingsson, Oprea, Raffel (2021). Extracting Training Data from Large Language Models. USENIX Security 2021.
Meerza, Sun, Liu (2025). HarmonyCloak: Making Music Unlearnable for Generative AI. IEEE S&P 2025.
Shan, Ding, Passananti, Wu, Zheng, Zhao (2024). Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models. IEEE S&P 2024.
Foerster, Behrouzi, Rieger, Jadliwala, Sadeghi (2025). LightShed: Defeating Perturbation-based Image Copyright Protections. USENIX Security 2025.
Fan, Chen, Liu, Zhang, Yu (2025). De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks. ICML 2025.