Anti-scrape poisoning and opting out for your music

By The Poisoning.ai team

Two independent levers fight a model training on your music, and each defends against a different opponent: you can cloak what a scraper takes so a generator learns the wrong thing, and you can opt out so a scraper that respects rules skips you. Cloaking does nothing against a crawler that already honors your opt-out, and opt-out does nothing against a crawler that ignores it. Real protection therefore uses both; treating either one as sufficient leaves half the threat uncovered.

Lever A: cloak what a scraper takes

Active protection embeds an imperceptible, engineered perturbation that corrupts what a model would learn from your track. For instrumental music the tool is HarmonyCloak, from Meerza, Sun and Liu (IEEE S&P 2025), which makes a recording unlearnable by injecting error-minimizing noise and, in its own tests, drops a generator’s target-genre accuracy from 89.2% to 32.3%. That result is why HarmonyCloak, rather than an image or voice tool, belongs in a music workflow: it gives musicians a concrete content-side lever, so a model that trains on the cloaked copy gets a worse learning signal. The idea is borrowed from the image world, where Shan, Cryan, Wenger, Zheng, Hanocka and Zhao (USENIX Security 2023) built Glaze to apply “style cloaks,” which they describe as “barely perceptible perturbations to images, and when used as training data, mislead generative models that try to mimic a specific artist.” The concept-poison lineage runs through Nightshade on the image side and, for the speaking voice, through AntiFake from Yu, Zhai and Zhang (ACM CCS 2023), so a music cloak is one member of a well-studied family rather than a one-off trick.

Lever B: opt out so compliant trainers skip you

Passive protection is a request that well-behaved crawlers and trainers honor, and it has two parts. The first is a robots.txt file on any site you control that names the training crawlers, including GPTBot, Google-Extended, and CCBot (Common Crawl); a compliant bot reads it before it fetches and skips the disallowed paths. The second is a Do-Not-Train registry such as Spawning’s, which participating trainers query to drop opted-out works before training begins. One is read by the crawler and the other is queried by the trainer, and neither alters the audio itself; each is a request that a compliant party honors before it collects or trains on the file. The full registry sign-up and robots.txt walkthrough is in anti-scrape data poisoning and opting out.
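As a minimal sketch of the robots.txt half, the Python below builds a file that disallows the three crawler tokens named above and then checks it with the standard library's `urllib.robotparser`, which applies the rules the way a rule-following crawler would. The helper names (`build_robots_txt`, `is_allowed`), the choice to block the whole site with `/`, and the `example.com` URL are illustrative assumptions, not part of any tool mentioned here.

```python
from urllib.robotparser import RobotFileParser

# User-agent tokens named in the text above.
TRAINING_CRAWLERS = ["GPTBot", "Google-Extended", "CCBot"]


def build_robots_txt(crawlers, disallow_path="/"):
    """Return robots.txt text that blocks each named crawler from disallow_path."""
    blocks = [f"User-agent: {name}\nDisallow: {disallow_path}\n" for name in crawlers]
    # Everyone else stays allowed: an empty Disallow permits all paths.
    blocks.append("User-agent: *\nDisallow:\n")
    return "\n".join(blocks)


def is_allowed(robots_txt, user_agent, url):
    """Check a URL against the rules, as a compliant crawler would before fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ROBOTS_TXT = build_robots_txt(TRAINING_CRAWLERS)
```

Printing `ROBOTS_TXT` gives text you could save as `robots.txt` at the root of a site you control. Note that the check only models a crawler that chooses to obey the file; nothing here enforces it.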

Keep the clean master and stems private

Neither lever protects a file you never should have released in the clear. A cloak only guards the copy it is applied to, and an opt-out only speaks for data you control, so the clean master, the raw stems, and the unprocessed exports are the assets worth keeping off the public web entirely. They are exactly what a purification attack tries to recover, and the one set of copies no tool can retract once they are out. Treat the protected public file and the clean production file as different assets: publish the protected copy when you choose to publish audio, and if you need to send clean assets to collaborators, use a channel that is not indexed as a public release surface.
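One way to keep the two kinds of asset apart is a pre-publish check on the list of files headed for a public folder. The sketch below assumes a purely hypothetical naming convention in which protected copies end in `_cloaked`; the suffix, the extension list, and the function name are illustrative and do not come from HarmonyCloak or any other tool.

```python
from pathlib import PurePosixPath

# Audio formats to audit; adjust to whatever you actually export.
AUDIO_EXTENSIONS = {".wav", ".flac", ".aiff", ".mp3"}
# Hypothetical convention: protected copies are named like "song_cloaked.wav".
PROTECTED_SUFFIX = "_cloaked"


def unprotected_audio(paths):
    """Return the audio paths that do not follow the protected-copy naming convention."""
    flagged = []
    for raw in paths:
        path = PurePosixPath(raw)
        is_audio = path.suffix.lower() in AUDIO_EXTENSIONS
        if is_audio and not path.stem.endswith(PROTECTED_SUFFIX):
            flagged.append(raw)
    return flagged
```

Running it over a release manifest before upload flags clean masters and stems by name only; it cannot tell whether a file was actually cloaked, so it is a guard against accidents rather than a verification step.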

The hinge: why neither is enough alone

The two levers cover mirror-image blind spots. Shan, Ding, Passananti, Wu, Zheng and Zhao (IEEE S&P 2024) are explicit that their Nightshade poison, “a prompt-specific poisoning attack optimized for potency,” exists for “web scrapers that ignore opt-out/do-not-crawl directives,” which is exactly the opponent an opt-out can never reach. Opt-out, in turn, handles the compliant trainer that a cloak never needs to touch. If you also leave clean masters, stems, or unprocessed exports in public folders, the cloak on the official release does not protect those files.

| Lever | Bites whom | Example tools | Gap it leaves |
| --- | --- | --- | --- |
| Cloak | Scrapers that take your track anyway | HarmonyCloak | No effect on a crawler that already skips you |
| Opt-out | Scrapers that obey rules | robots.txt, Spawning registry | No effect on a crawler that ignores rules |
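The coverage argument in the table can be written out as a small set calculation. This is only a model of the reasoning, with illustrative names (`ADDRESSES`, `uncovered`), and "addresses" here means the lever is aimed at that opponent, not that it is guaranteed to stop it.

```python
# Which opponent each lever is aimed at, per the table above.
ADDRESSES = {
    "cloak": {"non_compliant_scraper"},
    "opt_out": {"compliant_trainer"},
}
OPPONENTS = {"compliant_trainer", "non_compliant_scraper"}


def uncovered(levers):
    """Return the opponents that none of the chosen levers is aimed at."""
    covered = set()
    for lever in levers:
        covered |= ADDRESSES[lever]
    return OPPONENTS - covered
```

Calling `uncovered(["cloak"])` or `uncovered(["opt_out"])` each leaves one opponent unaddressed, while `uncovered(["cloak", "opt_out"])` leaves none, which is the sense in which the two levers are complementary.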

The limits

Treat a music cloak as raising the cost of copying, not guaranteeing exclusion. There is no published music-specific break of HarmonyCloak, so do not read one in, but the parallel attacks in neighboring media are real and point the same way. Foerster, Behrouzi, Rieger, Jadliwala and Sadeghi (USENIX Security 2025) present LightShed as “a generalizable depoisoning attack that effectively identifies poisoned images and removes adversarial perturbations,” reporting a 99.98% true-positive rate for detecting Nightshade before stripping it and demonstrating the same against Glaze. Fan, Chen, Liu, Zhang and Yu (ICML 2025) show with De-AntiFake that purification can neutralize a considerable portion of a voice cloak and leave a speaker with a “false sense of security.” Since removal has been demonstrated for both images and voice, the prudent assumption is that a music cloak raises cost rather than settling the matter.

Opt-out has the opposite weakness: it is honored only voluntarily, so a non-compliant scraper simply ignores it. The two levers are complementary because their failure modes are opposite: the non-compliant scraper your opt-out can never stop is the one your cloak is built for, and the compliant trainer your opt-out already handles is the one your cloak never has to reach.

Layer both, keep your cleanest master and stems off the public web, and expect to reapply protection as attacks evolve. For whether the music tools hold up, see does music AI protection actually work; for the auditing side, did they train on my music; and for the wider picture, do AI poisoning tools actually work.

Sources

  • Meerza, Sun, Liu (2025). HarmonyCloak: Making Music Unlearnable for Generative AI. IEEE S&P 2025.
  • Shan, Cryan, Wenger, Zheng, Hanocka, Zhao (2023). GLAZE: Protecting Artists from Style Mimicry by Text-to-Image Models. USENIX Security 2023.
  • Shan, Ding, Passananti, Wu, Zheng, Zhao (2024). Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models. IEEE S&P 2024.
  • Yu, Zhai, Zhang (2023). AntiFake: Using Adversarial Audio to Prevent Unauthorized Speech Synthesis. ACM CCS 2023.
  • Foerster, Behrouzi, Rieger, Jadliwala, Sadeghi (2025). LightShed: Defeating Perturbation-based Image Copyright Protections. USENIX Security 2025.
  • Fan, Chen, Liu, Zhang, Yu (2025). De-AntiFake: Rethinking the Protective Perturbations Against Voice Cloning Attacks. ICML 2025.