Accompanying website for the paper: Francisco Messina, Francesca Ronchini, Luca Comanducci, Paolo Bestagini, and Fabio Antonacci. "Mitigating data replication in text-to-audio generative diffusion models through anti-memorization guidance." arXiv preprint arXiv:2509.14934 (2025).
Abstract
A persistent challenge in generative audio models is data replication, where the model unintentionally generates parts of its training data during inference. In this work, we address this issue in text-to-audio diffusion models by exploring the use of anti-memorization strategies. We adopt Anti-Memorization Guidance (AMG), a technique that modifies the sampling process of pre-trained diffusion models to discourage memorization. Our study explores three types of guidance within AMG, each designed to reduce replication while preserving generation quality. We use Stable Audio Open as our backbone, leveraging its fully open-source architecture and training dataset. Our comprehensive experimental analysis suggests that AMG significantly mitigates memorization in diffusion-based text-to-audio generation without compromising audio fidelity or semantic alignment.
Additional material
Below we report generations obtained with and without Anti-Memorization Guidance (AMG) for five prompts. For each prompt, we provide:
- Original audio/prompt.
- Audio generated without and with AMG (three each).
- Spectrograms of the original audio and of the ones generated with and without AMG.
- Similarity matrices computed between original audio and the audio files generated with and without AMG.
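As a rough illustration of how a similarity matrix between an original clip and a generated one could be computed, the sketch below builds frame-level log-magnitude spectrograms with NumPy and compares every pair of frames by cosine similarity. This page does not state which features or similarity measure the paper uses, so the function names (`log_spectrogram`, `similarity_matrix`), the frame and hop sizes, and the choice of cosine similarity are all assumptions made for the example.

```python
import numpy as np


def log_spectrogram(signal, frame_len=1024, hop=512):
    """Log-magnitude spectrogram, shape (num_frames, frame_len // 2 + 1).

    Frame and hop sizes are illustrative assumptions, not values from the paper.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))


def similarity_matrix(feats_a, feats_b, eps=1e-8):
    """Cosine similarity between every frame of feats_a and every frame of feats_b."""
    a = feats_a / (np.linalg.norm(feats_a, axis=1, keepdims=True) + eps)
    b = feats_b / (np.linalg.norm(feats_b, axis=1, keepdims=True) + eps)
    return a @ b.T


# Synthetic stand-ins for an original clip and a generated clip (hypothetical data).
sr = 16000
t = np.arange(sr) / sr
rng = np.random.default_rng(0)
original = np.sin(2 * np.pi * 440 * t)
generated = original + 0.01 * rng.standard_normal(sr)

S = similarity_matrix(log_spectrogram(original), log_spectrogram(generated))
print(S.shape)  # rows index frames of the original, columns frames of the generated clip
```

Under these assumptions, a bright diagonal in `S` would indicate that the generated clip follows the original frame by frame, which is the kind of pattern a replicated training sample would be expected to produce.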
Example 1 (ID 1980)
Prompt: 126bpm 4/4. 4 measures with a fill. recorded with a pair of Neumann TLM 103s into protools.
Audio
- Original
- Generated Without AMG
- Generated With AMG
Spectrograms
- Original
- Generated Without AMG
Similarity Matrices
- Generated Without AMG
Example 2 (ID 4567)
Prompt: This is loop 52 in a series of 135 loops that belong together. They all have a deep dubspace feel in 1 bar 4/4 at 60 bpm and belong to the "Convoloops pack 02 - bare bone dubspace 60 bpm" sample pack. They all have the same name: "convoluted bare bone loop 60 bpm" with three numbers as suffix. The first of the three numbers indicates a group of samples with a similar sound and feel. The most rhythmic ones are these with 1 till 4 as last number in the suffix and these with 5 till 8 are more effects. Finally number 9 as the last number in the suffix is the start of the delay and/or reverb tail of the previous loop. The second number separates variations in pitch of the initial loop before any processing is applied. Of these variations number 5 is more granular & experimental. All loops were created using the microtonik VSTi. I took the bare bones preset and convoluted it with some variations of itself. After this process I added some more convolution with another SIR, some diffusion, delay & chorus with the Fusion Reflector ensemble from Native Instruments Reaktor, and some more character with (you never guess it) the excellent Character plugin from my TC powercore firewire. All this was done within Cubase SX. After that I mastered these loops within Wavelab using several plugins: fades, equalising, multiband compressing, limiting & dither.
Audio
- Original
- Generated Without AMG
- Generated With AMG
Spectrograms
- Original
- Generated Without AMG
Similarity Matrices
- Generated Without AMG
Example 3 (ID 5131)
Prompt: "ATTACK loop 140 bpm-00.wav" till "ATTACK loop 140 bpm-31.wav" are all part of the "ATTACK LOOP 6" sample package and belong together as they are all variations on the same 1 measure 4/4 140 bpm drumloop. The loop has a techno-trance feel. The first four loops (00 till 03) contain some variations of the pure drumloop, where 00 is the most minimal and 03 the fullest. All other variations add other sound effects, some of them being sounds with a certain pitch, mostly C. These loop are suitable for your trance and techno productions. They were created using the Waldorf Attack VSTi within Cubase SX. Mastering (EQ, Stereo Enhancer, Multi-Band expand/compress/limit, dither, fades at start and/or end) done within Wavelab.
Audio
- Original
- Generated Without AMG
- Generated With AMG
Spectrograms
- Original
- Generated Without AMG
Similarity Matrices
- Generated Without AMG
Example 4 (ID 5375)
Prompt: Recorded direct with a Peavey Dynabass in passive mode, active mode EQ is nice but noisy as hell so I never use it. Ran the bass through my Zoom bass processor and played all notes on E string up to 16th fret then went the rest of the way up the strings and onto highest fret on G string.
Audio
- Original
- Generated Without AMG
- Generated With AMG
Spectrograms
- Original
- Generated Without AMG
Similarity Matrices
- Generated Without AMG
Example 5 (ID 6197)
Prompt: Multisamples created with subsynth. Eerie horror film sound in middle and higher registers. Normalized and converted to AIFF in cool edit 96. File name indicates frequency for example: HORROC04.aif= C4, where last 3 characters are "C04"
Audio
- Original
- Generated Without AMG
- Generated With AMG
Spectrograms
- Original
- Generated Without AMG
Similarity Matrices
- Generated Without AMG