Generate audio samples using a diffusion model
S-KEY: Self-Supervised Learning of Major and Minor Keys