This tool was created for CEE 4803 (Art & Generative AI) at the Georgia Institute of Technology.
It pairs with our Libre textbook, AI Fundamentals.
AI-Generated Music in Your Browser
A transformer is a type of neural network that's revolutionizing artificial intelligence. Think of it as a super-smart pattern recognition system that can understand relationships between pieces of information, no matter how far apart they are in a sequence.
Our music transformer works in simple steps:
This exact same architecture powers ChatGPT and other large language models! Instead of musical notes, ChatGPT uses transformers to understand and generate text. The "attention mechanism" helps it understand context - for example, knowing that "bank" means something different in "river bank" versus "savings bank."
Here's the magic happening behind the scenes:
At the heart of every transformer is the attention mechanism, which uses these mathematical operations:
Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V

Breaking down softmax:

softmax(zᵢ) = e^(zᵢ) / Σⱼ₌₁ⁿ e^(zⱼ)

where z = QKᵀ / √dₖ
What does this mean in plain English?
The softmax function is the "decision maker" that turns raw scores into probabilities. Here's how it works:
Why is this magical? Imagine you're at a party trying to decide who to listen to. Softmax is like your brain scoring the options: "Person A is pretty interesting (score: 0.5), Person B is fascinating (score: 2.0), Person C is boring (score: -0.3)." Softmax converts these raw scores into percentages: roughly 17% of your attention goes to A, 76% to B, and 8% to C.
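Running the numbers from the party example (scores 0.5, 2.0, and -0.3) takes only a few lines of Python, which computes the exact percentages:

```python
import math

# Raw attention scores from the party example above.
scores = {"A": 0.5, "B": 2.0, "C": -0.3}

# Softmax: exponentiate each score, then normalize so they sum to 1.
exps = {name: math.exp(s) for name, s in scores.items()}
total = sum(exps.values())
probs = {name: e / total for name, e in exps.items()}

for name, p in probs.items():
    print(f"{name}: {p:.0%}")
# A: 17%
# B: 76%
# C: 8%
```

Note how exponentiation amplifies the gap: B's score is only 1.5 points above A's, but B ends up with more than four times the attention.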
In music, this means when predicting the next note, the transformer doesn't just pick the "best" note—it creates a probability distribution across all possible notes. A note that fits the harmony might get 40% probability, a consonant note gets 30%, and dissonant notes get only 5% each. This creates musically intelligent randomness—the AI can explore creative options while staying mostly in tune!
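That "intelligent randomness" is just weighted sampling. Here is a minimal sketch, with a hypothetical hand-written distribution standing in for the transformer's actual output:

```python
import random

# Hypothetical probability distribution over candidate next notes,
# mirroring the description above: harmonic notes get high probability,
# dissonant notes get low (but nonzero) probability.
note_probs = {
    "C4": 0.40,   # fits the harmony
    "E4": 0.30,   # consonant
    "G4": 0.20,   # consonant
    "F#4": 0.05,  # dissonant
    "B3": 0.05,   # dissonant
}

# Sample instead of always taking the top note: the melody stays mostly
# in tune but can occasionally surprise you.
notes, weights = zip(*note_probs.items())
next_note = random.choices(notes, weights=weights, k=1)[0]
print(next_note)
```

Always picking the single highest-probability note would produce the same melody every time; sampling is what makes each generation unique.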
Key matrices explained: Q (queries), K (keys), and V (values) are three learned projections of the same input sequence. Each note's query is compared against every note's key to produce attention scores, and those scores weight a mix of the value vectors.
This formula calculates how much "attention" each note should pay to every other note in the sequence, creating weighted relationships that capture musical patterns!
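The whole formula fits in a few lines of NumPy. This is an illustrative sketch (the actual app runs in JavaScript in your browser), using random vectors in place of real note embeddings:

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max for numerical stability; softmax is shift-invariant.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # raw similarity between notes
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V                   # weighted mix of value vectors

# Tiny example: a sequence of 3 "notes", each embedded as a 4-dim vector.
rng = np.random.default_rng(0)
Q = rng.standard_normal((3, 4))
K = rng.standard_normal((3, 4))
V = rng.standard_normal((3, 4))
out = attention(Q, K, V)
print(out.shape)  # (3, 4): one output vector per note
```

Each output row is a blend of all the value vectors, weighted by how strongly that note "attends" to every other note, which is exactly the long-range pattern matching described above.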
You might notice all three instruments (Robo, 80s, Old Nokia) produce similar piano-like tones. This is a fundamental limitation of running transformers in the browser. Unlike large-scale AI models that can generate realistic human voices and complex timbres, our browser-based transformer is constrained by:
Think of it this way: our transformer is like a smart musician playing a basic synthesizer, choosing which notes to play intelligently, but limited to simple electronic sounds. Professional AI music models (like OpenAI's Jukebox or Google's MusicLM) process millions of audio samples and require powerful servers—far beyond what runs in your browser!
Every time you click "Generate Music," the transformer creates a unique 96-note sequence by repeatedly sampling the next note from its predicted probabilities. The result? AI-generated music that sounds pleasant because the model has learned the basic rules of harmony and melody!
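The generation loop itself is simple. Here is a hedged Python sketch where a toy `predict_probs` function (which just favors notes near the previous one) stands in for the transformer's forward pass:

```python
import random

def predict_probs(sequence, vocab):
    # Toy stand-in for the transformer: prefer pitches close to the last
    # note. The real model would compute attention over the whole sequence.
    last = sequence[-1]
    weights = [1.0 / (1 + abs(n - last)) for n in vocab]
    total = sum(weights)
    return [w / total for w in weights]

def generate(vocab, length=96, seed=60):
    sequence = [seed]  # start from a seed note (MIDI 60 = middle C)
    while len(sequence) < length:
        probs = predict_probs(sequence, vocab)
        next_note = random.choices(vocab, weights=probs, k=1)[0]
        sequence.append(next_note)
    return sequence

vocab = list(range(48, 73))  # two octaves of MIDI pitches
melody = generate(vocab)
print(len(melody))  # 96
```

The structure, predict a distribution, sample one note, append it, and repeat, is the same autoregressive loop that large language models use to generate text one token at a time.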