About

This tool was created for CEE 4803 (Art & Generative AI) at the Georgia Institute of Technology.

It pairs with our Libre textbook, AI Fundamentals.

Main Developer:

Kenneth (Alex) Jenkins

Overseeing Professor:

Dr. Francesco Fedele

Source Code:

ML Visualizer (GPLv3)

🎵 Neural Music

AI-Generated Music in Your Browser

Random Spacing - Code Implementation
Controls whether the transformer inserts random silence (rests) between musical phrases.
🎵 ON (Default): Adds 1-2 rest notes every 4-8 notes (60% probability), creating natural-sounding phrases with breathing room
🎹 OFF: Continuous stream of notes with no gaps - one long uninterrupted melody
When enabled, rests are inserted as special REST_NOTE values (999) that generate silence in the audio output.
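A minimal sketch of what that rest-insertion logic could look like in Rust (using the rand crate). The function name and overall structure are assumptions, not the project's actual source; only the 4-8 note spacing, 60% probability, 1-2 rests, and the REST_NOTE value of 999 come from the description above.

```rust
// Illustrative sketch only -- not the project's actual source.
use rand::Rng;

const REST_NOTE: u32 = 999; // sentinel value that is rendered as silence

fn insert_random_rests(notes: &[u32]) -> Vec<u32> {
    let mut rng = rand::thread_rng();
    let mut out = Vec::with_capacity(notes.len());
    let mut next_gap = rng.gen_range(4..=8); // notes until the next possible rest

    for &note in notes {
        out.push(note);
        next_gap -= 1;
        if next_gap == 0 {
            if rng.gen_bool(0.6) {                // 60% chance of a pause
                let rests = rng.gen_range(1..=2); // insert 1-2 rest notes
                out.extend(std::iter::repeat(REST_NOTE).take(rests));
            }
            next_gap = rng.gen_range(4..=8);      // schedule the next phrase break
        }
    }
    out
}
```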
Melodic Mode - Code Implementation
Controls the transformer's consonance weights applied to interval distances when predicting the next note.
🎹 Harmonic (OFF): Favors octaves (95%), fourths (95%), unison (100%), thirds (90%)
🎵 Melodic (ON): Strongly prefers steps 1-2 notes apart (100%), penalizes large leaps (30%), reduces unison to 80% to avoid repetition
These weights multiply with attention scores to create the probability distribution for note selection. Higher weight = higher chance of being chosen.
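Here is a rough sketch of how those weights might be organized in code. The specific percentages come from the lists above; the mapping from scale-step distances to named intervals, the fallback weights, and all names are assumptions for illustration.

```rust
// Illustrative sketch only -- not the project's actual source.
// `step_distance` is how many scale steps apart the candidate note is from
// the previous note (0 = unison, 2 = a third, 3 = a fourth, 7 = an octave).
fn interval_weight(step_distance: u32, melodic_mode: bool) -> f32 {
    if melodic_mode {
        match step_distance {
            0 => 0.80,     // unison reduced to 80% to avoid repetition
            1 | 2 => 1.00, // small steps strongly preferred
            _ => 0.30,     // large leaps penalized
        }
    } else {
        match step_distance {
            0 => 1.00, // unison
            2 => 0.90, // thirds
            3 => 0.95, // fourths
            7 => 0.95, // octaves
            _ => 0.50, // assumed default for other intervals
        }
    }
}

// The chosen weight multiplies the attention score for that candidate note,
// and the weighted scores become the probability distribution for selection.
fn weighted_score(attention_score: f32, step_distance: u32, melodic: bool) -> f32 {
    attention_score * interval_weight(step_distance, melodic)
}
```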
[Interactive panel: the Neural Transformer has 4 layers, 56 neurons, and multi-head attention, and generates a 96-note sequence from an 8-note scale at 120 BPM (default instrument: Robo, mode: Harmonic), synthesized at 44100 Hz with a target duration of 12 seconds.]

🧠 How Does This Work?

[Diagram: Transformer Architecture for Music Generation — Input (8 notes: previous notes) → Attention 1 (16 neurons: weighted relationships) → Attention 2 (16 neurons: pattern recognition) → Output (8 notes: next note). Analyze → Learn → Predict.]

What is a Transformer?

A transformer is a type of neural network that's revolutionizing artificial intelligence. Think of it as a super-smart pattern recognition system that can understand relationships between pieces of information, no matter how far apart they are in a sequence.

How Does It Generate Music?

Our music transformer works in simple steps:

  • Input Layer: Takes in the previous musical notes (like C, D, E, etc.)
  • Attention Layers: The "brain" of the system - it learns which notes sound good together and which patterns create melody. It pays "attention" to the relationships between notes, like how a catchy chorus relates to the verse.
  • Output Layer: Predicts what the next note should be based on everything it's learned (a simplified version of this loop is sketched below)
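Putting those three steps together, the generation loop is conceptually quite small. This is a minimal sketch with the whole transformer forward pass hidden behind a `predict_next` closure; the names and the closure-based design are assumptions, not the project's API.

```rust
// Illustrative sketch of the "predict the next note, append, repeat" loop.
const SEQUENCE_LENGTH: usize = 96; // the tool generates 96-note sequences
const CONTEXT: usize = 8;          // the model conditions on the previous 8 notes

fn generate<F>(mut predict_next: F, seed: Vec<u32>) -> Vec<u32>
where
    F: FnMut(&[u32]) -> u32, // context notes in, next note out
{
    let mut notes = seed;
    while notes.len() < SEQUENCE_LENGTH {
        // 1. Input layer: take the most recent notes as context
        let start = notes.len().saturating_sub(CONTEXT);
        // 2. Attention layers and 3. Output layer both happen inside predict_next
        let next = predict_next(&notes[start..]);
        notes.push(next);
    }
    notes
}
```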

The Same Technology Powers ChatGPT!

This exact same architecture powers ChatGPT and other large language models! Instead of musical notes, ChatGPT uses transformers to understand and generate text. The "attention mechanism" helps it understand context - for example, knowing that "bank" means something different in "river bank" versus "savings bank."

How Are We Making Music With Code?

Here's the magic happening behind the scenes:

  • Rust Code: The transformer runs in highly efficient Rust code, compiled to WebAssembly (WASM) so it runs at near-native speed in your browser
  • Attention Weights: Mathematical weights determine how much each previous note influences the next one
  • Probability Distribution: Instead of random notes, the AI calculates which notes are most "musical" based on consonance (notes that sound good together)
  • Audio Synthesis: The selected notes are converted into actual sound waves using different waveforms for Piano, Guitar, and Robo sounds (see the synthesis sketch below)
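To give a feel for the synthesis step, here is a simplified sketch of turning one note frequency into samples using a sine wave plus two harmonics. The harmonic mix, the fade-out envelope, and the names are assumptions; the real tool shapes these differently for each instrument.

```rust
// Illustrative sketch only -- not the project's actual synthesis code.
use std::f32::consts::PI;

const SAMPLE_RATE: f32 = 44_100.0; // matches the tool's 44100 Hz output

fn synthesize_note(freq: f32, duration_secs: f32) -> Vec<f32> {
    let n_samples = (duration_secs * SAMPLE_RATE) as usize;
    (0..n_samples)
        .map(|i| {
            let t = i as f32 / SAMPLE_RATE;
            // fundamental plus two quieter harmonics for a slightly richer tone
            let sample = (2.0 * PI * freq * t).sin()
                + 0.5 * (2.0 * PI * 2.0 * freq * t).sin()
                + 0.25 * (2.0 * PI * 3.0 * freq * t).sin();
            // simple linear fade-out so the note doesn't click when it ends
            let envelope = 1.0 - t / duration_secs;
            sample * envelope / 1.75 // normalize by the sum of harmonic amplitudes
        })
        .collect()
}
```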

The Math Behind Transformers

At the heart of every transformer is the attention mechanism, which uses these mathematical operations:

Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V

Breaking down softmax:

softmax(zᵢ) = e^(zᵢ) / Σⱼ₌₁ⁿ e^(zⱼ)

where z = QKᵀ / √dₖ

What does this mean in plain English?

The softmax function is the "decision maker" that turns raw scores into probabilities. Here's how it works:

  • e^(zᵢ) (Euler's number to the power of each score) - Amplifies differences between scores. High scores become much higher, low scores stay low.
  • Σ (Sigma) - Sums up all the amplified scores across all positions (j=1 to n)
  • Division - Each amplified score is divided by the total, guaranteeing all values sum to exactly 1.0 (a perfect probability distribution)

Why is this magical? Imagine you're at a party and trying to decide who to listen to. Softmax is like your brain calculating: "Person A is pretty interesting (score: 0.5), Person B is fascinating (score: 2.0), Person C is boring (score: -0.3)." The softmax converts these into percentages: you'll pay roughly 17% attention to A, 76% to B, and only 8% to C.

In music, this means when predicting the next note, the transformer doesn't just pick the "best" note—it creates a probability distribution across all possible notes. A note that fits the harmony might get 40% probability, a consonant note gets 30%, and dissonant notes get only 5% each. This creates musically intelligent randomness—the AI can explore creative options while staying mostly in tune!
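To make that concrete, here is a tiny softmax sketch in Rust. The three scores are the made-up numbers from the party example above, not real attention outputs.

```rust
// Minimal softmax sketch; example scores are invented for illustration.
fn softmax(scores: &[f32]) -> Vec<f32> {
    // subtract the max score first for numerical stability
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    let probs = softmax(&[0.5, 2.0, -0.3]);
    println!("{:?}", probs); // ~[0.17, 0.76, 0.08] -- sums to 1.0
}
```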

Key matrices explained:

  • Q (Query) = "What am I looking for?" (the current musical context)
  • K (Key) = "What do I have available?" (all previous notes in memory)
  • V (Value) = "What's the actual musical information?" (the note frequencies and patterns)
  • dₖ = Dimension of the key vectors (dividing by √dₖ keeps the scores from getting too large)

This formula calculates how much "attention" each note should pay to every other note in the sequence, creating weighted relationships that capture musical patterns!
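For a single query (the current musical context), the whole formula can be sketched in a few lines. The vector contents and names here are illustrative assumptions.

```rust
// Scaled dot-product attention for one query, following the formula above.
fn attention(query: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let d_k = query.len() as f32;

    // QK^T / sqrt(d_k): one raw score per previous note
    let scores: Vec<f32> = keys
        .iter()
        .map(|k| query.iter().zip(k).map(|(q, k)| q * k).sum::<f32>() / d_k.sqrt())
        .collect();

    // softmax turns raw scores into attention weights that sum to 1
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let weights: Vec<f32> = exps.iter().map(|e| e / sum).collect();

    // weighted sum of the values: the musical information carried forward
    let mut out = vec![0.0; values[0].len()];
    for (w, v) in weights.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += w * x;
        }
    }
    out
}
```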

Why Does Everything Sound Piano-Like?

You might notice all three instruments (Robo, 80s, Old Nokia) produce similar piano-like tones. This is a fundamental limitation of running transformers in the browser. Unlike large-scale AI models that can generate realistic human voices and complex timbres, our browser-based transformer is constrained by:

  • Limited computational power: Full audio synthesis transformers require massive GPU processing
  • Simple waveform generation: We use basic sine waves with harmonics rather than complex audio modeling
  • Sequence-based prediction: The transformer predicts note sequences (like MIDI), not raw audio waveforms
  • Browser constraints: WebAssembly and Web Audio API have performance limits compared to native audio engines

Think of it this way: our transformer is like a smart musician playing a basic synthesizer, choosing which notes to play intelligently, but limited to simple electronic sounds. Professional AI music models (like OpenAI's Jukebox or Google's MusicLM) process millions of audio samples and require powerful servers—far beyond what runs in your browser!

Every time you click "Generate Music," the transformer creates a unique 96-note sequence by repeatedly predicting the next best note. The result? AI-generated music that sounds pleasant because it's learned the basic rules of harmony and melody!