About

This tool was created for CEE 4803 (Art & Generative AI) at the Georgia Institute of Technology.

It pairs with our Libre textbook, AI Fundamentals.

Main Developer:

Kenneth (Alex) Jenkins

Overseeing Professor:

Dr. Francesco Fedele

Source Code:

ML Visualizer (GPLv3)

🎵 Neural Music

AI-Generated Music in Your Browser

Random Spacing - Code Implementation
Controls whether the transformer inserts random silence (rests) between musical phrases.
🎵 ON (Default): Adds 1-2 rest notes every 4-8 notes (60% probability), creating natural-sounding phrases with breathing room
🎹 OFF: Continuous stream of notes with no gaps - one long uninterrupted melody
When enabled, rests are inserted as special REST_NOTE values (999) that generate silence in the audio output.
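A minimal sketch of what that rest-insertion logic could look like in Rust (using the rand crate). The function name and overall structure are assumptions, not the project's actual source; only the 4-8 note spacing, 60% probability, 1-2 rests, and the REST_NOTE value of 999 come from the description above.

```rust
// Illustrative sketch only -- not the project's actual source.
use rand::Rng;

const REST_NOTE: u32 = 999; // sentinel value that is rendered as silence

fn insert_random_rests(notes: &[u32]) -> Vec<u32> {
    let mut rng = rand::thread_rng();
    let mut out = Vec::with_capacity(notes.len());
    let mut next_gap = rng.gen_range(4..=8); // notes until the next possible rest

    for &note in notes {
        out.push(note);
        next_gap -= 1;
        if next_gap == 0 {
            if rng.gen_bool(0.6) {                // 60% chance of a pause
                let rests = rng.gen_range(1..=2); // insert 1-2 rest notes
                out.extend(std::iter::repeat(REST_NOTE).take(rests));
            }
            next_gap = rng.gen_range(4..=8);      // schedule the next phrase break
        }
    }
    out
}
```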
Melodic Mode - Code Implementation
Controls the transformer's consonance weights applied to interval distances when predicting the next note.
🎹 Harmonic (OFF): Favors octaves (95%), fourths (95%), unison (100%), thirds (90%)
🎵 Melodic (ON): Strongly prefers steps 1-2 notes apart (100%), penalizes large leaps (30%), reduces unison to 80% to avoid repetition
These weights multiply with attention scores to create the probability distribution for note selection. Higher weight = higher chance of being chosen.
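Here is a rough sketch of how those weights might be organized in code. The specific percentages come from the lists above; the mapping from scale-step distances to named intervals, the fallback weights, and all names are assumptions for illustration.

```rust
// Illustrative sketch only -- not the project's actual source.
// `step_distance` is how many scale steps apart the candidate note is from
// the previous note (0 = unison, 2 = a third, 3 = a fourth, 7 = an octave).
fn interval_weight(step_distance: u32, melodic_mode: bool) -> f32 {
    if melodic_mode {
        match step_distance {
            0 => 0.80,     // unison reduced to 80% to avoid repetition
            1 | 2 => 1.00, // small steps strongly preferred
            _ => 0.30,     // large leaps penalized
        }
    } else {
        match step_distance {
            0 => 1.00, // unison
            2 => 0.90, // thirds
            3 => 0.95, // fourths
            7 => 0.95, // octaves
            _ => 0.50, // assumed default for other intervals
        }
    }
}

// The chosen weight multiplies the attention score for that candidate note,
// and the weighted scores become the probability distribution for selection.
fn weighted_score(attention_score: f32, step_distance: u32, melodic: bool) -> f32 {
    attention_score * interval_weight(step_distance, melodic)
}
```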
[Interactive panel: the Neural Transformer has 4 layers, 56 neurons, and multi-head attention, and generates a 96-note sequence from an 8-note scale at 120 BPM (default instrument: Robo, mode: Harmonic), synthesized at 44100 Hz with a target duration of 12 seconds.]

🧠 How Does This Work?

[Diagram: Transformer Architecture for Music Generation — Input (8 notes: previous notes) → Attention 1 (16 neurons: weighted relationships) → Attention 2 (16 neurons: pattern recognition) → Output (8 notes: next note). Analyze → Learn → Predict.]

What is a Transformer?

A transformer is a type of neural network that's revolutionizing artificial intelligence. Think of it as a super-smart pattern recognition system that can understand relationships between pieces of information, no matter how far apart they are in a sequence.

How Does It Generate Music?

Our music transformer works in simple steps:

  • Input Layer: Takes in the previous musical notes (like C, D, E, etc.)
  • Attention Layers: The "brain" of the system - it learns which notes sound good together and which patterns create melody. It pays "attention" to the relationships between notes, like how a catchy chorus relates to the verse.
  • Output Layer: Predicts what the next note should be based on everything it's learned (a simplified version of this loop is sketched below)
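Putting those three steps together, the generation loop is conceptually quite small. This is a minimal sketch with the whole transformer forward pass hidden behind a `predict_next` closure; the names and the closure-based design are assumptions, not the project's API.

```rust
// Illustrative sketch of the "predict the next note, append, repeat" loop.
const SEQUENCE_LENGTH: usize = 96; // the tool generates 96-note sequences
const CONTEXT: usize = 8;          // the model conditions on the previous 8 notes

fn generate<F>(mut predict_next: F, seed: Vec<u32>) -> Vec<u32>
where
    F: FnMut(&[u32]) -> u32, // context notes in, next note out
{
    let mut notes = seed;
    while notes.len() < SEQUENCE_LENGTH {
        // 1. Input layer: take the most recent notes as context
        let start = notes.len().saturating_sub(CONTEXT);
        // 2. Attention layers and 3. Output layer both happen inside predict_next
        let next = predict_next(&notes[start..]);
        notes.push(next);
    }
    notes
}
```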

The Same Technology Powers ChatGPT!

This exact same architecture powers ChatGPT and other large language models! Instead of musical notes, ChatGPT uses transformers to understand and generate text. The "attention mechanism" helps it understand context - for example, knowing that "bank" means something different in "river bank" versus "savings bank."

How Are We Making Music With Code?

Here's the magic happening behind the scenes:

  • Rust Code: The transformer runs in highly efficient Rust code, compiled to WebAssembly (WASM) so it runs at near-native speed in your browser
  • Attention Weights: Mathematical weights determine how much each previous note influences the next one
  • Probability Distribution: Instead of random notes, the AI calculates which notes are most "musical" based on consonance (notes that sound good together)
  • Audio Synthesis: The selected notes are converted into actual sound waves using different waveforms for Piano, Guitar, and Robo sounds (see the synthesis sketch below)
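To give a feel for the synthesis step, here is a simplified sketch of turning one note frequency into samples using a sine wave plus two harmonics. The harmonic mix, the fade-out envelope, and the names are assumptions; the real tool shapes these differently for each instrument.

```rust
// Illustrative sketch only -- not the project's actual synthesis code.
use std::f32::consts::PI;

const SAMPLE_RATE: f32 = 44_100.0; // matches the tool's 44100 Hz output

fn synthesize_note(freq: f32, duration_secs: f32) -> Vec<f32> {
    let n_samples = (duration_secs * SAMPLE_RATE) as usize;
    (0..n_samples)
        .map(|i| {
            let t = i as f32 / SAMPLE_RATE;
            // fundamental plus two quieter harmonics for a slightly richer tone
            let sample = (2.0 * PI * freq * t).sin()
                + 0.5 * (2.0 * PI * 2.0 * freq * t).sin()
                + 0.25 * (2.0 * PI * 3.0 * freq * t).sin();
            // simple linear fade-out so the note doesn't click when it ends
            let envelope = 1.0 - t / duration_secs;
            sample * envelope / 1.75 // normalize by the sum of harmonic amplitudes
        })
        .collect()
}
```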

The Math Behind Transformers

At the heart of every transformer is the attention mechanism, which uses these mathematical operations:

Attention(Q, K, V) = softmax(QKᵀ / √dₖ) V

Breaking down softmax:

softmax(zᵢ) = e^(zᵢ) / Σⱼ₌₁ⁿ e^(zⱼ)

where z = QKᵀ / √dₖ

What does this mean in plain English?

The softmax function is the "decision maker" that turns raw scores into probabilities. Here's how it works:

  • e^(zᵢ) (Euler's number to the power of each score) - Amplifies differences between scores. High scores become much higher, low scores stay low.
  • Σ (Sigma) - Sums up all the amplified scores across all positions (j=1 to n)
  • Division - Each amplified score is divided by the total, guaranteeing all values sum to exactly 1.0 (a perfect probability distribution)

Why is this magical? Imagine you're at a party and trying to decide who to listen to. Softmax is like your brain calculating: "Person A is pretty interesting (score: 0.5), Person B is fascinating (score: 2.0), Person C is boring (score: -0.3)." The softmax converts these into percentages: you'll pay roughly 17% attention to A, 76% to B, and only 8% to C.

In music, this means when predicting the next note, the transformer doesn't just pick the "best" note—it creates a probability distribution across all possible notes. A note that fits the harmony might get 40% probability, a consonant note gets 30%, and dissonant notes get only 5% each. This creates musically intelligent randomness—the AI can explore creative options while staying mostly in tune!
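To make that concrete, here is a tiny softmax sketch in Rust. The three scores are the made-up numbers from the party example above, not real attention outputs.

```rust
// Minimal softmax sketch; example scores are invented for illustration.
fn softmax(scores: &[f32]) -> Vec<f32> {
    // subtract the max score first for numerical stability
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

fn main() {
    let probs = softmax(&[0.5, 2.0, -0.3]);
    println!("{:?}", probs); // ~[0.17, 0.76, 0.08] -- sums to 1.0
}
```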

Key matrices explained:

  • Q (Query) = "What am I looking for?" (the current musical context)
  • K (Key) = "What do I have available?" (all previous notes in memory)
  • V (Value) = "What's the actual musical information?" (the note frequencies and patterns)
  • dₖ = Dimension of the key vectors (dividing by √dₖ keeps the scores from getting too large)

This formula calculates how much "attention" each note should pay to every other note in the sequence, creating weighted relationships that capture musical patterns!
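For a single query (the current musical context), the whole formula can be sketched in a few lines. The vector contents and names here are illustrative assumptions.

```rust
// Scaled dot-product attention for one query, following the formula above.
fn attention(query: &[f32], keys: &[Vec<f32>], values: &[Vec<f32>]) -> Vec<f32> {
    let d_k = query.len() as f32;

    // QK^T / sqrt(d_k): one raw score per previous note
    let scores: Vec<f32> = keys
        .iter()
        .map(|k| query.iter().zip(k).map(|(q, k)| q * k).sum::<f32>() / d_k.sqrt())
        .collect();

    // softmax turns raw scores into attention weights that sum to 1
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    let weights: Vec<f32> = exps.iter().map(|e| e / sum).collect();

    // weighted sum of the values: the musical information carried forward
    let mut out = vec![0.0; values[0].len()];
    for (w, v) in weights.iter().zip(values) {
        for (o, x) in out.iter_mut().zip(v) {
            *o += w * x;
        }
    }
    out
}
```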

Why Does Everything Sound Piano-Like?

You might notice all three instruments (Robo, 80s, Old Nokia) produce similar piano-like tones. This is a fundamental limitation of running transformers in the browser. Unlike large-scale AI models that can generate realistic human voices and complex timbres, our browser-based transformer is constrained by:

  • Limited computational power: Full audio synthesis transformers require massive GPU processing
  • Simple waveform generation: We use basic sine waves with harmonics rather than complex audio modeling
  • Sequence-based prediction: The transformer predicts note sequences (like MIDI), not raw audio waveforms
  • Browser constraints: WebAssembly and Web Audio API have performance limits compared to native audio engines

Think of it this way: our transformer is like a smart musician playing a basic synthesizer, choosing which notes to play intelligently, but limited to simple electronic sounds. Professional AI music models (like OpenAI's Jukebox or Google's MusicLM) process millions of audio samples and require powerful servers—far beyond what runs in your browser!

Every time you click "Generate Music," the transformer creates a unique 96-note sequence by repeatedly predicting the next best note. The result? AI-generated music that sounds pleasant because it's learned the basic rules of harmony and melody!