I tried Google’s artificial intelligence (AI) music generation model Lyria 3. I made songs using sheet music, chord progressions and natural-language descriptions. In short, Lyria 3 shows clear signs of having learned music through text.
Lyria 3 is a music generation AI model developed by Google DeepMind. It has been built into the Gemini app since February. Given text or an image, it produces about three minutes of audio complete with vocals and lyrics. You can also specify song structure, such as an intro, chorus or transitions, in the prompt. Audio quality is 44.1 kHz stereo, the standard sample rate used by streaming platforms such as Spotify.
For a first try, I generated a track in the style of Kendrick Lamar’s HUMBLE. I first asked it to break the song down into elements that could be used in a music-generation prompt. It returned a prompt describing hardcore hip-hop and trap, with a tempo of 150 beats per minute, 808 bass, a distorted piano and other musical characteristics. It based the analysis on open sources such as NamuWiki and Wikipedia.
To avoid copying, I asked it to change only the key from the original E-flat minor to A minor. It produced a trap hip-hop track built around a distorted grand piano, a punchy snare and sharp hi-hats. The repeated single-note low-register piano line made me think, "Anyone would say this sounds like HUMBLE."
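Changing the key moves every pitch by the same interval. As a minimal sketch, assuming twelve-tone equal temperament and flat-only note spelling (both simplifications chosen for illustration), the shift from E-flat minor to A minor can be computed like this; the function names are hypothetical.

```python
# Pitch classes in twelve-tone equal temperament, spelled with flats only (a simplifying assumption).
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]


def semitone_shift(from_key: str, to_key: str) -> int:
    """Upward shift (0-11 semitones) that moves from_key's tonic onto to_key's tonic."""
    return (NOTE_NAMES.index(to_key) - NOTE_NAMES.index(from_key)) % 12


def transpose(notes: list[str], shift: int) -> list[str]:
    """Move every note by the same number of semitones, wrapping around the octave."""
    return [NOTE_NAMES[(NOTE_NAMES.index(n) + shift) % 12] for n in notes]


# E-flat natural minor scale (C-flat spelled enharmonically as B).
EB_MINOR = ["Eb", "F", "Gb", "Ab", "Bb", "B", "Db"]

if __name__ == "__main__":
    shift = semitone_shift("Eb", "A")
    print(shift, transpose(EB_MINOR, shift))
```

E-flat and A are a tritone (6 semitones) apart, so shifting up or down lands on the same pitch classes: the scale comes out as A, B, C, D, E, F, G.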
It also added rap lyrics. The words were hard to make out, and I could not clearly hear the rhythmic flow or rhymes typical of rap. The vocal tone also differed from that of Kendrick, who is Black; it sounded closer to Eminem, a white rapper. Google has stressed that Lyria 3 "creates tracks through broad inspiration," which seems to mean it follows genre conventions while avoiding an artist’s signature traits, such as their voice.
I tried another variation. I told it to modulate to E-flat major after the first verse and chorus, and to add female vocal ad-libs in the style of U.S. R&B singer H.E.R. along with an orchestral accompaniment. The transition from intense hip-hop to lyrical orchestra flowed without a jarring mismatch. Its arranging ability exceeded my expectations.
◆Can it read sheet music, too?
This time I attached an image of sheet music for the well-known jazz standard Misty. Jazz lead sheets show only melody and chords. I asked for a jazz trio arrangement at a ballad tempo of 80 beats per minute, and for piano improvisation to start right from the chorus.
To put it kindly, it produced a 1980s Korean pop song in the style of Yoo Jae-ha. It also added lyrics: "A single name left on old paper spreads like a coffee stain." Read on their own, the lyrics fit the meter, but sung they felt detached from the melody. The model did not seem to know where the stress should fall given where the notes were placed.
I wanted a more precise test of how well it interpreted sheet music, so I typed a 24-bar chord progression directly into the prompt. In the notation, "maj7" denotes a major seventh chord, "Eb" is E-flat, and vertical bars (|) mark measures. I asked it to play in 4/4 time.
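The notation described above can be made concrete with a small parser. This is a minimal sketch under stated assumptions: the chart format, flat-only spelling, the three chord qualities, and the rule that chords sharing a bar split its beats evenly are illustrative choices, and the two-bar ii-V-I in E-flat is a hypothetical example, not the 24-bar progression used in the test.

```python
import re

# Flat-only pitch-class spelling (assumption for simplicity).
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]
# Semitone offsets from the root for an assumed subset of seventh-chord qualities.
QUALITIES = {"maj7": [0, 4, 7, 11], "m7": [0, 3, 7, 10], "7": [0, 4, 7, 10]}
# Root letter with optional flat, followed by one of the supported qualities.
CHORD_RE = re.compile(r"^([A-G]b?)(maj7|m7|7)$")


def parse_chart(chart: str, beats_per_bar: int = 4) -> list[list[tuple[str, float]]]:
    """Split a '|'-delimited chart into bars; chords in one bar share its beats equally."""
    bars = [bar.split() for bar in chart.split("|") if bar.strip()]
    return [[(symbol, beats_per_bar / len(bar)) for symbol in bar] for bar in bars]


def chord_tones(symbol: str) -> list[str]:
    """Spell the four tones of a seventh chord such as 'Ebmaj7'."""
    match = CHORD_RE.match(symbol)
    if match is None:
        raise ValueError(f"unsupported chord symbol: {symbol}")
    root, quality = match.groups()
    r = NOTE_NAMES.index(root)
    return [NOTE_NAMES[(r + step) % 12] for step in QUALITIES[quality]]


if __name__ == "__main__":
    # Hypothetical ii-V-I in E-flat, in 4/4.
    for bar in parse_chart("| Fm7 Bb7 | Ebmaj7 |"):
        for symbol, beats in bar:
            print(symbol, beats, chord_tones(symbol))
```

Here "Fm7 Bb7" in one bar gets two beats each, and "Ebmaj7" spells out as E-flat, G, B-flat, D.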
The result sounded exactly like a "student who has studied jazz piano for six months." I graduated from a music college and have taught students preparing for music school entrance exams. By about six months in, applicants have learned jazz chords with tensions and the modes, the basic scales. Ask them to improvise with only those two tools and you get exactly this kind of result. The notes were not harmonically wrong, but the playing sounded unsophisticated.
I considered why Lyria 3’s music leaves that impression. One reason: the melody kept emphasizing the seventh of the chord.
The seventh is a chord tone, so it is not harmonically incorrect. The problem is that it sounds unnatural when the seventh stands out on the beat. Together with the third, it forms the backbone of the chord, so stressing it makes the chord’s color too prominent. In swing eighth notes, the basic jazz rhythm, a chord tone on the beat normally provides stability; the awkwardness seems to come from the opposite pattern being mixed in.
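The on-beat analysis above can be sketched as a small labeling routine. This is a toy illustration, not an analysis of Lyria 3’s actual output: the melody is invented, beat positions are as written (off-beat eighths at x.5, before the swing feel is applied), only a major seventh chord is handled, and treating every other pitch as "non-chord" is a simplification, since jazz players often treat 9ths and 13ths as tensions.

```python
# Flat-only pitch-class spelling (assumption for simplicity).
NOTE_NAMES = ["C", "Db", "D", "Eb", "E", "F", "Gb", "G", "Ab", "A", "Bb", "B"]
# Interval above the root -> role, for a major seventh chord only (assumed example quality).
MAJ7_ROLES = {0: "root", 4: "3rd", 7: "5th", 11: "7th"}


def label_notes(root: str, melody: list[tuple[float, str]]) -> list[tuple[float, str, str, bool]]:
    """Tag each (beat, note) with its role over a maj7 chord and whether it lands on a beat."""
    r = NOTE_NAMES.index(root)
    labelled = []
    for beat, note in melody:
        interval = (NOTE_NAMES.index(note) - r) % 12
        role = MAJ7_ROLES.get(interval, "non-chord")
        on_beat = float(beat).is_integer()  # written off-beat eighths sit at x.5
        labelled.append((beat, note, role, on_beat))
    return labelled


def on_beat_positions(labelled: list[tuple[float, str, str, bool]], role: str) -> list[float]:
    """Beat positions where a note with the given role falls on the beat."""
    return [beat for beat, _, r, on_beat in labelled if r == role and on_beat]


if __name__ == "__main__":
    # Hypothetical one-bar melody over Ebmaj7.
    melody = [(0.0, "D"), (0.5, "C"), (1.0, "Bb"), (1.5, "G"), (2.0, "D"), (3.0, "F")]
    labelled = label_notes("Eb", melody)
    print(on_beat_positions(labelled, "7th"), on_beat_positions(labelled, "non-chord"))
```

In this invented bar, the seventh (D) lands on beats 0 and 2, the pattern the text describes as sounding stiff.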
◆Creative in its own way, but it needs to overcome the sense of mismatch
Lee Sedol (이세돌), the ninth-dan Go professional who played AlphaGo a decade ago, said he was shocked by an early 3-3 move. Players are taught from childhood not to play it. Within the rules, AlphaGo played a move a human would not play, and it won.
Unlike Go, where wins and losses settle the question, music has to be judged by ear. Lyria 3 showed a degree of creativity within the rules of harmony, but a stiff, mechanical feel came through in places.
The biggest disappointment was the lack of a motif. A motif is key to making music sound like music. Mozart’s variations on "Twinkle, Twinkle, Little Star," for example, take a simple motif and keep varying it with trills, broken chords and fast runs. Lyria 3’s music gave the impression of stringing notes together at random, with no motif and no narrative arc.
Still, it seems useful for YouTube background music that sidesteps copyright problems. In some cases, it could also serve as music for low-budget commercials.