SongGLM: Lyric-to-Melody Generation with 2D Alignment Encoding and Multi-Task Pre-Training


Paper      Code

Authors

* Corresponding Author

Abstract

Lyric-to-melody generation aims to automatically create melodies based on given lyrics, requiring the capture of complex and subtle correlations between them. However, previous works usually suffer from two main challenges: 1) lyric-melody alignment modeling, which is often simplified to one-syllable/word-to-one-note alignment, while others have the problem of low alignment accuracy; 2) lyric-melody harmony modeling, which usually relies heavily on intermediates or strict rules, limiting model's capabilities and generative diversity. In this paper, we propose SongGLM, a lyric-to-melody generation system that leverages 2D alignment encoding and multi-task pre-training based on the General Language Model (GLM) to guarantee the alignment and harmony between lyrics and melodies. Specifically, 1) we introduce a unified symbolic song representation for lyrics and melodies with word-level and phrase-level (2D) alignment encoding to capture the lyric-melody alignment; 2) we design a multi-task pre-training framework with hierarchical blank infilling objectives (n-gram, phrase, and long span), and incorporate lyric-melody relationships into the extraction of harmonized n-grams to ensure the lyric-melody harmony. We also construct a large-scale lyric-melody paired dataset comprising over 200,000 English song pieces for pre-training and fine-tuning. The objective and subjective results indicate that SongGLM can generate melodies from lyrics with significant improvements in both alignment and harmony, outperforming all the previous baseline methods.

Lyric-to-Melody Generation Challenges

SongGLM Overview

Given the paired lyric-melody dataset, we first establish two relationships between lyrics and melodies based on their representative features, and incorporate these relationships into n-gram extraction to select the most harmonized n-grams. Then, we introduce a unified symbolic song representation with 2D alignment encoding and adopt a multi-task pre-training framework that employs hierarchical blank infilling objectives for lyric-to-melody generation.

Detailed Framework

Lyric-Melody Dataset

Public Corpus Website Raw Processed
NES https://www.kaggle.com/datasets/imsparsh/nes-mdb-dataset
https://github.com/chrisdonahue/nesmdb
可点击下载的图片 可点击下载的图片
POP909 https://github.com/music-x-lab/POP909-Dataset 可点击下载的图片 可点击下载的图片
MTCL https://www.liederenbank.nl/mtc 可点击下载的图片 可点击下载的图片
Wikifonia http://www.wikifonia.org
http://www.synthzone.com/files/Wikifonia/Wikifonia.zip
可点击下载的图片 可点击下载的图片
Session https://thesession.org 可点击下载的图片 可点击下载的图片
LMD https://colinraffel.com/projects/lmd 可点击下载的图片 可点击下载的图片
SymphonyNet https://symphonynet.github.io 可点击下载的图片 可点击下载的图片
MetaMIDI https://zenodo.org/record/5142664 可点击下载的图片 可点击下载的图片

Web Collections Website Raw Processed
MuseScore https://musescore.org
https://github.com/Xmader/musescore-dataset
可点击下载的图片 可点击下载的图片
Hooktheory https://www.hooktheory.com
https://github.com/wayne391/lead-sheet-dataset
可点击下载的图片 可点击下载的图片
BitMidi https://bitmidi.com 可点击下载的图片 可点击下载的图片
FreeMidi https://freemidi.org
https://github.com/josephding23/Free-Midi-Library
可点击下载的图片 可点击下载的图片
KernScores http://kern.ccarh.org 可点击下载的图片 可点击下载的图片
Kunstderfuge https://www.kunstderfuge.com 可点击下载的图片 可点击下载的图片
ABC Notation https://abcnotation.com 可点击下载的图片 可点击下载的图片