Pianoroll-Event: A Novel Score Representation for Symbolic Music

Lekai Qian^*, Haoyu Gu^*, Dehan Li^*, Boyu Cao^†, Qi Liu^†

School of Future Technology, South China University of Technology

^*Equal contribution. ^†Corresponding authors.

ICASSP 2026

Abstract

Symbolic music representation is a fundamental challenge in computational musicology. While grid-based representations effectively preserve pitch-time spatial correspondence, their inherent data sparsity leads to low encoding efficiency. Discrete-event representations achieve compact encoding but fail to adequately capture structural invariance and spatial locality. To address these complementary limitations, we propose Pianoroll-Event, a novel encoding scheme that describes pianoroll representations through events, combining structural properties with encoding efficiency while maintaining temporal dependencies and local spatial patterns. Specifically, we design four complementary event types: Frame Events for temporal boundaries, Gap Events for sparse regions, Pattern Events for note patterns, and Musical Structure Events for musical metadata. Pianoroll-Event strikes an effective balance between sequence length and vocabulary size, improving encoding efficiency by 1.36× to 7.16× over representative discrete sequence methods. Experiments across multiple autoregressive architectures show models using our representation consistently outperform baselines in both quantitative and human evaluations.

Pianoroll-Event Data Representation

The process of converting pianoroll representation into pianoroll-events. Through frame segmentation, partitioning, and compression operations, the pianoroll is transformed into a sequence of pianoroll-events containing diverse event types.

Encoding Efficiency Comparison

Method	ℓ (Length)	V (Vocab)	BDI ↓	vs. Ours ↓
Ours	749.8	347	1.048	1.00×
REMI	1339.7	330	3.261	3.11×
MIDILike	1398.9	448	4.143	3.96×
REMI-BPE	317.8	20,000	1.429	1.36×
ABC Notation	2575.0	128	7.504	7.16×

Table 1. Encoding efficiency comparison. BDI (Budget-Aware Difficulty Index) = ℓ² × √V captures both computational complexity and vocabulary size. Lower is better.

Generated Songs

The following musical pieces are generated by our best-performing Transformer decoder model with Llama architecture, conditioned only on time signature and BPM. The results demonstrate the considerable potential of our representation method.

Pianoroll-Event: A Novel Score Representation for Symbolic Music

Abstract

Pianoroll-Event Data Representation

The process of converting pianoroll representation into pianoroll-events. Through frame segmentation, partitioning, and compression operations, the pianoroll is transformed into a sequence of pianoroll-events containing diverse event types.

Encoding Efficiency Comparison

Generated Songs

MIDI Demo

LSTM

GPT SMALL

GPT LARGE

LLAMA