Edit-aware incremental decoding
for notated symbolic music generation

A small Transformer that writes sheet music, a browser editor that lets you change a single note in real time, and a KV-cache trick that keeps the model's state coherent without recomputing the prefix.

Abstract

We present ScoreCompose-AI, a system for AI-assisted music composition whose primary output is notation, a score in Common Western Music Notation, rather than raw audio or a piano roll. The system couples a small decoder-only Transformer, trained on a REMI-like tokenization of MIDI, with a browser-based score editor (built on OpenSheetMusicDisplay) that supports note-level editing in real time.

Our central contribution is edit-aware incremental decoding: by treating each visible note as a contiguous span of the underlying token stream, we can localize any user edit to a specific token offset, truncate the model's KV-cache exactly at that offset, and replay only the changed sub-sequence. Local edits therefore do not require recomputing the model over the entire prefix, and continuation after an edit only re-decodes the suffix.

How it works

  1. Tokenize

    Each visible note maps to a 4- or 5-token span: an optional ⟨bar⟩ token followed by POS_p PITCH_x DUR_d VEL_v. The vocabulary has 124 tokens on a sixteenth-note grid.

  2. Generate with a small Transformer

    6 layers, 6 heads, d=384. Trained from scratch on MAESTRO v3 in ~6h on a Colab T4. Each forward pass updates a per-layer KV-cache.

  3. Render to notation

    Tokens → music21 stream → MusicXML → OpenSheetMusicDisplay in the browser. Real notation, not piano-roll.

  4. Edit in the score

    The user edits a note. The system re-tokenizes, finds the first differing token index, truncates every layer's KV-cache to exactly that index, and replays the changed tokens — typically <10 — through the model. The state is fully consistent without a cold pass.

  5. Continue or export

    Sample more tokens from the (now-updated) state to extend the score after an edit. Export as MusicXML, MIDI (pretty_midi), or WAV (FluidSynth).
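The note-to-span mapping from step 1 can be sketched as follows. This is a minimal illustration, not the project's tokenizer: the `Note` fields, the string token names, and the rule that ⟨bar⟩ is emitted whenever a note opens a new bar are all assumptions, as is 4/4 time (16 grid steps per bar).

```python
from typing import NamedTuple


class Note(NamedTuple):
    """Hypothetical note record; field names are illustrative only."""
    onset: int     # onset in sixteenth-note steps from the start of the piece
    pitch: int     # MIDI pitch number
    duration: int  # length in sixteenth-note steps
    velocity: int  # MIDI velocity


STEPS_PER_BAR = 16  # assumption: 4/4 on a sixteenth-note grid


def tokenize(notes):
    """Return (tokens, spans); spans[i] is the [start, end) token range of notes[i].

    Notes are assumed to be sorted by onset. Empty bars are not handled.
    """
    tokens, spans, current_bar = [], [], -1
    for note in notes:
        start = len(tokens)
        bar, pos = divmod(note.onset, STEPS_PER_BAR)
        if bar != current_bar:  # assumed rule: first note of a bar carries BAR
            tokens.append("BAR")
            current_bar = bar
        tokens += [
            f"POS_{pos}",
            f"PITCH_{note.pitch}",
            f"DUR_{note.duration}",
            f"VEL_{note.velocity}",
        ]
        spans.append((start, len(tokens)))
    return tokens, spans


notes = [Note(0, 60, 4, 80), Note(4, 64, 4, 80), Note(16, 67, 8, 72)]
tokens, spans = tokenize(notes)
print(spans)  # [(0, 5), (5, 9), (9, 14)]

# Change the pitch of the second note and see which token indices differ.
edited = list(notes)
edited[1] = notes[1]._replace(pitch=65)
new_tokens, _ = tokenize(edited)
changed = [i for i, (a, b) in enumerate(zip(tokens, new_tokens)) if a != b]
print(changed)  # [6]
```

The only differing index, 6, falls inside `spans[1]`, which is what lets an edit to one visible note be localized to a token offset.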

  ┌──────────────────┐  edit op   ┌──────────────────┐  diff   ┌─────────────────┐
  │  OSMD editor     │ ─────────► │  EditEngine      │ ──────► │  KV-cache       │
  │  (in browser)    │            │  (truncate ▸ N)  │         │  truncate(N)    │
  └────────▲─────────┘            └────────┬─────────┘         └────────┬────────┘
           │ rendered MusicXML             │ replay (≈ 5 tokens)        │
           │                               ▼                            ▼
  ┌────────┴─────────┐            ┌──────────────────┐         ┌─────────────────┐
  │  music21 stream  │ ◄───────── │  Note list       │ ◄────── │  ScoreLM        │
  │  + MIDI / WAV    │            │  (source of      │         │  (Transformer)  │
  │                  │            │   truth)         │         │                 │
  └──────────────────┘            └──────────────────┘         └─────────────────┘
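The truncate-and-replay path in the diagram can be sketched with a toy cache. Everything here is hypothetical: `ToyKVCache`, `feed`, `first_diff`, and `reconcile` are illustrative names, and the "forward pass" is an integer recurrence that only mimics the causal dependence of real key/value tensors on earlier positions. The sketch replays every token from the first differing index onward, which for an edit to the most recent note is just the changed tokens.

```python
class ToyKVCache:
    """Stand-in for a per-layer KV-cache: one cached integer per token position."""

    def __init__(self, n_layers):
        self.layers = [[] for _ in range(n_layers)]

    def __len__(self):
        return len(self.layers[0])

    def truncate(self, n):
        """Drop every cached position at index >= n, in every layer."""
        for layer in self.layers:
            del layer[n:]


def feed(cache, tokens):
    """Toy forward pass: each new entry depends on the token and all earlier entries."""
    for tok in tokens:
        h = tok
        for depth, layer in enumerate(cache.layers):
            h = (31 * h + sum(layer) + depth) % 1_000_003
            layer.append(h)
    return len(tokens)  # number of positions actually computed


def first_diff(old, new):
    """Index of the first position where the two token lists disagree."""
    for i, (a, b) in enumerate(zip(old, new)):
        if a != b:
            return i
    return min(len(old), len(new))


def reconcile(cache, old_tokens, new_tokens):
    """Truncate the cache at the first differing token, then replay from there."""
    n = first_diff(old_tokens, new_tokens)
    cache.truncate(n)
    replayed = feed(cache, new_tokens[n:])
    return n, replayed


old = [1, 10, 60, 4, 80, 14, 64, 4, 80]  # hypothetical token ids for two notes
new = old[:-3] + [65, 4, 80]             # last note's pitch changed from 64 to 65

warm = ToyKVCache(n_layers=6)
feed(warm, old)
offset, replayed = reconcile(warm, old, new)

cold = ToyKVCache(n_layers=6)
feed(cold, new)  # baseline: recompute the whole sequence

print(offset, replayed)            # 6 3
print(warm.layers == cold.layers)  # True
```

The final comparison is the property that matters: after truncation and replay the cache is identical to one built by a cold pass, while only 3 of the 9 positions were recomputed.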
    

Latency

Wall-clock latency on an RTX 4060 laptop GPU. Cold = full forward over the prefix (the baseline you'd get without our trick). Replay = edit-aware reconcile after replacing a single note. Continuation = sampling 32 new tokens after the edit.

Sequence length   Cold (ms)   Replay (ms)   Speedup   Continuation, 32 tok (ms)
32 notes                 38             9      4.2×                          78
128 notes               154            11     14.0×                          80
512 notes               618            14     44.1×                          83
1024 notes             1290            28     46.1×                          91

Measured by scripts/benchmark_edits.py. Reproduce with !python scripts/benchmark_edits.py on Colab.
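The Speedup column appears to be the cold latency divided by the replay latency. A quick check of that reading against the reported numbers (the values are copied from the table; the variable names are illustrative):

```python
# (sequence length in notes, cold ms, replay ms), as reported in the latency table
rows = [
    (32, 38, 9),
    (128, 154, 11),
    (512, 618, 14),
    (1024, 1290, 28),
]

# Speedup read as cold / replay, rounded to one decimal place
speedups = [round(cold / replay, 1) for _, cold, replay in rows]
print(speedups)  # [4.2, 14.0, 44.1, 46.1]
```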

In-page demo

This is the OSMD renderer used by the live editor, loaded with a short demo score. Try the buttons to transpose locally — the same operation that the live system pipes through the model's KV-cache truncation.

The page-side demo edits a static MusicXML in JavaScript so it runs on GitHub Pages. The full system additionally runs the Transformer's KV-cache truncation server-side; clone the repo and start python -m src.server to see model continuation after edits.

Cite

@misc{park2026scorecompose,
  title  = {ScoreCompose-AI: Edit-Aware Incremental Decoding for
            Notated Symbolic Music Generation with Real-Time Score
            Editing and Audio Synthesis},
  author = {Park, Eun-Ji},
  year   = {2026},
  url    = {https://github.com/rosyrosys/score_compose_ai},
  note   = {v0.1}
}