
ACE-Step 1.5 is an open-source music foundation model that generates commercial-quality full songs in under two seconds on an A100 and under ten seconds on an RTX 3090. Just like BitNet.cpp runs efficient AI on consumer CPUs, ACE-Step 1.5 brings music generation to local hardware with as little as 4GB of VRAM. It also supports lightweight personalization through LoRA.
- Project link: github.com/ace-step/ACE-Step-1.5
Performance Comparison
ACE-Step 1.5 achieves quality beyond most commercial music models while being dramatically faster and running on far cheaper hardware.
| Metric | Suno / Udio (Cloud) | ACE-Step 1.5 (Local) |
|---|---|---|
| Cost | $10/month subscription | Free |
| Hardware required | Cloud servers | Local GPU with 4GB+ VRAM |
| Generation speed | 30-60 seconds | <2 seconds (A100), <10 seconds (3090) |
| Maximum length | ~2-4 minutes | Up to 10 minutes |
| Languages | Limited | 50+ languages |
| LoRA personalization | No | Yes |
| Covers, stems, repaint | No | Yes |
Architecture
At its core lies a novel hybrid architecture where the Language Model functions as an omni-capable planner. It transforms simple user queries into comprehensive song blueprints, scaling from short loops to 10-minute compositions, while synthesizing metadata, lyrics, and captions via Chain-of-Thought to guide the Diffusion Transformer.
```
User Prompt: "Upbeat pop song in C major, 120 BPM"
        ↓
LM Planner generates a comprehensive song blueprint
        → lyrics, chord progression, arrangement, metadata
        ↓
Diffusion Transformer (DiT) generates the audio
        → guided by the blueprint from the LM Planner
        ↓
Output: full song with precise stylistic control
```
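To make the planning stage concrete, here is a minimal sketch of the data flow; the class and function names are illustrative assumptions, not part of the ACE-Step API.

```python
# Illustrative sketch of the plan-then-render flow; not ACE-Step's actual API
from dataclasses import dataclass


@dataclass
class SongBlueprint:
    """What the LM planner produces before any audio is rendered."""
    lyrics: str
    chords: list[str]
    sections: list[str]          # e.g. ["intro", "verse", "chorus", "outro"]
    bpm: int = 120
    key: str = "C"


def plan_song(prompt: str) -> SongBlueprint:
    # Stage 1: the LM expands a short prompt into a full blueprint
    return SongBlueprint(
        lyrics="(generated lyrics)",
        chords=["C", "G", "Am", "F"],
        sections=["intro", "verse", "chorus", "verse", "chorus", "outro"],
        bpm=120,
        key="C",
    )


blueprint = plan_song("Upbeat pop song in C major, 120 BPM")
# Stage 2: the Diffusion Transformer renders audio conditioned on the blueprint
```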
The alignment between the LM and the DiT is achieved through intrinsic reinforcement learning, which relies solely on the model's internal mechanisms and avoids the biases inherent in external reward models or human preference data.
Key Capabilities
| Feature | Details |
|---|---|
| Full songs | Up to 10 minutes in a single generation |
| Batch generation | Generate multiple songs simultaneously |
| 50+ languages | Maintains quality and accent fidelity |
| Covers | Reinterpret existing songs in new styles |
| Stem separation | Isolate vocals, instruments, drums |
| Repaint | Replace sections of generated audio |
| Vocal-to-BGM conversion | Extract background music from vocals |
| Metadata control | Set BPM, key, genre, mood explicitly |
| LoRA personalization | Fine-tune on a few songs for custom style |

Installation
Prerequisites
- NVIDIA GPU with 4GB+ VRAM (RTX 3060 minimum, RTX 3090 recommended)
- CUDA 12.4+
- Python 3.10+
Setup
```bash
git clone https://github.com/ace-step/ACE-Step-1.5.git
cd ACE-Step-1.5

# Create conda environment
conda create -n acestep python=3.10
conda activate acestep

# Install dependencies
pip install -r requirements.txt

# Download pretrained model
python scripts/download_model.py
```
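After installation, a quick sanity check that PyTorch can see your GPU can save debugging time later (this assumes the requirements file installs PyTorch, which the model depends on):

```python
import torch

# Confirm the GPU is visible before attempting generation
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print(f"GPU: {props.name}, VRAM: {props.total_memory / 1024**3:.1f} GB")
```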
Quick Start
```python
from acestep import ACEInference

model = ACEInference.from_pretrained("ace-step/ace-step-1.5")

# Generate a full song
result = model.generate(
    prompt="An energetic electronic dance track with a driving beat",
    duration_seconds=120,
    bpm=128,
    key="C#m",
)

# Save output
result.save("output.wav")
```
Batch Generation
```python
prompts = [
    "Calm piano ballad, 70 BPM",
    "Aggressive rock guitar riff, 140 BPM",
    "Lo-fi hip hop study beat, 85 BPM",
]

results = model.generate_batch(prompts, duration_seconds=60)
```
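Assuming `generate_batch` returns one result object per prompt, in order (the return type is an assumption here), saving each track might look like this:

```python
# Save each generated track under a slug of its prompt (assumed return type)
for prompt, result in zip(prompts, results):
    filename = prompt.lower().replace(",", "").replace(" ", "_") + ".wav"
    result.save(filename)
```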
LoRA Personalization
Train a custom style LoRA from just a few reference songs:
```bash
python scripts/train_lora.py \
    --reference_songs /path/to/songs/ \
    --output_dir ./lora-checkpoints
```
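Loading the trained adapter at inference time is not shown above; a plausible pattern looks like the following, where `load_lora` is an assumed method name rather than a documented call:

```python
from acestep import ACEInference

model = ACEInference.from_pretrained("ace-step/ace-step-1.5")
model.load_lora("./lora-checkpoints")  # assumed method name; check the project docs

# Generate in the personalized style captured by the LoRA
result = model.generate(
    prompt="A new track in my custom style",
    duration_seconds=90,
)
result.save("custom_style.wav")
```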
Editing Features
ACE-Step 1.5 unifies precise stylistic control with versatile editing capabilities:
| Feature | Description |
|---|---|
| Cover generation | Reinterpret a song in a different genre or style |
| Repainting | Replace specific sections while preserving the rest |
| Vocal-to-BGM | Extract isolated background music |
| Extension | Extend a short loop into a full composition |
Each of these modes maintains strict adherence to prompts across 50+ languages.
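As a rough illustration of how these editing modes might be driven from Python, here is a sketch; the method and parameter names below are assumptions, so consult the repository for the actual interface:

```python
# Hypothetical editing calls; names are illustrative, not confirmed API
cover = model.generate(
    prompt="Reinterpret as an acoustic folk ballad",
    reference_audio="original_song.wav",   # assumed parameter for cover mode
)
cover.save("cover.wav")

repainted = model.repaint(                 # assumed method for section replacement
    audio="output.wav",
    start_seconds=30,
    end_seconds=45,
    prompt="Replace this section with a stripped-back piano break",
)
repainted.save("repainted.wav")
```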
Hardware Support
| GPU | VRAM | Generation Speed | Quality |
|---|---|---|---|
| A100 (80GB) | 80GB | <2 seconds per song | Commercial-grade |
| RTX 3090 | 24GB | <10 seconds per song | Commercial-grade |
| RTX 4090 | 24GB | <5 seconds per song | Commercial-grade |
| RTX 3060 | 12GB | ~30 seconds per song | Good |
| RTX 3060 (6GB) | 6GB | ~45 seconds per song | Decent |
> [!NOTE]
> ACE-Step 1.5 also supports Mac (MPS), AMD (ROCm), and Intel (XPU) devices, though performance varies. CUDA GPUs provide the best experience.
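For non-CUDA machines, a simple device-selection check using standard PyTorch calls (ROCm builds of PyTorch report themselves through the CUDA interface) might be:

```python
import torch

# Prefer CUDA (NVIDIA, or AMD via ROCm builds), fall back to Apple MPS, then CPU
if torch.cuda.is_available():
    device = "cuda"
elif torch.backends.mps.is_available():
    device = "mps"
else:
    device = "cpu"
print("Using device:", device)
```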

What the Community Is Saying
The reactions highlight the divide between AI music enthusiasts and traditional musicians:
“You’re paying your electric bill once a month when there’s an open-source model that runs on your own biological GPU and sounds just as good? Creative intuition generates commercial-quality full songs instantly. Runs on 4 cup coffee minimum. Unlimited length songs, batch generation, unlimited languages, covers, stem separation, metadata control.” — @_notsafeforyou
“Yeah it’s nowhere near Suno.” — @xmies23
“You’re using AI to regurgitate slop music for you because you don’t want to put in the work learning how to make music yourself, which long term is far more gratifying and rewarding? Uh, well, maybe don’t do that then.” — @marcreevessoundengineer
The polarized reactions mirror the broader debate around AI-generated music. ACE-Step 1.5 is not a replacement for human creativity; it is a tool that lowers the barrier to music production. For producers, the LoRA personalization and precise metadata control (BPM, key, genre) are genuinely useful features that fit into existing workflows. For musicians who value the creative process, it raises valid questions about authenticity.
Use cases for music creators
- Producers: Generate backing tracks, drum loops, and chord progressions at specific BPM and key, then import into your DAW
- Content creators: Generate custom background music for videos with precise mood control
- Songwriters: Use cover and repaint for iterative song development
- Game developers: Batch generate soundtracks with consistent style across tracks
ACE-Step 1.5 by ace-step is for musicians, creators, and AI developers. Like building a local multi-agent workforce with Eigent, it solves the problem of expensive, cloud-based AI by enabling fast, high-quality song creation locally on consumer GPUs with 4GB VRAM minimum, no subscription fees, and full offline access.
If you enjoy articles about top GitHub repositories like this, don’t forget to subscribe to Technolati.com.
