How Do People Create Music with AI?: A Glimpse into AI Music Creation through Multimodal Data
1. Intro
Ever had that itch to make music? It's an urge most of us feel at least once. So why do so few of us act on it? Most likely, the path of mastering an instrument and grappling with music theory, harmony, and MIDI looks too steep.
Yet the narrative is shifting. AI is bridging the gap between us and music. Gone are the days of wrestling with an F chord or writing every note from scratch. If you can articulate your musical vision, AI can bring it to life. Worried the romance is missing? Consider the pure desire to create music: isn't that the essence of romance?
Let's look at the changing landscape of music creation, drawing on usage data from MixAudio.
As of February 2024, MixAudio, which made its beta debut in November 2023, stands at the forefront of AI music creation tools. It pioneers a multimodal approach, enabling music generation from prompts, images, and audio inputs. It has attracted over 12,000 users and generated over 360,000 tracks without prominent advertising, which speaks to its appeal.
2. Multimodal Music Creation: How It's Done
MixAudio offers three primary avenues for music creation: prompts, images, and audio inputs. Think of prompts as a ChatGPT for music, crafting tunes from textual descriptions. Images let users upload or link to visuals, inspiring AI to compose matching music. Audio inputs, whether through files or URLs, serve as reference tracks for AI to analyze and emulate in mood or genre. Let's explore how users are leveraging these tools.
Usage Frequency: The Preferred Method

The verdict? Prompts take the lead. Although users can mix and match methods, prompts alone accounted for 60% of usage, followed by the combination of prompts and audio at about 9%. Familiarity with natural-language AIs like ChatGPT may explain this preference, making prompts the go-to for music generation.
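For readers curious how a breakdown like this can be tallied, here is a minimal Python sketch. It assumes a hypothetical log in which each generation is recorded as the set of input types it used; the sample rows and the function name usage_share are invented for illustration and are not MixAudio's actual data or pipeline.

```python
from collections import Counter

# Hypothetical generation log: one set of input types per generated track.
# These rows are invented for illustration; they are not MixAudio data.
generations = [
    {"prompt"},
    {"prompt"},
    {"prompt", "audio"},
    {"image"},
]

def usage_share(records):
    """Return {combination: share of all records}.

    Each combination is a sorted tuple, so {"prompt", "audio"} and
    {"audio", "prompt"} are counted as the same method.
    """
    counts = Counter(tuple(sorted(r)) for r in records)
    total = sum(counts.values())
    return {combo: n / total for combo, n in counts.items()}

shares = usage_share(generations)
for combo, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(" + ".join(combo), f"{share:.0%}")
```

Sorting each combination before counting is the one design choice worth noting: it keeps the order in which a user attached inputs from splitting one method into several rows.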
User Satisfaction: What Works Best?

Usage doesn't always correlate with satisfaction. An intriguing twist in our data is that satisfaction peaks with audio-only inputs. Next comes the combination of all three inputs, and then the pairing of text and images. This suggests that non-text inputs help align the result with users' expectations, particularly when the music they seek is hard to pin down in words alone.
3. What Kind of Music Are People Making?
So what does the music itself look like? Which prompts are popular? What kinds of images are frequently used? And what about audio references? Let's dive into the data.
Prompt Keywords: What's Being Asked For?

The keywords 'Piano' and 'Calm' lead the pack by a wide margin, followed by 'Jazz', 'Quiet', and 'Slow', highlighting a preference for serene and mellow music. This trend seems to be driven by users seeking functional music for activities like studying or sleeping, where the music stays in the background.
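A keyword ranking of this kind can be approximated with a simple word count. The sketch below is only an assumption about how such a tally might be done: the sample prompts, the stopword list, and the function name top_keywords are all made up for illustration, and the real analysis may well use a different tokenizer or keyword extractor.

```python
import re
from collections import Counter

# Illustrative stopword list (an assumption, not the list used in the analysis).
STOPWORDS = {"a", "an", "the", "for", "with", "and", "of", "to", "music"}

def top_keywords(prompts, n=3):
    """Count lowercase alphabetic words across prompts, skipping stopwords."""
    words = []
    for prompt in prompts:
        tokens = re.findall(r"[a-z]+", prompt.lower())
        words.extend(t for t in tokens if t not in STOPWORDS)
    return Counter(words).most_common(n)

# Invented sample prompts; not taken from MixAudio users.
sample_prompts = [
    "Calm piano for studying",
    "Slow jazz with piano",
    "calm music for sleeping",
]
print(top_keywords(sample_prompts, 2))
```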
Image Inputs (Caption): What Visuals Inspire Music?

Captions of the images processed by MixAudio give us a glimpse into user intentions. Frequent keywords such as 'sitting', 'woman', and 'man' indicate a demand for music that complements human-centric visuals and scenarios.
Audio References (Caption): What Sounds Inspire Creation?

Audio inputs show a diverse array of genre and mood keywords, unlike the more focused trends seen with prompts and images. This variety suggests users have specific sounds in mind, often for content creation that demands a precise musical match.
4. When Is Music Being Made?

Analysis reveals peak creation times on Tuesdays, Wednesdays, and Thursdays, especially between 2 and 5 PM and between 9 and 11 PM. This midweek pattern, the afternoon window in particular, suggests that many people use MixAudio to boost productivity during work hours, which underscores the tool's practicality in business applications.
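Peak times like these come from bucketing creation timestamps by weekday and hour. Here is a small sketch of that idea, assuming timestamps are already in the user's local time; the sample timestamps and the function name peak_slots are hypothetical and only illustrate the bucketing step.

```python
from collections import Counter
from datetime import datetime

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]

def peak_slots(timestamps, n=3):
    """Return the n busiest (weekday name, hour) buckets with their counts."""
    counts = Counter((WEEKDAYS[ts.weekday()], ts.hour) for ts in timestamps)
    return counts.most_common(n)

# Invented creation times for illustration; not MixAudio data.
sample_times = [
    datetime(2024, 2, 6, 14, 5),   # a Tuesday afternoon
    datetime(2024, 2, 6, 14, 40),  # same Tuesday, same hour
    datetime(2024, 2, 7, 21, 15),  # a Wednesday evening
]
for (day, hour), count in peak_slots(sample_times):
    print(f"{day} {hour:02d}:00 -> {count} tracks")
```

Using a fixed weekday list instead of locale-dependent formatting keeps the bucket names the same on any machine.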
5. Outro
We've crossed into an era where creating music with AI is as simple as having a conversation, sharing an image, or playing a tune. Why not explore this new landscape and craft your own playlist, or a soundtrack for your videos and games? All it takes is imagination and an openness to what AI tools like MixAudio offer. The results may not be perfect from the start, but the journey promises to be liberating, especially for those still battling an F chord. Welcome to the new age of music creation, where everyone can be a musician. Go ahead and give it a try!