The Google Research team has published a paper on MusicLM, a machine learning model that generates high-fidelity music from text prompts, and it works extremely well. But they won't release it to the public, at least not yet.
You can browse and play through the examples to listen to results obtained by the research team for a wide variety of text-to-music tasks, including audio generation from rich captions, long generation, story mode, text and melody conditioning, painting caption conditioning, 10s audio generation from text, and generation diversity.
I'm particularly surprised by the text and melody conditioning examples, where a text prompt (say, "piano solo," "string quartet," or "tribal drums") can be combined with a melody prompt (say, "bella ciao - humming") to generate accurate results.
Even though they haven't released the model, Google Research has publicly released MusicCaps to support future research: "a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts."