StoryForge: AI-Powered Narrative with Dynamic Imagery and Voice

Dthirunavukkarasu, M

doi:10.55041/ijsrem32973

Abstract

StoryForge is an AI-powered narrative generation system that utilizes a combination of OpenAI’s GPT-3.5 Turbo model [1], Hugging Face’s Diffusion-based text-to-image generative model Stable Diffusion XL (SDXL) [4], and Hugging Face’s Text-to-Speech Bark model [10] to create dynamic and engaging stories tailored to user-specified constraints. Users can input parameters such as genre, number of characters, character details, key events, mood, tone, point of view, and reader age to guide the narrative generation process. The system leverages the power of GPT-3.5 Turbo to generate the story text [1], SDXL to generate images for key events [4], and Bark to synthesize speech for the narrative [10], providing a comprehensive and immersive storytelling experience. A Streamlit-based user interface enables seamless interaction with the system, allowing users to input constraints, view generated text and images, and listen to the synthesized audio. It transforms user inputs into a multisensory narrative experience, marking a significant stride in the intersection of artificial intelligence and imaginative storytelling. StoryForge demonstrates the potential of AI to revolutionize storytelling, empowering users to create personalized and captivating narratives with ease and embark on a journey where words, images, and voices converge to create truly one-of-a-kind tales.

Index Terms: StoryForge, GPT-3.5 Turbo, Bark, Stable Diffusion XL, Streamlit, Hugging Face, Generative Model, Storytelling, Multisensory
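The constraint-driven flow described in the abstract can be illustrated with a minimal sketch of how user inputs might be turned into prompts for the text and image models. This is not the authors' implementation: the `StoryConstraints` class, its field names, the two helper functions, and the prompt wording are all hypothetical, chosen only to mirror the parameters listed in the abstract. The calls to GPT-3.5 Turbo, SDXL, and Bark are deliberately left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StoryConstraints:
    """User-specified constraints named in the abstract (field names are hypothetical)."""
    genre: str
    characters: List[str]   # one short description per character
    key_events: List[str]   # events the story must include, in order
    mood: str
    tone: str
    point_of_view: str
    reader_age: int


def build_story_prompt(c: StoryConstraints) -> str:
    """Assemble a plain-text prompt for a text-generation model (assumed wording)."""
    lines = [
        f"Write a {c.genre} story for a reader aged {c.reader_age}.",
        f"Mood: {c.mood}. Tone: {c.tone}. Point of view: {c.point_of_view}.",
        f"Characters ({len(c.characters)}): " + "; ".join(c.characters),
        "Key events, in order:",
    ]
    lines += [f"{i}. {event}" for i, event in enumerate(c.key_events, start=1)]
    return "\n".join(lines)


def build_image_prompts(c: StoryConstraints) -> List[str]:
    """Build one text-to-image prompt per key event (assumed wording)."""
    return [f"{event}, {c.genre} illustration, {c.mood} mood" for event in c.key_events]


# Example input; every value here is made up for illustration.
constraints = StoryConstraints(
    genre="fantasy",
    characters=["Mira, a curious girl", "Tob, a talking fox"],
    key_events=["Mira finds a map", "Tob leads her to the hidden lake"],
    mood="whimsical",
    tone="gentle",
    point_of_view="third person",
    reader_age=8,
)
prompt = build_story_prompt(constraints)
image_prompts = build_image_prompts(constraints)
print(prompt)
```

In a pipeline of the kind the abstract describes, `prompt` would be sent to the text model, each entry of `image_prompts` to the image model, and the returned story text to the speech model. Keeping prompt assembly in pure functions like these makes that step testable without any model access.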

Full Text