Machine Learning Audio Books

I really enjoy audiobooks. I grew up on them and find that they are a great way to continue learning and to be entertained. However, there is a huge body of literature that isn't available in that format.

When my son was born, I got the idea to record fairy tales and short stories for him. That idea grew into a program that let me import and quickly edit short stories from public domain sources, convert them to audio using the Google Text-to-Speech API, and then arrange the text and audio on slides as a video that could be uploaded to YouTube.

The application was written in PHP using the Laravel framework. It managed the imported stories and tracked each one's editing progress from raw text to a finished script, then handled the conversion to audio and the arrangement on slides.

One of the fun challenges of this project was handling heteronyms and abbreviations.

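Heteronyms are words that share a spelling but differ in pronunciation, such as "lead" the metal and "lead" the verb. Special consideration and design went into identifying those words and selecting the proper pronunciation.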

Abbreviations required similar attention. For example, is "No." a response or an abbreviation for "number"? JavaScript and a custom API gave me access to a directory I had built of abbreviations and heteronyms, along with suitable options for substituting each one.
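To illustrate the idea, here is a minimal sketch in Python rather than the PHP and JavaScript the application actually used. The `SUBSTITUTIONS` table, its entries, and the function names `find_candidates` and `apply_choice` are all hypothetical. They only show how a lookup of ambiguous tokens might flag words in a line and offer an editor a set of replacements.

```python
import re

# Hypothetical lookup table: each ambiguous token maps to the spoken
# forms an editor could choose between. Entries are illustrative only.
SUBSTITUTIONS = {
    "No.": ["No.", "Number"],
    "St.": ["Saint", "Street"],
    "read": ["reed", "red"],
    "lead": ["leed", "led"],
}


def find_candidates(line):
    """Return (start, token, options) for each ambiguous token in line."""
    found = []
    for token, options in SUBSTITUTIONS.items():
        # Lookarounds keep "read" from matching inside "bread" or "reads".
        pattern = r"(?<!\w)" + re.escape(token) + r"(?!\w)"
        for match in re.finditer(pattern, line):
            found.append((match.start(), token, options))
    return sorted(found)


def apply_choice(line, start, token, replacement):
    """Replace the token at a known offset with the chosen spoken form."""
    return line[:start] + replacement + line[start + len(token):]
```

In this sketch the choice itself is left to the editor: `find_candidates` only points at the words that need a decision, and `apply_choice` writes the selected form back into the script.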

Designing for these cases made the editing process much simpler and faster.

The final steps were to convert the lines to audio, match the length of each audio clip to the display time of its slide, and generate a video file that could be played back.
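The timing step can be sketched roughly as follows, again in Python and under some assumptions: each line of the script has exactly one audio clip, the clip lengths in seconds have already been measured elsewhere, and a short pause is added after each clip. The `Slide` class, the `build_timeline` function, and the `padding` value are hypothetical names chosen for this example, not part of the original application.

```python
from dataclasses import dataclass


@dataclass
class Slide:
    text: str
    start: float     # seconds from the beginning of the video
    duration: float  # seconds the slide stays on screen


def build_timeline(lines, clip_lengths, padding=0.5):
    """Give each slide a display time equal to its clip length plus padding."""
    if len(lines) != len(clip_lengths):
        raise ValueError("each line needs exactly one audio clip")
    slides = []
    cursor = 0.0
    for text, length in zip(lines, clip_lengths):
        duration = length + padding
        slides.append(Slide(text, cursor, duration))
        cursor += duration  # the next slide starts when this one ends
    return slides
```

A timeline like this could then be handed to whatever tool renders the slides and audio into the final video file.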

While the ML-generated audio leaves some room for improvement, the ability to convert any body of text to audio makes up for it.

This project has reached a working, complete stage, but it is ongoing. I plan to return to it later to add some features.

Audio Books YouTube Channel

2022