Mary Had a Little Lamb*

The PublishingLab is currently researching an experimental branch of the Hybrid Publishing Workflow: digital audiobooks. Nowadays, these basically consist of audio files and metadata tags, and they support bookmarking, meaning that listeners can resume from where they left off in their last session. Audiobooks make written work accessible to people with visual impairments, but they are also used as an alternative to reading by commuters and by people who exercise.

Audiobooks are not recent: recording and reproducing spoken words has been possible since the invention of the phonograph by Thomas Edison in 1877. However, a phonograph cylinder could hold no more than a few minutes of recording. It was not until the 1930s that it became possible to record novels. Back then, they were called talking books.

*Edison’s 1877 tinfoil recording of Mary Had a Little Lamb, made while testing the phonograph, is reported to be the first instance of recorded verse. This recording no longer exists. During the Golden Jubilee of the Phonograph ceremony in 1927, Edison made a reference to the original recording. You can listen to the 1927 recording on the Internet Archive.

Jumping back to the present: digital reading experiences exist in a myriad of formats. Some are responsive or fluid, meaning they can change or adapt according to external factors (a user preference, a device setting, etc.); others are not. EPUB and HTML are examples of fluid formats. PDF, on the other hand, has fixed formatting: whether you print it or view it on screen, big or small, you will see exactly the same output. In this sense, audiobooks can be seen as the audio counterpart of PDFs: listeners get the same output regardless of their device or of user settings such as a preferred voice or narration speed. In short, the workflow to create an audiobook would consist of the following steps:

  • text-to-speech conversion – this step will produce audio files, ideally one file per chapter
  • audio file concatenation – this step will merge the files produced in the previous step into one file and embed metadata such as chapter titles and durations, supplied as a text file
  • file renaming – this step will turn the merged file into an M4B file (an MPEG-4 audio format that supports bookmarking)
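To make the last two steps more concrete, here is a minimal Python sketch. It assumes that the text-to-speech step has already produced one audio file per chapter and that FFmpeg does the merging. The file names, the book title, the chapter durations and the 64 kbit/s AAC bitrate are hypothetical placeholders (in practice, durations could be read from each file with ffprobe). The script writes nothing to disk: it only builds the input list for FFmpeg's concat demuxer, the chapter metadata in FFmpeg's FFMETADATA format, and the command line.

```python
from pathlib import Path


def escape_ffmetadata(value: str) -> str:
    """Backslash-escape the characters FFmpeg's metadata format treats specially."""
    # The backslash goes first so the escapes added afterwards are not doubled.
    for ch in ("\\", "=", ";", "#", "\n"):
        value = value.replace(ch, "\\" + ch)
    return value


def concat_list(files: list[str]) -> str:
    """Input list for FFmpeg's concat demuxer: one `file '...'` line per chapter."""
    lines = []
    for f in files:
        # Inside single quotes, a literal quote is written as '\''.
        quoted = f.replace("'", "'\\''")
        lines.append(f"file '{quoted}'")
    return "\n".join(lines) + "\n"


def chapter_metadata(book_title: str, chapters: list[tuple[str, int]]) -> str:
    """FFMETADATA1 text with one [CHAPTER] block per (title, duration in ms)."""
    out = [";FFMETADATA1", f"title={escape_ffmetadata(book_title)}"]
    start = 0
    for title, duration_ms in chapters:
        end = start + duration_ms
        out += [
            "[CHAPTER]",
            "TIMEBASE=1/1000",  # START/END are expressed in milliseconds
            f"START={start}",
            f"END={end}",
            f"title={escape_ffmetadata(title)}",
        ]
        start = end  # chapters follow each other without gaps
    return "\n".join(out) + "\n"


def ffmpeg_command(list_file: str, metadata_file: str, output: str) -> list[str]:
    """Merge the chapter files, re-encode to AAC and attach the chapter metadata."""
    return [
        "ffmpeg",
        "-f", "concat", "-safe", "0", "-i", list_file,  # input 0: the audio
        "-i", metadata_file,                             # input 1: the metadata
        "-map", "0:a",
        "-map_metadata", "1",
        "-map_chapters", "1",
        "-c:a", "aac", "-b:a", "64k",
        output,
    ]


if __name__ == "__main__":
    # Hypothetical chapter files, titles and durations (ms).
    chapters = [
        ("chapter01.wav", "Chapter 1", 312_000),
        ("chapter02.wav", "Chapter 2", 287_500),
    ]
    print(concat_list([f for f, _, _ in chapters]))
    print(chapter_metadata("Mary Had a Little Lamb", [(t, d) for _, t, d in chapters]))
    output = Path("book.m4a")
    print(" ".join(ffmpeg_command("list.txt", "metadata.txt", str(output))))
    # After running the command: output.rename(output.with_suffix(".m4b"))
```

Running the printed command (for example with subprocess.run, after saving the two text files) would produce book.m4a. Renaming it to book.m4b is the third step; the extension is what signals to most players that the file is a bookmarkable audiobook.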

In practice, these steps will be broken down into more specific tasks, but the steps above already give an overview of the process. The first step, converting text to speech, is where the design decisions are made. To use the PDF comparison again, this is the moment when the font family, font size, color palette and margins are defined. In an audiobook, it means, for instance, choosing the voice that narrates the text and deciding how to deal with images, links and references. Adding this output to an automated workflow already implies that we intend to use synthetic voices instead of hiring human narrators. But more on this topic in a future blog post.
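Decisions about images, links and references have to be applied to the text before it reaches the speech synthesizer. The following Python sketch shows one possible policy, using only the standard library's html.parser. The policy itself is a hypothetical example, not a settled choice: images are announced by their alt text (and skipped if they have none), links are read as their visible text only (the URL is dropped), and footnote markers wrapped in <sup> are not read aloud.

```python
from html.parser import HTMLParser

# Elements that start a new block of text; a space keeps words from merging.
BLOCK_TAGS = {"p", "h1", "h2", "h3", "h4", "li", "br", "div"}


class NarrationText(HTMLParser):
    """Collects the text a synthetic voice would read, following a hypothetical policy."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.skip_depth = 0  # > 0 while inside an element that is not narrated

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self.parts.append(" ")
        if tag == "sup":  # footnote markers such as <sup>1</sup>
            self.skip_depth += 1
        elif tag == "img" and self.skip_depth == 0:
            alt = dict(attrs).get("alt")
            if alt:  # images without alt text are skipped silently
                self.parts.append(f" Image: {alt}. ")

    def handle_endtag(self, tag):
        if tag == "sup" and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Link text arrives here like any other text; the href is never read.
        if self.skip_depth == 0:
            self.parts.append(data)

    def text(self) -> str:
        # Collapse runs of whitespace into single spaces.
        return " ".join("".join(self.parts).split())


def narration_text(html: str) -> str:
    parser = NarrationText()
    parser.feed(html)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    chapter = (
        "<h1>Chapter 1</h1>"
        '<p>Mary had a <a href="https://example.org">little lamb</a><sup>1</sup>.</p>'
        '<p><img src="lamb.png" alt="A lamb"></p>'
    )
    print(narration_text(chapter))
```

For the sample chapter above, the sketch prints "Chapter 1 Mary had a little lamb. Image: A lamb.", which is the plain text that would then be handed to the text-to-speech engine. Other policies (reading URLs aloud, collecting footnotes at the end of the chapter) would only change the handler methods.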

A brief online search about the state of the market for audiobooks indicates that it is growing:

Audiobook titles produced per year, 2010 to 2014

Year    Titles produced
2010     6,200
2011     7,237
2012    16,309
2013    24,755
2014    25,787
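A few lines of arithmetic make the trend in the table more precise. The Python sketch below uses only the figures above: it computes the year-over-year change and the compound annual growth rate (CAGR) between 2010 and 2014. By these numbers, output grew roughly 4.2-fold over four years (about 43% per year compounded), but unevenly: from about +125% in 2012 to about +4% in 2014.

```python
# Audiobook titles produced per year, as listed in the table above.
titles = {2010: 6200, 2011: 7237, 2012: 16309, 2013: 24755, 2014: 25787}


def yoy_growth(counts: dict[int, int]) -> dict[int, float]:
    """Percentage change of each year relative to the previous one."""
    years = sorted(counts)
    return {y: (counts[y] / counts[p] - 1) * 100 for p, y in zip(years, years[1:])}


def cagr(counts: dict[int, int]) -> float:
    """Compound annual growth rate, in percent, between the first and last year."""
    years = sorted(counts)
    first, last = counts[years[0]], counts[years[-1]]
    return ((last / first) ** (1 / (years[-1] - years[0])) - 1) * 100


for year, pct in yoy_growth(titles).items():
    print(f"{year}: {pct:+.1f}%")
print(f"CAGR {min(titles)}-{max(titles)}: {cagr(titles):.1f}%")
```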


These reports indicate that two pillars of the current growth in the audiobook market are the use of celebrities as narrators and the genre (fiction). Neither applies to the audiobooks we will make. Nevertheless, making them can be an interesting experiment and may yield important insights into this alternative mode of reading.
