Researching existing tools

Our first step in approaching the subject was to understand how Wikimedia generates books from its articles. We found a wide range of existing tools and decided to test them with pages from various Wikimedia projects. We also learned that Wikimedia first concerned itself with printing content at the end of 2007, when it partnered with the German software development company PediaPress to provide a print-on-demand service.

“This technology is of key strategic importance to the cause of free education world-wide,” Sue Gardner, former Executive Director of the Wikimedia Foundation, said at the time.

From the tools we found, we chose six that export to EPUB, three of them MediaWiki extensions and the rest external to the wiki software: the Book Creator on WikiEducator, WSexport, EPubExport, GrabMyBooks, Pandoc and the Sausage Machine. The resulting EPUBs were sometimes close to our expectations: WSexport structures exported EPUB files quite well, and GrabMyBooks allows multiple websites to be turned into EPUBs. Nonetheless, each tool had its shortcomings. Below is a description of our findings.

WSexport was developed by the user Tpt, an administrator of the French Wikisource projects. He has already conducted many tests on this tool, which is currently hosted on Wikimedia Labs. The interface is similar to that of its counterpart, the Book Creator on Wikipedia, and it faces the same issues regarding the visibility of the option and its integration into the wiki.


Example of WSexport EPUB


GrabMyBooks is an HTML-to-EPUB converter. The generated ebooks exclude headers, footers and menus from the export. The Firefox plugin works by grabbing the content from the tabs open in the browser. There are still some structural problems in the output, for example with tables and the size of images. Although the tool is not optimised for Wikimedia content, it’s the only one we found that allows the user to collect articles from all the wiki projects. One feature that we found interesting, and that wasn’t available in the other tools, was the option to upload a book cover.


Example of GrabMyBooks EPUB

Pandoc is the “Swiss-army knife” of markup document conversion. Pandoc can generate an EPUB file from an XML file, but the formatting is not ideal and it cannot embed images in the exported EPUB.

Although WikiEducator is not part of the Wikimedia organisation, it is the place where the Book Creator tool was originally implemented. We believe that the version currently running there is the same version of the Wikipedia Book Creator that was replaced in 2014. It largely shares the same problems as the other Book Creator versions: usability, placement and difficulty rendering tables.

The Sausage Machine was made by Gottfried Haider during his time working at the PublishingLab. It’s a pipeline between Pandoc and Git that uses Markdown and makefiles. Because Markdown is a specific markup language that differs from MediaWiki markup, we ran into some expected difficulties exporting the file. Perhaps, if it were adjusted to MediaWiki markup, the Sausage Machine could be an interesting tool to build upon.
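As a rough illustration of what such an adjustment might involve, the sketch below builds a Pandoc command line that reads MediaWiki markup directly instead of Markdown. It is not part of the Sausage Machine, and we did not test it as part of this research; the function names, file names and title are hypothetical, and it assumes a Pandoc installation that provides the `mediawiki` input format and the `epub` output format.

```python
import shutil
import subprocess
from pathlib import Path


def build_pandoc_command(source: Path, output: Path, title: str) -> list[str]:
    """Return the argument list for a MediaWiki-markup-to-EPUB conversion.

    Hypothetical helper: only the pandoc options themselves are real.
    """
    return [
        "pandoc",
        "--from", "mediawiki",            # read MediaWiki markup, not Markdown
        "--to", "epub",                   # write an EPUB container
        "--metadata", f"title={title}",   # EPUB output expects a title
        "--output", str(output),
        str(source),
    ]


def convert(source: Path, output: Path, title: str) -> None:
    """Run the conversion, failing early if pandoc is not installed."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    subprocess.run(build_pandoc_command(source, output, title), check=True)


if __name__ == "__main__":
    # Hypothetical file names, shown only to illustrate the call.
    print(" ".join(build_pandoc_command(Path("article.wiki"), Path("book.epub"), "My Book")))
```

In a makefile-driven pipeline like the one described above, a command of this kind could take the place of the Markdown conversion step, although templates, tables and images in wiki pages would probably still need separate handling.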

EPubExport is a MediaWiki extension that has not been updated since 2010. From the existing documentation, it seems that it could only export one article at a time.

What we missed in all of them was a way for users to choose specific content within an article according to their own preferences. Testing the tools raised many questions that are now shaping the potential outcome of this research. How can the tool improve the experience of navigating wikis and retaining information? How can it help customize Wikimedia to specific user needs? Should the tool be hosted on the WMF servers? How can we make the experience of putting the articles together easier?


Location of the Book Creator extension


Moving on to the de facto book generator tool on Wikipedia, Book Creator, we were curious about why the EPUB function had been dropped. Was there no interest from the community? Or was there a lack of technical support?

The Book Creator is a tool available on most Wikimedia projects which allows both registered and unregistered users to collect a number of articles and export them as a PDF file. It is currently deprecated, with a notice stating that the existing issues will remain unresolved indefinitely.



As recounted by the last remaining project maintainer, C. Scott Ananian, the Book Creator was an initiative of PediaPress, an online service for creating customized printed material from Wikimedia content. The partnership was announced in a press release on the 13th of December 2007. The research and development were funded by the Open Society Institute and the product was released as open source so that it could be implemented in any wiki.

The purpose of the tool was to provide learning assistance for residents of areas without an internet connection. It was first released on the WikiEducator wiki platform, which is based on the same MediaWiki software as all the Wikimedia projects, and was ultimately installed on Wikipedia in 2008. In 2012 the EPUB and ZIM export options were made possible by brainbot technologies, the same software company behind PediaPress.

The Book Creator ran both on the servers of the WMF (Wikimedia Foundation) in Tampa and on the PediaPress servers, connecting printing requests from Wikimedia projects to the publisher. When the WMF moved its servers to a new location, the specifics of the configuration were not known and so the tool could not be replicated. However, the efforts of multiple Wikimedians, Ananian among them, led in 2014 to a reissue of the code that complied with the then latest server architecture. Unfortunately, due to a lack of technical support, the EPUB and ZIM functions were dropped.


The tool as it was until 2014 is still active on the WikiEducator site.