Wednesday, January 22, 2014

Stackdump, an offline browser for StackExchange sites


Stackdump was conceived for those who work in environments that do not have easy access to the StackExchange family of websites. It allows you to host a read-only instance of the StackExchange sites locally, accessible via a web browser.

Stackdump consists of two components: the search indexer (Apache Solr) and the web application (written in Python). It uses the StackExchange Data Dumps as its source of data.

Stackdump (the application, not the content) is licensed under the MIT License. The content itself remains licensed under the cc-wiki license.

System requirements
Stackdump is written in Python (2.5 or later, but not 3) and relies on Apache Solr, which requires Java (6 or later). It was written and tested on CentOS, but should work on other Linux distributions too. It should also work on Windows and OS X, but the start scripts will need some tweaking, particularly on Windows.
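
As a rough illustration of the version constraints above, here is a minimal sketch of a pre-flight check. It is not part of Stackdump; the function names python_supported and java_supported are made up for this example, and the Java check assumes you pass it the text printed by `java -version` (which goes to stderr).

```python
import re
import sys


def python_supported(version):
    """Return True for Python 2.5 or later within the 2.x series."""
    major, minor = version[0], version[1]
    return major == 2 and minor >= 5


def java_supported(version_output):
    """Return True if `java -version` output reports Java 6 or later."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return False
    major = int(match.group(1))
    minor = int(match.group(2) or 0)
    # Java 8 and earlier report themselves as 1.x, e.g. "1.6.0_45".
    feature = minor if major == 1 else major
    return feature >= 6


if __name__ == "__main__":
    # Hypothetical usage: check the interpreter running this script.
    print("Python OK: %s" % python_supported(sys.version_info))
```

The Python check deliberately rejects 3.x, matching the "2.5 or later, but not 3" requirement. The Java check only looks at the first version string it finds, so unusual vendor output may need a different pattern.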

If you plan to import the largest StackExchange site, Stack Overflow (August 2012 data dump), then 3GB of RAM, at least 20-30GB of disk space, and around 10 hours are recommended. The other sites require far fewer resources, but 3GB of RAM is still recommended.

The September 2013 data dump of Stack Overflow took just over 23 hours to import on a VM with roughly the same resources, using the latest version of Stackdump, v1.2.
