Monday, January 03, 2011

Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase comparison

While SQL databases are insanely useful tools, their tyranny of ~15 years is coming to an end. And it was about time: I can't even count the things that were forced into relational databases but never really fit them.

But the differences between "NoSQL" databases are much bigger than they ever were between one SQL database and another. This puts a bigger responsibility on software architects to choose the appropriate one for a project right at the beginning.
In this light, here is a comparison of Cassandra, MongoDB, CouchDB, Redis, Riak and HBase:

CouchDB
Written in: Erlang
Main point: DB consistency, ease of use
License: Apache
Protocol: HTTP/REST
Bi-directional (!) replication,
continuous or ad-hoc,
with conflict detection,
thus, master-master replication. (!)
MVCC - write operations do not block reads
Previous versions of documents are available
Crash-only (reliable) design
Needs compacting from time to time
Views: embedded map/reduce
Formatting views: lists & shows
Server-side document validation possible
Authentication possible
Real-time updates via _changes (!)
Attachment handling
thus, CouchApps (standalone JS apps)
jQuery library included
Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.
For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.
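
To make "pre-defined queries" concrete, here is a minimal in-memory sketch of how a map/reduce view works. It is only a model written in Python: in CouchDB itself, views are typically JavaScript functions stored in design documents, and the sample documents, field names and helper names below (build_view, map_contact_city, reduce_count) are assumptions made up for illustration.

```python
from collections import defaultdict

# Hypothetical sample documents; the field names are illustrative only.
docs = [
    {"_id": "a", "type": "contact", "city": "Budapest"},
    {"_id": "b", "type": "contact", "city": "Vienna"},
    {"_id": "c", "type": "contact", "city": "Budapest"},
    {"_id": "d", "type": "invoice", "total": 120},
]


def map_contact_city(doc):
    """Map step: emit one (key, value) row per contact document."""
    if doc.get("type") == "contact":
        yield doc["city"], 1


def reduce_count(values):
    """Reduce step: sum the values emitted for a single key."""
    return sum(values)


def build_view(documents, map_fn, reduce_fn):
    """Run map over every document, group rows by key, reduce each group."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(values) for key, values in sorted(grouped.items())}


contacts_per_city = build_view(docs, map_contact_city, reduce_count)
print(contacts_per_city)  # {'Budapest': 2, 'Vienna': 1}
```

The point of the sketch is that the query (contacts per city) is fixed when the view is defined, which is why this style suits data on which the same questions are asked repeatedly.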

MongoDB
Written in: C++
Main point: Retains some friendly properties of SQL. (Query, index)
License: AGPL (Drivers: Apache)
Protocol: Custom, binary (BSON)
Master/slave replication
Queries are JavaScript expressions
Run arbitrary JavaScript functions server-side
Better update-in-place than CouchDB
Sharding built-in
Uses memory mapped files for data storage
Performance over features
After a crash, it needs to repair tables
Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.
For example: Anything you would do with MySQL or PostgreSQL, but where having predefined columns really holds you back.
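
As a rough illustration of what "dynamic queries" over documents without predefined columns look like, the sketch below matches plain Python dicts against a query written in the style of MongoDB query documents. It is a toy model, not MongoDB's implementation or driver API: the matches and find helpers and the sample data are hypothetical, and only four comparison operators ($gt, $gte, $lt, $lte) are modelled.

```python
import operator

# Comparison operators modelled in this sketch; MongoDB supports many more.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(doc, query):
    """Return True if doc satisfies every condition in the query dict."""
    for field, condition in query.items():
        if field not in doc:
            return False  # in this sketch, a missing field never matches
        value = doc[field]
        if isinstance(condition, dict):
            # Operator form, e.g. {"age": {"$gt": 30}}
            for op, operand in condition.items():
                if not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            # Plain form, e.g. {"city": "Budapest"} means equality
            return False
    return True


def find(collection, query):
    """Return all documents in the collection that match the query."""
    return [doc for doc in collection if matches(doc, query)]


# Hypothetical documents; note that they do not share the same fields.
users = [
    {"name": "Ann", "age": 34, "city": "Budapest"},
    {"name": "Bob", "age": 27, "city": "Vienna"},
    {"name": "Cid", "age": 41},
]

over_30 = find(users, {"age": {"$gt": 30}})
print([u["name"] for u in over_30])  # ['Ann', 'Cid']
```

Unlike a pre-defined view, the query here is just data built at request time, so a new question needs no change on the storage side. A real deployment would add indexes on the queried fields rather than scanning every document as this sketch does.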

Read more: Kristóf Kovács