
Comparison of Popular NoSQL Databases (MongoDB, CouchDB, HBase, Neo4j, Cassandra)

There are many SQL databases available today, but I personally feel that the long dominance of SQL is coming to an end as everyone moves into the era of Big Data. Experts say SQL databases are not the best fit for Big Data, and NoSQL databases have emerged as a better fit because they offer more flexibility in how data is stored.
I want to compare a few popular NoSQL databases that are available at this point in time. A few well-known NoSQL databases are MongoDB, Cassandra, HBase, CouchDB and Neo4j.
NoSQL databases differ from each other far more than SQL databases do, so it is your responsibility to choose the appropriate NoSQL database for your application based on its use case. Let's do a quick comparison of these databases.

MongoDb

  • Written in: C++
  • Main point: Retains some friendly properties of SQL (queries, indexes)
  • License: AGPL (drivers: Apache)
  • Protocol: Custom, binary (BSON, i.e. Binary JSON)
  • Replication: Master/slave replication, and automatic failover via replica sets
  • Sharding: Built-in
  • Queries are JavaScript expressions
  • Runs arbitrary JavaScript functions server-side
  • Better update-in-place than CouchDB
  • Uses memory-mapped files for data storage
  • Performance over features
  • Journaling, enabled with the --journal option when starting the mongod server
  • Has geospatial indexing
  • On 32-bit systems, limited to about 2.5 GB
  • Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.
  • For example: For most things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.
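
To make "dynamic queries" over documents without predefined columns a little more concrete, here is a minimal in-memory sketch in Python. It is not the MongoDB driver API: the `people` data and the `matches` and `find` helpers are hypothetical, and only plain equality and a `$gt` condition are imitated.

```python
# Toy stand-in for a collection: documents need not share the same fields.
# All names and data here are illustrative assumptions, not MongoDB APIs.
people = [
    {"name": "Asha", "age": 31, "city": "Pune"},
    {"name": "Ben", "age": 24},
    {"name": "Chen", "age": 42, "city": "Pune", "tags": ["admin"]},
]


def matches(doc, query):
    """Return True if doc satisfies every condition in query.

    Supports plain equality and a {"$gt": value} condition only.
    """
    for field, cond in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(cond, dict) and "$gt" in cond:
            if not value > cond["$gt"]:
                return False
        elif value != cond:
            return False
    return True


def find(docs, query):
    """Return the documents that match the query document."""
    return [d for d in docs if matches(d, query)]


# The query is itself just data, so it can be built at run time.
result = find(people, {"city": "Pune", "age": {"$gt": 35}})
names = [d["name"] for d in result]
```

The point of the sketch is that a document missing a field (Ben has no `city`) simply fails to match; no schema change is needed to store it.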

Cassandra

  • Written in: Java
  • Main point: Best of BigTable and Dynamo
  • License: Apache
  • Protocol: Custom, binary (Thrift)
  • Tunable trade-offs for distribution and replication (N, R, W)
  • Querying by column, range of keys
  • BigTable-like features: columns, column families
  • Has secondary indices
  • Writes are much faster than reads (!)
  • Map/reduce possible with Apache Hadoop
  • All nodes are similar, as opposed to Hadoop/HBase
  • Best used: When you write more than you read (logging). If every component of the system must be in Java. ("No one gets fired for choosing Apache's stuff.")
  • For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that.) Writes are faster than reads, so one natural niche is real time data analysis.   
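
The (N, R, W) trade-off mentioned above is easier to see as arithmetic. In Dynamo-style systems, N is the number of replicas, W the number of replicas that must acknowledge a write, and R the number consulted on a read; a read is guaranteed to overlap the latest acknowledged write when R + W > N. The helper below is a hypothetical sketch of that rule, not part of any Cassandra client.

```python
def read_overlaps_latest_write(n, r, w):
    """Return True when every read set overlaps every write set (R + W > N).

    n: replicas per key, r: replicas read, w: replicas that must ack a write.
    This is an illustrative helper, not a Cassandra API.
    """
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


# Quorum reads and writes on three replicas overlap in at least one node.
quorum = read_overlaps_latest_write(3, 2, 2)
# Fast single-replica reads and writes may miss the newest value.
fast = read_overlaps_latest_write(3, 1, 1)
# Writing to all replicas lets a single-replica read stay consistent.
write_all = read_overlaps_latest_write(3, 1, 3)
```

Lower R and W make each operation cheaper at the cost of possibly stale reads, which is the "tunable" part of the trade-off.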

HBase


  • Written in: Java
  • Main point: Billions of rows X millions of columns
  • License: Apache
  • Protocol: HTTP/REST (also Thrift)
  • Modeled after Google's BigTable
  • Uses Hadoop's HDFS as storage
  • Map/reduce with Hadoop
  • Query predicate push down via server side scan and get filters
  • Optimizations for real time queries
  • A high performance Thrift gateway
  • HTTP supports XML, Protobuf, and binary
  • Cascading, Hive, and Pig source and sink modules
  • JRuby-based (JIRB) shell
  • Rolling restart for configuration changes and minor upgrades
  • Random access performance is like MySQL
  • A cluster consists of several different types of nodes
  • Best used: Hadoop is probably still the best way to run Map/Reduce jobs on huge datasets. Best if you use the Hadoop/HDFS stack already.
  • For example: Analysing log data.
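
As a rough picture of the BigTable-style model and of scans with filters, the following Python sketch keeps a sparse table in memory, sorted by row key, and applies a column filter while scanning a key range. It is only a toy under stated assumptions: the row keys, column names and the `scan` function are made up for illustration and do not correspond to the HBase client API.

```python
from bisect import bisect_left

# Sparse table: row key -> {"family:qualifier": value}. Rows may have
# completely different columns. All keys and values are illustrative.
rows = {
    "log#2013-01-01#0001": {"req:path": "/home", "resp:status": "200"},
    "log#2013-01-01#0002": {"req:path": "/login", "resp:status": "500"},
    "log#2013-01-02#0001": {"req:path": "/home", "resp:status": "404"},
    "user#42": {"info:name": "asha"},
}
sorted_keys = sorted(rows)  # rows are kept in row-key order


def scan(start, stop, column=None, value=None):
    """Yield (key, row) for start <= key < stop.

    If column is given, keep only rows where that column equals value,
    so the filter runs where the data lives instead of in the caller.
    """
    i = bisect_left(sorted_keys, start)
    while i < len(sorted_keys) and sorted_keys[i] < stop:
        key = sorted_keys[i]
        row = rows[key]
        if column is None or row.get(column) == value:
            yield key, row
        i += 1


# All log rows for one day: a contiguous range because keys are sorted.
day_keys = [k for k, _ in scan("log#2013-01-01", "log#2013-01-02")]
# Only failed requests among all log rows ("$" sorts right after "#").
errors = [k for k, _ in scan("log#", "log$", "resp:status", "500")]
```

Designing row keys so that related rows sort next to each other is what makes range scans like the first one cheap.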

CouchDB


  • Written in: Erlang
  • Main point: DB consistency, ease of use
  • License: Apache
  • Protocol: HTTP/REST
  • Bi-directional (!) replication, continuous or ad-hoc, with conflict detection; thus, master-master replication (!)
  • MVCC - write operations do not block reads
  • Previous versions of documents are available
  • Crash-only (reliable) design
  • Needs compacting from time to time
  • Views: embedded map/reduce
  • Formatting views: lists & shows
  • Server-side document validation possible
  • Authentication possible
  • Real-time updates via _changes (!)
  • Attachment handling
  • Thus, CouchApps (standalone JavaScript apps)
  • jQuery library included
  • Best used: For accumulating, occasionally changing data, on which pre-defined queries are to be run. Places where versioning is important.
  • For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.
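
The idea behind map/reduce views and "pre-defined queries" can be sketched in a few lines of Python. Real CouchDB views are typically written in JavaScript and stored in design documents; the `docs` data and the `map_fn`, `reduce_fn` and `build_view` names below are hypothetical and only imitate the shape of the computation.

```python
from collections import defaultdict

# Illustrative documents of mixed types, as a document store would hold.
docs = [
    {"_id": "a", "type": "customer", "country": "IN"},
    {"_id": "b", "type": "customer", "country": "DE"},
    {"_id": "c", "type": "invoice", "total": 120},
    {"_id": "d", "type": "customer", "country": "IN"},
]


def map_fn(doc):
    """Emit (key, value) pairs for one document; non-customers emit nothing."""
    if doc.get("type") == "customer":
        yield doc["country"], 1


def reduce_fn(values):
    """Combine all values emitted for one key."""
    return sum(values)


def build_view(documents):
    """Run the map step over every document, then reduce per key."""
    grouped = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            grouped[key].append(value)
    return {key: reduce_fn(vals) for key, vals in sorted(grouped.items())}


# Customers per country, computed once and then looked up by key.
view = build_view(docs)
```

Because the query is defined up front as a map and a reduce function, it suits data that accumulates and is read through the same few questions.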

Neo4j

  • Written in: Java
  • Main point: Graph database - connected data
  • License: GPL, some features AGPL/commercial
  • Protocol: HTTP/REST (or embedding in Java)
  • Standalone, or embeddable into Java applications
  • Full ACID conformity (including durable data)
  • Both nodes and relationships can have metadata
  • Integrated pattern-matching-based query language ("Cypher")
  • Also the "Gremlin" graph traversal language can be used
  • Indexing of nodes and relationships
  • Nice self-contained web admin
  • Advanced path-finding with multiple algorithms
  • Indexing of keys and relationships
  • Optimized for reads
  • Has transactions (in the Java API)
  • Scriptable in Groovy
  • Online backup, advanced monitoring and High Availability are AGPL/commercial licensed
  • Best used: For graph-style, rich or complex, interconnected data. Neo4j is quite different from the others in this sense.
  • For example: Social relations, public transport links, road maps, network topologies
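
To illustrate the kind of question a graph database is built for, here is a small in-memory Python sketch of path-finding over public transport links using breadth-first search. In Neo4j itself this would be expressed in Cypher or through its path-finding algorithms; the station names, the `links` map and the `shortest_path` function are assumptions made up for this example.

```python
from collections import deque

# Undirected transport links as an adjacency map (illustrative data).
links = {
    "Central": ["Museum", "Harbour"],
    "Museum": ["Central", "University"],
    "Harbour": ["Central", "Airport"],
    "University": ["Museum", "Airport"],
    "Airport": ["Harbour", "University"],
}


def shortest_path(graph, start, goal):
    """Return a path with the fewest hops from start to goal, or None."""
    if start == goal:
        return [start]
    previous = {start: None}  # node -> node it was first reached from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                if neighbour == goal:
                    # Walk the predecessor chain back to the start.
                    path = [goal]
                    while previous[path[-1]] is not None:
                        path.append(previous[path[-1]])
                    return path[::-1]
                queue.append(neighbour)
    return None


route = shortest_path(links, "Central", "Airport")
```

The relationships are the data here: the same question asked of tables would need repeated self-joins, which is why connected data is the natural fit for a graph store.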

Referred sources: kkovacs, Wikipedia
