Dr. Aly, O.
Computer Science
Introduction: For decades, the traditional database such as MySQL, PostgreSQL, SQL Server and Oracle are regarded to be a one-size-fits-all approach for data persistence and retrieval (Sakr & Gaber, 2014). However, these traditional databases are challenged by the increasing demand for scalability, the requirement for new applications, and some web-scale applications (Sakr & Gaber, 2014). The most common architecture to develop enterprise web application is based on the three-tier framework: the server layer, the application layer, and the data layer (Sakr & Gaber, 2014). The data partitioning and the data replication are two commonly used approaches to achieve the availability, scalability, and performance enhancement goals in the distributed data management. There are two main approaches to achieve scalability at the database layer to be able to handle the client requests when the load of the application is increased (Sakr & Gaber, 2014). The first approach is to scale up allocating a bigger machine to act as database servers. The second approach is to scale out replicating and partitioning data across more machines (Sakr & Gaber, 2014). However, the traditional database suffers from serious limitations. Database systems are not easy to scale as they cannot exceed a certain limit. The database systems are not easy to configure and maintain. The specialized database systems for main memory systems as the case with OLTP and column stores as the case with OLAP add more complication and cost when selecting the database system in addition to the peak provisioning unnecessary cost (Sakr & Gaber, 2014).
Thus, new systems called NoSQL started to emerge as an alternative mode for the traditional database systems to be able to deal and handle Big Data (Moniruzzaman & Hossain, 2013; Pokorny, 2013; Sakr & Gaber, 2014). Not Only SQL” known as NoSQL databases emerged to deal with Big Data. NoSQL database systems were developed by major internet companies such as Facebook, Amazon, and Google, when they were confronted with the Big Data challenges (Moniruzzaman & Hossain, 2013). NoSQL databases are found to be suitable for massive scheme-free datasets for Big Data management (Hu, Wen, Chua, & Li, 2014). NoSQL database systems are considered to be the potential data management solution for Big Data (Abbasi, Sarker, & Chiang, 2016).
NoSQL Data Storage and the Tradeoff between Consistency and Availability
The ACID properties are regarded to be one of the basic features of the traditional relational databases (Moniruzzaman & Hossain, 2013; Pokorny, 2013; Sakr & Gaber, 2014). ACID stands for “Atomicity,” “Consistency,” “Isolation,” and “Durability” (Pokorny, 2013). These ACID properties indicate “all or nothing” concept behind the traditional database (Pokorny, 2013). The relational database has been full compliance with ACID principle (Pokorny, 2013). In addition to the ACID properties, there is also CAP theorem which states that for any system sharing data it is impossible to guarantee all of these three properties simultaneously (Pokorny, 2013). These three properties of CAP include “consistency,” “availability,” and “partition tolerance” (Pokorny, 2013; Sakr & Gaber, 2014). Moreover, the traditional relational database is also characterized by a schema, where data is structured in tables, tuples, and fields (Moniruzzaman & Hossain, 2013; Sadalage & Fowler, 2013). The traditional consistency model is not adequate for distributed systems such as the Cloud environment (Sakr & Gaber, 2014).
There are two major consistency models; the strong consistency which includes the linearizability and serializability, and weak consistency which includes the eventual the causal, eventual, and timeline consistency model (Sakr & Gaber, 2014). The causal consistency model ensures total ordering between operations which have causal relations. The eventual consistency model ensures all replicas will gradually and eventually become consistent in the absence of updates (Sakr & Gaber, 2014). The timeline consistency model guarantees all replicas perform operations on one record in the same “correct order” (Sakr & Gaber, 2014).
As indicated in (Pokorny, 2013), the NoSQL database systems scale nearly linearly with the number of servers used (Pokorny, 2013). The reason for such capability for scaling nearly linearly is due to the use of “data partitioning” (Pokorny, 2013). In NoSQL database systems, the method of distributed hash tables (DHT) is often used, in which couples of (key, value) are hashed into buckets – partial storage spaces, each from that placed in a node (Pokorny, 2013). NoSQL is not characterized by a schema or structured data (Hu et al., 2014; Sadalage & Fowler, 2013). NoSQL systems are fast, highly scalable, and reliable (Hu et al., 2014). The term of “NoSQL” database indicates the loosely specified class of non-relational data stores (Pokorny, 2013). NoSQL databases mostly do not use SQL as their query language (Pokorny, 2013). They do not support operations of “JOIN” and “ORDER BY.” The reason is that because portioning data is done horizontally (Pokorny, 2013). The data in NoSQL is often organized into tables on a logical level and accessed only through the Primary Key (Pokorny, 2013).
NoSQL database systems are organized by data models (Hu et al., 2014). They are categorized as (1) key-value stores, (2) column-oriented databases, and (3) document databases (Hu et al., 2014). Most NoSQL database systems are key-value stores or big hash tables, which contain a pair of the key-value dataset (Pokorny, 2013). This approach of the key-value or big hash table increases the efficiency of the lookup (Pokorny, 2013). The key uniquely identifies a value typically a string but it can also be a pointer, where the value is stored. The value can be structured or completely unstructured typically in BLOB (binary large object) format (Pokorny, 2013). The pair of key-value can be of different types, and may not come from the same table (Pokorny, 2013). These characteristics of the NoSQL database using the pair of key-value concept, increase efficiency and scalability of the NoSQL database systems (Pokorny, 2013). Column-oriented databases store and process data by column instead of by row (Hu et al., 2014). The rows and columns will get split over multiple nodes to achieve scalability (Hu et al., 2014). The document-oriented databases support more complex data structures than the key-value stores (Hu et al., 2014). The document-oriented database systems are the most general models of the NoSQL databases (Pokorny, 2013). There are other data models for NoSQL dataset systems, including the graph databases that are not discussed in this paper. Table 1 illustrates some of these data models of the NoSQL databases reflecting the name of the databases for each model, producer, CAP option, and consistency, derived from (Hu et al., 2014).

Table 1: NoSQL Database Models. Adapted from (Hu et al., 2014)
In accordance (Hu et al., 2014; Moniruzzaman & Hossain, 2013; Pokorny, 2013), some of these NoSQL databases do not implement ACID fully, and can only be eventually consistent (Hu et al., 2014; Moniruzzaman & Hossain, 2013; Pokorny, 2013). NoSQL databases implement “eventual consistency” concept instead of “strong consistency,” where any updates and changes are replicated to the entire database eventually, but at any given time (Hu et al., 2014; Moniruzzaman & Hossain, 2013; Pokorny, 2013). The “eventually consistent” term means that the system will eventually become consistent but after some time (Hu et al., 2014; Moniruzzaman & Hossain, 2013; Pokorny, 2013). This principle of “eventual consistency” indicate more availability and greatly improve scalability at the expense of the full and immediate consistency (Hu et al., 2014; Moniruzzaman & Hossain, 2013; Pokorny, 2013). NoSQL database has particular architectures that use different possibilities of distribution, ensuring the availability of the data, and the access to the data using replication techniques. Figure 1 illustrates the characteristics of NoSQL databases, derived from (Moniruzzaman & Hossain, 2013).

Figure 1: Characteristics of NoSQL databases. Adapted from (Moniruzzaman & Hossain, 2013)
In accordance to (Chen & Zhang, 2014), the most popular NoSQL database is Apache Cassandra. Facebook in 2008 released Cassandra as open source (Chen & Zhang, 2014). Other NoSQL implementations include Apache Hadoop, MapReduce, MemcacheDB, and Voldemort (Chen & Zhang, 2014).
References
Abbasi, A., Sarker, S., & Chiang, R. (2016). Big data research in information systems: Toward an inclusive research agenda. Journal of the Association for Information Systems, 17(2), 3.
Chen, C. P., & Zhang, C.-Y. (2014). Data-intensive applications, challenges, techniques, and technologies: A survey of Big Data. Information Sciences, 275, 314-347.
Hu, H., Wen, Y., Chua, T.-S., & Li, X. (2014). Toward scalable systems for big data analytics: A technology tutorial. IEEE Access, 2, 652-687.
Moniruzzaman, A., & Hossain, S. A. (2013). NoSQL database: New era of databases for big data analytics-classification, characteristics, and comparison. arXiv preprint arXiv:1307.0191.
Pokorny, J. (2013). NoSQL databases: a step to database scalability in a web environment. International Journal of Web Information Systems, 9(1), 69-82.
Sadalage, J. P., & Fowler, M. (2013). NoSQL: A Brief Guide to the Emerging World of Polyglot Persistence (1st Edition ed.): Addison-Wesley.
Sakr, S., & Gaber, M. (2014). Large Scale and big data: Processing and Management: CRC Press.