From Data Science

SQL vs NoSQL: when to use?

There are many databases available in the market, and knowing which one to choose can be extremely tough. An excellent way to start excluding some options is to first have a clear understanding of the main differences between SQL and NoSQL databases.

In this article, we present a detailed comparison between these two different types of databases regarding structure, schema, scalability, query, and transactions.

Moreover, we explain when to use SQL or NoSQL databases and further provide a historical context for those interested in knowing how this dichotomy started.

Table of Contents

What is SQL?
    ➤  What is a relational database?
What is NoSQL?
SQL vs NoSQL: historical context
SQL vs NoSQL: comparison
    ➤  Structure
    ➤  Schema
    ➤  Scalability
    ➤  Query
    ➤  Database transactions: ACID vs BASE
SQL vs NoSQL: when to use?
Conclusion

What is SQL?

SQL is a programming language; however, it is not a general-purpose programming language like Java, Javascript, or Python. Instead, SQL follows a specific purpose: to access and manipulate data.

To be more precise, SQL stands for Structured Query Language. It is a query language that allows retrieving specific data from databases, and, in that sense, it is designed to access, store and manipulate relational databases.

What is a relational database?

A relational database is a type of database (usually organized into tables) that enables the recognition and access of data in relation to another piece of data within the same database. In other words, it stores related data across multiple tables, which are organized into columns and rows, and allow the user to query data (or information) from various tables simultaneously.

A relational database is a database that follows the relational model of data. To maintain a relational database, a Relational Database Management System (RDBMS) is used. Consequently, to operate on that system, many databases tend to utilize SQL in order to manage and query the database. Thus, SQL is a language that allows for communication with data in an RDBMS.

An important aspect to clear out is that SQL is not a database system itself. Truth be told, when comparing SQL vs NoSQL, the main differences being assessed are relational databases vs non-relational databases (as well as distributed databases).

Another aspect to consider is that SQL is not the only programming language able to query relational databases, but it is definitely the most popular one. Therefore, the terms "SQL databases" and "relational databases" are often interchangeably used. MySQL, PostgreSQL, Microsoft SQL Server and Oracle Database are among the most well-known RDBMS using SQL.

What is NoSQL?

NoSQL refers to non-relational databases and to distributed databases. NoSQL can also stand for "Not Only SQL" to highlight that some NoSQL systems may also support SQL query language. In fact, before moving on, it is important to keep in mind that NoSQL does not necessarily mean that a database does not support SQL. Instead, it means that the database is not an RDBMS.

While traditional RDBMS rely on SQL syntax to store and query data, on the other hand, NoSQL database systems use other technologies and programming languages to store structured, unstructured or semi-structured data.

SQL vs NoSQL: historical context

The relational model of data was introduced in 1970 by E.F. Codd. Four years later (1974), Raymond Boyce and Donald Chamberlin introduced SQL, which was initially developed to query IBM's System R, a database management system.

It did not take long until SQL became a massive success among relational database systems due to its incredible practicality and ability to reduce data duplication. Thus, soon, and for a long time, it was considered the predominant language of relational database systems. But then, one tiny thing happened: the World Wide Web, in 1989.

The consequences? More data. A lot more data. As we know, the Internet's growth was not slow, and as new sources and volumes of data kept disrupting our world, relational databases started struggling.

At the beginning of the 21st Century, to handle this tremendous amount of data, non-relational systems, such as Bigtable (by Google, in 2006) and Dynamo (by Amazon, in 2007), started making their own way. The focus was on scalability and rapid application.

During these years, and as more non-relational databases started showing up, the concept of NoSQL became very popular (even though the term was first coined in 1998 by Carlo Strozzi, and non-relational databases exist since the 60s).

Did the beginning of the Century represent the end of SQL and, consequently, of relational databases? - No, of course not. For two reasons:

  1. Relational databases were (and are) still incredibly useful and offered a lot of advantages. Plus, SQL is a very well developed and admired query language that keeps dominating database systems.
  2. NoSQL had its flaws. Since each NoSQL database had a different query language, there were a lot more languages to learn. Plus, some additional challenges included extra difficulty connecting databases to applications, and third-party ecosystems (to provide visualization and operational tools) were missing.

Time has taught us that SQL databases are not better or worse than NoSQL databases. They are simply prefered and more suitable for different applications regarding database management systems (DBMS).

According to the 2021 DZone Data Persistence Trend Report (image below), relational databases are the most popular DBMS. However, NoSQL databases refer to all non-relational DBMSs (including Graph, document-oriented, key-value, column-oriented, and others). Therefore, combined, NoSQL databases are currently more popular than relational databases.

Most Popular DBMS Storage Models
Source: DZone Data Persistence Trend Report

In sum, the right choice when it comes to SQL vs NoSQL depends first and foremost on knowing the type of database that fits each business or organization's purposes better. Before we move on to when to use each, let's first look at their differences.

SQL vs NoSQL: comparison

Structure

SQL databases organize and store data by tables with fixed columns and rows. Contrarily, NoSQL databases can be stored in various ways:

  • Document (JSON);
  • Wide-column (tables organized with rows and dynamic columns);
  • Key-value pairs;
  • Graph databases (organized with nodes and edges).

Schema

SQL databases require a fixed predefined schema, and all data must follow a similar structure. Consequently, a lot of preparation regarding the system is required upfront. Plus, flexibility is compromised, considering that potential modifications in the structure can be complex, highly complicated, and may disrupt the system.

In turn, NoSQL databases follow a dynamic schema for unstructured data. Since it does not require a predefined structure, modifications are easier to execute. Therefore, NoSQL databases have greater flexibility; however, flexibility may also compromise reliability despite the advantages.

Scalability

Regarding scalability, SQL databases follow a vertical approach, also known as "scale-up". In databases, this means that it is possible to increase the amount of data in a single server by adding more power to an existing machine by using, for instance, a CPU, RAM, or SSD.

On the other hand, NoSQL databases scale horizontally (also known as "scale-out") since they scale across commodity servers, which means that more servers are added to the pool of resources, and data can be distributed across those resources.

JOIN operations allow to connect and relate pieces of data. Generally, NoSQL databases (can but) are not designed to support JOINs efficiently. Objects can be on different servers in non-relational database systems without being concerned about joining tables from multiples servers.

Thus, NoSQL allows for easy scaling by sharding data and having a routing layer that can redirect the query to the appropriate shard, making the NoSQL databases highly scalable and fast to query. However, it compromises data integrity and does not follow an ACID approach.

"Scaling out" in RDBMS is generally more difficult to implement due to the ACID concepts relational databases follow. For a multi-server RDBMS to maintain data integrity across transactions, it would require a fast backend communication channel. This channel would have to synchronize all the writes and transactions, as well as prevent possible deadlocks.

Even though it is technically possible to scale out in RDBMS, these database systems typically scale up to ensure data integrity and ACID principles instead of distributing data across multiple servers.

Query

As previously mentioned, SQL has been around for a long time; thus, it is widely admired as a mature and popular language that benefits from a reliable reputation. It is incredibly efficient when it comes to querying data, manipulating and retrieving data from relational databases. Plus, it also stands out for being declarative and lightweight.

Another great SQL advantage is that it is relatively easy to learn, meaning that marketers and business analysts can use it without necessarily requiring technical staff's help.

When it comes to running NoSQL queries, it might not be as straightforward as SQL databases since it usually needs to execute extra data processing and does not have a declarative query language. Therefore, these tasks are usually performed by data scientists or developers.

All in all, how to run queries in NoSQL databases depends a lot on the database in question. For instance, in MongoDB, to request data from the JSON documents database, it is necessary to specify the documents with the properties that the results should match and apply the following function: db.collection.find(). Other popular solutions might include creating the query functionality directly into the application layer (and not into the database layer) or implementing MapReduce.

Database transactions: ACID vs BASE

SQL databases typically follow ACID properties regarding transactions. ACID stands for Atomic, Consistent, Isolated, and Durable. Let's take a closer look to understand more precisely what it means:

  • Atomic: ensures all the data in the database is necessarily validated. If each data transaction is not properly carried out, then the process returns to the initial state.
  • Consistent: ensures that a processed transaction of data does not damage the database's structural integrity.
  • Isolated: each transaction is isolated from other data transactions. Hence, a transaction cannot compromise the integrity of other transaction.
  • Durable: the data related to the processed transaction will not impact the manipulated data, even if a transaction fails.

As one can observe, the ACID model secures that a transaction is reliable and consistent. Therefore, databases that follow this model are the best fit for organizations and businesses that cannot risk invalid and interrupted data transactions or any other error (e.g. financial institutions).

Relational database management systems (such as MySQL, SQLite, PostgreSQL, etc.) are ACID compliant. However, even though the NoSQL databases approach usually goes against ACID principles, some NoSQL databases (e.g., MongoDB, IBM's Db2, and Apache's CouchDB) can also integrate and follow ACID rules.

In non-relational databases, the data reliability and consistency of being ACID compliant is usually not a number one priority, considering that it can compromise speed and high availability.

For NoSQL databases, the priority tends to focus on flexibility and on a high transaction rate. For this reason, the BASE model is followed in many NoSQL database systems. It stands for Basically Available, Soft state, and Eventually consistent.

  • Basically Available: ensure data availability by expanding and replicating data across the database cluster's nodes.
  • Soft state: developers are responsible for ensuring database consistency.
  • Eventually consistent: consistency is not immediate, but it can be achieved and meanwhile, it is still possible to read the data.

NoSQL databases usually follow the BASE model, which offers more elasticity than the ACID model. As mentioned, ACID is better for businesses and organizations that need to ensure each transaction's consistency, predictability and reliability.

Contrarily, the BASE model is more suitable for businesses that prioritize high availability, scalability and flexibility of data transactions. For instance, a social network app handles massive amounts of data that are often not very well structured; thus, in that case, a BASE model might make it easier (and faster) to store data.

SQL vs NoSQL: comparison table

DBMS SQL databases NoSQL databases

Type

Relational database.

Non-relational database.

Structure

SQL databases organize and store data by tables with fixed columns and rows

NoSQL databases can be: graph, document-oriented, key-value, column-oriented, and others.

Schema

Fixed schema.

Dynamic Schema.

Scalability

SQL databases follow a vertical approach.

NoSQL databases scale horizontally.

Query

SQL is typically the predominant query language.

How to run queries in NoSQL databases depends a lot on the database in question; no declarative query language.

Database Transactions

SQL databases typically follow ACID properties regarding transactions.

The BASE model is followed in many NoSQL database systems.

Priority

Data integrity, consistency, and stability.

Flexibility, fast queries, and scalability.

SQL vs NoSQL: when to use?

Now that we have covered the main differences between SQL and NoSQL, it is time to explain when to use one or the other. Before making a final decision, it is essential to consider the following aspects:

  1. The type of data in question;
  2. The amount of data;
  3. How will the database be managed?

When to use SQL?

Regarding the first aspect, SQL databases are a more suitable option than NoSQL when data integrity and consistency is key within an organization.

There is often the misconception that relational databases are not a good option to handle large amounts of data. That is not exactly true. Many SQL databases, such as PostgreSQL and MySQL, can indeed handle very respectful amounts of data.

However, since RDBMSs that use SQL have a fixed schema and require that the data is structured, it will probably become very challenging to keep up with the required maintenance, agility and performance that, for instance, a business handling Big Data might demand.

At first sight, it might seem that having a fixed schema is limiting. Well, again, it depends on the purpose. Having a predefined schema database also makes SQL databases the most appropriate option for handling payroll management systems or even for processing flight reservations. In fact, most banking institutions rely on a SQL database system.

As we have previously explained, relational databases are typically ACID compliant, meaning that data transactions ensure integrity, validity and reliability. Plus, SQL might limit some features, but it is also a very mature technology.

Moreover, a relational database and SQL offer a lot of support regarding ad-hoc queries. This type of databases is usually easier to manage. Since SQL is a popular query language and relatively easy to learn, it does not necessarily need a large team of engineers to maintain it.

When to use NoSQL

NoSQL databases are able to store various types of data and do not need to be as structured as SQL databases. Hence non-relational databases allow for great adaptability and flexibility, making it a more suitable choice when handling large sets of unstructured and unrelated data.

Typically, the more extensive the dataset, the more likely a NoSQL database is a better option. Non-relational databases tend to excel at scalability and availability requirements, being ideal for social networks and real-time applications (e.g., online gaming, instant messaging), for example.

NoSQL databases require programming knowledge. Unlike SQL, which can also be learned by staff from other fields such as management and marketing, NoSQL databases usually need someone with a background in coding and the ability to acquire other languages according to the database systems being used.

Conclusion

Choosing the proper database is not a straight and precise decision, even for experts. Deciding whether to go for relational or non-relational databases is a great way to start. Still, it is also essential to consider the many SQL and NoSQL options available in the market.

For instance, for a lot of unstructured data, CouchDB or MongoDB can be a good option, but maybe for high-availability, Redis and Cassandra might be more suitable. And these are all non-relational database systems!

On the other hand, SQL databases offer many advantages regarding data transactions and overall data integrity. Moreover, relational databases' relationships can be easily identified and defined, making it straightforward to identify critical insights.

Found this article useful? You might like these ones too!

At Imaginary Cloud, we simplify complex systems, delivering interfaces that users love. If you’ve enjoyed this article, you will certainly enjoy our newsletter, which may be subscribed below. Take this chance to also check our latest work and, if there is any project that you think we can help with, feel free to reach us. We look forward to hearing from you!