SQL vs. NoSQL: Database differences explained

What is NoSQL, and what does it bring to the table?
June 11, 2021

In this post we will discuss what SQL and NoSQL are, and their differences. But before we do that, you must have some basic idea of how computer systems store data. Computer systems basically process data and provide results. This data could be coming from a file system or a database or streaming via the network. More often than not, you cannot process all the data instantaneously, so you have to store some of the data until you process it, and later store the processed results somewhere. 

One way to store data, such as intermediate state, final results and other states; is to store it in a database. A database is an ordered collection of data, that lets you search and query itself, to get a desired subset of its data. The database could be a strictly ordered collection, where all data items take up a certain fixed data type, have some similar properties, and are arranged in a prescribed order. The order decides the relation between the data items. The below results in a relational database. 

E.g. Actor-Movie-Reviews-Relational 



Consider the above example, we have a table of actors, another table for movies, and yet another table for reviews. All actors data has to be stored in the actors table, with the predefined imposed structure of having fields like actor_name, a system generated unique actor_id, followed by date_of_birth and so on. Similarly for other tables. These tables are then related(JOIN) via the unique IDs. Since there can be many actors in a movie and an actor could act in many movies, we can create a relationship table, actor_movies relationship table. 

When you want to know which movies and actor has acted in, you go to the actor table and get this actor's actor_id, then go to the actor_movies relationship table, take all the movie_id's for this actor, then go to the movies table and search all the movies with the given id's and then fetch all the movie details and show them to the user. You JOIN actor_movies table with the movies table and the actor table. This join is good as long as the data items are limited, i.e. number of movies, actors are not in millions. The above approach saved memory and computing time, when computing resources are expensive and scarce. If the number goes to the millions, this "JOIN" will become time consuming and inefficient. 

Another flavour of a database could be non-relational, here there is not strict order/shape, the data are grouped based on one or more similar properties; but each data item could have its own unique properties/information. Items are grouped informally and based on a few characteristics only, there are no strict rules each data item should adhere to. The above is called a non-relational database. 

E.g: We can combine movies+reviews in 1 data structure and make a coalesced JSON based data item. 

Now, even if the number of movies that the actor worked is in millions, you fetch the movies+actors data together, without the need of joining the tables, and keep it in cache if it is searched frequently. These days, computing resources are powerful, memory is cheaper and CPU's have become very powerful. This approach, with the advancement and easy availability of computing(hardware) resources, is a better approach in some applications these days. Above was a short introduction to the ways we can save data, based on the structure of our data and computing resources available. These two separate approaches to saving data, lead us to 2 separate approaches to saving+querying data, namely SQL for relational tables, and NoSQL for non-relational tables. 

What is SQL ? 

SQL is the short form of Structured Query Language. SQL is used to query databases that have a strict prescribed structure and order, i.e. relational databases. So, SQL is used to define tables and the fields(columns) within them, the data type of each of those columns and some rules(integrity constraints) that apply to the table/columns in order to maintain the semantics of the data.  

E.g. You cannot have a row in the class table without it being associated with a teacher. Next, SQL is used to insert data, query the data and to update the data in the tables. For frequently used queries and conditional computations, you can define stored procedures in SQL, and re-use them whenever the need arises. SQL is also used to create views, a subset of the data, conforming to a certain set of conditions. 

Further, you can allow/limit/restrict access to these views to a certain group of users. Also, SQL is used to create and manage indexes, functions and triggers etc. on the database. There are different dialects of SQL and ANSI also provides standard, but all dialects support at least the major commands (such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a similar manner. 

What is NoSQL?

NoSQL is a collective term, used to describe non-relational databases and the language to access the data inside them. Initially NoSQL acronyms came from non-SQL, but lately most non-relational databases have also started to provide support for SQL, for ease of adaptation and compatibility. Hence, NoSQL today also means "not only SQL", i.e. for a non-relational database, you can use SQL as well as the scripting language provided by the database vendor also. 

A non-relational database stores data in a non-tabular form, it tends to aggregate all properties of an object in a single record, it's more flexible than the relational database. NoSQL these days serves as an umbrella term for non-relational databases, their storage and access methods. Before we move further, it's necessary to discuss why the need for NoSQL databases arose. Relational databases are very good and some applications require relational databases only. Applications like a library or school management system, where the data is closely related/linked to each other, mandate the use of relational databases. 

E.g. Every book will have at least 1 author and a number of copies in the library. 

So, inside the database, one can easily define and maintain an integrity constraint that before making an entry in the books table, its author(s) must exist in the authors table. A library management system generates data about issue and submission of books, apart from books inventory in the library. So, the data being generated has well defined structure, and is not generated at a hectic pace. 

With relational databases, things are good till the data storage is to the tune of a few GBs and you are sure about the structure and speed of incoming data. Compare the above scenario with some social media applications, like facebook of today. There are millions of users online on the platform at any given time, they share not only images and video, but a lot of text/documents/links/stories etc. 

There are millions of comments, likes and mentions every minute. All this data can be in many formats, and is generated at a hectic pace. Keeping all this data in a relational database, and making thousands of table JOINs every second, is simply not possible. Above is a very simplistic view of data demands that today's applications have, the real picture is much more overwhelming and complex. 

Hence, the search began for something that could scale to the demands of such applications, have a reasonable throughput even in peak hours/seasons and is flexible enough to hold data whose type will be determined at the time of its usage( not storage). Non-relational database solutions existed even in the 1960s, but were not in much use as the memory and storage costs of computers were prohibitively high in those times. The search for non-relational databases led to further advancement in such databases, and the term "noSQL" was coined. 

SQL vs NoSQL: how is SQL different from NoSQL? 

Lets now discuss some key differences between SQL and NoSQL databases. This will help you understand the paradigms better and plan your database interactions accordingly. 

1. Transactions and atomicity - 

There are 4 key properties that define a "transaction" - Atomicity, Consistency, Isolation, and Durability; or ACID in short. 

  • Atomicity refers to the property of each statement in a transaction (to read, write, update or delete data) to be treated as a single unit. So, the Atomicity principal says that either the whole of the statement is executed or none of it is executed. 

E.g. In our library management system, if you delete an "author" all his books must also be deleted. 

If somehow the system is not able to delete all his books, then the initiating transaction of deleting an author must also be retracted. Either the whole transaction takes place OR the database goes to the previous state, the state it was in before the transaction was initiated. 

  • Consistency principle demands that transactions should make changes to table(s) in predictable/predefined ways, such that the table is consistent with respect to integrity
    constraints before and after the transaction. There are no surprises and any errors in your data do not result in unforeseen consequences for the  table. 
  • Isolation - Each transaction must occur in isolation, not making any effect on other transactions happening in the system. When many transactions are taking place simultaneously, it should seem as if they take place in sequential order, and concurrent transactions don't interfere/influence one another.  
  • Durability - Durability principal demands that changes made to the database by virtue of a transaction must be durable, they must persist even after the transaction took place long ago, and even in cases of later system failure. 

ACID basically provides reliability and consistency to the database, with respect to referential integrity and some other rules. Relational databases are designed to ensure ACID compliance, whereas NoSQL does not provide such guarantees. Not to say that NoSQL databases do not provide consistency and reliability, just that it is viewed and provided in a different manner.

NoSQL as you know by now, could be distributed (parts of a database stay at different nodes) and data tables/subsets are not designed to implement referential integrity. So, NoSQL databases offer a concept of "eventual consistency", where database changes are propagated to all nodes "eventually" (typically within milliseconds), which may sometimes lead to queries giving old data ( the "transaction" could not update them and the read was executed before) leading to what we say as "stale reads". 

Modern NoSQL solutions handle the problem of "stale reads" appreciably well, but that is beyond the scope of this discussion. So, NoSQL databases follow what is known as CAP (Consistency, Availability, Partition tolerance). This much for ACID vs eventual consistency. 

2. Scalability and distributiveness  

In relational approach this is not possible as you cannot save tables and/or their parts thereof to multiple machines as the tables are strongly related to each other, i.e. if two tables have any referential integrity constraints between them, they must reside at the same disk.  Relational databases "do not allow referential integrity constraints to span databases".  

With NoSQL, you just need to add a few nodes to your cluster (each node performing the same task) of machines for horizontal scaling. Since the data chunks/tables are not closely related (not dependent on each other), if the data grows, you can add a few more machines and just set up a few pointers to scale. Thus, NoSQL databases scale horizontally, you just need to add a few more nodes to make it house more data. Relational databases scale vertically, you need to upgrade the database server to a machine with more RAM, Storage and CPU cycles; in order for the database to scale. 

This process invariably requires a migration to a more powerful computer, and thus may need a time window when transactions would not be available. Such outages are more common with Relational databases. 

3. Schemas and structure 

As discussed before, relational databases have a fixed predefined schema, you must define the datatype of all columns in a table, and all incoming data must adhere to that datatype else the inserts will be denied. If you take a bird's eye view of all the tables, the relationships between the tables, the referential integrity constraints and the dependencies must all be pre-defined. 

In NOSQL, it's not necessary to have a predefined schema, and the schema can be dynamic. It's flexible to such an extent that all the data can be of data type "text"/"string", and the application decides whether the "text" is to be treated as a number or text or as a date. The incoming data can change its data type at runtime and NoSQL will take care of it in almost all cases. 

4. Types of data well supported 

NoSQL is better suited for hierarchical data, e.g. a car of a particular type will have certain properties like restraint system/engine/chassis etc. The restraint system could have its own properties in that car's different variants. The top variant might come with 6 airbags whereas the entry level variant would come with 2 airbags only, while the make/model/manufacturer/year of both the cars are the same.  To model this scenario on SQL RDBMS you would need many tables and would need to join them to know if a car has 2 or 4 or 6 airbags. In NoSQL this can easily be stored as 2 separate JSON objects, where each object contains all properties of a particular car including the number of airbags in it. 

CarVariant1 {

VIN_number: "ASD48986y9012L...." , 

Engine: "2.2 Diesel Turbo F/INJ", 

GearBox: "6 SP Automatic", 

Year: "2021" ,

Restraint_System: " Active belts with Driver, passenger, side and curtain Airbags...", 

..... 

CarVariant2 {

VIN number: "ASD48012y9012L...." , 

Engine: "2.2 Diesel Turbo F/INJ", 

GearBox: "6 SP Automatic", 

Year: "2021" ,

Restraint_System: " Active belts with Driver, passenger, front Airbags...", 

..... 

Relational databases or SQL is not suited for hierarchical data, it's more suited for data with linear/flat relations between them.  

Note: Each legally produced car in the world has a unique Vehicle Identification Number called VIN number for short. 

5. Types of implementations 

SQL implementations are mainly for relational databases. Since SQL has been widely used for decades, it is a powerful and mature language for relational databases. As NoSQL databases do not impose any fixed structure on the data, and need to be flexible with unstructured data, NoSQL databases come in 4 major types:- 

  • Document-oriented - Records are stored as individual documents, which are indexed and tagged internal for quicker search, more on this below. 
  • Column-oriented - Instead of data being stored as rows as in a relational database, information is stored in columnar format instead, leading to faster context/conditional search. All related information is stored together. E.g BigQuery 
  • Graph-based - Information is stored as a set of nodes connected with branches that depict the relation between the data nodes, akin to a tree structure. E.g. Neo4J  
  • KeyValue store - Information is stored as key-value pairs, the key tells what the information is all about and the paired value has all the details of the object. E.g. MongoDB  

SQL vs NoSQL: side-by-side comparison

                  SQL 

vs

                          NoSQL 

Fixed or predefined schema


Dynamic(No fixed) schema

ACID compliant


follows CAP(eventual consistency, availability, partition tolerance)

Can manage only structured data


Can manage structured, unstructured and semi-structured data 

Not suited for Hierarchical data


Good for Hierarchical data

Vertical scalability


Horizontal scalability

Centralised structure


Decentralised structure 

Integrity constraints and Triggers


References and Pairing

Needs highly available storage(RAID etc.) and Network(Infiniband, Fabric Path, etc.)  to work fast


Can work on commodity storage( simple HDD) and commodity Networks( simple Ethernet) 

A brief history: SQL & NoSQL processes 

As early as the 1960s the concept of NoSQL databases, called non-SQL or non-relational at that time, was implemented and put into use. At that time the motivation for such databases was to store certain types of information in a more efficient manner, which would reduce disk reads and present all the required information in a single command/transaction rather than having to make joins/multiple disk reads. Slowly this concept moved towards an attempt to gain horizontal scalability and to have better object-relational mapping of data objects. Relational databases are not very good in storing object oriented data, and it involves inefficient object-relational mapping methods to force them to do so. 

Hence, NoSQL databases provided easy storage for all the information and state of objects, in a single file/record. With the advent of social media, tracking of user’s activities, streaming data and multi-faceted informational objects; NoSQL gained momentum in the last decade and now it has seen great advancements and improvements in NoSQL technology. 

SQL processes were also envisaged many decades ago when people found out that there are certain types of data which are intrinsically related and must co-exist to make better sense. We have discussed examples of the Library system and Actor-Movie-Reviews database at the beginning of this article. SQL was developed by IBM in the early 1970s and was commercially released in 1979. SQL was initially developed to provide CRUD ( Create, Read, Update, Delete) operations; and has since then evolved to include triggers, procedures and some caching functions. 

How SQL works  

Broadly speaking, the commands of SQL can be grouped into 4 categories:- 

DDL (data definition language) – Used to create and modify database objects like tables, users, and indices. These include CREATE/ALTER/DROP/TRUNCATE commands that work on table/view/index etc. database objects.  

  • DML (data manipulation language) – Used to add, update and data within databases. These include INSERT/UPDATE/DELETE/CALL/LOC commands that work on rows/columns that hold the actual data.

  • DCL (data control language) – Used for access control to the data items within a database. These include GRANT(Gives a privilege to users), REVOKE (takes back privileges granted from users)

  • DQL (data query language) – Used to perform reads(queries) on the data and fetch relevant information, and is composed of SELECT statements only. 

With the SELECT statement certain clauses like WHERE/HAVING/ORDER BY/JOIN are used to specify conditions and characteristics of the data we are searching for.

  • TCL (transaction control language) - This is a subset of SQL commands used to initiate, manage and revoke a transaction. It consists of BEGIN(start a transaction), COMMIT(commits a transaction), ROLLBACK(rollbacks a transaction in case of any error occurs), SAVEPOINT(sets a marker "save point" within a transaction), SET TRANSACTION(defines characteristics for the transaction) . 

Examples of SQL:-  

CREATE DATABASE schoolDB;

USE schoolDB; 

CREATE TABLE student ( studentID INT(20),

    lastName varchar(255),

    firstName varchar(255),

    address varchar(255),

    registration_Number INT(20) 

    PRIMARY KEY (studentID)  

         ) ;              - creates the student table     

ALTER TABLE student ADD local_guardian varchar(20); - adds an extra column named local_guardian after the table was made 

INSERT into student values ( 1, "Burton", "Mack", "25, River dr., Scottsdale", 256789, "John Burton" ); 

INSERT into student values ( 2, "Dias", "Sara", "1356/A, Sunset Boulevard, Scottsdale", 3601245, "Cameron Dias" );  

REVOKE update on student from role_A ; - all people with role_A cannot update the student table now, they still can read/query it though. 

REVOKE all on student from role_A ;  - all people with role_A will not be able to do anything with the student table, this table will be invisible to them in most SQL implementations. 

SELECT 

firstName, address from student WHERE studentID=1 ; returns the "Mack Burton" record. 

How NoSQL works

Let's assume we have some data of people in a city and their visits to public places. This example can be used to emulate contact tracing, i.e. knowing who met whom and where, as was done during "Corona" times. We will try and answer some questions using Neo4J as well as OrientDB. Let's assume we have some data about persons, their health status, their location and the confirmation time of the test to decide if they are healthy/sick. 



Persons data 

We also have the visits data, which tells us the time when a person visited a particular place. If the place and visit timings match, the 2 persons were at the same place together for a period of time. So the likelihood of the disease spreading from a sick to a health person increases when they are the same place 



Visits data 

There can be a lot more data about places, how congested or spacious they were and how close the contact was, but we will omit that as we just want to give you an introduction on how NoSQL works. 

Examples of NoSQL 

1. Neo4J is a graph database management system which is gaining popularity recently. The data objects are depicted as nodes and relations between them are depitcted as directed acyclic edges. A discussion on details of the query syntax and semantics is beyond the scope of this discussion, as we just want to introduce you to NoSQL queries. Using Neo4J, if I make a graph of all persons as nodes and their "meets" as an edge , you will get a graph similar to the one below. 

Healthy persons are depicted by green circles and places by pink. If a path/edge exists between 2 persons, they have met (in close contact at a place) some time or the other. The shortest path between 2 persons, let's say Hazel and Judah, will exist if they have met at some place. I will fire a query something similar to the one below:- 

MATCH =shortestPath((from:Person{name:'Hazel  Kirby'})<-[r:VISITS]->(to:Person{name:'Judah Mcdonald'}) 

This query will give me a result like the one below. 

Of course, this is a pictorial depiction, I can also get this data in textual form. I can also find out who has met whom, and in case any one of them gets infected with Corona, it's likely that they might have transmitted the disease to the other person. So, this is what we call "contact tracing". 

The above result was obtained when I fired the query below. 

MATCH (n:Visit) RETURN n LIMIT 25 

2. OrientDB - OrientDb is a Multi-Model NoSQL DBMS that combines the power of graphs and the flexibility of documents into one scalable solution. You can store your data as graphs as well as documents. We take OrientDb's example as its syntax is quite similar to SQL. It's an open source database. We use the same data as in the previous example, and try to find the shortest path which could lead to two persons meeting each other. 

If you look closely you will also find the query that gave you the desired results. Similarly we can find all places where two persons have visited at the same time, if there are multiple such places and one of them is infected, the probability of transmission increases.


 

Which is better: SQL or NoSQL? 


Both SQL and NoSQL are good, but designed for different scenarios. 

SQL is good if

  • You have intrinsically relational data with predefined simple data types 
  • ACID compliance is needed. 
  • Data storage is not huge and most of the data makes sense only when related(JOIN) with other data. 
  • Incoming data is not in millions per second 
  • When you need complex calculations and aggregations on your data, the analytics to be run are more mathematical in nature 
  • When you need to implements transactions and transactional consistency 
  • Vertical scalability is acceptable and you would not need to scale very frequently.

NoSQL is good if

  • There no fixed schema for your data
  • Your data is better represented as objects which have specific properties and objects can be differentiated based on their properties 
  • Your incoming data is in huge volumes and come in at high speeds 
  • When quick reads are more important than ACID compliance and Transactions are not needed. 
  • When you would frequently need to scale, and horizontal scalability is imperative 
  • Your analytics are more fact finding and informational in nature 
  • You need to uncover hidden trends and predict future demand/turn of events.

Conclusion 

We have discussed the evolution and differences on SQL and NoSQL approaches. We have discussed in detail when each of them are applicable, and data requirements that suit the use of  a particular flavour of database. We hope this discussion is helpful for you to make an appropriate decision for your data needs. 

Commencer gratuitement

Rejoignez les milliers d’entreprises qui utilisent Fivetran pour centraliser et transformer leur data.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
Data insights
Data insights

SQL vs. NoSQL: Database differences explained

SQL vs. NoSQL: Database differences explained

June 11, 2021
June 11, 2021
SQL vs. NoSQL: Database differences explained
What is NoSQL, and what does it bring to the table?

In this post we will discuss what SQL and NoSQL are, and their differences. But before we do that, you must have some basic idea of how computer systems store data. Computer systems basically process data and provide results. This data could be coming from a file system or a database or streaming via the network. More often than not, you cannot process all the data instantaneously, so you have to store some of the data until you process it, and later store the processed results somewhere. 

One way to store data, such as intermediate state, final results and other states; is to store it in a database. A database is an ordered collection of data, that lets you search and query itself, to get a desired subset of its data. The database could be a strictly ordered collection, where all data items take up a certain fixed data type, have some similar properties, and are arranged in a prescribed order. The order decides the relation between the data items. The below results in a relational database. 

E.g. Actor-Movie-Reviews-Relational 



Consider the above example, we have a table of actors, another table for movies, and yet another table for reviews. All actors data has to be stored in the actors table, with the predefined imposed structure of having fields like actor_name, a system generated unique actor_id, followed by date_of_birth and so on. Similarly for other tables. These tables are then related(JOIN) via the unique IDs. Since there can be many actors in a movie and an actor could act in many movies, we can create a relationship table, actor_movies relationship table. 

When you want to know which movies and actor has acted in, you go to the actor table and get this actor's actor_id, then go to the actor_movies relationship table, take all the movie_id's for this actor, then go to the movies table and search all the movies with the given id's and then fetch all the movie details and show them to the user. You JOIN actor_movies table with the movies table and the actor table. This join is good as long as the data items are limited, i.e. number of movies, actors are not in millions. The above approach saved memory and computing time, when computing resources are expensive and scarce. If the number goes to the millions, this "JOIN" will become time consuming and inefficient. 

Another flavour of a database could be non-relational, here there is not strict order/shape, the data are grouped based on one or more similar properties; but each data item could have its own unique properties/information. Items are grouped informally and based on a few characteristics only, there are no strict rules each data item should adhere to. The above is called a non-relational database. 

E.g: We can combine movies+reviews in 1 data structure and make a coalesced JSON based data item. 

Now, even if the number of movies that the actor worked is in millions, you fetch the movies+actors data together, without the need of joining the tables, and keep it in cache if it is searched frequently. These days, computing resources are powerful, memory is cheaper and CPU's have become very powerful. This approach, with the advancement and easy availability of computing(hardware) resources, is a better approach in some applications these days. Above was a short introduction to the ways we can save data, based on the structure of our data and computing resources available. These two separate approaches to saving data, lead us to 2 separate approaches to saving+querying data, namely SQL for relational tables, and NoSQL for non-relational tables. 

What is SQL ? 

SQL is the short form of Structured Query Language. SQL is used to query databases that have a strict prescribed structure and order, i.e. relational databases. So, SQL is used to define tables and the fields(columns) within them, the data type of each of those columns and some rules(integrity constraints) that apply to the table/columns in order to maintain the semantics of the data.  

E.g. You cannot have a row in the class table without it being associated with a teacher. Next, SQL is used to insert data, query the data and to update the data in the tables. For frequently used queries and conditional computations, you can define stored procedures in SQL, and re-use them whenever the need arises. SQL is also used to create views, a subset of the data, conforming to a certain set of conditions. 

Further, you can allow/limit/restrict access to these views to a certain group of users. Also, SQL is used to create and manage indexes, functions and triggers etc. on the database. There are different dialects of SQL and ANSI also provides standard, but all dialects support at least the major commands (such as SELECT, UPDATE, DELETE, INSERT, WHERE) in a similar manner. 

What is NoSQL?

NoSQL is a collective term, used to describe non-relational databases and the language to access the data inside them. Initially NoSQL acronyms came from non-SQL, but lately most non-relational databases have also started to provide support for SQL, for ease of adaptation and compatibility. Hence, NoSQL today also means "not only SQL", i.e. for a non-relational database, you can use SQL as well as the scripting language provided by the database vendor also. 

A non-relational database stores data in a non-tabular form, it tends to aggregate all properties of an object in a single record, it's more flexible than the relational database. NoSQL these days serves as an umbrella term for non-relational databases, their storage and access methods. Before we move further, it's necessary to discuss why the need for NoSQL databases arose. Relational databases are very good and some applications require relational databases only. Applications like a library or school management system, where the data is closely related/linked to each other, mandate the use of relational databases. 

E.g. Every book will have at least 1 author and a number of copies in the library. 

So, inside the database, one can easily define and maintain an integrity constraint that before making an entry in the books table, its author(s) must exist in the authors table. A library management system generates data about issue and submission of books, apart from books inventory in the library. So, the data being generated has well defined structure, and is not generated at a hectic pace. 

With relational databases, things are good till the data storage is to the tune of a few GBs and you are sure about the structure and speed of incoming data. Compare the above scenario with some social media applications, like facebook of today. There are millions of users online on the platform at any given time, they share not only images and video, but a lot of text/documents/links/stories etc. 

There are millions of comments, likes and mentions every minute. All this data can be in many formats, and is generated at a hectic pace. Keeping all this data in a relational database, and making thousands of table JOINs every second, is simply not possible. Above is a very simplistic view of data demands that today's applications have, the real picture is much more overwhelming and complex. 

Hence, the search began for something that could scale to the demands of such applications, have a reasonable throughput even in peak hours/seasons and is flexible enough to hold data whose type will be determined at the time of its usage( not storage). Non-relational database solutions existed even in the 1960s, but were not in much use as the memory and storage costs of computers were prohibitively high in those times. The search for non-relational databases led to further advancement in such databases, and the term "noSQL" was coined. 

SQL vs NoSQL: how is SQL different from NoSQL? 

Lets now discuss some key differences between SQL and NoSQL databases. This will help you understand the paradigms better and plan your database interactions accordingly. 

1. Transactions and atomicity - 

There are 4 key properties that define a "transaction" - Atomicity, Consistency, Isolation, and Durability; or ACID in short. 

  • Atomicity refers to the property of each statement in a transaction (to read, write, update or delete data) to be treated as a single unit. So, the Atomicity principal says that either the whole of the statement is executed or none of it is executed. 

E.g. In our library management system, if you delete an "author" all his books must also be deleted. 

If somehow the system is not able to delete all his books, then the initiating transaction of deleting an author must also be retracted. Either the whole transaction takes place OR the database goes to the previous state, the state it was in before the transaction was initiated. 

  • Consistency principle demands that transactions should make changes to table(s) in predictable/predefined ways, such that the table is consistent with respect to integrity
    constraints before and after the transaction. There are no surprises and any errors in your data do not result in unforeseen consequences for the  table. 
  • Isolation - Each transaction must occur in isolation, not making any effect on other transactions happening in the system. When many transactions are taking place simultaneously, it should seem as if they take place in sequential order, and concurrent transactions don't interfere/influence one another.  
  • Durability - Durability principal demands that changes made to the database by virtue of a transaction must be durable, they must persist even after the transaction took place long ago, and even in cases of later system failure. 

ACID basically provides reliability and consistency to the database, with respect to referential integrity and some other rules. Relational databases are designed to ensure ACID compliance, whereas NoSQL does not provide such guarantees. Not to say that NoSQL databases do not provide consistency and reliability, just that it is viewed and provided in a different manner.

NoSQL as you know by now, could be distributed (parts of a database stay at different nodes) and data tables/subsets are not designed to implement referential integrity. So, NoSQL databases offer a concept of "eventual consistency", where database changes are propagated to all nodes "eventually" (typically within milliseconds), which may sometimes lead to queries giving old data ( the "transaction" could not update them and the read was executed before) leading to what we say as "stale reads". 

Modern NoSQL solutions handle the problem of "stale reads" appreciably well, but that is beyond the scope of this discussion. So, NoSQL databases follow what is known as CAP (Consistency, Availability, Partition tolerance). This much for ACID vs eventual consistency. 

2. Scalability and distributiveness  

In relational approach this is not possible as you cannot save tables and/or their parts thereof to multiple machines as the tables are strongly related to each other, i.e. if two tables have any referential integrity constraints between them, they must reside at the same disk.  Relational databases "do not allow referential integrity constraints to span databases".  

With NoSQL, you just need to add a few nodes to your cluster (each node performing the same task) of machines for horizontal scaling. Since the data chunks/tables are not closely related (not dependent on each other), if the data grows, you can add a few more machines and just set up a few pointers to scale. Thus, NoSQL databases scale horizontally, you just need to add a few more nodes to make it house more data. Relational databases scale vertically, you need to upgrade the database server to a machine with more RAM, Storage and CPU cycles; in order for the database to scale. 

This process invariably requires a migration to a more powerful computer, and thus may need a time window when transactions would not be available. Such outages are more common with Relational databases. 

3. Schemas and structure 

As discussed before, relational databases have a fixed predefined schema, you must define the datatype of all columns in a table, and all incoming data must adhere to that datatype else the inserts will be denied. If you take a bird's eye view of all the tables, the relationships between the tables, the referential integrity constraints and the dependencies must all be pre-defined. 

In NOSQL, it's not necessary to have a predefined schema, and the schema can be dynamic. It's flexible to such an extent that all the data can be of data type "text"/"string", and the application decides whether the "text" is to be treated as a number or text or as a date. The incoming data can change its data type at runtime and NoSQL will take care of it in almost all cases. 

4. Types of data well supported 

NoSQL is better suited for hierarchical data, e.g. a car of a particular type will have certain properties like restraint system/engine/chassis etc. The restraint system could have its own properties in that car's different variants. The top variant might come with 6 airbags whereas the entry level variant would come with 2 airbags only, while the make/model/manufacturer/year of both the cars are the same.  To model this scenario on SQL RDBMS you would need many tables and would need to join them to know if a car has 2 or 4 or 6 airbags. In NoSQL this can easily be stored as 2 separate JSON objects, where each object contains all properties of a particular car including the number of airbags in it. 

CarVariant1 {

VIN_number: "ASD48986y9012L...." , 

Engine: "2.2 Diesel Turbo F/INJ", 

GearBox: "6 SP Automatic", 

Year: "2021" ,

Restraint_System: " Active belts with Driver, passenger, side and curtain Airbags...", 

..... 

CarVariant2 {

VIN number: "ASD48012y9012L...." , 

Engine: "2.2 Diesel Turbo F/INJ", 

GearBox: "6 SP Automatic", 

Year: "2021" ,

Restraint_System: " Active belts with Driver, passenger, front Airbags...", 

..... 

Relational databases or SQL is not suited for hierarchical data, it's more suited for data with linear/flat relations between them.  

Note: Each legally produced car in the world has a unique Vehicle Identification Number called VIN number for short. 

5. Types of implementations 

SQL implementations are mainly for relational databases. Since SQL has been widely used for decades, it is a powerful and mature language for relational databases. As NoSQL databases do not impose any fixed structure on the data, and need to be flexible with unstructured data, NoSQL databases come in 4 major types:- 

  • Document-oriented - Records are stored as individual documents, which are indexed and tagged internal for quicker search, more on this below. 
  • Column-oriented - Instead of data being stored as rows as in a relational database, information is stored in columnar format instead, leading to faster context/conditional search. All related information is stored together. E.g BigQuery 
  • Graph-based - Information is stored as a set of nodes connected with branches that depict the relation between the data nodes, akin to a tree structure. E.g. Neo4J  
  • KeyValue store - Information is stored as key-value pairs, the key tells what the information is all about and the paired value has all the details of the object. E.g. MongoDB  

SQL vs NoSQL: side-by-side comparison

                  SQL 

vs

                          NoSQL 

Fixed or predefined schema


Dynamic(No fixed) schema

ACID compliant


follows CAP(eventual consistency, availability, partition tolerance)

Can manage only structured data


Can manage structured, unstructured and semi-structured data 

Not suited for Hierarchical data


Good for Hierarchical data

Vertical scalability


Horizontal scalability

Centralised structure


Decentralised structure 

Integrity constraints and Triggers


References and Pairing

Needs highly available storage(RAID etc.) and Network(Infiniband, Fabric Path, etc.)  to work fast


Can work on commodity storage( simple HDD) and commodity Networks( simple Ethernet) 

A brief history: SQL & NoSQL processes 

As early as the 1960s the concept of NoSQL databases, called non-SQL or non-relational at that time, was implemented and put into use. At that time the motivation for such databases was to store certain types of information in a more efficient manner, which would reduce disk reads and present all the required information in a single command/transaction rather than having to make joins/multiple disk reads. Slowly this concept moved towards an attempt to gain horizontal scalability and to have better object-relational mapping of data objects. Relational databases are not very good in storing object oriented data, and it involves inefficient object-relational mapping methods to force them to do so. 

Hence, NoSQL databases provided easy storage for all the information and state of objects, in a single file/record. With the advent of social media, tracking of user’s activities, streaming data and multi-faceted informational objects; NoSQL gained momentum in the last decade and now it has seen great advancements and improvements in NoSQL technology. 

SQL processes were also envisaged many decades ago when people found out that there are certain types of data which are intrinsically related and must co-exist to make better sense. We have discussed examples of the Library system and Actor-Movie-Reviews database at the beginning of this article. SQL was developed by IBM in the early 1970s and was commercially released in 1979. SQL was initially developed to provide CRUD ( Create, Read, Update, Delete) operations; and has since then evolved to include triggers, procedures and some caching functions. 

How SQL works  

Broadly speaking, the commands of SQL can be grouped into 4 categories:- 

DDL (data definition language) – Used to create and modify database objects like tables, users, and indices. These include CREATE/ALTER/DROP/TRUNCATE commands that work on table/view/index etc. database objects.  

  • DML (data manipulation language) – Used to add, update and data within databases. These include INSERT/UPDATE/DELETE/CALL/LOC commands that work on rows/columns that hold the actual data.

  • DCL (data control language) – Used for access control to the data items within a database. These include GRANT(Gives a privilege to users), REVOKE (takes back privileges granted from users)

  • DQL (data query language) – Used to perform reads(queries) on the data and fetch relevant information, and is composed of SELECT statements only. 

With the SELECT statement certain clauses like WHERE/HAVING/ORDER BY/JOIN are used to specify conditions and characteristics of the data we are searching for.

  • TCL (transaction control language) - This is a subset of SQL commands used to initiate, manage and revoke a transaction. It consists of BEGIN(start a transaction), COMMIT(commits a transaction), ROLLBACK(rollbacks a transaction in case of any error occurs), SAVEPOINT(sets a marker "save point" within a transaction), SET TRANSACTION(defines characteristics for the transaction) . 

Examples of SQL:-  

CREATE DATABASE schoolDB;

USE schoolDB; 

CREATE TABLE student ( studentID INT(20),

    lastName varchar(255),

    firstName varchar(255),

    address varchar(255),

    registration_Number INT(20) 

    PRIMARY KEY (studentID)  

         ) ;              - creates the student table     

ALTER TABLE student ADD local_guardian varchar(20); - adds an extra column named local_guardian after the table was made 

INSERT into student values ( 1, "Burton", "Mack", "25, River dr., Scottsdale", 256789, "John Burton" ); 

INSERT into student values ( 2, "Dias", "Sara", "1356/A, Sunset Boulevard, Scottsdale", 3601245, "Cameron Dias" );  

REVOKE update on student from role_A ; - all people with role_A cannot update the student table now, they still can read/query it though. 

REVOKE all on student from role_A ;  - all people with role_A will not be able to do anything with the student table, this table will be invisible to them in most SQL implementations. 

SELECT 

firstName, address from student WHERE studentID=1 ; returns the "Mack Burton" record. 

How NoSQL works

Let's assume we have some data of people in a city and their visits to public places. This example can be used to emulate contact tracing, i.e. knowing who met whom and where, as was done during "Corona" times. We will try and answer some questions using Neo4J as well as OrientDB. Let's assume we have some data about persons, their health status, their location and the confirmation time of the test to decide if they are healthy/sick. 



Persons data 

We also have the visits data, which tells us the time when a person visited a particular place. If the place and visit timings match, the 2 persons were at the same place together for a period of time. So the likelihood of the disease spreading from a sick to a health person increases when they are the same place 



Visits data 

There can be a lot more data about places, how congested or spacious they were and how close the contact was, but we will omit that as we just want to give you an introduction on how NoSQL works. 

Examples of NoSQL 

1. Neo4J is a graph database management system which is gaining popularity recently. The data objects are depicted as nodes and relations between them are depitcted as directed acyclic edges. A discussion on details of the query syntax and semantics is beyond the scope of this discussion, as we just want to introduce you to NoSQL queries. Using Neo4J, if I make a graph of all persons as nodes and their "meets" as an edge , you will get a graph similar to the one below. 

Healthy persons are depicted by green circles and places by pink. If a path/edge exists between 2 persons, they have met (in close contact at a place) some time or the other. The shortest path between 2 persons, let's say Hazel and Judah, will exist if they have met at some place. I will fire a query something similar to the one below:- 

MATCH =shortestPath((from:Person{name:'Hazel  Kirby'})<-[r:VISITS]->(to:Person{name:'Judah Mcdonald'}) 

This query will give me a result like the one below. 

Of course, this is a pictorial depiction, I can also get this data in textual form. I can also find out who has met whom, and in case any one of them gets infected with Corona, it's likely that they might have transmitted the disease to the other person. So, this is what we call "contact tracing". 

The above result was obtained when I fired the query below. 

MATCH (n:Visit) RETURN n LIMIT 25 

2. OrientDB - OrientDb is a Multi-Model NoSQL DBMS that combines the power of graphs and the flexibility of documents into one scalable solution. You can store your data as graphs as well as documents. We take OrientDb's example as its syntax is quite similar to SQL. It's an open source database. We use the same data as in the previous example, and try to find the shortest path which could lead to two persons meeting each other. 

If you look closely you will also find the query that gave you the desired results. Similarly we can find all places where two persons have visited at the same time, if there are multiple such places and one of them is infected, the probability of transmission increases.


 

Which is better: SQL or NoSQL? 


Both SQL and NoSQL are good, but designed for different scenarios. 

SQL is good if

  • You have intrinsically relational data with predefined simple data types 
  • ACID compliance is needed. 
  • Data storage is not huge and most of the data makes sense only when related(JOIN) with other data. 
  • Incoming data is not in millions per second 
  • When you need complex calculations and aggregations on your data, the analytics to be run are more mathematical in nature 
  • When you need to implements transactions and transactional consistency 
  • Vertical scalability is acceptable and you would not need to scale very frequently.

NoSQL is good if

  • There no fixed schema for your data
  • Your data is better represented as objects which have specific properties and objects can be differentiated based on their properties 
  • Your incoming data is in huge volumes and come in at high speeds 
  • When quick reads are more important than ACID compliance and Transactions are not needed. 
  • When you would frequently need to scale, and horizontal scalability is imperative 
  • Your analytics are more fact finding and informational in nature 
  • You need to uncover hidden trends and predict future demand/turn of events.

Conclusion 

We have discussed the evolution and differences on SQL and NoSQL approaches. We have discussed in detail when each of them are applicable, and data requirements that suit the use of  a particular flavour of database. We hope this discussion is helpful for you to make an appropriate decision for your data needs. 

Topics
No items found.
Share

Articles associés

No items found.
No items found.
No items found.

Commencer gratuitement

Rejoignez les milliers d’entreprises qui utilisent Fivetran pour centraliser et transformer leur data.

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.