In the fast-moving world of unstructured data, does it make more sense to use a database management system (DBMS) built from the start to handle the widely accepted JSON data format? Or can an SQL database that now includes JSON functionality be a better choice? Postgres, with its SQL roots, started offering NoSQL functionality early on with hstore, a key-value store introduced in 2006. JSON document storage and management for Postgres arrived somewhat later, after MongoDB began life in 2009 as a native JSON document DBMS. Since then, MongoDB and Postgres have both been enhancing their JSON storage capabilities.
The Rise and Rise of JSON and JSONB
Deliberate Constraints and Collateral Limitations
Both Postgres and MongoDB offer JSON and JSONB (MongoDB calls its JSONB “BSON”) data storage functionality. There are, however, differences:
- The BSON format used by MongoDB is limited to a maximum of 64 bits for representing an integer or floating point number, whereas the JSONB format used by Postgres does not have this limit.
- Postgres provides data constraint and validation functions to help ensure that JSON documents are more meaningful: for example, preventing attempts to store alphabetical characters where numerical values are expected.
- MongoDB offers automatic database sharding for easy horizontal scaling of JSON data storage. Scaling of Postgres installations has often been vertical. Horizontal scaling of Postgres is also possible, but tends to be more involved or to rely on an additional third-party solution.
- MongoDB also offers the possibility of increasing write throughput by deferring writing to disk. The tradeoff is a potential loss of data, but this may suit users who have less need to persist their data.
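The numeric-width difference in the first bullet can be illustrated in plain Python, without either database. The sketch below assumes only that integers and doubles limited to 64 bits occupy 8 bytes, which the `struct` formats `<q` and `<d` mimic, and it uses Python's `Decimal` as a stand-in for an arbitrary-precision numeric type. The function names are hypothetical, and no MongoDB or Postgres API is involved.

```python
import struct
from decimal import Decimal


def fits_in_int64(value: int) -> bool:
    """Return True if value can be packed as a signed 64-bit integer."""
    try:
        struct.pack("<q", value)  # "<q" = little-endian signed 64-bit integer
        return True
    except (struct.error, OverflowError):
        return False


def survives_double(text: str) -> bool:
    """Return True if a decimal literal is exactly representable as a 64-bit float."""
    packed = struct.pack("<d", float(text))  # "<d" = little-endian 64-bit double
    as_double = struct.unpack("<d", packed)[0]
    # Decimal(float) gives the exact binary value, so this comparison is strict.
    return Decimal(as_double) == Decimal(text)


big_integer = 2**63  # one past the largest signed 64-bit value
long_decimal = "12345678901234567890.123456789"

print(fits_in_int64(2**63 - 1), fits_in_int64(big_integer))
print(survives_double("0.5"), survives_double(long_decimal))
```

Values that fail these checks would need a wider representation, which is the kind of case where the choice of storage format starts to matter.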
In offering both SQL as well as JSON storage, Postgres lets users keep their options open. Data can be routed to a JSON column for possible data modeling afterwards, or to a table using an SQL schema, all within the same Postgres database.
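As a rough illustration of keeping both options open, the following Python sketch decides whether an incoming document is regular enough for a structured table or should go to a JSON column for later modeling. Everything in it is hypothetical: the schema, the field names, and the destination labels `orders_table` and `json_column` are assumptions made for the example, and the type check is only a client-side analogue of the kind of validation a database constraint would perform.

```python
# Hypothetical schema: field name -> accepted Python types for that field.
ORDER_SCHEMA = {
    "customer_id": (int,),
    "total": (int, float),
}


def matches_schema(document: dict, schema: dict) -> bool:
    """Return True if every schema field is present with an accepted type."""
    for field, accepted_types in schema.items():
        value = document.get(field)
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, accepted_types):
            return False
    return True


def route(document: dict) -> str:
    """Pick a destination label for the document (labels are illustrative)."""
    if matches_schema(document, ORDER_SCHEMA):
        return "orders_table"
    return "json_column"


print(route({"customer_id": 7, "total": 19.5}))
print(route({"customer_id": "seven", "total": 19.5}))
```

The second document stores alphabetical characters where a number is expected, so in this sketch it is kept as raw JSON rather than forced into the structured table.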
Native JSON Data Stores Do Not Always Have the Best Performance
One of the advantages frequently cited for NoSQL database management systems is their performance. Operating with simpler data structures than those of SQL databases, NoSQL database systems have often shown faster speeds of storage and retrieval. While they may lack the ACID (atomicity, consistency, isolation, and durability) properties needed for financial transactions, for example, they may offer advantages in handling larger volumes of unstructured data more rapidly.
However, NoSQL fans got a shock in 2014 when benchmark results from EnterpriseDB (enterprisedb.com) showed Postgres performing significantly better than MongoDB. The tests were based on selecting, loading, and inserting complex document data to the tune of 50 million records. Postgres was about twice as fast in data ingestion, two-and-a-half times as fast in data selection, and three times as fast in data inserts. Postgres also consumed 25% less disk space.
Still, performance ratings are made to be beaten. With the introduction of its WiredTiger database engine, MongoDB 3.0 offered improvements in write speeds (between 7 and 10 times as fast), together with data compression of 50% to cut disk space.
Use Cases and Factors Affecting the Choice of Postgres or MongoDB
Where does this leave us when choosing between Postgres and MongoDB for JSON data storage? The choice depends on your goals and your circumstances.
- Focus on the application. MongoDB minimizes the number of database management commands needed in application development. This can fit well with rapid prototyping, as well as queries and commands built on demand by the application. On the other hand, the application itself must insert meaningful data. Software maintenance may require more effort afterwards as well.
- Structure needed later. Postgres offers similar broad powers for unstructured data, but also lets developers migrate to a mixture of unstructured and structured data later. If ACID compliance is likely to be a future requirement as data collected or generated becomes more valuable to its owners, Postgres may be a more suitable choice from the beginning for JSON data storage.
- Static JSON data. For relatively static JSON data and active data naturally structured for SQL storage, Postgres offers the advantage of efficient JSONB representation and indexing capabilities (although ODBC and BI integration enable running SQL queries in MongoDB reporting as well).
- JSON data modification. On the other hand, for JSON data that will be modified within the data store, MongoDB, engineered from the start around JSON documents, offers possibilities for updating individual fields that Postgres does not. While Postgres is efficient in the storage and retrieval of JSON documents, JSON field modification in Postgres requires the extraction of the entire JSON document concerned, modification of the field concerned, and the subsequent rewriting of the document back into the data store.
- Dynamic queries. Typical uses of MongoDB focus on frequently changing data of different types, without any complex transactions between objects. It is suited to dynamic queries of frequently written or read data, offering good performance for the storage of JSON documents with a large number of fields with ad hoc queries on a small subset of those fields.
- Automatic sharding. The automatic sharding functionality of MongoDB may fit well with IT environments using multiple instances of standardized, commodity hardware (converged architectures).
- Costs and resources. The availability and costs of hosting platforms for Postgres and MongoDB may be part of the decision criteria, as well as the ease or expense of hiring developers with the corresponding skills. Resources of Postgres knowledge and talent have been built up over time, encouraged among other things by the inclusion of Postgres at no extra cost in many Linux operating systems. On the other hand, since its introduction, MongoDB has already achieved the status of fifth most popular database technology out of all the technologies available (and not just NoSQL), suggesting that it too benefits from a reasonable pool of talent.
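The extract, modify, and rewrite cycle described under "JSON data modification" can be sketched in a few lines of Python. This is only a model of the steps named there, not of either database's internals: the `store` dictionary is a hypothetical stand-in for a table holding one serialized JSON document per key, and `update_field` is an illustrative name.

```python
import json


def update_field(store: dict, key: str, field: str, value) -> int:
    """Change one field by rewriting the whole serialized document.

    Returns the number of characters written back, which grows with the
    size of the document rather than with the size of the changed field.
    """
    document = json.loads(store[key])   # 1. extract the entire document
    document[field] = value             # 2. modify the single field
    store[key] = json.dumps(document)   # 3. write the entire document back
    return len(store[key])


# Hypothetical stand-in for a table with one serialized JSON document per key.
store = {"order-1": json.dumps({"status": "new", "items": list(range(1000))})}

written = update_field(store, "order-1", "status", "shipped")
print(written)
```

Changing a short status string here still rewrites the thousand-element list alongside it, which is why the frequency of in-place modifications matters when weighing the two systems.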
Emotions sometimes run high, even when it comes to purely technical choices. Data-driven decisions are not always easy to make when new releases and new performance ratings continually upset previous evaluations. In addition, the use cases above show that there is no automatic winner. If you have already made a choice between Postgres and MongoDB, sunk effort and acquired expertise may make a change undesirable. However, the experiences some business users have reported online show that such choices are sometimes reversed even after a significant period of deployment and operation.
In the future, a choice between Postgres and MongoDB for JSON storage may depend on yet other factors. When commenting on the release of JSONB functionality for Postgres, Robert Haas, the chief architect at EnterpriseDB, said, “The implication for NoSQL solutions is that innovation around the format in which you store your data is not enough; you’ve got to come up with truly novel capabilities for working with the data, which is much harder.”