Pros and Cons of Using MySQL for Analytical Reporting
Is MySQL a good solution for reporting and data analytics? The internet is bombarded with 1000’s of guides or courses to installing, configuring, and optimizing MySQL for running analytics. However, none of these write-ups stop to even ask the question: “Is MySQL the right solution for business intelligence?”
With the obvious caveat that every project has a different scope, timeline and budget. This article will attempt to answer the question of how Oracle’s MySQL fits into the realm of analytics by enumerating some of its characteristics. This article isn’t going to tell you if MySQL is good or not, but rather, in what situations it could be a good fit for a BI initiative.
Understanding the Requirements
Type of Data
What type of data are you analyzing? MySQL is designed to manage relational data. If you have a lot of qualitative data (like social media posts, images, or bodies of text), do not even consider MySQL. Though MySQL has implemented measures to better handle these types of data, there are NoSQL solutions much better suited for this task, such as MongoDB. This is especially true when you’re looking to run analytical queries on the data.
Size of Data Set
How much data are you dealing with? Many people tout that one of the biggest limits of MySQL is performance degradation when dealing with data sets over 1,000,000 rows. However, usually, digging a bit deeper reveals that this is due to improper table design, inadequate hardware resources, or unoptimized queries and indexing. As the use of MySQL by firms like Facebook would attest to, MySQL can be used efficiently for very large data sets. So, if your project will be handling data sets over about 500GB, consider taking the time to have it properly configured to handle data at scale.
Another way to overcome size limitations in MySQL would be to use Business Intelligence (BI) tools for MySQL reporting in order to take the load off of the database itself. These tools read the necessary data incrementally from your database and then transform, analyze, and store it on their servers.
Speaking of scale, look a few years into the future and consider the growth of your firm. Is it reasonable that you will be handling large or very large data sets in the near future? If so, plan for custom configuration of your data architecture to accommodate your growth accordingly.
Freshness of Data
An important concept to understand when implementing your analytics is data freshness. In other words, how often do you need your data warehouse and data marts updated? Do you need it every second, or will an overnight update suffice? MySQL can handle almost any refersh rate you want to implement but the configuration of your both your data architecture and hardware will need change to match your goals.
Often times, when the performance of a MySQL setup is suboptimal, the hardware requirements for that configuration are overlooked. This leads to MySQL being labeled as ‘slow’ or ‘poor performing on large datasets when MySQL isn’t at fault. Plan your configuration and make sure it has the resources it needs to operate efficiently. This is also worth keeping in mind when implementing BI tools. When it comes to data freshness, the more often you update from your database, the more load you are putting on it.
Speed / Availability
How intense are your demands going to be when your teams of analysts are mining, visualizing, and reporting on your data? How many concurrent users will be querying the data set? Also, do you need to analyze all of your data or only a portion of it? In the world of analytics we often collect way more data than is needed for any single analysis. This means that we can effectively model our data warehouse and data marts to minimize the size of the resulting data set that we are actually working with making the speed at which results are available much faster.
For instance, if you are looking to analyze your web traffic to improve opt-in rates, there’s no need to include customer data (such as names, addresses, or order history) in that data set. This is not so much an issue if you are working with BI tools, as once the data is loaded into the BI system, there is no load on your database any longer. So, the speed at which your data is available for analytics is dependent on the configuration of the BI system, not the MySQL database.
Specific Considerations for MySQL
Let’s take a look at a few key aspects of MySQL to get a better feel for how it fits into your plan.
Features and Functionality
There are many choices on the market when it comes to choosing a database management system (DBMS) and they all have their strengths. However, due to its low cost (free for the Community Edition), large support community, and flexibility, MySQL is one of the most popular database options for many applications. Because of this popularity, combined with the booming nature of data analytics, many users of MySQL have demanded that its functionality keep pace or risk falling out of favor with developers. This has led to many of the shortcomings that had previously limited MySQL being resolved in the most recent versions of the DBMS.
For example, one issue in earlier versions was that there was no native support for storing values in the JSON data type. In the most recent version (5.7), this is no longer the case. The new native binary format, and functions for working with it, allow for more efficient processing & storage leading to improved performance when dealing with the popular data type.
On the surface, this may not seem like a big improvement. But, given MySQL’s popularity in today’s web app-rich world where JSON is a widely used data type, the impact is significant. Many applications will benefit from better data integrity by maintaining a consistent data type across the lifetime of the data. And, as we know, data integrity is key to business metrics. Get it wrong and you stand the chance of making mistakes like losing sales or ordering surplus inventory. All of this can affect your bottom line.
Analyzing Business Data
Depending on the storage engine you choose during configuration, and other factors, running the actual analysis on your data may take a significant amount of time. Adding a dedicated BI layer is one of the best things you can do to help this situation. It separates the data being analyzed from the data in your transactional database, allowing the analysis to be run completely separate. But, again, the different BI tools have their benefits and drawbacks. So, make sure to compare them to identify the ones that will work best for your project.
Another way that this can be mitigated is by employing a processing engine ( such as Apache Spark) to do the heavy lifting and splitting your data into smaller MySQL data marts distributed across multiple nodes. Then, store the results of the analysis in those data marts instead of all of the data itself.
Other ways that can help overcome long analysis times are by optimizing for application, partitioning, using columnar & compression enabled engines, replication, sharding, etc. How to implement these are outside the scope of this article but now you know MySQL is capable of them.
Everyone has a budget. Whether you are CIO of a Fortune 500 or CEO of your own startup, the discussion will eventually come to ‘How much does this cost?’ And the answer to that question can be really cool. Not counting salaries to pay developers, hardware, maintenance, etc., MySQL can be free. The Community Edition of the DBMS is free to use and is widely supported by hosting companies and analytics tools. It isn’t one of those wimpy ‘free-but- with-severely-limited-functionality’ deals either. Actually it’s quite robust and will be able to handle many projects with the default configuration. However, for larger enterprise projects, you would probably need a paid edition.
MySQL can be an effective solution for many analytics projects. It has many qualities that make it an ideal database environment to implement analytics projects. It’s a good fit out of the box for entry-level startups and mid-sized firms just getting started with data analytics. And, for larger applications, it can be scaled and optimized to fit. This is exactly why many firms have chosen to use it and why it has received such wide support from a varied community.
This massive support is what continually drives the refinement of MySQL. Every new version released includes refined functionality that keeps it a viable solution for many data storage needs. Of course, all of the factors mentioned above are going to dictate the level of complexity for your installation. The greater your demands are, the more optimized you are going to need your configuration to be. If you will be operating on any level of scale, consider implementing a dedicated BI layer to your data architecture.