By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData Collective
  • Analytics
    AnalyticsShow More
    data-driven image seo
    Data Analytics Helps Marketers Substantially Boost Image SEO
    8 Min Read
    construction analytics
    5 Benefits of Analytics to Manage Commercial Construction
    5 Min Read
    benefits of data analytics for financial industry
    Fascinating Changes Data Analytics Brings to Finance
    7 Min Read
    analyzing big data for its quality and value
    Use this Strategic Approach to Maximize Your Data’s Value
    6 Min Read
    data-driven seo for product pages
    6 Tips for Using Data Analytics for Product Page SEO
    11 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-23 SmartData Collective. All Rights Reserved.
Reading: Demystifying Data Warehouses, Data Lakes and Data Marts
Share
Notification Show More
Latest News
anti-spoofing tips
Anti-Spoofing is Crucial for Data-Driven Businesses
Security
ai in software development
3 AI-Based Strategies to Develop Software in Uncertain Times
Software
ai in ppc advertising
5 Proven Tips for Utilizing AI with PPC Advertising in 2023
Artificial Intelligence
data-driven image seo
Data Analytics Helps Marketers Substantially Boost Image SEO
Analytics
ai in web design
5 Ways AI Technology Has Disrupted Website Development
Artificial Intelligence
Aa
SmartData Collective
Aa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Big Data > Data Quality > Demystifying Data Warehouses, Data Lakes and Data Marts
Big DataData QualityData VisualizationData Warehousing

Demystifying Data Warehouses, Data Lakes and Data Marts

Eran Levy
Last updated: 2016/10/12 at 2:19 PM
Eran Levy
11 Min Read
Image
SHARE
- Advertisement -

Image

Contents
Databases are Basic Data Storage MechanismsTransactional Databases (Write-oriented)Analytical Databases (Read-oriented)Data Warehouses for One Version of the TruthData Marts Serve Insights to Individual DepartmentsData Lakes for Massive Storage that Changes the RulesDoes Your Organization Need a Data Warehouse? Data Lake? Or Both?

As data and analytics become a more integral part of business processes in an organization, so the non-DBAs among us might start to feel lost in a sea of technical terms which are frequently thrown around by technical teams. The disproportionately loud vendor noise that exists in this space further generates jargon, hype, and confusion (just try to get a straight answer to “what is big data”).

- Advertisement -

Image

More Read

data-driven image seo

Data Analytics Helps Marketers Substantially Boost Image SEO

Data Visualization Boosts Business Scalability with Sales Mapping
Data-Driven Marketing Offers Huge Benefits for Landscapers
What Role Does Big Data Have on the Deep Web?
Fascinating Changes Data Analytics Brings to Finance

As data and analytics become a more integral part of business processes in an organization, so the non-DBAs among us might start to feel lost in a sea of technical terms which are frequently thrown around by technical teams. The disproportionately loud vendor noise that exists in this space further generates jargon, hype, and confusion (just try to get a straight answer to “what is big data”).

This post is meant to be used by business users as a (very) abridged guide to the various types of repositories your data might reside in: databases, data marts, data warehouses and data lakes, so that you have a basic understanding of each of these concepts and the role they play in what you’re actually after – real, up-to-date insight from your data.

Ready to get educated? Let’s get started:

- Advertisement -

Databases are Basic Data Storage Mechanisms

In one form or another, the database is at the heart of most data storage and management systems. The relational database used with many applications and systems holds data in tables of rows and columns. In a table, a row corresponds to a record with a set sequence of data fields, while a column lists one given data field for all the records. The data is structured in that only the “right” kind of data can be used in a given field: for example, in a customer relational database, a shipping date cannot be used in a field for a delivery address, and so on.

The structure or “schema” of a relational database is defined before starting to record data. It is often left unchanged afterward. However, by organizing the database as separate tables with defined relationships between them, structured data can be accessed or reassembled in many different ways. By comparison, the need to handle unstructured data has led to the creation of other types of databases. To efficiently handle free-form text in emails or for variable-length video clips, for example, such a non-relational database may have very few fields or different numbers of fields for different records, and may allow fields to be changed “on the fly” after data storage operations begin.

Transactional Databases (Write-oriented)

For key business systems like sales, accounting, and production, it is critical that input of transactional data to the database is quick and reliable, without disruption to the flow of business. The database can be optimized for these “write” operations, by minimizing the duplication of data fields (normalizing the data) among the database tables.

A transactional system can also be efficient in retrieving specific information about an individual transaction, like the date of a shipment to a customer. On the other hand, the “write-oriented” design makes it less well-suited for collating data to provide information such as overall shipment figures over the last two years. This kind of “read-oriented” operation may require massive manipulation of data records or recombination of large tables when querying production databases, either of which could then have a significant negative impact on a transactional system performance.

Analytical Databases (Read-oriented)

Analytical databases systems are optimized for “read” operations and often run separately from transactional-operational systems. From time to time, they ingest data from the transactional systems and possibly other data sources, but otherwise, they perform relatively few “writes”. Analytical systems are used to consolidate data (do roll-ups), slice data (for example, all shipments over one year of a given product), dice data (for example, shipments of a given product to a specific set of customers for a given quarter), and drill down to reveal successive layers of detail from a higher-level statistic.

- Advertisement -

Data Warehouses for One Version of the Truth

Data in an enterprise exists in different formats in various sources and is not necessarily consistent from one source to another. To resolve differences and potential conflicts, a data warehouse consolidates data from the different sources and makes the data available in one unified, harmonized form. The process of getting to this “one version of the truth” for an enterprise or organization is divided into three main steps.

  • Extract. Data is uploaded (perhaps daily, weekly, or even monthly) from different data sources. It may first go to a staging server, before going to the data warehouse.
  • >Transform. The different datasets are made compatible with one another by adjusting formats and resolving conflicts.
  • Load. The transformed data is loaded into the analytical repository for subsequent calculations, trend analyses, reporting, and other business intelligence functions.

A data warehouse is also known as a “schema on write” system because the data written into it by the ETL (extract-transform-load) process has also been given a certain structure or schema. It is often in effect another database, but one that is now optimized for analytic processing, instead of transaction handling.

Data Marts Serve Insights to Individual Departments

A data mart offers the analytical capability for a restricted area of data, for example, for just one functional domain or department in an enterprise. Data marts can help avoid one department interfering with another department’s data. They can also simplify data analytics or meet a smaller, more specific requirement, before trying to tackle a data warehouse for the whole organization. Consequently, there are two points of view about how to implement data warehouses and data marts. One is to start with the data warehouse as an overarching construction. The dependent data marts are then restrictions or subsets of the data warehouse. The other is to make independent data marts from source data, then bring them together afterward to form an overall or larger data warehouse.

Data Lakes for Massive Storage that Changes the Rules

The realization that unstructured data and big data can also be analyzed for business insights have led to the concept of the data lake. In relation to a data warehouse, a data lake offers massive storage with high availability of data at lower costs, together with increased agility and flexibility of use. A data lake can help increase data democracy, allowing users more possibilities to ask new questions as they go. It supports all data from all sources, including relational databases, Hadoop file systems, and social media data. It also retains the original data, as well as transformed versions, for data traceability. However, there is still a need for data governance and understanding of the practical business uses to be made of it. Otherwise, indiscriminate dumping of data may turn a potentially useful data lake into an unhealthy data swamp.

Compared to a data warehouse, which uses a “schema on write” approach to hold structured, processed data, a data lake uses a “schema on read” approach in which structure and schema are only applied when the data is being read from the data lake to answer a business intelligence question or for other analytics purposes. In other words, whereas a data warehouse functions via an ETL process (extract, transform, load), a data lake uses ELT (extract, load, and then transform) instead.

- Advertisement -

Does Your Organization Need a Data Warehouse? Data Lake? Or Both?

The growth in data complexity and the high number of disparate datasets accessed or generated by organizations today are both factors to consider when choosing between solutions for gathering, storing, and analyzing data. Data lakes can be used for massive quantities of unstructured data, but also require tools that can easily interface with them. Data warehouses let organizations create a single source of truth, but come at the cost of agile analysis when they require the involvement of IT professionals to set up new queries or reports.

However, it is also possible to use both. A data lake can be used for sandboxing, allowing users to experiment with different data models and transformations, before setting up new schema in a data warehouse. It can also serve as a staging area, from which to supply data to a data warehouse to then produce cleansed data with known value. By using each possibility suitably, enterprises and organizations can get the best of both solutions.

 

Eran Levy October 12, 2016
Share this Article
Facebook Twitter Pinterest LinkedIn
Share
- Advertisement -

Follow us on Facebook

Latest News

anti-spoofing tips
Anti-Spoofing is Crucial for Data-Driven Businesses
Security
ai in software development
3 AI-Based Strategies to Develop Software in Uncertain Times
Software
ai in ppc advertising
5 Proven Tips for Utilizing AI with PPC Advertising in 2023
Artificial Intelligence
data-driven image seo
Data Analytics Helps Marketers Substantially Boost Image SEO
Analytics

Stay Connected

1.2k Followers Like
33.7k Followers Follow
222 Followers Pin

You Might also Like

data-driven image seo
Analytics

Data Analytics Helps Marketers Substantially Boost Image SEO

8 Min Read
data visualization for small business
Data Visualization

Data Visualization Boosts Business Scalability with Sales Mapping

7 Min Read
landscape marketing secrets
Big Data

Data-Driven Marketing Offers Huge Benefits for Landscapers

8 Min Read
big data technology has helped improve the state of both the deep web and dark web
Big Data

What Role Does Big Data Have on the Deep Web?

8 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

ai is improving the safety of cars
From Bolts to Bots: How AI Is Fortifying the Automotive Industry
Artificial Intelligence
data-driven web design
5 Great Tips for Using Data Analytics for Website UX
Big Data

Quick Link

  • About
  • Contact
  • Privacy
Follow US

© 2008-23 SmartData Collective. All Rights Reserved.

Removed from reading list

Undo
Go to mobile version
Welcome Back!

Sign in to your account

Lost your password?