By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData Collective
  • Analytics
    AnalyticsShow More
    data science anayst
    Growing Demand for Data Science & Data Analyst Roles
    6 Min Read
    predictive analytics in dropshipping
    Predictive Analytics Helps New Dropshipping Businesses Thrive
    12 Min Read
    data-driven approach in healthcare
    The Importance of Data-Driven Approaches to Improving Healthcare in Rural Areas
    6 Min Read
    analytics for tax compliance
    Analytics Changes the Calculus of Business Tax Compliance
    8 Min Read
    big data analytics in gaming
    The Role of Big Data Analytics in Gaming
    10 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-23 SmartData Collective. All Rights Reserved.
Reading: Lessons To Learn From How Google Stores Its Data
Share
Notification Show More
Latest News
ai in automotive industry
AI Is Changing the Automotive Industry Forever
Artificial Intelligence
SMEs Use AI-Driven Financial Software for Greater Efficiency
Artificial Intelligence
data security in big data age
6 Reasons to Boost Data Security Plan in the Age of Big Data
Big Data
data science anayst
Growing Demand for Data Science & Data Analyst Roles
Data Science
ai software development
Key Strategies to Develop AI Software Cost-Effectively
Artificial Intelligence
Aa
SmartData Collective
Aa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Data Management > Lessons To Learn From How Google Stores Its Data
Data Management

Lessons To Learn From How Google Stores Its Data

Anand
Last updated: 2016/07/07 at 12:57 PM
Anand
6 Min Read
SHARE

Google Datacenter When it comes to data, there are few companies that have as much data stored on their servers as Google. According to at least one not so recent estimate, Google could be holding as much as 15 exabytes on their servers. That’s 15 million terrabytes of data which would be the equivalent of 30 million personal computers.

Google Datacenter When it comes to data, there are few companies that have as much data stored on their servers as Google. According to at least one not so recent estimate, Google could be holding as much as 15 exabytes on their servers. That’s 15 million terrabytes of data which would be the equivalent of 30 million personal computers. Storing and managing such a humongous volume of data is no easy task and how Google handles this data is a lesson for anybody who deals with cloud and big data.

Designated Servers

Handling servers at that volume is quite similar to how you handle data within a database. A typical database contains tables that perform specific tasks. For instance, the user database only contains information relating to users while payment database contains the billing details of these same users. The reason tables are built this way is in order to make it easy to retrieve specific details in quick time. Querying for billing details from a table that contains only this information is easier than performing the same task on a master table that is much larger.

More Read

data security in big data age

6 Reasons to Boost Data Security Plan in the Age of Big Data

Four Strategies For Effective Database Compliance
Use this Strategic Approach to Maximize Your Data’s Value
5 Big Data Storage Solutions
AI Significantly Increases the Dangers of Social Media Hacking

On a similar note, Google has servers designated to perform specific tasks. They have Index servers that contain the list of document IDs that contain the user’s query, Ad servers that manage ads on the search results pages, Data-Gathering servers that send out bots to crawl the web and even Spelling servers that help correct the typos present in a user’s search query. Even when it comes to populating these servers, it has been found that Google tries to prioritize organized and structured content over unstructured ones. One study found that it takes Google 14 minutes on an average to crawl content from sitemaps. In comparison, it took the same spiders from the Data-Gathering servers nearly 1375 minutes to crawl content which did not contain sitemaps.

The benefits from using such designated servers and structured data processing systems is two-fold – Firstly, it helps the search engine save time by reaching out to specific servers to retrieve specific components that make up the final results page. Also, it helps run these various processes in parallel and thereby further reduces the time it takes to build the final results page.

Lesson: Classify your servers to perform specific data processing components. It can vastly speed up your processes

Data Storage and Retention

The Google search engine business thrives on data and so how they store, manage and retain data is a great lesson for any technology company. The first mantra when it comes to storage and retention is that not all data is made equal. There are some data that is accessed quite frequently, others not so frequently while the rest may not be relevant any longer. Cached versions of dead web pages (that have been removed or edited) have no reason to be stored on Google servers and may be removed from its servers. At the same time, cached pages of an inconsequential news item from the 1990s doesn’t get accessed frequently today. While such contents cannot be deleted, Google puts such data in “cold storage” which are essentially cheaper servers that keep data in compressed form that makes data retrieval slower.

Lesson: Prioritize your data and decide what to do with them – remove them, put them on cold storage or keep them in your active servers.

Data Loss Prevention

When we are talking about indexing several petabytes of information each day, then the threat to loss of such data is extremely real; especially if you are Google and your business depends on data. According to a paper published on the Google File System (GFS), the company duplicates each data indexed as many as three times. What this means is that if there are 20 petabytes of data indexed each day, Google will need to store as much as 60 petabytes of data. This is then a tripling of your data infrastructure investments and while it is a massive expense, there are no two ways about it.

Lesson: The only way to prevent data loss is by replicating content. It costs money, but is a worthwhile investment that needs to catered to.

 

Anand July 7, 2016
Share this Article
Facebook Twitter Pinterest LinkedIn
Share
By Anand
Follow:
Anand Srinivasan is the founder of Hubbion, a suite of business apps. The Hubbion Project Management app was ranked among the top 20 in its category for 2017 by Capterra.

Follow us on Facebook

Latest News

ai in automotive industry
AI Is Changing the Automotive Industry Forever
Artificial Intelligence
SMEs Use AI-Driven Financial Software for Greater Efficiency
Artificial Intelligence
data security in big data age
6 Reasons to Boost Data Security Plan in the Age of Big Data
Big Data
data science anayst
Growing Demand for Data Science & Data Analyst Roles
Data Science

Stay Connected

1.2k Followers Like
33.7k Followers Follow
222 Followers Pin

You Might also Like

data security in big data age
Big Data

6 Reasons to Boost Data Security Plan in the Age of Big Data

7 Min Read
database compliance guide
Data Management

Four Strategies For Effective Database Compliance

8 Min Read
analyzing big data for its quality and value
Big Data

Use this Strategic Approach to Maximize Your Data’s Value

6 Min Read
Data Management

5 Big Data Storage Solutions

6 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

ai is improving the safety of cars
From Bolts to Bots: How AI Is Fortifying the Automotive Industry
Artificial Intelligence
AI and chatbots
Chatbots and SEO: How Can Chatbots Improve Your SEO Ranking?
Artificial Intelligence Chatbots Exclusive

Quick Link

  • About
  • Contact
  • Privacy
Follow US

© 2008-23 SmartData Collective. All Rights Reserved.

Removed from reading list

Undo
Go to mobile version
Welcome Back!

Sign in to your account

Lost your password?