Cookies help us display personalized product recommendations and ensure you have great shopping experience.

By using this site, you agree to the Privacy Policy and Terms of Use.
Accept
SmartData CollectiveSmartData Collective
  • Analytics
    AnalyticsShow More
    unusual trading activity
    Signal Or Noise? A Decision Tree For Evaluating Unusual Trading Activity
    3 Min Read
    software developer using ai
    How Data Analytics Helps Developers Deliver Better Tech Services
    8 Min Read
    ai for stock trading
    Can Data Analytics Help Investors Outperform Warren Buffett
    9 Min Read
    media monitoring
    Signals In The Noise: Using Media Monitoring To Manage Negative Publicity
    5 Min Read
    data analytics
    How Data Analytics Can Help You Construct A Financial Weather Map
    4 Min Read
  • Big Data
  • BI
  • Exclusive
  • IT
  • Marketing
  • Software
Search
© 2008-25 SmartData Collective. All Rights Reserved.
Reading: Lessons To Learn From How Google Stores Its Data
Share
Notification
Font ResizerAa
SmartData CollectiveSmartData Collective
Font ResizerAa
Search
  • About
  • Help
  • Privacy
Follow US
© 2008-23 SmartData Collective. All Rights Reserved.
SmartData Collective > Data Management > Lessons To Learn From How Google Stores Its Data
Data Management

Lessons To Learn From How Google Stores Its Data

Anand
Anand
6 Min Read
SHARE

Google Datacenter When it comes to data, there are few companies that have as much data stored on their servers as Google. According to at least one not so recent estimate, Google could be holding as much as 15 exabytes on their servers. That’s 15 million terrabytes of data which would be the equivalent of 30 million personal computers.

Google Datacenter When it comes to data, there are few companies that have as much data stored on their servers as Google. According to at least one not so recent estimate, Google could be holding as much as 15 exabytes on their servers. That’s 15 million terrabytes of data which would be the equivalent of 30 million personal computers. Storing and managing such a humongous volume of data is no easy task and how Google handles this data is a lesson for anybody who deals with cloud and big data.

Designated Servers

Handling servers at that volume is quite similar to how you handle data within a database. A typical database contains tables that perform specific tasks. For instance, the user database only contains information relating to users while payment database contains the billing details of these same users. The reason tables are built this way is in order to make it easy to retrieve specific details in quick time. Querying for billing details from a table that contains only this information is easier than performing the same task on a master table that is much larger.

More Read

Harnessing Collective Intelligence in Decision Making through Big Data Analytics
LinkedIn and Hiring: Dream. Fit. Passion.
Big Data, Good and Evil
GDPR: After 25th May, What Medium And Long Term Actions?
Big Data and Analytics – Suggestions to Approach

On a similar note, Google has servers designated to perform specific tasks. They have Index servers that contain the list of document IDs that contain the user’s query, Ad servers that manage ads on the search results pages, Data-Gathering servers that send out bots to crawl the web and even Spelling servers that help correct the typos present in a user’s search query. Even when it comes to populating these servers, it has been found that Google tries to prioritize organized and structured content over unstructured ones. One study found that it takes Google 14 minutes on an average to crawl content from sitemaps. In comparison, it took the same spiders from the Data-Gathering servers nearly 1375 minutes to crawl content which did not contain sitemaps.

The benefits from using such designated servers and structured data processing systems is two-fold – Firstly, it helps the search engine save time by reaching out to specific servers to retrieve specific components that make up the final results page. Also, it helps run these various processes in parallel and thereby further reduces the time it takes to build the final results page.

Lesson: Classify your servers to perform specific data processing components. It can vastly speed up your processes

Data Storage and Retention

The Google search engine business thrives on data and so how they store, manage and retain data is a great lesson for any technology company. The first mantra when it comes to storage and retention is that not all data is made equal. There are some data that is accessed quite frequently, others not so frequently while the rest may not be relevant any longer. Cached versions of dead web pages (that have been removed or edited) have no reason to be stored on Google servers and may be removed from its servers. At the same time, cached pages of an inconsequential news item from the 1990s doesn’t get accessed frequently today. While such contents cannot be deleted, Google puts such data in “cold storage” which are essentially cheaper servers that keep data in compressed form that makes data retrieval slower.

Lesson: Prioritize your data and decide what to do with them – remove them, put them on cold storage or keep them in your active servers.

Data Loss Prevention

When we are talking about indexing several petabytes of information each day, then the threat to loss of such data is extremely real; especially if you are Google and your business depends on data. According to a paper published on the Google File System (GFS), the company duplicates each data indexed as many as three times. What this means is that if there are 20 petabytes of data indexed each day, Google will need to store as much as 60 petabytes of data. This is then a tripling of your data infrastructure investments and while it is a massive expense, there are no two ways about it.

Lesson: The only way to prevent data loss is by replicating content. It costs money, but is a worthwhile investment that needs to catered to.

 

Share This Article
Facebook Pinterest LinkedIn
Share
ByAnand
Follow:
Anand Srinivasan is the founder of Hubbion, a suite of business apps. The Hubbion Project Management app was ranked among the top 20 in its category for 2017 by Capterra.

Follow us on Facebook

Latest News

Hidden AI, a risk?
Hidden AI, Real Risk: A Governance Roadmap For Mid-Market Organizations
Artificial Intelligence Exclusive Infographic
unusual trading activity
Signal Or Noise? A Decision Tree For Evaluating Unusual Trading Activity
Analytics Exclusive Infographic
Ai agents
AI Agent Trends Shaping Data-Driven Businesses
Artificial Intelligence Exclusive Infographic
Why Businesses Are Using Data to Rethink Office Operations
Why Businesses Are Using Data to Rethink Office Operations
Big Data Exclusive

Stay Connected

1.2KFollowersLike
33.7KFollowersFollow
222FollowersPin

You Might also Like

5 Ways Predictive Analytics Cuts Enterprise Risk

5 Min Read

Lessons From Our Cyber Past: History of Cyber Intelligence

7 Min Read
big data management in cloud computing
Big Data

Best Practices For Big Data Management In Cloud Computing

8 Min Read

Please put the shower curtain inside the bathtub!

5 Min Read

SmartData Collective is one of the largest & trusted community covering technical content about Big Data, BI, Cloud, Analytics, Artificial Intelligence, IoT & more.

AI and chatbots
Chatbots and SEO: How Can Chatbots Improve Your SEO Ranking?
Artificial Intelligence Chatbots Exclusive
ai is improving the safety of cars
From Bolts to Bots: How AI Is Fortifying the Automotive Industry
Artificial Intelligence

Quick Link

  • About
  • Contact
  • Privacy
Follow US
© 2008-25 SmartData Collective. All Rights Reserved.
Welcome Back!

Sign in to your account

Username or Email Address
Password

Lost your password?