Zynga: A Big Data Company Masquerading as a Gaming Company
How much data does an online game developer like Zynga create and use on a daily basis? Not surprisingly, the answer is a lot. Zynga operates at such a scale that on a regular day it delivers one petabyte of content. To cope with these extremely high data demands, the company has built a flexible cloud server centre that can add up to 1,000 servers in just 24 hours. In fact, Zynga’s combined private and public cloud server park is known as one of the biggest hybrid clouds.
Zynga is built on top of major platforms such as Facebook, Google+ and Android/iOS, and offers its own Zynga API. Data at Zynga is divided into two types:
- Game data, which is Vertica-driven and amounts to approximately 60 billion rows of data and 10 terabytes of semi-structured data on a daily basis.
- Server data, which amounts to over 13 terabytes of raw log data a day from the server and app logs. This data is stored in Vertica or Hadoop.
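To put these daily figures in perspective, they can be converted into average per-second rates. The following is only a back-of-the-envelope sketch: it assumes decimal terabytes (10^12 bytes), treats "over 13 terabytes" as exactly 13, and spreads the load evenly over the day, which real game traffic will not do. All variable names are illustrative.

```python
# Daily volumes as quoted in the article.
GAME_ROWS_PER_DAY = 60_000_000_000   # approx. 60 billion rows of game data
GAME_TB_PER_DAY = 10                 # semi-structured game data
SERVER_LOG_TB_PER_DAY = 13           # "over 13 TB", taken here as a lower bound

SECONDS_PER_DAY = 24 * 60 * 60
BYTES_PER_TB = 10**12                # assumption: decimal terabytes


def per_second(daily_total):
    """Average rate per second, assuming an evenly spread load."""
    return daily_total / SECONDS_PER_DAY


rows_per_second = per_second(GAME_ROWS_PER_DAY)
total_tb_per_day = GAME_TB_PER_DAY + SERVER_LOG_TB_PER_DAY
mb_per_second = per_second(total_tb_per_day * BYTES_PER_TB) / 10**6

print(f"Game rows per second (average): {rows_per_second:,.0f}")
print(f"Combined data per day: {total_tb_per_day} TB")
print(f"Combined data per second (average): {mb_per_second:,.1f} MB")
```

Under these assumptions the averages come out at roughly 690,000 rows and about 266 megabytes per second, before any peaks are taken into account.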
Interestingly, Zynga’s database keeps growing: the company never deletes data, because deletion would require a complex process.
Zynga has released an infographic that gives more insight into their big data usage.
So, big data is truly big at Zynga, but how do they cope with it?
Metrics-driven culture
At Zynga everything revolves around metrics, and for management, metrics are a discipline. Management has a strong desire to track progress towards goals through metrics. To support this, there is open ad hoc SQL access, reports are freely accessible to everyone and integrating external services is easy. Brian Reynolds explains that at Zynga the designers are separated from those analysing the metrics. Analysts need to figure out which questions should be asked, and the designers then develop or adjust the game around the answers.
A great example of this data-driven decision-making is the way Zynga pivoted the use of animals in Farmville 2.0. In the original version of Farmville, animals were merely decoration. However, data showed that more and more people started interacting with the animals and even used real money to buy additional virtual animals. So, in Farmville 2.0 animals were made much more central.
Thanks to this metrics-driven culture, Zynga combines art with science. The art lies in creating, developing and implementing an idea as a game. With the science behind it, they listen to customers and find out whether the game is fun or not. Subsequently, they can adjust and pivot games if necessary.
The scalable service architecture
Zynga operates a large number of different databases for different tasks. For example, they use Splunk for primary log analytics. They run 70 nodes and store 650 million rows of data a day in a streaming event database built on a MySQL cluster. They also have sharded transactional databases and a Vertica data warehouse.
The statistics that Zynga generates are gigantic as well. They have over 6,000 different report types and handle 15,000 ad hoc queries from users on a daily basis. Each day they run 3,000 reports. Analysts, product managers, engineers and business intelligence teams use all these insights to optimize and improve the operations and products.
Altogether, Zynga is truly a big data player that uses big data to create engaging social games that people enjoy and play with friends.
Copyright Big Data Startups 2013.
Mark van Rijmenam is Co-founder and CEO of Datafloq. Datafloq is the One-Stop Shop around Big Data. We are the number one Big Data platform connecting Data and People, connecting all stakeholders in the global Big Data market. Mark is a strategist who advises organisations on how to develop their big data strategies. As such, he is a well sought after speaker on this topic. His book “Think ...