SmartData Collective

Data Infrastructures

hstevens

It may seem obvious to say that big data rely on computers. It would be practically impossible to implement big data algorithms with pencil and paper – it would just take too much paper and too long to write all the data down. But the scale of big data is not the only thing that ties it to computers – big data are dependent on computer hardware and software in all kinds of ways. Data has to be manipulated into certain forms to get inside computers, to be transmitted over networks, and so on.

These shapings of data are often hidden from the direct view of data users. But they are important – the shapes that data can take and the ways in which it can be manipulated determine the kinds of things that data can show and tell us. I call these shapes ‘data infrastructures’ – that is, all the structures and forms that data must take inside the computer in order to be manipulated as big data. To give a sense of how significant such infrastructures are and what kinds of influence they have on our thinking, I will explore one ubiquitous example here in some detail.

In the twenty-first century, perhaps the most significant data infrastructure of all is the World Wide Web (WWW). What is the structure of the WWW? Well, it’s a web, of course! The WWW’s most important feature, and arguably the reason for its great success, is the hyperlink: text from any one WWW source can be “marked up” so that it forms a “link” to any other WWW source. This means that data can be cross-linked, suggesting ways of reading and writing that are multiple and non-linear. 
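
One way to picture this cross-linking is as a graph in which any page may point to any other. The sketch below is only an illustration under that assumption: the page names and the `reachable` helper are hypothetical, and it models the idea of hyperlinks rather than HTML or any real WWW software.

```python
# Hypothetical pages and the pages each one links to (invented for illustration).
links = {
    "physics-notes": ["detector-specs", "staff-directory"],
    "detector-specs": ["physics-notes"],
    "staff-directory": ["physics-notes", "detector-specs"],
}


def reachable(start, links):
    """Return the set of pages reachable from `start` by following links."""
    seen = set()
    stack = [start]
    while stack:
        page = stack.pop()
        if page in seen:
            continue  # links can loop back, so skip pages already visited
        seen.add(page)
        stack.extend(links.get(page, []))
    return seen
```

Because links can point anywhere, including back to where the reader started, there is no single prescribed order in which to read these pages.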

What was the context in which this system was designed and the purposes it was intended to serve? In the 1980s, Tim Berners-Lee, the WWW’s designer, was a computer programmer working at the Conseil Européen pour la Recherche Nucléaire (CERN – a massive high-energy physics lab straddling the border between France and Switzerland). Berners-Lee saw a failure of information management: the many computers at CERN stored information in different ways and in different formats. Although the computers were networked, there was no practical way to find out what was stored on other machines. Berners-Lee saw work being duplicated and effort wasted due to the inability to share data effectively. The WWW was his solution: it was intended to radically expand the circulation of all kinds of information within the closed community of CERN.

In the late 1980s and early 1990s, the WWW was not the only solution to the problem of managing information in an online network. As various networks around the world were joined together into the Internet, different ideas emerged as to how to organize all this newly accessible information. For instance, in 1991, a team at the University of Minnesota released Gopher – a protocol for retrieving documents over the Internet. Unlike the WWW, Gopher consisted of a series of menus: if you wanted to find a page about, for example, mosquitoes, you might navigate to a menu of animals, then to a menu of insects, and then to the page you wanted. In the early 1990s, Gopher was a real alternative to the WWW – it imposed more hierarchy and organization on data and was therefore faster and more intuitive for finding many kinds of information.
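
The menu-by-menu navigation described here can be sketched as a simple tree. This is a loose model of a menu hierarchy and not the Gopher protocol itself: the `menus` contents and the `navigate` helper are hypothetical, made up to mirror the mosquito example.

```python
# A hypothetical menu hierarchy: each menu maps a choice to a submenu or a page.
menus = {
    "animals": {
        "insects": {
            "mosquitoes": "A page about mosquitoes.",
        },
        "mammals": {
            "bats": "A page about bats.",
        },
    },
}


def navigate(menu, path):
    """Follow a sequence of menu choices from the top menu down to a page."""
    node = menu
    for choice in path:
        node = node[choice]  # raises KeyError if the choice is not on this menu
    return node


page = navigate(menus, ["animals", "insects", "mosquitoes"])
```

In this model every page sits at exactly one place in the hierarchy, so there is one route to it. That is the kind of organization a freeform web of links does not impose.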

I describe Gopher to show that, however much we now take the WWW for granted, it is in fact one amongst several possible alternatives. It is one particular way of structuring information and the relationships between different pieces of information. It was designed for a particular purpose, and that purpose rendered the structure of the WWW particularly decentralized, freeform, and non-hierarchical. This has some advantages, such as accommodating many different kinds of information. But it also has some disadvantages, such as a lack of organization or indexing of information (a problem we have had to solve using search engines).

In any case, the structure of the WWW is not neutral. It makes doing some tasks easier and others harder; it makes some paths or connections simple to follow, others not so simple. Structures like the WWW are so ubiquitous that they become invisible. This, however, does not diminish their importance.

When it comes to big data, we find data infrastructures everywhere: from the structure of hard drives to the organization of algorithms and databases. These physical and virtual structures place constraints on how data can be organized, processed, and accessed. Understanding the advantages and disadvantages of big data (in various forms) means understanding these structures – in particular, it means knowing where they came from and what they were designed to do. Ultimately, what we get out of big data will be constrained by the structures we put it into.

 
