Google Squared a bit wobbly

June 4, 2009
Google Labs has just released Google Squared. Unlike a Google web search, which returns an unstructured list of web pages, Google Squared is designed to return structured data. Searching for US States returns a “square”, much like an Excel spreadsheet or a data frame in R. The rows are states, and the columns are “facts” about those states: Name, Image, Population, etc. You can customize the columns returned to add new variables.

My first thought was that this would be a great source of data for examples in R. Just the other day, I was looking for a list of the populations of the largest US cities to illustrate Zipf’s law. Could Google Squared have helped me? Sadly, no, at least not yet.

The first problem is data quality. That search for US States included Georgia in the top 10 … but if you add “Capital” to the list of variables, the capital is listed as T’bilisi, not Atlanta. To be fair, Google Squared lets you click on a data value and select from other possibilities, so I can change it to Atlanta if I want. But I was hoping that Google Squared would draw on the consensus of the Web, in context with my search, to produce a table of good data values. Instead, it seems the intent is for you to use Google Squared as an alternative to Excel for collecting data you’ve found and verified yourself on the Web.

Even if you can find the right variables, getting the right records is tricky, too. Let’s say I want to generate data for the 50 US States. First of all, I have to keep clicking “Add next 10 items” until the Square is full of all 53 rows Google generates. (Why can’t I get all the rows in one fell swoop?) Then I have to delete DC, Virgin Islands, Afghanistan and Harvard University: that leaves me with 49 rows. One state is missing, but which one? You can’t sort the rows by state name, which might have helped.

My next thought was to export the Square to R, and match the names against state.name to find the missing one. But, alas, you can’t export the data. C’mon Google, why not a simple CSV export? I have to spend all this time creating and verifying the data, and now you’re not going to let me use it? Grr.
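Had an export been possible, the matching step would have been trivial. Here is a minimal sketch of the idea in Python, assuming the state names have somehow been copied out of the Square by hand. The US_STATES list stands in for R's built-in state.name, and both missing_states and the sample rows are hypothetical: I never did find out which state was actually missing, so "Ohio" below is just a placeholder.

```python
# The 50 US state names, standing in for R's built-in state.name vector.
US_STATES = [
    "Alabama", "Alaska", "Arizona", "Arkansas", "California",
    "Colorado", "Connecticut", "Delaware", "Florida", "Georgia",
    "Hawaii", "Idaho", "Illinois", "Indiana", "Iowa",
    "Kansas", "Kentucky", "Louisiana", "Maine", "Maryland",
    "Massachusetts", "Michigan", "Minnesota", "Mississippi", "Missouri",
    "Montana", "Nebraska", "Nevada", "New Hampshire", "New Jersey",
    "New Mexico", "New York", "North Carolina", "North Dakota", "Ohio",
    "Oklahoma", "Oregon", "Pennsylvania", "Rhode Island", "South Carolina",
    "South Dakota", "Tennessee", "Texas", "Utah", "Vermont",
    "Virginia", "Washington", "West Virginia", "Wisconsin", "Wyoming",
]


def missing_states(square_names):
    """Return the states that do not appear among the Square's row names."""
    # Normalize case and surrounding whitespace before comparing.
    seen = {name.strip().lower() for name in square_names}
    return [state for state in US_STATES if state.lower() not in seen]


# Hypothetical input: pretend the Square returned every state except one.
square_rows = [state for state in US_STATES if state != "Ohio"]

print(missing_states(square_rows))
```

In R itself the same check would be a one-liner along the lines of setdiff(state.name, square_names), where square_names is whatever vector holds the exported rows.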

I know this is only a Labs feature, and it does show promise. But with the data quality issues and the inability to export, sadly it doesn’t seem like it’s going to be a useful source of datasets anytime soon.
