Using Google Docs for Web Scraping

January 27, 2009

While trying to scrape some data from a website, I chanced upon the importXML function in Google Docs spreadsheets, which is pretty neat: it lets you import the XML or HTML of a web page and then parse the data with an XPath query.

Here is an example: using the importXML function, I parsed all the links in the Google search results for “analytics consultant in India”.

The importXML function works as follows (from the support page):

Functions:

=importXML("URL","query")

  • URL – the URL of the XML or HTML file
  • query – the XPath query to run on the data given at the URL. For example, "//a/@href" returns a list of the href attributes of all <a> tags in the document (i.e. all of the URLs the document links to). For more information about XPath, please visit http://www.w3schools.com/xpath/
  • Example: =importXml("www.google.com", "//a/@href"). This returns all of the href attributes (the link URLs) in all the <a> tags on www.google.com home page
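
To make the "//a/@href" query a little more concrete, here is a rough Python sketch of the same idea outside the spreadsheet. It is only an illustration, not how importXML itself is implemented: the SAMPLE_PAGE markup and the extract_hrefs helper are made up for this example, and the standard-library ElementTree parser used here only handles well-formed XML/XHTML and cannot select attributes with XPath directly, so the code finds the <a> elements and reads their href values instead.

```python
import xml.etree.ElementTree as ET
from typing import List

# Hypothetical, well-formed sample page standing in for the document at the URL.
SAMPLE_PAGE = """
<html>
  <body>
    <a href="http://example.com/one">One</a>
    <p>Some text <a href="http://example.com/two">Two</a></p>
    <a name="anchor-without-href">No link</a>
  </body>
</html>
"""


def extract_hrefs(markup: str) -> List[str]:
    """Return the href of every <a> element, in document order."""
    root = ET.fromstring(markup.strip())
    # Rough equivalent of the XPath query //a/@href: visit every <a>
    # element and keep the href attribute when one is present.
    return [a.get("href") for a in root.iter("a") if a.get("href") is not None]


links = extract_hrefs(SAMPLE_PAGE)
for link in links:
    print(link)
```

Each printed line corresponds to one cell that the spreadsheet formula would fill in. Real web pages are often not well-formed XML, so a proper HTML parser would be needed for anything beyond a toy input like this one.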

 

You can see the resulting spreadsheet here:

http://spreadsheets.google.com/pub?key=pS9vSxWuwOllXHdueY0TDdg

