
8/26/2009

Data Scraping Wikipedia with Google Spreadsheets « OUseful.Info, the blog…

Get the IMDb rating: =importXml("http://www.imdb.com/title/tt0088247/", "//div[@class='meta']/b")

So to recap, we have scraped some data from a Wikipedia page into a Google spreadsheet using the =importHTML formula, published a handful of rows from the table as CSV, consumed the CSV in a Yahoo pipe and created a geocoded KML feed from it, and then displayed it in a Google map.
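
The CSV-to-KML step in that recap can be sketched outside Yahoo Pipes as well. The following is only an illustration under loud assumptions: the column names (name, lat, lon), the sample rows, and the helper csv_to_kml are all made up here, and the coordinates are assumed to be in the CSV already, whereas the original pipe did the geocoding itself.

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"


def csv_to_kml(csv_text):
    """Turn CSV rows with name/lat/lon columns (assumed names) into KML text."""
    # Serialize the KML namespace as the default namespace (no prefix).
    ET.register_namespace("", KML_NS)
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for row in csv.DictReader(io.StringIO(csv_text)):
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = row["name"]
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML writes coordinates as longitude,latitude (in that order).
        coords = ET.SubElement(point, f"{{{KML_NS}}}coordinates")
        coords.text = f"{row['lon']},{row['lat']}"
    return ET.tostring(kml, encoding="unicode")


# Made-up rows standing in for the published spreadsheet CSV.
sample_csv = "name,lat,lon\nExampleville,51.5,-0.1\nSampleton,52.0,-0.7\n"
kml_text = csv_to_kml(sample_csv)
print(kml_text)
```

In practice the CSV text would come from the spreadsheet's published CSV URL rather than from a string literal.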

Functions : Functions for external data - Google Docs Help
Functions: Functions for external data

This new feature lets you get information from filetypes such as xml, html, csv, tsv, as well as RSS and Atom feeds that you might read today in Google Reader.

Additionally, the limit on these external-data functions per spreadsheet is 50.

=importXML("URL","query")
* URL - the URL of the XML or HTML file
* query - the XPath query to run on the data given at the URL. For example, "//a/@href" returns a list of the href attributes of all <a> tags in the document (i.e. all of the URLs the document links to). For more information about XPath, please visit http://www.w3schools.com/xpath/
* Example: =importXml("http://www.google.com", "//a/@href"). This returns all of the href attributes (the link URLs) in all the <a> tags on the www.google.com home page.
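
To see what the two XPath queries quoted above select, here is a rough Python sketch using only the standard library. It is not how the spreadsheet function works internally. The sample markup and the rating value in it are invented for illustration, and the page is assumed to be well-formed XML, which real HTML often is not. ElementTree supports only a subset of XPath and cannot return attributes directly, so "//a/@href" is approximated by selecting the <a> elements that carry an href and then reading the attribute.

```python
import xml.etree.ElementTree as ET

# Invented, well-formed sample page (placeholder rating, placeholder links).
SAMPLE = """<html><body>
<div class="meta"><b>7.5/10</b></div>
<p>
  <a href="http://example.com/one">One</a>
  <a href="http://example.com/two">Two</a>
  <a name="anchor">no href here</a>
</p>
</body></html>"""

root = ET.fromstring(SAMPLE)

# Approximation of "//a/@href": find every <a> that has an href attribute,
# then read that attribute from each element.
hrefs = [a.get("href") for a in root.findall(".//a[@href]")]

# Approximation of "//div[@class='meta']/b": <b> children of any <div>
# whose class attribute is exactly "meta".
ratings = [b.text for b in root.findall(".//div[@class='meta']/b")]

print(hrefs)
print(ratings)
```

The third link is skipped because it has no href, which matches what an attribute query would return.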

=ImportHtml(URL, "list" | "table", index). This imports the data in a particular table or list from an HTML page. The arguments to the function are as follows:

* URL - the url of the HTML page
* either "list" or "table" to indicate what type of structure to pull in from the webpage. If it's "list," the function looks for the contents of <UL>, <OL>, or <DL> tags; if it's "table," it just looks for <TABLE> tags.
* index - the 1-based index of the table or the list on the source web page. The indices are maintained separately so there might be both a list #1 and a table #1.
* Example: =ImportHtml("http://en.wikipedia.org/wiki/Demographics_of_India", "table",4). This function returns demographic information for the population of India.
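
The point about separate 1-based indices for lists and tables is easy to miss, so here is a small Python sketch of that numbering. The helper name import_html and the sample page are hypothetical, and the sketch makes its own assumptions: it only counts outermost structures (nested tables or lists are folded into their parent) and it returns a flat list of text fragments rather than rows and columns.

```python
from html.parser import HTMLParser


class StructureCollector(HTMLParser):
    """Collect text from outermost tables and lists, counted separately."""

    LIST_TAGS = {"ul", "ol", "dl"}

    def __init__(self):
        super().__init__()
        self.found = {"table": [], "list": []}
        self._stack = []      # currently open table/list tags
        self._current = None  # text fragments of the outermost open structure

    def _kind(self, tag):
        if tag == "table":
            return "table"
        if tag in self.LIST_TAGS:
            return "list"
        return None

    def handle_starttag(self, tag, attrs):
        kind = self._kind(tag)
        if kind is None:
            return
        if not self._stack:
            # A new outermost structure: start a fresh entry for its kind.
            self._current = []
            self.found[kind].append(self._current)
        self._stack.append(tag)

    def handle_endtag(self, tag):
        if self._stack and tag == self._stack[-1]:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if self._stack and text:
            self._current.append(text)


def import_html(html_text, kind, index):
    """Return the text of the index-th (1-based) "table" or "list"."""
    if kind not in ("table", "list") or index < 1:
        raise ValueError("kind must be 'table' or 'list' and index >= 1")
    collector = StructureCollector()
    collector.feed(html_text)
    return collector.found[kind][index - 1]


# Hypothetical page: lists and tables are interleaved but numbered separately.
PAGE = """
<UL><LI>alpha</LI><LI>beta</LI></UL>
<TABLE><TR><TD>a1</TD><TD>b1</TD></TR></TABLE>
<OL><LI>first</LI></OL>
<TABLE><TR><TD>a2</TD></TR></TABLE>
"""

print(import_html(PAGE, "list", 1))
print(import_html(PAGE, "table", 1))
print(import_html(PAGE, "table", 2))
```

Here the page has both a list #1 and a table #1, and the second table is table #2 even though it is the fourth structure on the page.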

Calling Amazon Associates/Ecommerce Web Services from a Google Spreadsheet « OUseful.Info, the blog…
