
12/15/2008

YQL - converting the web to JSON with mock SQL

Hence we need converters. You can use cURL and Beautiful Soup, or roll your own hell of regular expressions. Alternatively, you can use Yahoo Pipes to build your converter. Pipes is the bomb, but a lot of people complained that there is no version control and that you have to go through the very graphical interface to get to your data (which was the point of Pipes, but let's not go there).

For all the open services that don't need authentication, you can use these YQL statements as a REST API with JSON output and an optional callback function for JSON-P: pass the statement as the q parameter of http://query.yahooapis.com/v1/public/yql? and add format=json and callback=yourfunction. For example, to get the latest three headlines from Ajaxian's RSS feed as JSON and wrap them in a function called leechajaxian, do the following:

http://query.yahooapis.com/v1/public/yql?q=select title from rss where url="http://feeds.feedburner.com/ajaxian" limit 3&format=json&callback=leechajaxian
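The URL above is written with raw spaces and quotes for readability; in a script you would normally URL-encode the statement. Here is a minimal sketch of how that could look. The endpoint and the q, format and callback parameters come from the examples in this post; buildYqlUrl is a made-up helper name, not part of any Yahoo library.

```javascript
// Endpoint as quoted in the post.
const YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql";

// buildYqlUrl is a hypothetical helper for this sketch.
// statement: the YQL statement as plain text
// options.format / options.callback: optional output settings
function buildYqlUrl(statement, options = {}) {
  const params = [["q", statement]];
  if (options.format) params.push(["format", options.format]);
  if (options.callback) params.push(["callback", options.callback]);
  // encodeURIComponent escapes spaces, quotes, "=" and "&" inside each value
  const queryString = params
    .map(([key, value]) => key + "=" + encodeURIComponent(value))
    .join("&");
  return YQL_ENDPOINT + "?" + queryString;
}

const headlinesUrl = buildYqlUrl(
  'select title from rss where url="http://feeds.feedburner.com/ajaxian" limit 3',
  { format: "json", callback: "leechajaxian" }
);
console.log(headlinesUrl);
```

Encoding each value separately means an "&" or "=" inside the statement cannot be mistaken for a parameter separator.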

You can also search the web with YQL:

http://query.yahooapis.com/v1/public/yql?q=select title,abstract,url from search.web where query="json" limit 3&format=json&callback=leechajaxian

What about screen scraping? You can get data from any valid HTML document using XPath with select * from html. For example, to get the first three tag links on my blog you can do the following:

http://query.yahooapis.com/v1/public/yql?q=select * from html where url="http://wait-till-i.com" and xpath='//a[@rel="tag"]' limit 3&format=json&callback=leechajaxian
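The fiddly part of the scraping query is the quoting: the page URL sits in double quotes and the XPath in single quotes, so the XPath can use double quotes itself. A rough sketch of a helper that puts such a statement together could look like this. htmlTableStatement is a hypothetical name, and rejecting awkward quotes outright (rather than escaping them) is a simplification made for this example.

```javascript
// htmlTableStatement is a hypothetical helper for this sketch.
// It builds the "select * from html" statement shown in the post.
function htmlTableStatement(pageUrl, xpath, limit) {
  // Assumption: refuse inputs that would break out of the quotes,
  // instead of guessing at an escaping rule.
  if (pageUrl.includes('"')) {
    throw new Error("page URL must not contain double quotes");
  }
  if (xpath.includes("'")) {
    throw new Error("XPath must not contain single quotes");
  }
  let statement =
    'select * from html where url="' + pageUrl + "\" and xpath='" + xpath + "'";
  // limit is optional; only positive integers are appended
  if (Number.isInteger(limit) && limit > 0) {
    statement += " limit " + limit;
  }
  return statement;
}

const tagLinks = htmlTableStatement("http://wait-till-i.com", '//a[@rel="tag"]', 3);
console.log(tagLinks);

// URL-encode the statement before putting it into the q parameter
const tagLinksUrl =
  "http://query.yahooapis.com/v1/public/yql?q=" +
  encodeURIComponent(tagLinks) +
  "&format=json&callback=leechajaxian";
console.log(tagLinksUrl);
```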

yql test console
Creating an OAuth Application - YDN
Wait till I come! » Blog Archive » YQL is so the bomb to get web data as XML or JSON


<script type="text/javascript" charset="utf-8">
// JSON-P callback: YQL calls this function with the query response
function photos(o){
  var out = document.getElementById('photos');
  var html = '';
  // Build one thumbnail <img> per image search result
  for(var i=0;i<o.query.results.result.length;i++){
    var cur = o.query.results.result[i];
    html += '<img src="'+cur.thumbnail_url+'" alt="'+cur.abstract+'">';
  }
  out.innerHTML = html;
}
</script>
<script type="text/javascript" src="http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20search.images%20where%20query%3D%22rabbit%22%20and%20mimetype%20like%20%22%25jpeg%25%22&format=json&callback=photos"></script>
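The callback above drops the search data straight into the page. If you want to be a bit more defensive, you could build the markup in a separate function that escapes the values and copes with an empty result set. This is only a sketch: escapeHtml and renderPhotos are invented names, and the handling of an empty or single result is an assumption about what the service may return, not something documented here.

```javascript
// escapeHtml is a hypothetical helper: makes text safe inside HTML attributes.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// renderPhotos is a hypothetical helper: turns a response object into
// an HTML string and does not touch the DOM itself.
function renderPhotos(response) {
  const results = response && response.query && response.query.results;
  // Assumption: results may be missing when nothing matched
  let items = results && results.result ? results.result : [];
  // Assumption: a single hit may arrive as an object instead of an array
  if (!Array.isArray(items)) items = [items];
  return items
    .map(function (cur) {
      return '<img src="' + escapeHtml(cur.thumbnail_url) +
        '" alt="' + escapeHtml(cur.abstract) + '">';
    })
    .join("");
}

// Usage inside the JSON-P callback from the post:
function photos(o) {
  document.getElementById("photos").innerHTML = renderPhotos(o);
}
```

Keeping the string building apart from the DOM update also makes the rendering easy to try out without a browser.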

Beautiful Soup: We called him Tortoise because he taught us.
Beautiful Soup is a Python HTML/XML parser designed for quick turnaround projects like screen-scraping. Three features make it powerful:

1. Beautiful Soup won't choke if you give it bad markup. It yields a parse tree that makes approximately as much sense as your original document. This is usually good enough to collect the data you need and run away.
2. Beautiful Soup provides a few simple methods and Pythonic idioms for navigating, searching, and modifying a parse tree: a toolkit for dissecting a document and extracting what you need. You don't have to create a custom parser for each application.
3. Beautiful Soup automatically converts incoming documents to Unicode and outgoing documents to UTF-8. You don't have to think about encodings, unless the document doesn't specify an encoding and Beautiful Soup can't autodetect one. Then you just have to specify the original encoding.
