Jan 2, 2010
Parsing HTML with YQL and PHP without regular expressions
Most data on the Web is stored in the Hypertext Markup Language (HTML) format, and there are many times when you might want to parse HTML in your application. However, many programming languages do not provide an easy, built-in way to parse HTML.
Evidence of this is the numerous questions posted by programmers looking for an easy way to parse HTML.
I am here with an easy way to parse data from any web page.
You just have to pass the URL of the page and an XPath expression to YQL (Yahoo Query Language). XPath is used to navigate through elements and attributes in an XML document.
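To see what the XPath expression used in this article selects, here is a minimal local sketch that runs the same expression with PHP's built-in DOMDocument and DOMXPath classes instead of YQL. The HTML snippet is a made-up sample, not the real page source, and is only there to show which elements match.

```php
<?php
// Hypothetical sample markup, invented for illustration only.
$html = <<<HTML
<html><body>
<div id="actions">
  <a id="twitter" href="http://twitter.com/dharmmotyar"><span>Twitter</span></a>
  <a id="rss" href="http://motyar.blogspot.com/rss.xml"><span>RSS</span></a>
</div>
<div id="other"><a id="skip" href="#">Not selected</a></div>
</body></html>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them;
// real-world HTML is rarely well-formed.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
// "//" searches the whole document, [@id='actions'] filters on the id
// attribute, and "/a" keeps only <a> elements directly inside that div.
$nodes = $xpath->query("//div[@id='actions']/a");

// Map each matched link's id to its href.
$links = array();
foreach ($nodes as $node) {
    $links[$node->getAttribute('id')] = $node->getAttribute('href');
}
print_r($links);
```

The link inside the second div is not matched, because the predicate restricts the search to the div whose id is "actions".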
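The PHP Code
$url = "http://motyar.blogspot.com";
$xpath = "div[@id='actions']/a";

// Build the YQL statement. The values are wrapped in double quotes so the
// single quotes inside $xpath do not end the YQL string early.
$yql = 'select * from html where url="' . $url . '" and xpath="//' . $xpath . '"';

// URL-encode the whole statement instead of hand-escaping parts of it.
$queryUrl = "http://query.yahooapis.com/v1/public/yql?q=" . rawurlencode($yql) . "&format=json";

// Fetch the JSON response with cURL.
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $queryUrl);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
$output = curl_exec($ch);
if ($output === false) {
    die("Request failed: " . curl_error($ch));
}
curl_close($ch);

// Decode the JSON and print the matched elements.
$data = json_decode($output);
$results = $data->query->results;
print_r($results);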
This script returns a stdClass object that looks like this:
stdClass Object
(
    [a] => Array
        (
            [0] => stdClass Object
                (
                    [href] => http://twitter.com/dharmmotyar
                    [id] => twitter
                    [target] => _blank
                    [title] => Follow me on Twitter.
                    [span] => stdClass Object
                        (
                            [content] => Twitter
                        )
                )
            [1] => stdClass Object
                (
                    [href] => http://motyar.blogspot.com/rss.xml
                    [id] => rss
                    [title] => RSS feed of this site.
                    [span] => stdClass Object
                        (
                            [class] => hidden
                            [content] => RSS
                        )
                )
            [2] => stdClass Object
                (
                    [href] => http://motyar.blogspot.com
                    [id] => home_link
                    [title] => My homepage.
                    [span] => stdClass Object
                        (
                            [class] => hidden
                            [content] => Home
                        )
                )
        )
)
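Once decoded, the result can be walked like any other PHP object. The sketch below pulls the href and title of each link out of a hard-coded JSON sample shaped like the output above, so it runs without a network call. The helper name extractLinks is hypothetical, and wrapping a single match in an array is a defensive assumption about the response shape rather than documented behaviour.

```php
<?php
// Sample JSON shaped like the results shown above (trimmed to two links).
$json = '{"query":{"results":{"a":[
  {"href":"http://twitter.com/dharmmotyar","id":"twitter","title":"Follow me on Twitter.","span":{"content":"Twitter"}},
  {"href":"http://motyar.blogspot.com/rss.xml","id":"rss","title":"RSS feed of this site.","span":{"class":"hidden","content":"RSS"}}
]}}}';

// Hypothetical helper: returns a list of array('href' => ..., 'title' => ...).
function extractLinks($results)
{
    // No matches: assume results is null or has no "a" property.
    if (!is_object($results) || !isset($results->a)) {
        return array();
    }
    // Assumption: a single match may arrive as one object, so wrap it.
    $items = is_array($results->a) ? $results->a : array($results->a);

    $links = array();
    foreach ($items as $item) {
        $links[] = array(
            'href'  => isset($item->href) ? $item->href : '',
            'title' => isset($item->title) ? $item->title : '',
        );
    }
    return $links;
}

$data = json_decode($json);
$links = extractLinks($data->query->results);
foreach ($links as $link) {
    echo $link['title'] . ' => ' . $link['href'] . "\n";
}
```

Checking for a missing or empty result before looping avoids notices when the XPath expression matches nothing on the page.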
Feel free to share any queries.
If this article helped you and you are feeling thankful, you can send me Bitcoins.
By : Motyar+ @motyar