<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Web scraping using XPath and Visual Basic</title>
	<atom:link href="http://www.aswinvanwoudenberg.com/index.php/2009/04/02/web-scraping-using-xpath-and-visual-basic/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.aswinvanwoudenberg.com/index.php/2009/04/02/web-scraping-using-xpath-and-visual-basic/</link>
	<description>Software, math and other geeky stuff</description>
	<lastBuildDate>Thu, 13 Oct 2011 09:36:53 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
	<item>
		<title>By: George</title>
		<link>http://www.aswinvanwoudenberg.com/index.php/2009/04/02/web-scraping-using-xpath-and-visual-basic/#comment-14855</link>
		<dc:creator>George</dc:creator>
		<pubDate>Thu, 13 Oct 2011 09:36:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.aswinvanwoudenberg.com/?p=212#comment-14855</guid>
		<description>Aswin,

Your code has worked for me with Excel 2010 VBA and IE8 on every website I have tried so far, though each case needs some individual adjustments.

I have also tried to write shorter code, because the user-defined function getXPathElement runs many loops, and these accumulate when I have an array of, e.g., 5 XPaths to scrape 5 elements per page in batch processing (I ran 5x10000 items at 2 sec per set of 5) from one search engine web page, as follows:

strPath(1) = &quot;/html/body/div[2]/div/div/div[2]/div/div/div[4]/div/p/strong&quot;
strPath(2) = &quot;/html/body/div[2]/div/div/div[2]/div/div[2]/div[4]/div/p/strong&quot;
strPath(3) = &quot;/html/body/div[2]/div/div/div[2]/div/div[3]/div[4]/div/p/strong&quot;
strPath(4) = &quot;/html/body/div[2]/div/div/div[2]/div/div/div[2]/h2/a&quot;
strPath(5) = &quot;/html/body/div[2]/div/div/div[2]/div/div[3]/div[2]/p[2]&quot;

For n = 1 To 5
    &#039; get the last node of the path:
    strPathArray = Split(strPath(n), &quot;/&quot;)
    lastNode = strPathArray(UBound(strPathArray))

    &#039; get the element by tag name for strPath(n) -&gt; item 1,2,3,4,5:
    Set elem = ie.document.getElementsByTagName(lastNode).item(n)
    ws.Cells(iRow, oCol).Value = elem.innerHTML
Next n

The only problem is that a last node carrying an index in [ ], such as p[2] in strPath(5), does not return its innerHTML, while strPath(1) through strPath(4) work fine.
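
A likely explanation, offered as a sketch rather than a confirmed fix: getElementsByTagName expects a bare tag name, so the literal string &quot;p[2]&quot; probably matches no element. The hypothetical helper below (SplitStep is an illustrative name, not part of the original post) separates the tag name from the bracketed index. It assumes the step either has no brackets or ends in a purely numeric [n]. Note that an XPath index counts siblings under one parent, whereas item() counts every matching tag in the whole document, so the two numbers are not interchangeable.

```vb
&#039; Hypothetical helper (an assumption, not from the original post):
&#039; splits the last XPath step, e.g. &quot;p[2]&quot;, into a bare tag name (&quot;p&quot;)
&#039; and a 1-based index (2). Steps without brackets get index 1.
Sub SplitStep(ByVal stepText As String, ByRef tagName As String, ByRef idx As Long)
    Dim pos As Long
    pos = InStr(stepText, &quot;[&quot;)
    If pos = 0 Then
        tagName = stepText
        idx = 1
    Else
        tagName = Left$(stepText, pos - 1)
        &#039; take the digits between &quot;[&quot; and the closing &quot;]&quot;
        idx = CLng(Mid$(stepText, pos + 1, Len(stepText) - pos - 1))
    End If
End Sub
```

Calling SplitStep lastNode, tagName, idx inside the loop and passing tagName to getElementsByTagName should at least return a non-empty collection for strPath(5); which item to pick from it still depends on the page.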

Whether that is proper coding, who knows?

Since you know this subject best, perhaps we could discuss another solution?</description>
		<content:encoded><![CDATA[<p>Aswin,</p>
<p>Your code has worked for me with Excel 2010 VBA and IE8 on every website I have tried so far, though each case needs some individual adjustments.</p>
<p>I have also tried to write shorter code, because the user-defined function getXPathElement runs many loops, and these accumulate when I have an array of, e.g., 5 XPaths to scrape 5 elements per page in batch processing (I ran 5&#215;10000 items at 2 sec per set of 5) from one search engine web page, as follows:</p>
<p>strPath(1) = &quot;/html/body/div[2]/div/div/div[2]/div/div/div[4]/div/p/strong&quot;<br />
strPath(2) = &quot;/html/body/div[2]/div/div/div[2]/div/div[2]/div[4]/div/p/strong&quot;<br />
strPath(3) = &quot;/html/body/div[2]/div/div/div[2]/div/div[3]/div[4]/div/p/strong&quot;<br />
strPath(4) = &quot;/html/body/div[2]/div/div/div[2]/div/div/div[2]/h2/a&quot;<br />
strPath(5) = &quot;/html/body/div[2]/div/div/div[2]/div/div[3]/div[2]/p[2]&quot;</p>
<p>For n = 1 To 5<br />
&#039; get the last node of the path:<br />
strPathArray = Split(strPath(n), &quot;/&quot;)<br />
lastNode = strPathArray(UBound(strPathArray))</p>
<p>&#039; get the element by tag name for strPath(n) -&gt; item 1,2,3,4,5:<br />
Set elem = ie.document.getElementsByTagName(lastNode).item(n)<br />
ws.Cells(iRow, oCol).Value = elem.innerHTML</p>
<p>Next n</p>
<p>The only problem is that a last node carrying an index in [ ], such as p[2] in strPath(5), does not return its innerHTML, while strPath(1) through strPath(4) work fine.</p>
<p>Whether that is proper coding, who knows?</p>
<p>Since you know this subject best, perhaps we could discuss another solution?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Aswin F. van Woudenberg</title>
		<link>http://www.aswinvanwoudenberg.com/index.php/2009/04/02/web-scraping-using-xpath-and-visual-basic/#comment-2969</link>
		<dc:creator>Aswin F. van Woudenberg</dc:creator>
		<pubDate>Sun, 10 May 2009 18:51:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.aswinvanwoudenberg.com/?p=212#comment-2969</guid>
		<description>&lt;strong&gt;Mathematician&#8217;s Birthday Calendar...&lt;/strong&gt;

In my previous post I described a way of doing web scraping using XPath and VB. I showed how to get stock quotes from the web into Excel using this method. Recently I used the same method to get a list of birthdays of mathematicians. The data was scrap...</description>
		<content:encoded><![CDATA[<p><strong>Mathematician&#8217;s Birthday Calendar&#8230;</strong></p>
<p>In my previous post I described a way of doing web scraping using XPath and VB. I showed how to get stock quotes from the web into Excel using this method. Recently I used the same method to get a list of birthdays of mathematicians. The data was scrap&#8230;</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jurre</title>
		<link>http://www.aswinvanwoudenberg.com/index.php/2009/04/02/web-scraping-using-xpath-and-visual-basic/#comment-2611</link>
		<dc:creator>Jurre</dc:creator>
		<pubDate>Tue, 07 Apr 2009 14:20:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.aswinvanwoudenberg.com/?p=212#comment-2611</guid>
		<description>Yes, I already told you that was the best way to do it.</description>
		<content:encoded><![CDATA[<p>Yes, I already told you that was the best way to do it.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

