Vba web scrapping in vba in Excel
Web Scrapping also called as Web harvesting is technique by which required data are extracted from the websites.
By data, it means a large amount of data that cannot be extracted manually.
There are primarily two steps in Web Scrapping:
1.Fetching the Page or Downloading the page in the Browser.
2.Extracting the data.
Web Scrapping is automated by some program.
The program may be either readymade or can be written in any other language, like VBA,C# and so on.
Scrapping with Excel VBA is most preffered as data is extracted in the form of a table, which Excel can readily accommodate.
The following discussion shows the steps involved in web scrapping.
Get the Structure of the Web Page
Every web page is nothing but a HTML page which are in a Tree like fashion, as discussed in the previous chapters.
All the contents are embedded within Nodes and Attributes. (Recall the various Node elements we discussed in the previous chapter; they are the base of web scrapping program).
To get this HTML content, the following steps are followed.
For example, let us take a simple site: www.sdca.in
Right Click inside the page and select "View page source"
Now, the Web Page HTML structure is shown in the right.
The HTML is actually large which cannot be put into the excel sheet; so the above figure is just a glimpse of how it appears in the web page.
Now, we have the structure of the web page.From this, we have to get the structure and position of each and every node.
For example, to get the title of the page as show in the figure, below,
Notice that, the title "Sivagangai District Cricket Association" is present in third <div> node of <header> node which in turn is a childnode of <section>
The following tree structure illustrates this:
So, first, the <section> node has to be referenced with IXMLDOMNode element.
The following snippet shows this:
- Dim rtNode as IXMLDOMNode
- Dim str as String
- Set rtNode=<xml_doc>.getElementbyId("section")
- str=rtNode.ChildNodes(0).ChildNodes(0).ChildNodes(2).ChildNodes(0).Value
The above snippet is just to illustrate the way of referencing a particular node.
This way, any data can be fetched with the help of VBA.