Vba web scrapping in vba in Excel

Web Scrapping also called as Web harvesting is technique by which required data are extracted from the websites.

By data, it means a large amount of data that cannot be extracted manually.

There are primarily two steps in Web Scrapping:

1.Fetching the Page or Downloading the page in the Browser.

2.Extracting the data.

Web Scrapping is automated by some program.

The program may be either readymade or can be written in any other language, like VBA,C# and so on.

Scrapping with Excel VBA is most preffered as data is extracted in the form of a table, which Excel can readily accommodate.

The following discussion shows the steps involved in web scrapping.

Get the Structure of the Web Page

Every web page is nothing but a HTML page which are in a Tree like fashion, as discussed in the previous chapters.

All the contents are embedded within Nodes and Attributes. (Recall the various Node elements we discussed in the previous chapter; they are the base of web scrapping program).

To get this HTML content, the following steps are followed.

For example, let us take a simple site: www.sdca.in

Right Click inside the page and select "View page source"

excel vba web scrapping in vba

Now, the Web Page HTML structure is shown in the right.

excel vba web scrapping in vba

The HTML is actually large which cannot be put into the excel sheet; so the above figure is just a glimpse of how it appears in the web page.

Now, we have the structure of the web page.From this, we have to get the structure and position of each and every node.

For example, to get the title of the page as show in the figure, below,

excel vba web scrapping in vba

excel vba web scrapping in vba

Notice that, the title "Sivagangai District Cricket Association" is present in third <div> node of <header> node which in turn is a childnode of <section>

The following tree structure illustrates this:

excel vba web scrapping in vba

So, first, the <section> node has to be referenced with IXMLDOMNode element.

excel vba web scrapping in vba

The following snippet shows this:

  1. Dim rtNode as IXMLDOMNode
  2. Dim str as String
  3. Set rtNode=<xml_doc>.getElementbyId("section")
  4. str=rtNode.ChildNodes(0).ChildNodes(0).ChildNodes(2).ChildNodes(0).Value

The above snippet is just to illustrate the way of referencing a particular node.

This way, any data can be fetched with the help of VBA.

 

You can find similar Excel Questions and Answer hereunder

1) Determine if hyperlinks are valid in Excel

2) How to do workbook protection with VBA in Excel

3) How to create Pivot table in excel VBA

4) How to add a link in a sheet to another sheet

5) How can I update a listbox based on data in a list using VBA?

6) How can I dynamically add series to an existing chart using VBA?

7) How can I check if a file exists in a folder using VBA?

8) How do I assign a macro to a shape using VBA?

9) How to create charts in Excel VBA

10) How do I copy a Table from one location to another and retain all formulas, formats and columnwidths?

 

Here the previous and next chapter