Channel: htmlagilitypack Forum Rss Feed

New Post: Link extraction problem

I have written some code in VB.NET.

The expected output of my program is a list of links extracted from the
<a href> tags on the page that all have a word in common.

In my program I want to display all links that contain the word "test".

For example:
www.drivetest.ca/
www.drivetest.ca/EN/bookatest/Pages/Road-Test-Booking.aspx
www.drivetest.ca/EN/drivereducation/Pages/Driver-Testing.aspx
www.cic.gc.ca/english/citizenship/cit-test.asp
But my program is not displaying anything at all. Where did I go wrong?

Here is my code:
    Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
        Dim webClient As New System.Net.WebClient
        Dim WebSource As String = webClient.DownloadString("http://www.google.com.ph/search?hl=en&as_q=test&as_epq=&as_oq=&as_eq=&as_nlo=&as_nhi=&lr=&cr=countryCA&as_qdr=all&as_sitesearch=&as_occt=any&safe=images&tbs=ctr%3AcountryCA&as_filetype=&as_rights=#as_qdr=all&cr=countryCA&fp=1e63a873f2e9c884&hl=en&lr=&q=test&start=20&tbs=ctr:countryCA")
        RichTextBox1.Text = WebSource

        Dim links As New List(Of String)()
        Dim htmlDoc As New HtmlAgilityPack.HtmlDocument()
        htmlDoc.LoadHtml(WebSource)

        For Each link As HtmlNode In htmlDoc.DocumentNode.SelectNodes("//a[@href]")
            If link.InnerText.Contains("test") Then
                ListBox1.Items.Add(link.InnerText)
            End If
        Next
    End Sub
I am new to HtmlAgilityPack and still learning, so please bear with me.
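
For comparison, here is a minimal sketch of the filtering step as described above, separated from the download and the form so it can be tried on a fixed HTML string. It is not a confirmed fix. It assumes the goal is to match the word against each link's href value rather than its visible text (the code above tests InnerText), and that the match should ignore case. The module name LinkFilter, the function ExtractMatchingLinks, and the sample HTML are hypothetical names made up for this illustration. The sketch also guards against SelectNodes returning Nothing, which HtmlAgilityPack does when no node matches the XPath. Separately, everything after the "#" in the requested URL is a fragment, which HTTP clients do not send to the server, so the downloaded page may not be the one expected.

```vbnet
Imports System
Imports System.Collections.Generic
Imports HtmlAgilityPack

Module LinkFilter

    ' Hypothetical helper (not from the original post): returns the href value of
    ' every <a> element whose href contains the keyword, ignoring case.
    Function ExtractMatchingLinks(html As String, keyword As String) As List(Of String)
        Dim links As New List(Of String)()
        Dim htmlDoc As New HtmlDocument()
        htmlDoc.LoadHtml(html)

        ' SelectNodes returns Nothing, not an empty collection, when nothing matches.
        Dim anchors As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//a[@href]")
        If anchors Is Nothing Then
            Return links
        End If

        For Each anchor As HtmlNode In anchors
            ' Read the link target instead of the visible link text.
            Dim href As String = anchor.GetAttributeValue("href", String.Empty)
            If href.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0 Then
                links.Add(href)
            End If
        Next

        Return links
    End Function

    Sub Main()
        ' Made-up sample markup standing in for a downloaded page.
        Dim sample As String = "<a href=""http://www.drivetest.ca/"">DriveTest</a><a href=""http://www.example.com/"">Other</a>"

        For Each link As String In ExtractMatchingLinks(sample, "test")
            Console.WriteLine(link)
        Next
    End Sub

End Module
```

In the button handler, the returned list could be added to ListBox1 in place of the loop over InnerText. Whether the page that WebClient downloads contains the expected <a href> elements at all is a separate question; the RichTextBox1 output is the place to check that.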
