I am using the HtmlAgilityPack to parse an XML document. I load the string into an HtmlDocument and then use an XmlTextReader to parse it. Occasionally I get an unhandled StackOverflowException in HtmlAgilityPack.dll. The specific line is:
    internal Dictionary<string, HtmlAttribute> Hashitems = new Dictionary<string, HtmlAttribute>();
In fact, the stack overflow is reported at different lines on different runs, but it is always the same exception and it only happens while using HtmlAgilityPack. Another method first tries to parse the XML using XmlDocument and XPathNavigator, and that works fine unless I get some bad XML, in which case I fall back to this method. I have set up exception handlers that just move the bad XML to a folder and then exit this method, but I cannot catch this kind of exception. Or can I?
Another line where the error shows:
    public string Name
    {
        get
        {
            if (_name == null)
            {
                Name = _ownerdocument.Text.Substring(_namestartindex, _namelength);
            }
            return _name != null ? _name.ToLower() : string.Empty;
The error is on the last line of the snippet above, in the file HtmlNode.cs. The top of the call stack window shows:
HtmlAgilityPack.dll!HtmlAgilityPack.HtmlNode.Name.get() Line 432 + 0x21 bytes
My code sample:
    Try
        Dim hdoc = New HtmlAgilityPack.HtmlDocument()
        hdoc.LoadHtml(xmlsnippet)
        Dim nreader As XmlTextReader = New XmlTextReader(New StringReader(xmlsnippet))
        Dim ncount As Integer = 0
        While nreader.Read
            If Not nreader.Name = "" Then
                ncount += 1
                If ncount = 18 Then
                    Exit While
                End If
                num += 1
                nodelist.Add(nreader.Name)
                If nreader.Name = "id" Then
                    statid = nreader.ReadInnerXml
                End If
                If nreader.Name = "published" Then
                    contentDate = nreader.ReadInnerXml
                    contentDate = Regex.Replace(contentDate, "T", " ")
                    contentDate = Regex.Replace(contentDate, "\+", " ")
                    contentDate = contentDate.Replace("Z", "")
                End If
                If nreader.Name = "summary" Then
                    ctext = nreader.ReadInnerXml
                End If
                If nreader.Name = "title" Then
                    csubject = nreader.ReadInnerXml
                    If csubject.Contains("posted") Then
                        template = csubject
                        author = Regex.Replace(template, "posted.*", "")
                    End If
                    If csubject.Contains("Keyword -") Then
                        Dim tip As String = csubject
                        searchterm = Regex.Replace(csubject, "xxxxxx.*xxxxxx.*xxxx.*-", "")
                        searchterm = Regex.Replace(searchterm, "xxxxx.*xxxxxx.*Search.*-", "")
                        Trim(searchterm)
                    End If
                End If
            End If
        End While
        Dim mreader As XmlTextReader = New XmlTextReader(New StringReader(xmlsnippet))
        Dim mcount As Integer = 0
        While mreader.Read
            If Not mreader.Name = "" Then
                mcount += 1
                If mcount > 15 Then
                    If mreader.Name = "uri" Then
                        authorUri = mreader.ReadInnerXml
                        Trim(authorUri)
                        If authorUri = "http://www.xxxxxxxx.com/" Then
                            authorUri = ""
                        End If
                    End If
                    If mreader.Name = "name" Then
                        author = mreader.ReadInnerXml
                        If author = "xxxxxx" Then
                            author = ""
                        End If
                    End If
                    If mreader.Name = "content" Then
                        htext = mreader.ReadInnerXml
                    End If
                    If mreader.Name = "link" Then
                        Dim address As String
                        address = mreader.ReadOuterXml
                        If address.Contains("related") Then
                            Dim regex12 As Regex = New Regex("<link.*rel.*href=""(?<Link>.*?)"".*/>", RegexOptions.IgnoreCase)
                            Dim m12 As Match = regex12.Match(address)
                            himage = m12.Groups("Link").Value
                        ElseIf address.Contains("alternate") Then
                            Dim regex13 As Regex = New Regex("<link.*rel.*href=""(?<Link>.*?)"".*/>", RegexOptions.IgnoreCase)
                            Dim m13 As Match = regex13.Match(address)
                            authorUri = m13.Groups("Link").Value
                        End If
                    End If
                    If mreader.Name = "subtitle" Then
                        hsubtitle = mreader.ReadInnerXml
                    End If
                End If
            End If
        End While
    Catch ex As Exception
        appLogs.constructLog(ex.Message.ToString, True, True)
        Exit Sub
    End Try