Hi Community,
I am working on an HTML parsing utility, and HTML Agility Pack has helped me a lot so far.
I have one problem, though: parsing some malformed HTML. I want to get all the forms in the HTML and process them one by one. However, one of my forms contains an unclosed <li> tag. Because of it, HTML Agility Pack puts all the HTML that follows that form inside it.
For example:
<form id="form1"></form>
<form id="form2">
<li>
</form>
<form id="form3"></form>
<form id="form4"></form>
Now, when I run code like this:
using System;
using HtmlAgilityPack;
// Treat <form> and <option> as regular elements that contain their children
HtmlNode.ElementsFlags.Remove("form");
HtmlNode.ElementsFlags.Remove("option");
var _document = new HtmlDocument();
_document.OptionAutoCloseOnEnd = true;
_document.Load(@"C:\HTMLPage1.htm");
var formNodes = _document.DocumentNode.SelectNodes("//form");
if (formNodes != null) // SelectNodes returns null when nothing matches
{
    foreach (var node in formNodes)
    {
        Console.WriteLine(node.OuterHtml);
    }
}
For the second form node, the output also includes the HTML of form3 and form4.
Any help would be highly appreciated.
Thanks,