Channel: htmlagilitypack Forum Rss Feed

New Post: Malformed HTML parsing problem - unclosed li element within a form

Hi Community,

I am working on an HTML parsing utility, and HTML Agility Pack has been a great help.

I have one problem when parsing malformed HTML. I want to get all the forms in a document and process them one by one, but one of my forms contains an unclosed <li> tag. Because of it, the HTML Agility Pack parser places all the HTML that follows the parent form inside that form.

For example:

<form1></form1>
<form2>
    <li>
</form2>
<form3></form3>
<form4></form4>

Now, when I do something like this:

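using System;
using HtmlAgilityPack;

var _document = new HtmlDocument();
_document.OptionAutoCloseOnEnd = true;

// Remove the special-case flags for <form> and <option> before loading,
// so both are parsed as ordinary elements that can contain children.
HtmlNode.ElementsFlags.Remove("form");
HtmlNode.ElementsFlags.Remove("option");

_document.Load(@"C:\HTMLPage1.htm");
var formNodes = _document.DocumentNode.SelectNodes("//form");

// SelectNodes returns null when no node matches.
if (formNodes != null)
{
    foreach (var node in formNodes)
    {
        Console.WriteLine(node.OuterHtml);
    }
}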

For the second form node, the output also includes the HTML of form3 and form4.

Any help will be highly appreciated.

Thanks,

 

 

