Channel: htmlagilitypack Forum Rss Feed

New Post: Parse HTML with whitelist

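This code was adapted from an earlier discussion by DarthObiwan. The main change is that a node that is not on the whitelist is now removed at the end of the recursion, after all of its children have been processed. I also call RemoveChild with keepGrandChildren set to true, so the children of a removed node are kept in its place.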

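using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Removes every attribute and element whose name is not in pWhiteList.
// Tag names and attribute names share one list. Text nodes are named "#text",
// so include "#text" in the list or all text content will be stripped.
public void RemoveNotInWhiteList(HtmlNode pNode, IEnumerable<string> pWhiteList)
{
    // Drop non-whitelisted attributes. ToList() takes a copy first so that
    // removing an attribute does not modify the collection being enumerated.
    pNode.Attributes
         .Where(att => !pWhiteList.Contains(att.Name))
         .ToList()
         .ForEach(att => att.Remove());

    // Clean the children first (post-order), iterating over a copy because
    // a child may remove itself from ChildNodes.
    pNode.ChildNodes
         .ToList()
         .ForEach(child => RemoveNotInWhiteList(child, pWhiteList));

    // Only after all descendants are processed, remove this node if it is not
    // whitelisted; keepGrandChildren = true moves its children into the parent.
    // The root (#document) node has no parent, so it is never removed.
    if (pNode.ParentNode != null && !pWhiteList.Contains(pNode.Name))
    {
        pNode.ParentNode.RemoveChild(pNode, true);
    }
}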
