Channel: htmlagilitypack Forum Rss Feed

New Post: Problems with HTML Character References (e.g. '1') with proposed fix.

A better fix would be to cover all possible HTML entities (named, decimal, and hexadecimal), like this:
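        // Requires: using System.Text.RegularExpressions;
        // Matches a double-encoded entity: "&amp;" followed by a named (nbsp), decimal (#160), or hex (#xA0) body and ";".
        private const string HtmlEntitiesPattern = @"&amp;([a-z]{2,10}|#\d{1,10}|#x[0-9a-f]{1,8});";
        private static readonly Regex HtmlEntitiesPatternRegex = new Regex(HtmlEntitiesPattern, RegexOptions.Compiled | RegexOptions.IgnoreCase);

        public static string FixDoublEntityEncoding(string document)
        {
            // Drop the extra "amp;" so that e.g. "&amp;nbsp;" becomes "&nbsp;".
            return HtmlEntitiesPatternRegex.Replace(document, "&$1;");
        }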
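
To see what the fix is meant to do, here is a small self-contained sketch. It assumes the goal is to collapse double-encoded entities such as "&amp;nbsp;" back to "&nbsp;", which means the pattern has to start with a literal "&amp;". The class name DoubleEncodingDemo and the method name Fix are made up for this example and are not part of HtmlAgilityPack.

```csharp
using System;
using System.Text.RegularExpressions;

public static class DoubleEncodingDemo
{
    // Assumption: a double-encoded entity is "&amp;" followed by a named,
    // decimal, or hexadecimal entity body and a terminating ";".
    private static readonly Regex DoubleEncoded = new Regex(
        @"&amp;([a-z]{2,10}|#\d{1,10}|#x[0-9a-f]{1,8});",
        RegexOptions.Compiled | RegexOptions.IgnoreCase);

    // Hypothetical helper: removes one level of "&amp;" encoding in front
    // of anything that looks like an entity body.
    public static string Fix(string document)
    {
        return DoubleEncoded.Replace(document, "&$1;");
    }

    public static void Main()
    {
        // Prints: Tom &amp; Jerry &#49; &#x31; &nbsp;
        Console.WriteLine(Fix("Tom &amp;amp; Jerry &amp;#49; &amp;#x31; &amp;nbsp;"));
    }
}
```

A lone "&amp;" that is not followed by an entity body (for example in "R&amp;D") is left alone. One caveat with this approach: text that deliberately shows an escaped entity, such as "&amp;lt;" meant to display as "&lt;", will also be rewritten, so it is probably best applied only to input known to be double-encoded.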
