I'd like to read phone numbers out of an HTML table. However, the cell text is parsed incorrectly:
HTML source:
<TD class="tnum">0176
49329688<BR>4989/6492673<BR>123<BR>456<BR>789<BR>123<BR>456<BR>789<BR>012</TD>
Output: 0176 \r\n 493296884989/6492673123456789123456789012
Desired output:
0176 49329688
4989/6492673
123
456
789
123
456
789
012
Does anyone know what's wrong?
C# program code:
string hTMLDocumentPath = File.ReadAllText(@"D:\Temp\fonbook_list.htm");
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(hTMLDocumentPath);

// get phone book table in the document
var table = doc.DocumentNode.SelectSingleNode("//div[@id='uiScroll']/table/tbody")
    .Descendants("tr")
    .Select(n => n.Elements("td").Select(e => e.InnerText).ToArray());

// print entries
string output = "" + Environment.NewLine;
foreach (var tr in table)
{
    if (tr.Length == 10)
    {
        string[] phoneNumbers = tr[2].Replace("\r\n", "").Replace(" ", "").Split(new string[] { "<br>" }, StringSplitOptions.None);
        string[] phoneTypes = tr[3].Split(new string[] { "<br>" }, StringSplitOptions.None);
        output += tr[1] + Environment.NewLine;
        for (int i = 0; i < phoneNumbers.Length; i++)
        {
            output += phoneTypes[i] + ": " + phoneNumbers[i] + Environment.NewLine;
        }
    }
    output += (Environment.NewLine);
}
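
A likely cause: InnerText returns the cell's text with the markup removed, so the <BR> tags are already gone by the time Split looks for "<br>", and the numbers run together. Below is a minimal sketch of one way around this, assuming the query selects e.InnerHtml instead of e.InnerText so the tags survive. CellText and SplitLines are hypothetical names for illustration, not part of HtmlAgilityPack; the helper itself uses only .NET base class library calls.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

public static class CellText
{
    // Matches <br>, <BR>, <br/> and <br /> regardless of case.
    private static readonly Regex BrTag =
        new Regex(@"<br\s*/?>", RegexOptions.IgnoreCase);

    // Matches any run of whitespace, including a line break inside a number.
    private static readonly Regex Whitespace = new Regex(@"\s+");

    // Hypothetical helper: takes a cell's inner HTML and returns one entry
    // per <br>-separated line. Empty entries are kept so that the indexes
    // still line up with a parallel column (e.g. the phone types).
    public static string[] SplitLines(string innerHtml)
    {
        return BrTag.Split(innerHtml)
            .Select(part => WebUtility.HtmlDecode(part))           // e.g. &amp; -> &
            .Select(part => Whitespace.Replace(part, " ").Trim())  // collapse \r\n and spaces
            .ToArray();
    }
}
```

With this in place, the loop could call CellText.SplitLines(tr[2]) and CellText.SplitLines(tr[3]) in place of the two Replace/Split chains. The whitespace is collapsed to a single space rather than removed, which matches the desired "0176 49329688" form.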