Channel: htmlagilitypack Forum Rss Feed
Viewing all articles
Browse latest Browse all 655

New Post: Parsing Nginx Auto Index

I'm trying to parse an nginx auto index page to get the links from a download directory and their timestamps.

I have successfully retrieved the links and their "names", so to speak, but I am struggling with the timestamp.

I have the following code:
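// Assumes `doc` is a loaded HtmlDocument and `root` is the Uri of the index page.
return doc.DocumentNode.SelectNodes("//a").Select(anchor => new IndexPageLink
                {
                    // Resolve the href attribute rather than the display text.
                    Link = new Uri(root, anchor.GetAttributeValue("href", "")),
                    Name = anchor.InnerText
                });
This parses the following HTML structure: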
<pre><a href="../">../</a>
<a href="file.txt">file.txt</a>      24-Jan-2014 01:50    5M
</pre>
I've tried looking at the next element after an anchor, which correctly shows up as a text element, but it contains only newline characters. I can definitely see the timestamp text when I look at the document from the pre node, but it would be nice to process it relative to the anchors that I find with the SelectNodes search.

Any ideas?
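
One possible approach, sketched here rather than offered as a definitive answer: read the text node that follows each anchor via `NextSibling` and split it on whitespace. The `Modified` and `Size` properties, the `AutoIndexParser` class, and the `dd-MMM-yyyy HH:mm` date format are assumptions made for illustration, based only on the sample HTML above.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using HtmlAgilityPack;

// Hypothetical variant of IndexPageLink with two extra properties.
public class IndexPageLink
{
    public Uri Link { get; set; }
    public string Name { get; set; }
    public DateTime? Modified { get; set; }   // null when no timestamp follows the anchor
    public string Size { get; set; }
}

public static class AutoIndexParser
{
    public static List<IndexPageLink> Parse(string html, Uri root)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var result = new List<IndexPageLink>();
        var anchors = doc.DocumentNode.SelectNodes("//a");
        if (anchors == null) return result;   // SelectNodes returns null when nothing matches

        foreach (var anchor in anchors)
        {
            var entry = new IndexPageLink
            {
                Link = new Uri(root, anchor.GetAttributeValue("href", "")),
                Name = anchor.InnerText
            };

            // The timestamp and size are expected in the text node right after the anchor.
            var next = anchor.NextSibling;
            if (next != null && next.NodeType == HtmlNodeType.Text)
            {
                // Passing null as the separator splits on any whitespace.
                var parts = next.InnerText.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
                if (parts.Length >= 3 &&
                    DateTime.TryParseExact(parts[0] + " " + parts[1], "dd-MMM-yyyy HH:mm",
                        CultureInfo.InvariantCulture, DateTimeStyles.None, out var modified))
                {
                    entry.Modified = modified;
                    entry.Size = parts[2];
                }
            }

            result.Add(entry);
        }

        return result;
    }
}
```

In the sample above, the text node after the `../` anchor contains only a newline, so that entry is left without a timestamp. That may be the node that was inspected. The node after `file.txt` should carry the date, time, and size.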
