Channel: htmlagilitypack Forum Rss Feed
Viewing all 655 articles

New Post: HtmlAgilitypack on strings C#

Yes, it will still work, but you will be unable to get the text before and after the first tags.

e.g.

abcd<div>hij</div>dsfg

hp.DocumentNode.SelectSingleNode("//div").InnerText == "hij"
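That said, the text around the tags does not seem to be lost entirely. A minimal sketch, assuming the sample input was abcd<div>hij</div>dsfg: HtmlAgilityPack keeps text outside elements as text nodes under DocumentNode, so walking ChildNodes should return it in document order. The class and method names here are made up for illustration.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class TextAroundTags
{
    // Returns the InnerText of each top-level node, in document order.
    // Text that sits outside any element shows up as its own text node.
    public static string[] TopLevelText(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.ChildNodes
                  .Select(node => node.InnerText)
                  .ToArray();
    }

    public static void Main()
    {
        foreach (string text in TopLevelText("abcd<div>hij</div>dsfg"))
        {
            Console.WriteLine(text);
        }
    }
}
```

Whether this is enough depends on how deeply the plain text is nested inside other tags; for flat input like the sample it may be all that is needed.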


New Post: HtmlAgilitypack on strings C#

Thank you for the answer. I suppose I will keep trying to get my regex to match. I do see the advantage of an HTML parser, but I don't think it will work in this instance, where the plain text is also important.

New Post: HtmlAgilitypack on strings C#

If you need help with the regex, give us a shout. I have done some complicated regexes in the past.

Lee

On Jan 30, 2013 7:17 PM, "Lobsterfun" <notifications@codeplex.com> wrote:

From: Lobsterfun

Thank you for the answer I suppose I will keep trying to get my Regex to match. I do see the advantage in a html-parser, but I don't think it'll work in this instance where the plain-text is also important.

New Post: HtmlAgilitypack on strings C#

I actually have a pretty big issue.

I am printing to a PDF in C#, and my boss wanted me to implement TinyMCE, which went fine. Unfortunately, we are using an old PDF printer class that only supports the b, i and u tags in HTML, but I need to be able to create indents as well.

This is how indentation looks in TinyMCE:

<p style="padding-left: 30px;">ijkk</p>

Unfortunately, I only have a string that contains mixed plain text and HTML (from TinyMCE), so I wanted to write a regex that gets all p tags with attributes (I have done this) and then, based on the number of pixels in "padding-left:", replaces each tag with the matching amount of whitespace followed by the text (if that makes sense?).

Here's what I have come up with so far:

text = Regex.Replace(text, @"<p.*?>(.*)</p>", "    " + "$1");

But the whitespace is hard-coded.

New Post: Parsing HTML Table Data

Hi eosjack,

Did you find the solution for case sensitivity?

New Post: Is there a wildcard to filter

 

Hi LeeJeary,

The problem was the version, I guess. It was 1.4.0! Can you check it once?

Well, now I have switched to 1.4.6 and it worked :-)

Anyway, thanks for looking into it.

Regards,

Salma

New Post: Is there a wildcard to filter

Perfect.. Glad it's working..

Lee

On Jan 30, 2013 11:04 PM, "arsh" <notifications@codeplex.com> wrote:

From: arsh

Hi LeeJeary,

problem was the version i guess ., it was 1.4.0 ! can you check it once.

Well now I have switch to 1.4.6 and it worked :-)

Anyways thanks for looking into it.

Regards,

Salma

New Post: HtmlAgilitypack on strings C#

Here is my solution. It loops through the regex matches and replaces each one, as I couldn't think of a way to produce a variable number of spaces from the match in a single replace.

string html = "erwerw<p style=\"padding-left: 30px;\">ijkk</p>rwwr<p style=\"padding-left: 40px;\">ijkk</p>";
// Capture the whole <p> tag, the padding-left pixel count, and the inner text.
MatchCollection mc = Regex.Matches(html, "(?<complete><p[^>]*style=[^>]*\"[^>]*padding-left:\\s?(?<number>[0-9]+)px[^>]*>(?<text>[^<]*)</p>)");
foreach (Match match in mc)
{
    if (match.Groups.Count > 0)
    {
        // Replace the tag with one space per pixel, followed by the text.
        html = html.Replace(match.Groups["complete"].Value, string.Format("{0}{1}", new string(' ', Convert.ToInt32(match.Groups["number"].Value)), match.Groups["text"].Value));
    }
}

Hope this helps.
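For what it's worth, the loop can probably be avoided: Regex.Replace has an overload that takes a MatchEvaluator, so the number of spaces can be computed per match. The sketch below is one way to do it, not necessarily the best. It keeps the one-space-per-pixel mapping from the loop above, and the names IndentConverter and ExpandIndents are made up for the example.

```csharp
using System;
using System.Text.RegularExpressions;

public static class IndentConverter
{
    // Matches <p ... style="... padding-left: Npx ...">text</p>
    // and captures N (number) and the inner text (text).
    private static readonly Regex IndentedParagraph = new Regex(
        @"<p[^>]*style=""[^""]*padding-left:\s*(?<number>[0-9]+)px[^>]*>(?<text>[^<]*)</p>",
        RegexOptions.IgnoreCase);

    // Replaces each indented paragraph with N spaces followed by its text.
    // Assumption: one space per pixel, as in the loop-based version.
    public static string ExpandIndents(string html)
    {
        return IndentedParagraph.Replace(html, match =>
            new string(' ', int.Parse(match.Groups["number"].Value))
            + match.Groups["text"].Value);
    }

    public static void Main()
    {
        string html = "erwerw<p style=\"padding-left: 30px;\">ijkk</p>"
                    + "rwwr<p style=\"padding-left: 40px;\">ijkk</p>";
        Console.WriteLine(ExpandIndents(html));
    }
}
```

Because the replacement is computed inside the evaluator, two paragraphs with identical markup are each replaced where they occur, rather than through a string-wide Replace call. Dividing the pixel count by some factor inside the lambda would give a less generous indent.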

New Post: Children nodes

I am analyzing the timetable of a bus company. I get the necessary data from its website. A node of a bus stop looks like this:

<tr data-stopcode="F01093">
    <td>5</td>
    <td ><span>Fifth street</span></td>
</tr>

I have this node in "hmln" and would like to get the name of the stop. I tried two methods, but only the first one works. Can you tell me what is wrong with the second one?
Working:
hmln.Elements("td").Last().FirstChild.InnerText
Not Working: -> "Object reference not set to an instance of an object."
hmln.LastChild.FirstChild.InnerText

New Post: Children nodes

Hi Labu,

Here is a solution:

// AP is presumably a namespace alias: using AP = HtmlAgilityPack;
AP.HtmlDocument hp = new AP.HtmlDocument();
hp.LoadHtml("<tr data-stopcode=\"F01093\">    <td>5</td>    <td ><span>Fifth street</span></td></tr>");

// Select every span under a tr that has a data-stopcode attribute.
var nodes = hp.DocumentNode.SelectNodes("//tr[@data-stopcode]//span");
foreach (AP.HtmlNode node in nodes)
{
    string stopname = node.InnerText;
}

So I'm looking for all tr nodes with the data-stopcode attribute, then for the span inside each one, and getting the text from the span.
Hope this helps and you can see why it works.

As for your sample, in the second one you are looking for the last child of the document, which is now the tr, then its first child, which is the whitespace you have before the first td, so the text you get is " ".

hp.DocumentNode.LastChild.FirstChild.InnerText == " "

Lee.

New Post: Children nodes

Thanks for your answer.
I might not have made myself clear. -> "as for your sample the second one you are looking for the last child of the document which is now tr then the first child which is the spaces you have before the first td and your getting the text which is " ""
hmln is the <tr data-stopcode="F01093"> node itself, not its parent. Your solution surely works, but having the distinct <tr data-stopcode="F01093"> node is better than having only one of its children, because I need the first child too, which is the travel time.

New Post: Children nodes

Ok..
What about

AP.HtmlDocument hp = new AP.HtmlDocument();
hp.LoadHtml("<tr data-stopcode=\"F01093\"><td>5</td><td ><span>Fifth street</span></td></tr>");

var nodes = hp.DocumentNode.SelectNodes("//tr[@data-stopcode]");
foreach (AP.HtmlNode node in nodes)
{
    string stopcode = node.Attributes["data-stopcode"].Value;
    string stopname = node.SelectSingleNode(".//span").InnerText;
    string stoptime = node.SelectSingleNode(".//td[position()=1]").InnerText;
}

any better?

New Post: Beginner Help

Thanks for the response and code Lee. I am getting an error right now though... "Object reference not set to an instance of an object." for this line:

var price_shipping = shipping_block.SelectSingleNode(".//span[@class='price_shipping']").InnerText;

Do you know why this is happening?

New Post: Beginner Help

It's probably that there isn't a shipping price for the item. You would need to check that the div with class = 'shipping_block' exists:

if (shipping_block != null)
{
    // read the shipping price here
}
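To spell the check out a bit: SelectSingleNode returns null when nothing matches, so calling InnerText on the result throws "Object reference not set to an instance of an object." A minimal sketch, assuming markup with a div of class shipping_block that may or may not contain a span of class price_shipping. The ShippingPrice class and the sample HTML are made up for illustration.

```csharp
using System;
using HtmlAgilityPack;

public static class ShippingPrice
{
    // Returns the shipping price text, or null when either the
    // shipping block or the price span is missing from the item.
    public static string Read(HtmlNode item)
    {
        HtmlNode shippingBlock = item.SelectSingleNode(".//div[@class='shipping_block']");
        if (shippingBlock == null)
        {
            return null;
        }

        HtmlNode price = shippingBlock.SelectSingleNode(".//span[@class='price_shipping']");
        return price == null ? null : price.InnerText;
    }

    public static void Main()
    {
        var doc = new HtmlDocument();
        // Hypothetical item whose shipping block has no price span.
        doc.LoadHtml("<div class='item'><div class='shipping_block'></div></div>");
        Console.WriteLine(Read(doc.DocumentNode) ?? "(no shipping price)");
    }
}
```

Checking both lookups matters because either one can come back empty: the block can be absent, or present without a price.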

Lee

New Post: Beginner Help

It's working now, now I have to look at the code and figure out exactly how it's working... Thanks so much!


New Post: HtmlAgilitypack on strings C#

Wow this is perfect :-D
Thank you very much! I didn't know if it was even possible. I had made a different solution, but it is not even close to being as dynamic as this regex!

New Post: Children nodes

Thanks, it's elegant and works fine. But I still don't know why my second attempt doesn't work. Could you have a look at that?

New Post: Children nodes

Hi, actually both seem to be working for me.
What version are you using? 1.4.6?
What .NET version?

I tried version 1.4.0 and it works as well.

Everything is on one line, with no spaces:
test.xml = <tr data-stopcode="F01093"><td>5</td><td ><span>Fifth street</span></td></tr>

// Last() needs: using System.Linq;
HtmlDocument hp = new HtmlDocument();
hp.Load(@"C:\Development\test.xml");
var tr = hp.DocumentNode.FirstChild;
// works
var txt = tr.Elements("td").Last().FirstChild.InnerText;
// works
var txtII = tr.LastChild.FirstChild.InnerText;

What's your exact code? I'll take a peek.
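One possible explanation, not confirmed in this thread: the difference may be the line breaks. When the row is written across several lines, the whitespace after the last td becomes a text node, so LastChild is that text node rather than the td, and its FirstChild is null. The sketch below compares the two layouts; LastChildDemo and FirstRow are made-up names, and the indented string is a guess at how the original page was laid out.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class LastChildDemo
{
    // Parses the HTML and returns the first <tr> node.
    public static HtmlNode FirstRow(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);
        return doc.DocumentNode.SelectSingleNode("//tr");
    }

    public static void Main()
    {
        string oneLine =
            "<tr data-stopcode=\"F01093\"><td>5</td><td ><span>Fifth street</span></td></tr>";
        string indented =
            "<tr data-stopcode=\"F01093\">\n    <td>5</td>\n    <td ><span>Fifth street</span></td>\n</tr>";

        // Node type of the last child in each layout.
        Console.WriteLine(FirstRow(oneLine).LastChild.NodeType);
        Console.WriteLine(FirstRow(indented).LastChild.NodeType);

        // Elements("td") only returns element children, so it skips the whitespace.
        Console.WriteLine(FirstRow(indented).Elements("td").Last().InnerText);
    }
}
```

If that is what is happening, it would also explain why Elements("td").Last() works in both cases, since it never looks at text nodes.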

New Post: Children nodes

I am using HAP 1.4.6 with .Net 4.5

New Post: Creating ElementFlags from comments (non standard HTML)

Hi everyone,

Can you tell me if it is possible to create ElementFlags starting from comments from a HTML source of a page like:

<!-- START_COMMENTZONE -->

test zone

<p> paragraph to take only between comments</p><!-- END_COMMENTZONE -->

HtmlNode.ElementFlags - Start - (<!-- START_COMMENTZONE -->) with
HtmlNode.ElementFlags - End- (<!-- END_COMMENTZONE -->)

I have seen that you can add elements, but I did not find how to specify the end of an element, which is normally closed by a tag such as </p>.
In my case I need <!-- END_COMMENTZONE --> to be the end of the element.

I am using VSTO with VB.NET.

Thank you
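A possible workaround, shown in C# rather than VB.NET: as far as I know the element flags are keyed by tag name, so this sketch does not use them. Instead it finds the start comment and walks its following siblings until the end comment. It assumes both comments are siblings under the same parent, as in the sample above; CommentZone and Extract are made-up names.

```csharp
using System;
using System.Text;
using HtmlAgilityPack;

public static class CommentZone
{
    // Returns the outer HTML of every sibling between the comment containing
    // startMarker and the comment containing endMarker, or null if no
    // complete start/end pair is found.
    public static string Extract(string html, string startMarker, string endMarker)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        HtmlNodeCollection comments = doc.DocumentNode.SelectNodes("//comment()");
        if (comments == null)
        {
            return null; // the document has no comments at all
        }

        foreach (HtmlNode comment in comments)
        {
            if (!comment.OuterHtml.Contains(startMarker))
            {
                continue;
            }

            var zone = new StringBuilder();
            for (HtmlNode node = comment.NextSibling; node != null; node = node.NextSibling)
            {
                if (node.NodeType == HtmlNodeType.Comment && node.OuterHtml.Contains(endMarker))
                {
                    return zone.ToString();
                }
                zone.Append(node.OuterHtml);
            }
        }
        return null; // start marker or matching end marker not found
    }

    public static void Main()
    {
        string html = "<!-- START_COMMENTZONE -->\n\ntest zone\n\n"
                    + "<p> paragraph to take only between comments</p><!-- END_COMMENTZONE -->";
        Console.WriteLine(Extract(html, "START_COMMENTZONE", "END_COMMENTZONE"));
    }
}
```

If the two comments sit at different nesting levels this will not find the end marker, and a different traversal would be needed.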
