Dofactory.com

Parsing of HTML: Use Regex or HTML Agility Pack?

I need to parse some HTML in my project. It is fairly simple, controlled HTML; that is, we don't have to handle arbitrary, malformed HTML from the wild.

I was thinking of using a regex for this purpose, but I am not (yet) an expert at building regex patterns.
However, I found the following pattern, which is supposed to match all HTML tags:

<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>

Does anyone have feedback on this pattern? Will it really capture all HTML tags? Does it have any weaknesses?
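To make the "any weaknesses?" question concrete, here is a small sketch that runs the pattern above, unchanged, over a few inputs. Python is used only for illustration; the pattern relies on basic syntax (a non-capturing group, character classes, alternation), so it should behave the same way in .NET's Regex, though that is an assumption here. The helper name `find_tags` is hypothetical. The pattern copes with a `>` inside a quoted attribute value, but it has no notion of context: a `>` inside a comment ends the "tag" early, and text inside a script that merely looks like a tag is reported as one.

```python
import re

# The pattern from the question, unchanged. A raw triple-quoted string is used
# because the pattern itself contains both single and double quotes.
TAG_PATTERN = re.compile(r"""<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>""")


def find_tags(html: str) -> list[str]:
    """Return every substring the pattern treats as a tag (hypothetical helper)."""
    # The group is non-capturing, so findall returns the whole matches.
    return TAG_PATTERN.findall(html)


if __name__ == "__main__":
    # Quoted ">" inside an attribute value stays within a single match.
    print(find_tags('<a title="x > y" href=\'/q\'>link</a>'))
    # A ">" inside a comment ends the match early: ['<!-- 1 >', '<br>']
    print(find_tags("<!-- 1 > 0 --><br>"))
    # Script text is scanned too: ['<script>', '< b && c >', '</script>']
    print(find_tags("<script>if (a < b && c > d) {}</script>"))
```

For controlled HTML without comments, scripts, or stray `<` characters, the pattern may be good enough; the last two cases show where it breaks down.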

As an alternative, I believe I could use the HTML Agility Pack; I know that the Orchard Project uses it internally.
Would anyone like to comment on whether the Agility Pack is appropriate for my purposes?
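The HTML Agility Pack is a .NET library, and its API is not shown here. To illustrate the general parser-based alternative in the same language as the previous sketch, the example below swaps in Python's standard-library `html.parser.HTMLParser`, which is a different library with the same underlying idea: a tokenizer that knows what comments and script content are, instead of a single pattern. The class `TagCollector` and function `collect_tags` are hypothetical names. On the inputs that tripped up the regex, the parser reports only real tags.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records start and end tag names (hypothetical helper class)."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; only the tag name is kept here.
        self.tags.append(tag)

    def handle_endtag(self, tag):
        self.tags.append("/" + tag)

    # Comments go to handle_comment and script text to handle_data; neither is
    # overridden, so that content is simply ignored rather than mistaken for tags.


def collect_tags(html: str) -> list[str]:
    parser = TagCollector()
    parser.feed(html)
    parser.close()
    return parser.tags


if __name__ == "__main__":
    print(collect_tags("<!-- 1 > 0 --><br>"))  # ['br']
    print(collect_tags("<script>if (a < b && c > d) {}</script>"))  # ['script', '/script']
```

A full parser costs a dependency and a little more code, but it handles the edge cases a single pattern cannot.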

Thanks.
Sean Healy, Apr 18, 2011
I suck at regex, so I can't be more helpful than that ;)
Apr 19, 2011
Take a look at a post Phil Haack made a while back; it might be useful: http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx
Apr 19, 2011
© Data & Object Factory, LLC.