Dofactory.com

Parsing of HTML: Use Regex or HTML Agility Pack?

I need to parse some HTML in my project. The HTML is fairly simple and controlled; that is, we won't be parsing arbitrary malformed HTML from out in the wild.

I was thinking of using Regex for this purpose, but I am not (yet) an expert in building Regex patterns.
However, I found the following pattern that will match all HTML tags:

<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>
Does anyone have feedback on this pattern?  Will it indeed capture all HTML tags?  Any weaknesses?
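
One way to probe the pattern is to run it against a few small inputs. The sketch below uses Python's `re` module purely for illustration (the pattern uses no syntax specific to one regex engine); the helper name `find_tags` and the sample strings are made up for this example and are not from any library.

```python
import re

# The pattern from the post, unchanged.
TAG_PATTERN = re.compile(r'''<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>''')


def find_tags(html):
    """Return every substring that the pattern treats as a tag (hypothetical helper)."""
    return TAG_PATTERN.findall(html)


# A '>' inside a quoted attribute value does not end the tag early.
quoted = find_tags('<a href="x>y">link</a>')

# A '>' inside a comment does end the match early.
comment = find_tags('<!-- a > b --><p>')

# Comparison operators inside a script body are picked up as a tag.
script = find_tags('<script>if (a < b && c > d) {}</script>')

print(quoted)
print(comment)
print(script)
```

On these inputs the quoted-attribute case comes out as expected, but the comment is cut off at its first `>` and the text `< b && c >` inside the script is reported as a tag. If the controlled HTML never contains comments, scripts, or a bare `<` in text, these weaknesses may not matter.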

As an alternative, I believe I could use the HTML Agility Pack. I know that the Orchard Project uses it internally.
Does anyone want to comment on the appropriateness of using the Agility Pack for my purposes?
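
For comparison, here is a minimal sketch of the parser-based approach. It does not use the HTML Agility Pack (a .NET library whose API is not shown here); it swaps in Python's standard-library `html.parser` as a stand-in, only to illustrate what a real parser handles for you. The names `TagCollector` and `collect_tags` are hypothetical.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record each start tag together with its attributes."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # attrs arrives as a list of (name, value) pairs.
        self.tags.append((tag, dict(attrs)))


def collect_tags(html):
    """Parse the string and return the (tag, attributes) pairs found."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.tags


sample = (
    '<!-- a > b -->'
    '<a href="x>y">link</a>'
    '<script>if (a < b && c > d) {}</script>'
)
print(collect_tags(sample))
```

Here the comment and the script body are recognized as such and yield no tags, and the attribute value `x>y` is returned intact, without any pattern tuning.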

Thanks.
 
Sean Healy, Apr 18, 2011
I suck at regex, so can't be more helpful than that ;)
Apr 19, 2011
Take a look at a post Phil Haack made a while back; it might be useful: http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx
Apr 19, 2011