Parsing of HTML: Use Regex or HTML Agility Pack?
I need to parse some HTML in my project. It is fairly simple, controlled HTML; that is, we are not parsing arbitrary malformed HTML from out in the wild.
I was thinking of using Regex for this purpose, but I am not (yet) an expert in building Regex patterns. However, I found the following pattern that will match all HTML tags:
<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>
As an alternative, I believe I could use the HTML Agility Pack. I know that the Orchard Project uses it internally. Does anyone want to comment on the appropriateness of using the Agility Pack for my purposes? Thanks.
Sean Healy, Apr 18, 2011
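For anyone who wants to see what the pattern quoted above actually matches, here is a minimal sketch. It is written in Python rather than .NET purely for illustration (the regex syntax carries over unchanged), and the helper names find_tags and TagCollector are made up for this example. Python's standard-library html.parser stands in for "a real parser"; it is not the HTML Agility Pack, whose API is not shown here.

```python
import re
from html.parser import HTMLParser

# The pattern from the post, unchanged. It matches one tag at a time and
# allows '>' to appear inside a quoted attribute value.
TAG_PATTERN = re.compile(r"""<(?:"[^"]*"['"]*|'[^']*'['"]*|[^'">])+>""")


def find_tags(html: str) -> list[str]:
    """Return every substring of `html` that the pattern treats as a tag."""
    return TAG_PATTERN.findall(html)


class TagCollector(HTMLParser):
    """Hypothetical helper: records the name of each start tag the parser sees."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


if __name__ == "__main__":
    # Well-formed, controlled markup: the regex copes, even with '>' in a quoted value.
    print(find_tags('<a href="x>y" title=\'it\'>link</a><br/>'))

    # A comment containing '>' is cut short, and bare '<' ... '>' in text looks like a tag.
    print(find_tags("<!-- a > b -->"))
    print(find_tags("1 < 2 and 3 > 2"))

    # A parser treats the comment as a comment and reports only the real element.
    collector = TagCollector()
    collector.feed("<!-- a > b --><p>hi</p>")
    print(collector.tags)
```

The takeaway, at least from this sketch: on tightly controlled markup the pattern does find the tags, but it only finds them. It has no notion of comments, nesting, or plain text that happens to contain angle brackets, which is the kind of thing a parser library handles for you.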
I suck at regex, so can't be more helpful than that ;)
Apr 19, 2011
Take a look at a post made by Phil Haack a while back, it might be useful... http://haacked.com/archive/2004/10/25/usingregularexpressionstomatchhtml.aspx
Apr 19, 2011