Hi David,

You may want to use other Perl modules, not just regular expressions, for this task. For example, you can do something like this (untested):

    use strict;
    use HTML::TreeBuilder;

    # The variable $content contains the whole HTML page
    my $tree = HTML::TreeBuilder->new_from_content( $content );

    # Say you want to search for the class "foo"
    # (a plain string criterion matches only when the class attribute is exactly "foo")
    my @found = $tree->look_down( class => 'foo' );
    # If @found doesn't contain any elements, then this page doesn't
    # contain any element with the class "foo".

    # Or you can check whether there are HTML elements of type "foo" that have the class "bar":
    @found = $tree->look_down( _tag => 'foo', class => 'bar' );

    # Or you can check whether there are "foo" elements with the ID "bar":
    @found = $tree->look_down( _tag => 'foo', id => 'bar' );

Or you may use Web::Scraper to do the search using CSS selectors like jQuery does, or Mojo::DOM for the same thing; if I remember well, it has better support for more complex CSS selectors. There are also other modules that let you make the selection using XPath, if you find that easier.

You can read the documentation of these modules at:

http://search.cpan.org/~jfearn/HTML-Tree-4.2/lib/HTML/TreeBuilder.pm
http://search.cpan.org/~mirod/HTML-TreeBuilder-XPath-0.14/lib/HTML/TreeBuild...
http://search.cpan.org/~awncorp/Scrappy-0.94112090/lib/Scrappy/Scraper/Parse...
http://search.cpan.org/~sri/Mojolicious-1.99/lib/Mojo/DOM.pm

HTH.

Octavian

----- Original Message -----
From: "David Mehler" <dave.mehler@gmail.com>
To: "blind-sysadmins" <blind-sysadmins@lists.hodgsonfamily.org>
Sent: Friday, October 07, 2011 8:45 PM
Subject: [Blind-sysadmins] text processing on web pages
Hello Everyone,
I've got what I suspect will seem a trivial problem to some of you, but I'm not good with Perl, regular expressions, or pattern matching, which is what I suspect this one is going to take, though running it should be quick enough time-wise.
Here's the situation. I need to look at various files for the web, in this case CSS style sheet files. They contain styles that are not used on any page, and I want to remove those styles. I've got several style sheet files, several sites to do this to, and several pages, though a majority of them are quite similar.
For example, some sheets have a style set up for the <blockquote> tag. What I want to do is take each individual style (I'll use blockquote here), scan the pages of the site, and remove the style from the sheet if it isn't found on any page. The hard part comes when dealing with contextual selectors and classes, but the idea is the same: scan each page for the contextual selector, ID, or class in question from the sheet. If it is found on even one page of the whole site, leave it alone; if no page has that particular item, remove it. I need this done for all pages, all sheets, and all sites.
I can do this if someone can get me started and would be willing to help out with questions, as I'm sure there will be some. I've tried reading about Perl regular expressions and my head hurts.
Any assistance appreciated.
Thanks. Dave.
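As a follow-up to the Mojo::DOM suggestion in the reply above, here is a minimal, untested sketch of the check Dave describes: for each selector taken from a style sheet, look for a match on any page and mark the selector for removal only if no page matches. The helper name selector_is_used, the sample pages, and the sample selector list are made up for illustration. Reading the real HTML files and pulling the selectors out of the CSS files are not shown.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper: returns true if $selector matches at least one
# element in any of the already-parsed pages in @$doms.
sub selector_is_used {
    my ($selector, $doms) = @_;
    for my $dom (@$doms) {
        my $match;
        # at() returns the first matching element, or undef if none matches.
        my $ok = eval { $match = $dom->at($selector); 1 };
        # If the selector cannot be handled, play safe and treat it as used.
        return 1 unless $ok;
        return 1 if defined $match;
    }
    return 0;
}

# Made-up sample data standing in for the real pages of one site.
my @pages = (
    '<html><body><div class="note"><p>Hi</p></div></body></html>',
    '<html><body><ul id="menu"><li>Home</li></ul></body></html>',
);

# Parse each page once and reuse the parsed trees for every selector.
my @doms = map { Mojo::DOM->new($_) } @pages;

# Made-up selectors standing in for those found in a style sheet:
# a tag, a contextual selector, an ID-based selector, and a class.
my @selectors = ('blockquote', 'div.note p', '#menu li', '.unused');

for my $selector (@selectors) {
    my $status = selector_is_used($selector, \@doms) ? 'keep' : 'remove';
    print "$selector: $status\n";
}
```

Selectors that depend on browser state, such as :hover, cannot be confirmed from static HTML, so a real script would probably want to keep those by default rather than remove them.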
_______________________________________________ Blind-sysadmins mailing list Blind-sysadmins@lists.hodgsonfamily.org http://lists.hodgsonfamily.org/listinfo/blind-sysadmins