text processing on web pages
Hello everyone,

I've got what I suspect will seem a trivial problem to some of you, but I'm not good with Perl, regular expressions, or pattern matching, which is what I suspect this is going to take. Time-wise, running it should be quick enough.

Here's the situation. I need to go through various web files, in this case CSS style sheets. They contain styles that are not used on any page, and I want to remove those styles. I've got several style sheet files, several sites to do this for, and several pages, though the majority of them are quite similar.

For example, some sheets have a style set up for the <blockquote> tag. What I want to do is take each individual style (I'll use blockquote as the example) and scan the pages of the site; if it isn't found on any page, remove it from the sheet. The hard part comes when dealing with contextual selectors and classes, but the idea is the same: scan each page for the contextual selector, ID, or class in question from the sheet. If it is found on even one page of the whole site, leave it alone; if no page has that particular item, remove it. I need this done for all pages, all sheets, and all sites.

I can do this if someone can get me started and is willing to help with questions, as I'm sure there will be some. I've tried reading about Perl regular expressions and my head hurts.

Any assistance appreciated.

Thanks.
Dave.
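The keep-or-remove rule in the message above (keep a style if even one page uses it, remove it only if no page does) can be sketched in core Perl roughly as follows. This is only an illustration under stated assumptions: `unused_selectors` and the `$has_tag` callback are hypothetical names, and the toy "pages" are plain lists of tag names rather than parsed HTML.

```perl
use strict;
use warnings;

# Hypothetical helper: given selectors, pages, and a matcher callback,
# return the selectors that no page uses.
# $matches->($page, $selector) must return true when the page uses the selector.
sub unused_selectors {
    my ( $selectors, $pages, $matches ) = @_;
    my @unused;
    SELECTOR: for my $sel (@$selectors) {
        for my $page (@$pages) {
            # One hit anywhere on the site is enough to keep the style.
            next SELECTOR if $matches->( $page, $sel );
        }
        push @unused, $sel;    # no page matched: candidate for removal
    }
    return @unused;
}

# Toy data (an assumption for the demo): each "page" is a list of its tags.
my @pages   = ( [qw(p h1)], [qw(p blockquote)] );
my $has_tag = sub {
    my ( $page, $sel ) = @_;
    return scalar grep { $_ eq $sel } @$page;
};

my @unused = unused_selectors( [qw(blockquote h1 table)], \@pages, $has_tag );
print "unused: @unused\n";    # prints "unused: table"
```

Passing the matcher as a callback keeps the loop independent of how a page is actually inspected, so a real HTML parser could be swapped in later without touching the keep-or-remove logic.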
Dear all,

Well, I'll write in Portuguese, because I don't know English. I'm from Brazil. I would like to know how a visually impaired person can program using Visual Studio 2010 with JAWS, in ASP.NET, .NET and C#. I'm starting a Microsoft course in a few days and need to know whether JAWS scripts exist for this development tool.

Regards,
Juliano

-----Original Message-----
From: blind-sysadmins-bounces@lists.hodgsonfamily.org [mailto:blind-sysadmins-bounces@lists.hodgsonfamily.org] On Behalf Of David Mehler
Sent: Friday, October 7, 2011 14:46
To: blind-sysadmins
Subject: [Blind-sysadmins] text processing on web pages

_______________________________________________
Blind-sysadmins mailing list
Blind-sysadmins@lists.hodgsonfamily.org
http://lists.hodgsonfamily.org/listinfo/blind-sysadmins
I don't quite understand this response. My Spanish is not so good.

Greg B.

-----Original Message-----
From: blind-sysadmins-bounces@lists.hodgsonfamily.org [mailto:blind-sysadmins-bounces@lists.hodgsonfamily.org] On Behalf Of Juliano Cesar Ribeiro
Sent: Friday, October 07, 2011 1:49 PM
To: Blind sysadmins list
Subject: [Blind-sysadmins] RES: text processing on web pages
Good afternoon.

My name is Juliano, and I'm from Brazil. I would like to know the following: does the JAWS screen reader have support for using Visual Studio 2010, or does it not have any so far? The other thing I would like to know is whether JAWS works well with Oracle.

Thank you for the information. I ask because I am going to take a Microsoft student course, and afterwards I will take an Oracle course here in my city. I would like this information so that I can work as sighted people do.

Thanks, and I look forward to your reply.

-----Original Message-----
From: blind-sysadmins-bounces@lists.hodgsonfamily.org [mailto:blind-sysadmins-bounces@lists.hodgsonfamily.org] On Behalf Of Greg B.
Sent: Friday, October 7, 2011 15:11
To: 'Blind sysadmins list'
Subject: Re: [Blind-sysadmins] RES: text processing on web pages
Here is what he says according to Google Translate. It's not quite clear, but maybe it gives some clarification.

Good afternoon. My name is Julian, I'm from Brazil. I like to know the information: Dock is the reader jaws, has resources for using visual studio 2010, or has so far? Outra And the information you want is as follows: Be the jaws works well com the oracle? Thanks for the information. I ask these things because I'll make um course microsoft students, and then I'll um em here oracle course of my city. And I like to know the information so you can work as vidientes. Thanks, and I'm sitting game em.

Negoslav
Thanks for the translation. Is there a response?

-----Original Message-----
From: blind-sysadmins-bounces@lists.hodgsonfamily.org [mailto:blind-sysadmins-bounces@lists.hodgsonfamily.org] On Behalf Of Negoslav Sabev
Sent: Saturday, October 8, 2011 04:39
To: Blind sysadmins list
Subject: Re: [Blind-sysadmins] RES: text processing on web pages
Hi David,

You may want to use other Perl modules, and not just regular expressions, for this task. For example, you can do something like this (untested):

    use strict;
    use warnings;
    use HTML::TreeBuilder;

    # The variable $content contains the whole HTML page
    my $tree = HTML::TreeBuilder->new_from_content( $content );

    # Say you want to search for the class "foo"
    my @found = $tree->look_down( class => 'foo' );
    # If @found doesn't contain any elements, then this page doesn't
    # contain any element with the class "foo".

    # Or you can search for HTML elements of type "foo" that have the class "bar":
    @found = $tree->look_down( _tag => 'foo', class => 'bar' );

    # Or you can search for elements "foo" with the ID "bar":
    @found = $tree->look_down( _tag => 'foo', id => 'bar' );

Or you may use Web::Scraper to do the search using CSS selectors like jQuery does, or Mojo::DOM for the same thing; if I remember well, it has better support for more complex CSS selectors. There are also other modules that let you make the selection using XPath, if you find that easier.

You can read the documentation of these modules at:

http://search.cpan.org/~jfearn/HTML-Tree-4.2/lib/HTML/TreeBuilder.pm
http://search.cpan.org/~mirod/HTML-TreeBuilder-XPath-0.14/lib/HTML/TreeBuild...
http://search.cpan.org/~awncorp/Scrappy-0.94112090/lib/Scrappy/Scraper/Parse...
http://search.cpan.org/~sri/Mojolicious-1.99/lib/Mojo/DOM.pm

HTH.

Octavian
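The look_down() calls above cover the HTML side. The CSS side, getting the selectors out of a sheet and turning the simple ones into look_down() criteria, might be sketched in core Perl as below. Treat it as a rough starting point and not a CSS parser: `css_selectors` and `selector_to_criteria` are hypothetical helper names, the sheet is assumed to contain flat rules only (no @media nesting), and only tag, .class, #id, tag.class and tag#id selectors are handled. One detail worth knowing is that a plain string criterion in look_down() must equal the whole attribute value, so the sketch uses a regex for classes to cope with attributes such as class="foo bar".

```perl
use strict;
use warnings;

# Hypothetical helper: pull the selectors out of a stylesheet string.
# Assumes flat rules (no @media nesting); comments are stripped first.
sub css_selectors {
    my ($css) = @_;
    $css =~ s{/\*.*?\*/}{}gs;                    # drop /* ... */ comments
    my @selectors;
    while ( $css =~ /([^{}]+)\{[^{}]*\}/g ) {    # "selector list { declarations }"
        for my $sel ( split /,/, $1 ) {
            $sel =~ s/^\s+|\s+$//g;              # trim surrounding whitespace
            push @selectors, $sel if length $sel;
        }
    }
    return @selectors;
}

# Hypothetical helper: turn a simple selector into look_down() criteria.
# Returns an empty list for anything more complex, such as the contextual
# selector "div p", which needs a CSS-selector-aware module instead.
sub selector_to_criteria {
    my ($sel) = @_;
    return if $sel eq '';
    return unless $sel =~ /^([a-zA-Z][\w-]*)?(?:([.#])([\w-]+))?$/;
    my ( $tag, $kind, $name ) = ( $1, $2, $3 );
    my @criteria;
    push @criteria, _tag => lc $tag if defined $tag;
    if ( defined $kind ) {
        push @criteria, $kind eq '#'
            ? ( id => $name )
            # a class attribute may hold several names, so match one word
            : ( class => qr/(?:^|\s)\Q$name\E(?:\s|$)/ );
    }
    return @criteria;
}

# Sample sheet (made up for the demo).
my $css = <<'CSS';
/* site styles */
blockquote { margin: 1em; }
.note, div#sidebar { color: red; }
div p { margin: 0; }
CSS

for my $sel ( css_selectors($css) ) {
    my @criteria = selector_to_criteria($sel);
    my $verdict  = @criteria ? "simple, usable with look_down" : "needs a CSS-selector module";
    print "$sel: $verdict\n";
}
```

With HTML::TreeBuilder installed, each non-empty criteria list can be passed straight through as `$tree->look_down(@criteria)`. Selectors that come back with an empty list are the ones where Web::Scraper or Mojo::DOM, as mentioned above, would be the better fit.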
participants (5)
- David Mehler
- Greg B.
- Juliano Cesar Ribeiro
- Negoslav Sabev
- Octavian Rasnita