
urgent need help in parsing html tables

Question posted by: poisonedapple (Newbie) on July 1st, 2008 09:13 PM
I am trying to parse a simple table with two headings and get its rows, but I am having a big problem working out how to pass the HTML file (its link or path) to the parser.

The HTML file is on my own desktop. I have its path, but I have no clue how to use that with HTML::TableExtract.

Code: ( text )
  use HTML::TableExtract;
  $te = HTML::TableExtract->new( headers => [qw(Date Price Cost)] );
  $te->parse($html_string);

  # Examine all matching tables
  foreach $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";

    foreach $row ($ts->rows) {
      print join(',', @$row), "\n";
    }
  }


Let's say I put my own headings, say heading1 and heading2, in place of Date and Price.
Where should I put the link to the HTML file,
which is something like /home/jack/desktop/sample.html?
I tried $html_string="/home/jack/desktop/sample.html", but it does not work at all.

What am I supposed to do? I would appreciate it if you could help me out with this.

Thanks a lot.
Last edited by eWish : July 1st, 2008 at 11:39 PM. Reason: Please use code tags
KevinADC
Expert
3,065 Posts
July 2nd, 2008
01:00 AM
#2

Re: urgent need help in parsing html tables
If you use the better HTML::TableParser module it can open the file for you. See the parse_file method:

http://search.cpan.org/~djerius/HTM.../TableParser.pm

basically:

Code: ( text )
  $p->parse_file('c:/windows/desktop/foo.html');


where $p is the parser object and the file path is the correct one for your computer and file. Note: you can use forward slashes in Windows file/directory paths.

poisonedapple
Newbie
8 Posts
July 2nd, 2008
04:12 PM
#3

Re: urgent need help in parsing html tables
Thanks for the post, but that looks more complicated than the previous one.
I just need to parse a table in an HTML file that is on my own desktop.
I do not want to use any kind of table ID or size, just the heading names.

What would be the best way to use HTML::TableExtract?
- I need to put the file path for the HTML somewhere
(the problem I am facing is that in all the examples on CPAN, html_string is already there without being initialized; they are incomplete programs)

- I need to put in the headers

Results: I need the table data, that's all. I am sorry, but I do not want to have to find out what my table's ID is and all that.


Please help me; this seems like a simple problem. I could not debug it because whenever I run the program I don't get any errors and nothing is printed. I am very irritated and more hopeless every day. I think I made a big mistake trying to use Perl for this project; everything is so disorganized that I can't find a single example that does just that.

Please, I would really appreciate it if someone could help me.

Thanks in advance to everyone, and thanks for the reply.

KevinADC
Expert
3,065 Posts
July 2nd, 2008
04:19 PM
#4

Re: urgent need help in parsing html tables
Here you go:

Code: ( text )
  open (HTML, 'c:/path/to/foo.html') or die "$!";
  my $html = do {local $/; <HTML>}; # puts the entire file in a scalar variable
  close HTML;


Now you can parse $html.
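
As an alternative to slurping the file by hand, HTML::TableExtract is a subclass of HTML::Parser, so it should also accept a file path through the inherited parse_file method. The following is only a minimal sketch under stated assumptions: the temporary sample file, its contents, and the header names Heading1 and Heading2 are made up for illustration and are not from the original posts.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TableExtract;

# Hypothetical sample file so the sketch runs on its own. In practice
# $path would be the real file, e.g. '/home/jack/desktop/sample.html'.
my ($fh, $path) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} <<'END_HTML';
<html><body>
<table>
  <tr><th>Heading1</th><th>Heading2</th></tr>
  <tr><td>apple</td><td>1.50</td></tr>
  <tr><td>pear</td><td>2.25</td></tr>
</table>
</body></html>
END_HTML
close $fh or die "Cannot close $path: $!";

# Fail loudly if the path is wrong instead of printing nothing.
-r $path or die "Cannot read $path";

# Match tables by their column headings, then let the parser read the file.
my $te = HTML::TableExtract->new( headers => [qw(Heading1 Heading2)] );
$te->parse_file($path);

# Collect and print the rows of every matching table.
my @rows;
foreach my $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach my $row ($ts->rows) {
        push @rows, $row;
        # Empty cells can come back undefined, so substitute ''.
        print join(',', map { defined $_ ? $_ : '' } @$row), "\n";
    }
}
```

The readability check is there because a wrong path is an easy way to end up with no output and no error message.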

poisonedapple
Newbie
8 Posts
July 2nd, 2008
04:53 PM
#5

Re: urgent need help in parsing html tables
This is the program I wrote:

  #!/usr/bin/perl
  use HTML::TableExtract;
  open (HTML, '/root/Desktop/test.html') or die "$!";
  my $html = do {local $/; <HTML>}; # puts the entire file in a scalar variable
  $te = HTML::TableExtract->new( headers => [qw(Heading Heading_2)] );
  $te->parse($HTML);
  # Examine all matching tables
  foreach $ts ($te->tables) {
    print "Table (", join(',', $ts->coords), "):\n";
    foreach $row ($ts->rows) {
      print join(',', @$row), "\n";
    }
  }

But when I run perl program.pl, it does not print anything; I just get another prompt.
Thanks for the reply. I would appreciate it if you could help solve this problem.

poisonedapple
Newbie
8 Posts
July 2nd, 2008
04:57 PM
#6

Re: urgent need help in parsing html tables
OK, I think I got it; there was a minor problem. Thanks a lot for the help, I appreciate it.
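
The thread does not say what the minor problem was. One candidate visible in the posted program is that the file is read into $html but $HTML is passed to parse; Perl variable names are case-sensitive, so those are two different scalars and the parser receives an undefined value. The sketch below is illustrative only (the snippet text is made up) and shows how use strict turns that kind of mismatch into a compile-time error instead of silent empty output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative snippet: it declares $html but then reads $HTML,
# mirroring the mismatch described above.
my $snippet = q{
    use strict;
    my $html = '<table></table>';
    my $copy = $HTML;   # a different, undeclared variable
    1;
};

# Compile and run the snippet; under strict it fails to compile.
my $ok  = eval $snippet;
my $err = $@;
print "strict rejected the snippet: $err" unless $ok;
```

Putting use strict; and use warnings; at the top of the real program would report such a slip at compile time.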

poisonedapple
Newbie
8 Posts
July 2nd, 2008
07:19 PM
#7

Re: urgent need help in parsing html tables
Hi,
I got the tables extracted, and I have a huge document full of tables. Using this module (HTML::TableExtract), I am trying to search the parsed tables for keywords taken from user input, and I have to print only the relevant data.
I tried looking on CPAN but could not find out how to search the parsed tables for particular keywords.

One way to do it would be to output the parsed tables to a text file and parse it from there. That is a rather wrong way for me, though: when I find a keyword in a table I also need the corresponding columns or other relevant data from that table, and parsing the text file would make it harder to get the columns that correspond to the keyword.

Aim and problem: I can't find any way to search through the resulting parsed tables and get the necessary data.


Thanks for the reply, I appreciate it.
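
Since the rows method returns each row as an array reference, one possible approach is to search the rows in memory rather than writing them to a text file first. This is a rough sketch, not a definitive solution: the sample HTML, the keyword, and the @hits structure are all assumptions made for illustration. It keeps the whole row for every match, so the corresponding columns stay available.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TableExtract;

# Hypothetical sample data and keyword. In the real program $html would
# come from the file and $keyword from user input.
my $html = <<'END_HTML';
<table>
  <tr><th>Name</th><th>Price</th></tr>
  <tr><td>Green apple</td><td>1.50</td></tr>
  <tr><td>Pear</td><td>2.25</td></tr>
</table>
<table>
  <tr><th>Item</th><th>Stock</th></tr>
  <tr><td>Apple juice</td><td>12</td></tr>
</table>
END_HTML
my $keyword = 'apple';

# No headers are given, so every table in the document is extracted.
my $te = HTML::TableExtract->new;
$te->parse($html);

my @hits;
foreach my $ts ($te->tables) {
    foreach my $row ($ts->rows) {
        # Keep the whole row if any cell contains the keyword
        # (case-insensitive, keyword treated as literal text).
        next unless grep { defined($_) && /\Q$keyword\E/i } @$row;
        push @hits, { coords => [ $ts->coords ], row => $row };
    }
}

# Print each matching row together with the table it came from.
foreach my $hit (@hits) {
    print "Table (", join(',', @{ $hit->{coords} }), "): ",
          join(',', map { defined $_ ? $_ : '' } @{ $hit->{row} }), "\n";
}
```

Because each hit stores the full row and the table coordinates, any other column of a matching row can be read by index afterwards.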
