How to use HTML::Parser to remove HTML tags and print result

I am trying to use HTML::Parser to parse an HTML file, remove all HTML tags
(including comments, etc.), replace all entities (e.g. &amp;), and put the
result into a variable as a string. I figure HTML::Parser itself can
somehow perform the filtering, but how do I get the result back as a string?
I'd appreciate some sample code if anyone has any. Sorry if this is a real n00b
question.

Thanks a lot,
Mitchua

Jul 19 '05 #1

"Mitchua" <mi*****@yahoo.com> wrote in message
news:pv********************@news01.bloor.is.net.cable.rogers.com...
I am trying to use HTML::Parser to parse an HTML file, remove all HTML tags
(including comments, etc.), replace all entities (e.g. &amp;), and put the
result into a variable as a string. I figure HTML::Parser itself can
somehow perform the filtering, but how do I get the result back as a string?
I'd appreciate some sample code if anyone has any. Sorry if this is a real n00b
question.

Thanks a lot,
Mitchua


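Try this for a sample of parsing a webpage:
http://www.wdvl.com/Authoring/Langua...ummarizer.html
If you are just trying to remove all the HTML tags, you could just do this
(the /s modifier lets a tag span more than one line; note that this does not
decode entities and can be fooled by a > inside a comment or attribute value):
$webpage =~ s/<.*?>//gs;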
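
For the HTML::Parser route asked about in the original question, a minimal sketch might look like the following. It assumes the version 3 API of HTML::Parser; strip_html and the sample markup are hypothetical names made up for illustration, not something from the linked article. Only a text handler is registered, so tags and comments are simply never reported, and the "dtext" argument spec asks the parser to hand over text with entities already decoded.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper (not from the thread): returns the text content of an
# HTML string with tags and comments dropped and entities decoded.
sub strip_html {
    my ($html) = @_;
    my $text = '';

    my $parser = HTML::Parser->new(
        api_version => 3,
        # Only text events get a handler; "dtext" means entity-decoded text.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );

    # Assumption: script and style bodies are not wanted in the result.
    $parser->ignore_elements(qw(script style));

    $parser->parse($html);
    $parser->eof;    # flush anything still buffered

    return $text;
}

my $html = '<p>Fish &amp; chips<!-- a comment --> are <b>good</b>.</p>';
print strip_html($html), "\n";
```

To read from a file instead of a string, parse_file($filename) can be used in place of the parse and eof calls. The result ends up in an ordinary scalar because the handler closure appends to $text as each piece of text is reported.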

Ice Demon
Jul 19 '05 #2
