#1
July 18th, 2005, 01:21 AM
Miki Tebeka
PDF Parser?

Hello All,

I'm looking for a PDF parser.
Any pointers?

10x.
Miki
#2
July 18th, 2005, 01:22 AM
John Hunter
Re: PDF Parser?

>>>>> "Miki" == Miki Tebeka <tebeka@cs.bgu.ac.il> writes:

Miki> Hello All, I'm looking for a PDF parser. Any pointers?

A little more info would be helpful: do you need access to all the PDF
structures or just the text? AFAIK, there is no full PDF parser in
Python. The subject has come up several times before, so check the
Google Groups archives:

http://groups.google.com/groups?q=pd...=Google+Search

Things people have suggested before:

1) use pdftotext and parse the text
2) wrap xpdf's parser.

For example, if you have pdftotext installed, the following will give
you a Python file-like handle to the extracted text:

import os

def pdf2txt(fname):
    return os.popen('pdftotext -raw -ascii7 "%s" -' % fname)  # '-' writes the text to stdout
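
A slightly more defensive variant of the same idea, as a minimal sketch rather than a definitive implementation. It assumes pdftotext is installed and on the PATH; the helper names build_pdftotext_cmd and pdf_to_text are made up for illustration. Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.

```python
import shutil
import subprocess


def build_pdftotext_cmd(pdf_path, raw=True):
    """Build the argument list for pdftotext (hypothetical helper)."""
    cmd = ["pdftotext"]
    if raw:
        # -raw keeps the text in content-stream order.
        cmd.append("-raw")
    # A trailing '-' tells pdftotext to write the text to stdout.
    cmd += [pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path, raw=True):
    """Run pdftotext and return the extracted text as a string."""
    if shutil.which("pdftotext") is None:
        raise RuntimeError("pdftotext not found on PATH")
    result = subprocess.run(
        build_pdftotext_cmd(pdf_path, raw),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    # Decoding as UTF-8 is an assumption; replace undecodable bytes.
    return result.stdout.decode("utf-8", errors="replace")
```

Usage would look like text = pdf_to_text("report.pdf"), where report.pdf stands in for whatever file you want to read.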

If you just want to search and index pdf, see
http://pdfsearch.sourceforge.net.

John Hunter

#3
July 18th, 2005, 01:34 AM
Adam Twardoch
Re: PDF Parser?

"John Hunter" <jdhunter@ace.bsd.uchicago.edu> wrote:

> A little more info would be helpful: do you need access to all the pdf
> structures or just the text? AFAIK, there is no full pdf parser in
> python.

If you need access to the graphical elements, you can use pstoedit to
convert the PDF into SVG (Scalable Vector Graphics). Since SVG is XML, you
can then parse the data with any Python-based XML toolkit.
http://www.pstoedit.net/pstoedit
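
To illustrate the XML side only, here is a rough sketch using the standard library's xml.etree.ElementTree. It assumes the SVG has already been produced; the pstoedit invocation is left out because it depends on which SVG driver is installed. The names summarize_svg, path_data and SAMPLE_SVG are hypothetical, and the sample document is hand-written, not real pstoedit output.

```python
import xml.etree.ElementTree as ET

# ElementTree prefixes every tag with its namespace in braces.
SVG_NS = "{http://www.w3.org/2000/svg}"

# Hand-written stand-in for a converted file (an assumption for the demo).
SAMPLE_SVG = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<rect width="10" height="10"/>'
    '<path d="M0 0 L10 10"/>'
    '<path d="M5 5 L20 5"/>'
    '</svg>'
)


def summarize_svg(svg_source):
    """Count elements by tag name, with the SVG namespace stripped."""
    root = ET.fromstring(svg_source)
    counts = {}
    for elem in root.iter():
        tag = elem.tag
        if tag.startswith(SVG_NS):
            tag = tag[len(SVG_NS):]
        counts[tag] = counts.get(tag, 0) + 1
    return counts


def path_data(svg_source):
    """Return the 'd' attribute of every <path> element, in document order."""
    root = ET.fromstring(svg_source)
    return [p.get("d") for p in root.iter(SVG_NS + "path") if p.get("d")]
```

For a file on disk, read its contents and pass the string to either function; the element names you actually see will depend on what the converter emits.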

Adam


 
