XML Validation with Python

Can you give a command-line example of how to do XML validation
(checking against a DTD) with Python? Not with 4Suite or other
third-party libraries, just the Python standard distribution. I have
Python 2.2 but can upgrade to the 2.3 beta if needed.

I am looking for something like:

"
$ python validate.py myxmlfile.xml mydtd.dtd
"

where validate.py contains something like:

"
import somexmllib
import sys

# prints 1 if Okay :-)
print somexmllib.validate(sys.argv[1], sys.argv[2])
"

I am sorry if this is a FAQ or if it is in one of the xml libraries, I
just could not figure it out!
Jul 18 '05 #1
Will Stuyvesant wrote:
> Can you give a command-line example of how to do XML validation
> (checking against a DTD) with Python? Not with 4Suite or other
> third-party libraries, just the Python standard distribution.

You can't do it. The base distribution doesn't include a validating
XML parser.
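
To make the distinction concrete, here is a minimal sketch of what the standard library can check, which is well-formedness only. It is written in current Python 3 syntax rather than the 2.2 discussed in this thread, and the name is_well_formed is purely illustrative. It relies on xml.parsers.expat, which is a non-validating parser, so a document that contradicts its own DTD is still accepted.

```python
import xml.parsers.expat


def is_well_formed(xml_text):
    """Return 1 if xml_text is well-formed XML, 0 otherwise.

    This is NOT DTD validation: expat reads the DOCTYPE but never
    checks the document against its declarations.
    """
    parser = xml.parsers.expat.ParserCreate()
    try:
        parser.Parse(xml_text, True)  # True marks this as the final chunk
    except xml.parsers.expat.ExpatError:
        return 0
    return 1


# Well-formed, but invalid against its own DTD (a is declared EMPTY):
sample = "<!DOCTYPE a [<!ELEMENT a EMPTY>]><a><b/></a>"
print(is_well_formed(sample))
```

The sample document is accepted even though it breaks its DTD, which is exactly the gap a validating parser would close.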

The only pure python validating parser is Lars Garshol's "xmlproc",
which is a part of pyxml (a "third-party" optional extension). You can
read the documentation for xmlproc here

http://www.garshol.priv.no/download/software/xmlproc/

and the bit about validating on the command line is here

http://www.garshol.priv.no/download/...c/cmdline.html

Is there any reason why it has to be in the base distribution?

Assuming that you have a good reason, maybe you can tell us what
platform you're running on? There might be a platform specific
parser/validator that you can call from python.
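
As a rough illustration of calling an external validator, the sketch below shells out to xmllint from libxml2. That tool is an assumption on my part (nothing in this thread says it is installed on the server), the function names are hypothetical, and the code uses Python 3's subprocess module, which is not available in Python 2.2. The --noout flag suppresses the parsed output and --dtdvalid validates against the given DTD file.

```python
import shutil
import subprocess


def xmllint_command(xml_path, dtd_path):
    """Build the xmllint command line for validating xml_path against dtd_path."""
    return ["xmllint", "--noout", "--dtdvalid", dtd_path, xml_path]


def validate_with_xmllint(xml_path, dtd_path):
    """Return 1 if valid, 0 if not, or None when xmllint is not installed."""
    if shutil.which("xmllint") is None:
        return None
    result = subprocess.run(
        xmllint_command(xml_path, dtd_path),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # xmllint exits with status 0 only when the document passed
    return 1 if result.returncode == 0 else 0
```

A script could then end with print(validate_with_xmllint(sys.argv[1], sys.argv[2])) to get the 1 or 0 asked for in the original question.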

HTH,

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
Jul 18 '05 #2
I could not find a way to write a simple command-line XML
validation utility using only the Python standard library.
I also found the xml.sax documentation unclear, and there
are no good examples to look at. The XML examples in the
Python Cookbook and in Python in a Nutshell are BAD as
well. Nowhere is the design of the class library motivated
(for example, "why do you need a handler in xml.sax.parse(),
and why is there no default handler?"), and there are no
simple examples of how to use it. I like the approach taken
by Fredrik Lundh's Python Standard Library book MUCH more:
clear examples and explanations. It is a damn shame that
O'Reilly does not want a new edition; the poor guy is now
putting a free version on his website.

I have found a solution for XML validation using the
third-party pyRXP library from http://www.reportlab.com/xml/pyrxp.html
Their "download and install" info is a mess. I first
downloaded a .ZIP containing only .DLL and .PYD files, and
it turned out you had to plunk those into C:\Python22\DLL.
This made me turn away from pyRXP initially, because bad
installation usually means bad software. Later on I found a
bigger .ZIP with more stuff, so maybe I should have used
that one? At least it works now: I can do "import pyRXP".
Make sure you also download pyRXP_Documentation.pdf, which
is good documentation with examples. I notice the docs in
the other, bigger .ZIP are in .RML format... whatever that is!

I cannot believe the amount of bad documentation and bad
install approaches I see with third-party software. That is
why I normally stick to the Python Standard Library only.

Anyway, I can now do XML validation; "validate.py" is below.
But it does not solve my initial problem: if the file
validates, validate.py prints nothing, and if there is a
mistake it prints an error message. What I really wanted, to
give more confidence that the validation is okay, is to
print 1 or 0 depending on the result. I have not figured out
yet how to do that, and now I am too tired of it all...

# file: validate.py
import sys
if len(sys.argv)<2 or sys.argv[1] in ['-h','--help','/?']:
    print 'Usage: validate.py xmlfilename'
    sys.exit()
import pyRXP
p = pyRXP.Parser()
# fn holds the document text, not the file name
fn=open(sys.argv[1], 'r').read()
p.parse(fn)
Jul 18 '05 #4
> [Alan Kennedy <al****@hotmail.com>]
> The only pure python validating parser is Lars Garshol's "xmlproc",
> which is a part of pyxml (a "third-party" optional extension). You can
> read the documentation for xmlproc here
>
> http://www.garshol.priv.no/download/software/xmlproc/
>
> and the bit about validating on the command line is here
>
> http://www.garshol.priv.no/download/...c/cmdline.html
>
> Is there any reason why it has to be in the base distribution?

Because I want to use it from a CGI script written in Python, and I
am not allowed to install third-party stuff on the web server. Even if
I were, it would not be a solution, since it has to be easy to move
the script to another web server. But of course, if there is a
validating parser written completely in Python, then I can use it too!
If it runs under Python 2.1.1, that is (that is what they have on the
server). I will investigate the www.garshol.priv.no link you gave me,
thank you.
Jul 18 '05 #5
Will Stuyvesant wrote:
> Because I want to use it from a CGI script written in Python, and I
> am not allowed to install third-party stuff on the web server. Even if
> I were, it would not be a solution, since it has to be easy to move
> the script to another web server. But of course, if there is a
> validating parser written completely in Python, then I can use it too!
> If it runs under Python 2.1.1, that is (that is what they have on the
> server). I will investigate the www.garshol.priv.no link you gave me,
> thank you.

Glad to be of help.

There is a vaguely worrying comment on Lars's site, which says:

"Note that it is recommended to use xmlproc through the SAX API rather
than directly, since this provides much greater freedom in the choice
of parsers. (For example, you can switch to using Pyexpat which is
written in C without changing your code.)"

Which seems to indicate to me that the author is encouraging the user
not to rely on xmlproc too much. Perhaps performance might be an
issue?
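
For readers unfamiliar with the SAX API mentioned in that note, here is a minimal sketch in current Python 3 syntax (not the 2.x of this thread). The names ElementCounter, count_elements and supports_validation are illustrative only. The parser reports events to a handler; xml.sax.handler.ContentHandler is a do-nothing base class, so subclassing it and overriding one method is enough. The second function asks a parser for the standard SAX validation feature, which the standard library's expat-based parser refuses.

```python
import xml.sax
from xml.sax.handler import ContentHandler, feature_validation


class ElementCounter(ContentHandler):
    """Counts start tags; every other SAX event is ignored by the base class."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


def count_elements(xml_bytes):
    """Parse xml_bytes; raises xml.sax.SAXParseException if it is not well-formed."""
    handler = ElementCounter()
    xml.sax.parseString(xml_bytes, handler)
    return handler.count


def supports_validation(parser):
    """Return True if the SAX parser accepts the validation feature."""
    try:
        parser.setFeature(feature_validation, True)
    except (xml.sax.SAXNotSupportedException, xml.sax.SAXNotRecognizedException):
        return False
    return True


print(count_elements(b"<a><b/><b/></a>"))
print(supports_validation(xml.sax.make_parser()))
```

Because the handler code never names a concrete parser, a validating SAX driver could in principle be swapped in without touching it, which seems to be the point of the recommendation quoted above.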

One more thing: there are alternative validation methods, which may or
may not be suitable, depending on your requirements.

For example, there is pytrex, James Tauber's implementation of James
Clark's Tree Regular EXpressions (TREX). It is written in pure python
and uses the inbuilt C parser. I personally find trex and pytrex a
very natural, and thus easy to learn, way to check structures in a
tree, including data validation. Pytrex is not complete, and is no
longer maintained, but what's there is good code with nice little
features, such as the ability to define your own datatype validation
functions, which are called at match time.

http://pytrex.sourceforge.net/

Pytrex is unlikely to be ever completed, because James Clark has
abandoned TREX in favour of RELAX-NG, for which I haven't seen any
python implementation.

http://www.relaxng.org/

There is a python implementation of XML-Schema, xsv, written by Henry
Thompson, which I think was kept fairly up-to-date with the XML-Schema
spec as it evolved. However, given the complexity of XML-Schema, and
having never tried to use xsv, I have no idea of its stability.

http://www.ltg.ed.ac.uk/~ht/xsv-status.html

I note that the author also maintains a web service for validating
documents.

Are you sure that XML validation-parsing is the right solution for
your problem? There may be simpler ways.

--
alan kennedy
-----------------------------------------------------
check http headers here: http://xhaus.com/headers
email alan: http://xhaus.com/mailto/alan
Jul 18 '05 #6
> [Alan Kennedy]
> ... interesting links and comments ...
> Are you sure that XML validation-parsing is the right solution for
> your problem? There may be simpler ways.

We have defined a new XML vocabulary with a DTD. I offered to make a
web service so everybody can validate their XML files against this
DTD. For this I use CGI with Python 2.1.1 and I have no web master
privileges.

The idea of web applications is nice in that you do not have to code
GUIs anymore: you can do pretty much everything with (X)HTML.
Sometimes you have to rethink your UI so that every user state can be
given a URI. A big plus is that everybody can now use your
application. And you can do more than I thought before; for example,
users can send files from their computer with type=FILE fields in
forms. For development you can just download Apache, install it on
your laptop and configure it so that everything is exactly the same as
on the target website (#!/usr/bin/python... means install their
python version in C:\usr\bin on your laptop :-)

The big problem with web applications is all the permissions you need
to install, compile, configure, etc. For Python CGI this means you
are stuck with some Python version and you realize how important the
Python Standard Library is.

--
Experience is what allows you to recognize a mistake the second time
you make it.
Jul 18 '05 #7
Will Stuyvesant wrote:
> Anyway, I can now do XML validation; "validate.py" is below.
> But it does not solve my initial problem: if the file
> validates, validate.py prints nothing, and if there is a
> mistake it prints an error message. What I really wanted, to
> give more confidence that the validation is okay, is to
> print 1 or 0 depending on the result. I have not figured out
> yet how to do that, and now I am too tired of it all...

This might do the trick:

# file: validate.py
import sys, pyRXP

if len(sys.argv)<2 or sys.argv[1] in ['-h','--help','/?']:
    print 'Usage: validate.py xmlfilename'
    sys.exit()

fn = open(sys.argv[1], 'r').read()
try:
    pyRXP.Parser().parse(fn)
    print True
except pyRXP.error:
    print False

Though personally, rather than printing False, I would simply raise in
the except clause, as the traceback gives the user more information
about what is wrong with their XML.
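
Both wishes can be combined: print 1 or 0 on standard output and send the error detail to standard error. The sketch below is in current Python 3 syntax and is not tied to pyRXP; report is a hypothetical helper that accepts any parse callable and the exception type it raises. Here xml.dom.minidom.parseString stands in for the parser, and it checks well-formedness only. With pyRXP one would presumably pass pyRXP.Parser().parse and pyRXP.error, as in the script above.

```python
import sys
import xml.dom.minidom


def report(parse, document, out=None, err=None, error_type=Exception):
    """Print 1 (ok) or 0 (failed) to out and any error detail to err.

    Returns a process exit status: 0 on success, 1 on failure.
    """
    out = sys.stdout if out is None else out
    err = sys.stderr if err is None else err
    try:
        parse(document)
    except error_type as exc:
        print(exc, file=err)  # the detail the traceback would have shown
        print(0, file=out)
        return 1
    print(1, file=out)
    return 0


# Stand-in parser (well-formedness only), used purely for illustration
status = report(xml.dom.minidom.parseString, "<a><b/></a>")
```

A script would finish with sys.exit(status), so that shell scripts and CGI wrappers can test the result without parsing the output.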
Jul 18 '05 #8
