XML Processing in Python

Copyright 2009 by Stephen Vermeulen
Last updated: 2009 Nov 13

Python can be a very good language for processing XML-formatted files.

  • 2009-Nov-13: Beautiful Soup is a Python HTML/XML parser for projects like screen-scraping. It is available here too. There is also an example of massaging the markup to clean up problematic HTML before running Beautiful Soup on it. [231] [1]
  • 2009-Jul-15: bier-soup.py is an example of reading in HTML tables and converting them to text files with BeautifulSoup. [8265] [1]
  • 2009-May-01: pyxser, a Python object to XML serializer. [7929] [1]
  • 2008-Dec-11: Ian Bicking suggests that lxml is a good alternative to BeautifulSoup for web scraping tasks. He also has some short examples including an HTML diff. [7336] [1] [2]
  • 2008-Oct-28: gxml is a module providing a common interface to some of the popular XML libraries. [7109] [1]
  • 2008-Oct-11: A patch for ElementTree to better support the CDATA section. [7006] [1] [2]
  • 2008-Jul-09: lxml, discussed here, provides Python bindings for the libxml2 and libxslt libraries. O'Reilly discusses libxslt here. There is also a comparison of lxml and ElementTree that explains why they both exist. The PyPI page for lxml is here. [220] [1]
  • 2008-Jun-21: Converting XML to Dictionary and Back. The intent is to make working with XML-structured data easier for the Python programmer. While this is not a general solution for all XML data files, it may be useful where the structure and content are more restricted, as in a configuration file. [6400] [1] [2]
  • 2008-Mar-31: Looking at the performance of various HTML parsers for Python (lxml, BeautifulSoup, html5lib, ElementTree, cElementTree, HTMLParser, htmlfill, Genshi, xml.dom.minidom). [5363] [1]
  • 2008-Mar-30: xmlpolymerase is a Python object serializer that will pack to and unpack from XML. Sort of an XML version of Pickle. [5355] [1]
  • 2008-Feb-25: openxmllib is a module for working with OpenXML documents. [5158] [1]
  • 2008-Feb-05: PyXML, XML parsers and API for Python. The project home page is here. [5050] [1]
  • 2008-Jan-10: Why binary-XML is solving the wrong problem. [4641] [1]
  • 2007-Oct-22: XML to Python Data Structure. This allows an XML object to be easily accessed as a Python object (with some minor limitations). [3500] [1] [2]
  • 2007-Aug-31: Gnosis Utilities, a collection of utilities for working with XML documents. It includes some other things like full text indexing and searching, Python object introspection, hashcash and spam filtering. [227] [1]
  • 2007-Aug-31: PTML, for embedding Python into text documents [229] [1]
  • 2007-Aug-24: simplexml, another XML file manipulation library for Python. [216] [1]
  • 2007-Aug-24: xmlmodel allows you to expressively define an XML document using native Python classes; you can then access the elements of the XML through a tree of native Python objects. [217] [1]
  • 2007-Aug-24: mlk_xhtml, a package for creation of XHTML. [218] [1]
  • 2007-Aug-24: ElementTree is a Python XML reader/parser/writer that's been implemented in both pure Python and C. Also linked: a note on using non-standard encodings in cElementTree, a talk on using ElementTree to process XML, and a recommendation for ElementTree. [219] [1]
  • 2007-Aug-24: xmltramp makes reading XML data easy. [221] [1]
  • 2007-Aug-24: PySimpleXML (and here) simplifies the translation between Python structures and XML. [222] [1]
  • 2007-Aug-24: pyxmlserial, XML serialization of basic Python data. [223] [1]
  • 2007-Aug-24: YAXL is Yet Another (Pythonic) XML Library; one of its design goals is that it can be understood in 15 minutes. [224] [1]
  • 2007-Aug-24: A look at Python-based DOM Manipulation templating systems [225] [1]
  • 2007-Aug-24: pyfo, a module for quickly generating XML representations of Python objects. [226] [1]
  • 2007-Aug-24: surely, a program to convert files written in a shorthand notation (similar to Python syntax) into XML [228] [1]
  • 2007-Aug-24: pullparser, a simple module for HTML parsing, supposed to be easier to use than the HTMLParser module for some things. [230] [1]
  • XIST, a Python framework for reading and writing XML. [215] [1]
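
Several of the entries above (the XML to dictionary recipes, ElementTree) revolve around turning a small, regular XML file into ordinary Python data. As a rough illustration of that idea, here is a minimal sketch using only the standard library's xml.etree.ElementTree. It is not the code from any of the linked recipes: the helper name element_to_dict, the "#text" key and the sample CONFIG_XML document are all made up for this example, and the approach only suits restricted structures such as a configuration file.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element into nested dicts (hypothetical helper, illustration only)."""
    node = dict(elem.attrib)  # attributes become plain string keys
    for child in elem:
        value = element_to_dict(child)
        if child.tag in node:
            # A repeated key (for example two <plugin> children) is collected into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return text  # leaf element with no attributes: just its text
    if text:
        node["#text"] = text  # "#text" is an arbitrary key chosen for this sketch
    return node


# Made-up sample document, standing in for a small configuration file.
CONFIG_XML = """
<config version="1">
  <server>
    <host>localhost</host>
    <port>8080</port>
  </server>
  <plugin>alpha</plugin>
  <plugin>beta</plugin>
</config>
"""

root = ET.fromstring(CONFIG_XML)
config = element_to_dict(root)
print(config["server"]["host"])
print(config["plugin"])
```

All values come back as strings, and an attribute that shares its name with a child element is merged into the same list, so a sketch like this is only reasonable when you control the shape of the XML.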


