Notes on Windows XP Pro

XML Notes

Copyright 2009 by Stephen Vermeulen
Last updated: 2009 Nov 13

  • 2009-Nov-13: Beautiful Soup is a Python HTML/XML parser for projects like screen-scraping. It is available here too. An example of massaging problematic HTML to clean it up before running Beautiful Soup on it. [231] [1]
  • 2009-Jul-15: Here is an example of reading in HTML tables and processing them into text files with BeautifulSoup. [8265] [1]
  • 2009-May-20: Mini-XML aims to be a small, portable XML parser written in ANSI C. [8005]
  • 2009-May-01: pyxser, a Python object to XML serializer. [7929] [1]
  • 2008-Dec-11: Ian Bicking suggests that lxml is a good alternative to BeautifulSoup for web scraping tasks. He also has some short examples, including an HTML diff. [7336] [1] [2]
  • 2008-Oct-28: gxml is a module providing a common interface to some of the popular XML libraries. [7109] [1]
  • 2008-Oct-11: A patch for ElementTree to better support CDATA sections. [7006] [1] [2]
  • 2008-Jul-15: Google's Protocol Buffers are intended to provide an object serialization system without the overhead of XML. Some comments on them here. [6494] [1]
  • 2008-Jul-09: lxml, discussed here, provides Python bindings for the libxml2 and libxslt libraries. O'Reilly discusses libxslt here. How lxml and ElementTree compare, and why they both exist. The PyPI page for it is here. [220] [1]
  • 2008-Jul-08: If you need really fast parsing of XML, you might want to take a look at AsmXml, which claims to be able to parse XML at about 200 MB/s on an Athlon XP 1800+ type chip. Despite this being an assembly language implementation, there are versions for a number of operating systems (presumably all running on x86 chips). [6495]
  • 2008-Jun-21: Converting XML to Dictionary and Back: the intent is to make working with XML-structured data easier for the Python programmer. While this is not a general solution for all XML data files, it may be useful where the structure and content are more restricted, such as a configuration file. [6400] [1] [2]
  • 2008-Jun-11: XML is sub-optimal and that can be a good thing. Part 2 is here. [6273]
  • 2008-Mar-31: Looking at the performance of various HTML parsers for Python (lxml, BeautifulSoup, html5lib, ElementTree, cElementTree, HTMLParser, htmlfill, Genshi, xml.dom.minidom). [5363] [1]
  • 2008-Mar-30: xmlpolymerase is a Python object serializer that will pack to and unpack from XML. Sort of an XML version of Pickle. [5355] [1]
  • 2008-Feb-25: openxmllib is a module for working with OpenXML documents. [5158] [1]
  • 2008-Feb-05: PyXML, XML Parsers and API for Python; the project home page is here. [5050] [1]
  • 2008-Jan-15: 23 XML fallacies to watch out for captures some useful experience with XML. [4702]
  • 2008-Jan-10: Why binary-XML is solving the wrong problem. [4641] [1]
  • 2007-Oct-22: XML to Python Data Structure allows an XML object to be easily accessed as a Python object (with some minor limitations). [3500] [1] [2]
  • 2007-Aug-31: PTML, for embedding Python into text documents. [229] [1]
  • 2007-Aug-31: Gnosis Utilities, a collection of utilities for working with XML documents. It includes some other things like full text indexing and searching, Python object introspection, hashcash and spam filtering. [227] [1]
  • 2007-Aug-24: pullparser, a simple module for HTML parsing, is supposed to be easier to use than the HTMLParser module for some things. [230] [1]
  • 2007-Aug-24: surely, a program to convert files written in a shorthand notation (similar to Python syntax) into XML. [228] [1]
  • 2007-Aug-24: pyfo, a module for quickly generating XML representations of Python objects. [226] [1]
  • 2007-Aug-24: A look at Python-based DOM-manipulation templating systems. [225] [1]
  • 2007-Aug-24: YAXL is Yet Another (Pythonic) XML Library; one of its design goals is that it can be understood in 15 minutes. [224] [1]
  • 2007-Aug-24: pyxmlserial, XML serialization of basic Python data. [223] [1]
  • 2007-Aug-24: PySimpleXML (and here) simplifies the translation between Python structures and XML. [222] [1]
  • 2007-Aug-24: xmltramp makes reading XML data easy. [221] [1]
  • 2007-Aug-24: ElementTree is a Python XML reader/parser/writer that's been implemented in both pure Python and C. Using non-standard encodings in cElementTree. Here is a talk on using ElementTree to process XML. A recommendation for ElementTree. [219] [1]
  • 2007-Aug-24: mlk_xhtml, a package for creation of XHTML. [218] [1]
  • 2007-Aug-24: xmlmodel lets you define an XML document expressively using native Python classes; you can then access the elements of the XML through a tree of native Python objects. [217] [1]
  • 2007-Aug-24: simplexml, another XML file manipulation library for Python. [216] [1]
  • XIST, a Python framework for reading and writing XML. [215] [1]
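
The XML-to-dictionary idea in the 2008-Jun-21 entry can be roughly sketched with the standard library's xml.etree.ElementTree. This is not the linked recipe. The helpers `element_to_dict` and `dict_to_element` are hypothetical names. They also assume the restricted kind of structure that entry mentions, such as a simple configuration file: no attributes, no mixed text and child content, and no repeated sibling tags.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element into nested dicts and strings (hypothetical helper).

    Assumes a restricted structure: no attributes, no mixed content and
    no repeated sibling tags (a repeated tag would overwrite its sibling).
    """
    children = list(elem)
    if not children:
        # Leaf element: keep only its stripped text ("" if there is none)
        return (elem.text or "").strip()
    return {child.tag: element_to_dict(child) for child in children}


def dict_to_element(tag, value):
    """Build an Element tree back from nested dicts and strings (hypothetical helper)."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, sub in value.items():
            elem.append(dict_to_element(key, sub))
    else:
        elem.text = str(value)
    return elem


config_xml = """<config>
  <server><host>localhost</host><port>8080</port></server>
  <debug>true</debug>
</config>"""

root = ET.fromstring(config_xml)
data = element_to_dict(root)
print(data)
# {'server': {'host': 'localhost', 'port': '8080'}, 'debug': 'true'}

round_trip = ET.tostring(dict_to_element(root.tag, data), encoding="unicode")
print(round_trip)
# <config><server><host>localhost</host><port>8080</port></server><debug>true</debug></config>
```

All values come back as strings, and formatting whitespace between elements is dropped. Attributes, lists and typed values would need extra conventions, and those conventions are where the libraries listed above differ.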

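The 2009-Jul-15 entry turns HTML tables into text with BeautifulSoup. A rough standard-library stand-in can be built on html.parser.HTMLParser instead. `TableTextParser` and `tables_to_text` are hypothetical names. The sketch assumes reasonably clean markup with explicit </td>, </th> and </tr> closing tags, and it ignores nested tables and colspan/rowspan. BeautifulSoup is much more forgiving of messy real-world HTML, which is the main reason to reach for it (or lxml) when screen-scraping.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the cell text of every <table> as rows of strings.

    Assumes explicit closing tags for cells and rows; nested tables and
    colspan/rowspan are not handled.
    """

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None

    def handle_data(self, data):
        # Only text inside a cell is kept; inline tags such as <b> are skipped
        if self._cell is not None:
            self._cell.append(data)


def tables_to_text(html):
    """Return one tab-separated line per row, with a blank line between tables."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return "\n\n".join(
        "\n".join("\t".join(row) for row in table) for table in parser.tables
    )


page = """<table>
  <tr><th>Name</th><th>Size</th></tr>
  <tr><td>lxml</td><td> C <b>bindings</b></td></tr>
</table>"""

text = tables_to_text(page)
print(text)
```

The returned string can then be written to a text file with an ordinary `open(path, "w")` call. A pre-cleaning step like the one in the 2009-Nov-13 entry is what would make this kind of strict parser usable on sloppier pages.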