|
- 2009-Nov-13: Beautiful Soup is a Python HTML/XML parser for projects like screen-scraping. It is available here too. An example of doing some markup massage to clean up problematic HTML prior to running Beautiful Soup on it. [231] [1]
- 2009-Jul-15: bier-soup.py is an example of reading in HTML tables and processing them to text files with BeautifulSoup. [8265] [1]
- 2009-May-20: Mini-XML aims to be a small, portable, XML parser written in ANSI C. [8005]
- 2009-May-01: pyxser, a Python object to XML serializer. [7929] [1]
- 2008-Dec-11: Ian Bicking suggests that lxml is a good alternative to BeautifulSoup for web scraping tasks. He also has some short examples including an HTML diff. [7336] [1] [2]
- 2008-Oct-28: gxml is a module providing a common interface to some of the popular XML libraries. [7109] [1]
- 2008-Oct-11: A patch for ElementTree to better support the CDATA section. [7006] [1] [2]
- 2008-Jul-15: Google's Protocol Buffers are intended to provide an object serialization system without the overhead of XML. Some comments on them here. [6494] [1]
- 2008-Jul-09: lxml, discussed here, provides Python bindings for the libxml2 and libxslt libraries. O'Reilly discusses libxslt here. How lxml and ElementTree compare, and why they both exist. The PyPI page for it is here. [220] [1]
- 2008-Jul-08: If you need really fast parsing of XML, you might want to take a look at AsmXml, which claims to be able to parse XML at about 200MB/s on an Athlon XP 1800+ class chip. Although this is an assembly language implementation, there are versions for a number of operating systems (presumably all running on x86 chips). [6495]
- 2008-Jun-21: Converting XML to Dictionary and Back: the intent is to make working with XML-structured data easier for the Python programmer. While this is not a general solution for all XML data files, it may be useful where the structure and content are more restricted, as in a configuration file. [6400] [1] [2]
- 2008-Jun-11: XML is sub-optimal and that can be a good thing. Part 2 is here. [6273]
- 2008-Mar-31: Looking at the performance of various HTML parsers for Python (lxml, BeautifulSoup, html5lib, ElementTree, cElementTree, HTMLParser, htmlfill, Genshi, xml.dom.minidom). [5363] [1]
- 2008-Mar-30: xmlpolymerase is a Python object serializer that will pack to and unpack from XML. Sort of an XML version of Pickle. [5355] [1]
- 2008-Feb-25: openxmllib is a module for working with OpenXML documents. [5158] [1]
- 2008-Feb-05: PyXML, XML Parsers and API for Python; the project home page is here. [5050] [1]
- 2008-Jan-15: 23 XML fallacies to watch out for captures some useful experience with XML. [4702]
- 2008-Jan-10: Why binary-XML is solving the wrong problem. [4641] [1]
- 2007-Oct-22: XML to Python Data Structure: this allows an XML object to be easily accessed as a Python object (with some minor limitations). [3500] [1] [2]
- 2007-Aug-31: PTML, for embedding Python into text documents. [229] [1]
- 2007-Aug-31: Gnosis Utilities, a collection of utilities for working with XML documents. It includes some other things like full text indexing and searching, Python object introspection, hashcash and spam filtering. [227] [1]
- 2007-Aug-24: pullparser, a simple module for HTML parsing, supposed to be easier to use than the HTMLParser module for some things. [230] [1]
- 2007-Aug-24: surely, a program to convert files written in a shorthand notation (similar to Python syntax) into XML. [228] [1]
- 2007-Aug-24: pyfo, a module for quickly generating XML representations of Python objects. [226] [1]
- 2007-Aug-24: A look at Python-based DOM manipulation templating systems. [225] [1]
- 2007-Aug-24: YAXL is Yet Another (Pythonic) XML Library, one of the design goals being that it can be understood in 15 minutes. [224] [1]
- 2007-Aug-24: pyxmlserial, XML serialization of basic Python data. [223] [1]
- 2007-Aug-24: PySimpleXML (and here) simplifies the translation between Python structures and XML. [222] [1]
- 2007-Aug-24: xmltramp makes reading XML data easy. [221] [1]
- 2007-Aug-24: ElementTree is a Python XML reader/parser/writer that's been implemented in both pure Python and C. Using non-standard encodings in cElementTree. Here is a talk on using ElementTree to process XML. A recommendation for ElementTree. [219] [1]
- 2007-Aug-24: mlk_xhtml, a package for creation of XHTML. [218] [1]
- 2007-Aug-24: xmlmodel allows you to expressively define an XML document using native Python classes; you can then access the elements of the XML through a tree of native Python objects. [217] [1]
- 2007-Aug-24: simplexml, another XML file manipulation library for Python. [216] [1]
- XIST, a Python framework for reading and writing XML. [215] [1]
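
Several entries above (the 2008-Jun-21 and 2007-Oct-22 ones in particular) concern turning XML into plain Python data and back. The linked recipes are not reproduced here; the following is only a minimal sketch of the general idea using the standard library's ElementTree. The helper names `element_to_dict` and `dict_to_element` and the sample config are hypothetical, and the sketch assumes simple, config-style XML: attributes and mixed content are ignored, and all values come back as strings.

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an element into a string (leaf) or a nested dict (children).

    Assumption: attributes and mixed text/element content are ignored.
    """
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    result = {}
    for child in children:
        value = element_to_dict(child)
        if child.tag in result:
            # A repeated tag is collected into a list of values.
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


def dict_to_element(tag, value):
    """Build an element from a string, or from a dict produced above."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, item in value.items():
            # A list value becomes several sibling elements with the same tag.
            entries = item if isinstance(item, list) else [item]
            for entry in entries:
                elem.append(dict_to_element(key, entry))
    else:
        elem.text = str(value)
    return elem


# Hypothetical configuration document used only for illustration.
CONFIG_XML = (
    "<config><host>localhost</host><port>8080</port>"
    "<path>/a</path><path>/b</path></config>"
)

root = ET.fromstring(CONFIG_XML)
config = element_to_dict(root)
round_trip = ET.tostring(dict_to_element(root.tag, config), encoding="unicode")
```

As the 2008-Jun-21 entry notes, this kind of mapping is not a general solution: sibling order between different tags, attributes, and comments are all lost, so it suits restricted formats such as configuration files.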
|
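
The 2009-Jul-15 entry describes reading HTML tables and writing them out as text. That script uses BeautifulSoup and is not reproduced here; as a rough sketch of the same task with only the standard library, the following uses `html.parser.HTMLParser`. The class name `TableTextParser`, the function `table_to_text`, and the sample table are hypothetical, and the sketch assumes a single flat table with no nesting.

```python
from html.parser import HTMLParser


class TableTextParser(HTMLParser):
    """Collect the text of each table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows, each a list of cell strings
        self._row = None    # cells of the row being read, or None
        self._cell = None   # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


def table_to_text(html):
    """Return the table as tab-separated text, one row per line."""
    parser = TableTextParser()
    parser.feed(html)
    parser.close()
    return "\n".join("\t".join(row) for row in parser.rows)


# Hypothetical sample input used only for illustration.
SAMPLE = (
    "<table><tr><th>Name</th><th>ABV</th></tr>"
    "<tr><td>Pale Ale</td><td>5.2</td></tr></table>"
)
text = table_to_text(SAMPLE)
```

For badly formed real-world pages, a more forgiving parser such as Beautiful Soup or lxml (both listed above) is likely to be a better fit than this sketch.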