PDF File Creation and Processing with Python


Several Python libraries can create and manipulate PDF files. ReportLab is possibly the most capable of these modules.

2010-May-28: Manipulating PDFs with Python and pyPdf. [9142]
2009-Oct-02: pdftable is a module and utility for extracting tables from PDF files (using pdftohtml to do the initial extraction). [8592]
2009-Jul-15: Using pyPdf to crop margins in a PDF document. [8275] [1]
2009-Jul-08: Using ReportLab to print page X of Y into a PDF file. [8241] [1]
2009-Jun-23: PDFtoOCR is a Plone module to convert PDF files to text, using OCR processing if necessary. [8187]
2009-Apr-14: Making a PDF from a directory of images by using ReportLab. [7871] [1]
2009-Mar-06: How to concatenate PDF files using pyPdf in only 10 lines of code. [7699]
2008-Oct-23: How to generate reports that contain charts with ReportLab, a worked set of examples. [7078]
2008-Oct-21: pdfgrid can be used to add a grid over the top of PDF pages. Not sure why you would want to do this, but it might be useful as an example of modifying PDF files. [7061]
2008-Sep-24: pdfnup is a tool for laying out multiple pages per sheet. This is built on top of pyPdf. [6910]
2008-Sep-17: pyPdf (a newer link) is a pure-Python library built as a PDF toolkit. At present, there is only one actual tool in the toolkit: the ability to grab pages from PDFs and output them into a new PDF. Like a hammer, this tool is useful for two operations: splitting and merging. You can extract individual pages from a PDF file, or selectively merge pages from multiple PDF files. This is also available here. pdfsplit uses pyPdf to split a PDF file or rearrange its pages into a new PDF file. [112]
2008-Aug-27: rst2pdf converts reStructuredText to PDF using ReportLab. [6749]
2008-Aug-21: pdfrecycle is a module that allows you to build a PDF file out of pages selected from other PDF files. It needs a full LaTeX environment to be installed before you can use it, though. Its home page is here. [6280]
2008-Aug-20: pdfposter is a tool to scale and tile PDF images to print on multiple pages, so you can print your own posters. The project home page is here. [5992]
2008-Aug-05: The MediaWiki markup system can be used to generate PDF pages with mwlib.rl. [6633]
2008-Jul-27: pdfminer can be used to extract text and data from PDF documents. [6590]
2008-Apr-21: The pisa (home page here) module uses the ReportLab toolkit to create PDF files from HTML and CSS input. This recipe is an example of using pisa. [4141] [1]
2008-Apr-03: pdfcat is a tool for concatenation of PDF files. [5386]
2007-Dec-06: An example of using pyPdf to extract text from PDF files. [817] [1]
2007-Dec-06: A Python script to extract JPEGs from PDF files. [4400]
2007-Sep-30: pyText2Pdf, a Python script to convert plain text directly into PDF documents. [2407] [1]
2007-Aug-27: Convert Microsoft Office documents to PostScript (or PDF) using an installed printer driver and win32com. [584] [1] [2]
2007-Aug-23: Snowtide Informatics' PDFTextStream has been made usable with Python, allowing for extraction of text content from PDF files. [115]
2007-Aug-23: Tiny RML2PDF converts RML formatted documents to PDF. [114]
2007-Aug-23: Here is a DXF to PDF converter (it uses ReportLab for the PDF generation). [113]
2007-Aug-23: ReportLab, a Python module for creating PDF files. [111]
2007-Aug-23: epydoc is a module that generates API documentation from Python docstrings and can output it as PDF. [110]
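
As a rough illustration of what "converting plain text directly into PDF" involves (the idea behind the pyText2Pdf entry above), here is a minimal sketch that uses only the standard library. It is not pyText2Pdf's code. The function name text_to_pdf, the US Letter page size, the Courier font and the margins are all assumptions chosen for the example, and it handles a single page with no line wrapping.

```python
def text_to_pdf(text, font_size=12, leading=14):
    """Return the bytes of a one-page PDF showing `text` in Courier.

    Hypothetical helper for illustration: one page only, no wrapping,
    and characters outside Latin-1 are replaced with '?'.
    """
    def escape(line):
        # Backslash and parentheses are special inside PDF string literals.
        return line.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    # Content stream: select the font, set the line spacing, move to the
    # top-left margin, then show each line and step down to the next one.
    ops = ["BT", "/F1 %d Tf" % font_size, "%d TL" % leading, "72 720 Td"]
    for line in text.splitlines():
        ops.append("(%s) Tj" % escape(line))
        ops.append("T*")
    ops.append("ET")
    stream = "\n".join(ops).encode("latin-1", "replace")

    # The five objects of a minimal document: catalog, page tree, page,
    # page contents, and the built-in Courier font.
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
        b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 612 792] "
        b"/Contents 4 0 R /Resources << /Font << /F1 5 0 R >> >> >>",
        b"<< /Length %d >>\nstream\n" % len(stream) + stream + b"\nendstream",
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Courier >>",
    ]

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object, for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: one 20-byte entry per object, plus the free entry 0.
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)
```

To try it, write the returned bytes to a file opened in binary mode, for example open("out.pdf", "wb").write(text_to_pdf("Hello\nworld")). For anything beyond a quick experiment (multiple pages, fonts, images), one of the libraries listed above, such as ReportLab, is the more sensible choice.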
