Creating and modifying PDF files in Python is straightforward with libraries like pypdf
and ReportLab. You can read, manipulate, and create PDF files using these tools. pypdf
lets you extract text, split, merge, rotate, crop, encrypt, and decrypt PDFs. ReportLab enables you to create new PDFs from scratch, allowing customization with fonts and page sizes.
By the end of this tutorial, you’ll understand that:
- You can read and modify existing PDF files using pypdf in Python.
- You can create new PDF files from scratch with the ReportLab library.
- Methods to encrypt and decrypt a PDF file with a password are available in pypdf.
- Concatenating and merging multiple PDF files can be done using pypdf.
- You can add custom fonts to a PDF using ReportLab.
- Python can create interactive PDFs with forms using ReportLab.
To follow along with this tutorial, download the materials used in the examples and extract them to your home folder. To do this, click the link below:
Download the sample materials: Click here to get the materials you’ll use to learn about creating and modifying PDF files in this tutorial.
Extracting Text From PDF Files With pypdf
In this section, you’ll learn how to read PDF files and extract their text using the pypdf library. Before you can do that, though, you need to install it with pip:
$ python -m pip install pypdf
With this command, you download and install the latest version of pypdf from the Python Package Index (PyPI). To verify the installation, go ahead and run the following command in your terminal:
$ python -m pip show pypdf
Name: pypdf
Version: 3.8.1
Summary: A pure-python PDF library capable of splitting,
merging, cropping, and transforming PDF files
Home-page:
Author:
Author-email: Mathieu Fenniak <biziqe@mathieu.fenniak.net>
License:
Location: .../lib/python3.10/site-packages
Requires:
Required-by:
Pay particular attention to the version information. When this tutorial was published, the latest version of pypdf was 3.8.1. The library has been updated often lately, and new features are added frequently. Most importantly, you’ll find many breaking changes in the library’s API if you compare it with its predecessor library, PyPDF2.
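If you’d rather check the installed version from within Python, for example at the top of a script, then the standard library’s importlib.metadata module can look it up. The following is a minimal sketch rather than part of the original tutorial, and the helper name installed_version() is made up for illustration:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is missing.

    Note: installed_version() is a hypothetical helper for this example,
    not something that pypdf or pip provides.
    """
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


pypdf_version = installed_version("pypdf")
if pypdf_version is None:
    print("pypdf is not installed")
else:
    print(f"pypdf {pypdf_version} is installed")
```

The number you see will depend on your environment. If it’s much newer than 3.8.1, then keep in mind that some of the examples may need small adjustments.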
Before diving into working with PDF files, you must know that this tutorial is adapted from the chapter “Creating and Modifying PDF Files” in Python Basics: A Practical Introduction to Python 3.
The book uses Python’s built-in IDLE editor to create and edit Python files and interact with the Python shell, so you’ll find occasional references to IDLE throughout this tutorial. However, you should have no problems running the example code from the