Book: Automate the Boring Stuff with Python: Practical Programming for Total Beginners


, and enter the following into the interactive shell:

PyPDF2 uses a zero-based index for pages: the first page is page 0, the second is page 1, and so on. This is always the case, even if pages are numbered differently within the document. For example, say your PDF is a three-page excerpt from a longer report, and its pages are numbered 42, 43, and 44. To get the first page of this document, you would want to call pdfReader.getPage(0), not getPage(42) or getPage(1).
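The index arithmetic above can be sketched as a small helper. This is only an illustration: page_index() and get_printed_page() are hypothetical names, and the reader is assumed to expose the numPages attribute and getPage() method of the older PyPDF2 1.x API used in this chapter (later PyPDF2 releases renamed these).

```python
def page_index(printed_number, first_printed_number):
    """Convert the number printed on a page to a zero-based page index.

    first_printed_number is the number printed on the PDF's first page,
    for example 42 for an excerpt whose pages are numbered 42, 43, 44.
    """
    index = printed_number - first_printed_number
    if index < 0:
        raise ValueError("that page comes before the start of this PDF")
    return index


def get_printed_page(pdf_reader, printed_number, first_printed_number):
    """Fetch a page by its printed number.

    Assumes a PyPDF2 1.x-style reader with numPages and getPage().
    """
    index = page_index(printed_number, first_printed_number)
    if index >= pdf_reader.numPages:
        raise IndexError("that page comes after the end of this PDF")
    return pdf_reader.getPage(index)
```

For the three-page excerpt described above, page_index(42, 42) is 0, so the page printed as 42 is fetched with getPage(0).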

Once you have your Page object, call its extractText() method to return a string of the page’s text. The text extraction isn’t perfect: the line “Charles E. “Chas” Roemer, President” from the PDF is absent from the string returned by extractText(), and the spacing is sometimes off. Still, this approximation of the PDF text content may be good enough for your program.
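If you want the text of every page rather than just one, a loop over the page indexes is enough. The following is a minimal sketch, not the book's own listing: extract_all_text() and text_from_file() are hypothetical helper names, and the code assumes the PyPDF2 1.x names used in this chapter (PdfFileReader, numPages, getPage(), extractText()).

```python
def extract_all_text(pdf_reader):
    """Return the text of every page, with a blank line between pages.

    Assumes a PyPDF2 1.x-style reader (numPages, getPage, extractText).
    """
    page_texts = []
    for index in range(pdf_reader.numPages):
        page = pdf_reader.getPage(index)
        page_texts.append(page.extractText())
    return "\n\n".join(page_texts)


def text_from_file(pdf_path):
    """Open a PDF in binary mode and extract the text of all its pages."""
    import PyPDF2  # third-party module; imported here so it is only needed on use

    with open(pdf_path, "rb") as pdf_file:
        return extract_all_text(PyPDF2.PdfFileReader(pdf_file))
```

Because the extraction is approximate, treat the returned string as a rough copy of the page text and check it before relying on exact spacing.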

and place the PDFs in the current working directory. Enter the following into the interactive shell:

The return values from rotateClockwise() and rotateCounterClockwise() contain a lot of information that you can ignore.
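Since the return values can be ignored, a rotation step can simply call the method and then add the page to a writer. The sketch below assumes the PyPDF2 1.x methods named in this chapter (getPage(), rotateClockwise(), rotateCounterClockwise(), addPage()); rotate_page() itself is a hypothetical helper, and its convention that negative degrees mean counterclockwise is an assumption made for illustration.

```python
def rotate_page(pdf_reader, pdf_writer, page_index, degrees):
    """Rotate one page and add it to the writer.

    degrees must be a multiple of 90. Positive values rotate clockwise,
    negative values counterclockwise (a convention of this sketch).
    """
    if degrees % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    page = pdf_reader.getPage(page_index)
    if degrees >= 0:
        page.rotateClockwise(degrees)  # return value deliberately ignored
    else:
        page.rotateCounterClockwise(-degrees)
    pdf_writer.addPage(page)
```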

and place the PDF in the current working directory along with meetingminutes.pdf. Then enter the following into the interactive shell:

shows the results. Our new PDF, watermarkedCover.pdf, has all the contents of the meetingminutes.pdf, and the first page is watermarked.
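The overlay step can be sketched as follows. This is an illustration under assumptions rather than the original listing: it uses the PyPDF2 1.x methods mergePage() and addPage(), the helper names are hypothetical, and the file paths are parameters because the watermark file's name is not given here.

```python
def watermark_first_page(source_reader, watermark_reader, pdf_writer):
    """Copy every page to the writer, overlaying a watermark on page 0 only."""
    first_page = source_reader.getPage(0)
    first_page.mergePage(watermark_reader.getPage(0))  # draw watermark on top
    pdf_writer.addPage(first_page)
    for index in range(1, source_reader.numPages):
        pdf_writer.addPage(source_reader.getPage(index))


def watermark_file(source_path, watermark_path, output_path):
    """File-level wrapper; the input files stay open until the output is written."""
    import PyPDF2  # third-party module, PyPDF2 1.x API assumed

    with open(source_path, "rb") as source, open(watermark_path, "rb") as mark:
        writer = PyPDF2.PdfFileWriter()
        watermark_first_page(
            PyPDF2.PdfFileReader(source), PyPDF2.PdfFileReader(mark), writer
        )
        with open(output_path, "wb") as output:
            writer.write(output)
```

The writer is saved before the input files close because the pages are only read from disk when write() runs.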

and , respectively. The full documentation for Python-Docx is available at . Although there is a version of Word for OS X, this chapter will focus on Word for Windows.

has four runs.

and save the document to the working directory. Then enter the following into the interactive shell:

On OS X, you can view the Styles pane by clicking the View > Styles menu item.

shows this on Windows).

This will open the Create New Style from Formatting dialog, where you can enter the new style. Then, go back into the interactive shell and open this blank Word document with docx.Document(), using it as the base for your Word document. The name you gave this style will now be available to use with Python-Docx.

lists the text attributes that can be set on Run objects.
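As a rough sketch of how such attributes are set, the helper below assigns three of them on a Run object. It assumes the bold, italic, and underline attributes that Python-Docx exposes on runs, each of which takes True (always on), False (always off), or None (use whatever the run's style says). The names emphasize() and bold_first_run() are hypothetical.

```python
def emphasize(run, bold=None, italic=None, underline=None):
    """Set three text attributes on a Run object.

    Each value is True (on), False (off), or None (defer to the style).
    """
    run.bold = bold
    run.italic = italic
    run.underline = underline
    return run


def bold_first_run(input_path, output_path):
    """Make the first run of the first paragraph bold and save a copy.

    Assumes the document has at least one paragraph containing one run.
    """
    import docx  # third-party module (Python-Docx)

    doc = docx.Document(input_path)
    emphasize(doc.paragraphs[0].runs[0], bold=True)
    doc.save(output_path)
```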

shows how the styles of paragraphs and runs look in restyled.docx.



Note that the text “This text is being added to the second paragraph.” was added to the Paragraph object in paraObj1, which was the second paragraph added to doc. The add_paragraph() and add_run() methods return Paragraph and Run objects, respectively, to save you the trouble of extracting them as a separate step.

Keep in mind that as of Python-Docx version 0.5.3, new Paragraph objects can be added only to the end of the document, and new Run objects can be added only to the end of a Paragraph object.

The save() method can be called again to save the additional changes you’ve made.
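The sequence described above (add paragraphs, append a run to one of them, save) can be sketched like this. The paragraph wording other than the quoted run text is placeholder text, and the function names are hypothetical; only add_paragraph(), add_run(), and save() are taken from the chapter.

```python
def add_demo_paragraphs(doc):
    """Add two paragraphs, then append a run to the second one.

    Returns the second Paragraph object and the new Run object, which
    add_paragraph() and add_run() hand back directly.
    """
    doc.add_paragraph("Hello world!")  # placeholder wording
    second = doc.add_paragraph("This is a second paragraph.")
    run = second.add_run(" This text is being added to the second paragraph.")
    return second, run


def build_and_save(output_path):
    """Create a new document, fill it in, and save it."""
    import docx  # third-party module (Python-Docx)

    doc = docx.Document()
    add_demo_paragraphs(doc)
    doc.save(output_path)  # calling save() again later writes further changes
```

Note that both additions land at the end: the new paragraph at the end of the document and the new run at the end of its paragraph.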


from a PdfFileReader object?

Write a script that will go through every PDF in a folder (and its subfolders) and encrypt the PDFs using a password provided on the command line. Save each encrypted PDF with an _encrypted.pdf suffix added to the original filename. Before deleting the original file, have the program attempt to read and decrypt the file to ensure that it was encrypted correctly.

Then, write a program that finds all encrypted PDFs in a folder (and its subfolders) and creates a decrypted copy of the PDF using a provided password. If the password is incorrect, the program should print a message to the user and continue to the next PDF.
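Both programs start the same way: find every PDF under a folder and work out the name of the output file. Here is one possible sketch of that shared part using os.walk() from the standard library. The function names are hypothetical, and the encryption and decryption steps themselves are left for you to write.

```python
import os


def find_pdfs(root_folder):
    """Yield the path of every .pdf file in root_folder and its subfolders."""
    for folder, _subfolders, filenames in os.walk(root_folder):
        for filename in filenames:
            if filename.lower().endswith(".pdf"):
                yield os.path.join(folder, filename)


def encrypted_name(pdf_path):
    """Return the output path with an _encrypted.pdf suffix.

    For example, report.pdf becomes report_encrypted.pdf.
    """
    base, _extension = os.path.splitext(pdf_path)
    return base + "_encrypted.pdf"
```

The extension check is case-insensitive so that files named REPORT.PDF are not skipped.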


Since Python-Docx can use only those styles that already exist in the Word document, you will have to first add these styles to a blank Word file and then open that file with Python-Docx. There should be one invitation per page in the resulting Word document, so call add_break() to add a page break after the last paragraph of each invitation. This way, you will need to open only one Word document to print all of the invitations at once.
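The one-invitation-per-page layout can be sketched as a loop that ends each invitation with a page break. Several things here are assumptions: add_invitations() is a hypothetical helper, the style names are whatever you created in your blank Word file, the body wording is yours to supply, and the page_break value is passed in by the caller (in recent Python-Docx releases it would be docx.enum.text.WD_BREAK.PAGE, but check the version you have installed).

```python
def add_invitations(doc, guests, body_text, page_break,
                    name_style=None, body_style=None):
    """Add one invitation per guest, each followed by a page break.

    doc        -- a Document opened from a file that already has your styles
    guests     -- an iterable of guest names
    body_text  -- the invitation wording (supplied by the caller)
    page_break -- the page-break constant to pass to add_break()
    Returns the number of invitations added.
    """
    count = 0
    for guest in guests:
        doc.add_paragraph(guest, style=name_style)
        last_paragraph = doc.add_paragraph(body_text, style=body_style)
        # An empty run at the end of the last paragraph carries the break.
        last_paragraph.add_run().add_break(page_break)
        count += 1
    return count
```

After the loop, a single call to the document's save() method writes all of the invitations to one file.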


This dictionary file contains over 44,000 English words with one word per line.

Using the file-reading skills you learned earlier, create a list of word strings by reading this file. Then loop over each word in this list, passing it to the decrypt() method. If this method returns the integer 0, the password was wrong and your program should continue to the next password. If decrypt() returns 1, then your program should break out of the loop and print the hacked password. You should try both the uppercase and lowercase form of each word. (On my laptop, going through all 88,000 uppercase and lowercase words from the dictionary file takes a couple of minutes. This is why you shouldn’t use a simple English word for your passwords.)
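The search loop itself can be sketched without touching a PDF at all. In the sketch below, load_words() and find_password() are hypothetical names, and try_password stands in for a call such as a reader's decrypt() method: it is assumed to return 0 for a wrong password and a nonzero integer for a correct one.

```python
def load_words(dictionary_path):
    """Read a one-word-per-line file into a list, skipping blank lines."""
    with open(dictionary_path, encoding="utf-8") as dictionary_file:
        return [line.strip() for line in dictionary_file if line.strip()]


def find_password(words, try_password):
    """Try the uppercase and lowercase form of each word.

    try_password(candidate) must return 0 when the candidate is wrong.
    Returns the first candidate that works, or None if none do.
    """
    for word in words:
        for candidate in (word.upper(), word.lower()):
            if try_password(candidate) != 0:
                return candidate
    return None
```

With a PdfFileReader object named pdfReader, the call would look like find_password(words, pdfReader.decrypt). Returning from inside the loop plays the role of the break described above.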
