pdfReader.getPage(0), not getPage(42) or getPage(1).
Once you have your Page object, call its extractText() method to return a string of the page’s text ➌. The text extraction isn’t perfect: The text Charles E. “Chas” Roemer, President from the PDF is absent from the string returned by extractText(), and the spacing is sometimes off. Still, this approximation of the PDF text content may be good enough for your program.
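As a minimal sketch of the two calls described above, the helper below fetches a page by its zero-based index and returns its text. It assumes the legacy PyPDF2 1.x API used in this chapter (PyPDF2 3.0 and later renamed these methods). The function name page_text and the filename in the usage comment are hypothetical.

```python
def page_text(pdf_reader, page_index=0):
    """Return the text of one page; page_index is zero-based (0 is the first page)."""
    # numPages is the legacy PyPDF2 attribute that holds the page count.
    if not 0 <= page_index < pdf_reader.numPages:
        raise IndexError('page_index must be between 0 and %d'
                         % (pdf_reader.numPages - 1))
    page_obj = pdf_reader.getPage(page_index)
    # extractText() is approximate: text may be missing or oddly spaced.
    return page_obj.extractText()

# Usage (assumes PyPDF2 1.x is installed; 'example.pdf' is a hypothetical file):
#   import PyPDF2
#   with open('example.pdf', 'rb') as pdf_file:
#       reader = PyPDF2.PdfFileReader(pdf_file)
#       print(page_text(reader, 0))
```

Checking the index first gives a clearer error message than whatever the library would raise for an out-of-range page.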
The return values of rotateClockwise() and rotateCounterClockwise() contain a lot of information that you can ignore.
has four runs.
This will open the Create New Style from Formatting dialog, where you can enter the new style. Then, go back into the interactive shell and open this blank Word document with docx.Document(), using it as the base for your Word document. The name you gave this style will now be available to use with Python-Docx.
text attributes that can be set on Run objects.
Paragraph object in paraObj1, which was the second paragraph added to doc. The add_paragraph() and add_run() methods return Paragraph and Run objects, respectively, to save you the trouble of extracting them as a separate step.
Keep in mind that as of Python-Docx version 0.5.3, new Paragraph objects can be added only to the end of the document, and new Run objects can be added only to the end of a Paragraph object.
The save() method can be called again to save the additional changes you’ve made.
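The following sketch illustrates how the return values of add_paragraph() and add_run() can be used right away, without looking the new objects up again. The helper name add_bold_note and the filename in the usage comment are hypothetical. The helper accepts any python-docx Document object.

```python
def add_bold_note(doc, paragraph_text, note_text):
    """Append a paragraph to doc, then append a bold run to that paragraph."""
    # add_paragraph() returns the new Paragraph object directly.
    para_obj = doc.add_paragraph(paragraph_text)
    # add_run() returns the new Run object, so its attributes can be set at once.
    run_obj = para_obj.add_run(note_text)
    run_obj.bold = True
    return para_obj, run_obj

# Usage (assumes python-docx is installed; 'hello.docx' is a hypothetical file):
#   import docx
#   doc = docx.Document()
#   add_bold_note(doc, 'Hello world!', ' This run is bold.')
#   doc.save('hello.docx')
#   add_bold_note(doc, 'A second paragraph.', ' Added after the first save.')
#   doc.save('hello.docx')  # call save() again to write the additional changes
```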
PdfFileReader object?
Then, write a program that finds all encrypted PDFs in a folder (and its subfolders) and creates a decrypted copy of the PDF using a provided password. If the password is incorrect, the program should print a message to the user and continue to the next PDF.
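One possible starting point for this project is sketched below; it is not the only way to structure it. It assumes the legacy PyPDF2 1.x API used in this chapter. The function names and the _decrypted filename suffix are hypothetical choices.

```python
import os

def find_pdfs(root_folder):
    """Yield the path of every .pdf file under root_folder, including subfolders."""
    for folder, _subfolders, filenames in os.walk(root_folder):
        for filename in sorted(filenames):
            if filename.lower().endswith('.pdf'):
                yield os.path.join(folder, filename)

def decrypted_name(pdf_path):
    """Return the output path, e.g. 'a.pdf' becomes 'a_decrypted.pdf' (hypothetical suffix)."""
    base, ext = os.path.splitext(pdf_path)
    return base + '_decrypted' + ext

def decrypt_copy(pdf_path, password):
    """Write a decrypted copy of pdf_path.

    Returns True if a copy was written, False if the password was wrong,
    and None if the PDF was not encrypted.
    """
    import PyPDF2  # imported here so the helpers above work without PyPDF2
    with open(pdf_path, 'rb') as pdf_file:
        reader = PyPDF2.PdfFileReader(pdf_file)
        if not reader.isEncrypted:
            return None
        if reader.decrypt(password) == 0:  # 0 means the password failed
            print('Wrong password for %s, skipping.' % pdf_path)
            return False
        writer = PyPDF2.PdfFileWriter()
        for page_num in range(reader.numPages):
            writer.addPage(reader.getPage(page_num))
        # Write while the source file is still open, since pages are read lazily.
        with open(decrypted_name(pdf_path), 'wb') as out_file:
            writer.write(out_file)
    return True

# Usage ('my_pdfs' and 'swordfish' are hypothetical):
#   for path in list(find_pdfs('my_pdfs')):
#       decrypt_copy(path, 'swordfish')
```

Collecting the paths into a list before decrypting keeps the newly written copies out of the same walk.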
Since Python-Docx can use only those styles that already exist in the Word document, you will have to first add these styles to a blank Word file and then open that file with Python-Docx. There should be one invitation per page in the resulting Word document, so call add_break() to add a page break after the last paragraph of each invitation. This way, you will need to open only one Word document to print all of the invitations at once.
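The page-break step might look roughly like the sketch below. The invitation wording, the function name add_invitations, the name_style parameter, and the filenames in the usage comment are all hypothetical. The page-break constant is passed in as an argument; in current python-docx releases it is WD_BREAK.PAGE from docx.enum.text.

```python
def add_invitations(doc, guest_names, page_break_type, name_style=None):
    """Append one invitation per guest, each ending with a page break."""
    for guest in guest_names:
        doc.add_paragraph('It would be a pleasure to have the company of')
        # name_style must already exist in the base document (None = default style).
        doc.add_paragraph(guest, style=name_style)
        last_para = doc.add_paragraph('at our gathering.')
        # An empty run holds the break that pushes the next invitation to a new page.
        last_para.add_run().add_break(page_break_type)

# Usage (assumes python-docx is installed; both filenames are hypothetical):
#   import docx
#   from docx.enum.text import WD_BREAK
#   doc = docx.Document('invitation_base.docx')  # blank file that holds your styles
#   add_invitations(doc, ['Guest One', 'Guest Two'], WD_BREAK.PAGE)
#   doc.save('invitations.docx')
```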
Using the file-reading skills you learned earlier, create a list of word strings by reading this file. Then loop over each word in this list, passing it to the decrypt() method. If this method returns the integer 0, the password was wrong and your program should continue to the next password. If decrypt() returns 1, then your program should break out of the loop and print the hacked password. You should try both the uppercase and lowercase form of each word. (On my laptop, going through all 88,000 uppercase and lowercase words from the dictionary file takes a couple of minutes. This is why you shouldn’t use a simple English word for your passwords.)
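The loop described above can be sketched as follows, assuming the legacy PyPDF2 1.x decrypt() method, which returns 0 on failure. The function names and the filenames in the usage comment are hypothetical. The sketch treats any nonzero return value as success, because PyPDF2 1.x returns 2 rather than 1 when the owner password matches.

```python
def load_words(path):
    """Return a list of the whitespace-separated words in a text file."""
    with open(path) as word_file:
        return word_file.read().split()

def find_password(pdf_reader, words):
    """Try each word in uppercase and lowercase; return the match or None."""
    for word in words:
        for candidate in (word.upper(), word.lower()):
            # decrypt() returns 0 when the password is wrong.
            if pdf_reader.decrypt(candidate) != 0:
                return candidate
    return None

# Usage (assumes PyPDF2 1.x is installed; both filenames are hypothetical):
#   import PyPDF2
#   with open('encrypted.pdf', 'rb') as pdf_file:
#       reader = PyPDF2.PdfFileReader(pdf_file)
#       print(find_password(reader, load_words('dictionary.txt')))
```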