Book: Automate the Boring Stuff with Python: Practical Programming for Total Beginners

In Chapter 13, you learned how to extract text from PDF and Word documents. These files were in a binary format, which required special Python modules to access their data. CSV and JSON files, on the other hand, are just plaintext files. You can view them in a text editor, such as IDLE’s file editor. But Python also comes with the special csv and json modules, each providing functions to help you work with these file formats.

CSV stands for “comma-separated values,” and CSV files are simplified spreadsheets stored as plaintext files. Python’s csv module makes it easy to parse CSV files.

JSON (pronounced “JAY-sawn” or “Jason”—it doesn’t matter how because either way people will say you’re pronouncing it wrong) is a format that stores information as JavaScript source code in plaintext files.
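As a small illustration of how JSON text maps onto Python values, the sketch below uses the standard json module's loads() and dumps() functions. The sample data is invented for this example.

```python
import json

# A JSON document is plain text; this sample string is invented for illustration.
json_text = '{"name": "Zophie", "isCat": true, "miceCaught": 0, "felineIQ": null}'

# json.loads() parses JSON text into ordinary Python values.
cat = json.loads(json_text)
print(cat["name"])      # a str
print(cat["isCat"])     # JSON true becomes Python True
print(cat["felineIQ"])  # JSON null becomes Python None

# json.dumps() goes the other way, turning Python values back into JSON text.
round_trip = json.dumps(cat)
print(round_trip)
```

Note that JSON's true, false, and null arrive in Python as True, False, and None, so the parsed result can be used like any other dictionary.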

would look like this in a CSV file:

or enter the text into a text editor and save it as example.csv.
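To give a feel for what CSV text looks like, here is a minimal sketch that writes a few spreadsheet-style rows with csv.writer() and prints the resulting text. The rows are hypothetical sample data, not the contents of example.csv.

```python
import csv
import io

# Hypothetical spreadsheet rows, invented for illustration.
rows = [
    ["Date", "Fruit", "Quantity"],
    ["2015-04-05", "Apples", 73],
    ["2015-04-06", "Cherries", 85],
]

# csv.writer() accepts any object with a write() method, so an in-memory
# text buffer works just as well as a file opened for writing.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Each row becomes one line of text, and each cell is separated from the next by a comma.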

CSV files are simple, lacking many of the features of an Excel spreadsheet. For example, CSV files

  • Don’t have types for their values—everything is a string

  • Don’t have settings for font size or color

  • Don’t have multiple worksheets

  • Can’t specify cell widths and heights

  • Can’t have merged cells

  • Can’t have images or charts embedded in them

The advantage of CSV files is simplicity. CSV files are widely supported by many types of programs, can be viewed in text editors (including IDLE’s file editor), and are a straightforward way to represent spreadsheet data. The CSV format is exactly as advertised: It’s just a text file of comma-separated values.
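The "everything is a string" limitation is easy to see in practice. The following sketch, using invented sample data, reads a small CSV text with csv.reader() and converts the numeric-looking cells explicitly.

```python
import csv
import io

# Invented sample data: a header row and two data rows.
csv_text = "Fruit,Quantity,Price\nApples,73,0.5\nCherries,85,2.25\n"

rows = list(csv.reader(io.StringIO(csv_text)))
header, data = rows[0], rows[1:]

# Every cell comes back as a str, even the ones that look numeric.
print(data[0])

# Convert explicitly when numbers are needed.
total_quantity = sum(int(row[1]) for row in data)
total_value = sum(int(row[1]) * float(row[2]) for row in data)
print(total_quantity, total_value)
```

Without the int() and float() calls, adding or multiplying the cells would operate on strings rather than numbers.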

Since CSV files are just text files, you might be tempted to read them in as a string and then process that string using the techniques you learned in earlier chapters. For example, since each cell in a CSV file is separated by a comma, maybe you could just call the split() method on each line of text to get the values. But not every comma in a CSV file represents the boundary between two cells. CSV files also have their own set of escape characters to allow commas and other characters to be included as part of the values. The split() method doesn’t handle these escape characters. Because of these potential pitfalls, you should always use the csv module for reading and writing CSV files.
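The pitfall with split() can be shown in a few lines. This sketch compares a naive split(",") with csv.reader() on one made-up line whose cells contain a comma and quote characters.

```python
import csv
import io

# One CSV line whose second cell contains a comma, so it is quoted.
# The doubled quotes ("") in the last cell stand for a literal quote character.
line = 'Alice,"Portland, Oregon","She said ""hi"""'

# Naive splitting treats every comma as a cell boundary.
naive_cells = line.split(",")
print(naive_cells)

# csv.reader() understands the quoting rules and returns three clean cells.
parsed_cells = next(csv.reader(io.StringIO(line)))
print(parsed_cells)
```

The naive version yields four pieces with stray quote characters left in, while the csv module recovers the three intended values.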


and unzip it to a folder. Run the removeCsvHeader.py program in that folder. The output will look like this:

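A minimal sketch of the header-removal idea, assuming every .csv file in a folder has exactly one header row. The function name remove_csv_headers and its two folder parameters are hypothetical and are not taken from removeCsvHeader.py.

```python
import csv
from pathlib import Path


def remove_csv_headers(source_dir, output_dir):
    """Copy every .csv file in source_dir to output_dir without its first row."""
    source_dir, output_dir = Path(source_dir), Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    processed = []

    for csv_path in sorted(source_dir.glob("*.csv")):
        # newline="" is what the csv module's documentation recommends for files.
        with open(csv_path, newline="") as src:
            rows = list(csv.reader(src))

        # Write everything except the header row to a new file, leaving the
        # original untouched.
        with open(output_dir / csv_path.name, "w", newline="") as dst:
            csv.writer(dst).writerows(rows[1:])
        processed.append(csv_path.name)

    return processed
```

Writing the results to a separate folder rather than overwriting the originals makes it safe to rerun the function while experimenting.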

, where <Location> is the name of the city whose weather you want. Add the following to quickWeather.py.

for more documentation on what these fields mean. For example, the online documentation will tell you that the 302.29 after 'day' is the daytime temperature in Kelvin, not Celsius or Fahrenheit.
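Since the temperature is reported in Kelvin, a conversion is usually needed before displaying it. The helper names below are hypothetical; the formulas are the standard Kelvin-to-Celsius and Celsius-to-Fahrenheit conversions.

```python
def kelvin_to_celsius(kelvin):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return kelvin - 273.15


def kelvin_to_fahrenheit(kelvin):
    """Convert a temperature from Kelvin to degrees Fahrenheit."""
    return kelvin_to_celsius(kelvin) * 9 / 5 + 32


day_temp_kelvin = 302.29  # the daytime value quoted in the text above
print(round(kelvin_to_celsius(day_temp_kelvin), 2))
print(round(kelvin_to_fahrenheit(day_temp_kelvin), 2))
```

For 302.29 K this works out to roughly 29 degrees Celsius, or about 84 degrees Fahrenheit.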

The weather descriptions you want are after 'main' and 'description'. To neatly print them out, add the following to quickWeather.py.
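As a rough sketch of that step, the code below parses a small JSON string and prints the 'main' and 'description' values for each forecast. The sample text and its nesting are assumptions modeled on the keys mentioned above, not a captured response from a real weather service.

```python
import json

# Stand-in for the text a weather service might return; the structure
# (a "list" of forecasts, each with a "weather" list) is assumed.
sample_response = """
{"list": [
  {"temp": {"day": 302.29},
   "weather": [{"main": "Clear", "description": "sky is clear"}]},
  {"temp": {"day": 300.10},
   "weather": [{"main": "Clouds", "description": "few clouds"}]}
]}
"""

weather_data = json.loads(sample_response)

labels = ["Today", "Tomorrow"]
summaries = []
for label, forecast in zip(labels, weather_data["list"]):
    conditions = forecast["weather"][0]
    summaries.append(f"{label}: {conditions['main']} - {conditions['description']}")

print("\n".join(summaries))
```

Once json.loads() has run, the rest is ordinary dictionary and list indexing, so adapting the sketch to a different response layout only means changing the keys.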

(The next chapter covers scheduling, and a later chapter explains how to send email.)

  • Pull weather data from multiple sites to show all at once, or calculate and show the average of the multiple weather predictions.

  • In a later chapter, you’ll break away from data formats and learn how to make your programs communicate with you by sending emails and text messages.

    Write a program that reads all the Excel files in the current working directory and outputs them as CSV files.

    A single Excel file might contain multiple sheets; you’ll have to create one CSV file per sheet. The filenames of the CSV files should be <excel filename>_<sheet title>.csv, where <excel filename> is the filename of the Excel file without the file extension (for example, 'spam_data', not 'spam_data.xlsx') and <sheet title> is the string from the Worksheet object’s title variable.

    This program will involve many nested for loops. The skeleton of the program will look something like this:
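One possible shape for such a program is sketched below. It assumes the third-party openpyxl module (a reasonably recent version, since it relies on iter_rows(values_only=True)) and .xlsx files only; the function names csv_filename and excel_files_to_csv are hypothetical.

```python
import csv
import os


def csv_filename(excel_filename, sheet_title):
    """Build '<excel filename>_<sheet title>.csv' from the two parts."""
    base, _extension = os.path.splitext(os.path.basename(excel_filename))
    return f"{base}_{sheet_title}.csv"


def excel_files_to_csv(folder="."):
    """Write one CSV file per worksheet for every .xlsx file in folder."""
    # Imported here so the helper above works without openpyxl installed.
    import openpyxl

    for excel_file in os.listdir(folder):
        if not excel_file.endswith(".xlsx"):
            continue  # skip non-Excel files
        workbook = openpyxl.load_workbook(os.path.join(folder, excel_file))

        for sheet_name in workbook.sheetnames:
            sheet = workbook[sheet_name]
            out_path = os.path.join(folder, csv_filename(excel_file, sheet.title))

            with open(out_path, "w", newline="") as csv_file:
                writer = csv.writer(csv_file)
                # One CSV row per worksheet row; values_only yields cell values
                # instead of Cell objects.
                for row in sheet.iter_rows(values_only=True):
                    writer.writerow(row)


print(csv_filename("spam_data.xlsx", "Sheet1"))
```

The three nested loops (files, sheets, rows) mirror the structure described above. Empty cells come back as None, which csv.writer() writes as an empty field.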

    Unzip the spreadsheets into the same directory as your program. You can use these as the files to test the program on.
