Python’s if Statement

From the book Data Wrangling with Python (pages 85-91)

In its most basic form, the if statement is a way to control the flow of your code.

When you use an if statement, you are telling the code: if this condition is met, then do something particular.

You can also follow an if with an else. An if-else statement says: if the first condition is met, then do something, but if it is not, then do what is in the else statement.

Besides if and if-else, you will see == as a comparison operator. While = sets a variable equal to a value, == tests to see if two values are equal. In addition, != tests to see if they are not equal. Both of these operators return Boolean values: True or False. Try the following examples in your Python interpreter:

    x = 5
    if x == 5:
        print 'x is equal to 5.'

What did you see? x == 5 will return True, and so the text will have printed. Now try:

    x = 3
    if x == 5:
        print 'x is equal to 5.'
    else:
        print 'x is not equal to 5.'

Because x equals 3 and not 5 in this example, you should have received the print statement in the else block of code. You can use if and if-else statements in Python to help guide the logic and flow of your code.
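As a small sketch of the same idea, the comparison operators can be tried on their own, and the if-else logic can be wrapped in a function so it is easy to call with different values. The function name describe is a hypothetical helper, not something from the chapter, and print is written with parentheses so the snippet also runs on Python 3:

```python
def describe(x):
    """Return a message saying whether x is equal to 5 (hypothetical helper)."""
    if x == 5:
        return 'x is equal to 5.'
    else:
        return 'x is not equal to 5.'


# == and != both evaluate to a Boolean value.
print(5 == 5)   # True
print(3 == 5)   # False
print(3 != 5)   # True

# The same if-else branch logic as the interpreter examples above.
print(describe(5))
print(describe(3))
```

Returning the message instead of printing it inside the branch is only a convenience here; it makes the result easy to check or reuse.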

We want to see when lookup_key is equal to Numeric, and use Numeric as the key instead of the value (like we did with the Category keys). Update your code with the following:

    for item in observation:
        lookup_key = item.attrib.keys()[0]

        if lookup_key == 'Numeric':
            rec_key = 'NUMERIC'
        else:
            rec_key = item.attrib[lookup_key]

        print rec_key
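To see the key selection in isolation, here is a minimal sketch that applies the same if-else to plain dictionaries instead of XML elements. The function name choose_key and the two sample dictionaries are assumptions made for illustration; the attribute names and values mirror the ones used in this chapter. Note that on Python 3, keys() returns a view that cannot be indexed with [0], so the sketch takes the first key with next(iter(...)) and assumes Python 3.7 or later, where dictionaries keep insertion order:

```python
def choose_key(attrib):
    """Pick the record key for one element's attribute dictionary.

    Hypothetical helper: mirrors the if-else in the loop above.
    """
    # First attribute name; equivalent to attrib.keys()[0] in Python 2.
    lookup_key = next(iter(attrib))
    if lookup_key == 'Numeric':
        return 'NUMERIC'
    else:
        return attrib[lookup_key]


print(choose_key({'Numeric': '49.00000'}))               # NUMERIC
print(choose_key({'Category': 'YEAR', 'Code': '2012'}))  # YEAR
```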

If you run your updated code, all of your keys should now look like keys. Now, let’s pull out the values we want to store in our new dictionary and associate them with those keys. In the case of Numeric, it’s simple, because we just want the Numeric key’s value. Make the following changes to your code:

    if lookup_key == 'Numeric':
        rec_key = 'NUMERIC'
        rec_value = item.attrib['Numeric']
    else:
        rec_key = item.attrib[lookup_key]
        rec_value = None

    print rec_key, rec_value

If you run the updated code, you will see the rec_value for Numeric is properly matched. For example:

NUMERIC 49.00000

For all other values, we set the rec_value to None. In Python, None is used to represent a null value. Let’s populate those with real values. Remember each record has a Category and a Code key, like so: {'Category': 'YEAR', 'Code': '2012'}. For these elements, we want to store the Code value as the rec_value. Update the line rec_value = None, so your if-else statement looks like the one shown here:

    if lookup_key == 'Numeric':
        rec_key = 'NUMERIC'
        rec_value = item.attrib['Numeric']
    else:
        rec_key = item.attrib[lookup_key]
        rec_value = item.attrib['Code']

    print rec_key, rec_value

Rerun the code, and you should now see that we have values for our rec_key and our rec_value. Let’s build the dictionary:

    if lookup_key == 'Numeric':
        rec_key = 'NUMERIC'
        rec_value = item.attrib['Numeric']
    else:
        rec_key = item.attrib[lookup_key]
        rec_value = item.attrib['Code']

    record[rec_key] = rec_value

The last line adds each key and value to the record dictionary.

We also need to add each record to our all_data list. As we saw in “List Methods: Things Lists Can Do” on page 32, we can use the list’s append method to add values to our list. We need to append each record at the end of the outer for loop, as that is when it will have all of the keys for each of the subelements. Finally, we will add a print at the end of the file, to show our data.

Your full code to transform the XML tree to a dictionary should look like this:

    from xml.etree import ElementTree as ET

    tree = ET.parse('data-text.xml')
    root = tree.getroot()

    data = root.find('Data')

    all_data = []

    for observation in data:
        record = {}
        for item in observation:
            lookup_key = item.attrib.keys()[0]

            if lookup_key == 'Numeric':
                rec_key = 'NUMERIC'
                rec_value = item.attrib['Numeric']
            else:
                rec_key = item.attrib[lookup_key]
                rec_value = item.attrib['Code']

            record[rec_key] = rec_value
        all_data.append(record)

    print all_data

Once you run the code, you will see a long list with a dictionary for each record, just like in the CSV example:

    {'COUNTRY': 'ZWE', 'REGION': 'AFR', 'WORLDBANKINCOMEGROUP': 'WB_LI',
     'NUMERIC': '49.00000', 'SEX': 'BTSX', 'YEAR': '2012',
     'PUBLISHSTATE': 'PUBLISHED', 'GHO': 'WHOSIS_000002'}
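If you do not have data-text.xml at hand, the whole transformation can be sketched against a tiny inline document. Everything about the sample here is an assumption made for illustration: the element names (GHO, Observation, Dim, Value) and the three-attribute observation are hypothetical stand-ins for the real file, and only the Data element, the Category, Code, and Numeric attribute names, and the values are taken from this chapter. The function name xml_to_records is also hypothetical. The sketch uses list(...)[0] and print(...) so it runs on Python 3 (3.8 or later is assumed, so that attributes keep their document order):

```python
from xml.etree import ElementTree as ET

# Hypothetical sample, shaped like the attributes described in this chapter.
SAMPLE_XML = """
<GHO>
  <Data>
    <Observation>
      <Dim Category="YEAR" Code="2012"/>
      <Dim Category="COUNTRY" Code="ZWE"/>
      <Value Numeric="49.00000"/>
    </Observation>
  </Data>
</GHO>
"""


def xml_to_records(xml_text):
    """Turn each observation under <Data> into one dictionary."""
    root = ET.fromstring(xml_text.strip())
    data = root.find('Data')

    all_data = []
    for observation in data:
        record = {}
        for item in observation:
            # First attribute name of the element.
            lookup_key = list(item.attrib)[0]

            if lookup_key == 'Numeric':
                rec_key = 'NUMERIC'
                rec_value = item.attrib['Numeric']
            else:
                rec_key = item.attrib[lookup_key]
                rec_value = item.attrib['Code']

            record[rec_key] = rec_value
        # The record is complete once every subelement has been visited.
        all_data.append(record)
    return all_data


all_data = xml_to_records(SAMPLE_XML)
print(all_data)
```

Parsing from a string with ET.fromstring instead of ET.parse is just a way to keep the example self-contained; the loop body is the same as in the full listing above.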

As you can see, extracting data from the XML was a little more complicated. Sometimes CSV and JSON files won’t be as easy to process as they were in this chapter, but they are usually more easily processed than XML files. However, looking at the XML data allowed you to explore and grow as a Python developer, giving you a chance to create empty lists and dictionaries and populate them with data. You also honed your debugging skills as you explored how to extract data from the XML tree structure. These are valuable lessons in your quest to become a better data wrangler.

Summary

Being able to handle machine-readable data formats with Python is one of the must-have skills for a data wrangler. In this chapter, we covered the CSV, JSON, and XML file types. Table 3-2 provides a reminder of the libraries we used to import and manipulate the different files containing the WHO data.

Table 3-2. File types and file extensions

File type   File extensions   Python library
CSV, TSV    .csv, .tsv        csv
JSON        .json, .js        json

We also covered a few new Python concepts. At this point, you should know how to run Python code from the Python interpreter and how to save the code to a new file and run it from the command line. We also learned about importing files using import, and how to read and open files with Python on your local filesystem.

Other new programming concepts we covered include using for loops to iterate over files, lists, or trees and using if-else statements to determine whether a certain condition has been met and to do something depending on that evaluation. Table 3-3 summarizes those new functions and code logic you learned about here.

Table 3-3. New Python programming concepts

Concept                  Purpose
import                   Imports a module into the Python space
open                     Built-in function that opens a file in Python on your system
for loop                 A piece of code that runs n times
if-else statement        Runs a piece of code if a certain condition is met
== (equal to operator)   Tests to see if one value is equal to another
Indexing a sequence      Pulls out the nth object in the sequence (string, list, etc.)
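As a quick refresher on the last rows of the table, here is a minimal sketch of indexing and looping. The list name codes and the list collected are assumptions for illustration; the values are borrowed from the sample record shown earlier in the chapter:

```python
codes = ['ZWE', 'AFR', 'WB_LI']

# Indexing pulls out one object; counting starts at 0.
first_code = codes[0]       # 'ZWE'
last_code = codes[-1]       # 'WB_LI' (negative indexes count from the end)
first_letter = 'YEAR'[0]    # 'Y' (strings are sequences too)

# A for loop runs its body once per item in the sequence.
collected = []
for code in codes:
    if code == 'AFR':
        collected.append(code + ' (region)')
    else:
        collected.append(code)

print(first_code)
print(last_code)
print(first_letter)
print(collected)
```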

Lastly, in this chapter we started to create and save a lot of code files and data files. Assuming you did all the exercises in this chapter, you should have three code files and three data files. Earlier in the chapter, there was a recommendation for how to organize your code. If you have not done this already, do it now. Here is one example of how to organize your data so far:

    data_wrangling/
        code/
            ch3_easy_data/
                import_csv_data.py
                import_xml_data.py
                import_json_data.py
                data-text.csv
                data-text.xml
                data-json.json
            ch4_hard_data/
                ...

Now, on to harder data formats!

CHAPTER 4