
Opening Files from Different Locations

From the book Data Wrangling with Python (pages 67-70)

In the current code, we pass the path of the file to the open function like this:

open('data-text.csv', 'rb')

However, if our data file were in a subfolder called data, we would need to modify the script to look there. That is, we would instead need to use:

open('data/data-text.csv', 'rb')


In the preceding example, we would have a file structure that looks like this:

data_wrangling/
`-- code/
    |-- import_csv_data.py
    `-- data/
        `-- data-text.csv
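A relative path such as 'data/data-text.csv' is resolved against the folder you run the script from, not the folder the script is saved in. The following is a minimal sketch, assuming the folder layout shown above, of building the path with os.path.join so the folder separator is correct on any operating system. The helper name build_data_path is hypothetical and not part of the book's code, and the sketch uses Python 3 syntax (print as a function).

```python
import os


def build_data_path(script_dir, filename):
    """Return the path to `filename` inside the data/ folder under `script_dir`.

    `build_data_path` is a hypothetical helper used only for illustration.
    """
    return os.path.join(script_dir, 'data', filename)


# In a saved script you would typically pass the folder containing the script:
#   script_dir = os.path.dirname(os.path.abspath(__file__))
example = build_data_path(os.path.join('data_wrangling', 'code'), 'data-text.csv')
print(example)
```

Passing the result to open() instead of a hardcoded string means the script finds the file regardless of which folder you run it from.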

If you have trouble locating your file, open up your command line and use the following commands on your Mac or Linux machine to navigate through your folders:

ls returns a list of files.

pwd shows your current location.

cd ../ moves to the parent folder.

cd ../../ moves two levels up.

cd data moves into a folder called data that is inside the folder you are currently in (use ls to check!).

For more on navigating on the command line, including an entire section for Windows users, check out Appendix C.
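If you are already inside the Python interpreter, the standard os module offers rough equivalents of pwd, ls, and cd. This is a small sketch in Python 3 syntax, not something the book's script requires; it moves up one folder and then returns to where it started.

```python
import os

# pwd: show the current working directory
start_dir = os.getcwd()
print(start_dir)

# ls: list the files and folders in the current directory
print(os.listdir('.'))

# cd ../ : move to the parent folder
os.chdir('..')
parent_dir = os.getcwd()
print(parent_dir)

# Move back to the starting folder so the rest of the session is unaffected
os.chdir(start_dir)
```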

After you save the file, you can run it using the command line. If you don’t already have it open, open your command line (Terminal or cmd), and navigate to where the file is located. Let’s assume that you put the file in ~/Projects/data_wrangling/code. To navigate there using the Mac command line, you would use the change directory or folder command (cd):

cd ~/Projects/data_wrangling/code

After you get to the right location, you can run the Python file. Up until this point, we were running our code in the Python interpreter. We saved the file as import_csv_data.py. To run a Python file from the command line, you simply type python, a space, and then the name of the file. Let’s try running our import file:

python import_csv_data.py

Your output should look like a bunch of lists—something like the data shown here, but with many more records:

['Healthy life expectancy (HALE) at birth (years)', 'Published', '2012', 'Western Pacific', 'Lower-middle-income', 'Samoa', 'Female', '66', '66.00000', '', '', '']

['Healthy life expectancy (HALE) at birth (years)', 'Published', '2012', 'Eastern Mediterranean', 'Low-income', 'Yemen', 'Both sexes', '54', '54.00000', '', '', '']

['Healthy life expectancy (HALE) at birth (years)', 'Published', '2000', 'Africa', 'Upper-middle-income', 'South Africa', 'Male', '49', '49.00000', '', '', '']

['Healthy life expectancy (HALE) at birth (years)', 'Published', '2000', 'Africa', 'Low-income', 'Zambia', 'Both sexes', '36', '36.00000', '', '', '']

['Healthy life expectancy (HALE) at birth (years)', 'Published', '2012', 'Africa', 'Low-income', 'Zimbabwe', 'Female', '51', '51.00000', '', '', '']

Did you get this output? If not, stop for a minute to read the error you received. What does it tell you about where you might have gone wrong? Take time to search for the error and read a few ways people have fixed the same error. If you need extra help on troubleshooting how to get past the error, take a look at Appendix E.
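One of the most common errors at this stage is that Python cannot find the file. As a hedged illustration in Python 3 syntax, the sketch below tries to open a file that is assumed not to exist (the filename is made up to trigger the error) and prints the folder Python is looking in, which is usually the quickest clue to a wrong path.

```python
import os

# Hypothetical filename, chosen so that the open() call fails
filename = 'file-that-does-not-exist.csv'

try:
    csvfile = open(filename, 'r')
except IOError as err:
    # In Python 3 a missing file raises FileNotFoundError, a subclass of
    # OSError (IOError is an alias of OSError), so this clause catches it
    message = 'Could not open %s from %s: %s' % (filename, os.getcwd(), err)
    print(message)
else:
    csvfile.close()
    message = 'Opened %s without errors' % filename
    print(message)
```

If the folder printed in the message is not the one holding your data file, either cd to the right folder before running the script or adjust the path passed to open().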

For a lot of our code from this point onward, we will do the work in a code editor, save the file, and run it from the command line.

Your Python interpreter will still be a helpful tool to try out pieces of code, but as code gets longer and more complex it becomes harder to maintain in a code prompt.

As with much of the code we'll write, there is more than one way to solve this problem. csv.reader() returns each new line of your file as a list of data and is an easy-to-understand solution when you begin. We are going to modify our script slightly to make our list rows into dictionary rows.

This will make our data a little easier to read, compare, and understand as we explore our dataset.

In your text editor, take line 4, reader = csv.reader(csvfile), and update it to read reader = csv.DictReader(csvfile). Your code should now look like this:

import csv

csvfile = open('data-text.csv', 'rb')
reader = csv.DictReader(csvfile)

for row in reader:
    print row

When you run the file again after saving it, each record will be a dictionary. The keys in the dictionary come from the first row of the CSV file. Each subsequent row supplies the values for one record. Here is an example row of output:

{
    'Indicator': 'Healthy life expectancy (HALE) at birth (years)',
    'Country': 'Zimbabwe',
    'Comments': '',
    'Display Value': '49',
    'World Bank income group': 'Low-income',
    'Numeric': '49.00000',
    'Sex': 'Female',
    'High': '',
    'Low': '',
    'Year': '2012',
    'WHO region': 'Africa',
    'PUBLISH STATES': 'Published'
}
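To see how the header row becomes the dictionary keys, here is a small self-contained sketch. It assumes Python 3 (print as a function) and reads from an in-memory string rather than data-text.csv; the two rows reuse values from the list output shown earlier, with the columns cut down to four for brevity.

```python
import csv
import io

# In-memory stand-in for data-text.csv, with a shortened header row
sample = io.StringIO(
    'Country,Year,Sex,Display Value\n'
    'Zimbabwe,2012,Female,51\n'
    'Zambia,2000,Both sexes,36\n'
)

reader = csv.DictReader(sample)
rows = list(reader)

# The header row is consumed as the keys; it is not returned as a record
print(reader.fieldnames)

# Each record can be read by column name instead of by position
for row in rows:
    print(row['Country'], row['Display Value'])
```

Looking values up by name, as in row['Country'], keeps the code readable and avoids having to remember which position each column occupies.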

At this point, we have successfully imported the CSV data into Python, meaning we were able to get the data from the file into a usable format Python can understand (dictionaries). Using a for loop helped us see the data so we could visually review it.

We were able to use two different readers from the csv library to see the data in both a list and a dictionary form. We will be using this library again as we start exploring and analyzing datasets. For now, let’s move on to importing JSON.
