Integers and Floats - Data Wrangling with Python

The second and third data types we are going to learn about are integers and floats, which are how you handle numbers in Python. Let’s begin with integers.

Integers

You may remember integers from math class, but just in case you don’t, an integer is a whole number. Here are some examples:

10
1
0
-1
-10

If you enter those into your Python interpreter, the interpreter will return them back to you.

Notice in the string example in the previous section, we had a '5'. If a number is entered within quotes, Python will process the value as a string. In the following example, the first value and second value are not equal:

5
'5'

To test this, enter the following into your interpreter:

5 == '5'

The == tests to see if the two values are equal. The return from this test will be true or false. The return value is another Python data type, called a Boolean. We will work with Booleans later, but let’s briefly review them. A Boolean tells us whether a statement is True or False. In the previous statement, we asked Python whether 5 the integer was the same as '5' the string. What did Python return? How could you make the statement return True? (Hint: try testing with both as integers or both as strings!)

You might be asking yourself why anyone would store a number as a string. Sometimes this is an example of improper use: for example, the code is storing '5' when the number should have been stored as 5, without quotes. Another case is when fields are manually populated, and may contain either strings or numbers (e.g., a survey where people can type five or 5 or V). These are all numbers, but they are different representations of numbers. In this case, you might store them as strings until you process them.
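The comparison and the survey case above can be sketched as follows. This is only an illustration: the to_number helper and its WORDS_TO_NUMBERS lookup table are hypothetical names made up for this example, and a real survey would need a much larger table.

```python
# An integer and a string are different types, so they do not compare equal.
print(5 == '5')        # False
print(5 == int('5'))   # True: both sides are integers
print(str(5) == '5')   # True: both sides are strings

# Hypothetical lookup table for answers typed as words or Roman numerals.
WORDS_TO_NUMBERS = {'five': 5, 'v': 5}


def to_number(response):
    """Convert a survey response such as '5', 'five' or 'V' to an integer."""
    cleaned = response.strip().lower()
    if cleaned in WORDS_TO_NUMBERS:
        return WORDS_TO_NUMBERS[cleaned]
    return int(cleaned)  # raises ValueError for anything unrecognized


responses = ['five', '5', 'V']
numbers = [to_number(response) for response in responses]
print(numbers)         # [5, 5, 5]
```

Keeping the raw responses as strings and converting them in one place makes it easier to see which answers could not be interpreted.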

One of the most common reasons for storing numbers as strings is a purposeful action, such as storing US postal codes. Postal codes in the United States consist of five digits. In New England and other parts of the northeast, the zip codes begin with a zero. Try entering one of Boston’s zip codes into your Python interpreter as a string and as an integer. What happens?

'02108'
02108

Python will throw a SyntaxError in the second example (with the message invalid token and a pointer at the leading zero). In Python, and in numerous other languages, “tokens” are special words, symbols, and identifiers. In this case, Python does not know how to process a normal (non-octal) number beginning with zero, meaning it is an invalid token.
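A short sketch of why the string form is the safer choice for zip codes. It assumes nothing beyond the built-in functions int, str.zfill, and compile; the exact wording of the SyntaxError message differs between Python versions, so the example only checks that an error is raised.

```python
zip_as_string = '02108'

# Converting to an integer silently drops the leading zero.
zip_as_integer = int(zip_as_string)
print(zip_as_integer)                 # 2108

# Padding back to five characters recovers the original code.
print(str(zip_as_integer).zfill(5))   # 02108

# Compiling the bare literal shows the error without stopping the script.
try:
    compile('02108', '<example>', 'eval')
    literal_is_valid = True
except SyntaxError:
    literal_is_valid = False
print(literal_is_valid)               # False
```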

Floats, decimals, and other non–whole number types

There are multiple ways to tell Python to handle non–whole number math. This can be very confusing and appear to cause rounding errors if you are not aware how each non–whole number data type behaves.

When a non–whole number is used in Python, Python defaults to turning the value into a float. A float uses the built-in floating-point data type for your Python version.

This means Python stores an approximation of the numeric value—an approximation that reflects only a certain level of precision.
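One way to see the approximation is to ask for more digits than the interpreter normally shows. This is a minimal sketch using the built-in format function; the choice of 20 decimal places is arbitrary.

```python
value = 0.1

# The interpreter shows the shortest text that identifies the stored float.
print(repr(value))                    # 0.1

# Asking for 20 decimal places reveals the approximation behind it.
more_digits = format(value, '.20f')
print(more_digits)                    # 0.10000000000000000555
```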

Notice the difference between the following two numbers when you enter them into your Python interpreter:

2
2.0

The first one is an integer. The second one is a float. Let’s do some math to learn a little more about how these numbers work and how Python evaluates them. Enter the following into your Python interpreter:

2/3

What happened? You got a zero value returned, but you were likely expecting 0.6666666666666666 or 0.6666666666666667 or something along those lines. The problem was that those numbers are both integers and integers do not handle fractions. Let’s try turning one of those numbers into a float:

2.0/3

Now we get a more accurate answer of 0.6666666666666666. When one of the numbers entered is a float, the answer is also a float.
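The zero result described above is what Python 2 returns for 2/3. In Python 3 the / operator always produces a float, and // is the operator that discards the fraction. The sketch below uses // and explicit floats so that it behaves the same way in both versions; the variable names are chosen only for this example.

```python
whole_number = 2
floating_point = 2.0

print(type(whole_number).__name__)      # int
print(type(floating_point).__name__)    # float

# Floor division keeps only the whole part, like integer division in Python 2.
floor_result = whole_number // 3
print(floor_result)                     # 0

# If either operand is a float, the result is a float.
float_result = floating_point / 3
print(repr(float_result))               # 0.6666666666666666

# Converting an integer with float() has the same effect.
converted_result = float(whole_number) / 3
print(converted_result == float_result)  # True
```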

As mentioned previously, Python floats can cause accuracy issues. Floats allow for quick processing, but the trade-off is precision: many decimal values can only be stored approximately.

Computationally, Python does not see numbers the way you or your calculator would. Try the following two examples in your Python interpreter:

0.3
0.1 + 0.2

With the first line, Python returns 0.3. On the second line, you would expect to see 0.3 returned, but instead you get 0.30000000000000004. The two values 0.3 and 0.30000000000000004 are not equal. If you are interested in the nuances of this, you can read more in the Python docs.
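Because of this, testing floats for exact equality is fragile. A common workaround, sketched here, is to compare within a small tolerance or to round before comparing. The tolerance of 1e-9 is an assumption made for this example, not a recommendation from the book.

```python
total = 0.1 + 0.2

print(repr(total))                 # 0.30000000000000004
print(total == 0.3)                # False

# Compare within a small tolerance instead of testing exact equality.
tolerance = 1e-9  # assumed value; choose one that suits your data
close_enough = abs(total - 0.3) < tolerance
print(close_enough)                # True

# Rounding both results to the same number of places is another option.
print(round(total, 2) == 0.3)      # True
```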

Throughout this book, we will use the decimal module (or library) when accuracy matters. A module is a section or library of code you import for your use. The decimal module makes your numbers (integers or floats) act in predictable ways (following the concepts you learned in math class).

In the next example, the first line imports getcontext and Decimal from the decimal module, so we have them in our environment. The following two lines use getcontext and Decimal to perform the math we already tested using floats:

from decimal import getcontext, Decimal
getcontext().prec = 1
Decimal(0.1) + Decimal(0.2)

When you run this code, Python returns Decimal('0.3'). Now when you enter print Decimal('0.3'), Python will return 0.3, which is the response we originally expected (as opposed to 0.30000000000000004).

Let’s step through each of those lines of code:

from decimal import getcontext, Decimal
getcontext().prec = 1
Decimal(0.1) + Decimal(0.2)

The first line imports getcontext and Decimal from the decimal module.

The second line sets the precision to one significant digit. The decimal module stores most rounding and precision settings in a default context. This line changes that context so that the results of arithmetic are rounded to a single significant digit, which for these values means one decimal place.

The third line sums two decimals (one with value 0.1 and one with value 0.2) together.
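One detail worth seeing for yourself is that a Decimal built from a float copies the float's approximation, while a Decimal built from a string holds exactly the digits typed. The sketch below uses localcontext, a function from the same decimal module, so that the precision setting applies only inside the with block; the value 28 is the module's default precision, set explicitly here.

```python
from decimal import Decimal, localcontext

with localcontext() as context:
    context.prec = 28  # the module's default precision, set explicitly

    # Built from floats: each Decimal copies the float's approximation.
    from_floats = Decimal(0.1) + Decimal(0.2)

    # Built from strings: each Decimal holds exactly the digits typed.
    from_strings = Decimal('0.1') + Decimal('0.2')

print(from_floats)                      # 0.3000000000000000166533453694
print(from_strings)                     # 0.3
print(from_strings == Decimal('0.3'))   # True
```

When the input already arrives as text, as it often does in data files, passing the string straight to Decimal avoids the float step entirely.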


What happens if you change the value of getcontext().prec? Try it and rerun the final line. You should see a different answer depending on how many significant digits you told the library to use.
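The experiment can be sketched as a small loop. The add_with_precision function is a hypothetical helper written for this example; it uses localcontext from the decimal module instead of getcontext so that each precision setting is temporary and does not affect later calculations.

```python
from decimal import Decimal, localcontext


def add_with_precision(digits):
    """Add Decimal(0.1) and Decimal(0.2) using the given precision."""
    with localcontext() as context:
        context.prec = digits  # significant digits kept in each result
        return Decimal(0.1) + Decimal(0.2)


for digits in (1, 3, 10):
    print('{0}: {1}'.format(digits, add_with_precision(digits)))
# 1: 0.3
# 3: 0.300
# 10: 0.3000000000
```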

As stated earlier, there are many mathematical specifics you will encounter as you wrangle your data. There are many different approaches to the math you might need to perform, but the decimal type allows us greater accuracy when using non–whole numbers.

From the document Data Wrangling with Python (pages 37-40)