For more information, refer to our Bulk Print and eBook Sales website at www.apress.com/bulk-sales. Any source code or other supplemental material referenced by the author in this book is available to readers on GitHub via the book's product page, located at www.apress.com.
About the Author
About the Technical Reviewers
He has conducted several faculty development training programs and corporate training sessions for software companies across India.
Introduction
You will learn how to apply basic Python programming techniques for data cleaning and manipulation. You will learn how to implement Python techniques to explore and analyze a series of data and to create a series.
Introduction to Data Science with Python
The Stages of Data Science
Data science algorithms are used in products such as internet search engines to deliver the best results for search queries in less time, in recommendation systems that use a user’s experience to generate recommendations, in digital advertisements, in education systems, in healthcare systems, and so on. Data scientists should have in-depth knowledge of programming tools such as Python, R, SAS, Hadoop.
Why Python?
Python is now at version 3.x; Python 3.0 was released in December 2008 after a long period of testing. Many of its key features were also backported to the backward-compatible Python 2.6 and 2.7 releases.
Basic Features of Python
Extensible: Python can be extended easily by adding new modules implemented in a compiled language such as C or C++. Large Standard Library: Comes with a large standard library that supports many common programming tasks, such as connecting to web servers, searching text with regular expressions, and reading and editing files.
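As a small illustration of the standard library tasks mentioned above, the following sketch searches text with the re module and writes, edits, and reads a file with pathlib and tempfile. The sample text, the simplified e-mail pattern, and the file name are assumptions made for this example.

```python
import re
import tempfile
from pathlib import Path

text = "Contact: alice@example.com, bob@example.org"  # hypothetical sample text

# Search text with a regular expression (a deliberately simple e-mail pattern).
emails = re.findall(r"[\w.]+@[\w.]+", text)
print(emails)

# Write a file, append to it, and read it back line by line.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "notes.txt"  # hypothetical file name
    path.write_text("first line\n")
    path.write_text(path.read_text() + "second line\n")
    lines = path.read_text().splitlines()

print(lines)
```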
Python Learning Resources
See http://www.lulu.com/shop/ossama-embarak/agile-python-programming-applied-for-iedereen/paperback/product-23694020.html. Python for You and Me is an accessible book with sections for Python syntax and key language constructs.
Python Environment and Editors
These resources are useful not only for Python beginners, but for any developer who wants to have a strong professional career in software. There is a Udacity course by one of the creators of Reddit that shows how to use Python to build a blog.
Portable Python Editors (No Installation Required)
WinPython: This is a free Python distribution for the Windows platform; it contains pre-built packages for ScientificPython. Anaconda: This is a completely free, enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing.
Azure Notebooks
You can create folders and subfolders by selecting +New from the ribbon; then for the item type select Folder, as shown in Figure 1-3. Open the created Hello World script by clicking it, and start writing your Python code, as shown in Figure 1-4.
Offline and Desktop Python Editors
The Basics of Python Programming
Basic Syntax
Reserved words are words that are already reserved by the Python language and cannot be redefined or declared by the user.
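One way to check which words are reserved is the standard keyword module. The sketch below lists the reserved words of the running interpreter and tests two names; the name "total" is just an example of an ordinary identifier.

```python
import keyword

reserved = keyword.kwlist           # all reserved words of this Python version
print(reserved)

print(keyword.iskeyword("for"))     # "for" is reserved
print(keyword.iskeyword("total"))   # "total" is an ordinary, usable name
```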
Lines and Indentation
Multiline Statements
Quotation Marks in Python
Multiple Statements on a Single Line
Read Data from Users
Declaring Variables and Assigning Values
Multiple Assigns
Variable Names and Keywords
Statements and Expressions
Basic Operators in Python
Arithmetic Operators
Relational Operators
Assign Operators
Logical Operators
This differs from other languages such as C#, which is a compiled language that must process the entire program.
Python Comments
Conversion Types
In the previous example, you can see that %.2f is replaced by the value 172.16 with two digits after the decimal point, while %2d displays an integer value in a field at least two characters wide. You can also display values read directly from a dictionary: %(name)s takes the dictionary value of the name key as a string, and %(height).2f takes the value of the height key as a float with two digits after the decimal point.
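The conversion types described here can be sketched as follows. The variable names and values are assumptions chosen for illustration; they are not taken from the book's listing.

```python
height = 172.159  # hypothetical measurement
count = 7         # hypothetical integer

# %.2f rounds to two digits after the decimal point; %2d pads to width 2.
summary = "Height: %.2f cm, items: %2d" % (height, count)
print(summary)

# %(key)s and %(key).2f read their values from a dictionary by key.
person = {"name": "Sara", "height": 172.159}
profile = "%(name)s is %(height).2f cm tall" % person
print(profile)
```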
The Replacement Field, {}
Also, you can use a mixture of combined values from lists, dictionaries, attributes, or even a singleton variable. In the following example, you will create a class called A(), which has a single variable called x that is assigned the value 9.
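A minimal sketch of such a mixture follows. Class A and its variable x come from the text; the list, the dictionary, and the keyword argument are hypothetical values added for illustration.

```python
class A:
    x = 9  # single class variable, as described in the text


w = A()
days = ["Monday", "Tuesday"]   # hypothetical list
info = {"test": "Thursday"}    # hypothetical dictionary

# {0.x} reads an attribute, {1[1]} a list item, {2[test]} a dictionary value,
# and {single} a keyword argument.
message = "{0.x} {1[1]} {2[test]} {single}".format(w, days, info, single="ok")
print(message)
```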
The Date and Time Module
{2[test]} refers to the argument at index 2 in the format call and reads its value from the passed dictionary using the key test.
Time Module Methods
Python Calendar Module
Fundamental Python Programming Techniques
Selection Statements
In Listing 1-14, the if statement condition is false, and therefore the outermost print statement is the only statement executed. The nested if statement is an if statement that is the target of another if statement.
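The following sketch illustrates a nested if statement. It is not the book's listing; the function name and the thresholds are assumptions. When the outer condition is false, only the statements outside the if block take effect.

```python
def classify(mark):
    """Return a label for a mark using an outer and a nested if statement."""
    label = "checked"          # set outside the if block, always executed
    if mark >= 60:             # outer if
        if mark >= 90:         # nested if, reached only when the outer test passes
            label = "excellent"
        else:
            label = "pass"
    return label


print(classify(95), classify(70), classify(40))
```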
Iteration Statements
For example, if you specify 4, the loop statement starts at 1 and ends with 3, which is n-1. The continue statement causes the loop to skip the rest of its body and immediately retest the condition before repeating.
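Both behaviors can be sketched in a few lines, assuming the loop in question is built with range(1, n). The variable names are illustrative.

```python
n = 4
visited = []
for i in range(1, n):      # starts at 1 and stops at n - 1
    visited.append(i)
print(visited)

odds = []
i = 0
while i < 6:
    i += 1
    if i % 2 == 0:
        continue           # skip the rest of the body and retest the condition
    odds.append(i)
print(odds)
```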
The Use of Break, Continues, and Pass Statements
In the previous examples, the first and second iterations used the for loop with an array statement. You can iterate over a list of letters, as shown in Listing 1-20, and you can iterate over the word Python3 and display each of its letters.
String Processing
String Special Operators
String Slicing and Concatenation
String Conversions and Formatting Symbols
Loop Through String
You can also use iterations to count letters in a word or to count words in lines, as shown in Listing 1-25.
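A minimal sketch of both counts follows. It is not the book's Listing 1-25; the word and the sample lines are assumptions.

```python
word = "Python3"
letter_count = 0
for ch in word:
    if ch.isalpha():       # count letters only, ignoring the digit
        letter_count += 1

lines = ["Python is easy", "to learn"]   # hypothetical lines of text
word_count = 0
for line in lines:
    word_count += len(line.split())      # split() breaks a line on whitespace

print(letter_count, word_count)
```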
Python String Functions and Methods
Listing 1-26 shows how to use built-in methods to remove whitespace from a string, count specific letters within a string, check whether the string contains another string, and so on.
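The operations named above map onto standard str methods, as in the sketch below. The sample string and variable names are assumptions, not a reproduction of the book's listing.

```python
text = "  Higher Technical Colleges  "

stripped = text.strip()                    # remove leading and trailing whitespace
e_count = stripped.count("e")              # count occurrences of one letter
has_colleges = "Colleges" in stripped      # check whether a substring is present
position = stripped.find("Technical")      # index of the first match, or -1

print(stripped, e_count, has_colleges, position)
```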
The in Operator
Parsing and Extracting Strings
Tabular data can be easily represented in Python using lists of tuples that represent the dataset's records in a data frame structure. Although simple to create, these types of representations typically do not allow for important manipulations of tabular data, such as efficient column selection, matrix mathematics, or spreadsheet-style operations.
Python Pandas Data Science Library
The Pandas library also provides rich data structures and functions designed to make working with structured data fast, easy, and expressive. You may need to install the Pandas package before importing it, because the standard Python distribution does not come with the Pandas module.
A Pandas Series
A Pandas Data Frame
You can retrieve data from a data frame from index 1 to the end of rows.
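One way to do this, assuming positional indexing is intended, is a slice with iloc. The column names and values below are hypothetical.

```python
import pandas as pd

frame = pd.DataFrame(
    {"name": ["Ali", "Sara", "Omar"], "mark": [80, 92, 75]}  # hypothetical data
)

# Rows from position 1 through the last row.
tail_rows = frame.iloc[1:]
print(tail_rows)
```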
Pandas Panels
Python Lambdas and the Numpy Library
The map() Function
The filter() Function
The reduce() Function
Python Numpy Package
Data Cleaning and Manipulation Techniques
Abstraction of the Series and Data Frame
The data frame data structure is the main structure for data collection and processing in Python. A data frame is a two-dimensional array object, as shown in Figure 1-8, where there is an index and several columns of content, each with a label.
Running Basic Inferential Analyses
In other words, it is the average of the square of the difference between the values in a data set and the average value. It gives you an idea of the average value of the data in the data set and an indication of how widely distributed the values are in the data set.
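The definition above can be written out directly. This sketch assumes the population variance (dividing by the number of values); the sample variance would divide by one less. The data values are made up.

```python
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical data set

mean = sum(values) / len(values)

# Average of the squared differences between each value and the mean.
variance = sum((v - mean) ** 2 for v in values) / len(values)
std_dev = variance ** 0.5           # spread expressed in the data's own units

print(mean, variance, std_dev)
print(statistics.pvariance(values)) # the standard library gives the same variance
```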
Summary
Exercises and Answers
Write a Python program to prompt users to enter a Celsius value; then convert Celsius to Fahrenheit, where T(°F) = T(°C) × 1.8 + 32. Write a program to prompt users to enter the speed of a car; then calculate the fines according to.
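A possible starting point for the Celsius exercise is sketched below. It is not the book's answer; the function name is an assumption, and the prompt is left as a comment so that the snippet runs without user input.

```python
def celsius_to_fahrenheit(celsius):
    """Apply T(°F) = T(°C) × 1.8 + 32."""
    return celsius * 1.8 + 32


# To prompt the user instead, read the value like this:
# celsius = float(input("Enter a Celsius value: "))
celsius = 37.0   # hypothetical input value
print(celsius, "C =", round(celsius_to_fahrenheit(celsius), 1), "F")
```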
The Importance of Data Visualization in
Shifting from Input to Output
Why Is Data Visualization Important?
Figuring out the patterns, trends and correlations in the data being analyzed to determine where they need to improve their operational processes and thereby grow their business. Provide a more complete view of the data being analyzed – Organize and present large amounts of data intuitively.
Why Do Modern Businesses Need Data Visualization?
It allows managers to understand the correlations between business conditions and business performance. It helps companies discover the gray areas of the business and make the right decisions for improvement.
The Future of Data Visualization
Data visualization helps managers understand customer behavior and interests to retain customers and market share. Data visualization will be used extensively to analyze and visualize data streams collected from billions of interconnected devices.
How Data Visualization Is Used for Business Decision-Making
In this context, data visualization will improve security levels, increase operational efficiency, help better understand various global phenomena and improve and adapt the intercontinental services offered.
Faster Responses
Simplicity
Easier Pattern Visualization
Team Involvement
Unify Interpretation
Introducing Data Visualization Techniques
Loading Libraries
Similarly, you can install or upgrade specific Python packages such as Matplotlib on Jupyter Notebooks, as shown in Listing 2-1. Once you load a library into your Python script, you can call its package functions and attributes.
Popular Libraries for Data Visualization in Python
Matplotlib
%matplotlib inline will lead to static graph images of your plot embedded in the notebook. There are many different plot formats generated by the Matplotlib package; some of these formats will be discussed in Chapter 7.
Seaborn
You can use the same data set, called Data, as in the previous example (see Figure 2-5).
Plotly
Importing and using the Plotly library: import plotly.graph_objs as go and import numpy as np. Running the Plotly Python script as shown in Listing 2-6 will open a web browser with the Plotly dynamic graph drawn as shown in Figure 2-12.
Geoplotlib
Pandas
Introducing Plots in Python
Data visualization is the process of interpreting data in pictorial or graphical form. Simplicity: Data visualization techniques give a complete picture of the parameters being measured and simplify the data by allowing decision makers to select the relevant data they need and drill down into details wherever necessary.
Data Collection Structures
Lists
Creating Lists
Accessing Values in Lists
Adding and Updating Lists
As shown in Listing 3-3, you can add a new element to the list using the append() method. You can also update an item in the list using the list name and item index.
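Both operations look like this in a minimal sketch; the list contents are hypothetical and do not reproduce Listing 3-3.

```python
colors = ["red", "green"]   # hypothetical list

colors.append("blue")       # add a new element at the end
colors[0] = "orange"        # update an item through the list name and index

print(colors)
```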
Deleting List Elements
Basic List Operations
Indexing, Slicing, and Matrices
Built-in List Functions and Methods
List Functions
List Methods
List Sorting and Traversing
Lists and Strings
You must specify the delimiter that the join() method will insert between the elements of the list to form a string.
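As a short sketch, the delimiter is the string on which join() is called. The lists here are hypothetical; note that join() expects string elements, so numbers are converted first.

```python
words = ["data", "analysis", "python"]   # hypothetical list of strings
joined = ", ".join(words)                # ", " is placed between the elements

numbers = [1, 2, 3]
dashed = "-".join(str(n) for n in numbers)  # convert non-strings before joining

print(joined)
print(dashed)
```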
Parsing Lines
Aliasing
Dictionaries
Dictionary values can be repeated multiple times and the values can be of any data type.
Creating Dictionaries
Updating and Accessing Values in Dictionaries
Listing 3-14 shows that you can create a function to calculate net salary after subtracting the 5 percent payroll tax value, iterating over all the dictionary elements.
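The idea can be sketched as follows. This is not Listing 3-14 itself; the function name, the employee names, and the salary figures are assumptions.

```python
TAX_RATE = 0.05  # 5 percent payroll tax, as stated in the text


def net_salary(gross):
    """Return the salary left after subtracting the payroll tax."""
    return round(gross - gross * TAX_RATE, 2)


salaries = {"Ali": 3000, "Sara": 4000}   # hypothetical employees and salaries

net = {}
for name, gross in salaries.items():     # iterate over all dictionary elements
    net[name] = net_salary(gross)

print(net)
```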
Deleting Dictionary Elements
Built-in Dictionary Functions
len(dict): Returns the total length of the dictionary, that is, the number of items in the dictionary.
Built-in Dictionary Methods
Tuples
Creating Tuples
Listing 3-21 shows how to sort a list of tuples in place and how to create another sorted list. By default, the built-in sort compares tuples element by element, so it sorts them based on the first element and then on the second element.
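Because a tuple itself is immutable, in-place sorting applies to a list that holds tuples. The sketch below assumes that setup; the marks are hypothetical values.

```python
marks = [(88, 65), (70, 90), (55, 88), (70, 40)]   # hypothetical list of tuples

ordered_copy = sorted(marks)   # returns a new sorted list, original untouched
marks.sort()                   # sorts the list itself, in place

# Tuples are compared by first element, then by second element on ties.
print(ordered_copy)
print(marks)
```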
Concatenating Tuples
Accessing Values in Tuples
You can access tuple elements with forward or backward (negative) indices; in addition, you can slice values from a tuple with indices. Listing 3-25 shows that you can slice forward, where MarksCIS[1:4] retrieves elements from element 1 to element 3, while MarksCIS[:] retrieves all elements in the tuple.
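A brief sketch of these accesses follows. The name MarksCIS comes from the text; its values are assumptions.

```python
MarksCIS = (70, 85, 55, 90, 60)   # hypothetical marks

first = MarksCIS[0]        # forward index, counting from the start
last = MarksCIS[-1]        # backward index, counting from the end
middle = MarksCIS[1:4]     # elements 1 through 3; the stop index is excluded
everything = MarksCIS[:]   # all elements

print(first, last, middle, everything)
```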
Basic Tuples Operations
Series
Creating a Series with Index
Creating a Series from a Dictionary
You can use the get method to access series values per index label, as shown in Listing 3-31.
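A minimal sketch of Series.get follows, with hypothetical labels and values rather than the data from Listing 3-31. A default can be supplied for labels that are not in the index.

```python
import pandas as pd

marks = pd.Series({"Ali": 80, "Sara": 92})   # hypothetical series built from a dict

sara = marks.get("Sara")          # value stored under the label "Sara"
missing = marks.get("Omar", -1)   # label not present, so the default is returned

print(sara, missing)
```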
Creating a Series from a Scalar Value
Vectorized Operations and Label Alignment with Series
Name Attribute
Data Frames
Creating Data Frames from a Dict of Series or Dicts
Creating Data Frames from a Dict of Ndarrays/Lists
Creating Data Frames from a Structured or Record Array
Creating Data Frames from a List of Dicts
Creating Data Frames from a Dict of Tuples
Selecting, Adding, and Deleting Data Frame Columns
Also, if you insert a series that does not have the same index as the data frame, it will be conformed to the index of the data frame, and labels missing from the series are filled with NaN. To delete a column, you can use the del statement or the pop method, as shown in Listing 3-41.
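The alignment and deletion behavior can be sketched as follows. The column names, index labels, and values are assumptions and do not reproduce Listing 3-41.

```python
import pandas as pd

frame = pd.DataFrame({"a": [1, 2, 3]}, index=["x", "y", "z"])  # hypothetical frame

# The inserted series has no label "y"; it is aligned to the frame's index,
# so row "y" receives NaN in the new column.
frame["b"] = pd.Series([10, 30], index=["x", "z"])
y_is_missing = bool(pd.isna(frame.loc["y", "b"]))

popped = frame.pop("a")   # removes column "a" and returns it as a series
del frame["b"]            # removes column "b" without returning it

print(y_is_missing, list(popped), list(frame.columns))
```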
Assigning New Columns in Method Chains
Indexing and Selecting Data Frames
Transposing a Data Frame
Data Frame Interoperability with Numpy Functions
Panels
Creating a Panel from a 3D Ndarray
Creating a Panel from a Dict of Data Frame Objects
Selecting, Adding, and Deleting Items
How to maintain a collection of data in different forms – How to create lists and how to manipulate the contents of a list – What a dictionary is and the purpose of creating a dic-. How to create a series from other data collection forms – How to create data frames from different data collections.
File I/O Processing and Regular Expressions
File I/O Processing
Data Input and Output