
Data Analysis and Visualization Using Python

Chris K

Academic year: 2024


For more information, refer to our Bulk Print and eBook Sales website at www.apress.com/bulk-sales. Any source code or other supplemental material referenced by the author in this book is available to readers on GitHub via the book's product page, located at www.apress.com.

About the Author

About the Technical Reviewers

He has conducted several faculty development training programs, as well as corporate training for software companies, across India.

Introduction

You will learn how to apply basic Python programming techniques for data cleaning and manipulation. You will also learn how to implement Python techniques to explore and analyze a series of data, create a series, …

Introduction to Data Science with Python

The Stages of Data Science

Data science algorithms are used in many products and fields. Internet search engines use them to deliver the best results for search queries in less time. Recommendation systems use them to generate recommendations from a user's experience. They also appear in digital advertising, education systems, healthcare systems, and so on. Data scientists should have in-depth knowledge of programming tools such as Python, R, SAS, and Hadoop.

Why Python?

Python is now at version 3.x. The 3.x line began with Python 3.0, released in December 2008 after a long period of testing. Many of its key features have also been backported to the backward-compatible Python 2.6 and 2.7.

Basic Features of Python

Extensible: Python is easily extended by adding new modules implemented in a compiled language such as C or C++.
Large Standard Library: Python comes with a large standard library that supports many common programming tasks, such as connecting to web servers, searching text with regular expressions, and reading and editing files.

Python Learning Resources

See http://www.lulu.com/shop/ossama-embarak/agile-python-programming-applied-for-iedereen/paperback/product-23694020.html. Python for You and Me is an accessible book with sections on Python syntax and key language constructs.

Python Environment and Editors

These resources are useful not only for Python beginners but also for any developer who wants to build a strong professional career in software. There is a Udacity course by one of the creators of Reddit that shows how to use Python to build a blog.

Portable Python Editors (No Installation Required)

WinPython: This is a free Python distribution for the Windows platform; it contains pre-built packages for ScientificPython.
Anaconda: This is a completely free, enterprise-ready Python distribution for large-scale data processing, predictive analytics, and scientific computing.

Azure Notebooks

You can create folders and subfolders by selecting +New from the ribbon and then selecting Folder as the item type, as shown in Figure 1-3. Open the created Hello World script by clicking it, and start writing your Python code, as shown in Figure 1-4.

Offline and Desktop Python Editors

The Basics of Python Programming

Basic Syntax

Reserved words (keywords) are words that the Python language reserves for its own syntax. They cannot be redefined or used by the user as variable, function, or class names.
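To see which words are reserved, you can query the standard library's keyword module, which lists the keywords of the running interpreter. The sketch below is illustrative only; the name total is an arbitrary example of an ordinary identifier.

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter
print(keyword.kwlist[:5])

# iskeyword() reports whether a name is reserved
print(keyword.iskeyword("for"))    # True
print(keyword.iskeyword("total"))  # False

# Using a reserved word as a variable name is rejected at compile time
rejected = False
try:
    compile("for = 5", "<example>", "exec")
except SyntaxError:
    rejected = True
print(rejected)                    # True
```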

Lines and Indentation

Multiline Statements

Quotation Marks in Python

Multiple Statements on a Single Line

Read Data from Users

Declaring Variables and Assigning Values

Multiple Assigns

Variable Names and Keywords

Statements and Expressions

Basic Operators in Python

Arithmetic Operators

Relational Operators

Assign Operators

Logical Operators

This differs from other languages such as C#, which is a compiled language: the entire program must be processed before it can run.

Python Comments

Conversion Types

In the previous example, %.2f is replaced by the value 172.16, shown with two digits after the decimal point. %2d, by contrast, displays an integer right-aligned in a field at least two characters wide. You can also display values read directly from a dictionary. %(name)s takes the value stored under the name key as a string, and %(height).2f takes the value stored under the height key as a float with two digits after the decimal point.
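The following sketch shows the %-style placeholders described above side by side. The height value 172.16 comes from the text; the name "Ali" and the age 7 are hypothetical sample values.

```python
# Hypothetical sample values, used only for illustration
height = 172.16
age = 7

# %.2f -> float with two digits after the decimal point
# %2d  -> integer right-aligned in a field at least two characters wide
summary = "Height: %.2f, age: %2d" % (height, age)
print(summary)        # Height: 172.16, age:  7

# %(key)s and %(key).2f read values from a dictionary by key name
person = {"name": "Ali", "height": 172.16}
described = "%(name)s is %(height).2f cm tall" % person
print(described)      # Ali is 172.16 cm tall
```

Note the two spaces before 7: %2d pads a one-digit number to two characters.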

The Replacement Field, {}

You can also combine values from lists, dictionaries, object attributes, or even a single variable in one format string. For example, you can create a class called A with a single variable, x, that is assigned the value 9, and then read that value inside a replacement field.
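Here is a minimal sketch of mixing a list, an object attribute, and a dictionary in one str.format() call. The class A with x = 9 follows the text; the list and dictionary contents are hypothetical. Inside square brackets, format() treats digits as an integer index and any other text as a string key, written without quotes.

```python
class A:
    # Class with a single variable x assigned the value 9
    x = 9

marks = [70, 85, 91]          # hypothetical list
exam = {"test": "midterm"}    # hypothetical dictionary

# {0[1]}    -> item at index 1 of positional argument 0 (the list)
# {1.x}     -> attribute x of positional argument 1 (an A instance)
# {2[test]} -> value under key "test" of positional argument 2 (the dict)
text = "Mark: {0[1]}, x: {1.x}, exam: {2[test]}".format(marks, A(), exam)
print(text)   # Mark: 85, x: 9, exam: midterm
```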

The Date and Time Module

{2[test]} refers to the positional argument at index 2 passed to format() and reads the value stored under the key test in that dictionary.

Time Module Methods

Python Calendar Module

Fundamental Python Programming Techniques

Selection Statements

In Listing 1-14, the if statement condition is false, and therefore the outermost print statement is the only statement executed. The nested if statement is an if statement that is the target of another if statement.
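To illustrate nesting, the sketch below places one if statement as the target of another. The function name classify and the pass marks (50 and 85) are hypothetical choices for illustration.

```python
def classify(mark):
    # Outer if: the nested if is evaluated only when this condition is true
    if mark >= 50:
        # Nested if: the target of the outer if statement
        if mark >= 85:
            return "pass with distinction"
        return "pass"
    return "fail"

print(classify(40))   # fail
print(classify(60))   # pass
print(classify(90))   # pass with distinction
```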

Iteration Statements

For example, if you specify range(1, 4), the loop starts at 1 and ends with 3, which is n-1; range(4) on its own starts at 0. The continue statement causes the loop to skip the rest of its body and immediately retest the condition before repeating.
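The sketch below checks both range() forms and shows continue inside a while loop. There, continue jumps straight back to the loop condition. Skipping even numbers is an arbitrary illustrative choice.

```python
# range(n) starts at 0 and stops at n - 1
print(list(range(4)))      # [0, 1, 2, 3]
# range(start, n) starts at start and still stops at n - 1
print(list(range(1, 4)))   # [1, 2, 3]

# continue skips the rest of the loop body for the current iteration
odd_numbers = []
n = 0
while n < 6:
    n += 1
    if n % 2 == 0:
        continue           # jump straight back to the while condition
    odd_numbers.append(n)
print(odd_numbers)         # [1, 3, 5]
```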

The Use of Break, Continue, and Pass Statements

In the previous examples, the first and second loops used the for statement over a sequence. You can also iterate over a list of letters, as shown in Listing 1-20, or over the word Python3 to display each of its letters.
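As a compact sketch, the loop below walks over the letters of "Python3" and uses all three statements. The rules (stop at a digit, skip vowels, do nothing special for capitals) and the function name scan are hypothetical.

```python
def scan(word):
    kept = []
    for ch in word:
        if ch.isdigit():
            break          # leave the loop entirely at the first digit
        if ch in "aeiou":
            continue       # skip vowels and move to the next character
        if ch.isupper():
            pass           # placeholder: no special handling yet
        kept.append(ch)
    return kept

print(scan("Python3"))     # ['P', 'y', 't', 'h', 'n']
```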

String Processing

String Special Operators

String Slicing and Concatenation

String Conversions and Formatting Symbols

Loop Through String

You can also use iterations to count letters in a word or to count words in lines, as shown in Listing 1-25.
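The two counting tasks can be sketched with plain loops, as below. The function names and sample strings are hypothetical. In practice, str.count() and len(line.split()) do the same work.

```python
def count_letter(word, letter):
    # Walk through every character and tally matches
    count = 0
    for ch in word:
        if ch == letter:
            count += 1
    return count

def count_words(lines):
    # split() with no argument breaks a line on runs of whitespace
    total = 0
    for line in lines:
        total += len(line.split())
    return total

print(count_letter("banana", "a"))                        # 3
print(count_words(["Data analysis", "with Python is fun"]))  # 6
```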

Python String Functions and Methods

Listing 1-26 shows how to use built-in methods to remove whitespace from a string, count specific letters within a string, check whether the string contains another string, and so on.
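The sketch below applies a few of these built-in str methods to a hypothetical sample string. It is not a reproduction of Listing 1-26.

```python
text = "  Data analysis with Python  "

cleaned = text.strip()                      # remove leading/trailing whitespace
print(repr(cleaned))                        # 'Data analysis with Python'
print(cleaned.count("a"))                   # 4: occurrences of "a"
print("Python" in cleaned)                  # True: substring test
print(cleaned.startswith("Data"))           # True
print(cleaned.upper())                      # DATA ANALYSIS WITH PYTHON
print(cleaned.replace("Python", "pandas"))  # Data analysis with pandas
```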

The in Operator

Parsing and Extracting Strings

Tabular data can easily be represented in Python as a list of tuples, where each tuple holds one record of the dataset. Although simple to create, this kind of representation typically does not support important manipulations of tabular data, such as efficient column selection, matrix mathematics, or spreadsheet-style operations.
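To make the limitation concrete, the sketch below stores hypothetical employee records as a list of tuples. Selecting a "column" or filtering rows then has to be written by hand.

```python
# Each tuple is one record: (name, department, salary); hypothetical data
records = [
    ("Ali", "IT", 5000),
    ("Sara", "HR", 4200),
    ("Omar", "IT", 4700),
]

# Selecting a "column" means walking every record by position
salaries = [row[2] for row in records]
print(salaries)        # [5000, 4200, 4700]

# A spreadsheet-style filter also has to be written by hand
it_names = [name for name, dept, _ in records if dept == "IT"]
print(it_names)        # ['Ali', 'Omar']
```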

Python Pandas Data Science Library

The Pandas library also provides rich data structures and functions designed to make working with structured data fast, easy, and expressive. Because the standard Python distribution does not include the Pandas module, you may need to install the package (for example, with pip install pandas) before you can import it.
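As a rough comparison with a plain list of tuples, the sketch below loads hypothetical employee records into a pandas DataFrame. Column selection, filtering, and aggregation then become one-liners. It assumes pandas is installed.

```python
import pandas as pd

# Hypothetical employee records with labeled columns
df = pd.DataFrame(
    [("Ali", "IT", 5000), ("Sara", "HR", 4200), ("Omar", "IT", 4700)],
    columns=["name", "department", "salary"],
)

print(df["salary"].tolist())                   # [5000, 4200, 4700]
it_staff = df.loc[df["department"] == "IT", "name"].tolist()
print(it_staff)                                # ['Ali', 'Omar']
print(df["salary"].sum())                      # 13900
```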

A Pandas Series

A Pandas Data Frame

You can retrieve the rows of a data frame from position 1 through to the last row.
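One way to do this, sketched here with hypothetical data, is position-based slicing with DataFrame.iloc. The original listing may use a different form.

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ali", "Sara", "Omar"], "salary": [5000, 4200, 4700]})

# iloc slices by position: start at row position 1 and run to the last row
rest = df.iloc[1:]
print(rest)
print(rest["name"].tolist())   # ['Sara', 'Omar']
```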

A Pandas Panel

Python Lambdas and the Numpy Library

The map() Function

The filter() Function

The reduce() Function
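The three functions are commonly combined with lambda expressions, as in the sketch below on a hypothetical list of numbers. In Python 3, reduce() is no longer a built-in and must be imported from functools.

```python
from functools import reduce

numbers = [1, 2, 3, 4, 5]

squares = list(map(lambda x: x ** 2, numbers))       # apply to every item
evens = list(filter(lambda x: x % 2 == 0, numbers))  # keep items where True
total = reduce(lambda acc, x: acc + x, numbers)      # fold into one value

print(squares)   # [1, 4, 9, 16, 25]
print(evens)     # [2, 4]
print(total)     # 15
```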

Python Numpy Package

Data Cleaning and Manipulation Techniques

Abstraction of the Series and Data Frame

The data frame is the main Pandas structure for collecting and processing data in Python. A data frame is a two-dimensional, labeled array object, as shown in Figure 1-8: it has an index and several columns of content, each with its own label.
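The parts of a data frame can be inspected directly, as in this sketch with hypothetical data. It shows the row index, the column labels, and the fact that each column is itself a Series.

```python
import pandas as pd

df = pd.DataFrame(
    {"height": [172.16, 165.5, 180.2], "weight": [70, 58, 85]},
    index=["Ali", "Sara", "Omar"],
)

print(df.shape)              # (3, 2): rows, columns
print(list(df.index))        # ['Ali', 'Sara', 'Omar']: row labels
print(list(df.columns))      # ['height', 'weight']: column labels
print(type(df["height"]))    # each column is a pandas Series
```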

Running Basic Inferential Analyses

In other words, the variance is the average of the squared differences between each value in a data set and the mean of that data set. It indicates how widely the values are spread around the mean: the larger the variance, the more dispersed the data.
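The definition translates directly into code. The sketch below computes the population variance by hand on a hypothetical data set and compares it with statistics.pvariance from the standard library.

```python
import statistics

def population_variance(values):
    # Mean of the data set
    mean = sum(values) / len(values)
    # Average of the squared differences from the mean
    return sum((x - mean) ** 2 for x in values) / len(values)

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(population_variance(data))    # 4.0
print(statistics.pvariance(data))   # same value from the standard library
```

Note that statistics.variance() and pandas Series.var() divide by n - 1 by default (the sample variance), so they return a slightly larger value for the same data.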

Summary

Exercises and Answers

Write a Python program that prompts users to enter a Celsius value and then converts it to Fahrenheit, where T(°F) = T(°C) × 1.8 + 32.
Write a program that prompts users to enter the speed of a car and then calculates the fines according to …
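A possible answer to the first exercise is sketched below. The function name is a hypothetical choice. The conversion is kept separate from user input so that it can be tested on its own.

```python
def celsius_to_fahrenheit(celsius):
    # T(°F) = T(°C) × 1.8 + 32
    return celsius * 1.8 + 32

print(celsius_to_fahrenheit(100))   # 212.0
print(celsius_to_fahrenheit(0))     # 32.0
```

For the interactive version, read the value with float(input("Enter a Celsius value: ")) and pass it to the function.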

The Importance of Data Visualization in

Shifting from Input to Output

Why Is Data Visualization Important?

Figure out the patterns, trends, and correlations in the data being analyzed, to determine where organizations need to improve their operational processes and thereby grow their business.
Provide a more complete view of the data being analyzed by organizing and presenting large amounts of data intuitively.

Why Do Modern Businesses Need Data Visualization?

It allows managers to understand the correlations between business conditions and business performance. It helps companies discover the gray areas of the business and make the right decisions for improvement.

The Future of Data Visualization

Data visualization helps managers understand customer behavior and interests to retain customers and market share. Data visualization will be used extensively to analyze and visualize data streams collected from billions of interconnected devices.

How Data Visualization Is Used for Business Decision-Making

In this context, data visualization will improve security levels, increase operational efficiency, help organizations better understand various global phenomena, and improve and adapt the intercontinental services they offer.

Faster Responses

Simplicity

Easier Pattern Visualization

Team Involvement

Unify Interpretation

Introducing Data Visualization Techniques

Loading Libraries

Similarly, you can install or upgrade specific Python packages, such as Matplotlib, from a Jupyter notebook, as shown in Listing 2-1. Once you load a library into your Python script, you can call its functions and attributes.

Popular Libraries for Data Visualization in Python

Matplotlib

%matplotlib inline embeds static images of your plots in the notebook. The Matplotlib package can generate many different plot formats; some of them are discussed in Chapter 7.
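%matplotlib inline is an IPython/Jupyter magic command, so it is unavailable in a plain Python script. The sketch below produces the same kind of static image outside a notebook. It selects the non-interactive Agg backend and saves a simple line plot of hypothetical data as PNG bytes in memory. It assumes Matplotlib is installed.

```python
import io

import matplotlib
matplotlib.use("Agg")            # non-interactive backend: render, no window
import matplotlib.pyplot as plt

x = [1, 2, 3, 4]
y = [10, 20, 15, 25]             # hypothetical values

fig, ax = plt.subplots()
ax.plot(x, y, marker="o")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("A simple line plot")

buffer = io.BytesIO()
fig.savefig(buffer, format="png")  # static PNG image of the plot
plt.close(fig)
print(len(buffer.getvalue()) > 0)  # True
```

To write the image to disk instead, pass a filename such as "plot.png" to savefig().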

Seaborn

You can use the same data set, called Data, as in the previous example (see Figure 2-5).

Plotly

To use Plotly, import it with import plotly.graph_objs as go, and import NumPy with import numpy as np. Running the Plotly Python script shown in Listing 2-6 opens a web browser with the dynamic Plotly graph drawn, as shown in Figure 2-12.

Geoplotlib

Pandas

Introducing Plots in Python

Data visualization is the process of representing data in pictorial or graphical form.
Simplicity: Data visualization techniques give a complete picture of the parameters being measured. They simplify the data by letting decision makers select the relevant data they need and drill down into details wherever necessary.

Data Collection Structures

Lists

Creating Lists

Accessing Values in Lists

Adding and Updating Lists

As shown in Listing 3-3, you can add a new element to the list using the append() method. You can also update an item in the list using the list name and item index.
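Here is a minimal sketch of both operations on a hypothetical list of marks (not the data from Listing 3-3).

```python
marks = [70, 85, 91]

marks.append(60)     # add a new element at the end of the list
print(marks)         # [70, 85, 91, 60]

marks[1] = 88        # update the element at index 1
print(marks)         # [70, 88, 91, 60]
```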

Deleting List Elements

Basic List Operations

Indexing, Slicing, and Matrices

Built-in List Functions and Methods

List Functions

List Methods

List Sorting and Traversing

Lists and Strings

You call the join() method on the delimiter string, and join() inserts that delimiter between the elements of the list to form a single string. All list elements must themselves be strings.
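The sketch below shows join() with two different delimiters, plus split() as the inverse operation. The words and numbers are hypothetical. Non-string elements have to be converted with str() first.

```python
words = ["Data", "analysis", "with", "Python"]

sentence = " ".join(words)     # the delimiter string calls join()
print(sentence)                # Data analysis with Python

csv_line = ",".join(words)
print(csv_line)                # Data,analysis,with,Python
print(csv_line.split(","))     # ['Data', 'analysis', 'with', 'Python']

# Non-string elements must be converted before joining
date_text = "-".join(str(n) for n in [2024, 1, 15])
print(date_text)               # 2024-1-15
```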

Parsing Lines

Aliasing

Dictionaries

Dictionary values can be repeated and can be of any data type. Keys, by contrast, must be unique and of an immutable (hashable) type.
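The sketch below, with hypothetical grades, contrasts values and keys. Repeated and mixed-type values are fine. Assigning to an existing key replaces its value rather than adding a duplicate. A mutable list cannot be used as a key.

```python
grades = {"Ali": "A", "Sara": "A", "Omar": [90, 95]}  # repeated, mixed values
print(list(grades.values()))     # ['A', 'A', [90, 95]]

grades["Ali"] = "B"              # existing key: the value is replaced
print(len(grades))               # 3: keys stay unique

key_rejected = False
try:
    bad = {["list", "key"]: 1}   # a list is mutable, so it is unhashable
except TypeError:
    key_rejected = True
print(key_rejected)              # True
```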

Creating Dictionaries

Updating and Accessing Values in Dictionaries

Listing 3-14 shows that you can create a function that iterates over all the dictionary elements and calculates each net salary after subtracting the 5 percent payroll tax.
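A possible shape for such a function is sketched below. The function name, the staff dictionary, and the salaries are hypothetical; only the 5 percent rate comes from the text.

```python
TAX_PERCENT = 5  # payroll tax percentage from the text

def net_salaries(salaries):
    # Iterate over every name/salary pair and subtract the payroll tax
    result = {}
    for name, salary in salaries.items():
        result[name] = salary * (100 - TAX_PERCENT) / 100
    return result

staff = {"Ali": 1000, "Sara": 2000}   # hypothetical salaries
print(net_salaries(staff))           # {'Ali': 950.0, 'Sara': 1900.0}
```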

Deleting Dictionary Elements

Built-in Dictionary Functions

len(dict): Returns the total length of the dictionary, that is, the number of items in the dictionary.

Built-in Dictionary Methods

Tuples

Creating Tuples

Listing 3-21 shows how to sort a list of tuples in place and how to create another, sorted copy; a tuple itself is immutable and cannot be sorted in place. Because the elements are tuples, the sort compares them element by element. It orders them by the first element first and then by the second element to break ties.
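The sketch below uses a hypothetical list of (name, mark) tuples to show both approaches. sorted() returns a new list, and list.sort() reorders the list in place. Ties on the first element are broken by the second.

```python
# A list of (name, mark) tuples; the tuples themselves are immutable
results = [("Sara", 85), ("Ali", 91), ("Ali", 78), ("Omar", 85)]

ordered = sorted(results)    # new sorted list; results is unchanged
print(ordered)   # [('Ali', 78), ('Ali', 91), ('Omar', 85), ('Sara', 85)]

results.sort()               # sorts the list in place and returns None
print(results == ordered)    # True

# sorted() accepts a tuple too, but always returns a list
print(tuple(sorted((3, 1, 2))))   # (1, 2, 3)
```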

Concatenating Tuples

Accessing Values in Tuples

You can access tuple elements from the front or from the back (using negative indices), and you can slice values from a tuple with index ranges. Listing 3-25 shows that you can slice forward: MarksCIS[1:4] retrieves the elements at indices 1 through 3, while MarksCIS[:] retrieves all elements in the tuple.
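The sketch below reuses the name MarksCIS from the text with hypothetical marks. It shows forward and backward indexing and the two slices described above.

```python
MarksCIS = (70, 85, 55, 95, 60)   # hypothetical marks

print(MarksCIS[0])      # 70: first element (forward index)
print(MarksCIS[-1])     # 60: last element (backward index)
print(MarksCIS[1:4])    # (85, 55, 95): indices 1 through 3
print(MarksCIS[:])      # (70, 85, 55, 95, 60): all elements
print(MarksCIS[::-1])   # (60, 95, 55, 85, 70): reversed
```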

Basic Tuples Operations

Series

Creating a Series with index

Creating a Series from a Dictionary

You can use the get() method to access series values by index label, as shown in Listing 3-31.
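As a sketch with hypothetical labels and values, Series.get() looks up a value by label. Like dict.get(), it returns None (or a supplied default) for a missing label instead of raising a KeyError.

```python
import pandas as pd

# A Series built from a dictionary uses the keys as index labels
population = pd.Series({"a": 10, "b": 20, "c": 30})

print(population.get("a"))      # 10
print(population.get("z"))      # None: missing label, no KeyError
print(population.get("z", 0))   # 0: explicit default
```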

Creating a Series from a Scalar Value

Vectorized Operations and Label Alignment with Series

Name Attribute

Data Frames

Creating Data Frames from a Dict of Series or Dicts

Creating Data Frames from a Dict of Ndarrays/Lists

Creating Data Frames from a Structured or Record Array

Creating Data Frames from a List of Dicts

Creating Data Frames from a Dict of Tuples

Selecting, Adding, and Deleting Data Frame Columns

Also, if you insert a series that does not have the same index as the data frame, the series is aligned to the data frame's index, and labels missing from the series become NaN. To delete a column, you can use the del statement or the pop() method, as shown in Listing 3-41.
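The sketch below illustrates both behaviors with hypothetical data. A Series covering only some row labels is aligned on insertion, leaving NaN for the others. A column can then be removed with del or with pop(), which also returns the removed column.

```python
import pandas as pd

df = pd.DataFrame({"mark": [70, 85, 91]}, index=["Ali", "Sara", "Omar"])

# The new Series is aligned on the data frame's index; "Omar" gets NaN
df["bonus"] = pd.Series({"Ali": 5, "Sara": 3})
print(df)
bonus_missing = df["bonus"].isna().tolist()
print(bonus_missing)           # [False, False, True]

del df["bonus"]                # remove a column with the del statement
grade = df.pop("mark")         # pop() removes the column and returns it
print(grade.tolist())          # [70, 85, 91]
print(list(df.columns))        # []
```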

Assigning New Columns in Method Chains

Indexing and Selecting Data Frames

Transposing a Data Frame

Data Frame Interoperability with Numpy Functions

Panels

Creating a Panel from a 3D Ndarray

Creating a Panel from a Dict of Data Frame Objects

Selecting, Adding, and Deleting Items

– How to maintain a collection of data in different forms
– How to create lists and how to manipulate the contents of a list
– What a dictionary is and the purpose of creating a dictionary
– How to create a series from other data collection forms
– How to create data frames from different data collections

File I/O Processing and Regular Expressions

File I/O Processing

Data Input and Output
