
Relational Databases: MySQL and PostgreSQL

From the book Data Wrangling with Python (pages 159-162)

Relational databases are great for data coming from a variety of sources with varying levels of interconnectedness. Relational data exemplifies its name: if your data has connections similar to a family tree, a relational database like MySQL will likely work well for you.

Relational data usually relies on unique identifiers to match records across datasets. In SQL, we normally call them IDs. Other datasets can store these IDs to find and match connections. From these connected datasets, we can make what we call joins, which let us access connected data from many different datasets at once.

Let’s look at an example.

I have a really awesome friend. Her name is Meghan. She has black hair and works at The New York Times. In her spare time, she likes to go dancing, cook food, and teach people how to code. If I had a database of my friends and used SQL to represent their attributes, I might break it down like so:

**friend_table:
friend_id, friend_name, friend_date_of_birth, friend_current_location, friend_birthplace, friend_occupation_id

**friend_occupation_table:
friend_occupation_id, friend_occupation_name, friend_occupation_location

**friends_and_hobbies_table:
friend_id, hobby_id

**hobby_details_table:
hobby_id, hobby_name, hobby_level_of_awesome

In my database of friends, each of these sections (marked with **) would be a table. In relational databases, a table usually holds information about a specific topic or object.

Each piece of information a table holds is called a field. In this case, the friend_id field holds a unique ID for each friend in my friend_table.
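The schema above can be sketched as a minimal example. It uses Python’s built-in sqlite3 module as a stand-in for MySQL or PostgreSQL so it runs without a database server. The column types and the REFERENCES clauses are assumptions, because the schema lists only field names.

```python
import sqlite3

# In-memory SQLite database: a stand-in for MySQL/PostgreSQL in this sketch.
conn = sqlite3.connect(":memory:")

conn.executescript("""
CREATE TABLE friend_occupation_table (
    friend_occupation_id       INTEGER PRIMARY KEY,
    friend_occupation_name     TEXT,
    friend_occupation_location TEXT
);

CREATE TABLE friend_table (
    friend_id               INTEGER PRIMARY KEY,  -- unique ID for each friend
    friend_name             TEXT,
    friend_date_of_birth    TEXT,
    friend_current_location TEXT,
    friend_birthplace       TEXT,
    friend_occupation_id    INTEGER REFERENCES friend_occupation_table (friend_occupation_id)
);

CREATE TABLE hobby_details_table (
    hobby_id               INTEGER PRIMARY KEY,
    hobby_name             TEXT,
    hobby_level_of_awesome INTEGER  -- assumed to be a numeric score
);

-- Links friends to hobbies: one row per (friend, hobby) pair.
CREATE TABLE friends_and_hobbies_table (
    friend_id INTEGER REFERENCES friend_table (friend_id),
    hobby_id  INTEGER REFERENCES hobby_details_table (hobby_id),
    PRIMARY KEY (friend_id, hobby_id)
);
""")

# List every table (sorted by name) and its fields (columns).
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
for table in tables:
    # Table names come from sqlite_master, so formatting them in is safe here.
    fields = [col[1] for col in conn.execute(f"PRAGMA table_info({table})")]
    print(table, fields)
```

The PRIMARY KEY on friend_id is what guarantees that each friend’s ID is unique, and the other tables store that ID to point back at a friend.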

With my database, I can ask: What are Meghan’s hobbies? To find out, I would say to the database, “Hey, I’m looking for my friend Meghan. She lives in New York and here is her birthday; can you tell me her ID?” My SQL database will respond to this query with her friend_id. I can then ask my friends_and_hobbies_table (which matches hobby IDs with friend IDs) which hobbies match up with this friend ID, and it will respond with a list of three hobby IDs.

Because these IDs are numbers, I want to learn more about what they mean. I ask the hobby_details_table, “Can you tell me more about these hobby IDs?” and it says, “Sure! One is dancing, one is cooking food, and one is teaching people how to code.”

Aha! I have solved the riddle, using just an initial friend description.
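The three-question conversation above maps onto three separate queries. This sketch again uses sqlite3 as a stand-in. All rows are made-up placeholders, including the birthday and the hobby ID values, and are not data from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friend_table (
    friend_id INTEGER PRIMARY KEY, friend_name TEXT,
    friend_date_of_birth TEXT, friend_current_location TEXT);
CREATE TABLE friends_and_hobbies_table (friend_id INTEGER, hobby_id INTEGER);
CREATE TABLE hobby_details_table (hobby_id INTEGER PRIMARY KEY, hobby_name TEXT);

-- Hypothetical sample rows; the birthday is a placeholder.
INSERT INTO friend_table VALUES (1, 'Meghan', '1990-01-01', 'New York');
INSERT INTO hobby_details_table VALUES
    (10, 'dancing'), (11, 'cooking food'), (12, 'teaching people how to code');
INSERT INTO friends_and_hobbies_table VALUES (1, 10), (1, 11), (1, 12);
""")

# Step 1: describe the friend and get back her friend_id.
(friend_id,) = conn.execute(
    "SELECT friend_id FROM friend_table "
    "WHERE friend_name = ? AND friend_current_location = ? AND friend_date_of_birth = ?",
    ("Meghan", "New York", "1990-01-01"),
).fetchone()

# Step 2: ask the linking table which hobby IDs belong to that friend_id.
hobby_ids = [row[0] for row in conn.execute(
    "SELECT hobby_id FROM friends_and_hobbies_table WHERE friend_id = ?",
    (friend_id,))]

# Step 3: translate the hobby IDs into human-readable names.
placeholders = ", ".join("?" for _ in hobby_ids)
hobbies = [row[0] for row in conn.execute(
    f"SELECT hobby_name FROM hobby_details_table "
    f"WHERE hobby_id IN ({placeholders}) ORDER BY hobby_id",
    hobby_ids)]
print(hobbies)  # ['dancing', 'cooking food', 'teaching people how to code']
```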

Setting up and getting data into a relational database can involve many steps, but if your datasets are complex with many different relationships, it shouldn’t take more than a few steps to figure out how to join them and get the information you desire.
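Once the relationships are mapped, the same question can usually be asked in a single query by joining the tables. Here is a minimal sqlite3 sketch with hypothetical rows. A second friend, Alex, is invented so that the WHERE clause has something to filter out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friend_table (
    friend_id INTEGER PRIMARY KEY, friend_name TEXT, friend_current_location TEXT);
CREATE TABLE friends_and_hobbies_table (friend_id INTEGER, hobby_id INTEGER);
CREATE TABLE hobby_details_table (hobby_id INTEGER PRIMARY KEY, hobby_name TEXT);

-- Hypothetical sample rows.
INSERT INTO friend_table VALUES (1, 'Meghan', 'New York'), (2, 'Alex', 'Boston');
INSERT INTO hobby_details_table VALUES
    (10, 'dancing'), (11, 'cooking food'),
    (12, 'teaching people how to code'), (13, 'rock climbing');
INSERT INTO friends_and_hobbies_table VALUES (1, 10), (1, 11), (1, 12), (2, 11), (2, 13);
""")

# One JOIN chain replaces the three separate lookups:
# friend_table -> friends_and_hobbies_table -> hobby_details_table.
query = """
SELECT h.hobby_name
FROM friend_table AS f
JOIN friends_and_hobbies_table AS fh ON fh.friend_id = f.friend_id
JOIN hobby_details_table AS h ON h.hobby_id = fh.hobby_id
WHERE f.friend_name = ? AND f.friend_current_location = ?
ORDER BY h.hobby_id
"""
hobbies = [row[0] for row in conn.execute(query, ("Meghan", "New York"))]
print(hobbies)  # ['dancing', 'cooking food', 'teaching people how to code']
```

The ON clauses spell out the relationships that were mapped in advance. The database then follows them in one pass instead of requiring three round trips.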

When building relational databases, spend time mapping out the relations and their attributes, similar to what we did with the friend database. What are the different types of data, and how can they be mapped to one another?

When designing a relational database schema, we decide how to match data by thinking about how we will most often use it. You want the queries you ask the database to be easy to answer. Because we thought we might use occupation to help identify a friend, we put the friend_occupation_id in the friend_table.
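Storing friend_occupation_id in friend_table is what makes an occupation-based lookup a single join. In this sqlite3 sketch the rows are hypothetical. Treating friend_occupation_name as the employer’s name is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friend_occupation_table (
    friend_occupation_id INTEGER PRIMARY KEY,
    friend_occupation_name TEXT,
    friend_occupation_location TEXT);
CREATE TABLE friend_table (
    friend_id INTEGER PRIMARY KEY,
    friend_name TEXT,
    friend_occupation_id INTEGER REFERENCES friend_occupation_table (friend_occupation_id));

-- Hypothetical sample rows.
INSERT INTO friend_occupation_table VALUES
    (1, 'The New York Times', 'New York'),
    (2, 'Example Bakery', 'Chicago');
INSERT INTO friend_table VALUES (1, 'Meghan', 1), (2, 'Sam', 2);
""")

# Because friend_table carries friend_occupation_id, one join answers
# "which of my friends work at this place?"
rows = conn.execute("""
SELECT f.friend_name
FROM friend_table AS f
JOIN friend_occupation_table AS o
  ON o.friend_occupation_id = f.friend_occupation_id
WHERE o.friend_occupation_name = ?
ORDER BY f.friend_id
""", ("The New York Times",)).fetchall()
names = [r[0] for r in rows]
print(names)  # ['Meghan']
```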

Another thing to note is that there are several different kinds of relationships. For example, many of my friends can have cooking as a hobby, and each friend can have many hobbies. This is what we call a many-to-many relationship, and it is why the friends_and_hobbies_table exists: it stores one row for each friend-hobby pair. If we were to add a table such as pets, that would add a different kind of relationship: a many-to-one. This is because a few of my friends have more than one pet, but each pet belongs to only one friend. I could look up all the pets of a friend by using their friend_id.
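Both relationship types can be sketched with sqlite3. The pet_table name and its columns are hypothetical, because the text only says “a table such as pets”. All the rows are hypothetical as well. The many-to-many side goes through the linking table, while the many-to-one side needs only a friend_id column on each pet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE friend_table (friend_id INTEGER PRIMARY KEY, friend_name TEXT);
CREATE TABLE hobby_details_table (hobby_id INTEGER PRIMARY KEY, hobby_name TEXT);

-- Many-to-many: a linking table with one row per (friend, hobby) pair.
CREATE TABLE friends_and_hobbies_table (
    friend_id INTEGER REFERENCES friend_table (friend_id),
    hobby_id  INTEGER REFERENCES hobby_details_table (hobby_id),
    PRIMARY KEY (friend_id, hobby_id));

-- Many-to-one: each pet row points at exactly one owner via friend_id.
CREATE TABLE pet_table (
    pet_id    INTEGER PRIMARY KEY,
    pet_name  TEXT,
    friend_id INTEGER NOT NULL REFERENCES friend_table (friend_id));

-- Hypothetical sample rows.
INSERT INTO friend_table VALUES (1, 'Meghan'), (2, 'Alex'), (3, 'Priya');
INSERT INTO hobby_details_table VALUES (1, 'cooking food'), (2, 'dancing');
INSERT INTO friends_and_hobbies_table VALUES (1, 1), (1, 2), (2, 1), (3, 2);
INSERT INTO pet_table VALUES (1, 'Biscuit', 2), (2, 'Pepper', 2), (3, 'Mochi', 3);
""")

# Many friends share one hobby, so the lookup goes through the linking table.
cooks = [r[0] for r in conn.execute("""
    SELECT f.friend_name
    FROM friend_table AS f
    JOIN friends_and_hobbies_table AS fh ON fh.friend_id = f.friend_id
    JOIN hobby_details_table AS h ON h.hobby_id = fh.hobby_id
    WHERE h.hobby_name = 'cooking food'
    ORDER BY f.friend_id""")]

# Each pet has a single owner, so a friend's pets need only a WHERE on friend_id.
alex_pets = [r[0] for r in conn.execute(
    "SELECT pet_name FROM pet_table WHERE friend_id = ? ORDER BY pet_id", (2,))]

print(cooks)      # ['Meghan', 'Alex']
print(alex_pets)  # ['Biscuit', 'Pepper']
```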

If learning more about SQL and relational databases interests you, we recommend taking a longer look at SQL. Learn SQL The Hard Way and SQLZOO are great first places to start. There are some slight differences in syntax between PostgreSQL and MySQL, but they both follow the same basics, and learning one over the other is a matter of personal choice.

MySQL and Python

If you are familiar with (or learning) MySQL and you’d like to use a MySQL database, there are Python bindings to easily connect. You will need to perform two steps. First, you must install a MySQL driver. Then, you should use Python to send authentication information (user, password, host, database name). There is a great Stack Overflow write-up covering both.
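Here is a minimal sketch of the second step. It assumes MySQL Connector/Python (installed with pip install mysql-connector-python) as the driver from the first step. The environment variable names and the fallback values are hypothetical. The driver is imported inside the function, so the helper that gathers credentials works even where the driver is not installed.

```python
import os


def mysql_settings_from_env():
    """Collect the authentication info (user, password, host, database name).

    The MYSQL_* variable names and the defaults are hypothetical; use
    whatever configuration mechanism suits your project.
    """
    return {
        "user": os.environ.get("MYSQL_USER", "root"),
        "password": os.environ.get("MYSQL_PASSWORD", ""),
        "host": os.environ.get("MYSQL_HOST", "127.0.0.1"),
        "database": os.environ.get("MYSQL_DATABASE", "friends"),
    }


def connect_mysql(settings):
    """Open a connection with MySQL Connector/Python.

    The import is deferred so this module still loads when the driver
    has not been installed yet.
    """
    import mysql.connector  # the driver installed in step one

    return mysql.connector.connect(**settings)
```

With a running server, connect_mysql(mysql_settings_from_env()) returns a connection whose cursor() method executes SQL. Reading credentials from the environment keeps the password out of the source code.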

PostgreSQL and Python

If you are familiar with (or learning) PostgreSQL and you’d like to use a PostgreSQL database, there are Python bindings for PostgreSQL, too. You will also need to perform two steps: installing a driver and then connecting with Python.



There are many PostgreSQL drivers for Python, but the most popular one is Psycopg.

Psycopg’s installation page has details about getting it running on your machine and there is a lengthy introduction on how to use it with Python on the PostgreSQL site.
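Here is a comparable sketch for PostgreSQL. It assumes the psycopg2 package (pip install psycopg2-binary). Psycopg 3 is published as a separate psycopg package with a similar connect() call. PGDATABASE, PGUSER, PGPASSWORD, and PGHOST are the standard libpq environment variables. The fallback values and the fetch_hobbies helper are hypothetical.

```python
import os


def postgres_settings_from_env():
    """Gather connection keywords that psycopg2.connect() accepts.

    The PG* names are the standard libpq variables; the defaults are
    hypothetical placeholders.
    """
    return {
        "dbname": os.environ.get("PGDATABASE", "friends"),
        "user": os.environ.get("PGUSER", "postgres"),
        "password": os.environ.get("PGPASSWORD", ""),
        "host": os.environ.get("PGHOST", "localhost"),
    }


def connect_postgres(settings):
    """Open a connection; the driver import is deferred until it is needed."""
    import psycopg2

    return psycopg2.connect(**settings)


def fetch_hobbies(conn, friend_id):
    """Hypothetical helper: list a friend's hobby names with a parameterized query.

    psycopg2 uses %s placeholders (not ?), and passing values separately
    lets the driver handle quoting safely.
    """
    with conn.cursor() as cur:
        cur.execute(
            "SELECT h.hobby_name "
            "FROM friends_and_hobbies_table AS fh "
            "JOIN hobby_details_table AS h ON h.hobby_id = fh.hobby_id "
            "WHERE fh.friend_id = %s",
            (friend_id,),
        )
        return [row[0] for row in cur.fetchall()]
```

The placeholder style is one of the small driver-level differences between database libraries. sqlite3 uses ?, while psycopg2 and MySQL Connector/Python both use %s.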
