Creating bubble charts - Java for Data Science

Bubble charts are similar to scatter plots except they represent data with three dimensions. The first two dimensions are expressed on the X and Y axes and the third is represented by the size of the point plotted. This can be helpful in determining relationships between data values.

We will again use the DataTable class to initially hold the data to be displayed. In this example, we will read data from a sample file called MarriageByYears.csv. This is also a CSV file. It contains one column representing the year a marriage occurred, a second column holding the age at which a person was married, and a third column holding integers representing marital satisfaction on a scale from 1 (least satisfied) to 10 (most satisfied). We create a DataSeries to represent our type of desired data plot and then create an XYPlot object:

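```java
// Obtain GRAL's CSV reader and name the sample file
DataReader readType = DataReaderFactory.getInstance().get("text/csv");
String fileName = "C://MarriageByYears.csv";
try {
    // Read three integer columns: year, age at marriage, satisfaction rating
    DataTable bubbleData = (DataTable) readType.read(
        new FileInputStream(fileName), Integer.class, Integer.class, Integer.class);
    // Wrap the table in a series and build the plot from it
    DataSeries bubbleSeries = new DataSeries("Bubble", bubbleData);
    XYPlot testPlot = new XYPlot(bubbleSeries);
```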
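GRAL's DataReader does the parsing in the snippet above. To see what that step involves, or to inspect the file without GRAL, the following is a minimal plain-Java sketch. It assumes the file has no header row and exactly three comma-separated integer columns per line; the class name MarriageCsv and the sample rows are hypothetical, not taken from the book or from MarriageByYears.csv.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: reads rows of three comma-separated integers
// (year, age at marriage, satisfaction 1-10). Assumes no header row.
final class MarriageCsv {

    static List<int[]> read(Reader source) throws IOException {
        List<int[]> rows = new ArrayList<>();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                String[] fields = line.split(",");
                if (fields.length != 3) {
                    throw new IOException("Expected 3 columns but found "
                            + fields.length + ": " + line);
                }
                int[] row = new int[3];
                for (int i = 0; i < 3; i++) {
                    row[i] = Integer.parseInt(fields[i].trim());
                }
                rows.add(row);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative rows only, not taken from MarriageByYears.csv
        String sample = "1950,22,7\n1985,27,9\n";
        for (int[] r : read(new StringReader(sample))) {
            System.out.printf("year=%d age=%d satisfaction=%d%n", r[0], r[1], r[2]);
        }
    }
}
```

To read the real file instead, pass a file-backed reader, for example `MarriageCsv.read(java.nio.file.Files.newBufferedReader(java.nio.file.Paths.get(fileName)))`. A header line would make `Integer.parseInt` throw a NumberFormatException, so it would need to be skipped first.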

Next, we set basic properties of our chart. In this example, we set the background color and turn off the vertical and horizontal grid lines. We also make our X and Y axes invisible. Notice that we still set a range for each axis, even though the axes are not displayed:

```java
    // Leave a 30-pixel margin around the plot and use a light gray background
    testPlot.setInsets(new Insets2D.Double(30.0));
    testPlot.setBackground(new Color(0.75f, 0.75f, 0.75f));
    // Remove the plot-area border and major grid lines, and clear the clipping area
    XYPlotArea2D areaProp = (XYPlotArea2D) testPlot.getPlotArea();
    areaProp.setBorderColor(null);
    areaProp.setMajorGridX(false);
    areaProp.setMajorGridY(false);
    areaProp.setClippingArea(null);
    // Hide the axis lines and tick marks
    testPlot.getAxisRenderer(XYPlot.AXIS_X).setShapeVisible(false);
    testPlot.getAxisRenderer(XYPlot.AXIS_X).setTicksVisible(false);
    testPlot.getAxisRenderer(XYPlot.AXIS_Y).setShapeVisible(false);
    testPlot.getAxisRenderer(XYPlot.AXIS_Y).setTicksVisible(false);
    // The ranges still determine where each point is placed
    testPlot.getAxis(XYPlot.AXIS_X).setRange(1940, 2020);
    testPlot.getAxis(XYPlot.AXIS_Y).setRange(17, 30);
```
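Why set a range for hidden axes? The range decides where each data value lands in the plot area. The sketch below shows a generic linear mapping from a data range to a pixel distance; it is an illustration under assumptions, not GRAL's internal code (GRAL also supports other axis types, such as logarithmic ones). The class AxisMapping and the 800- and 520-pixel plot dimensions are hypothetical.

```java
// Hypothetical helper: a generic linear axis mapping, not GRAL's own code.
final class AxisMapping {

    /**
     * Maps a data value in [min, max] to a distance in pixels along an axis
     * of the given length. Values outside the range map outside [0, length].
     */
    static double toPixel(double value, double min, double max, double length) {
        if (max <= min) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        return (value - min) / (max - min) * length;
    }

    public static void main(String[] args) {
        // X axis range from the chart (1940 to 2020); an 800-pixel width is assumed
        System.out.println(toPixel(1980, 1940, 2020, 800)); // prints 400.0
        // Y axis range from the chart (17 to 30); a 520-pixel height is assumed.
        // Screen coordinates usually grow downward, so a renderer would
        // subtract this distance from the bottom edge of the plot area.
        System.out.println(toPixel(23.5, 17, 30, 520)); // prints 260.0
    }
}
```

Because the mapping depends only on the range, hiding the axes does not change where the bubbles appear; a year outside 1940 to 2020 would map outside the plot area.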

We can also set properties related to the bubbles drawn on the chart. Here, we set the color and shape of the bubbles and specify which column of the data is used to scale them. In this case, the third column, which holds the marital satisfaction rating, is used. Because column indices start at 0, we pass 2 to the setColumn method:

```java
    // Semi-transparent black (alpha 96 on a 0-255 scale)
    Color color = GraphicsUtils.deriveWithAlpha(Color.black, 96);
    SizeablePointRenderer renderBubble = new SizeablePointRenderer();
    renderBubble.setShape(new Ellipse2D.Double(-3.5, -3.5, 4.0, 4.0));
    renderBubble.setColor(color);
    // Size each point using column 2 (the marital satisfaction rating)
    renderBubble.setColumn(2);
    testPlot.setPointRenderers(bubbleSeries, renderBubble);
```
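To illustrate what sizing by a column means, the sketch below assumes that the renderer scales the base shape uniformly by the number read from the size column, so a rating of 10 draws a bubble ten times as wide as a rating of 1. It uses only java.awt.geom; the class BubbleScaling is hypothetical. Scaling happens about the origin, so a base shape centered on (0, 0) stays centered on its data point. By contrast, the chapter's shape, Ellipse2D.Double(-3.5, -3.5, 4.0, 4.0), spans -3.5 to 0.5 and is centered at (-1.5, -1.5), so under this assumption its larger bubbles would drift slightly away from their points.

```java
import java.awt.Shape;
import java.awt.geom.AffineTransform;
import java.awt.geom.Ellipse2D;
import java.awt.geom.Rectangle2D;

// Sketch of value-based point sizing, assuming the renderer scales the base
// shape uniformly by the number found in the size column.
final class BubbleScaling {

    /** Returns the base shape scaled by sizeValue about the origin (0, 0). */
    static Shape scaledBubble(Shape base, double sizeValue) {
        if (sizeValue <= 0) {
            throw new IllegalArgumentException("size value must be positive");
        }
        return AffineTransform.getScaleInstance(sizeValue, sizeValue)
                .createTransformedShape(base);
    }

    public static void main(String[] args) {
        // A 4 x 4 ellipse centered on (0, 0); scaling about the origin keeps
        // every bubble centered on its data point.
        Shape base = new Ellipse2D.Double(-2.0, -2.0, 4.0, 4.0);
        for (int rating : new int[] {1, 5, 10}) { // satisfaction ratings
            Rectangle2D bounds = scaledBubble(base, rating).getBounds2D();
            System.out.printf("rating %2d -> diameter %.1f, center (%.1f, %.1f)%n",
                    rating, bounds.getWidth(), bounds.getCenterX(), bounds.getCenterY());
        }
    }
}
```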

Finally, we add an InteractivePanel containing the plot, then set the size of the enclosing window and make it visible:

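```java
    // Place the plot in an interactive panel and show the window
    add(new InteractivePanel(testPlot), BorderLayout.CENTER);
    setSize(new Dimension(1500, 700));
    setVisible(true);
} catch (IOException ex) {
    // Reading the CSV file failed (for example, the file was not found)
    ex.printStackTrace();
}
```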
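Every bubble is drawn in the same black with an alpha value of 96 (on Java's 0 to 255 scale), so differences in darkness come from overlapping points. Assuming Java 2D's default source-over compositing, n identical layers with opacity a combine to an opacity of 1 - (1 - a)^n. The short sketch below computes this; the class OverlapOpacity is hypothetical.

```java
// Sketch: combined opacity of n identical translucent points stacked on top
// of each other, assuming Java 2D's default source-over compositing.
final class OverlapOpacity {

    /** Opacity after drawing `layers` copies of a color with the given alpha (0-255). */
    static double effectiveOpacity(int alpha255, int layers) {
        if (alpha255 < 0 || alpha255 > 255 || layers < 0) {
            throw new IllegalArgumentException("alpha must be 0-255 and layers >= 0");
        }
        double a = alpha255 / 255.0;             // per-point opacity
        return 1.0 - Math.pow(1.0 - a, layers);  // 1 - (1 - a)^n
    }

    public static void main(String[] args) {
        // Alpha 96 matches the value passed to deriveWithAlpha in the example
        for (int n = 1; n <= 5; n++) {
            System.out.printf("%d overlapping point(s): opacity %.2f%n",
                    n, effectiveOpacity(96, n));
        }
    }
}
```

Opacity rises quickly with each additional layer, which is why clusters of repeated year and age combinations stand out as darker regions.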

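When the application is executed, the following graph is displayed. Notice that the size of each point reflects its marital satisfaction rating, while its shade darkens where many semi-transparent points overlap, that is, where a particular combination of year and age occurs frequently: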

Summary

In this chapter, we introduced basic graphs, plots, and charts used to visualize data. Visualization enables an analyst to examine the data under review graphically. This is more intuitive, and it often allows anomalies that are hard to detect in the raw data to be identified quickly.

Several visual representations were examined, including line charts, a variety of bar charts, pie charts, scatter plots, histograms, donut charts, and bubble charts. Each of these graphical depictions provides a different perspective on the data being analyzed. The most appropriate technique depends on the nature of the data being used. While we have not covered all of the possible graphical techniques, this sample provides a good overview of what is available.

We were also concerned with how Java is used to draw these graphics. Many of the examples used JavaFX, a readily available toolkit that shipped with Oracle's JDK 8; starting with JDK 11, it is distributed separately as OpenJFX. However, several other libraries are available, and we used GRAL to illustrate how to generate some graphs.

With the overview of visualization techniques, we are ready to move on to other topics, where visualization will be used to better convey the essence of data science techniques. In the next chapter, we will introduce basic statistical processes, including linear regression, and we will use the techniques introduced in this chapter.

Chapter 5. Statistical Data Analysis Techniques

The intent of this chapter is not to make the reader an expert in statistical techniques. Rather, it is to familiarize the reader with the basic statistical techniques in use and to demonstrate how Java can support statistical analysis. While there is a wide variety of data analysis techniques, in this chapter we will focus on the more common tasks.

These techniques range from the relatively simple mean calculation to sophisticated regression analysis models. Statistical analysis can be a very complicated process and requires significant study to be conducted properly. We will start with an introduction to basic statistical analysis techniques, including calculating the mean, median, mode, and standard deviation of a dataset.

There are numerous approaches used to calculate these values, which we will demonstrate using standard Java and third-party APIs. We will also briefly discuss sample size and hypothesis testing.
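As a first look at computing these values in standard Java, the following sketch implements the mean, median, mode, and standard deviation with plain arrays. It is an illustrative version under stated assumptions rather than the chapter's own code: the standard deviation uses the sample formula (dividing by n - 1), and ties for the mode are broken by choosing the smallest value. The class name DescriptiveStats and the ages are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper class; plain Java, no third-party libraries.
final class DescriptiveStats {

    static double mean(double[] data) {
        return Arrays.stream(data).average()
                .orElseThrow(() -> new IllegalArgumentException("data is empty"));
    }

    static double median(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data is empty");
        }
        double[] sorted = data.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        // Even count: average the two middle values; odd count: take the middle one
        return sorted.length % 2 == 0
                ? (sorted[mid - 1] + sorted[mid]) / 2.0
                : sorted[mid];
    }

    static double mode(double[] data) {
        if (data.length == 0) {
            throw new IllegalArgumentException("data is empty");
        }
        Map<Double, Integer> counts = new HashMap<>();
        for (double d : data) {
            counts.merge(d, 1, Integer::sum); // count occurrences of each value
        }
        double best = data[0];
        int bestCount = 0;
        for (Map.Entry<Double, Integer> e : counts.entrySet()) {
            int c = e.getValue();
            double v = e.getKey();
            // Highest count wins; ties go to the smallest value
            if (c > bestCount || (c == bestCount && v < best)) {
                best = v;
                bestCount = c;
            }
        }
        return best;
    }

    static double sampleStdDev(double[] data) {
        if (data.length < 2) {
            throw new IllegalArgumentException("need at least two values");
        }
        double m = mean(data);
        double sumSq = 0.0;
        for (double d : data) {
            sumSq += (d - m) * (d - m);
        }
        return Math.sqrt(sumSq / (data.length - 1)); // divide by n - 1
    }

    public static void main(String[] args) {
        double[] ages = {22, 25, 25, 27, 31}; // made-up ages for illustration
        System.out.println("mean   = " + mean(ages));   // 26.0
        System.out.println("median = " + median(ages)); // 25.0
        System.out.println("mode   = " + mode(ages));   // 25.0
        System.out.println("stddev = " + sampleStdDev(ages)); // sqrt(11), about 3.3166
    }
}
```

Using the population formula instead would mean dividing by n rather than n - 1; which one is appropriate depends on whether the data is a sample or the whole population.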

Regression analysis is an important technique for analyzing data. The technique finds the line that best fits the dataset, and the equation representing that line can be used to predict future behavior. There are several types of regression analysis. In this chapter, we will focus on simple linear regression and multiple regression. With simple linear regression, a single factor such as age is used to predict some behavior such as the likelihood of eating out. With multiple regression, multiple factors such as age, income level, and marital status may be used to predict how often a person eats out.
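Simple linear regression is commonly fitted by ordinary least squares: with Sxy the sum of (x - mean(x))(y - mean(y)) and Sxx the sum of (x - mean(x))^2, the slope is Sxy / Sxx and the intercept is mean(y) - slope * mean(x). The sketch below implements that calculation in plain Java, assuming ordinary least squares is the fitting method; the class LeastSquaresLine and the age and eating-out values are hypothetical, made up only to echo the example in the text.

```java
// Hypothetical class: ordinary least squares fit of y = intercept + slope * x
final class LeastSquaresLine {
    final double slope;
    final double intercept;

    LeastSquaresLine(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need at least two (x, y) pairs of equal length");
        }
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < x.length; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= x.length;
        meanY /= y.length;
        // Sxy = sum of (x - meanX)(y - meanY); Sxx = sum of (x - meanX)^2
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < x.length; i++) {
            double dx = x[i] - meanX;
            sxy += dx * (y[i] - meanY);
            sxx += dx * dx;
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values must not all be equal");
        }
        slope = sxy / sxx;
        intercept = meanY - slope * meanX; // the line passes through (meanX, meanY)
    }

    double predict(double xValue) {
        return intercept + slope * xValue;
    }

    public static void main(String[] args) {
        // Made-up data: age vs. meals eaten out per month
        double[] age = {20, 30, 40, 50};
        double[] mealsOut = {12, 10, 7, 5};
        LeastSquaresLine line = new LeastSquaresLine(age, mealsOut);
        System.out.println("slope = " + line.slope + ", intercept = " + line.intercept);
        System.out.println("prediction for age 35 = " + line.predict(35));
    }
}
```

Multiple regression generalizes the same least-squares idea to several predictors, which is usually solved with matrix methods rather than these two running sums.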

Predictive analytics (or predictive analysis) is concerned with predicting future events. Many of the techniques used in this book are concerned with making predictions. In this chapter, the regression analysis section in particular predicts future behavior.

Before we see how Java supports regression analysis, we need to discuss basic statistical techniques. We begin with mean, mode, and median.

In this chapter, we will cover the following topics:

Working with mean, mode, and median

Standard deviation and sample size determination

Hypothesis testing

Regression analysis
