Biochemical Data Analysis with Spreadsheet Application

20 BIOCHEMICAL DATA: ANALYSIS AND MANAGEMENT

(alphanumeric) arranged in rows and columns used to display, manipulate, and analyze data (Atkinson et al., 1987; Diamond and Hanratty, 1997). Microsoft Excel (http://www.microsoft.com/) is a spreadsheet software package that allows you to do the following:

· Manipulate data through built-in or user-defined mathematical functions.
· Interpret data through graphical displays or statistical analysis.
· Create tables of numeric, text, and other formats.
· Graph tabulated input in various formats.
· Perform various curve-fitting procedures, using either a built-in routine or the add-on Solver.
· Write user-defined macros for application routines that automate or enhance a spreadsheet for a particular purpose.

The user is referred to the Microsoft Excel User’s Guide or online Help for information.

Launching Microsoft Excel brings you into the Excel workspace and a new workbook. A workbook is a collection of related spreadsheets (worksheets), each organized in rows (number headings) and columns (letter headings). The intersection of a row and a column is a cell, which is addressed by its column and row headings. To select a group of cells, click on the first cell and drag to the last cell; this collection of cells is known as a range. Cells, rows, and columns are inserted through either the Insert menu or the shortcut Edit menu. Excel inserts a new column to the left of the highlighted column and a new row above the highlighted row. At the bottom of the spreadsheet are sheet tabs representing the worksheets of the workbook. Click on these tabs one at a time to move from one sheet to another within the same workbook. Excel commands are available either as drop-down menus or as icon buttons grouped in toolbars.

2.2.1. Mathematical Operations

There are three types of data that can be entered in the cells of the spreadsheet: number, date/time, and text. For multiple entries of a serial number with a constant increment, enter the initial value into the first cell, highlight the cells, and select Edit → Fill → Series. To fill the same value into a range, enter the value into the first cell, move the pointer to the lower right-hand corner of the active cell to activate the fill handle (a bold crosshair), and drag it through the range. Data are edited with the usual cut, copy, and paste operations in the Edit menu, and the number of decimal places of scientific numbers is controlled via the dialog box in Format → Cells.

The fundamental operation of a spreadsheet is performing calculations on data. Excel performs mathematical operations through formulas and functions. Formulas are written in the formula bar and begin with an equal sign (=). Functions can be either user-defined or built-in Excel functions. Because formulas and functions act on cells of the spreadsheet, the variables used in them are the references of those cells (Figure 2.1). This is where an understanding of the difference between relative and absolute (preceded by a $ sign) references is important. Excel's built-in functions are accessed through the Function Wizard tool (fx) found on the standard toolbar. To use a wizard, just follow the instructions in the dialog boxes, moving through them one step at a time. Excel has many mathematical, statistical, and scientific functions. These have the general syntax =FunctionName(arguments). For example, to calculate the mean and standard deviation, type the labels Mean and S.D. in B1 and C1, select B2 and C2, and enter the formulas =AVERAGE(range) and =STDEV(range). The calculated mean and standard deviation are placed in cells B2 and C2, respectively.

Figure 2.1. Mathematical operation with Excel. Glycine ionizes according to

$$^{+}\mathrm{H_3N{-}CH_2{-}COOH} \rightleftharpoons {}^{+}\mathrm{H_3N{-}CH_2{-}COO^{-}} \rightleftharpoons \mathrm{H_2N{-}CH_2{-}COO^{-}}$$

$$\mathrm{G^{+}} \rightleftharpoons \mathrm{G} \rightleftharpoons \mathrm{G^{-}}$$

where $[\mathrm{G^{+}}] = 1/\{1 + 10^{\mathrm{pH}-\mathrm{p}K_1}[1 + 10^{\mathrm{pH}-\mathrm{p}K_2}]\}$, $[\mathrm{G}] = 10^{\mathrm{pH}-\mathrm{p}K_1}[\mathrm{G^{+}}]$, and $[\mathrm{G^{-}}] = 10^{\mathrm{pH}-\mathrm{p}K_2}[\mathrm{G}]$. The calculation of ionic species of glycine ($\mathrm{p}K_1 = 2.4$ and $\mathrm{p}K_2 = 9.7$) at different pH values using Excel is illustrated with formula input and calculated values output.
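The glycine calculation of Figure 2.1 can also be sketched outside the spreadsheet. The Python fragment below is a minimal illustration rather than the worksheet shown in the figure. It assumes that each species is expressed as a fraction of total glycine and that pK1 = 2.4 and pK2 = 9.7; the function name `glycine_fractions` and the pH grid are hypothetical.

```python
def glycine_fractions(ph, pk1=2.4, pk2=9.7):
    """Return the fractions (G+, G, G-) of total glycine at a given pH."""
    r1 = 10 ** (ph - pk1)  # ratio [G]/[G+]
    r2 = 10 ** (ph - pk2)  # ratio [G-]/[G]
    # The three fractions sum to 1, so [G+] = 1 / (1 + r1 * (1 + r2)).
    g_plus = 1.0 / (1.0 + r1 * (1.0 + r2))
    g_zwitterion = r1 * g_plus
    g_minus = r2 * g_zwitterion
    return g_plus, g_zwitterion, g_minus


# Tabulate the three species from pH 0 to 14 in steps of 2 (an arbitrary grid).
for ph in range(0, 15, 2):
    g_plus, g_zwitterion, g_minus = glycine_fractions(ph)
    print(f"pH {ph:2d}  G+ {g_plus:.4f}  G {g_zwitterion:.4f}  G- {g_minus:.4f}")
```

Each row of the printed table corresponds to one spreadsheet row in which the same three formulas refer to a pH cell.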

2.2.2. Statistical Functions

Statistical functions are selected from two menus within Excel. The first approach is through the Function Wizard. Click the Function Wizard (fx) to activate the Function Wizard dialog box, select Statistical under the Function Category list, and select the required statistical function under the Function Name list (refer to Help for the definition and usage of a function). The second approach is from Tools → Data Analysis. If Data Analysis is not present when the Tools command is selected, the Analysis ToolPak was not loaded during Excel installation. The Analysis ToolPak can be loaded from Tools → Add-Ins.

To perform an F test or a t test:

· Enter the data into a workbook.
· Select Tools in the menu bar and select Data Analysis.
· From the Data Analysis box, select F-Test Two-Sample for Variances or the appropriate t-Test option, and click OK.
· Enter the ranges for Variable 1 and Variable 2 (in absolute format, e.g., $C$4:$C$10).
· Under the output options, select New Worksheet Ply to report the results on a new worksheet.

A report is generated, which contains all the required information to interpret the data.
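To make clear what the report contains, the two test statistics can be computed by hand. The sketch below is an illustration under stated assumptions, not Excel's implementation: it takes F as the ratio of the two sample variances and computes a two-sample t statistic with a pooled variance (that is, assuming equal variances). The data and the function name `f_and_t` are hypothetical, and no P values are calculated, so the statistics would still have to be compared with tabulated critical values.

```python
import math
import statistics


def f_and_t(sample1, sample2):
    """Return (F, t, df) for two samples; t uses the pooled variance."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(sample2)
    f_ratio = var1 / var2  # variance of sample 1 over variance of sample 2
    df = n1 + n2 - 2  # degrees of freedom of the pooled t test
    pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled * (1.0 / n1 + 1.0 / n2))
    t_stat = (statistics.mean(sample1) - statistics.mean(sample2)) / std_err
    return f_ratio, t_stat, df


# Hypothetical measurements, for illustration only.
sample_a = [2.0, 4.0, 6.0, 8.0, 10.0]
sample_b = [1.0, 2.0, 3.0, 4.0, 5.0]
f_ratio, t_stat, df = f_and_t(sample_a, sample_b)
print(f"F = {f_ratio:.3f}, t = {t_stat:.3f}, df = {df}")
```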

To perform ANOVA:

· Enter the data into a workbook.
· Select Tools in the menu bar and select Data Analysis.
· Select ANOVA: Single Factor from the list of options to bring up the dialog box.
· Enter the input range covering the entire data set (in absolute format, e.g., $B$2:$D$6).
· Indicate whether the data are grouped in columns or rows.
· Set Alpha if required (the default is 0.05).
· Select the output range (select New Worksheet Ply to report on a new worksheet).
· Click on the OK button.

The ANOVA report shown in Figure 2.2 is generated. Comparing the F-test result (F) with the critical value (F_crit) provides a decision concerning the similarity or difference of the groups: if F exceeds F_crit, the null hypothesis that the group means are equal is rejected.
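The F value in a single-factor ANOVA report is the ratio of the between-group mean square to the within-group mean square. The following sketch shows that arithmetic on hypothetical data. The function name `one_way_anova` is an assumption for illustration, and the critical value F_crit is not computed here, so it would have to be taken from the Excel report or a table.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group means about the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: values about their own group mean.
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical data: three groups of three measurements each.
data = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
f_stat, df_b, df_w = one_way_anova(data)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```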

2.2.3. Regression Analysis

Traditional approaches to experimental data processing are largely based on linearization and/or graphical methods. However, these can lead to problems when the model describing the data is inherently nonlinear or when the linearization process distorts the data. In such cases, nonlinear curve-fitting techniques should be applied to the experimental data.

Excel provides some built-in tools for fitting models to data sets. By far the most common routine method for experimental data analysis is linear regression, in which the best-fit model is obtained by minimizing the least-squares error between the experimental y data and an array of predicted y data calculated from a linear equation at the same x values. Linear regression can be accessed in Excel through the LINEST function or via the Data Analysis tool.

Figure 2.2. ANOVA (single factor) with Excel. Output of ANOVA data analysis of NAD(H) assays of 15 samples from each of two tissues using Excel is shown. The difference in NAD(H) content of the two tissues is indicated by a small P value and F > F_crit (reject H₀).

LINEST allows for detailed multiple linear regression:

· Select a blank sheet and enter the data in arrays.
· Activate a free cell.
· Click the Function Wizard icon and select Statistical and LINEST.
· Enter the array for the known_y's, for example, A1:A10, and for the known_x's, for example, B1:B10, C1:C10, and D1:D10 (entered as the single range B1:D10), but leave the const and stats boxes blank. Click OK.

The LINEST result is returned at the assigned (activated) free cell. To display the Slope and Intercept:

· Enter LINEST m,b in the new cell.
· Highlight the range of cells corresponding to a 2 × n matrix, where n is the number of parameters, x. Select LINEST from the Function Wizard and enter the input as before.
· Do not press Return, but instead click the mouse in the formula bar and place the cursor at the end of the entry.
· Press CTRL + SHIFT + ENTER. The values of the slopes (coefficients) and the intercept are entered into the highlighted cell range.
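The multiple linear regression performed by LINEST can be sketched as a least-squares solution of the normal equations. The code below is a minimal illustration of that technique, not Excel's algorithm. The helper names `solve` and `fit_linear` and the data are hypothetical, and the coefficients are returned as [b, m1, ..., mn], whereas LINEST lists the slopes in reverse order followed by the intercept.

```python
def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                # A singular matrix would raise ZeroDivisionError here.
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear(xs, y):
    """Least-squares fit of y = b + m1*x1 + ... + mn*xn.

    xs is a list of rows [x1, ..., xn]; returns [b, m1, ..., mn].
    """
    design = [[1.0] + list(row) for row in xs]  # leading 1 gives the intercept
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical data generated from y = 1 + 2*x1 + 3*x2.
xs = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
y = [1.0, 3.0, 4.0, 6.0, 8.0]
print(fit_linear(xs, y))
```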


Figure 2.3. Linear regression analysis with Excel. Simple linear regression analysis is performed with Excel using Tools → Data Analysis → Regression. The output is reorganized to show the regression statistics, ANOVA, residual plot, and line fit plot (the standard errors of the coefficients and the listing of residuals are not shown here).

The detailed linear regression analysis is obtained via the Tools menu as follows:

· Select Tools → Data Analysis → Regression. This opens the Regression dialog box.
· Enter the cell ranges for the arrays of y and x values in the appropriate boxes, using absolute format (for example, $A$2:$A$17).
· Check Confidence Level (e.g., 95%).
· Enter the output range where you want the regression analysis report to be placed (check New Worksheet Ply to report on a new worksheet).
· Check the options to display the plots (e.g., Residual Plots and Line Fit Plots) and click OK.

The detailed statistical report (Figure 2.3) includes the slope and intercept coefficients for the best-fit line, the standard errors of these coefficients, and a listing of the residuals. The goodness of fit is judged from a high correlation coefficient, R (R = 1.00 for a perfect fit), and from residuals scattered evenly about zero over the entire range.
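The quantities in such a report follow from standard least-squares formulas for a straight line. The sketch below computes them for hypothetical data. The function name `regression_report` and the dictionary keys are assumptions chosen for illustration and do not reproduce the layout of the Excel output.

```python
import math


def regression_report(x, y):
    """Simple linear regression of y on x with basic fit statistics."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    syy = sum((yi - y_mean) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    sse = sum(r ** 2 for r in residuals)
    s2 = sse / (n - 2)  # residual variance with n - 2 degrees of freedom
    return {
        "slope": slope,
        "intercept": intercept,
        "se_slope": math.sqrt(s2 / sxx),
        "se_intercept": math.sqrt(s2 * (1.0 / n + x_mean ** 2 / sxx)),
        "r_squared": 1.0 - sse / syy,
        "residuals": residuals,
    }


# Hypothetical calibration data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
report = regression_report(x, y)
for key in ("slope", "intercept", "se_slope", "se_intercept", "r_squared"):
    print(f"{key}: {report[key]:.4f}")
```

Residuals that change sign in a systematic way along x, rather than scattering evenly about zero, would suggest that a straight line is not an adequate model.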

Figure 2.4. SPSS Home page.

Perhaps the most common situation in graphing scientific data is to generate a linear regression plot with y error bars. In most situations, the error in the x data is regarded as being so much smaller than that in the y data that it can be effectively ignored. Excel offers several methods for generating error bars, including custom error amounts, for which the experimental standard deviation of several estimates of a value can be used. This is accomplished by obtaining the mean values, =AVERAGE(range), and the standard deviations, =STDEV(range), for the experimental data. Generate the regression line for the mean values and add error bars, given by the standard deviations, via Insert → Error Bars.
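The means and standard deviations that feed the error bars can be prepared as follows. This is a small sketch with hypothetical replicate data; `statistics.stdev` uses the n - 1 denominator, which matches the sample standard deviation returned by STDEV.

```python
import statistics

# Hypothetical replicate y measurements at each x value.
replicates = {
    1.0: [2.0, 2.2, 1.8],
    2.0: [4.1, 3.9, 4.0],
    3.0: [6.3, 5.9, 6.1],
}

x_values = sorted(replicates)
y_means = [statistics.mean(replicates[x]) for x in x_values]
y_errors = [statistics.stdev(replicates[x]) for x in x_values]  # sample SD (n - 1)

for x, m, s in zip(x_values, y_means, y_errors):
    print(f"x = {x:.1f}  mean = {m:.3f}  SD = {s:.3f}")
```

If matplotlib happens to be installed, a call such as `matplotlib.pyplot.errorbar(x_values, y_means, yerr=y_errors)` would draw the corresponding plot; the regression line itself would be fitted to `y_means`.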

2.2.4. Use of Statistical Packages

A number of custom-made statistical software packages are available commercially. The student is urged to become familiar with at least one of them because of their versatility and efficiency.

SPSS. The statistical analysis software SPSS (http://www.spsscience.com) is a common statistical package available on most university networks, and the student can learn its use by following the tutorial session of the SPSS program. Open and click Spsswin.exe to start the SPSS program (Figure 2.4). Go to Help and select the SPSS Tutorial. From the Main menu page, follow the session. The biochemical data for statistical analysis can be entered directly by selecting File and then choosing New and Data. However, it can be advantageous for the student to prepare data files with Excel (filename.xls) in advance. In this case, select File and then choose Open and Data. Click File type to select Excel (filename.xls). From the menu bar, select Statistics to initiate the data analysis.

SyStat. SyStat is a stand-alone statistical package of SPSS Inc. (http://www.spsscience.com) that performs comprehensive statistical analysis. The user's guide, SYSTAT Getting Started, should be consulted. There are three types of windows (main, data, and graph windows), each with its own menus. The main window displays, prints, and saves results from statistical analyses. The data window (opened by the Data command of the File menu from the main window) allows entering, editing, and viewing of data. The graph window (opened from the Graph menu or by double-clicking a graph button in the main window) provides facilities for editing graphs. New data can be entered directly (via File → New → Data) or imported from various spreadsheets (via File → Open → Data). Selecting Microsoft Excel and entering filename.xls imports the Excel data file. To define variable names, double-click the variable heading to bring up the variable properties box (a string variable, as opposed to a numeric one, has a name ending with $). A statistical analysis is initiated by selecting the analytical options from the Statistics menu of the data window (Figure 2.5). The analytical results are displayed in the output pane by selecting them from the organization pane of the main window. The data file is saved from the data window as filename.syd, and the analysis results (outputs) are saved from the main window as filename.syo.

Figure 2.5. Linear regression analysis with Systat. The front window shows the input data for multiple linear regression analysis and the back window shows the statistical results.

Online statistical calculations can be performed at http://members.aol.com/johnp71/javastat.html. To carry out linear regression analysis, for example, select "Regression, correlation, least squares curve-fitting, nonparametric correlation," and then select any one of the methods (e.g., Least squares regression line, Least squares straight line). Enter the number of data points to be analyzed and then the data, x_i and y_i. Click the Calculate Now button. The analytical results, a (intercept), b (slope), f (degrees of freedom), and r (correlation coefficient), are returned.


Figure 2.6. Relationship among components of databases.

2.3. BIOCHEMICAL DATA MANAGEMENT WITH DATABASE PROGRAM

From the document AN INTRODUCTION TO COMPUTATIONAL BIOCHEMISTRY (pages 30-38)