A Service Oriented Framework for Running (2)

[...] programming interfaces, defined service specifications, schemas, and configuration files. The framework can be instantiated to submit specific quantum mechanical simulation (e.g., CASTEP) jobs to a Grid from a Web browser, with the required tasks managed and coordinated by the workflow without human interaction. The paper details the analysis, design, and implementation of a prototype framework. In the test case, the prototype framework is instantiated to submit a CASTEP simulation job to available Grid resources in order to calculate the equation of state of a material.

Index Terms—e-Science, Grid computing, service-oriented architecture (SOA), software framework, Web service, workflow.

I. INTRODUCTION

Currently, running a quantum mechanical simulation (e.g., CASTEP [1] and SIESTA [2]) of material properties on remote computational resources typically involves procedures such as the following.

1) Create the necessary simulation input files.

2) Copy these input files and simulation code to the remote computational resource.

3) Log into the resource and submit the simulation job.

4) Wait for the job to finish, and once the job finishes, copy back the simulation output files to the local machines.

This approach works successfully, but it does have disadvantages. First, the approach involves many human interactions. Second, in order to submit job(s) to remote resources, some Grid software has to be installed (e.g., Globus Toolkit [3]) on a local machine, or the user must log into a machine where such Grid software has been installed.

Manuscript received March 22, 2009; revised May 31, 2009, August 20, 2009, and November 6, 2009; accepted January 8, 2010. Date of publication March 8, 2010; date of current version June 16, 2010. This paper was recommended by Associate Editor A. M. Tjoa.

X. Yang was with the Department of Earth Sciences and Cambridge e-Science Centre, University of Cambridge, Cambridge, CB2 3EQ, U.K. He is now with School of Electronics and Computer Science, University of Southampton, Southampton, SO17 1BJ, U.K. (e-mail: kev.x.yang@gmail.com).

R. P. Bruin was with the Department of Earth Sciences and Cambridge e-Science Centre, University of Cambridge, Cambridge, CB2 3EQ, U.K. He is now with Knowledge Integration Ltd., Sheffield, S3 8PZ, U.K.

M. T. Dove is with the Department of Earth Sciences, University of Cambridge, Cambridge, CB2 3EQ, U.K.

A. Walkingshaw was with the Department of Chemistry, University of Cambridge, Cambridge, CB2 1EW, U.K.

T. V. Mortimer-Jones was with the Science and Technology Facilities Council, Daresbury Laboratory, Warrington, WA4 4AD, U.K.

R. Sinclair is with the Science and Technology Facilities Council, Rutherford Appleton Laboratory, Oxford, OX11 0QX, U.K.

D. J. Wilson was with Johann Wolfgang Goethe University, 60325 Frankfurt, Germany.

V. Milman is with Accelrys Ltd., Cambridge, CB4 0WN, U.K.

T. Donovan is with IBM, Winchester, SO21 2JN, U.K.

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TSMCC.2010.2040826

direct human control. Our use case is that of performing simulations of materials properties from an atomic-scale model using quantum mechanical methods.

The framework developed consists of the following.

1) A Grid portal [7] that can initiate a quantum mechanical simulation run on a Grid from just a Web browser, without any Grid software having to be downloaded and installed, and that provides a unified interface for end users to access distributed resources.

2) A workflow system that can facilitate the modeling of workflows and automate the simulation process without human interaction.

3) A set of interface specifications for the Web services required for running simulations over Grids (e.g., job submission and monitoring service). Each interface specification defines the operation, input, and output of the required service.

4) A set of class libraries and application programming interfaces (APIs).

5) Data repositories that include a file system and a material property database.

6) Other components that include schemas (both XML schema and database schema), configuration files, and scientific code.

The paper is structured as follows. Section II details the analysis, design, and implementation of the framework. In Section III, the framework developed is instantiated to submit a CASTEP quantum mechanical simulation job to a Grid. Section IV presents discussions. Section V reviews the related work.

II. FRAMEWORK DEVELOPMENT

The software framework contains portal and workflow systems, class library and APIs, service interface specifications, schemas, configuration files, scientific code, etc., as shown in Fig. 1.


Fig. 1. Reference architecture of service-oriented framework. The front end provides user interface to collect user inputs. The workflow layer consists of Pipeline Pilot as workflow engine and template workflows (e.g., equation of state workflow) developed by integrating Web services from Web service layer. The Web service layer consists of Web services that handle tasks required by the simulation (e.g., job submission/monitoring service, data processing service) and error recovery. RMCS submits the job(s) to computation resources and cml2sql extracts data from CML output. CML dictionaries, configuration files, schemas, and backend codes are used by components from these layers.

implementation of job submission and monitoring service within the framework. RMCS is a Web service job submission tool, which extends the two-tier client/Grid model of my_condor_submit [14] to the three-tier model of client, server, and Grid [13]. cml2sql is a parser written in Perl, which manages the extraction of data from the CML output files and the storage of the data in the database.

In this section, the major elements of the framework are described, and some of the challenges presented are discussed.

A. Job Setup: Task Definition and Simulation Input

Running a quantum mechanical simulation requires the creation of simulation input files. There are several approaches to creating these files. The first is to use pre-existing simulation input files (e.g., the .cell file and .param file for CASTEP) or to upload a crystallographic information file (CIF). CIF is a standard text file format for representing crystallographic information [15], which can be used to create simulation input. The second is to request values for the required parameters from the user, which can then be combined to create the required files. This is acceptable for the simplest materials, but soon becomes unmanageable. Even the relatively simple material KAlSi2O6 requires the user to specify 40 parameters just for the atomic coordinates. This leads to the third method, in which the user provides a smaller set of atomic information and the material’s space group. These pieces of information can then be combined to calculate the complete set of information required to simulate the material. This, in turn, can be used to create the required input files.

However, these approaches still do not directly meet several important requirements. 1) The set of parameters required will vary depending on the type of calculation being performed. For example, the task of calculating the dielectric properties of a material requires several parameters that can be ignored in the task of calculating the equation of state (EoS) for the same material. This means that the input boxes for the questions the user must answer have to be generated dynamically for each task. 2) The framework should support different simulation codes (e.g., CASTEP and SIESTA), which require totally different input parameters.

Fig. 2. Example of Tasklist.xml, the task definition file.

In order to accommodate these needs, we developed an ontology presented as a CML dictionary and an associated task definition file. The task definition file is an XML document that specifies a list of tasks along with their input and output requirements. Part of the task definition file for running CASTEP simulations can be seen in Fig. 2. Each task element within the task definition file can contain a number of <inFile> elements, which are used to divide the information that the user gives into the relevant files required by the simulation code to be used. The name attribute is a human-readable label given to the question when it is posed to the user. The Level attribute is used to adjust complexity based on the understanding of the user. The default attribute allows the specification of default answers for questions. The final attribute, instanceOf, specifies the entry in the CML dictionary that relates to this parameter.

Each parameter has an entry in the CML dictionary for the code in question, which defines the data type of the parameter (e.g., integer and matrix) and any bounds or enumeration of possible values. By using the task definition file and CML dictionary, one single job submission portlet can dynamically generate simulation input files for different simulation tasks, and can also facilitate the different simulation codes (e.g., CASTEP and SIESTA) by providing a simulation-code-specific CML dictionary and modifying the associated task definition file.
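As a rough illustration of how a portlet might turn a task definition file into per-file questions, the following Python sketch parses a small XML document. Fig. 2 is not reproduced in this text, so everything except the <inFile> element and the four attribute names is an assumption made for illustration: the <taskList>, <task>, and <parameter> elements, the id and suffix attributes, the lowercase spelling of the level attribute, the sample parameters, and the castep: dictionary references are all hypothetical and should not be read as the framework's actual schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical task definition. Only <inFile> and the attribute names
# name, level, default, and instanceOf are taken from the text above.
TASKLIST_XML = """
<taskList>
  <task id="eos">
    <inFile suffix=".param">
      <parameter name="Cut-off energy" level="basic"
                 default="400" instanceOf="castep:cutOffEnergy"/>
      <parameter name="Exchange-correlation functional" level="expert"
                 default="PBE" instanceOf="castep:xcFunctional"/>
    </inFile>
    <inFile suffix=".cell">
      <parameter name="Space group" level="basic"
                 default="" instanceOf="castep:spaceGroup"/>
    </inFile>
  </task>
</taskList>
"""


def questions_for_task(xml_text, task_id, user_level="basic"):
    """Group the questions to ask the user by target input file."""
    root = ET.fromstring(xml_text)
    task = root.find(f"./task[@id='{task_id}']")
    if task is None:
        raise KeyError(f"unknown task: {task_id}")
    questions = {}
    for in_file in task.findall("inFile"):
        rows = []
        for param in in_file.findall("parameter"):
            # Hide expert-level questions from basic users (assumed policy).
            if user_level == "basic" and param.get("level") != "basic":
                continue
            rows.append({
                "label": param.get("name"),
                "default": param.get("default"),
                "dictionary_entry": param.get("instanceOf"),
            })
        questions[in_file.get("suffix")] = rows
    return questions
```

Calling questions_for_task(TASKLIST_XML, "eos") returns one list of questions per input file, which a form generator could then render; supporting another simulation code would, under these assumptions, only require a different XML document and dictionary.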


Fig. 3. Screenshot of the structure builder interface. A user can select the space group symbol whose value is automatically completed. The question boxes for cell content can dynamically grow by clicking the “More” button.

JavaScript, DOM, and XHTML. This JavaScript library can generate question boxes on demand for cell contents. For the capture of user input entered on the fly, the portlet specification JSR-168 [16] provides a method, namely, getParameterValues(String arg0), which can return an array of values for this purpose. However, using this API to capture user input requires that the user click a Submit button, which takes the user to the next page. This breaks the consistency of cell structure input and separates the structure visualization from the data input, which is not user friendly. In order to address this problem, AJAX technologies have been employed, so that we can capture the user input and pass the acquired values to the structure builder on the fly without going to the next page.

Another major challenge for manual input is that the specification of the space group symbol is error prone, and an invalid symbol prevents the structure builder from working properly. In order to tackle this problem, we developed functionality that automatically completes the value as the user starts to type it. This is done by using a jQuery [17] library, which ensures that the entered space group symbol is always valid. This facility can also control the enabling and disabling of input boxes, which vary with the space group symbol. A screenshot of the structure builder interface is shown in Fig. 3.
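The completion logic itself is simple prefix matching against a closed list of valid symbols. The sketch below shows that idea in Python rather than in the browser-side jQuery code used by the portal; the function names are hypothetical, and the symbol list is a small illustrative subset of the 230 space groups.

```python
# Illustrative subset only; a real list would cover all 230 space groups.
SPACE_GROUP_SYMBOLS = ["P1", "P-1", "P21/c", "Pnma", "P63mc", "Fd-3m", "Fm-3m"]


def complete_symbol(prefix, symbols=SPACE_GROUP_SYMBOLS):
    """Return the valid symbols that start with what the user has typed."""
    typed = prefix.strip().lower()
    if not typed:
        return []
    return [s for s in symbols if s.lower().startswith(typed)]


def is_valid_symbol(text, symbols=SPACE_GROUP_SYMBOLS):
    """Accept the input only if it is exactly one of the known symbols."""
    return text.strip() in symbols
```

Restricting the accepted value to an entry of the list is what guarantees that the structure builder never receives a malformed symbol.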

B. Creation of Workflow: Service Integration

In order to automate the whole process without direct human interaction, the required tasks for the simulation are wrapped as Web services, and these Web services are orchestrated by employing the workflow technology. Considering the requirements of quantum mechanical simulation, the following key Web services are identified and associated interfaces are defined.

1) Metascheduling service.

2) Create output file list service.

3) Job submission and monitoring service.

4) Property calculator service.

5) Store-to-DB service.

6) Notification service.

A workflow combining these services as appropriate is created based on the Web Services Description Language (WSDL) interfaces defined. The workflow engine is then responsible for the management and coordination of these services.

A workflow system, namely, Pipeline Pilot, is employed in the framework. Pipeline Pilot provides powerful scripting capabilities to enable

by taking an array of values obtained in the metascheduling service as input.

If job submission fails, an error message is sent to the client and the workflow terminates. If the job submission succeeds, a job ID is captured and the workflow invokes the job monitoring service, taking the job ID as input. During job monitoring, if the job is completed, the workflow goes to the next stage (labeled No. 2) to invoke the property calculator service and the store-to-DB service. If the job is not completed, the workflow checks whether the job is still running. If it is still running, the workflow sleeps for 60 s and then invokes the job monitoring service again, repeating until the job is completed.
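The control flow just described can be summarized in a few lines. The following is a minimal Python sketch, not the Pipeline Pilot workflow itself: the five services are passed in as plain callables with hypothetical signatures, the status strings "RUNNING" and "FINISHED" are borrowed from the monitoring discussion in Section II-C, and treating any other status as a failure is an assumption, since the text does not say what happens when a job is neither completed nor running.

```python
import time


def run_workflow(submit, monitor, calculate_properties, store_to_db,
                 notify, resources, poll_seconds=60, sleep=time.sleep):
    """Drive one simulation job through submission, polling, and post-processing."""
    try:
        job_id = submit(resources)
    except Exception as exc:
        # Submission failed: report to the client and terminate.
        notify(f"job submission failed: {exc}")
        return None

    while True:
        status = monitor(job_id)
        if status == "FINISHED":
            break
        if status != "RUNNING":
            # Assumption: any status other than RUNNING/FINISHED is a failure.
            notify(f"job {job_id} stopped with status {status}")
            return None
        sleep(poll_seconds)  # wait before polling the monitoring service again

    # Second stage: post-simulation operations.
    properties = calculate_properties(job_id)
    store_to_db(job_id, properties)
    notify(f"job {job_id} finished")
    return properties
```

Passing the sleep function as a parameter keeps the 60 s wait out of tests and makes the polling interval easy to change per Grid resource.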

C. Workflow Monitoring

Once the workflow is enacted, the next important step is to monitor its status. Pipeline Pilot does provide an administrative application to monitor the status of a workflow, and one possible solution is to integrate this into the portal using single sign-on functionality. However, the main disadvantage of this approach is that it is difficult to do any customized monitoring of the workflow (e.g., monitoring the status of an actual simulation). In order to monitor the workflow and simulation status, a portlet has been developed. This portlet allows users to monitor the status of their last ten submissions. The development of workflow monitoring is described as follows.

1) In order to retrieve the workflow status at any time point, the Pipeline Pilot workflow is enacted in an asynchronous mode, where a client program can retrieve workflow status at any frequency or not at all, or a client program can shut down and test for status later, perhaps from another client computer.

2) The workflow-related information (e.g., workflow status) can be easily obtained from Pipeline Pilot, but the acquisition of simulation-associated information is not that straightforward. Pipeline Pilot provides a method, SendMessageToClient(), that sends the simulation status to the client, but it can only send 30 characters. In order to capture more information from the simulation, we divide the simulation information into dynamic information (e.g., simulation job status such as “RUNNING” and “FINISHED”) and static information. Dynamic information keeps changing during the lifecycle of the workflow, while static information is constant. We then capture dynamic information using SendMessageToClient(). Both workflow information and simulation information are also logged for failure tracing and process analysis.


Fig. 4. Geometry optimization workflow (v1.0). This workflow contains two parts that are pipelined sequentially: the first part mainly handles the CASTEP simulation run, while the second part handles postsimulation operations, which include property calculation, store-to-DB, and notification.

is no parameter sweep, the job ID has a one-to-one relation to the simulation ID. If there is a parameter sweep, the job ID has a one-to-many relation to the simulation IDs. In this paper, the job actually refers to the workflow; hence, the workflow ID has a one-to-one relation to the job ID. Considering this, two database tables are created: one stores simulation data, with Simulation ID as the primary key, and the other stores workflow/job data, with Workflow ID as the primary key.
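The two-table layout can be sketched with SQLite as follows. Only the choice of primary keys and the one-to-many relation come from the text; the table names, column names, the status column, and the use of SQLite are assumptions made to keep the example self-contained.

```python
import sqlite3

# Hypothetical schema: one workflow/job row fans out to many simulation rows.
SCHEMA = """
CREATE TABLE workflow (
    workflow_id TEXT PRIMARY KEY,   -- also identifies the job (one-to-one)
    status      TEXT NOT NULL
);
CREATE TABLE simulation (
    simulation_id TEXT PRIMARY KEY,
    workflow_id   TEXT NOT NULL REFERENCES workflow(workflow_id),
    status        TEXT NOT NULL
);
"""


def open_store(path=":memory:"):
    """Create the two monitoring tables in a new SQLite database."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def record_sweep(conn, workflow_id, simulation_ids):
    """Store one workflow/job and the simulations of its parameter sweep."""
    with conn:  # commit on success, roll back on error
        conn.execute("INSERT INTO workflow VALUES (?, ?)",
                     (workflow_id, "RUNNING"))
        conn.executemany(
            "INSERT INTO simulation VALUES (?, ?, ?)",
            [(sid, workflow_id, "RUNNING") for sid in simulation_ids],
        )
```

Without a parameter sweep, record_sweep is simply called with a single simulation ID, which gives the one-to-one case.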

D. Flexible Chemical Formula Search

As one of the requirements, chemical formula search should allow users to enter chemical formulas with a fair degree of flexibility rather than in a rigid format. In many cases, the user may not know the exact chemical composition of the materials that they are interested in. Furthermore, scientists often write the same material’s structure using different formulas. For example, “CaO2H2” is also written as “Ca(OH)2,” “FeO2H” as “FeOOH,” and “CaAl2Si2O10H4,” the mineral lawsonite, as “CaAl2Si2O7(OH)2•(H2O).”¹ However, this flexibility in chemical formula search is not very common. For example, the Inorganic Crystal Structure Database (ICSD), which is the world’s most extensive database on inorganic crystal structures [18], cannot be searched with such flexibility. In order to address this need, we have developed a search functionality that allows this flexibility in formula entry.

The user can enter either a whole or partial chemical formula to return information on materials that may be of interest. For instance, a user interested in the properties of quartz would enter “SiO2” into the search box and would receive data about materials that contain one silicon atom and two oxygen atoms, plus any atoms of other elements, in their asymmetric unit cell. Different ways of writing the chemical formula are allowed: for example, “FeOOH” and “FeO2H” result in the same search being performed, returning all data related to hydrated iron oxide. Users can also use “∗” as a wildcard to specify that they are interested in materials with any number of that element.
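One way to obtain this behavior is to reduce every formula to element counts before comparing. The sketch below is an illustrative Python version under stated assumptions, not the search code used in the framework: the function names are hypothetical, the “•” hydrate separator is treated as plain concatenation, a query is assumed to require exactly the stated count of each named element while ignoring other elements, and the “∗” wildcard is left out.

```python
import re
from collections import Counter

ELEMENT = re.compile(r"[A-Z][a-z]?")
DIGITS = re.compile(r"\d+")


def _read_count(text, i):
    """Read an optional integer multiplier starting at position i."""
    m = DIGITS.match(text, i)
    if m:
        return int(m.group()), m.end()
    return 1, i


def parse_formula(formula):
    """Return element counts, e.g. 'Ca(OH)2' -> Ca:1, O:2, H:2."""
    text = re.sub(r"[\s•·]", "", formula)  # drop spaces and hydrate dots
    stack = [Counter()]
    i = 0
    while i < len(text):
        if text[i] == "(":
            stack.append(Counter())
            i += 1
        elif text[i] == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')' in " + formula)
            group = stack.pop()
            count, i = _read_count(text, i + 1)
            for element, n in group.items():
                stack[-1][element] += n * count
        else:
            m = ELEMENT.match(text, i)
            if not m:
                raise ValueError(f"unexpected character {text[i]!r} in {formula}")
            count, i = _read_count(text, m.end())
            stack[-1][m.group()] += count
    if len(stack) != 1:
        raise ValueError("unbalanced '(' in " + formula)
    return stack[0]


def matches(query, material):
    """True if the material has exactly the queried count of each named element."""
    wanted = parse_formula(query)
    have = parse_formula(material)
    return all(have.get(element, 0) == n for element, n in wanted.items())
```

With this normalization, parse_formula("FeOOH") and parse_formula("FeO2H") yield the same counts, so both spellings lead to the same search.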

III. EXAMPLE OF USE

An example use of the service-oriented framework prototype developed is the submission of several CASTEP simulations to the National Grid Service (NGS) [19] to calculate the EoS of the material GaN. An EoS is a relation between state variables (e.g., volume, pressure) [20]. The calculation involves a parameter sweep that requires running many independent simulations in parallel. The test approach is generally described as follows. 1) Instantiate the framework so that it can support running CASTEP simulation(s). 2) Submit a CASTEP simulation job to NGS to calculate the EoS of GaN.

Fig. 5. Cell structure of GaN can be viewed in Jmol by loading the CML cell structure information file created by the portal of the instantiated framework.

¹ Mineralogists do differentiate between oxygen on its own, as part of an OH group, and as part of H2O groups, leading to this flexibility.

The instantiation of the framework to support the CASTEP-specific simulation mainly comprises the following procedures.

1) Define parameters and values required for EoS in the task definition file.

2) Create a CASTEP CML dictionary according to the schemas defined in the framework. By using the CASTEP-specific task definition file and CML dictionary, the portal can generate CASTEP-specific simulation input files.

3) Implement or modify Web services according to the WSDL interfaces defined by the framework.


Fig. 6. Fits of equations of state to simulated results of GaN. The entered lower pressure is 0 GPa, the upper value is 50 GPa, and the required number of simulations is 5. The plot is generated by the property calculation service of the EoS workflow v1.0, which only shows the curve of predicted pressure (third-order B-M).

output, and can also receive an email notification. The plot of fits of EoS from the simulation output is shown in Fig. 6.
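For reference, the third-order Birch-Murnaghan (B-M) form named in the Fig. 6 caption can be evaluated as in the Python sketch below. The formula is the standard textbook expression, P(V) = (3 B0 / 2) [(V0/V)^(7/3) - (V0/V)^(5/3)] {1 + (3/4)(B0' - 4)[(V0/V)^(2/3) - 1]}; the function names are hypothetical, and the assumption that the five target pressures are evenly spaced between the entered bounds is ours, since the text does not say how the sweep points are chosen.

```python
def bm3_pressure(volume, v0, b0, b0_prime):
    """Third-order Birch-Murnaghan pressure at a given volume.

    volume and v0 must share one unit; the result has the unit of b0.
    """
    if volume <= 0 or v0 <= 0:
        raise ValueError("volumes must be positive")
    x = (v0 / volume) ** (1.0 / 3.0)
    return (1.5 * b0 * (x**7 - x**5)
            * (1.0 + 0.75 * (b0_prime - 4.0) * (x**2 - 1.0)))


def sweep_pressures(lower, upper, count):
    """Evenly spaced target pressures (even spacing is an assumption)."""
    if count < 2:
        raise ValueError("a sweep needs at least two points")
    step = (upper - lower) / (count - 1)
    return [lower + i * step for i in range(count)]
```

For the inputs quoted in the caption, sweep_pressures(0, 50, 5) gives 0, 12.5, 25, 37.5, and 50 GPa. Fitting v0, b0, and b0_prime to the simulated pressure-volume points would then be a job for a least-squares routine, which is outside this sketch.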

IV. DISCUSSIONS

The framework can also be instantiated to submit other quantum mechanical simulations (e.g., SIESTA) to Grid systems. The main challenge of instantiating the framework to submit different simulations is the creation of simulation-code-specific input files. However, this can be achieved by modifying the task definition file and creating a simulation-code-specific CML dictionary, which follows the schema defined in the framework.

The framework is independent of any physical Grid resources (e.g., NGS and CamGrid). Resource location and job submission are handled by the metascheduling service and the job submission service, which need to be configured/implemented when instantiating the framework.

The advantages of employing the SOA approach in the framework have been demonstrated. This modular design approach ensures that the framework is scalable, robust, and easy to maintain. For example, the job submission/monitoring service currently uses RMCS as its job submission tool, but the service could also use other job submission tools (e.g., GridSAM [21]), and this replacement would have no impact on the rest of the system. The SOA and WSDL interfaces effectively decouple the workflow creation and service implementation.
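The decoupling argument can be made concrete with a small interface sketch. The Python below is illustrative only: the JobSubmissionService protocol, its two methods, and both backend classes are hypothetical stand-ins that do not call RMCS or GridSAM, and in the framework itself the contract is expressed as a WSDL interface rather than a Python type.

```python
from typing import Protocol


class JobSubmissionService(Protocol):
    """Hypothetical contract the workflow depends on."""

    def submit(self, input_files: list[str]) -> str: ...

    def status(self, job_id: str) -> str: ...


class FakeRmcsBackend:
    """Stand-in for an RMCS-backed service; it does not call RMCS."""

    def __init__(self):
        self._jobs = {}

    def submit(self, input_files):
        job_id = f"rmcs-{len(self._jobs) + 1}"
        self._jobs[job_id] = "RUNNING"
        return job_id

    def status(self, job_id):
        return self._jobs[job_id]


class FakeGridSamBackend:
    """Stand-in for a GridSAM-backed service; it does not call GridSAM."""

    def __init__(self):
        self._jobs = {}

    def submit(self, input_files):
        job_id = f"gridsam-{len(self._jobs) + 1}"
        self._jobs[job_id] = "RUNNING"
        return job_id

    def status(self, job_id):
        return self._jobs[job_id]


def submit_and_report(service: JobSubmissionService, input_files):
    """Workflow-side code: it only sees the interface, never the backend."""
    job_id = service.submit(input_files)
    return job_id, service.status(job_id)
```

Because submit_and_report depends only on the interface, swapping one backend for the other requires no change on the workflow side, which is the property the paragraph above attributes to the SOA design.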

V. RELATED WORK

Currently, a popular approach to running CASTEP applications is to employ Materials Studio [22]. Materials Studio provides a client–server software environment. At the client side, the users can create models of molecular structures and materials. The model information is then transferred to the server side, where the calculations are performed and the results are returned to the users. This approach means that users have to purchase and install Materials Studio, and have to use the Materials Studio client to run the simulation. Furthermore, this approach cannot take advantage of Grid computational resources.

The eMinerals [23] project has developed a suite of tools (e.g., RMCS) for running quantum mechanical simulation of environmental

creation and running of simulation jobs in a Grid environment, the subsequent extraction of core output information, and the longer term archiving of data. It also provides a mechanism to automate the coordination of a sequence of tasks within a simulation process without direct human control, in which each task is presented as a Web service and integrated into the workflow orchestrated by the workflow engine.

ACKNOWLEDGMENT

The authors would like to acknowledge the U.K. government Department of Trade and Industry/Technology Strategy Board, which funds the MaterialsGrid project, and contributions from MaterialsGrid partners, which include IBM, Accelrys, the Science and Technology Facilities Council, the University of Frankfurt, and the University of Cambridge.

REFERENCES

[1] M. D. Segall, P. J. D. Lindan, M. J. Probert, C. J. Pickard, P. J. Hasnip, S. J. Clark, and M. C. Payne, “First-principles simulation: Ideas, illustrations and the CASTEP code,” J. Phys., Condens. Matter, vol. 14, pp. 2717–2744, 2002.

[2] J. M. Soler, E. Artacho, J. D. Gale, A. Garcia, J. Junquera, P. Ordejón, and D. Sánchez-Portal, “The SIESTA method for ab initio order-N materials simulation,” J. Phys., Condens. Matter, vol. 14, pp. 2745–2779, 2002.

[3] Globus Toolkit. (2008). [Online]. Available: http://www.globus.org/toolkit/

[4] X. Yang, P. R. Moore, C. B. Wong, and J. Pu, “A component-based software framework for product lifecycle information management for consumer products,” IEEE Trans. Consum. Electron., vol. 53, no. 3, pp. 1195–1203, Aug. 2007.

[5] D. F. D’Souza and A. C. Wills, Objects, Components and Framework With UML: The Catalysis Approach. Addison-Wesley, 1999.

[6] Materials Grid. (2009). [Online]. Available: http://www.materialsgrid.org

[7] X. Yang, M. Dove, M. Hayes, M. Calleja, L. He, and P. Murray-Rust, “Survey of tools and technologies for Grid-enabled portals,” in Proc. U.K. e-Sci. Hands Conf., 2006, pp. 353–356.

[8] IBM WebSphere Portal. (2010). [Online]. Available: http://www-01.ibm.com/software/websphere/portal/

[9] Jmol. (2009). [Online]. Available: http://jmol.sourceforge.net/

[10] X. Yang, R. Bruin, and M. Dove, “Developing an end-to-end scientific workflow: A case study of using a reliable, lightweight, and comprehensive workflow platform in e-science,” IEEE Comput. Sci. Eng., 2009. DOI: http://doi.ieeecomputersociety.org/10.1109/MCSE.2009.211.

[11] WebDav. (2009). [Online]. Available: http://www.webdav.org/

[12] P. Murray-Rust and H. S. Rzepa, “Chemical markup, XML, and the Worldwide Web. 1. Basic principles,” J. Chem. Inf. Model., vol. 39, no. 6, pp. 928–942, 1999.

[13] A. M. Walker, R. P. Bruin, M. T. Dove, T. O. H. White, K. Kleese van Dam, and R. P. Tyer, “Integrating computing, data and collaboration grids: The RMCS tool,” Philos. Trans. R. Soc. Spec. Issue, Environ. e-Sci. Revolution, vol. 367, no. 1890, pp. 1047–1050, Mar. 2009.

[14] R. Bruin, T. White, A. Walker, K. Austen, M. Dove, R. Tyer, P. Couch, I. Todorov, and M. Blanchard, “Job submission to grid computing environments,” in Proc. U.K. e-Sci. Hands Meeting, 2006, pp. 754–761.

[15] S. R. Hall, F. H. Allen, and I. D. Brown, “The crystallographic information

[16] JSR 168 Portlet Specification. (2009). [Online]. Available: http://jcp.org/aboutJava/communityprocess/final/jsr168/

[17] JQuery JavaScript Library. (2009). [Online]. Available: http://jquery.com

[18] ICSD. (2010). [Online]. Available: http://www.fiz-karlsruhe.de/icsd.html

[19] NGS grid. [Online]. Available: http://www.grid-support.ac.uk/

[20] P. Perrot, A to Z of Thermodynamics. London, U.K.: Oxford Univ. Press, 1998, ISBN 0-19-856552-6.

[21] W. Lee, A. McGough, and J. Darlington, “Performance evaluation of the GridSAM job submission and monitoring system,” in Proc. 2005 UK e-Science All Hands Meeting, Nottingham, U.K., 2005.

[22] Materials Studio. (2009). [Online]. Available: http://accelrys.com/products/materials-studio/
