
University of the Philippines Manila College of Arts and Sciences


Academic year: 2023


The paradigm shift brought about by next-generation sequencing (NGS) technologies is giving rise to applications of in silico human genomic data in the clinical setting; one such application is genomic disease susceptibility testing (DST). Because DST uses sensitive genomic data, the test must be kept private and secure: both a patient's medical unit and the disease marker research group want to keep their genomic data private from each other and from any other adversary who could stigmatize or discriminate against the medical unit's patients or jeopardize the research group's profitable disease markers.

Background of the Study

The combined effects of these ascertained variants increased the rate of risk discrimination for breast cancer prevention and early detection [19]. This premise of general clinical utility for genomic variants, while potentially improving risk discrimination for DST, raises privacy concerns for the genomic data of DST patients.

Statement of the Problem

Given the consequences of a compromise of a patient's genomic data, many patients have developed distrust in providing their genomic data, even to the medical entity by which they wish to be tested. With secure multiparty computation (SMC), the medical unit and the research group can feed their genomic data into the computation without revealing anything about their respective inputs, keeping both parties' genomic data private while still producing a relevant output.

Objectives of the Study

On the other hand, the disease marker research group cannot provide details of its marker locations, because the medical unit could then infer these disease markers, compromising the research group's profitable enterprise. One possible way for the medical unit and the research group to perform DST under these constraints is secure multiparty computation (SMC).

Significance of the Project

In addition, a privacy-preserving genomic DST gives medical units a secure way to perform further DST calculations using a patient's clinical and/or environmental data. With an obtained genomic DST result (i.e., a regression coefficient), medical units can calculate the patient's overall DST by merging this result with the result of a similar calculation on the same patient's clinical/environmental data [29, 30].

Scope and Limitations

Finally, it is important for a medical entity to outsource DST using the research group's disease markers, because reliable disease markers come only from intensive research on hundreds of thousands of samples in genome-wide association studies (GWAS) [31], which no average medical unit can produce [32].

Assumptions

Review of Related Literature

In their work, the genomic data of the patient is encrypted and stored in a storage processing unit (SPU). When performing DST, the MC requests specific regions of the patient's genome from the SPU.

Table 1: Variant Calling Software

Variant Call Format

INFO contains additional information about a particular variant; its keys are described in the ##INFO= metadata lines. The content of the FORMAT field is likewise specified in the ##FORMAT= metadata lines.
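To make the relationship between the metadata lines and the record fields concrete, the following is a minimal sketch of reading VCF text in Python. It is an illustration only, not the parser used by the system: the function name `parse_vcf` and the sample record (including the identifier `rs0000001`) are hypothetical.

```python
def parse_vcf(text):
    """Split VCF text into metadata lines, column names, and variant records."""
    meta, columns, records = [], [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("##"):
            # Metadata such as ##INFO=<...> and ##FORMAT=<...>
            meta.append(line)
        elif line.startswith("#"):
            # Header line naming the tab-separated columns
            columns = line[1:].split("\t")
        else:
            record = dict(zip(columns, line.split("\t")))
            # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags
            info = {}
            for item in record.get("INFO", "").split(";"):
                if not item or item == ".":
                    continue
                key, sep, value = item.partition("=")
                info[key] = value if sep else True
            record["INFO"] = info
            records.append(record)
    return meta, columns, records


# Made-up sample input for illustration
SAMPLE = "\n".join([
    "##fileformat=VCFv4.2",
    '##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">',
    '##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">',
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE1",
    "1\t12345\trs0000001\tA\tG\t50\tPASS\tDP=14;DB\tGT\t0/1",
])

meta, columns, records = parse_vcf(SAMPLE)
```

Here the `DP` key in the record's INFO column is the one declared by the `##INFO=<ID=DP,...>` line, and the `GT` entry in the FORMAT column tells the reader how to interpret the value in the sample column.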

Figure 1: Sample VCF file [55]

Secure Multiparty Computation

Definition

Classification

Peer Model

Sharemind

Sharemind Framework

Sharemind's only developed protocol suite, shared3p, uses the additive sharing scheme with passive security (i.e., the protocol is secure against a semi-honest peer). Other unreleased versions of Sharemind include: shared2p, a two-party version of shared3p; sharednp, a multiparty version of shared3p; and shared2a, a two-party protocol suite with active security (i.e., the protocol is secure against a malicious adversary) [62].
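The idea behind an additive sharing scheme can be sketched in a few lines of Python. This is a plain illustration of additive secret sharing among three parties, not Sharemind's shared3p implementation; the ring size `MODULUS` and all function names are assumptions made for the example.

```python
import secrets

MODULUS = 2 ** 32  # assumed ring size, chosen only for this illustration


def share(value, parties=3):
    """Split value into additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares):
    """Recombine all shares; any strict subset looks uniformly random."""
    return sum(shares) % MODULUS


def add_shared(a_shares, b_shares):
    """Each party adds its own two shares locally, without communication."""
    return [(a + b) % MODULUS for a, b in zip(a_shares, b_shares)]


a = share(1234)
b = share(4321)
total = reconstruct(add_shared(a, b))
```

No single party learns either input from its own share, yet the sum of the inputs can be recovered once all three result shares are combined. Operations such as multiplication require additional protocol steps that this sketch does not cover.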

SecreC and Controller Library

The Javascript controller library works around the same-origin policy enforced by many modern browsers by placing Node.js servers in front of the Sharemind miners. By including Socket.io, a library for Node.js applications, in the browser's Javascript controller library, the controller library was able to use Cross-Origin Resource Sharing (CORS), a mechanism that lets a page make requests outside its own origin.

Figure 4: Node.js servers connecting the client and Sharemind miners

An example Javascript code implemented over JQuery, a Javascript framework, connecting to the Node.js servers with ports 8081, 8082, and 8083 using the Javascript controller with namespac

Design and Implementation

To manage diseases, the system administrator can add a disease, edit a disease's name, and delete a disease. Finally, the system administrator can add a disease marker to a disease in the system, edit a disease's marker, and delete disease markers.

Figure 7: Use Case – Manage Users

Database Design

Data Dictionary

The research group's database will contain five tables: a disease information table, a variant information table, a user information table, a user log table, and a user status table. The time when the patient variant upload was completed; also appears on the get results page.

Table 5: Research group’s marker table

Entity Relations Diagram

System Architecture

MathJax, on the other hand, is used to render the equations on the about page. The miner binaries run the Sharemind miners, which execute the scripts compiled from SecreC (i.e., SecreC bytecode), while the Node.js module, called JSWCP, is the main module that connects the Sharemind miners to the PHP-based system and vice versa.

Figure 13: Genomic DST System Stack

System Development

  • Developing overall system functionalities
  • Uploading and processing patient variants
  • Storing disease markers and patient variants
  • Calculating risk coefficient
  • Generating PDF report

Furthermore, most of the functions for the system user (i.e., the medical unit) do not rely on CodeIgniter (i.e., the PHP server). Instead, a mix of plain Javascript, AngularJS, the Sharemind controller library, and other Javascript libraries was used to perform most of the user functions (i.e., functions that process sensitive data) in the browser. The system user's patient variant upload is also processed in the browser: the browser itself reads the uploaded VCF file.

Calculation of the risk coefficient also uses the serversConnect() and server.emit() functions to start the SecreC bytecode that calculates the patient's risk coefficient. The bytecode uses Sharemind's simpleLinearRegression() function to obtain the y-intercept of the regression line generated from the weight of each marker and its genotypic value in the patient's variants. Afterwards, these vectors are multiplied with each other to generate a single vector of the phenotypic values of the patient variants for each disease marker weight.
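The two arithmetic steps described above can be sketched on plaintext values in Python. This is not the SecreC bytecode and does not reproduce Sharemind's simpleLinearRegression() function; it only shows an ordinary least-squares y-intercept and an element-wise product. Treating the weights as x and the genotypic values as y is an assumption, and the sample numbers are made up.

```python
def y_intercept(xs, ys):
    """Ordinary least-squares y-intercept of the line fitted through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # requires at least two distinct x values
    return mean_y - slope * mean_x


def elementwise_product(xs, ys):
    """Multiply two equal-length vectors position by position."""
    return [x * y for x, y in zip(xs, ys)]


# Hypothetical integer-scaled marker weights and patient genotypic values
weights = [1, 2, 3, 4]
genotypes = [0, 1, 1, 2]

intercept = y_intercept(weights, genotypes)
phenotypic = elementwise_product(weights, genotypes)
```

In the secure setting both vectors would be secret-shared among the miners, so neither party would see the other's input while these quantities are computed.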

Technical Architecture

Results

Validation and alerts on the login page behave the same way as on the rest of the pages in the system. The disease catalog page contains all actions related to both disease names and the management of disease markers. The portion of the page enclosed by a yellow box is where the administrator can add, edit, and delete a disease.

The disease marker section of the page allows the administrator to add, edit, and delete disease markers. The administrator can also update the markers in the Sharemind database by clicking the “Update Sharemind Database” button at the bottom of the page. A single disease marker can be added by completing the fields of the modal form, while a batch upload can be done by uploading a VCF file of the markers to the system.

Figure 17: Login page – invalid password alert

Calculating risk coefficient

This was due to the pre-processing of the odds ratio before storage: instead of the odds ratio itself, the unsigned integer representation of the weights was stored. This was done to make use of the simpleLinearRegression() function, which only accepts integer values. By storing the odds ratio in its raw form instead of the unsigned representation of the weights, the computation bytecode was able to derive the real value of the weights from the odds ratio, yielding a decimal value in the interval (−∞, 0.7).

To obtain the real value of the y-intercept, division must be performed in the bytecode. 100 variants) in the above graph indicates that declassifying SecreC variables in the conditional statement of the nested for loop contributes significantly to the overall execution time of the bytecode. Tests based on the best case (i.e., markers are present at the beginning of the list of patient variants), on the other hand, show no significant change in execution time across varying numbers of patient variants, supporting the idea that the location of a variant that corresponds to a marker greatly affects the execution time of the bytecode.
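Why the position of a matching variant matters can be illustrated with a plaintext sketch that counts the comparisons made by a nested loop. The sketch assumes the inner loop stops at the first match, which is consistent with the best-case behaviour described above but is not confirmed by the bytecode itself; the function name and identifiers are hypothetical.

```python
def match_markers(markers, variants):
    """Find each marker in the variant list and count the comparisons made."""
    comparisons = 0
    matched = []
    for marker in markers:
        for variant in variants:
            # In the secure version, each such check would need a declassification
            comparisons += 1
            if variant == marker:
                matched.append(marker)
                break  # assumed early exit once the marker is found
    return matched, comparisons


markers = ["rsA", "rsB"]
filler = [f"v{i}" for i in range(100)]

# Best case: marker variants sit at the start of the patient's list
_, best_count = match_markers(markers, markers + filler)
# Worst case: marker variants sit at the end of the patient's list
_, worst_count = match_markers(markers, filler + markers)
```

With the markers at the front, the number of comparisons does not depend on how many other variants follow; with the markers at the end, it grows with the length of the list.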

Sharemind miners

Conclusions

The system could store the patient variants and the disease markers without depending on any entity other than the browser, through which users access the system, send their data directly to the Node.js servers of the Sharemind miners, retrieve the risk result directly, and generate a browser-based PDF report of the result. The system was also able to calculate the patient's risk coefficient among the three Sharemind miners, including the y-intercept of the regression line used in the additive model that calculates the overall genomic risk coefficient. Overall, the system fulfilled its main goal: to create a privacy-preserving, web-based genomic disease susceptibility test that protects the privacy of the medical unit's patient variants and the research group's disease markers.

The genomic disease risk calculator presented in this paper meets the fundamental requirements for preserving the privacy of the input data and the calculation. The current patient variant upload mechanism can only handle files with fewer than 30,000 variants. Files with more than 30,000 variants cause the browser to warn of a long-running script, because the script keeps parsing the uploaded VCF file in the browser.

Calculation bytecode

It is therefore recommended to create a more efficient way of loading and parsing the patient's VCF file in the browser, so that the browser can still perform other Javascript functions while the file parser is running (i.e., look into the possible use of multi-threaded Javascript). Another possible improvement to the computation bytecode concerns the nature of the input values processed during its execution. The current design of the computation bytecode relies on the raw odds ratio of the disease markers.

Thus, calculation of the weight (i.e., the natural logarithm of the odds ratio) is performed in the bytecode, which adds to its total execution time. Therefore, it is recommended to find a way to move parts of the calculation out of the bytecode and into the Javascript controller that the system uses in the browser (e.g., perform the logarithmic calculations in Javascript instead). It is also recommended to look into multiplying the decimal value by the reciprocal of the weight multiplier, so that a division can effectively be performed in SecreC.
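Both recommendations can be sketched on plaintext values in Python: the weight is computed from the odds ratio before it reaches the secure computation, and a division by the weight multiplier is replaced by a multiplication with its reciprocal. The multiplier value `MULTIPLIER` and the function names are hypothetical, and the sketch uses signed integers, so it does not reproduce the unsigned representation mentioned earlier.

```python
import math

MULTIPLIER = 1000  # hypothetical weight multiplier for fixed-point scaling


def weight_from_odds_ratio(odds_ratio):
    """Marker weight as the natural logarithm of the odds ratio."""
    return math.log(odds_ratio)


def encode(value):
    """Scale a decimal weight to an integer so integer-only routines can use it."""
    return round(value * MULTIPLIER)


def decode(fixed):
    """Undo the scaling by multiplying with the reciprocal instead of dividing."""
    return fixed * (1.0 / MULTIPLIER)


weight = weight_from_odds_ratio(2.0)
encoded = encode(weight)
decoded = decode(encoded)
```

Because the reciprocal of the multiplier is a constant, it can be computed once outside the secure computation and then applied as an ordinary multiplication.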

Overall patient disease risk

Bibliography

  • SecreC source codes
  • Javascript source codes

Figures

Figure 3: Sharemind’s representative-based architecture with three private peers (miners) [60]
Figure 6: Use Case – System Computation
Figure 11: Use Case – Sharemind Processes
Figure 12: Entity Relations Diagram for research group database