Derek M. Bickhart (ed.), Copy Number Variants: Methods and Protocols, Methods in Molecular Biology, vol. 1833, https://doi.org/10.1007/978-1-4939-8666-8_2, © Springer Science+Business Media, LLC, part of Springer Nature 2018
Chapter 2
Using SAAS-CNV to Detect and Characterize Somatic
Copy Number Alterations in Cancer Genomes from Next
Generation Sequencing and SNP Array Data
decreasing cost [9, 10]. Meanwhile, these techniques generate huge amounts of data, demanding more powerful and sophisticated methods for data analysis. To this end, we have developed an SCNA analysis method, SAAS-CNV [11], and implemented it in the R package saasCNV. SAAS-CNV has been compared with several state-of-the-art SCNA analysis methods on a large breast cancer dataset from The Cancer Genome Atlas (TCGA) and achieved the best performance [12].
The saasCNV package is applicable to WGS/WES data and whole-genome SNP array data generated from tumor and matched normal tissue. The main workflow of the pipeline is summarized in Fig. 1 [11]. From the sequencing (SNP array) platform, two types of information relevant to SCNA status are extracted at loci with a heterozygous genotype: (1) total read depth (total fluorescent intensity), reflecting total copy number change; and (2) allele-specific read depth (allele-specific intensity), reflecting allelic imbalance that results from differential copy number changes on the two alleles.
Fig. 1 The analysis workflow of saasCNV package. Picture was reproduced from [11] under the Creative Commons Attribution (CC BY) license
The two types of information also provide valuable clues for the inference of tumor ploidy and purity. These two types of signals are transformed into quantitative measures, called log2ratio and log2mBAF, respectively [11]. A joint analysis is then performed on these two signal dimensions in both the segmentation and SCNA calling steps. In the segmentation step, the two signal sequences are jointly partitioned into segments of different sizes, each corresponding to a particular copy number status. In the subsequent SCNA calling step, the algorithm identifies the baseline values for both the log2ratio and log2mBAF dimensions, based on which the SCNA status of each segment is decided. The possible SCNA statuses are normal, loss, gain, copy-neutral loss of heterozygosity (CN-LOH), and “undecided” if no status can be assigned with confidence.
For each tumor–normal pair, the saasCNV package creates a directory in which intermediate data, intermediate results, and final results are organized in separate subdirectories. It also provides visualization tools for displaying final results as well as diagnostic plots from intermediate analysis steps. For more information and example data, users can refer to the package website: http://zhangz05.u.hpc.mssm.edu/saasCNV/.
2 Materials
2.1 Computational Environment
The saasCNV package is implemented in R [13] and can be run on a desktop or laptop computer with a commonly used operating system (OS), such as Windows, Mac OS, or Linux. It can also be run on a Linux/Unix-based high-performance computing cluster, which is particularly efficient when processing multiple samples in parallel.
1. Install R. Users can download the OS-specific, precompiled R distribution from https://cran.r-project.org/ and follow the step-by-step installation instructions.
2. Install dependent packages. saasCNV uses functions from two other packages, RANN and DNAcopy, which need to be installed beforehand. The following commands are all typed and run in the R environment following the prompt symbol “>”, unless otherwise specified.
To install the RANN package, type
> install.packages("RANN")
To install the DNAcopy package, type
> source("https://bioconductor.org/biocLite.R")
> biocLite("DNAcopy")
On current Bioconductor releases, in which biocLite has been replaced by the BiocManager package, the equivalent commands are
> install.packages("BiocManager")
> BiocManager::install("DNAcopy")
3. Install the saasCNV package. You just need to type
> install.packages("saasCNV")
2.2 saasCNV Package
Details about all functions in the package, along with their parameters, are provided in the package documentation, which can be displayed by typing
> help("function.name")
where “function.name” is the name of the R function in the package to be looked up.
2.3 Input Data
The analysis with the saasCNV package starts from Variant Call Format (VCF) files generated from WGS or WES of tumor and matched normal pairs. The VCF files are usually produced by the GATK pipeline [14, 15] (see Note 1). The package can also be applied to data generated from whole-genome SNP array platforms. The analysis pipelines for WGS/WES data and SNP array data are almost identical except for the data preparation step. We mainly describe the pipeline for sequencing data analysis in the Methods section and explain the differences for SNP array data in Note 2.
To demonstrate how to use the package, we have prepared example data, which can be downloaded from https://zhangz05.u.hpc.mssm.edu/saasCNV/data/. Users should download the files WES_example.vcf.gz, vcf_table.txt.gz, snp_table.txt.gz, refGene_hg19.txt.gz, and GC_1kb_hg19.txt.gz and place them in a working directory (e.g., wk_dir), where all analysis results and plots will be written.
3 Methods
The saasCNV package can be run in a “pipeline” mode with the integrated function NGS.CNV for sequencing data (SNP.CNV for SNP array data) or in a step-by-step mode. While the “pipeline” mode provides users with a quick start, the step-by-step mode offers more control and flexibility. Details about the “pipeline” mode are given in Note 3. Here we mainly describe the step-by-step mode, which illustrates the detailed workflow.
To begin, load the saasCNV package in the R environment and set the working directory:
> library(saasCNV)
> setwd("/path/to/wk_dir")
where "/path/to/wk_dir" specifies the absolute path to the working directory wk_dir.
3.1 Input Data Preparation
The pipeline for sequencing data (WGS/WES) analysis begins with VCF files. We have prepared an example WES VCF file, WES_example.vcf.gz, which contains the information for both tumor and matched normal tissues.
Following the header of annotations, the first few rows are shown in Table 1 (see Note 1 for more information). In a VCF file with genotype data, the first nine columns (eight fixed fields followed by FORMAT) precede the per-sample information on called variants, which starts from the tenth column.
We provide a tool, vcf2txt, to retrieve the necessary information from a VCF file and convert it to a text table:
> vcf_table <- vcf2txt(vcf.file = "WES_example.vcf.gz", normal.col = 9+1, tumor.col = 9+2)
The arguments normal.col and tumor.col specify the columns of the VCF file in which the genotype and read depth information of the normal and tumor tissues, respectively, are located. The first few lines of the resulting vcf_table are shown in Table 2. The first seven columns are retrieved from the VCF file directly; CHROM and POS are required for subsequent analysis. QUAL and MQ are quality scores for genotyping and read mapping, respectively, and can be used as filters to exclude variants of poor quality (see Note 1). The columns from the eighth onward hold the genotype, reference allele read depth, and alternative allele read depth for normal and tumor, respectively.
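To make the column layout concrete, the following base R sketch splits per-sample fields like those in Table 1 into genotype and allele-specific read depths. It is an illustration only and not the vcf2txt implementation: the helper name parse_gt_ad is hypothetical, and the sketch assumes that the FORMAT column begins with GT:AD, as in the example file.

```r
# Hypothetical helper (not part of saasCNV): split a VCF sample field such as
# "0/1:42,43" into genotype, reference depth, and alternative depth.
# Assumes the FORMAT column starts with "GT:AD", as in Table 1.
parse_gt_ad <- function(sample_field) {
  parts <- strsplit(sample_field, ":", fixed = TRUE)
  gt <- vapply(parts, function(p) p[1], character(1))
  ad <- strsplit(vapply(parts, function(p) p[2], character(1)), ",", fixed = TRUE)
  data.frame(
    GT     = gt,
    REF.DP = as.integer(vapply(ad, function(a) a[1], character(1))),
    ALT.DP = as.integer(vapply(ad, function(a) a[2], character(1))),
    stringsAsFactors = FALSE
  )
}

# Normal and tumor fields of the three variants shown in Table 1
normal <- parse_gt_ad(c("0/1:42,43", "0/1:34,48", "1/1:0,24"))
tumor  <- parse_gt_ad(c("0/1:56,46", "0/1:53,58", "1/1:0,11"))
print(normal)
print(tumor)
```

Any further colon-separated subfields after AD are simply ignored by this helper, which is why the elided parts of Table 1 do not matter here.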
We then use cnv.data to transform the allele-specific read depth information into log2ratio and log2mBAF [11] for subsequent analysis:
> seq.data <- cnv.data(vcf = vcf_table, min.chr.probe = 100, verbose = TRUE)
Here min.chr.probe specifies the minimum number of genotyped sites a chromosome needs in order to be analyzed in subsequent steps. The first few rows of seq.data are shown in Table 3. The first four columns (chr, position, log2ratio, and log2mBAF) hold the information required for SCNA analysis, while the remaining columns are used for visualization and diagnosis. Note that only sites that are heterozygous in the normal sample (i.e., Normal.GT is “0/1”) in Table 2 are carried over to Table 3.
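As a worked illustration of this transformation, the base R sketch below recomputes the signals for the two heterozygous sites of Table 2 (chr1:801,943 and chr1:808,631). It is a simplified sketch and not the cnv.data code: the mirroring formula for mBAF and the use of total depth for the ratio are our reading of the tables, and the package may apply further adjustments that are described in [11].

```r
# Read depths of the two heterozygous sites in Table 2
normal_ref <- c(23, 19); normal_alt <- c(35, 13)
tumor_ref  <- c(4, 3);   tumor_alt  <- c(22, 24)

# B allele frequency (BAF): fraction of reads supporting the alternative allele
normal_baf <- normal_alt / (normal_ref + normal_alt)
tumor_baf  <- tumor_alt  / (tumor_ref  + tumor_alt)

# Mirrored BAF (mBAF): fold the BAF around 0.5 so that it lies in [0.5, 1]
mirror <- function(baf) abs(baf - 0.5) + 0.5
normal_mbaf <- mirror(normal_baf)
tumor_mbaf  <- mirror(tumor_baf)

# log2mBAF: tumor mBAF relative to normal mBAF
log2mbaf <- log2(tumor_mbaf / normal_mbaf)

# Raw log2 ratio of total read depth, before any genome-wide centering
raw_log2ratio <- log2((tumor_ref + tumor_alt) / (normal_ref + normal_alt))

print(round(log2mbaf, 4))
print(round(raw_log2ratio, 4))
```

The log2mBAF values agree with the corresponding rows of Table 3. The raw depth ratios differ from the tabulated log2ratio values by the same constant (about 0.393) at both sites, which is consistent with a genome-wide centering of the log2ratio signal; treating the offset as centering is an assumption made here.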
We provide an option to perform GC content adjustment on the log2ratio signal. The details can be found in Note 4.
3.2 Joint Segmentation
We employ a generalized circular binary segmentation (CBS) algorithm [16] to perform joint segmentation on the log2ratio and log2mBAF dimensions. The function joint.segmentation outputs the starting and ending locations of each SCNA segment as well as several summary statistics.
> seq.segs <- joint.segmentation(data = seq.data, min.snps = 10,
global.pval.cutoff = 1e-4, max.chpts = 30, verbose = TRUE)
Table 1
An example VCF file
#CHROM  POS      ID          REF  ALT  QUAL     FILTER                      INFO               FORMAT    Normal        Tumor
chr1    14,907   rs79585140  A    G    1650.44  VQSRTrancheSNP99.50to99.90  …; MQ = 20.64; …   GT:AD: …  0/1:42,43: …  0/1:56,46: …
chr1    14,930   rs75454623  A    G    2048.44  VQSRTrancheSNP99.50to99.90  …; MQ = 23.07; …   GT:AD: …  0/1:34,48: …  0/1:53,58: …
chr1    753,405  rs61770173  C    A    683.21   PASS                        …; MQ = 29.16; …   GT:AD: …  1/1:0,24: …   1/1:0,11: …
The first few rows of the table are demonstrated. Some of the information that is not directly used in SCNA analysis is skipped and represented as ellipsis
Table 2
Allele-specific read depth information retrieved from the example VCF file
CHROM  POS      ID          REF  ALT  QUAL     MQ     Normal.GT  Normal.REF.DP  Normal.ALT.DP  Tumor.GT  Tumor.REF.DP  Tumor.ALT.DP
chr1   762,589  rs71507461  G    C    898.2    37.9   1/1        2              19             1/1       0             14
chr1   762,592  rs71507462  C    G    880.2    37.9   1/1        2              19             1/1       0             14
chr1   762,601  rs71507463  T    C    831.2    37.45  1/1        1              20             1/1       0             12
chr1   762,632  rs61768173  T    A    618.23   37.39  1/1        1              16             1/1       0             8
chr1   801,943  rs7516866   C    T    1551.44  52.03  0/1        23             35             0/1       4             22
chr1   808,631  rs11240779  G    A    1173.37  54.69  0/1        19             13             1/1       3             24
The first few rows of the table are demonstrated
Table 3
Data used for joint segmentation and SCNA calling
chr   Position  log2ratio     log2mBAF     normal.BAF   normal.mBAF  tumor.BAF    tumor.mBAF
chr1  801,943   −1.550427058  0.487689879  0.603448276  0.603448276  0.846153846  0.846153846
chr1  808,631   −0.637998279  0.582147485  0.406250000  0.593750000  0.888888889  0.888888889
chr1  880,390   −0.463275109  0.610957709  0.523809524  0.523809524  0.800000000  0.800000000
chr1  881,627   −1.392885781  0.035334826  0.700000000  0.700000000  0.500000000  0.717356243
chr1  892,460   −0.039248827  0.783866569  0.444444444  0.555555556  0.956521739  0.956521739
chr1  898,852   −0.392885781  0.241008100  0.214285714  0.785714286  0.071428571  0.928571429
The first few rows of the table are demonstrated
Please refer to Note 5 for details about the parameters used in this function. Table 4 lists the variables generated from the analysis along with their descriptions.
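The per-segment summaries described in Table 4 can be illustrated with a few lines of base R. This is a sketch rather than the joint.segmentation code: the six sites are the positions and rounded log2ratio values of Table 3, treated here as a single segment purely for illustration, and the use of R's default scaled mad() is an assumption about how the MAD is defined.

```r
# Positions and rounded log2ratio values of the six sites in Table 3,
# treated as one segment for illustration only
pos       <- c(801943, 808631, 880390, 881627, 892460, 898852)
log2ratio <- c(-1.55, -0.64, -0.46, -1.39, -0.04, -0.39)

seg_summary <- data.frame(
  posStart         = min(pos),
  posEnd           = max(pos),
  length           = max(pos) - min(pos) + 1,  # posEnd - posStart + 1
  numProbe         = length(pos),              # number of sites in the segment
  log2ratio.Mean   = mean(log2ratio),
  log2ratio.SD     = sd(log2ratio),
  log2ratio.Median = median(log2ratio),
  log2ratio.MAD    = mad(log2ratio)            # assumption: default scaled MAD
)
print(seg_summary)
```

The same summaries would be computed for the log2mBAF dimension of each segment.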
Optionally, adjacent segments whose median values in either or both of the log2ratio and log2mBAF dimensions are not substantially different can be merged (see Note 6).
The results from joint segmentation (or segments merging) can be visualized chromosome by chromosome.
> diagnosis.seg.plot.chr(data = seq.data, segs = seq.segs,
sample.id = "Sample ID", chr = 8)
Figure 2 shows the joint segmentation results for Chromosome 8 as an example (see Note 5).
3.3 SCNA Calling
In this step, we assign an SCNA status to each segment resulting from the joint segmentation step (or from the segment merging step). The baseline adjustment step is incorporated implicitly in the function cnv.call.
> seq.cnv <- cnv.call(data = seq.data, sample.id = "Sample ID",
segs.stat = seq.segs, N = 1000, pvalue.cutoff = 0.05)
When this step is completed, a few more columns are added to seq.segs, which summarize the baseline-adjusted log2ratio and log2mBAF mean/median values, p-values, and SCNA status for each segment (Table 4). The parameter pvalue.cutoff controls the significance level for SCNA calling (see Note 7 for more information). We also provide an option to add gene annotation to each SCNA segment (see details in Note 8).
3.4 Visualization of Results
We provide two ways of visualizing the joint segmentation and SCNA calling results: (1) the genome-wide SCNA profile; (2) a cluster-type plot of SCNA segments projected onto the log2mBAF-log2ratio space.
> genome.wide.plot(data = seq.data, segs = seq.cnv, sample.id = "Sample ID", chrs = 1:22)
> diagnosis.cluster.plot(segs = seq.cnv, chrs = 1:22, min.snps = 10)
The rendered figures are shown in Figs. 3 and 4. These plots are informative for diagnosis of data quality (see Note 9 for more details) and manual baseline adjustment (see Note 10 for more details).
Table 4
Variables generated from joint segmentation and SCNA calling
Variable                 Description
Appear in both joint segmentation and SCNA calling results
chr                      Chromosome
posStart                 The physical position of start site on the chromosome
posEnd                   The physical position of end site on the chromosome
length                   Physical size (posEnd - posStart + 1)
chrIdxStart              The index of start site among all the sites on the chromosome
chrIdxEnd                The index of end site among all the sites on the chromosome
numProbe                 The number of sites within the segment (chrIdxEnd - chrIdxStart + 1)
log2ratio.Mean           The mean of log2ratio values at the sites within the segment
log2ratio.SD             The standard deviation of log2ratio values at the sites within the segment
log2ratio.Median         The median of log2ratio values at the sites within the segment
log2ratio.MAD            The median absolute deviation of log2ratio values at the sites within the segment
log2mBAF.Mean            The mean of log2mBAF values at the sites within the segment
log2mBAF.SD              The standard deviation of log2mBAF values at the sites within the segment
log2mBAF.Median          The median of log2mBAF values at the sites within the segment
log2mBAF.MAD             The median absolute deviation of log2mBAF values at the sites within the segment
Additional variables generated from SCNA calling results
Sample_ID                Sample ID
remark                   =1, if the identified baseline is not reliable; =0, otherwise
log2ratio.base.Mean      The baseline mean of log2ratio values at the sites within normal segments
log2ratio.base.Median    The baseline median of log2ratio values at the sites within normal segments
log2ratio.Sigma          The estimated noise level of log2ratio values across the genome
log2mBAF.base.Mean       The baseline mean of log2mBAF values at the sites within normal segments
log2mBAF.base.Median     The baseline median of log2mBAF values at the sites within normal segments
log2mBAF.Sigma           The estimated noise level of log2mBAF values across the genome
log2ratio.Mean.adj       The baseline-adjusted mean of log2ratio values at the sites within the segment
log2ratio.Median.adj     The baseline-adjusted median of log2ratio values at the sites within the segment
log2mBAF.Mean.adj        The baseline-adjusted mean of log2mBAF values at the sites within the segment
log2mBAF.Median.adj      The baseline-adjusted median of log2mBAF values at the sites within the segment
log2ratio.p.value        The p-value for the log2ratio.Median.adj value of the segment
log2mBAF.p.value         The p-value for the log2mBAF.Median.adj value of the segment
p.value                  The p-value for both log2ratio.Median.adj and log2mBAF.Median.adj values of the segment
CNV                      SCNA status, including loss, gain, CN-LOH, normal, undecided
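To give some intuition for how the two baseline-adjusted signals relate to the SCNA statuses in the CNV column, the toy base R rule below labels segments from their adjusted medians. It is emphatically not the cnv.call algorithm, which decides the status from p-values: the function toy_status, its cutoffs, and the example values are all hypothetical, and the "undecided" status is left out.

```r
# Toy decision rule (NOT the cnv.call algorithm): assign a status from
# baseline-adjusted medians using hypothetical cutoffs.
toy_status <- function(log2ratio_adj, log2mbaf_adj,
                       ratio_cut = 0.1, mbaf_cut = 0.1) {
  ifelse(log2ratio_adj <= -ratio_cut, "loss",        # total copy number decreased
  ifelse(log2ratio_adj >=  ratio_cut, "gain",        # total copy number increased
  ifelse(log2mbaf_adj  >=  mbaf_cut,  "CN-LOH",      # copy-neutral, allelic imbalance
                                      "normal")))    # neither signal shifted
}

# Made-up segments named after the Table 4 variables
segs <- data.frame(
  log2ratio.Median.adj = c(-0.60, 0.45, 0.02, 0.01),
  log2mBAF.Median.adj  = c( 0.50, 0.20, 0.40, 0.03)
)
segs$CNV <- toy_status(segs$log2ratio.Median.adj, segs$log2mBAF.Median.adj)
print(segs)
```

The point of the sketch is only that loss and gain are driven by the log2ratio dimension, whereas CN-LOH shows up as a shift in log2mBAF without a shift in log2ratio.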
4 Notes
1. A detailed description of the VCF format can be found at https://samtools.github.io/hts-specs/VCFv4.2.pdf. In the GATK pipeline, the detected variants are subject to variant quality score recalibration (VQSR), so that the FILTER field in the generated VCF files is populated with values such as “PASS” and “VQSRTrancheSNP99.50to99.90”, which describe the quality of the variant calls (Table 1). The vcf2txt function uses only the variant sites with a FILTER value of “PASS” for downstream SCNA analysis. When the FILTER field of the input VCF files is not filled with such quality-descriptive values, or when the VCF files are generated by other tools (e.g., MuTect [17] or VarScan2 [18]), users may need to modify the vcf2txt function to adapt it to the specific data format.
2. We mainly consider SNP array data produced by Illumina Infinium whole-genome microarrays. For SCNA analysis, the log R ratio (LRR) and B allele frequency (BAF) signals, which respectively reflect the total copy number change and the allele proportion change [8], can be retrieved from the final report files (see Table 5 for an example). Note that the columns ID, REF, and ALT in Table 5 are not essential and serve only annotation purposes. For data generated from Affymetrix SNP arrays, LRR and BAF information can be extracted with the PennCNV-Affy tools (http://penncnv.openbioinformatics.org/en/latest/user-guide/affy/).
3. In the “pipeline” mode, all the analysis steps described in the Methods section are integrated into the NGS.CNV function and can be run together. The results, including tables and plots, are placed in subdirectories of the working directory output.dir specified by the user.
> vcf_table <- read.delim(file = "vcf_table.txt.gz", as.is = TRUE)
> NGS.CNV(vcf = vcf_table, output.dir = "/path/to/wk_dir",
sample.id = "Sample ID", min.chr.probe = 100, do.GC.adjust = FALSE, min.snps = 10, joint.segmentation.pvalue.cutoff = 1e-4, max.chpts = 30,
[Figure 2 image: four panels for Sample ID, Chr 8, showing log2ratio, log2mBAF, Tumor BAF, and Normal BAF (y-axes) against Position (Mb) (x-axis)]
Fig. 2 Visualization of joint segmentation results on a chromosome. From top to bottom, the panels display the signals of log2ratio, log2mBAF, tumor BAF, and normal BAF, respectively. The gray dots indicate the data points measured at genotyped loci ordered by their physical locations across the chromosome. The red segments indicate the SCNA segments produced by the joint segmentation algorithm