directly to the sequences, without opening them for you to view first.
Figure 2.111: Trimming creates annotations on the regions that will be ignored in the assembly process.
These annotated parts of the sequences will be ignored in the subsequent assembly.
A natural question is: Why not simply delete the trimmed regions instead of annotating them?
In some cases, deleting the regions would do no harm, but in other cases, these regions could potentially contain valuable information, and this information would be lost if the regions were deleted instead of annotated. We will see an example of this later in this tutorial.
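The difference between annotating and deleting can be illustrated with a minimal Python sketch. The Read class, its field names, and the example bases are hypothetical and are not part of the Workbench. The sketch only shows that a trim stored as boundaries can be moved later, whereas deleted bases could not be recovered.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical read model: the trim is stored as boundaries, not applied."""
    bases: str
    trim_start: int  # index of the first base used in the assembly
    trim_end: int    # index one past the last base used in the assembly

    def used(self) -> str:
        """Return only the part of the read that takes part in the assembly."""
        return self.bases[self.trim_start:self.trim_end]


# Hypothetical read with low-quality ends annotated as trimmed.
read = Read("NNACGTACGTAGNN", trim_start=2, trim_end=10)
assembled_part = read.used()   # "ACGTACGT"

# Because nothing was deleted, the trim boundary can be moved later
# to bring two more bases back into the assembly.
read.trim_end = 12
widened_part = read.used()     # "ACGTACGTAG"
```

The full read is still stored in read.bases after both steps, which is what makes it possible to adjust the trimmed region later.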
2.17.2 Assembling the sequencing data
The next step is to assemble the sequences. This is the technical term for aligning the sequences where they overlap and reversing the reverse reads to make a contiguous sequence (also called a contig).
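As a rough illustration of what reversing the reverse reads and joining overlapping reads means, consider the following Python sketch. It is a simplification under stated assumptions: the function names and the example reads are hypothetical, the overlap must match exactly, and it does not reflect how the Workbench performs assembly.

```python
# Translation table for complementing DNA bases (N stays N).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the sequence as it reads on the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def merge_on_overlap(left, right, min_overlap=3):
    """Join two reads where the end of left exactly matches the start of right.

    Returns the merged sequence, or None if no overlap of at least
    min_overlap bases is found. Real assemblers tolerate mismatches.
    """
    for size in range(min(len(left), len(right)), min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return left + right[size:]
    return None


forward_read = "ATGGCCTTA"   # hypothetical forward read
reverse_read = "GGCTAAGGC"   # hypothetical reverse read, as sequenced

oriented = reverse_complement(reverse_read)        # "GCCTTAGCC"
contig = merge_on_overlap(forward_read, oriented)  # "ATGGCCTTAGCC"
```

Here the two reads share the six bases "GCCTTA" once the reverse read has been reverse-complemented, so they can be joined into one contig.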
In this tutorial, we will use assembly to a reference sequence. This can be used when you have a reference sequence that you know is similar to your sequencing data.
Toolbox in the Menu Bar | Molecular Biology Tools | Sequencing Data Analysis | Assemble Sequences to Reference
In the first dialog, select the nine sequencing reads and click Next to go to the second step of the assembly, where you select the reference sequence.
Click the Browse and select button and select the "ATP8a1 mRNA (reference)" sequence from the "Sequencing data" folder (see figure 2.112). You can leave the other options in this window set to their defaults.
Figure 2.112: The "ATP8a1 mRNA (reference)" sequence selected as reference sequence for the assembly.
Click Next and choose to use the trim information (that you have just added).
Click Next and choose to Save your results. The next step will ask you for a location to save the results to. You can accept the default location, or you can use the left-hand icon under the "Save in folder" heading to create a new folder to save your assembly into.
Click Finish and the assembly process will begin.
2.17.3 Getting an overview of the contig
The result of the assembly is a Contig, which is an alignment of the nine reads to the reference sequence. Click Fit width to see an overview of the contig. To help you determine the coverage, display a coverage graph (see figure 2.113):
Alignment info in Side Panel|Coverage|Graph
Figure 2.113: An overview of the contig with the coverage graph.
This overview can be an aid in determining whether coverage is satisfactory, and if not, which regions a new sequencing effort should focus on. Next, we go into the details of the contig.
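The idea behind a coverage graph can be sketched in a few lines of Python: count how many reads cover each position of the contig, then list the stretches where the count falls below a chosen minimum. The function names, the read spans, and the threshold are hypothetical and serve only as an illustration.

```python
def coverage(read_spans, contig_length):
    """Count how many reads cover each contig position.

    read_spans holds (start, end) pairs, with end exclusive.
    """
    depth = [0] * contig_length
    for start, end in read_spans:
        for pos in range(start, end):
            depth[pos] += 1
    return depth


def low_coverage_regions(depth, minimum=2):
    """Return (start, end) stretches where coverage is below minimum."""
    regions = []
    region_start = None
    for pos, d in enumerate(depth):
        if d < minimum and region_start is None:
            region_start = pos
        elif d >= minimum and region_start is not None:
            regions.append((region_start, pos))
            region_start = None
    if region_start is not None:
        regions.append((region_start, len(depth)))
    return regions


# Three hypothetical reads aligned to a contig of length 12.
spans = [(0, 6), (4, 10), (8, 12)]
depth = coverage(spans, 12)
weak = low_coverage_regions(depth, minimum=2)
```

With these example spans, only positions 4 to 5 and 8 to 9 are covered by two reads, so the remaining stretches are the ones a new sequencing effort would focus on.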
2.17.4 Finding and editing conflicts
Click Zoom to 100% to zoom in on the residues at the beginning of the contig. Click the Find Conflict button at the top of the Side Panel or press the Space key to find the first position where there is disagreement between the reads (see figure 2.114).
In this example, the first read has a "T" (marked with a light-pink background color), whereas the second read has a gap. In order to determine which of the reads we should trust, we assess the quality of the reads at this position.
A quick look at the regularity of the peaks of read "Rev2" compared to "Rev3" indicates that we should trust the "Rev2" read. In addition, you can see that we are close to the end of "Rev3", and the quality of the chromatogram traces is often low near the ends.
Figure 2.114: Using the Find Conflict button highlights conflicts.
Based on this, we decide not to trust "Rev3". To correct the read, select the "T" in the "Rev3" sequence by placing the cursor to the left of it and dragging the cursor across the T. Press Delete.
This will resolve the conflict.
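Conceptually, a conflict is a column of the alignment in which the reads do not all show the same symbol. The Python sketch below illustrates this under simple assumptions: the function name and the aligned sequences are hypothetical, "-" denotes a gap, a space denotes a position the read does not cover, and quality values are ignored.

```python
def find_conflicts(aligned_reads):
    """Return (position, column) pairs where the aligned reads disagree.

    aligned_reads maps a read name to its aligned sequence.
    """
    conflicts = []
    length = max(len(seq) for seq in aligned_reads.values())
    for pos in range(length):
        # Collect the symbol from every read that covers this position.
        column = {
            name: seq[pos]
            for name, seq in aligned_reads.items()
            if pos < len(seq) and seq[pos] != " "
        }
        if len(set(column.values())) > 1:
            conflicts.append((pos, column))
    return conflicts


# Hypothetical alignment: "Rev3" has an extra T where "Rev2" has a gap.
reads = {
    "Rev3": "ACGTTAC",
    "Rev2": "ACG-TAC",
}
conflicts = find_conflicts(reads)
```

For this example, the only conflict is at index 3, where "Rev3" has a T and "Rev2" has a gap.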
2.17.5 Including regions that have been trimmed off
Clicking the Find Conflict button again will find the next conflict.
This is the beginning of a stretch of gaps in the consensus sequence. This is because the reads have been trimmed at this position. However, if you look at the read at the bottom, Fwd2, you can see that a lot of the peaks actually seem to be fine, so we could just as well include this information in the contig.
If you scroll a little to the right, you can see where the trimmed region begins. To include this region in the contig, move the vertical slider at position 2073 to the left (see figure 2.115).
Figure 2.115: Dragging the edge of the trimmed region.
You will now see how the gaps in the consensus sequence are replaced by real sequence information.
Note that you can only move the sliders when you are zoomed in to see the sequence residues.
2.17.6 Inspecting the traces
Clicking the Find Conflict button again will find the next conflict.
Here both reads differ from the reference sequence. We now inspect the traces in more detail. In order to see the details, we zoom in on this position:
Zoom in in the Tool Bar | Click the selected base | Click again three times
Now you have zoomed in on the trace (see figure 2.116).
Figure 2.116: Now you can see all the details of the traces.
This gives more space between the residues. If you would like to inspect the peaks more closely, simply drag the peaks up and down with your mouse (see figure 2.117).
Figure 2.117: Grab the traces to scale.
2.17.7 Synonymous substitutions?
In this case we have sequenced the coding part of a gene. Often you want to know what a variation like this would mean on the protein level. To do this, show the translation along the contig:
Nucleotide info in the Side Panel | Translation | Show | Select ORF/CDS in the Frame box
The result is shown in figure 2.118.
You can see that the variation is on the third base of the codon coding for threonine, so this is a synonymous substitution. That is why the T is colored yellow. If it were a non-synonymous substitution, it would be colored red.
Figure 2.118: Showing the translation along the contig.
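The reasoning behind a synonymous substitution can be checked with a short Python sketch that translates the reference codon and the read codon using the standard genetic code and compares the resulting amino acids. The function name and the example codons are hypothetical; the tutorial does not state the actual codon, only that it codes for threonine and that the variation is a T in the third position.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, read_codon):
    """Compare two codons at the protein level."""
    ref_codon, read_codon = ref_codon.upper(), read_codon.upper()
    if ref_codon == read_codon:
        return "identical"
    if CODON_TABLE[ref_codon] == CODON_TABLE[read_codon]:
        return "synonymous"
    return "non-synonymous"


# Hypothetical codons: ACC and ACT both code for threonine (T).
third_base_change = classify_substitution("ACC", "ACT")
# Changing the second base instead gives asparagine (N).
second_base_change = classify_substitution("ACC", "AAC")
```

All four codons starting with AC code for threonine, which is why a change in the third base of such a codon leaves the protein unchanged.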
2.17.8 Getting an overview of the conflicts
Browsing the conflicts by clicking the Find Conflict button is useful in many cases, but you might also want to get an overview of all the conflicts in the entire contig. This is easily achieved by showing the contig in a table view:
Press and hold the Ctrl-button (⌘ on Mac) | Click Show Table at the bottom of the view
This will open a table showing the conflicts. You can right-click the Note field and enter your own comment. In this dialog, enter new text in the Name field and click OK.
When you edit a comment, this is reflected in the conflict annotation on the consensus sequence.
This means that when you use this sequence later on, you will easily be able to see the comments you have entered. The comment could be e.g. your interpretation of the conflict.
2.17.9 Documenting your changes
Whenever you make a change like deleting a "T", it will be noted in the contig's history. To open the history, click the History icon at the bottom of the view.
In the history, you can see the details of each change (see figure 2.119).
2.17.10 Using the result for further analyses
When you have finished editing the contig, it can be saved, and you can also extract and save the consensus sequence:
Right-click the name "Consensus" | Open Copy of Sequence | Save
Figure 2.119: The history of the contig showing that a "T" has been deleted and that the aligned region has been moved.
This will make it possible to use this sequence for further analyses in the CLC Genomics Workbench. All the conflict annotations are preserved, and in the sequence's history, you will find a reference to the original contig. As long as you also save the original contig, you will always be able to go back to it by choosing the Reference contig in the consensus sequence's history (see figure 2.120).
Figure 2.120: The history of the consensus sequence, which has been extracted from the contig.
Clicking the blue text "Reference contig" will find and highlight the name of the saved contig in the Navigation Area. Clicking the blue text "history" to the right will open the history view of the earlier contig. From there, you can choose other views, such as the Read mapping view, of the contig.