Project report: Hybrid genome assemblies of 26 strains: Escherichia sp. and Pseudomonas sp.
To the attention of: First name LAST NAME, Company - CITY, Country
Analysis/Writing: First name LAST NAME, Bioinformatics Engineer
Corrections/Validation: First name LAST NAME, Ph.D., Operations Manager
Figure 1 - Summary of the quality of the results
Scores out of 5 for each result category: (Data) data quality and contamination; (Contiguity) number and size of contigs; (Completion) assembly completeness; (Correctness) assembly errors; (Annotation) annotation.
The goal of the project was to assemble the genomes of 26
strains of bacteria using high-throughput sequencing data from Oxford
Nanopore and Illumina technologies.
Overall, the assembly metrics are good for your samples, with good accuracy and contiguity. Only sample D46 yields a more fragmented assembly.
Based on the completion metrics, we estimate that the assemblies are complete, although two of your samples, D22 and D24, show lower scores.
Moreover, the total sizes of the assemblies are close to the sizes of the reference genomes.
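As an illustration of the contiguity and size figures discussed above, the following minimal Python sketch computes the number of contigs, the total assembly size, the largest contig, and the N50 from FASTA text. It is not the code used in this project: the report does not state which tool produced the metrics, and the function names `read_contig_lengths` and `assembly_stats`, as well as the choice of N50 as a contiguity measure, are assumptions made for this example.

```python
from typing import Dict, Iterable, List


def read_contig_lengths(fasta_lines: Iterable[str]) -> List[int]:
    """Return the length of every sequence found in a FASTA stream."""
    lengths: List[int] = []
    current = None  # length of the contig being read; None before the first header
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous contig, if any.
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def assembly_stats(lengths: List[int]) -> Dict[str, int]:
    """Compute contig count, total size, largest contig and N50."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running = 0
    n50 = 0
    # N50: length of the contig at which the cumulative size,
    # taken from the largest contig down, reaches half of the total.
    for length in ordered:
        running += length
        if running * 2 >= total:
            n50 = length
            break
    return {
        "contigs": len(ordered),
        "total_size": total,
        "largest": ordered[0] if ordered else 0,
        "n50": n50,
    }


# Toy input (hypothetical sequences, not project data).
example = [
    ">contig_1", "ACGTACGT",
    ">contig_2", "ACGTAC",
    ">contig_3", "ACGT",
    ">contig_4", "AC",
]
stats = assembly_stats(read_contig_lengths(example))
print(stats)
```

On a real assembly, the same functions could be applied to the lines of a file such as Sample_assembly.fasta, and the resulting total size compared with the size of the corresponding reference genome.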
This report describes all bioinformatics analyses that were performed following high-throughput sequencing using Oxford Nanopore and Illumina technologies. The objective was to perform de novo hybrid assemblies of 26 strains (Escherichia sp. and Pseudomonas sp.) and to annotate the assembled genomes.
This process was carried out in the following stages:
Below is a representation of the key steps of the bioinformatics pipeline used to obtain the de novo assemblies.
Figure 2 - Key steps from the bioinformatics pipeline leading to the hybrid assembly and annotation of the strains
Summary tables of results:
For each of the samples:
Below is an example of the structure of each sub-folder corresponding to the samples:
Sample/
├── Sample_assembly.fasta
├── Sample_coverage_plots
│   ├── 500kb.png
│   ├── contig_1.pdf
│   └── …
├── Sample_k2_report.txt
└── Sample_prokka
    ├── mygenome.faa
    ├── mygenome.ffn
    ├── mygenome.gbk
    ├── mygenome.gff
    └── mygenome.tsv
Illumina data cleaning ensures that the reads are of excellent quality (>Q30), and therefore gives the best possible basis for downstream analysis. Read adapters are removed first, then the reads are filtered based on their quality.
Details of the data are available in Appendix 1.
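To make the two cleaning steps concrete (adapter removal, then quality filtering), here is a minimal Python sketch. It is an illustration under stated assumptions, not the tool used for this project: the report does not name the trimming software or its exact criteria. The sketch assumes Phred+33 quality encoding, exact adapter matching, and a mean-quality threshold of 30 per read; the function names and the adapter sequence in the example are illustrative. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q30 means one expected error per 1,000 bases.

```python
from typing import Iterator, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (name, sequence, quality) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i][1:].strip(), lines[i + 1].strip(), lines[i + 3].strip()


def trim_adapter(seq: str, qual: str, adapter: str) -> Tuple[str, str]:
    """Cut the read at the first exact occurrence of the adapter.

    Assumption: exact matching only. Dedicated trimmers also handle
    mismatches and adapters that are only partially present at the read end.
    """
    pos = seq.find(adapter)
    if pos == -1:
        return seq, qual
    return seq[:pos], qual[:pos]


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    if not qual:
        return 0.0
    return sum(ord(c) - offset for c in qual) / len(qual)


def clean_reads(reads: Iterator[Read], adapter: str,
                min_mean_q: float = 30.0, min_len: int = 1) -> List[Read]:
    """Trim adapters, then keep reads that pass the length and quality filters."""
    kept: List[Read] = []
    for name, seq, qual in reads:
        seq, qual = trim_adapter(seq, qual, adapter)
        if len(seq) >= min_len and mean_phred(qual) >= min_mean_q:
            kept.append((name, seq, qual))
    return kept


# Toy input (hypothetical reads, not project data).
adapter = "AGATCGGAAGAGC"  # illustrative adapter sequence
fastq_lines = [
    "@r1", "ACGTACGT" + adapter, "+", "I" * 21,  # high quality, adapter at 3' end
    "@r2", "ACGTACGT", "+", "#" * 8,             # very low quality
]
cleaned = clean_reads(parse_fastq(fastq_lines), adapter)
print(cleaned)
```

In this toy case the first read is kept after its adapter is removed, and the second read is discarded because its mean quality is far below the threshold.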