Show simple item record

dc.contributor.author: Beka, Nathan
dc.date.accessioned: 2021-02-15T12:07:18Z
dc.date.available: 2021-02-15T12:07:18Z
dc.date.issued: 2020-12-16
dc.identifier.uri: http://hdl.handle.net/2299/23909
dc.description.abstract: Next-generation sequencing has empowered genomics by making it possible to sequence genomes at lower cost and in less time than the traditional Sanger method. However, these improvements come with reduced accuracy relative to Sanger sequencing. During the library preparation stage of sequencing, artefacts can be introduced that affect the reliability of a read. These artefacts can arise from biases due to the structure of the genome, such as preferential splitting of DNA between specific nucleotides, bias of adapter ligation towards certain base pair identities, and temperature-dependent denaturation due to nucleotide composition. To investigate these issues, a library preparation model was developed to simulate the occurrence of such artefacts and investigate their effects. The implemented model simulates the DNA fragmentation, adapter ligation and PCR amplification stages of the library preparation process. A set of parameters characterizing these steps and a DNA sequence are used as input, and the output is an array of values representing the number of DNA fragments that cover each position of the input sequence (“coverage”). To validate the model, a Genetic Algorithm (GA) was used to find parameters that lead to coverage values closely matching those found in empirical sequencing data. The GA found such parameters for a subsection of the Mycobacterium tuberculosis and Plasmodium falciparum genomes but failed when applied to the TP53 gene of the Homo sapiens genome. From this it was deduced that the model was better at predicting coverage when applied to genomes with subregions of nucleotide repeats. To find the effects of the parameters representing each step of the library preparation process, the model was applied to a set of in silico generated DNA sequences representing different sequence structures (GC-rich, AT-rich, neutral composition, and a sequence with specific areas of GC-rich and AT-rich repeats).
My study found that the parameters for the fragmentation, adapter ligation and PCR steps affected coverage, and that combinations of parameters between consecutive steps affected coverage further. In the fragmentation step, large fragment size had a negative effect on coverage (p = 0.0); in the adapter ligation step, coverage of AT-rich sequences was affected by a terminal bias (p = 0.0). Modifying parameters for the PCR step affected the coverage of both GC-rich and AT-rich sequences due to a temperature-dependent bias. Finally, an interaction between the parameters of fragmentation and those of other steps was found to further reduce coverage. This simulation was able to suggest parameters that need to be fine-tuned to improve coverage. [en_US]
dc.language.iso: en [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.rights: Attribution 3.0 United States [*]
dc.rights.uri: http://creativecommons.org/licenses/by/3.0/us/ [*]
dc.subject: Next Generation Sequencing [en_US]
dc.subject: Library Preparation [en_US]
dc.subject: PCR [en_US]
dc.subject: DNA Ligation [en_US]
dc.subject: DNA Fragmentation Genetic Algorithm [en_US]
dc.subject: DNA [en_US]
dc.subject: Modelling [en_US]
dc.subject: Genomics [en_US]
dc.title: Estimation and Modelling of Errors in the Library Preparation Stage of Next Generation Sequencing [en_US]
dc.type: info:eu-repo/semantics/doctoralThesis [en_US]
dc.identifier.doi: doi:10.18745/th.23909 [*]
dc.identifier.doi: 10.18745/th.23909
dc.type.qualificationlevel: Doctoral [en_US]
dc.type.qualificationname: PhD [en_US]
dcterms.dateAccepted: 2020-12-16
rioxxterms.funder: Default funder [en_US]
rioxxterms.identifier.project: Default project [en_US]
rioxxterms.version: NA [en_US]
rioxxterms.licenseref.uri: https://creativecommons.org/licenses/by/4.0/ [en_US]
rioxxterms.licenseref.startdate: 2021-02-15
herts.preservation.rarelyaccessed: true
rioxxterms.funder.project: ba3b3abd-b137-4d1d-949a-23012ce7d7b9 [en_US]
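
Illustrative note on the abstract: the model's output is described as an array giving the number of DNA fragments covering each position of the input sequence. The Python sketch below shows one way such a coverage array could be computed. It is not the thesis implementation. It assumes uniform random break points and a simple size filter, and it omits the nucleotide-specific fragmentation, adapter ligation and PCR biases that the thesis models. The function name `simulate_coverage` and all of its parameters are hypothetical.

```python
import random


def simulate_coverage(sequence, n_copies=200, mean_fragment_len=50,
                      min_len=20, max_len=120, seed=0):
    """Toy coverage model (hypothetical, not the thesis code).

    Each copy of `sequence` is broken at uniformly random positions,
    fragments outside [min_len, max_len] are discarded, and the
    surviving fragments are counted at every position they span.
    """
    rng = random.Random(seed)
    coverage = [0] * len(sequence)
    # Assumption: every bond breaks with the same probability.
    break_prob = 1.0 / mean_fragment_len

    for _ in range(n_copies):
        start = 0
        for pos in range(1, len(sequence) + 1):
            at_end = pos == len(sequence)
            if at_end or rng.random() < break_prob:
                length = pos - start
                # Size selection: keep only fragments in the allowed range.
                if min_len <= length <= max_len:
                    for i in range(start, pos):
                        coverage[i] += 1
                start = pos
    return coverage


if __name__ == "__main__":
    cov = simulate_coverage("ACGT" * 50)
    print(len(cov), min(cov), max(cov))
```

Raising `min_len` or narrowing the accepted size range in this sketch discards more fragments, so coverage falls. This is only a loose analogue of the fragment-size effect reported in the abstract.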



Except where otherwise noted, this item's license is described as info:eu-repo/semantics/openAccess