DNA read mapping is one of the fundamental problems in bioinformatics. The general goal is to map billions of short pieces of DNA (reads) to a high-quality reference genome while tolerating errors, including genomic variations between individuals and the imprecision of the sequencer.
The seed-and-extend strategy, a general mapping heuristic, efficiently maps erroneous DNA fragments to the reference by breaking each read into multiple nonoverlapping segments, called "seeds", and assuming that at least one seed is error-free. By using seeds to index into the reference genome, seed-and-extend mappers drastically reduce the search space and hence improve run time.
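To make the seed-and-extend idea concrete, here is a minimal sketch, not the implementation of any particular mapper. It assumes a plain hash-table index of every k-mer in the reference and substitution-only errors, so the "extend" step is a Hamming-distance check rather than a full alignment. The function names `build_index`, `hamming`, and `seed_and_extend` are hypothetical. With n nonoverlapping seeds and at most n - 1 errors, the pigeonhole principle guarantees that at least one seed matches exactly, so the true location is always among the candidates.

```python
from __future__ import annotations

from collections import defaultdict


def build_index(reference: str, k: int) -> dict[str, list[int]]:
    """Map every length-k substring (k-mer) of the reference to its start positions."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return dict(index)


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def seed_and_extend(read: str, reference: str, index: dict[str, list[int]],
                    k: int, max_errors: int) -> list[tuple[int, int]]:
    """Return (start, mismatches) for each candidate location within max_errors.

    Assumption: errors are substitutions only, so verification uses Hamming distance.
    """
    candidates = set()
    # Seed: split the read into nonoverlapping k-length seeds (a short tail is not seeded).
    for offset in range(0, len(read) - k + 1, k):
        seed = read[offset:offset + k]
        # Each exact seed hit implies a candidate start position for the whole read.
        for pos in index.get(seed, []):
            start = pos - offset
            if 0 <= start <= len(reference) - len(read):
                candidates.add(start)
    # Extend: verify each candidate by comparing the full read against the reference.
    hits = []
    for start in sorted(candidates):
        mismatches = hamming(read, reference[start:start + len(read)])
        if mismatches <= max_errors:
            hits.append((start, mismatches))
    return hits


if __name__ == "__main__":
    reference = "CCCCATGGATTACAGGGG"
    index = build_index(reference, k=5)
    # Read copied from reference[4:14] with one substitution (C -> G at read position 8).
    print(seed_and_extend("ATGGATTAGA", reference, index, k=5, max_errors=1))  # [(4, 1)]
```

In the usage example, the first seed `ATGGA` hits the reference exactly and yields the single candidate start 4, while the second seed contains the error and finds nothing. Real mappers typically replace the Hamming check with edit-distance alignment so that insertions and deletions are handled as well.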
However, seed-and-extend mappers face a fundamental trade-off between seed length and seed count. Longer seeds further reduce the search space and therefore improve performance. However, longer seeds also mean fewer seeds per read, which makes the mapper less tolerant of errors.
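The trade-off can be quantified with a back-of-the-envelope calculation. The symbols below are illustrative assumptions, not taken from the talk: read length $L$, seed length $k$, reference length $G$, and $e$ errors per read. The hit estimate assumes a uniformly random reference, which real genomes (full of repeats) are not.

```latex
n = \left\lfloor \frac{L}{k} \right\rfloor \ \text{seeds guarantee one error-free seed when } e \le n - 1,
\qquad
\mathbb{E}[\text{random hits per seed}] \approx \frac{G}{4^{k}}

% Example: L = 100, G = 3 \times 10^{9}
% k = 12: n = 8 (tolerates e \le 7),  G / 4^{12} = 3 \times 10^{9} / 16{,}777{,}216 \approx 179 hits per seed
% k = 25: n = 4 (tolerates e \le 3),  G / 4^{25} \approx 2.7 \times 10^{-6} hits per seed
```

Doubling the seed length shrinks the expected number of spurious candidates by many orders of magnitude, but the guaranteed error tolerance falls from 7 to 3 errors per read.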
In this talk, I will present ongoing work, VEST (Versatile Error-resilient Seed Technique). VEST aims to adapt long seeds to make them error-resilient, thereby achieving both high error tolerance and high mapping speed.
Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.