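In December 2013, the FDA approved the first high-throughput DNA sequencer, an instrument that can sequence human DNA quickly and efficiently for genetic testing, medical diagnosis, and personalized drug therapy. During the approval process, researchers used a reference data set of human benchmark genotypes for the first time. These benchmark genotypes were established jointly by the U.S. National Institute of Standards and Technology (NIST) and the Genome in a Bottle Consortium.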
Integrating human sequence data sets provides a resource of benchmark SNP and indel genotype calls
Clinical adoption of human genome sequencing requires methods that output genotypes with known accuracy at millions or billions of positions across a genome. Because of substantial discordance among calls made by existing sequencing methods and algorithms, there is a need for a highly accurate set of genotypes across a genome that can be used as a benchmark. Here we present methods to make high-confidence, single-nucleotide polymorphism (SNP), indel and homozygous reference genotype calls for NA12878, the pilot genome for the Genome in a Bottle Consortium. We minimize bias toward any method by integrating and arbitrating between 14 data sets from five sequencing technologies, seven read mappers and three variant callers. We identify regions for which no confident genotype call could be made, and classify them into different categories based on reasons for uncertainty. Our genotype calls are publicly available on the Genome Comparison and Analytic Testing website to enable real-time benchmarking of any method.
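To make the idea of benchmarking against high-confidence genotypes concrete, here is a minimal sketch of how a call set might be scored against a benchmark. It is an illustration only, not the authors' integration method and not the procedure used by the Genome Comparison and Analytic Testing website. The names `Calls`, `Regions`, `normalize`, `in_confident_region`, and `benchmark` are hypothetical, and the sketch assumes unphased genotypes written as `A/G`, inclusive 1-based intervals, and call sets that list only non-reference genotypes.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, int]                       # (chromosome, 1-based position)
Calls = Dict[Site, str]                      # site -> unphased genotype, e.g. "A/G"
Regions = Dict[str, List[Tuple[int, int]]]   # chromosome -> inclusive (start, end)


def normalize(genotype: str) -> Tuple[str, ...]:
    """Sort alleles so that "G/A" and "A/G" compare equal (unphased assumption)."""
    return tuple(sorted(genotype.split("/")))


def in_confident_region(site: Site, regions: Regions) -> bool:
    """True if the site lies in a region where the benchmark is considered confident."""
    chrom, pos = site
    return any(start <= pos <= end for start, end in regions.get(chrom, []))


def benchmark(query: Calls, truth: Calls, regions: Regions) -> Dict[str, float]:
    """Score variant calls in `query` against `truth`, restricted to confident regions.

    Sites inside a confident region that are absent from `truth` are treated as
    homozygous reference, so a query variant there counts as a false positive.
    A genotype mismatch at a shared site counts as both a false positive and a
    false negative.
    """
    q = {s: normalize(g) for s, g in query.items() if in_confident_region(s, regions)}
    t = {s: normalize(g) for s, g in truth.items() if in_confident_region(s, regions)}

    tp = sum(1 for s, g in q.items() if t.get(s) == g)
    fp = len(q) - tp
    fn = sum(1 for s, g in t.items() if q.get(s) != g)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}
```

Restricting the comparison to confident regions reflects the abstract's point that some regions allow no confident genotype call: calls outside those regions are neither rewarded nor penalized in this sketch.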