SNP/InDel Data Viewer Summary

The SNP/InDel Data Viewer displays single nucleotide polymorphism (SNP) and insertion/deletion polymorphism (InDel) data for maize. The system provides access to sequence alignment data, gel images, and genetic map positions, and supports interoperability between bioinformatics systems and databases.

Statistics

Genetic Map                  # SNPs   # InDels   Total SNP/InDels
Intermated B73xMo17 (IBM)       132         84                216
IBM Neighbors                     -         89                 89
Total Maps                      132        173                305


SNP/InDel Generation and Data Flow

The SNP/InDel data displayed by the SNP/InDel Data Viewer comes from the SNP/InDel Pipeline of MMP-LIMS (6). The SNP/InDel Pipeline manages the flow of data generated during SNP/InDel discovery. It consists of two modules: the SNP Discovery Primer Design module and the SNP/InDel Finder module. The SNP Discovery Primer Design module aids in the design of primers. The resulting primers are used to produce sequences, which SNP/InDel Finder then searches for SNPs and/or InDels.
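
The primer design stage (steps 2 and 3 in the list that follows) can be illustrated with a minimal Python sketch. It is not the MMP-LIMS code. The function names are hypothetical, the Boulder-IO tag names are those of current Primer3 releases (the original module may have used older tag names), and the definition of a "repeat" (a 1-3 nucleotide motif occurring four or more times in tandem) is an assumption, since the source does not specify one.

```python
import re


def build_primer3_record(seq_id, template, region_start, region_length,
                         min_product, max_product):
    """Return one Boulder-IO record for Primer3 as a string.

    Tag names follow current Primer3 releases; they are an assumption
    about what the original module wrote.
    """
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={template}",
        # Region of the sequence to search for primers: start,length
        f"SEQUENCE_INCLUDED_REGION={region_start},{region_length}",
        # Allowed distance between the primers, as a product size range
        f"PRIMER_PRODUCT_SIZE_RANGE={min_product}-{max_product}",
        "=",  # a line holding only "=" terminates a Boulder-IO record
    ]
    return "\n".join(lines) + "\n"


# Assumed repeat definition: a 1-3 nt motif repeated 4+ times in tandem.
_REPEAT = re.compile(r"([ACGT]{1,3})\1{3,}")


def has_repeat(primer):
    """True if the primer contains a short tandem repeat (assumed rule)."""
    return _REPEAT.search(primer.upper()) is not None


def filter_primers(primers):
    """Drop primers with repeats and duplicates, keeping input order."""
    seen = set()
    kept = []
    for primer in primers:
        key = primer.upper()
        if key in seen or has_repeat(key):
            continue
        seen.add(key)
        kept.append(primer)
    return kept


if __name__ == "__main__":
    print(build_primer3_record("locus1", "ACGT" * 10, 0, 40, 20, 40))
    print(filter_primers(["ACGTTGCAAGCTTGCA", "CACACACAGGTT"]))
```

The record string would be written to a file and passed to Primer3; the filter would then be applied to the primer sequences Primer3 returns.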

  1. SNPs are discovered by sequencing a region of DNA across multiple lines of maize and detecting nucleotide polymorphisms among them. The SNP Discovery Primer Design module is used to design primer pairs for amplifying the DNA segments to be sequenced.
  2. DNA sequence data is entered into the module, along with parameters such as the distance between primer pairs and the region of the sequence to search for primer pairs. Using these parameters, the module builds an input file for Primer3.
  3. The SNP Discovery Primer Design module checks the primers returned from Primer3 (5) for repeats in the primer sequence and rejects those with repeats. The output of the program is a list of unique SNP discovery primers.
  4. The primers are used to amplify and sequence DNA in 12 different lines of maize (Mp708, IHO, Mo17, ILO, B73, W22_R-scm2, GT119, CO159, T218, Tx501, NC7A, Tx303). Phred (3) is used to perform base calling on the resulting forward and reverse sequencing trace files. The forward and reverse sequences are then trimmed based on the primers and quality scores. Genotyping is performed using ABI 3100 and 3700 sequencers to generate chromatogram information, which is automatically translated to map scores by the MMP-LIMS Scoring Tool. MapMaker (4) is used to generate a genetic map showing the SNPs on the different chromosomes, and the database is then updated by the MMP-LIMS system.
  5. Phrap (2) is then used for sequence assembly, after which a script combines the sequences into 12-sequence groups, each group corresponding to a single SNP discovery (dSNP) primer pair.
  6. The sequences for each primer pair are aligned with ClustalW (8), and the SNP/InDel Finder script then calculates base frequencies at each position in the alignment. If all 12 sequences contain the same base at a position, no SNP is present. If one sequence differs from the other 11 at a position (1:11), the SNP is considered questionable. Candidate SNPs are positions where at least two sequences differ from the other 10 (2:10), or where the split is more even (3:9, 4:8, 5:7, or 6:6). To locate InDels, SNP/InDel Finder looks for gaps in the alignment, which represent insertions or deletions.
  7. The InDel images are stored electronically and the genotype scores are entered into the system via the MMP-LIMS Scoring Tool.
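
The column-by-column rule in step 6 can be sketched in Python as follows. This is an illustration of the stated thresholds, not the SNP/InDel Finder script itself. The function name is hypothetical, and two details are assumptions: any column containing a gap character ("-") is reported as an InDel, and the minority count is taken as the number of sequences that differ from the most common base. Positions are 0-based.

```python
from collections import Counter


def classify_columns(aligned):
    """Classify each variable column of a multiple sequence alignment.

    aligned: list of equal-length strings, with '-' marking gaps.
    Returns (position, label, base_counts) tuples, where label is
    'indel', 'questionable' (1 : n-1), or 'candidate' (2 : n-2 or more).
    Columns where every sequence has the same base are omitted.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("all aligned sequences must have the same length")

    n = len(aligned)
    results = []
    for pos, column in enumerate(zip(*aligned)):
        counts = Counter(base.upper() for base in column)
        if "-" in counts:
            # Assumption: any gap in the column marks an insertion/deletion.
            results.append((pos, "indel", dict(counts)))
            continue
        # Number of sequences that differ from the most common base.
        minority = n - max(counts.values())
        if minority == 0:
            continue  # all sequences agree: no SNP
        label = "questionable" if minority == 1 else "candidate"
        results.append((pos, label, dict(counts)))
    return results


if __name__ == "__main__":
    # Twelve toy sequences standing in for the 12 maize lines.
    alignment = (["ATGGAC", "ACGGAC", "ACGGAC"]
                 + ["ACGTAC"] * 2
                 + ["ACGTA-"]
                 + ["ACGTAC"] * 6)
    for pos, label, counts in classify_columns(alignment):
        print(pos, label, counts)
```

In the toy alignment, position 1 has a 1:11 split, position 3 has a 3:9 split, and position 5 contains a gap, so the three positions are labeled questionable, candidate, and indel respectively.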

Other SNP resources currently exist, including the maize SNP alignments produced by AutoSNP (7) and the NCBI dbSNP database (1). However, these do not currently provide the combination of sequence alignments, gel images, genetic map data, and database interoperability available with the SNP/InDel Data Viewer.

References

(1) dbSNP Home Page. 11 Aug 2003. Single Nucleotide Polymorphism. http://www.ncbi.nlm.nih.gov/SNP/. Accessed 2003 Dec 17.

(2) Ewing,B. and Green,P. (1998) Base-calling of automated sequencer traces using phred. II. Error probabilities. Genome Res., 8, 186-194.

(3) Ewing,B., Hillier,L., Wendl,M.C. and Green,P. (1998) Base-calling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res., 8, 175-185.

(4) Lander,E.S., Green,P., Abrahamson,J., Barlow,A., Daly,M.J., Lincoln,S.E. and Newburg,L. (1987) MAPMAKER: an interactive computer package for constructing primary genetic linkage maps of experimental and natural populations. Genomics, 1, 174-181.

(5) Rozen,S. and Skaletsky,H.J. (2000) Primer3 on the WWW for general users and for biologist programmers. In Krawetz,S. and Misener,S. (eds), Bioinformatics Methods and Protocols: Methods in Molecular Biology. Humana Press, Totowa, NJ, pp. 365-386.

(6) Sanchez-Villeda,H., Schroeder,S., Polacco,M., McMullen,M.D., Havermann,S., Davis,G., Vroh-Bi,I., Cone,K., Sharopova,N., Yim,Y., Schultz,L., Duru,N., Musket,T., Houchins,K., Fang,Z., Gardiner,J. and Coe,E. (2003) Development of an integrated laboratory information management system for the maize mapping project. Bioinformatics, 19, 2022-2030.

(7) SNPs. Nov 2002. SNP Discovery using AutoSNP. http://www.cerealsdb.uk.net/discover.htm. Accessed 2003 Dec 17.

(8) Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673-4680.