The SNP/InDel Data Viewer displays single nucleotide polymorphism (SNP) and
insertion/deletion polymorphism (InDel) data for maize. It provides access to
sequence alignments, gel images, and genetic map positions, and it supports
interoperability between bioinformatics systems and databases.
Statistics
Genetic Map                  # SNPs   # InDels   Total SNP/InDels
Intermated B73xMo17 (IBM)       132         84                216
IBM Neighbors                     -         89                 89
Total Maps                      132        173                305
SNP/InDel Generation and Data Flow
The SNP/InDel data displayed in the SNP/InDel Data Viewer come from the
SNP/InDel Pipeline of MMP-LIMS (6), which manages the flow of data generated
during SNP/InDel discovery. The pipeline consists of two modules: SNP Discovery
Primer Design and SNP/InDel Finder. The SNP Discovery Primer Design module aids
in the design of primers; the sequences generated with those primers are then
analyzed by the SNP/InDel Finder to identify SNPs and/or InDels.
SNPs are discovered by sequencing the same region of DNA across multiple maize
lines and detecting nucleotide differences among them. The SNP Discovery Primer
Design module is used to design the primer pairs that amplify the DNA segments
to be sequenced.
DNA sequence data are entered into the module, along with parameters such as
the distance between primer pairs and the region of the sequence in which to
search for primer pairs. From these parameters, the module builds an input file
for Primer3.
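The input-file step can be pictured with a short Python sketch. It assumes Primer3 version 2.x and its Boulder-IO input format (TAG=VALUE lines, with a line containing only "=" ending each record). The function name build_primer3_record is hypothetical, and mapping the module's "distance between primer pairs" onto PRIMER_PRODUCT_SIZE_RANGE and its search region onto SEQUENCE_INCLUDED_REGION is an assumption, not the module's documented behavior.

```python
def build_primer3_record(seq_id: str, sequence: str,
                         search_start: int, search_length: int,
                         min_product: int, max_product: int) -> str:
    """Build one Primer3 (v2.x) Boulder-IO record as a string.

    Hypothetical helper: the parameter-to-tag mapping is an assumption.
    search_start is 0-based, matching Primer3's default first base index.
    """
    if (search_length <= 0 or search_start < 0
            or search_start + search_length > len(sequence)):
        raise ValueError("search region lies outside the sequence")
    if not 0 < min_product <= max_product:
        raise ValueError("invalid product size range")
    lines = [
        f"SEQUENCE_ID={seq_id}",
        f"SEQUENCE_TEMPLATE={sequence.upper()}",
        # Region of the template in which primers may be placed (start,length).
        f"SEQUENCE_INCLUDED_REGION={search_start},{search_length}",
        # Assumed stand-in for the allowed distance spanned by a primer pair.
        f"PRIMER_PRODUCT_SIZE_RANGE={min_product}-{max_product}",
        "=",  # a line holding only '=' terminates the record
    ]
    return "\n".join(lines) + "\n"
```

A record built this way could be written to a file and passed to primer3_core on standard input; several records can be concatenated in one file.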
The SNP Discovery Primer Design module then checks the primers returned by
Primer3 (5) for repeats in the primer sequence and rejects any that contain
them. The output is a list of unique SNP discovery primers.
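The repeat check can be sketched as a filter over candidate primer pairs. The text does not say what counts as a repeat, so this sketch assumes, purely for illustration, that a primer is rejected if it contains a run of five or more identical bases or four or more consecutive copies of a dinucleotide, and it reads "unique" as dropping duplicate pairs. The names has_repeat and filter_primer_pairs are hypothetical.

```python
import re

# Assumed repeat definitions (not specified in the source):
HOMOPOLYMER = re.compile(r"([ACGT])\1{4,}")      # 5+ identical bases, e.g. AAAAA
DINUCLEOTIDE = re.compile(r"([ACGT]{2})\1{3,}")  # 4+ copies, e.g. CACACACA


def has_repeat(primer: str) -> bool:
    """Return True if the primer contains an assumed repeat pattern."""
    p = primer.upper()
    return bool(HOMOPOLYMER.search(p) or DINUCLEOTIDE.search(p))


def filter_primer_pairs(pairs):
    """Drop pairs where either primer has a repeat; keep each pair only once."""
    seen = set()
    kept = []
    for left, right in pairs:
        key = (left.upper(), right.upper())
        if key in seen or has_repeat(left) or has_repeat(right):
            continue
        seen.add(key)
        kept.append(key)
    return kept
```

Regular expressions with backreferences keep the check compact; a production filter would likely also consider longer repeat motifs.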
The primers are used to amplify and sequence DNA from 12 maize lines (Mp708,
IHO, Mo17, ILO, B73, W22_R-scm2, GT119, CO159, T218, Tx501, NC7A, Tx303).
Phred (3) performs base calling on the resulting forward and reverse sequencing
trace files, and the forward and reverse sequences are then trimmed based on
the primers and quality scores.
Genotyping is performed on ABI 3100 and 3700 sequencers, which generate
chromatogram data that the MMP-LIMS Scoring Tool automatically translates into
map scores. MapMaker (4) is used to generate a genetic map showing the positions
of the SNPs on the chromosomes, and the MMP-LIMS system then updates the
database.
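The primer- and quality-based trimming step can be illustrated with the following sketch. It is not Phred's own trimming: it assumes a read begins with an exact copy of its primer and uses a hypothetical cutoff of Q20. A Phred score Q corresponds to an error probability of 10^(-Q/10), so Q20 means roughly a 1% chance that the base call is wrong. The name trim_read is illustrative.

```python
def trim_read(seq: str, quals: list, primer: str, min_q: int = 20):
    """Hypothetical trimming: remove a leading primer, then low-quality ends.

    seq and quals must have equal length; quals holds per-base Phred scores.
    Returns the trimmed (sequence, qualities) pair.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    start, end = 0, len(seq)
    # 1. Skip the primer if the read starts with it (exact match assumed).
    if seq.upper().startswith(primer.upper()):
        start = len(primer)
    # 2. Trim bases scoring below min_q from the 5' end...
    while start < end and quals[start] < min_q:
        start += 1
    # ...and from the 3' end.
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return seq[start:end], quals[start:end]
```

Forward and reverse reads would each be trimmed against their own primer.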
Phrap (2) is then used for sequence assembly, after which a script combines the
sequences into 12-sequence groups, one group per SNP discovery (dSNP) primer
pair.
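Grouping sequences into 12-sequence sets keyed by primer pair is essentially a dictionary build. In this sketch the (pair_id, line, sequence) record layout and the function name group_by_primer_pair are hypothetical, and reporting incomplete groups instead of passing them on to alignment is an assumption.

```python
from collections import defaultdict

# The 12 maize lines listed in the text, in a fixed order.
LINES = ("Mp708", "IHO", "Mo17", "ILO", "B73", "W22_R-scm2",
         "GT119", "CO159", "T218", "Tx501", "NC7A", "Tx303")


def group_by_primer_pair(records):
    """Group (pair_id, line, sequence) tuples into one group per dSNP primer pair.

    Returns (complete, missing): complete maps pair_id to its 12 sequences in
    LINES order; missing maps pair_id to the lines (in LINES order) that lack
    a sequence.
    """
    by_pair = defaultdict(dict)
    for pair_id, line, seq in records:
        if line not in LINES:
            raise ValueError(f"unknown line: {line}")
        by_pair[pair_id][line] = seq
    complete, missing = {}, {}
    for pair_id, seqs in by_pair.items():
        absent = [name for name in LINES if name not in seqs]
        if absent:
            missing[pair_id] = absent
        else:
            complete[pair_id] = [seqs[name] for name in LINES]
    return complete, missing
```

Keeping the lines in a fixed order means the same row of every alignment always corresponds to the same maize line.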
For each primer pair, the sequences are aligned with ClustalW (8), and the
SNP/InDel Finder script then calculates the base frequency at each position in
the alignment. If all 12 sequences contain the same base at a position, no SNP
is present. If only one sequence differs from the other 11 (1:11), the position
is a questionable SNP, because a single divergent base may reflect a sequencing
error. Candidate SNPs are positions where at least two sequences differ from the
other 10 (2:10), or where the split is more even (3:9, 4:8, 5:7, or 6:6). To
locate InDels, the SNP/InDel Finder looks for gaps in the alignment that
represent insertions or deletions. The InDel images are stored electronically,
and the genotype scores are entered into the system via the MMP-LIMS Scoring
Tool.
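The column-by-column logic of the SNP/InDel Finder can be approximated as follows. This is a sketch, not the actual script: it takes already-aligned sequences of equal length with "-" as the gap character, labels each column by how many sequences differ from the majority base (0 = monomorphic, 1 = questionable, 2 or more = candidate SNP), treats any column where some but not all sequences carry a gap as part of an InDel, and merges adjacent gap columns into regions. The function names classify_columns and indel_regions are hypothetical.

```python
from collections import Counter


def classify_columns(alignment, gap="-"):
    """Classify each column of an equal-length multiple alignment.

    Returns (position, label, counts) tuples, where label is 'monomorphic',
    'questionable' (one sequence differs), 'snp' (two or more differ), or
    'indel' (a gap in some but not all sequences). All-gap columns are skipped.
    """
    n = len(alignment)
    if n == 0 or len({len(s) for s in alignment}) != 1:
        raise ValueError("need one or more sequences of equal length")
    results = []
    for pos in range(len(alignment[0])):
        counts = Counter(s[pos].upper() for s in alignment)
        if gap in counts:
            if counts[gap] < n:
                results.append((pos, "indel", counts))
            continue
        # Number of sequences that do not carry the majority base.
        minor = n - counts.most_common(1)[0][1]
        if minor == 0:
            label = "monomorphic"
        elif minor == 1:
            label = "questionable"
        else:
            label = "snp"
        results.append((pos, label, counts))
    return results


def indel_regions(results):
    """Merge adjacent 'indel' columns into (start, end) runs, end inclusive."""
    regions = []
    for pos, label, _ in results:
        if label != "indel":
            continue
        if regions and regions[-1][1] == pos - 1:
            regions[-1] = (regions[-1][0], pos)
        else:
            regions.append((pos, pos))
    return regions
```

For 12 sequences, "two or more differ" covers exactly the 2:10 through 6:6 splits described above when only two bases occur at a position.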
Other SNP resources exist, including the maize SNP alignments produced by
AutoSNP (7) and the NCBI dbSNP database (1); however, these do not currently
provide the combination of sequence alignments, gel images, genetic map data,
and database interoperability available in the SNP/InDel Data Viewer.
(2) Ewing,B. and Green,P. (1998) Base-calling of automated sequencer
traces using phred. II. Error probabilities. Genome Res., 8,
186-194.
(3) Ewing,B., Hillier,L., Wendl,M.C. and Green,P. (1998) Base-calling
of automated sequencer traces using phred. I. Accuracy assessment.
Genome Res., 8, 175-185.
(4) Lander,E.S., Green,P., Abrahamson,J., Barlow,A., Daly,M.J.,
Lincoln,S.E. and Newburg,I. (1987) MAPMAKER: an interactive
computer package for constructing primary genetic linkage maps
of experimental and natural populations. Genomics, 1, 174-181.
(5) Rozen,S. and Skaletsky,H.J. (2000) Primer3 on the WWW for
general users and for biologist programmers. In Krawetz,S.
and Misener,S. (eds), Bioinformatics Methods and Protocols:
Methods in Molecular Biology. Humana Press, Totowa, NJ,
pp. 365-386.
(6) Sanchez-Villeda,H., Schroeder,S., Polacco,M., McMullen,M.D., Havermann,S.,
Davis,G., Vroh-Bi,I., Cone,K., Sharopova,N., Yim,Y., Schultz,L., Duru,N.,
Musket,T., Houchins,K., Fang,Z., Gardiner,J. and Coe,E. (2003) Development of
an integrated laboratory information management system for the Maize Mapping
Project. Bioinformatics, 19, 2022-2030.
(8) Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W:
improving the sensitivity of progressive multiple sequence
alignment through sequence weighting, position-specific gap
penalties and weight matrix choice. Nucleic Acids Res., 22,
4673-4680.