Received April 10, 1995; Revised and Accepted June 2, 1995
The CAG triplet repeat region of the Huntington's disease gene was amplified in 923 single sperm from three affected and two normal individuals. Average-size alleles (15-18 repeats) showed only three contraction mutations among 475 sperm (0.6%). A 30 repeat normal allele showed an 11% mutation frequency. The mutation frequency of a 36 repeat intermediate allele was 53%, with 8% of all gametes carrying expansions that brought the allele size into the HD disease range (≥38 repeats). Disease alleles (38-51 repeats) showed a very high mutation frequency (92-99%). As repeat number increased, there was a marked rise in the frequency of expansions, in the mean number of repeats added per expansion and in the size of the largest observed expansion. Contraction frequencies also appeared to increase with allele size, but decreased as repeat number exceeded 36. Because our sperm typing data are discrete counts rather than smears of PCR product from pooled sperm, the observed mutation frequency spectra could be compared with distributions calculated from discrete stochastic models based on current molecular ideas of the expansion process. An excellent fit was found when the model specified that a random number of repeats is added during the progression of the polymerase through the repeated region.
Huntington's disease (HD) is a hereditary neurodegenerative disease, usually with onset in middle age, associated with progressive disordered movements, decline in cognitive function and emotional disturbance. It is one of nine triplet repeat diseases (1), which include fragile X (FRAXA, FRAXE), myotonic dystrophy (DM), spino-bulbar muscular atrophy (SBMA), spinocerebellar ataxia type 1 (SCA1), dentatorubral-pallidoluysian atrophy (DRPLA), Haw River syndrome (HRS) and, most recently, Machado-Joseph disease (2). In each case, an increase in the number of triplets is associated with the disease phenotype, but the diseases differ significantly in the number and range of triplet repeats that cause disease, in somatic and gonadal mosaicism, and in the developmental time at which mutations occur.
Mosaicism in an individual's sperm DNA relative to lymphocyte DNA has been established in HD (3-5). Until now, studies of sperm have used only DNA purified from semen, whose amplification yields a radioactive smear of PCR products on acrylamide gels, reflecting heterogeneous triplet repeat lengths. The extent of triplet expansion could therefore be evaluated only at a gross level, with maximum, minimum and modal allele sizes deduced from densitometric analysis. One problem with this approach is that faint signals from a small subset of the sperm population may be obscured by the inherent PCR stutter of more abundant allele sizes. Perhaps more serious is the possibility that amplification of semen DNA will be strongly biased towards smaller alleles, since they have a competitive advantage during PCR.
Single sperm typing has been used to measure germline mutation frequencies at the SBMA and DM loci (6,7). We performed single sperm PCR analysis to examine allele size variation in sperm within and among HD affected and unaffected individuals. This allowed an exact count of the mutant alleles and their sizes, free of the bias described above. We also compared stochastic models of the mutation process with the data, an approach that depends on the availability of discrete data on allele size rather than the continuously distributed smears obtained from PCR of semen DNA. Finally, using sperm from a donor carrying an intermediate allele (IA), we were able to estimate his risk of having affected offspring.
We determined the mutation frequency in HD individuals by comparing the size of each single sperm PCR product with the somatic allele size of the same individual, obtained by amplifying lymphoblast DNA. Three investigations (3,4,8) concluded that little if any somatic variation exists among different tissues of HD patients. A single report has proposed that different regions of the brain can vary in repeat number, but that study (5) found no differences among 14 other tissues, including whole testis and spleen.
Table 1 shows the results of typing 923 sperm from three individuals with HD and two normal individuals, representing nine different allele sizes, using a sperm typing protocol modified for the analysis of trinucleotide repeat disease alleles (7). Among the affected individuals, 316 sperm carried the normal allele and 287 carried the HD allele, a ratio not significantly different (P > 0.20) from the expected 1:1 segregation. Assuming no meiotic segregation distortion, this indicates that our PCR conditions amplify large and small alleles with almost equal efficiency. The absence of meiotic segregation distortion was confirmed when data on 522 sperm coamplified for the tightly linked dinucleotide repeat marker D4S127 revealed 1:1 segregation (data not shown).
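The 1:1 segregation check above can be reproduced from the reported counts. The text does not say which test was used, so treating it as a one-degree-of-freedom Pearson chi-square test is an assumption; for one degree of freedom the upper tail probability has the closed form erfc(sqrt(x/2)), so no statistics library is needed. A minimal sketch:

```python
import math

# Counts from the text: sperm from the three HD donors carrying each allele.
NORMAL_ALLELE_SPERM = 316
HD_ALLELE_SPERM = 287


def chi_square_one_to_one(a, b):
    """Pearson chi-square statistic (1 df) and P value against a 1:1 expectation."""
    expected = (a + b) / 2
    stat = ((a - expected) ** 2 + (b - expected) ** 2) / expected
    # Upper tail of a chi-square variable with 1 df: P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


stat, p_value = chi_square_one_to_one(NORMAL_ALLELE_SPERM, HD_ALLELE_SPERM)
print(f"chi-square = {stat:.3f}, P = {p_value:.3f}")
```

With 316 and 287 sperm the statistic is about 1.39, giving a P value above 0.20, in line with the figure quoted in the text.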
Among the 287 sperm carrying HD chromosomes, 96% differed in size from the donors' somatic HD allele. Of these 276 mutant sperm, 267 carried expansions and nine carried contractions. Representative allele-sizing data are shown in Figure 1. Two thirds of all samples were also analyzed to distinguish between changes in the number of CAG and of CCG repeats, since the latter are polymorphic in the human population (9,10) and some apparent mutations might result from CCG repeat expansions or contractions. However, no CCG mutation was detected (data not shown).
An intermediate allele (36 CAG repeats) from an unaffected individual in his fifth decade showed 42% expansions and 11% contractions, for an overall mutation frequency of 53%. Sixteen percent of the sperm carrying this allele had expanded into the HD range (≥38 repeats). Mutations of an allele with 30 repeats were less frequent (9% expansions and 2.5% contractions), and none of these 80 sperm had expanded into the HD range. We observed seven expansions (+1 repeat each) and two contractions (-1 and -3 repeats). Representative data for two expansion events in this donor (B) are shown in Figure 2. Among 475 sperm carrying 5 alleles in the 15-18 repeat range, only three small contractions (1 or 2 repeats) were observed.
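The rounded percentages for the 30 repeat allele and for the average-size alleles follow directly from the counts given above. The snippet below is a plain arithmetic check; the helper name `frequency` is illustrative only.

```python
def frequency(events, sperm_typed):
    """Fraction of typed sperm showing a given class of event."""
    return events / sperm_typed


# 30 repeat allele: 80 sperm, seven +1 expansions and two contractions (-1, -3).
expansion_30 = frequency(7, 80)
contraction_30 = frequency(2, 80)
mutation_30 = expansion_30 + contraction_30

# Average-size alleles (15-18 repeats): three contractions among 475 sperm.
mutation_normal = frequency(3, 475)

for label, value in [("30 repeat expansions", expansion_30),
                     ("30 repeat contractions", contraction_30),
                     ("30 repeat total", mutation_30),
                     ("15-18 repeat total", mutation_normal)]:
    print(f"{label}: {100 * value:.2f}%")
```

The printed values (8.75%, 2.50%, 11.25% and 0.63%) round to the 9%, 2.5%, 11% and 0.6% quoted in the text.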
The distributions of CAG repeat number in single sperm from the HD donors and from the donor carrying the intermediate allele are shown in Figure 3. The size distribution of the expansion mutations changes markedly with increasing allele size, progressing from an apparently normal distribution around the somatic allele size for the 36 repeat allele (Fig. 3d) to a much more uniform distribution of expanded alleles, extending up to twice the somatic size, for the 49 and 51 repeat CAG tracts (Figs 3a and 3b). Contractions were generally limited to six repeats or fewer.
It is well known that PCR stutter occurs during the amplification of microsatellite repeats. One potential problem is that Taq polymerase errors during the many amplification cycles required for single sperm analysis could produce PCR products whose allele size differs from that of the input sperm DNA molecule. It is easy to imagine how contraction mutations might result from Taq artifacts, since smaller PCR products would have a selective amplification advantage. Control experiments involving amplification of normal alleles at the SBMA locus in single sperm showed, however, that the observed contraction mutations were not PCR artifacts (6). Nevertheless, the probability of an artifact would be expected to depend on the number of repeats in the input DNA template. The vast majority of the mutations we observed at the HD locus were large expansions. Although it is difficult to imagine selective pressures that would allow molecules expanded in vitro to take over the population of PCR products, we performed control PCR experiments to examine this question.
Analysis of HD sperm from donor HD2 (51 repeats) showed a mean change of +21 repeats. Single molecule dilution experiments were carried out on somatic DNA from this donor. Among the 30 disease-length molecules examined, we detected seven events with a change of -1 repeat and three events with a change of +1 repeat, for a mean change of -0.13 repeats. We do not know whether this variation in allele size is due to PCR artifact, to variation in allele size among individual lymphoblast cells, or to the inherent error in measuring molecular weight exactly with the A.L.F.™. Regardless, the variation in repeat number seen in the somatic DNA of HD2 is negligible relative to the large variation observed among the sperm cells of this donor, so PCR artifacts would have little if any effect on the estimates of mutation frequency and size distribution. Single molecule dilution studies on somatic DNA from donor HD 232 SP (36 repeats) showed that PCR artifacts would also have little if any impact on our estimate of the mutation frequency spectrum of the smaller alleles: among 35 molecules tested, we detected one event with a +1 repeat change and one with a -1 repeat change. Again, the exact source of this variation is not known.
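The summary figures for the dilution controls can be recomputed from the listed events. This sketch assumes that the molecules not mentioned in the text (20 for HD2, 33 for HD 232 SP) showed no change, which is the reading that reproduces the quoted mean of -0.13 repeats; the function and variable names are illustrative.

```python
def summarize(changes):
    """Mean repeat change and fraction of molecules differing from the somatic size."""
    n = len(changes)
    mean_change = sum(changes) / n
    fraction_changed = sum(1 for c in changes if c != 0) / n
    return mean_change, fraction_changed


# HD2 (51 repeats): 30 molecules, seven at -1, three at +1, the other 20 assumed unchanged.
hd2_changes = [-1] * 7 + [+1] * 3 + [0] * 20
# HD 232 SP (36 repeats): 35 molecules, one at +1, one at -1, the other 33 assumed unchanged.
hd232_changes = [+1, -1] + [0] * 33

HD2_SPERM_MEAN_CHANGE = 21  # mean change among HD2 sperm, as reported in the text

hd2_mean, hd2_changed = summarize(hd2_changes)
hd232_mean, hd232_changed = summarize(hd232_changes)
print(f"HD2 somatic: mean {hd2_mean:+.2f} repeats ({hd2_changed:.0%} of molecules changed)")
print(f"HD2 sperm mean change: +{HD2_SPERM_MEAN_CHANGE} repeats")
print(f"HD 232 SP somatic: mean {hd232_mean:+.2f} repeats ({hd232_changed:.0%} changed)")
```

The somatic mean shift for HD2 is about two orders of magnitude smaller than the +21 repeat mean shift in the same donor's sperm, which is the comparison the argument rests on.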
The single sperm typing data on disease alleles can be compared with paternal HD transmissions studied in families. In five reports (3,8,11-13), exact repeat number changes could be determined in 141 paternal transmissions, of which 72% showed a contraction or expansion mutation. The single sperm data for the three HD alleles we studied gave an average mutation frequency of 96%. Expansions account for 97% of the sperm mutations and for 91% (range: 83-100%) of the mutations detected in paternal transmissions (3,8,11,12). An average of 5.6 repeats (range: 3.0 to 9.0) was added per expansion observed in the pedigrees (3,8,11-13), whereas the average expansion detected in sperm from HD2, HD6 and HD15 was 12.1 repeats. The differences between the sperm and family data could result from the small number of HD sperm donors studied and from the fact that two of them carried HD alleles significantly larger than the average paternal HD allele in the families. There may also be differences among the populations studied (3,8,11-13). We note that, overall, the sperm data correspond best to the family data (3) (83% mutation frequency; 83% of mutations were expansions, and an average of nine repeats was added per expansion), which include a large number of Venezuelan pedigrees, the source of our sperm donors. Some of the differences between the sperm and family data could also reflect selection against sperm carrying very large alleles, in terms of survival or fertilizing ability, or postzygotic selection.
Alleles in the 33-37 repeat size range have been found in parents of HD offspring in families with no previous history of the disease. These so-called 'intermediate alleles' are thought to be a source of new HD mutations (8,13-16). Among the 163 sperm analyzed from the individual carrying 36 and 17 repeat alleles, 8% of all gametes had expanded into the HD size range. Additional studies on other intermediate alleles are necessary, since it is clinically important to assess the risk that other males carrying intermediate alleles will have a child carrying an HD allele. The degree to which this risk varies among chromosomes that carry the same allele size but differ in haplotype (17) in the Huntington gene region must be specifically addressed. Such studies could yield information on potential cis-acting elements that affect mutation frequency (18,19).
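The 8% figure is a fraction of all gametes, half of which carry the 17 repeat allele. Assuming 1:1 segregation (as the data in the text support), the fraction among sperm that carry the 36 repeat allele is twice as large, which is how the sixteen percent figure given for this donor relates to the 8% figure. The back-calculated sperm count below is approximate, because only the rounded percentage is reported.

```python
# Values from the text: 163 sperm typed from the 36/17 repeat donor, and 8% of
# all gametes expanded into the HD range (>= 38 repeats).
TOTAL_SPERM = 163
FRACTION_ALL_GAMETES = 0.08

# Assumption: 1:1 segregation, so half the gametes carry the 36 repeat allele.
FRACTION_CARRYING_36 = 0.5

# Per-gamete chance of transmitting an HD-range allele is the 8% itself;
# conditional on transmitting the 36 repeat chromosome it is twice that (~0.16).
per_allele_fraction = FRACTION_ALL_GAMETES / FRACTION_CARRYING_36
approx_expanded_sperm = round(FRACTION_ALL_GAMETES * TOTAL_SPERM)  # back-calculated

print(f"fraction of 36 repeat sperm in HD range: {per_allele_fraction:.2f}")
print(f"approximate number of expanded sperm typed: {approx_expanded_sperm}")
```

Under these assumptions, and ignoring any selection against sperm or embryos, roughly 8% of this donor's conceptions would be expected to inherit an allele in the HD range.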
Comparing the sperm typing data on alleles in the 15-18 repeat range with the results of family studies is difficult, given the low mutation frequency of these alleles. Our estimate of the mutation frequency is 0.6%, whereas a recent estimate based on pedigree analysis was 0.2% (12). The reliability of both estimates is uncertain, given the small number of mutations observed. In addition, sperm typing measures the mutation frequency of specifically chosen alleles, whereas mutation frequency estimates from family studies represent transmissions encompassing the full distribution of alleles in the population.
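To make the uncertainty concrete, a binomial confidence interval can be placed around the 3/475 estimate. The text reports no intervals; the Wilson score interval used here is a standard, swapped-in choice, and z = 1.96 (approximate 95% coverage) is an assumption.

```python
import math


def wilson_interval(k, n, z=1.96):
    """Wilson score interval for a binomial proportion k/n (z = 1.96 for ~95%)."""
    p_hat = k / n
    denom = 1 + z ** 2 / n
    center = (p_hat + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half, center + half


# Three contractions among 475 sperm carrying 15-18 repeat alleles (from the text).
low, high = wilson_interval(3, 475)
print(f"estimate {3 / 475:.4f}, approximate 95% interval ({low:.4f}, {high:.4f})")
```

The resulting interval runs from roughly 0.2% to 1.8%, a several-fold span, which illustrates why estimates based on so few events are hard to compare.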
Single sperm data from one SBMA individual with 47 repeats show an 81% mutation frequency, with 66% expansions and 15% contractions (7). Most of the mutation events were ±1 or 2 repeats, and the largest expansion seen in this SBMA individual was +11 repeats. The sperm data on the HD allele closest in size (49 repeats) show a 95% expansion and a 3% contraction frequency, with an average of 10.8 repeats added per expansion; the largest expansion was +33 repeats. Even the 38 repeat HD allele has a higher mutation frequency, a greater average change in repeat number per mutation and a broader distribution of mutant allele sizes than the 47 repeat SBMA allele. Alleles at the SBMA locus in the high normal range (28-31 repeats) show only a 0.4% expansion frequency (7), compared with a 9% expansion frequency for the 30 repeat allele described in this paper.
These differences show that factors other than overall repeat size must contribute importantly to the triplet expansion frequency measured in sperm. Interrupted repeat tracts, as seen in SCA1 (20), DM (21) and FRAXA (22,23), cannot account for this difference, since no interruptions of the CAG stretches have been reported for SBMA or HD. It remains to be determined whether the loci differ in selection against large alleles during gametogenesis or in the molecular mechanism itself (3), including the possibility of cis-acting elements (18).
Studies on normal alleles at the SBMA locus showed that an increase in contraction frequency accompanies an increase in expansion mutations: as allele size increases from 21 (normal) to 47 (SBMA) repeats, the contraction frequency rises sevenfold while the expansion frequency increases 200-fold (6,7). At the huntingtin locus, the contraction frequency of the 36 repeat allele appears higher than that of smaller alleles. However, HD alleles with expansion frequencies approaching 100% have lower contraction frequencies than the 36 repeat allele. Additional data on contraction frequencies as a function of repeat size are needed.
It is clear from Figure 3 (panels a-d) that the mutation frequency spectra of individuals with different somatic allele sizes differ markedly. Because we have been able to generate discrete (count) data on the distribution of allele sizes in sperm, we can compare our data with mutation frequency spectra generated from discrete stochastic models of the expansion process based on current molecular ideas of triplet repeat expansion.
A number of models may account for trinucleotide repeat instability. Both contraction and expansion events can result from unequal reciprocal recombination between homologs or sister chromatids. In either case, equal numbers of contractions and expansions should be observed, an expectation that is not fulfilled by the HD or SBMA data, although non-reciprocal recombination mechanisms (24) cannot be excluded. Instability due to replication slippage is a more likely possibility (22,23,25). Recent experimental evidence on dinucleotide repeat mutations in yeast and studies on colon cancer also support this type of mechanism (26-29).
The proximal cause of a slippage event, the number of repeats added or deleted per slippage event and the number of slippage events that take place as the polymerase traverses the repeated region are all unknown. Repeated DNA may be capable of forming secondary structures (30,31) (Petruska, Arnheim and Goodman, unpublished data). These structures could block and stall polymerase elongation, and the stall could be accompanied by backward slippage of the newly synthesized strand on the repeated template, followed by further nucleotide incorporation. If the advance of the polymerase through the repeated region were punctuated by such episodes, sizeable expansions could be created even though the number of repeats added per slippage event was small. It is likely that more secondary structures will be encountered as repeat number increases, and that these structures will become more complex. A higher mutation rate and an increase in the mean number of repeats added per expansion event could result.
We propose that as the polymerase advances through a repeated region containing L repeats, there is a probability 1 - a that it replicates a given trinucleotide without being blocked. The number of triplets at which a block occurs therefore has a binomial distribution with parameters L and a, which we approximate by a Poisson distribution with mean T = La. When a block is encountered, the original triplet is copied and a random number N of additional repeats is added by the stutter event. Our stochastic model ignores the rare contraction events, since they account for less than 4% of the mutations of HD alleles. We suppose that N has a geometric distribution with parameter p, so that N can take any of the values 0, 1, 2, ..., with P(N = n) = p(1 - p)^n. The mean number of additional repeats added per stutter event is then (1 - p)/p. The probability p can be thought of as the chance that the stutter event copies only the original triplet, with no additional repeats added. We expect p to decrease as the number of repeat units in an allele increases, because of the increased complexity of secondary structures.
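The block-and-stutter model can be simulated directly. This sketch uses the exact binomial form (each of L triplets blocks independently with probability a) rather than the Poisson approximation, and draws each stutter size N by inverting the geometric CDF. The parameter values are hypothetical, chosen only to check that the simulated mean and the probability of no expansion agree with the closed forms La(1 - p)/p and (1 - a(1 - p))^L implied by the model.

```python
import math
import random


def simulate_added_repeats(L, a, p, rng):
    """Repeats added to one allele of L triplets under the block/stutter model.

    Each triplet blocks the polymerase independently with probability a
    (binomial form; the text approximates it by a Poisson with mean T = L * a).
    Each block adds N extra repeats, N geometric on {0, 1, 2, ...} with
    P(N = n) = p * (1 - p)**n and mean (1 - p) / p.
    """
    total = 0
    for _ in range(L):
        if rng.random() < a:            # polymerase blocked at this triplet
            u = 1.0 - rng.random()      # uniform on (0, 1], avoids log(0)
            # Inverse-CDF draw using P(N >= n) = (1 - p)**n.
            total += int(math.log(u) / math.log(1.0 - p))
    return total


# Hypothetical parameters for illustration only (not the Table 2 estimates).
L, A, P = 49, 0.06, 0.3
rng = random.Random(1995)
draws = [simulate_added_repeats(L, A, P, rng) for _ in range(20000)]

sim_mean = sum(draws) / len(draws)
theory_mean = L * A * (1 - P) / P              # E[number of blocks] * E[N]
sim_unchanged = draws.count(0) / len(draws)
theory_unchanged = (1 - A * (1 - P)) ** L      # no block adds any repeat
print(f"mean added: simulated {sim_mean:.2f}, model {theory_mean:.2f}")
print(f"P(no expansion): simulated {sim_unchanged:.3f}, model {theory_unchanged:.3f}")
```

For small a the binomial and Poisson versions are close; the binomial form is used here only because it is exact under the stated assumptions and easy to simulate.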
Under our model, the number of repeats added follows the distribution of a Poisson sum of geometrically distributed random variables, known as the Pólya-Aeppli law (32). Each HD and intermediate somatic allele has its own values of the parameters T = La and p, which were estimated from the sperm typing data shown in Figure 3 by the method of maximum likelihood. The required probability distributions were calculated with a recursive algorithm (32).
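The recursive algorithm of reference 32 is not spelled out here, so the sketch below swaps in the Panjer recursion, a standard recursion for compound Poisson distributions, to compute P(S = s) when S is a Poisson(T) sum of geometric variables with P(N = n) = p(1 - p)^n. A crude grid search stands in for a proper likelihood maximizer. The function names, the grid and the synthetic counts (expected counts for hypothetical T = 2.5 and p = 0.35) are assumptions for illustration, not the paper's data or its Table 2 values.

```python
import math


def polya_aeppli_pmf(T, p, s_max):
    """P(S = s) for s = 0..s_max, where S is the sum of K ~ Poisson(T) geometric
    variables with P(N = n) = p * (1 - p)**n, n = 0, 1, 2, ...

    Uses the Panjer recursion for compound Poisson sums (a standard technique,
    not necessarily the algorithm used in the paper)."""
    g = [p * (1.0 - p) ** j for j in range(s_max + 1)]  # P(N = j)
    f = [0.0] * (s_max + 1)
    f[0] = math.exp(-T * (1.0 - g[0]))  # every block (if any) adds zero repeats
    for s in range(1, s_max + 1):
        f[s] = (T / s) * sum(j * g[j] * f[s - j] for j in range(1, s + 1))
    return f


def log_likelihood(counts, T, p):
    """Log-likelihood (up to a constant) of counts[s] sperm with s added repeats."""
    f = polya_aeppli_pmf(T, p, len(counts) - 1)
    return sum(c * math.log(fs) for c, fs in zip(counts, f) if c > 0)


def grid_mle(counts, T_grid, p_grid):
    """Crude maximum likelihood: the (T, p) grid point with the highest log-likelihood."""
    return max(((T, p) for T in T_grid for p in p_grid),
               key=lambda tp: log_likelihood(counts, tp[0], tp[1]))


# Synthetic, noise-free check (hypothetical parameters, not data from the paper):
# expected counts among 1000 sperm for T = 2.5, p = 0.35, truncated at 40 repeats.
TRUE_T, TRUE_P, S_MAX = 2.5, 0.35, 40
counts = [1000 * fs for fs in polya_aeppli_pmf(TRUE_T, TRUE_P, S_MAX)]
T_grid = [round(0.25 * i, 2) for i in range(1, 21)]  # 0.25 .. 5.0
p_grid = [round(0.05 * i, 2) for i in range(1, 20)]  # 0.05 .. 0.95
T_hat, p_hat = grid_mle(counts, T_grid, p_grid)
print(f"T_hat = {T_hat}, p_hat = {p_hat}")
```

On these noise-free synthetic counts the grid search returns the generating parameters. With real, sparse counts one would refine the search around the grid maximum and derive approximate standard errors from the curvature of the log-likelihood.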
The estimated values of T, p and a and their approximate standard errors are shown in Table 2. A detailed comparison of the observed size distribution of the expansions with the distribution predicted by our model, using standard χ² goodness-of-fit tests, shows that the fit of the model is satisfactory in all cases (none of the χ² tests for lack of fit of observed to expected frequencies was significant at the 5% level). On the basis of our single molecule dilution experiments on somatic DNA, our model ignores the small effect that possible PCR artifacts could have on the estimated mutation frequencies and spectra.
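A Pearson χ² lack-of-fit test of the kind described above can be sketched as follows. The text does not say how sparse tail cells were handled; pooling adjacent cells until each expected count is at least 5 is a common convention assumed here, as is subtracting one degree of freedom for each of the two fitted parameters (T and p). The 5% critical values are standard table entries, and the counts in the example are made-up toy numbers, not data from Figure 3.

```python
# Upper 5% points of the chi-square distribution for 1-10 degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070,
                6: 12.592, 7: 14.067, 8: 15.507, 9: 16.919, 10: 18.307}


def pool_cells(observed, expected, min_expected=5.0):
    """Merge adjacent cells, left to right, until each pooled expected count
    reaches min_expected; any remaining tail is folded into the last cell."""
    pooled_obs, pooled_exp = [], []
    acc_o = acc_e = 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
            acc_o = acc_e = 0.0
    if acc_e > 0 or acc_o > 0:
        if pooled_exp:
            pooled_obs[-1] += acc_o
            pooled_exp[-1] += acc_e
        else:
            pooled_obs.append(acc_o)
            pooled_exp.append(acc_e)
    return pooled_obs, pooled_exp


def pearson_gof(observed, expected, n_fitted_params):
    """Pearson lack-of-fit statistic and its degrees of freedom after pooling."""
    obs, exp = pool_cells(observed, expected)
    stat = sum((o - e) ** 2 / e for o, e in zip(obs, exp))
    df = len(obs) - 1 - n_fitted_params
    return stat, df


# Toy counts (made up for illustration): observed vs. model-expected expansion sizes.
toy_stat, toy_df = pearson_gof([18, 22, 30, 20, 10], [20, 20, 25, 22, 13], n_fitted_params=2)
significant = toy_stat > CHI2_CRIT_05[toy_df]
print(f"chi-square = {toy_stat:.3f} on {toy_df} df; significant at 5%: {significant}")
```

In practice the expected counts would come from the fitted Pólya-Aeppli probabilities multiplied by the number of HD-allele sperm typed for each donor.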
The average number of additional repeats added per stutter event, (1 - p)/p, increased markedly with repeat size: 0.2 (36 repeat allele), 0.4 (38 repeat allele), 2.5 (49 repeat allele) and 3.2 (51 repeat allele). We take this to mean that the larger the repeat number, the more complex the blocks are on average, and that this increased complexity results, on average, in more repeats being added with each slippage event. Interestingly, Table 2 shows no statistically significant difference in the probability a among allele sizes. Thus the chance of encountering a block per triplet replicated appears to remain the same over the range of 36 to 51 repeats, while the average total number of blocks encountered naturally increases with allele size, as does the number of repeats added per slippage event.
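Because the mean number of additional repeats per stutter event is m = (1 - p)/p, each reported value can be inverted to the implied geometric parameter p = 1/(1 + m), and a follows from T = La as a = T/L. The snippet below only rearranges the numbers quoted above; the T value in the example call is hypothetical, since the Table 2 estimates are not reproduced here.

```python
# Mean additional repeats per stutter event, m = (1 - p) / p, as quoted in the text.
REPORTED_M = {36: 0.2, 38: 0.4, 49: 2.5, 51: 3.2}


def p_from_mean(m):
    """Invert m = (1 - p) / p to recover the geometric parameter p = 1 / (1 + m)."""
    return 1.0 / (1.0 + m)


def blocks_per_triplet(T, L):
    """Per-triplet block probability a implied by T = L * a."""
    return T / L


implied_p = {size: p_from_mean(m) for size, m in REPORTED_M.items()}
for size, p in implied_p.items():
    print(f"{size} repeats: m = {REPORTED_M[size]}, implied p = {p:.3f}")

# Hypothetical T, for illustration only (Table 2 values are not reproduced here).
example_a = blocks_per_triplet(2.0, 40)
```

The implied p falls from about 0.83 for the 36 repeat allele to about 0.24 for the 51 repeat allele, consistent with the expectation that p decreases as repeat number increases.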
We tested our data against a variation of the above model, in which the number of repeats added when a slippage event occurs was uniformly distributed over the number of template triplets already replicated. Under this model more repeats can, on average, be added as replication of the repeated region nears completion, since more template repeats are then available to anneal with the 3' end of the nascent strand in a slippage event. For parameter values that make the average predicted expansion size equal to the observed average, the variance of the size distribution for each allele is much greater than observed, suggesting that this model does not adequately describe the data.
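The variance argument above can be illustrated with exact first and second moments of the total number of added repeats under both models, with a calibrated separately in each model so that the means agree. How the variant assigns added repeats is an assumed reading: a block occurring after j triplets have been replicated (j = 0, ..., L - 1) adds X ~ Uniform{0, ..., j} repeats. L = 49, p = 0.3 and a target mean of 10 are hypothetical, and for simplicity the moments are unconditional (they include unexpanded sperm) rather than conditioned on an expansion having occurred.

```python
def geometric_model_moments(L, a, p):
    """Mean and variance of the total added repeats when each of L triplets blocks
    with probability a and each block adds N ~ geometric{0, 1, ...} repeats."""
    m = (1 - p) / p                     # E[N]
    ex2 = (1 - p) / p ** 2 + m ** 2     # E[N^2] = Var[N] + E[N]^2
    mean = L * a * m
    var = L * (a * ex2 - (a * m) ** 2)  # sum of L independent terms B_j * N_j
    return mean, var


def uniform_model_moments(L, a):
    """Same blocking, but a block after j replicated triplets (j = 0..L-1) adds
    X ~ Uniform{0, ..., j} repeats (assumed reading of the variant model)."""
    mean = var = 0.0
    for j in range(L):
        ex = j / 2                      # E[X]
        ex2 = j * (2 * j + 1) / 6       # E[X^2] for a uniform on {0, ..., j}
        mean += a * ex
        var += a * ex2 - (a * ex) ** 2
    return mean, var


# Hypothetical settings chosen only to illustrate the comparison.
L, P, TARGET_MEAN = 49, 0.3, 10.0
a_geo = TARGET_MEAN / (L * (1 - P) / P)     # calibrates the geometric-model mean
a_uni = TARGET_MEAN / (L * (L - 1) / 4)     # sum of j / 2 for j = 0..L-1
geo_mean, geo_var = geometric_model_moments(L, a_geo, P)
uni_mean, uni_var = uniform_model_moments(L, a_uni)
print(f"geometric: mean {geo_mean:.2f}, variance {geo_var:.1f}")
print(f"uniform:   mean {uni_mean:.2f}, variance {uni_var:.1f}")
```

For these settings the uniform variant's variance is roughly four times that of the geometric model. The comparison depends on p (a very small p also inflates the geometric model's variance), so this illustrates the form of the argument rather than reproducing the paper's fit.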
In conclusion, single sperm PCR studies, as well as single molecule and small pool (33) PCR studies, have the potential to provide detailed information on mutation frequency spectra for individual allele sizes. Discrete count data based on large sample sizes can be useful for modeling the molecular details of the trinucleotide repeat mutation process and can assist in formulating hypotheses to be tested by biochemical, biophysical and genetic studies.
Since the PCR protocol used here can amplify alleles with 100 triplet repeats from single sperm cells, and analysis can be done on agarose gels without radioactivity, this method may be applicable to preimplantation genetic diagnosis. It must first be demonstrated that, in diploid cells, the normal allele does not amplify so much more efficiently than the much larger HD allele that the HD allele cannot be observed. The detection of both alleles in PCR studies on mixtures of 20 sperm from affected individuals suggests that this requirement can be met.