Pervasive function and evidence for selection across standing genetic variation in S. cerevisiae | Nature Communications
Results

Mapping QTN that drive complex metabolic innovation

S. cerevisiae strains have recently adapted to a wide range of ecological niches, having shared an ancestor with S. paradoxus only some 5–10 million years ago [16,17,18,19]. Of the traits thought to be responsible for such adaptation, for instance to anaerobic fermentation or to the host mucosa, one of the most important [10,11,12] is metabolic innovation. To dissect the genetic basis of growth on diverse carbon sources, we examined a highly inbred cross between parents derived from two different ecological niches: RM11-1a, from a vineyard in California [20], and YJM975, from an immunocompromised patient in Italy [21]. The two strains differ by only 12,054 polymorphisms despite their distinct niches. We previously sequenced the genomes of 1125 F6 haploid progeny of this cross, enabling high-resolution genetic mapping with single-gene, and often single-nucleotide, resolution [8]. We examined the growth of these progeny in quadruplicate on a diverse set of carbohydrates that included sugars and nonfermentable carbon sources: glucose, galactose, raffinose, maltose, glycerol, ethanol, and sucrose.

We next determined the QTLs responsible for growth on each carbon source. Because complex traits are driven by many alleles of small effect, it can be challenging to identify the underlying QTLs using classical approaches [22]. To map the causal variants for each trait, we used a forward selection procedure, followed by fine mapping using in silico reciprocal hemizygote analysis [Fig. 1a] [8]. Of 195 causal loci identified, we could resolve 62.1% (N = 121) to single genes (that is, within 1 kbp) and 54.9% (N = 107) to single nucleotides [Supplementary Figure 1]. The QTNs responsible for these diverse traits included missense and regulatory variants, but also many synonymous variants in coding regions [Fig. 1b].

Fig. 1 Mapping complex metabolic traits to single-nucleotide resolution. a Schema of the crossing strategy, phenotyping conditions, genetic mapping, and phylogenetic analysis procedure employed herein. b Fraction of QTN of each functional class for each growth condition tested; “all other” includes all other types of polymorphisms, e.g., premature stop codons, frameshifts, loss of a start codon, etc. Indicated at top is the number of QTNs identified for each trait. c Variance explained per QTL normalized to the maximum variance explained for each growth condition tested (pink); analogous data for growth in the presence of various drug and other stressors are included as a reference for the complexity of drug-resistance traits [8] (gray). Also indicated is the mean ± s.e.m. of number of QTLs identified per trait for each class. d Effect of selected example QTNs in conditions as indicated. Shown are normalized, Z-scored colony size for each allele. Line indicates the mean. Blue: RM11 allele; orange: YJM975 allele. Source data are provided as a Source Data file.

The statistical power of our approach enabled us to readily identify causal polymorphisms explaining as little as 0.3% of phenotypic variance [Fig. 1c, Supplementary Figure 1]. Furthermore, because we could identify many QTLs for each trait, we could explain up to 72% of the total phenotypic variance despite the genetic complexity of the traits we examined [Supplementary Figure 1]. Metabolic phenotypes were about as complex as growth in the presence of a battery of drugs and other chemical insults, and their complexity was more consistent from trait to trait (27.7 ± 5.65 QTLs for metabolic traits; 26.0 ± 22.0 QTLs for other traits; mean ± s.e.m.; p < 0.003 by F-test) [8], emphasizing the genetic complexity of metabolic innovation [Fig. 1c].
Shown in Fig. 1d are example QTNs with diverse molecular mechanisms across the quantitative traits we interrogated.

Multiple QTNs in a compound QTL for metabolic innovation

Nonlinearities in the fitness effects of multiple alleles have been known for decades [23], and the predictive accuracy of QTL mapping procedures in yeast can be improved by accounting for these interactions [24,25]. Yet the molecular underpinnings of such nonlinearities often remain mysterious due to the limited resolving power of genotype-to-phenotype mapping. Indeed, most approaches disregard epistatic effects. We sought to harness the power of our technology to characterize these types of interactions. While we lacked the power to survey all QTN–QTN pairs for interacting effects, we did examine our mapping data for instances of QTNs for the same trait residing very close to one another. We reasoned that such cases were likely to represent compound QTLs at which other mapping approaches would likely only resolve a single causal locus.

We noted a striking example of such a locus in a pair of QTNs for fitness in sucrose and raffinose, both located in the SUC2 gene encoding invertase, a key sucrose metabolic enzyme in S. cerevisiae. We expected a priori that one of the QTNs, a frameshift at residue 131 that explained up to 15.6% of the phenotypic variance, would impact sucrose metabolism as it is located immediately 5′ of the known catalytic site of Suc2 [26]. The pronounced effect of the other QTN, a T > C transition at position −6 in the 5′ UTR that explained up to 12.8% of the phenotypic variance, on the other hand, was more surprising [Fig. 2a]. In concordance with the apparently highly deleterious effect of each polymorphism for sugar metabolism, both were rare across a very large collection of more than 1000 sequenced S. cerevisiae isolates [Fig. 2b].

Fig. 2 Neighboring QTNs in a compound QTL. a Growth phenotypes of segregants with the parental ditypes (SUC2 −6T/394A, blue; SUC2 −6C/394fs, orange) and nonparental ditypes (SUC2 −6T/394fs and SUC2 −6C/394A, pink). Data shown are mean ± s.e.m. of normalized, Z-scored colony size for each ditype. b Prevalence of the causal SUC2 variants shown in (a) across 1011 sequenced S. cerevisiae isolates [28]. c Growth of representative segregants with genotypes as in (a). Shown are technical quadruplicates arrayed in squares; panels are representative of N = 8 biological replicates. d Normalized invertase activity of representative segregants with genotypes as in (a) when grown in media containing raffinose (blue) or sucrose (red), as indicated. Data shown are N = 3 biological replicates; bars show the mean. Source data are provided as a Source Data file.

To confirm that both QTNs impinged on Suc2 activity and to link this activity to the observed phenotype, we selected representative segregants to examine in detail. To avoid confounding effects from other segregating QTLs, these were chosen to be isogenic at the other major QTL for growth in raffinose, located at the ATG19 gene. We first confirmed the growth phenotypes of the segregants on glucose, raffinose, and sucrose: while there was no evident growth defect on glucose, both QTNs were associated with profound growth defects on both sucrose and raffinose [Fig. 2c]. This is consistent with a defect in sucrose catabolism, as the trisaccharide raffinose is first decomposed into galactose and sucrose. To further confirm that Suc2 activity was the molecular phenotype responsible for the growth defect, we assayed the total cellular invertase activity in each of the segregants when propagated in raffinose and sucrose. In concordance with our hypothesis, segregants with either or both QTNs exhibited greatly reduced invertase activity [Fig. 2d].
Indeed, segregants bearing the nonparental ditypes at the two loci were about as compromised as those bearing the YJM975 ditype. While we did not exhaustively survey the genetic variation in our mapping panel for such interactions, our finding of a strong compound QTL in our limited search suggests that the phenomenon may be common. Continued improvements in the resolution of genetic mapping approaches will likely reveal many more examples of compound QTLs, as were recently described for genetic variation segregating in the BY4741 and RM11 strains [27].

Evidence of positive selection on metabolic traits

Although the traits we examined are likely to be important in many environments, it is impossible to completely recapture in the laboratory the selective forces that have driven S. cerevisiae evolution in the wild. We therefore turned to a powerful statistical test for selection, inspired by Orr and predicated on the idea that positive selection on a given trait in one lineage should enrich it for QTLs of coherent effect [28,29]. This test is implemented by calculating the fraction of variants from the same parent that have a coherent effect as a function of genetic distance and comparing this enrichment to the random expectation of no coherence. We observed a striking enrichment for nearby variants from the same parent to have the same effect on phenotype (p < 0.004 by binomial test) even at distances of up to 500 markers [Fig. 3a]. Moreover, the same was not true of QTLs for the battery of drugs and other stressors tested previously [Fig. 3b]. While we previously found that such QTLs are closer to one another than expected by chance (possibly reflecting a ghost of ancient operons [8]), they have evidently not been subject to sufficient selective pressure to result in detectable coherence. Together these data suggest that the metabolic phenotypes mapped here are on average more ecologically relevant to S. cerevisiae in the wild.

Fig. 3 Directional test for selection on metabolic and other traits. Fraction of alleles from the same parent with coherent effects on phenotype as a function of distance between QTLs (in markers) for (a) metabolic traits examined here and (b) drugs and other stressors examined previously. Shown is mean ± s.e.m. across all mapped traits. p values by binomial test against random expectation (shown in black dashed line). Source data are provided as a Source Data file.

Many QTNs are not unique to the parent strains

While directional tests can address the question of whether traits have been subject to positive selection, it remains unclear which molecular variants in particular may have been subject to enrichment. Phylogenetic analysis, aided by recent growth in the availability of wild yeast genome sequence data [15,16,18], can thus provide powerful complementary information. To assess the ecological relevance of our causal variants, we examined the prevalence of our QTNs in a phylogenetically diverse panel of wild yeast isolates from many different ecological niches [15,16]. As one might expect, some QTNs were unique to RM11 or YJM975. By contrast, 25 QTNs were not singletons, but were instead present in multiple S. cerevisiae strain backgrounds. For example, the Tyr336Phe missense mutation in Ima1, an enzyme of the isomaltase family [30], conferred improved growth on raffinose and sucrose. IMA1 is important for growth on sucrose in the absence of SUC2, and here may be playing a similar “moonlighting” role [31]. The mutation, located distal to the enzyme active site [Fig. 4a], may disrupt the packing of the Tyr residue, which ordinarily is oriented toward the interior of the enzyme with its hydroxyl group in proximity to the hydroxyl of Thr290 [32]. The Phe variant, despite being highly deleterious to growth on raffinose, sucrose, and maltose [Fig. 4b], is present in several isolates from the Saccharomyces Genome Resequencing Project (SGRP) [15] [Fig. 4c].
Also shown in Fig. 4d is a neighbor-joining tree constructed for the genomic neighborhood of IMA1, showing that the Tyr336 variant appears to have re-emerged in the wine/European clade even after accounting for mosaicism among SGRP isolates. This variant also proved to be common across a larger collection of more than 1000 isolates [33], with more than 200 strains bearing at least one copy of an alternate allele to the reference Tyr336 variant [Fig. 4e]. Extending this analysis to causal variants for all traits examined here, 23.3% of the metabolic QTNs were shared by other isolates in the SGRP collection [Fig. 4f], and were sometimes present in multiple strains [Fig. 4g].

Fig. 4 Prevalence of IMA1 Tyr336Phe and other QTNs across S. cerevisiae. a Crystal structure of Ima1 in complex with isomaltose; highlighted is the Tyr336 residue that is mutated to Phe (PDB ID: 3AXH). b Growth phenotypes of segregants with Phe336 (blue) and Tyr336 (orange) in raffinose, sucrose, and maltose. Data shown are normalized, Z-scored colony size for each allele; bars show the mean. c Phylogeny of the IMA1 Tyr336Phe variant across the SGRP collection; the neighbor-joining tree is constructed on the basis of all segregating polymorphisms. d Phylogeny of the IMA1 Tyr336Phe variant across the SGRP collection; the neighbor-joining tree is constructed on the basis of only the segregating polymorphisms within 250 variants of the IMA1 variant on the genome. Scale bars show neighbor-joining distance. e Prevalence of the IMA1 variant shown in (a) across 1011 sequenced S. cerevisiae isolates [28]. f Fraction of QTN that are unique to RM11 or YJM975 (singletons; green) or shared with another SGRP isolate (purple), for each growth condition tested. g Histogram of the number of strains within the SGRP collection bearing the alternate allele at each locus identified as a QTN. Source data are provided as a Source Data file.

Phylogenetic evidence for selection on QTNs

The apparent re-emergence of the IMA1 Tyr336Phe allele raised the intriguing possibility that independent emergence events have repeatedly produced the same genetic innovations across the S. cerevisiae phylogeny. To test this hypothesis, we examined all the metabolic innovation QTNs for evidence of selection [34]. We have reported above and previously [8] that many QTNs are synonymous variants. Accordingly, we avoided methods that assume the neutrality of synonymous mutations (e.g., Ka/Ks and McDonald–Kreitman criteria [35]). Instead, we exploited the fact that detailed genomic and phylogenetic information is available for many wild yeast isolates, allowing the inference of repeated emergence by direct comparison of variant and strain phylogeny [15,16]. The concept of our analysis is simple: if the distribution of alleles on the strain tree is inconsistent with an allele emerging on only one branch, we can in principle infer that the allele has been subject to positive selection leading to multiple fixation events [Fig. 5a]. However, there are several key complications that must be accounted for.

Fig. 5 Using local phylogeny to accurately infer homoplasy. a Example phylogeny of a hypothetical variant that (left) emerged only once and (right) independently emerged twice. b Schematic of the in silico mutagenesis and mating procedure used to generate simulated admixed populations. c Schematic of the sliding phylogenetic window approach used to account for admixture. d Actual and inferred homoplasy based on this approach for N = 25 simulated populations with structure similar to that of the SGRP collection; bars show the mean.
Source data are provided as a Source Data file.

First, it is known that the sequenced isolates of the wild strain collection have been subject to substantial admixture, with many strains evidently being the product of mating between other sequenced isolates. Thus, the whole-genome neighbor-joining phylogeny is not consistent with the phylogeny based on particular chromosomes or chromosomal regions. This discrepancy will lead to spurious apparent multiple emergence if not accounted for. We therefore adopted a sliding-window approach to determining phylogeny, building local neighbor-joining trees in the vicinity of each segregating polymorphism based on a 500 variant-wide sliding window. To confirm that this approach properly accounts for the possibility of admixture, we conducted simulations based on in silico populations quantitatively similar to the wild yeast collection. Briefly, each simulation began by instantiating 30,000 random mutations in each of five founder strains [Fig. 5b]. Next, ten mosaic strains were generated by random mating of the five founders, followed by in silico meiosis consisting of one crossover per chromosome and the generation of haploid progeny, of which one was retained. Finally, a further 10,000 random mutations were deposited in all 15 simulated isolates. We can then assess the true extent of homoplasy and compare it to that detected by our inference procedure. Across N = 25 such simulations, we found that our procedure did account for admixture, and indeed returned a mild underestimate of the extent of homoplasy (since a fraction of homoplasy events are phylogenetically indistinguishable from a single emergence) [Fig. 5c, d]. Moreover, our approach accurately captured known mosaicism in the wild strain collection, e.g., that of chromosome II of the Y55 and SK1 strains [15,16,33] [Figure 6a–c].

Fig. 6 Widespread apparent multiple emergence of natural variation. a Plots of sequence identity (500-variant sliding window) of Y55 (top) and SK1 (bottom) to DBVPG6044 (blue) and wine/European clade modal genotype or Y12, respectively (red). Neighbor-joining trees for all strains in the SGRP collection based on 500-variant sliding windows centered about (b) chrII position 202,724 and (c) chrII position 750,709. Scale bars show neighbor-joining distance. Fraction of variants segregating within all strains of the SGRP (d) S. cerevisiae and (e) S. paradoxus collections that are (left) unique (green) or shared with another isolate (purple) and (right) inferred by phylogeny to have emerged only once (gray) or more than once (pink). The neutral expectation in the absence of selection was calculated separately for each strain collection by ten independent simulations; the bars show the mean across ten simulations and the results of each of the ten simulations are shown; p < 10^−16 by permutation test. f True extent of homoplasy (blue) as compared to inferred apparent multiple emergence (orange) as a function of the fraction of polymorphisms that are subject to balancing selection (f_ancestral). MAF_balanced is set to 2^−1. g True extent of homoplasy (blue) as compared to inferred apparent multiple emergence (orange) as a function of the strength of balancing selection (MAF_balanced). f_ancestral is set to 10^−2. h Inferred apparent multiple emergence as a function of both f_ancestral (as indicated; shaded lines) and MAF_balanced (abscissa). Results shown are mean ± s.e.m. Source data are provided as a Source Data file.

We therefore applied our approach to infer the extent of apparent multiple emergence of the QTNs we identified. As noted above, 23.3% of QTNs from each condition were shared with other SGRP isolates [Fig. 4f, g]. When we included QTNs identified previously [8] we found that 26.8% (N = 128) of variants identified as QTNs in our cross are shared with other yeast isolates.
Of these, 21.1% (N = 27) apparently emerged multiple times in S. cerevisiae.

A signature of widespread selection across natural variation

Drift-centric standard models of population genetics, while intuitively appealing and analytically tractable, are often at odds with the rapid phenotypic diversification typical of microbes like S. cerevisiae [13,36]. The findings described above led us to wonder whether a larger fraction of genetic variation in S. cerevisiae has been subject to selection than is generally thought [37]. Therefore, we analyzed all variants segregating in the wild strain collection using our emergence inference method, in which the phylogeny was recalculated for each variant based on a sliding genomic window [Figure 6b, c]. In total, 112,285 of 210,363 variants (53.5%) were shared, of which 38,549 (34.3%) were inferred to have emerged more than once [Fig. 6d]. The neutral expectation in the absence of selection predicts only 51,334 ± 201 shared variants and 1452 ± 30 parallel emergence events (mean ± s.d. for N = 10 simulations). This strongly suggests that a substantial amount of the extant variation in S. cerevisiae has been subject to selection.

To confirm that the pattern of variant sharing we observed was not a sampling artifact of the strain collection we examined, we also assessed the sharing of variants in the 1002 Yeast Genomes Project collection [33]. In concordance with the observations described above, only 44% (N = 772,398) of variants were singletons, whereas 56% (N = 982,468 loci with homozygous alternate allele) occurred in at least two isolates [Supplementary Figure 2]. This observation is striking in light of the declining numbers of distinct variants observed with increasing numbers of sequenced isolates [Supplementary Figure 2] and is consistent with strong purifying selection having acted on abundant variation in the S. cerevisiae lineage.

Interestingly, the pattern of apparent multiple emergence was quite different in the nearest relative of S. cerevisiae, Saccharomyces paradoxus, for which genome sequences of a 23-strain collection are available [15]. In contrast to the S. cerevisiae strain collection, whose members are ecologically diverse, all but one of the sequenced S. paradoxus strains were isolated from Quercus (oak trees). Others have suggested that this may have resulted in reduced selective pressure among the S. paradoxus isolates to adapt to new ecological niches [38], and we reasoned that this might in turn have reduced the parallel emergence of novel adaptive variants. We analyzed the prevalence of multiple emergence events in this collection (which harbors 464,307 segregating variants) by the second inference method described above and again determined phylogeny locally for each variant using a 200-variant sliding window. In contrast to our findings for S. cerevisiae, only 146,819 variants were shared between isolates, and 19,628 multiple emergence events were inferred [Fig. 6e]. The neutral expectation was of 215,300 ± 354 shared variants and 3429 ± 43 emergence events (mean ± s.d. for N = 10 simulations). The relative paucity of apparent multiple emergence events in S. paradoxus, the closest extant relative of S. cerevisiae, suggests that ecological diversification has played a role in generating the parallelism that we found to be commonplace in S. cerevisiae.

Balancing selection as an alternative to multiple emergence

Homoplasy is not the only explanation for the prevalence of apparent multiple emergence events. The last common ancestor of the wild isolates we analyzed likely harbored substantial polymorphism, presenting two other important sources of modern genetic variation: neutral polymorphisms maintained due to incomplete lineage sorting [39] and functional polymorphisms maintained by balancing selection [40].
The first possibility is likely not a major contributor, as most neutral variation is expected to resolve to reciprocal monophyly within 10 N_e generations (only ~100,000 years for S. cerevisiae) [39,41]. The latter, on the other hand, is important to consider as an alternative to homoplasy. We therefore simulated this possibility for a range of strengths and extents of balancing selection. Following the same procedure as above, we first constructed highly mosaic in silico yeast populations consisting of 15 strains and N_total ~300,000 total random mutations population-wide. In the context of these strains, we simulated the addition of f_ancestral × N_total ancestral polymorphisms maintained at a balancing minor allele frequency of MAF_balanced. For each polymorphic locus, the “sequenced” locus detected in each strain was simulated by a random draw based on MAF_balanced.

Little is known regarding the quantitative strength and extent of balancing selection in the wild [40], so we simulated the presence of balanced polymorphisms using parameter estimates spanning several orders of magnitude (f_ancestral = 0.1–10^−5; MAF_balanced = 0.5–2^−5). As expected, increasing both the fraction of loci subject to balancing selection and the equilibrium minor allele frequency increased the extent of apparent multiple emergence inferred from the simulated genotype data [Fig. 6f–h]. Quantitatively, however, even for substantial balancing selection (f_ancestral = 0.1 and MAF_balanced = 0.5) the simulated extent of apparent multiple emergence was less than that observed in the wild-strain collection: only 5743 ± 57 (mean ± s.d.; N = 5 simulations) apparent multiple emergence events were inferred. This suggests that homoplasy is likely a key contributor to the observed level of apparent multiple emergence (38,549 apparent events), even in the presence of high levels of ancestral polymorphism and balancing selection.

On evolutionary timescales, the likelihood of a given nucleotide mutation having occurred at some point in an S. cerevisiae population is high. Each base pair should be mutated approximately every 10^3–10^4 generations in a typical S. cerevisiae effective population size (N_e ~10^6) with a mutation rate of 5 × 10^−10 per base pair per division [42,43]. This notion is supported by the reproducible advent of beneficial genetic variants in laboratory evolution experiments [44,45,46,47]. On evolutionary timescales, therefore, the exploration of the genotypic space near the reference genome is likely nearly complete. However, the wild strains we examined contain only ~240,000 segregating variants across a genome of approximately 12 Mb (~2% of loci) and RM11 × YJM975 contains only ~12,000 variants (~0.1% of loci). Even the very large 1002 Yeast Genomes Project collection contains only ~1,700,000 total variants (~14% of loci), an even smaller fraction relative to the total number of strains sampled [Supplementary Figure 2] [33]. These patterns suggest that S. cerevisiae has been subject to strong purifying selection [15]. Our conservative analysis, based on the inference of apparent multiple emergence events, concludes that a substantial number (>5%; N = 27) of all QTN we have identified thus far (in a limited survey of growth conditions) have been subject to selection in natural S. cerevisiae populations.
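The order-of-magnitude figures in this paragraph can be checked with a few lines of arithmetic. The sketch below is only a back-of-envelope illustration: it assumes that the population-wide mutation supply at a site is simply N_e times the per-division mutation rate (one division per individual per generation), and all function and constant names are hypothetical.

```python
# Back-of-envelope check of the figures quoted in the text.
# Inputs are the approximate values stated in the article; the simple
# "N_e x mu" model of mutation supply is an assumption of this sketch.

EFFECTIVE_POP_SIZE = 1e6   # N_e, typical S. cerevisiae effective population size
MUTATION_RATE = 5e-10      # mutations per base pair per division
GENOME_SIZE_BP = 12e6      # genome of approximately 12 Mb


def generations_per_hit(n_e: float, mu: float) -> float:
    """Expected generations between mutations at one base pair, population-wide."""
    return 1.0 / (n_e * mu)


def fraction_of_loci(n_variants: float, genome_bp: float = GENOME_SIZE_BP) -> float:
    """Fraction of genomic positions carrying a segregating variant."""
    return n_variants / genome_bp


if __name__ == "__main__":
    gens = generations_per_hit(EFFECTIVE_POP_SIZE, MUTATION_RATE)
    print(f"generations between hits at one site: {gens:.0f}")
    for label, n in [("SGRP wild strains", 240_000),
                     ("RM11 x YJM975 cross", 12_000),
                     ("1002 Yeast Genomes", 1_700_000)]:
        print(f"{label}: {100 * fraction_of_loci(n):.1f}% of loci")
    # Extrapolating a >5% selected fraction to the 12,054 polymorphisms
    # that separate the two parents of the cross.
    print(f"5% of 12,054 variants: {0.05 * 12_054:.0f}")
```

Under these assumptions the waiting time per site falls inside the quoted 10^3–10^4 generation range, and the variant counts reproduce the quoted ~2%, ~0.1% and ~14% of loci. The final line is the arithmetic behind extrapolating the >5% fraction of QTNs to the whole cross.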
This estimate is a conservative lower bound on the fraction of variants in our cross that have been subject to selection, but still suggests RM11 × YJM975 alone harbors at least 600 ecologically relevant variants, and likely many more.

Selection for adaptation to ecological niche

Finally, we investigated connections between the phylogenetic evidence for selection and the actual selective forces at play in nature. If adaptation to ecological niche is a relevant selective pressure, variants adaptive in a certain niche should be enriched in isolates from that environment. Therefore, we evaluated the enrichment of alternate alleles in strains isolated from particular ecological niches. For every shared variant, we assessed whether the multiple occurrences were more likely to have occurred in strains isolated from the same ecological environment (N = 26,671 and N = 20,089 variants present in two and three strains, respectively) [Fig. 7a]. Indeed, both doubly and triply occurring alternate alleles were far more likely to have occurred solely within a single niche than would be expected by chance (p < 10^−16 by permutation test) [Fig. 7b]. The enrichment was slightly stronger when only synonymous variants were considered (N = 10,014 doubly occurring and N = 7599 triply occurring variants). Taken together, these data provide strong evidence that a substantial subset of standing variants in S. cerevisiae have been subject to selection (and have presumably been beneficial), facilitating the recent adaptation of this organism to diverse environments.

Fig. 7 Enrichment of shared alternate alleles in common ecological niches. a Schematic of a phylogeny in which the alternate (RM11) allele occurred twice, and only in the fermentation niche. b Fraction of variants segregating in the SGRP collection for which the alternate allele occurs only in strains from the same ecological niche (blue: all variants; red: only synonymous variants), for loci with the alternate allele occurring (left) in two strains or (right) in three strains; the neutral expectation was calculated by 50 independent simulations; the mean of the simulations and the results of each simulation are shown in gray and with black dots, respectively. p values by permutation test. Source data are provided as a Source Data file.
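The niche-enrichment comparison described in this section can be illustrated with a small permutation test. This is a minimal sketch, not the authors' procedure: the article obtains its neutral expectation from simulations, whereas the code below assumes a simpler null in which niche labels are shuffled among strains. The strain names, niche labels, and function names are all hypothetical.

```python
import random
from typing import Dict, Sequence, Tuple


def single_niche_fraction(carriers: Sequence[Sequence[str]],
                          niche_of: Dict[str, str]) -> float:
    """Fraction of variants whose carrier strains all come from one niche."""
    if not carriers:
        raise ValueError("need at least one variant")
    hits = sum(1 for strains in carriers
               if len({niche_of[s] for s in strains}) == 1)
    return hits / len(carriers)


def niche_permutation_test(carriers: Sequence[Sequence[str]],
                           niche_of: Dict[str, str],
                           n_permutations: int = 1000,
                           seed: int = 0) -> Tuple[float, float]:
    """Return (observed fraction, one-sided permutation p value).

    Null model (an assumption of this sketch): niche labels are
    exchangeable among strains, so they are shuffled while the
    variant-to-strain assignments stay fixed.
    """
    rng = random.Random(seed)
    observed = single_niche_fraction(carriers, niche_of)
    strains = sorted(niche_of)
    labels = [niche_of[s] for s in strains]
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        shuffled = dict(zip(strains, labels))
        if single_niche_fraction(carriers, shuffled) >= observed:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p value strictly positive.
    p_value = (at_least_as_extreme + 1) / (n_permutations + 1)
    return observed, p_value


# Toy data: eight hypothetical strains in four niches, and four
# doubly occurring variants that each sit within a single niche.
toy_niches = {"s1": "wine", "s2": "wine", "s3": "clinical", "s4": "clinical",
              "s5": "oak", "s6": "oak", "s7": "sake", "s8": "sake"}
toy_carriers = [("s1", "s2"), ("s3", "s4"), ("s5", "s6"), ("s7", "s8")]

if __name__ == "__main__":
    frac, p = niche_permutation_test(toy_carriers, toy_niches)
    print(f"observed single-niche fraction: {frac:.2f}, p = {p:.4f}")
```

With real data, `carriers` would hold the strains bearing the alternate allele at each doubly or triply occurring variant. Shuffling labels preserves the number of strains per niche, which matters because a niche with many sampled strains produces more same-niche pairs by chance alone.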