Population genomic analysis of mango (Mangifera indica) suggests a complex history of domestication

• Humans have domesticated diverse species from across the plant kingdom, yet much of our foundational knowledge of domestication has come from studies investigating relatively few of the most important annual food crops. Here, we examine the impacts of domestication on genetic diversity in a tropical perennial fruit species, mango (Mangifera indica).
• We used restriction site associated DNA sequencing to generate genomic single nucleotide polymorphism (SNP) data from 106 mango cultivars from seven geographical regions along with 52 samples of closely related species and unidentified cultivars to identify centers of mango genetic diversity and examine how post-domestication dispersal shaped the geographical distribution of diversity.
• We identify two gene pools of cultivated mango, representing Indian and Southeast Asian germplasm. We found no significant genetic bottleneck associated with the introduction of mango into new regions of the world. By contrast, we show that mango populations in introduced regions have elevated levels of diversity.
• Our results suggest that mango has a more complex history of domestication than previously supposed, perhaps including multiple domestication events, hybridization and regional selection. Our work has direct implications for mango breeding and genebank management, and also builds on recent efforts to understand how woody perennial crops respond to domestication.

One of the central dogmas of domestication is that crops undergo an often-severe decrease in genetic diversity in response to three key bottleneck (or founder) events (Ladizinsky, 1985; Cooper et al., 2001; Doebley et al., 2006; Van de Wouw et al., 2010; Miller & Gross, 2011). During the initial stages of cultivation, as important traits are selected for or against, crops generally undergo a 'domestication bottleneck' (Cooper et al., 2001; Van de Wouw et al., 2010). Compounding the primary loss of diversity, many crops experience a secondary 'dispersal bottleneck' when they are introduced into new geographical regions (Cooper et al., 2001; Van de Wouw et al., 2010). Soybean, for example, was subjected to an intense introduction bottleneck when it was introduced from Asia into North America (Hyten et al., 2006). The concept of a dispersal bottleneck is connected to Vavilov's premise of crop 'centers of origin', which posits that the geographical origin of a crop contains the greatest variation of morphological types (Vavilov, 1987), thereby implying a loss of diversity as a crop is dispersed. As breeding and cultivation intensify, some crops suffer a tertiary 'improvement bottleneck' (Cooper et al., 2001; Van de Wouw et al., 2010). The drastic reductions in diversity incurred during these three bottleneck events (primary, secondary, tertiary) can negatively impact a crop's ability to adapt to novel environments, pests and diseases (e.g. Abbo et al., 2003; Esquinas-Alcázar, 2005). However, the relative impacts of each bottleneck vary both within and among crops, depending in large part on the biology of the species itself.
Perennial crop species have recently received increased attention highlighting their different trajectories under domestication compared to annuals (Miller & Gross, 2011; Gaut et al., 2015). In general, woody perennials retain greater levels of genetic diversity under cultivation than do annual species (Miller & Gross, 2011). For example, recent genome-wide analyses of peach (Prunus persica) and its close relative almond (Prunus dulcis) showed no evidence of genetic bottlenecks associated with domestication in either species (Velasco et al., 2016), and similar results have been found for grape (Vitis vinifera; Myles et al., 2011) and apple (Malus x domestica; Gross et al., 2014). The relatively weak primary domestication bottleneck observed in many perennial species is largely a result of characteristics common to the perennial life history: a long generation time and the predominance of self-incompatibility (Miller & Gross, 2011). The former means that perennial crops have experienced fewer generations of selection under domestication than their annual counterparts (Pickersgill, 2007), whereas the latter explains how perennials maintain high levels of heterozygosity despite the fact that their per-unit-of-time mutation rates are far lower than in annual species (Savolainen & Pyhäjärvi, 2007). In addition, clonal propagation techniques common in woody perennial cultivation allow any individual (including F1 hybrids, triploids and sterile or seedless parthenocarpic individuals) to be preserved for posterity, effectively halting the domestication process in that clone and potentially limiting the loss of genetic diversity in perennial species (Zohary, 2004; Miller & Gross, 2011).
However, not all perennial crops retain high levels of diversity: the tropical species coffee (Coffea arabica), cacao (Theobroma cacao) and pigeonpea (Cajanus cajan) have all experienced significant losses of diversity during domestication (Anthony et al., 2002; Aerts et al., 2013; Yang et al., 2013; Varshney et al., 2017).
The mango, Mangifera indica L. (Anacardiaceae) is a perennial fruit tree that has been cultivated on the Indian subcontinent for an estimated 4000 yr, where it is called 'The King of Fruits' (Mukherjee, 1949). This timeline places the domestication of mango contemporaneously with that of citron (Citrus medica), walnut (Juglans regia), peach (Prunus persica), sweet orange (Citrus x sinensis), lychee (Litchi chinensis), lemon (Citrus limon) and jujube (Ziziphus jujuba), and before that of the other domesticated species in the poison ivy family: pistachio (Pistacia vera), cashew (Anacardium occidentale), Peruvian peppertree (Schinus molle), and jocote (Spondias purpurea) (Meyer et al., 2012). Unpruned, mango trees can reach over 30 m in height and live for more than a century, producing tons of fruit throughout their lifetime.
Most authors presuppose a single domestication event for cultivated M. indica (DeCandolle, 1884; Mukherjee, 1972; Vavilov, 1987; Mukherjee & Litz, 2009; Singh, 2016), and on the basis of historical documents and artifacts, M. indica is thought to have been cultivated in India for thousands of years before it was introduced elsewhere (Mukherjee, 1949; Fig. 1). Buddhist monks were likely the first to introduce mango outside its original range of cultivation during their trips to Southeast Asia in the 4th and 5th centuries (Mukherjee, 1949). The mango began its westward journey much later, when Persian traders brought the tree to East Africa in the 9th or 10th century (Mukherjee, 1949). In the 16th century, as global botanical trade continued to grow, the Portuguese likely reintroduced the mango into East Africa from their territory in Goa (Mukherjee, 1949). The Portuguese would continue to facilitate mango's range expansion, transporting it to West Africa, and then to Brazil sometime around 1700 (Popenoe, 1920; Mukherjee, 1949). From there, mango spread throughout the Caribbean, reaching Barbados in 1742 and Jamaica by 1782 (Popenoe, 1920; Mukherjee, 1949). As a Spanish colony, Mexico had a unique history of introductions, with mangoes arriving from the Caribbean as well as directly from the Philippines, which was also under Spanish rule at the time (Popenoe, 1920; Mukherjee, 1949). It was not until 1833 that the first mango reached the shores of Florida (Popenoe, 1920). In the 1900s, mango became the subject of intensive breeding programs in South Florida, which produced many of today's most important commercial cultivars including 'Tommy Atkins', 'Haden', 'Keitt' and 'Kent' (Knight et al., 2009). For this reason, South Florida has been termed a secondary center of domestication for mango (Knight & Schnell, 1994).
Today, mango is one of the world's most important fruits and is grown in tropical and subtropical climates across the world (FAO, 2003; FAOSTAT, 2018), with two primary cultivar types, Indian and Indochinese, being differentiated by a suite of morphological characters (Crane & Campbell, 1994). Indian cultivars tend to have an apparent color change when ripe, turning orange or red, and are rounded with fibrous, strong-flavored flesh. They also generally have a seed that is monoembryonic, producing a single seedling. By contrast, Indochinese cultivars tend to turn yellow or remain green when ripe, display a prominent 'nose' or 'beak', and have flesh that has little fiber and is mild in flavor. Indochinese cultivars also typically have polyembryonic seeds, containing a single zygotic embryo and multiple embryos derived from the maternal nucellar tissue (Mukherjee & Litz, 2009). Nucellar embryony is a rare trait in angiosperms, although the phenomenon has been observed in at least three other species of Mangifera (M. odorata, M. laurina, M. casturi; Kostermans & Bompard, 1993; Mukherjee & Litz, 2009; Lim, 2012a,b) and is found in another cultivated genus within the order Sapindales, Citrus.
Despite its importance as a global food crop and its cultural significance in many regions of the world, current ranges of wild M. indica are not well-characterized. Although wild populations have been reported from northeastern India, Bangladesh, Bhutan and Nepal, and may extend into Myanmar and northern Thailand (Kostermans & Bompard, 1993), these populations have not been recently surveyed, have never been studied in a genetic framework and are not represented in germplasm collections anywhere in the world. The IUCN's red list currently categorizes wild M. indica as 'data deficient' (IUCN, 2012). Other species in the genus Mangifera are found from India to the Solomon Islands, with the region of highest diversity in Malesia.
Phylogeographical studies can elucidate the origins of crops and reveal the impacts of domestication on these species (e.g. Olsen & Schaal, 1999; Salamini et al., 2002; Londo et al., 2006; Gunn et al., 2011; Kassa et al., 2012; Loor Solorzano et al., 2012). Although a lack of accessible wild M. indica populations precludes investigations of a primary bottleneck associated with the initial domestication of mango, the recent and well-documented history of mango's human-mediated migration into new regions of the world provides an opportunity to determine whether the species experienced a secondary genetic bottleneck during successive founder events. Although many previous studies have provided insight into the molecular diversity and genetic structure of mango cultivars within specific regions, including Kenya (Sennhenn et al., 2013), Myanmar (Hirano et al., 2010), China (Luo et al., 2011), Colombia (Diaz-Matallana et al., 2009), Brazil (Dos Santos Ribeiro et al., 2012), Iran (Shamili et al., 2012) and, especially, India (Ravishankar et al., 2000; Kumar et al., 2001; Karihaloo et al., 2003; Damodaran et al., 2012; Vasugi et al., 2012; Surapaneni et al., 2013), only a handful have examined mango cultivars originating across a broad geographical range. Works by Schnell et al. (2006) and Dillon et al. (2013), both of which used microsatellite markers, found Southeast Asian mango cultivars to be the most differentiated from other populations, whereas Sherman et al. (2015) found population structure between Asian and Western mango cultivars using single nucleotide polymorphisms (SNPs).
Here, we use SNP markers from double digest restriction site associated DNA sequencing (ddRADseq; Peterson et al., 2012) to explore geographical patterns of diversity in mango cultivars within genebank collections that originated from different geographical regions. As a reduced representation genomic technique, RADseq identifies SNPs from across the genome (Miller et al., 2007; Baird et al., 2008), and has proven to be a useful tool for investigating population structure and phylogeography in nonmodel organisms, including crop species (e.g. Xu et al., 2014; Atchison et al., 2016; Pan et al., 2016; Singh, 2016; Gao et al., 2017; Stetter et al., 2017). We aim to (1) determine the geographical distribution of genetic diversity in mango, (2) test whether India represents a 'center of diversity' for mango, and (3) quantify the secondary genetic bottleneck mango underwent during its migration to Africa and the Americas. Our work has a three-fold impact, informing management practices for mango germplasm resources, providing a better understanding of the genomic impacts of domestication on cultivated mango, and adding to the growing body of literature that seeks to understand how perennial plants evolve under domestication.

Sampling
In order to explore the geographical distribution of genetic diversity in mango, we selected 113 cultivars from mango genebanks in South Florida (Fairchild Tropical Botanic Garden (FTBG), US Department of Agriculture's Subtropical Horticulture Research Station (USDA)) that originated in seven different geographical regions: India, Southeast Asia (Indochina (Myanmar, Thailand, Cambodia, Laos, Vietnam) and Malesia (Malaysia, Indonesia, the Philippines)), Africa (limited germplasm required pooling of all African samples), South America, Mexico, the Caribbean (Cuba, Jamaica, Haiti, the Dominican Republic) and Florida (Fig. 2; Table S1). We attempted to sample the most morphologically and geographically diverse and characteristic mangoes from each region, emphasizing historical cultivars whenever possible.
Fig. 1 Map of the human-mediated migration of the mango across the globe. Colors represent the geographical populations of mango cultivars analyzed in this study and correspond to labels used throughout the results. Times shown were estimated based on historical documentation (Mukherjee, 1949). The mango is thought to have originated in India, Nepal, Bangladesh, and Bhutan (red), and domesticated in India c. 4000 yr ago. It was first dispersed into Southeast Asia (blue, Indochina; green, Malesia) during the 4th-5th centuries BCE, then into East and West Africa (purple) between the 9th and 17th centuries, South America (Brazil, orange) in 1700, the Caribbean (pink) and Mexico (yellow) during the mid-1700s, and Florida (brown) during the mid-1800s. Mexico received introductions both from the Caribbean and from the Philippines.
Additionally, we collected leaves of 54 samples of unidentified cultivars of Mangifera indica and closely related Mangifera species from FTBG, Miami-Dade Fruit and Spice Park, Singapore Botanic Garden, Gardens by the Bay (Singapore), Purwodadi Botanic Garden (East Java, Indonesia), Bogor Botanic Garden (West Java, Indonesia), the Forestry Research Institute of Malaysia (Kepong, Malaysia), Pasoh Forest Arboretum and Reserve (Simpang Pertang, Malaysia), and individual collectors (Table S1). Fresh leaf samples were stored at −80°C or dried in silica and stored at 4°C. DNA was extracted from each sample using the DNEasy plant mini kit (Qiagen) or, when necessary, a modified CTAB protocol (Doyle & Doyle, 1990).

RADseq library preparation and locus assembly
Three ddRADseq libraries were prepared following the protocol of Peterson et al. (2012). The 167 samples for this study were combined with 121 other samples (sequenced for a complementary study). High molecular weight DNA (300-1000 ng) was digested with NlaIII and MluCI (New England Biolabs, Ipswich, MA, USA). Custom-designed oligonucleotides containing unique barcode sequences were ligated onto each individual before pooling eight samples into each of 12 separate sublibraries per lane (36 sublibraries across three lanes in total). Pippin Prep (Sage Science, Beverly, MA, USA) was used to size-select 350-bp inserts (tight size selection, 425 bp, external marker). Short-cycle PCR was performed in sextuplicate to amplify and add a unique index to sublibraries, which were then quality-checked on an Agilent Bioanalyzer DNA High Sensitivity Chip (Agilent, Santa Clara, CA, USA). For libraries where overamplification was observed, nontarget DNA was removed by size-selection on Pippin Prep, with a subsequent Bioanalyzer quality-check. Each of the three libraries was sequenced at The University of Southern California's Genome and Cytometry Core in a rapid run of Illumina HiSeq 2500 as a single lane of 150-bp single-end reads.
The program FASTQC v.0.11.4 (Andrews, 2010) was used to check the overall quality of raw fastq files for each sublibrary. After demultiplexing reads within each sublibrary based on the individual barcode, seven individuals from this study were excluded based on low sequencing coverage; additionally, two individuals were removed from the cultivar dataset after preliminary analysis showed them to be outliers and likely misidentified Mangifera species. In total, 158 samples were analyzed: 106 samples from known mango cultivars and 52 from closely related species or unidentified accessions (Table S1).
Raw reads were processed using the ipyRAD bioinformatic pipeline (Eaton, 2014) on Florida International University's high performance computing cluster (FIU HPCC) using default parameters except for: maxdepth = 1000, max_barcodes_mismatch = 1, filter_adapters = 2, and clust_threshold = 0.95 using de novo clustering. The threshold for clustering reads within and between individuals was set to 0.95 to account for previous reports of high heterozygosity within mango (Sherman et al., 2015; Singh, 2016; Kuhn et al., 2017) and because the full dataset included closely related Mangifera species. For population genetic analysis of the 106 mango cultivars, ipyRAD was used to produce a file containing a single randomly selected (unlinked) single nucleotide polymorphism (SNP) from each locus for downstream analyses. To produce a dataset for phylogenetic analysis, which can tolerate relatively large amounts of missing data, we performed filtering (ipyRAD step 7) for the complete dataset of 158 individuals using the parameter min_samples_locus = 33. For analysis of the full dataset of 158 individuals with STRUCTURE software, we used a custom python script to remove loci that had > 10% missing data and individuals that had > 50% missing data. Because population genomic analyses are less tolerant of missing data than phylogenetic analysis, we filtered the dataset for the subset of 106 mango cultivars in ipyRAD using the parameter min_samples_locus = 4, then used a custom python script to filter loci that contained > 10% missing data and individuals that had > 50% missing data across all loci, and finally filtered out loci with a minor allele frequency < 0.01 using the R/POPPR package (Kamvar et al., 2014).
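The missing-data and minor-allele-frequency thresholds described above can be illustrated with a short sketch. This is not the authors' custom script or the R/POPPR routine: the function name filter_snps, the genotype encoding (alternate-allele counts 0, 1, 2, with -1 for a missing call) and the order in which the filters are applied are all assumptions made for illustration.

```python
import numpy as np


def filter_snps(genotypes, max_locus_missing=0.10,
                max_indiv_missing=0.50, min_maf=0.01):
    """Filter an (individuals x loci) matrix of alternate-allele counts.

    Genotypes are coded 0, 1 or 2; -1 marks a missing call (an assumed
    encoding, not one taken from the study).
    Returns the filtered matrix and a boolean mask of retained individuals.
    """
    g = np.asarray(genotypes)
    missing = g < 0

    # Step 1: keep loci missing in at most 10% of individuals.
    keep_loci = missing.mean(axis=0) <= max_locus_missing
    # Step 2: keep individuals missing at most 50% of the retained loci.
    keep_ind = missing[:, keep_loci].mean(axis=1) <= max_indiv_missing

    g = g[np.ix_(keep_ind, keep_loci)]
    missing = g < 0

    # Step 3: drop loci whose minor allele frequency is below the cutoff.
    called = (~missing).sum(axis=0)
    alt_count = np.where(missing, 0, g).sum(axis=0)
    freq = alt_count / np.maximum(2 * called, 1)
    maf = np.minimum(freq, 1.0 - freq)
    keep_maf = maf >= min_maf

    return g[:, keep_maf], keep_ind
```

Applying the locus filter before the individual filter is one reasonable reading of the text; reversing the order can retain a slightly different set of loci.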

Phylogenetic analysis
A maximum-likelihood phylogeny for the dataset of 158 individuals (min_samples_locus = 33) was estimated from the concatenated SNP dataset (64 331 variable sites, 40 767 parsimony-informative sites) using IQ-TREE (Nguyen et al., 2015) including model selection performed with an ascertainment bias to correct for only including variable loci (-m TEST+ASC; Kalyaanamoorthy et al., 2017), 1000 ultrafast bootstrap replicates (-bb 1000; Hoang et al., 2018) and 1000 bootstrap replicates of the Shimodaira-Hasegawa approximate likelihood ratio test (-alrt 1000; SH-aLRT; Guindon et al., 2010). The model selection implemented in IQ-TREE identified TVM+F+ASC+G4 as the best-fit model according to Bayesian Information Criterion (BIC). The phylogeny was rooted with the species M. gedebe, which has been identified as sister to all other sampled species (E. Warschefsky & E.J.B. von Wettberg, unpublished) using the program MESQUITE (Maddison & Maddison, 2018). The tree was visualized and annotated using the R/APE (Paradis et al., 2004) and R/GGTREE packages.

Population structure and admixture
In order to detect population structure and admixture within the 106 mango cultivars, model-based clustering was conducted in the Bayesian software STRUCTURE v.2.3.4 (Pritchard et al., 2000; Falush et al., 2003; Hubisz et al., 2009). For the dataset, lambda was estimated by averaging the mean value of lambda with K = 1 across 10 independent runs of 100 000 iterations with a 10 000 step burn-in period. Using the estimated value of lambda for the dataset, 10 runs of 100 000 iterations preceded by a 10 000 step burn-in were performed for K = 1-8. The optimal value of K was determined using STRUCTUREHARVESTER v.0.6.94 (Earl & vonHoldt, 2012) with the ΔK method (Evanno et al., 2005). Results were summarized with CLUMPP v.1.1.2 (Jakobsson & Rosenberg, 2007) using the greedy option (M = 2) for K = 1-8, with G′ similarity and 1000 random permutations. The results were visualized using DISTRUCT v.1.1 (Rosenberg, 2004), and individuals were labeled by population. Additionally, principal component analysis (PCA) was used to visualize population structure within the dataset of 106 mango cultivars in the R/ADEGENET package (Jombart, 2008; Jombart & Ahmed, 2011). Analysis of population structure was performed for the filtered full dataset (158 individuals, min_samples_locus = 4 with < 10% missing data per locus, < 50% missing data per individual; 612 unlinked SNP markers) using the same methods as for the dataset of 106 individuals. For the dataset of 106 cultivars and the full dataset, the average population assignment for each region/species also was calculated from STRUCTURE results.
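For readers unfamiliar with the ΔK criterion used to choose K, a minimal sketch follows. It assumes the form of the statistic computed from per-K means, ΔK = |mean L(K+1) − 2 mean L(K) + mean L(K−1)| / sd L(K), where L(K) is the log probability of the data reported by each replicate run. The function name delta_k and the input layout are hypothetical, and this is not the STRUCTUREHARVESTER code.

```python
import statistics


def delta_k(lnp_by_k):
    """Compute Delta K for each interior K.

    lnp_by_k maps K to a list of L(K) values, one per replicate run
    (at least two replicates per K are needed for a standard deviation).
    """
    ks = sorted(lnp_by_k)
    mean = {k: statistics.mean(lnp_by_k[k]) for k in ks}
    sd = {k: statistics.stdev(lnp_by_k[k]) for k in ks}

    result = {}
    for k in ks:
        # Delta K is undefined at the smallest and largest K tested,
        # and when replicate runs have zero variance.
        if k - 1 not in mean or k + 1 not in mean or sd[k] == 0:
            continue
        second_diff = abs(mean[k + 1] - 2 * mean[k] + mean[k - 1])
        result[k] = second_diff / sd[k]
    return result
```

Because the statistic needs both neighbours, runs for K = 1-8 yield ΔK only for K = 2-7, which is why ΔK can never select K = 1.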

Indices of genetic diversity
Common measures of genetic diversity were calculated for the seven populations of mango cultivars using the dataset of 364 SNPs. Observed heterozygosity (H_O), gene diversity (H_E, the expected heterozygosity within subpopulations assuming Hardy-Weinberg Equilibrium), and the inbreeding coefficient (F_IS) were calculated in the R/HIERFSTAT package (Goudet, 2005).
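The relationship among these three statistics can be made concrete with a small sketch for a single population of biallelic SNPs. It is a simplified illustration rather than a reimplementation of R/HIERFSTAT: the function name diversity_stats, the genotype encoding (0, 1, 2 alternate-allele counts, -1 for missing) and the particular small-sample correction, 2n/(2n − 1), are assumptions, so values may differ slightly from those the package reports.

```python
import numpy as np


def diversity_stats(genotypes):
    """Return (H_O, H_E, F_IS) averaged over loci for one population.

    genotypes: (individuals x loci) alternate-allele counts, -1 = missing.
    Assumes every locus has at least one called genotype.
    """
    g = np.array(genotypes, dtype=float)   # copy so the input is untouched
    g[g < 0] = np.nan
    n = (~np.isnan(g)).sum(axis=0)         # individuals genotyped per locus

    # Observed heterozygosity: fraction of called genotypes that are 0/1.
    ho = (g == 1).sum(axis=0) / n
    # Expected heterozygosity from allele frequencies, with a
    # small-sample correction of 2n / (2n - 1).
    p = np.nansum(g, axis=0) / (2 * n)
    he = (2 * n) / (2 * n - 1) * (1 - p ** 2 - (1 - p) ** 2)

    h_o, h_e = ho.mean(), he.mean()
    # F_IS compares observed with expected heterozygosity.
    return h_o, h_e, 1 - h_o / h_e
```

A negative F_IS, as the formula shows, simply means that more heterozygotes were observed than expected under Hardy-Weinberg proportions.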

Population differentiation
In order to test for significant genetic differentiation between mango cultivars originating from the geographical regions represented in this dataset (India, Southeast Asia (Indochina & Malesia), Africa, South America, the Caribbean, Mexico, and Florida), pairwise values of population differentiation (F_ST of Weir & Cockerham, 1984) between populations of cultivars were calculated in GENODIVE v.2.0b27 (Meirmans & Van Tienderen, 2004) and significance was evaluated with a Bonferroni correction for multiple tests. To examine population differentiation within the 106 mango cultivars, we performed AMOVA (Excoffier et al., 1992; Michalakis & Excoffier, 1996) in the software GENODIVE v.2.0b27 (Meirmans & Van Tienderen, 2004) under an infinite allele model and with 999 permutations to test for significant differences. Before the AMOVA, missing data were filled in with randomly drawn alleles determined by the overall allele frequencies.
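As a worked illustration of the multiple-testing correction, seven populations give 21 pairwise comparisons, so a Bonferroni correction divides the nominal significance level by 21. The sketch below assumes a nominal alpha of 0.05, which is not stated in the text; the function name is hypothetical.

```python
from itertools import combinations


def bonferroni_threshold(populations, alpha=0.05):
    """Return all population pairs and the per-test significance cutoff.

    alpha = 0.05 is an assumed nominal level, not a value from the study.
    """
    pairs = list(combinations(populations, 2))
    return pairs, alpha / len(pairs)


regions = ["India", "Southeast Asia", "Africa", "South America",
           "Caribbean", "Mexico", "Florida"]
pairs, cutoff = bonferroni_threshold(regions)
```

Under that assumption, a pairwise F_ST would be called significant only if its P-value fell below 0.05/21, roughly 0.0024.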

Sequencing and assembly
We obtained 201 811 265 raw reads for the 158 individuals (excluding a total of nine low coverage and outlier samples from the original 167) analyzed in this study (average 1 277 286, standard deviation 541 376; Table S1; NCBI Bioproject PRJNA517351; NCBI SRA Accessions SRR8521837-SRR8521844). The FASTQC results indicated that reads were of high quality across the entire 150 bp length. Because RADseq datasets often have large amounts of missing data, filtering parameters can have a major impact on the overall size of the dataset. The complete dataset for 158 individuals, which included all variable sites and allowed for large amounts of missing data, contained 64 331 SNPs; after filtering loci with > 10% missing data and individuals with > 50% missing data, the dataset included 612 unlinked SNP markers. The subset for 106 mango cultivars recovered 364 unlinked SNPs that had a minor allele frequency > 0.01 from the 994 loci (some invariant) recovered in at least 90% of individuals.

Phylogenetic hypothesis
The maximum-likelihood phylogeny of the full dataset of 158 individuals provides information at both the intraspecific and interspecific levels (Fig. 3). Ultrafast bootstrap support values can be considered strong only when > 95% (Hoang et al., 2018), whereas SH-aLRT bootstrap values can be considered strong at > 80% (Guindon et al., 2010), and nodes with strong support from both measures were identified (Fig. 3). The species M. pentandra, M. casturi, M. gedebe and M. zeylanica were recovered as monophyletic (some with unidentified or putatively identified individuals included in their monophyletic groups), with high support from both ultrafast bootstrapping and SH-aLRT bootstrapping. A number of clades consisting solely of samples of uncertain identity also were recovered. Within the core clade of M. indica, three subclades were recovered, though support values for these clades were low. In general, Indochinese and Malesian samples were recovered in separate clades compared to cultivars from other regions of the world. The first clade consists of two Indonesian ('Aeromanis', 'Gedong Ginco'), two African ('Diab', 'Hindi Besanara') and two Indochinese ('Golek', 'Sig Siput') cultivars along with samples identified as M. lalijiwa, and unidentified samples, most of which were collected in Indonesia. The second clade consists of primarily Indochinese samples and also includes one Floridian sample ('Joellen') and one African sample ('Ewais'). Notably, within the second clade two Mexican cultivars ('Ataulfo' and 'Manila') form a monophyletic group with the lone Philippine cultivar ('Carabao'), corroborating the historical documentation which indicates that some Mexican mango germplasm was introduced directly from the Philippines. The third clade of M. indica primarily contains cultivars from India, Florida, South America, the Caribbean and Africa, but also includes five Indochinese cultivars ('Myatrynat', 'Swethintha', 'Saigon', 'Maha Chanok' and 'Cac') and the remaining three Mexican cultivars ('Oro', 'Manila' and 'Esmeralda'). Although some cultivars from within particular regions are recovered as closely related to one another, there is not strong geographical structure within the clade.

Population structure
The population structure of the subset of 106 mango cultivars was first examined with the program STRUCTURE to identify genetic groups, which found K = 3 to be optimal using the ΔK method of Evanno, though additional informative structure is observed for K = 2 and 4 (Fig. 4a-c). For K = 3, populations from Florida and India have similar compositions, with high levels of ancestry from group two, moderate levels of admixture from group three, and low levels of admixture from group one. The majority of individuals in the Caribbean and South American populations have the highest level of ancestry from group three, with moderate levels of admixture from group two and low levels of admixture from group one. As a whole, the Southeast Asian cultivars are distinct from cultivars from other regions, with high levels of ancestry from group one and low to moderate contributions from groups two and three. African cultivars are inferred to have high levels of admixture among the three groups, with great variability in inferred ancestry across individuals. Similarly, most cultivars from Mexico show high levels of admixture from the three groups. Indicative of the ongoing exchange of germplasm across the world, all populations include some individuals that deviate from the overall pattern for that population. These results also are supported when examining the average of all individuals within each population (Table S2a-c).
Population structure also was examined for the full dataset of 612 unlinked SNPs from 158 individuals (Fig. 4d-f). Individuals were sorted and labeled by species or population. Using the ΔK method of Evanno, K = 2 was found to be the optimal number of populations, though we observed additional informative structure for K = 3 and 4. For K = 3, mango cultivars from Florida, the Caribbean, South America, Africa (with the exception of two individuals), and India show high levels of shared ancestry from a single group and only a few individuals indicate low levels of admixture with a secondary group. By contrast, almost all cultivars from Indochina and Malesia show high levels of admixture with the second group. Admixed ancestry from groups one and two was also found in M. casturi, M. pentandra and M. lalijiwa. Both M. gedebe and M. laurina are assigned to group three with little evidence of admixture. A few individuals, including three cultivars from Africa, both samples of M. zeylanica, and multiple unidentified samples from Florida and Malesia, were inferred to be of admixed ancestry between groups one and three or between all three groups. The unidentified accessions in Florida and Malesia were highly variable, with individuals assigned to group one, group three, or showing admixture between two or more of the populations. Of note, no individuals are inferred to have > 60% ancestry from group two. The average assignment of individuals from geographical regions showed similar results (Table S3a-c).

New Phytologist
For analysis of the 106 mango cultivars using PCA, the first principal component explained 9.58% of the variance whereas the second explained 5.84% (Fig. 5). The PCA clustered cultivars from Florida with those from India, whereas cultivars from the Caribbean and South America showed some differentiation. Mango cultivars from Southeast Asia were the most distinct, with little overlap between Southeast Asian cultivars and those from other regions. Cultivars from Africa and Mexico were the most widely distributed, providing further evidence of the high variation in individual genetic composition for these populations. Together, the results of clustering analyses indicate that Southeast Asian cultivars contain unique genetic diversity that is not well represented in cultivars from other regions of the world.

Genetic diversity and population differentiation
Measures of genetic diversity were calculated for the seven populations of mango cultivars (Table 1) from the dataset of 364 unlinked SNPs. In general, levels of diversity were similar across all populations, although the levels of diversity for the African population were consistently high compared to other populations, whereas diversity in the Floridian population was relatively low. Levels of H_O were highest for the Caribbean and South America (0.2156 and 0.2154, respectively) and lowest for Florida and India (0.1833 and 0.1815, respectively).
Africa had the highest levels of gene diversity (H_E = 0.2169), whereas Florida had the lowest (0.1711). Values for the inbreeding coefficient F_IS ranged from 0.0425 (Africa) to −0.0411 (Florida), indicating relatively low levels of inbreeding in mango cultivars. Values of allelic richness differed little between populations, with the highest levels found in the African and Caribbean populations (1.2106 and 1.2107, respectively) and the lowest found in the Floridian population (1.1690). We observed the highest nucleotide diversity in the African population (0.0671) and the lowest in the Floridian population (0.0293). Percentage polymorphism varied from 83.52% in the Southeast Asian population to 54.40% in the Mexican population. The number of private alleles was highest in the Indochinese population (74), for which we measured nearly five times as many private alleles as the next highest population, India, which had 15.
Many pairs of populations were significantly differentiated from one another by pairwise calculations of F_ST (Table 2). The Floridian, Indian and Southeast Asian populations were significantly different from all other populations. Additionally, the Caribbean and African populations were significantly different. The AMOVA found that a significant amount (7.6%) of the total variation was segregated between populations (F = 0.076, P = 0.001), with the majority of variation (91.8%) shared across individuals (Table 3).

Discussion
Here, we analyzed mango cultivars and closely related Mangifera species to describe phylogeographical patterns of diversity, explicitly test whether India represents a 'center of diversity' for mango, and quantify the genetic bottleneck that mango underwent as it was introduced into new regions of the world. Collectively, our results provide insight into global mango diversity as well as the process of domestication in one of the world's most important perennial fruit crops.

Geographical distribution of diversity
Patterns of genetic diversity in crops can tell us about their history of domestication. Our analysis of genetic structure within cultivated mango germplasm identified two primary groups, corresponding to Indian and Southeast Asian cultivars, with a third, less defined group representing Caribbean and South American cultivars. The differentiation between Indian and Southeast Asian cultivars supports previous genetic analysis of mango germplasm diversity (Schnell et al., 2006; Dillon et al., 2013) and traditional classification of mango cultivar types as Indian or Indochinese (Crane & Campbell, 1994). Furthermore, the differentiation of South American and Caribbean cultivars aligns with another recent analysis of mango germplasm diversity, which found differentiation between Asian and Western cultivars (Sherman et al., 2015).
In addition to the three groups of cultivars, our analyses of population structure, principal components and nucleotide diversity show that the African and Mexican cultivar populations have high levels of admixture and diversity. In support of historical documentation indicating that Mexico received germplasm directly from the Philippines (Mukherjee, 1949), two of the five Mexican cultivars cluster closely with the lone Philippine cultivar. Whereas the Philippines is considered part of Malesia, the group of Mexican and Philippine cultivars clusters with Indochinese rather than Malesian cultivars in the phylogeny. Notably, the African population has relatively high levels of diversity and includes individuals that cluster with Indian and Southeast Asian populations. The diversity of African populations may be an artefact of sampling cultivars that are modern introductions rather than historical cultivars, which are rare in germplasm collections. Additional effort should be made to examine the diversity of mango cultivars in Africa and identify traditional cultivars.

Centers of diversity and dispersal bottlenecks
Traditionally, crops are thought to have a center of diversity near where they were originally domesticated (Vavilov, 1987) and experience a loss of this baseline diversity as the result of introduction bottlenecks (Cooper et al., 2001; Van de Wouw et al., 2010). However, relatively few studies have sought to quantify the introduction bottlenecks experienced by perennial species during domestication or test for centers of origin for these species. Whereas most scholars believe that mango was domesticated in India, the existence of two morphologically distinct mango cultivar types has previously led some to suggest that Indochina played an important role in the origin and domestication of M. indica (Bompard, 2009; Iyer & Schnell, 2009). Analyzing cultivars from seven geographical regions, we find little evidence that mango has a center of diversity in India or that it experienced a secondary genetic bottleneck during its dispersal into new regions of the world. In fact, by most measures, the Indian population of mango cultivars has lower diversity than populations from other regions of the world (Table 1). Similarly, although we find that the Southeast Asian (Indochinese and Malesian) population contains unique genetic variation, including a large number of private alleles (Table 1; Fig. 5), it did not consistently have the highest measures of diversity. Rather than mango germplasm having a center of genetic diversity that aligns with a purported center of origin in India or Southeast Asia, many measures of genetic diversity are slightly elevated in regions where mango is introduced: Africa, South America, the Caribbean and Mexico.
In the early 1900s, mango cultivation and breeding programs intensified in the Americas, especially in South Florida, which went on to produce many of today's most commercially important cultivars. The novel characteristics of these cultivars and their success in the global market led South Florida to be dubbed a secondary center of domestication (Knight & Schnell, 1994), although previous molecular work has shown this to be unfounded (Schnell et al., 2006). Our results confirm that Florida is not a center of mango genetic diversity. In fact, across all measures of population structure and genetic diversity, we found Floridian mangoes to have relatively low diversity compared to other populations (Table 1; Figs 3, 4). Additionally, phylogenetic analysis (Fig. 2) indicates that many of the Floridian cultivars appear to be closely related to one another, including the three most commercially important Floridian cultivars in this study, 'Tommy Atkins', 'Kent' and 'Keitt'. This finding highlights an important concern in perennial crop cultivation: the loss of diversity at the population level, rather than the individual level. Although most perennial species have high within-individual heterozygosity, they are often clonally propagated and therefore commercial orchards have virtually no population-level diversity, putting them at risk for disease outbreaks (Gross, 2012). The lack of diversity in commercial orchards is exacerbated when the most important commercial cultivars come from a narrow genetic base, as is the case for the three Floridian cultivars.

Insight into mango domestication history
Collectively, our results suggest that the history of domestication in mango has been more complex than assumed previously, and may follow one or two other trends seen in perennial crops: multiple domestications and interspecific hybridization with congeneric species (Miller & Gross, 2011; Warschefsky et al., 2014). Both of these phenomena are common in the course of perennial fruit crop domestication, a process that likely occurs on a broader geographical scale and over a longer period of time than it does in annual species (Miller & Gross, 2011). As reviewed by Miller & Gross (2011), perennial fruit crops that are known to have multiple origins include breadfruit (Artocarpus altilis), pecan (Carya illinoinensis), hazelnut (Corylus avellana), coconut (Cocos nucifera), olive (Olea europaea), apricot (Prunus armeniaca), peach (Prunus persica), pear (Pyrus communis), red raspberry (Rubus idaeus), blackberry (Rubus spp.) and jocote (Spondias purpurea). The list of perennial fruit crops that are the result of hybridization events between congeneric species (reviewed in Miller & Gross, 2011) is much longer, but includes sweet orange (Citrus sinensis), fig (Ficus carica), walnut (Juglans regia), avocado (Persea americana) and grape (Vitis vinifera).
In the case of mango, we find evidence supporting two cultivated gene pools that combine to create regions of elevated diversity outside the center(s) of origin. Furthermore, our results indicate that some of the genetic diversity present in modern-day mangoes may not have originated in India: we find clear evidence from indices of genetic diversity (percentage polymorphic, number of private alleles), phylogenetic analysis and two clustering methods (principal components analysis, STRUCTURE), that Southeast Asian cultivars contain unique genetic diversity compared to other populations of mango cultivars. Although the phylogenetic relationships within the M. indica clade are not well supported, the maximum-likelihood topology suggests that Southeast Asian mango cultivars diverged earlier than M. indica cultivars from other parts of the world. Bompard (2009) previously proposed that, despite archaeological and linguistic evidence, M. indica might have been domesticated independently in India and Indochina. Another possibility is that mango was initially cultivated in Southeast Asia and later improved and further domesticated in India. Still, the high number of congeneric species endemic to Indochina and Malesia and previous evidence of interspecific hybrids in Mangifera (Kostermans & Bompard, 1993) suggest that the novel diversity seen in Indochinese cultivars could be the result of genetic introgression. However, given that Caribbean and South American populations of mango exhibit some differentiation from Indian populations, it remains possible that the divergence seen between Indian and Southeast Asian mango cultivars is the result of selection for environmental or cultural and culinary purposes. In Southeast Asia, for example, mango cultivars are commonly consumed in savory dishes at the immature, 'green' stage, and there is undoubtedly some selection for cultivars that are best when eaten at this early stage. 
Teasing apart the seemingly complex history of domestication in mango requires more thorough sampling of wild M. indica, Indian, Indochinese, and Malesian mango cultivars and landraces, along with additional samples from closely related Mangifera species in India and Indochina, many of which were not included in the present study.

Remaining gaps and future goals
We observed neither a center of diversity in India or Florida nor a loss of diversity associated with mango's dispersal into Africa and the Americas, yet this line of inquiry deserves additional attention. Given that population structure has been observed within Indian mango germplasm (Ravishankar et al., 2000, 2015; Kumar et al., 2001; Karihaloo et al., 2003; Damodaran et al., 2012; Vasugi et al., 2012; Surapaneni et al., 2013; Singh, 2016), we made an effort to include a diverse subset of Indian cultivars in our analysis; however, it is possible that the individuals included here do not fully encompass the diversity present in India. Additionally, sampling from within Africa was restricted because of the limited number of African cultivar accessions in the FTBG genebank. Future efforts should be made to address the lack of African germplasm in US collections and refine our understanding of the phylogeography of mango in Africa, particularly given the diversity which we observed in African germplasm. Simulation studies have shown that metrics of diversity calculated from RADseq datasets may be inflated because of allele dropout and large amounts of missing data (Gautier et al., 2012; Arnold et al., 2013); therefore, we restricted the amount of missing data in our dataset. Contrary to these expectations, our estimates of gene diversity in mango were lower than those from the only other comparable report. Sherman et al. (2015) estimated gene diversity from transcriptome-derived single nucleotide polymorphism (SNP) markers in mango to have a median value of 0.28–0.43, roughly 1.5- to 2-fold higher than the average values calculated here. The explanation for this discrepancy is not immediately clear; however, more recent empirical work indicates that missing data may not inflate diversity indices in empirical datasets as much as was proposed initially (Hodel et al., 2017).
One possibility for the observed differences in gene diversity between studies is that low sequence coverage and low tolerance for missing data at the interspecific level in the present study produced a dataset of highly conserved genomic regions, which are inherently less diverse (Huang & Knowles, 2016). As we progress toward a high-quality sequence of the mango genome (Singh, 2016; D. Kuhn, pers. comm.), better estimations of genome-wide heterozygosity in mango will be possible.
Here, we tested whether mango incurred a dispersal bottleneck by comparing cultivars from different regions of the world. However, the question of whether mango underwent a primary loss of diversity during the initial phases of domestication cannot be answered without including samples from mango's wild progenitors, although future analysis using coalescent simulations of demography may help shed light on this issue. For a number of reasons, it may be difficult to locate and identify mango's wild progenitor populations. As a result of intensifying land use in the native range of M. indica, it is possible that many populations of wild M. indica have been extirpated. Additionally, whether the individuals in this region truly represent wild M. indica or whether they are naturalized offspring of previously cultivated individuals may be difficult to determine. Naturalized mango trees are frequently observed in the Neotropics, and, to the casual observer, appear to be wild (Bompard, 2009). Further complicating this problem is the fact that many closely related Mangifera species bear remarkable resemblance to cultivated mango, and common names of these species are often translated to "wild mango" (Kostermans & Bompard, 1993; E. Warschefsky, pers. obs.). The identification and in situ and ex situ conservation of wild populations of M. indica and its closest relatives is of critical importance to understanding the history and improving the future of 'The King of Fruits'.

Supporting Information
Additional Supporting Information may be found online in the Supporting Information section at the end of the article.

Table S1
Metadata for samples analyzed.

Table S2
Average STRUCTURE group assignment for mango cultivars from geographical regions for three values of K.

Table S3
Average STRUCTURE group assignment for mango cultivars from geographical regions and Mangifera species for three values of K.