Revisiting the Iberian honey bee (Apis mellifera iberiensis) contact zone: maternal and genome‐wide nuclear variations provide support for secondary contact from historical refugia

Dissecting diversity patterns of organisms endemic to Iberia has been truly challenging for a variety of taxa, and the Iberian honey bee is no exception. Surveys of genetic variation in the Iberian honey bee are among the most extensive for any honey bee subspecies. From these, differential and complex patterns of diversity have emerged, which have yet to be fully resolved. Here, we used a genome‐wide data set of 309 neutrally tested single nucleotide polymorphisms (SNPs), scattered across the 16 honey bee chromosomes, which were genotyped in 711 haploid males. These SNPs were analysed along with an intergenic locus of the mtDNA, to reveal historical patterns of population structure across the entire range of the Iberian honey bee. Overall, patterns of population structure inferred from nuclear loci by multiple clustering approaches and geographic cline analysis were consistent with two major clusters forming a well‐defined cline that bisects Iberia along a northeastern–southwestern axis, a pattern that remarkably parallels that of the mtDNA. While a mechanism of primary intergradation or isolation by distance could explain the observed clinal variation, our results are more consistent with an alternative model of secondary contact between divergent populations previously isolated in glacial refugia, as proposed for a growing list of other Iberian taxa. Despite current intense honey bee management, human‐mediated processes have seemingly played a minor role in shaping Iberian honey bee genetic structure. This study highlights the complexity of the Iberian honey bee patterns and reinforces the importance of Iberia as a reservoir of Apis mellifera diversity.


Introduction
Clinal patterns in gene frequencies can be generated by random genetic drift under an isolation-by-distance scenario. Alternatively, clinal variation may be shaped by selection acting within a continuous population (primary intergradation) or, more frequently, may originate from contact between populations that diverged in allopatry (secondary contact). Distinguishing primary intergradation from secondary contact can, however, be a difficult undertaking because both processes may generate similar patterns of genetic variation (Endler 1977; Barton & Hewitt 1985. Population genomics provides a suitable framework in which to more effectively unravel such levels of complexity. In population genomics, outlier tests are applied to genome-wide sampling of multiple populations to dissect out adaptive variation, leaving a background of neutral and near-neutral variation (Luikart et al. 2003). Cline analysis can then help reveal whether dissected patterns of variation originated from secondary contact or primary intergradation. If multiple coincident clines are identified (Endler 1977;Barton & Hewitt 1985) and these clines reflect changes in neutral loci, there is strong support for recent secondary contact (Durrett et al. 2000). Unless many independent loci respond similarly to a single environmental gradient or mosaic, clinal patterns of neutral variation and multiple coincident clines are not expected when primary intergradation is the leading process shaping variation (Durrett et al. 2000).
The Iberian Peninsula provides one of the most interesting settings in Europe for studying contact zones. High geological, physiographical and climatic complexity and diversity, together with isolation from Europe and proximity to Africa (especially at the Strait of Gibraltar), made this southernmost European region an important refuge during the Quaternary glaciations (reviewed by Hewitt 2000) and a bridge, for the more vagile organisms, between the two continents (Carranza et al. 2004;Cosson et al. 2005;Guillaumet et al. 2006;Whitfield et al. 2006;Wallberg et al. 2014). These features made Iberia not only a place of divergence during periods of isolation but also a contact zone during periods of expansion as reported for a wide array of plant and animal taxa (extensively reviewed by , including the focal organism of this study: the Iberian honey bee, Apis mellifera iberiensis. Disentangling diversity patterns in populations that have possibly experienced recurrent cycles of contraction, expansion, admixture, and adaptation, typical of long-term glacial refugia, is a challenging endeavour. Contemporary human-mediated processes, which in the case of the honey bee may involve movement of colonies within (transhumance) and between lineages (introduction of commercial queens), selective breeding, and accidental introductions of exotic pests and diseases, may further complicate this effort by erasing or obscuring the genetic signatures imprinted by evolutionary and demographic processes. Fortunately, however, Iberia is the best-studied refugial area in Europe, and common patterns are emerging from comparative phylogeography (G omez & Lunt 2007) that are of great assistance in elucidating patterns exhibited by the Iberian honey bee. Additionally, the honey bee genome has been sequenced and a SNP panel is available for conducting genome-wide sampling of multiple populations (Whitfield et al. 2006;Ch avez-Galarza et al. 2013).
Differential and complex diversity patterns emerged from the numerous biparental and maternal surveys of Iberian honey bees and the underlying processes shaping genetic variation remain controversial. Arguments based on selection, demography, and contemporary human-mediated processes have been favoured by different authors. Morphology (Ruttner et al. 1978;Cornuet & Fresnaye 1989) and the allozyme malate dehydrogenase (Nielsen et al. 1994;Smith & Glenn 1995;Arias et al. 2006) exhibited a smooth latitudinal cline extending from North Africa to France supporting a hypothesis of primary intergradation (Ruttner et al. 1978). In contrast, the abrupt transition from highly divergent M mitotypes in the northeastern half of Iberia to A mitotypes in the southwestern half (Garnery et al. 1995;Franck et al. 1998;Arias et al. 2006;Miguel et al. 2007;C anovas et al. 2008) was more compatible with secondary contact (Smith et al. 1991). To complicate matters further, microsatellites did not capture the signal of a contact zone in Iberia (Franck et al. 1998;Garnery et al. 1998b;C anovas et al. 2011;Miguel et al. 2011), but detected instead a sharp disruption between Iberian and northern African populations (Franck et al. 1998). This latter finding prompted a third hypothesis that explained occurrence of A mitotypes in Iberia by human-assisted introductions of African colonies during Muslim occupation, with selection acting to maintain the morphological, allozymic and maternal clines (Franck et al. 1998). The hypothesis of selection as the driving force shaping the Iberian cline was recently addressed in a genome-wide SNP scan conducted in a fine-scale sample that covered the entire Iberian honey bee range (Ch avez-Galarza et al. 2013). This study detected signatures of selection in the Iberian honey bee genome, suggesting that this evolutionary force has had an important role in structuring Iberian honey bee diversity.
Here, we built from those findings to provide, at both geographic and genomic levels, the most comprehensive characterization of the Iberian honey bee diversity patterns performed until now. We employed multiple clustering approaches and cline analysis to examine the genome-wide SNP data set using a population-genomics framework. After analysing the patterns of variation generated by the complete SNP data set, we removed any SNPs putatively associated with selection identified by Ch avez- Galarza et al. (2013) and then used concurrently a mtDNA locus and 309 remaining neutrally tested SNPs to address the following questions: (i) How effective are SNPs in capturing clinal variation?a signal that microsatellites have failed to detect, (ii) How concordant are the patterns generated by the complete and the neutral SNP data sets? (iii) Do neutral SNPs capture the clinal signal? (iv) Does the mtDNA marker confirm the presence of a cline formed by two highly divergent lineages, as documented by earlier studies? and How concordant are the patterns of neutral and maternal variation? If variation originated from secondary contact, then we expect neutral SNPs to detect clinal patterns and multiple coincident clines. In contrast, coincident neutral clines are not expected if variation originated via primary intergradation. Further evidence for secondary contact will come from comparisons of mtDNA and nuclear DNA. If a maternal cline formed by two highly divergent lineages is observed and this cline is paralleled by nuclear DNA variation, then there is strong support for secondary contact. However, given that Iberian honey bees are managed organisms, it is possible that human-mediated processes have obscured historical patterns. Therefore, we also asked the question: (v) To what extent do contemporary human-mediated forces influence the Iberian honey bee structure?

Sampling
Sampling in Iberia was conducted in 2010 across three north-south transects (Fig. 1). One transect extended along the Atlantic coast [Atlantic transect (AT)], one through the centre [central transect (CT)], and another along the Mediterranean coast [Mediterranean transect (MT)]. A total of 711 honey bee haploid males were collected in the three transects from 23 sites (AT=8, CT=9; MT=6) representing 237 apiaries and 711 colonies. Samples were stored in absolute ethanol at -20°C until molecular analysis. Global positioning system (GPS) coordinates were recorded in the field for each apiary. Further details on sampling procedure can be found in Ch avez- Galarza et al. (2013).

DNA extraction and marker genotyping
Using a phenol-chloroform-isoamyl alcohol (25:24:1) protocol (Sambrook et al. 1989), total DNA was extracted from the thorax of the 711 individuals, each representing a single colony. Analysis of mtDNA was based on the very popular tRNA leu -cox2 intergenic locus. This region was amplified using the primers and PCR conditions detailed elsewhere (Garnery et al. 1993). After PCR amplification, the products were sequenced in both directions. Analysis of nuclear DNA was based on SNPs, which were genotyped using Illumina's Bead-Array Technology and the Illumina GoldenGate â Assay with a custom Oligo Pool Assay (Illumina, San Diego, CA, USA) following manufacturer's protocols. Genotype calling was performed using ILLUMINA'S GENOMESTUDIO â data analysis software.
Of the 1536 SNPs included in the GoldenGate array, 383 passed the quality filtering and were polymorphic in A. m. iberiensis, using a cut-off criterion of > 0.98 for the most common allele. The 383 SNPs (referred hereafter as the complete SNP data set) were scanned for signatures of selection using four multiple-population F ST -based methods and the spatial method matSAM (Ch avez-Galarza et al. 2013). The 74 outlier loci detected by at least one of the five methods were removed from the complete data set, leaving 309 neutrally tested SNP loci (referred hereafter as the neutral SNP data set). Chromosomal locations in honey bee linkage groups and a summary of physical distances of the SNPs are shown in Table S1 (Supporting information). Details of SNP genotyping and detection of outlier loci can be found in Ch avez- Galarza et al. (2013).

Mitochondrial DNA analysis
The sequences produced for the tRNA leu -cox2 intergenic locus were aligned using MEGA version 5.03 (Tamura et al. 2011). For the identification of mitotypes, the sequences were compared with those available in GenBank. Each mitotype was then assigned to western European (M), eastern European (C), and African (A) lineages or to an African sublineage (A I , A II , A III ), according to the complex architecture of the tRNA leu -cox2 intergenic region described elsewhere (Garnery et al. 1993;Franck et al. 1998Franck et al. , 2001Rortais et al. 2011;Pinto et al. 2012). In short, the intergenic region is composed of two elements: the P (size varies between~53 and 68 bp) and the Q (size varies between~194 and 196 bp). A combination of point mutations and indels in the P element distinguishes honey bee subspecies from different lineages. The number of Q elements can vary between one and four, although the number of repeats is not lineage specific. Given the highly variable size of the sequences resulting predominantly from the variable number of Q elements, the sequences were trimmed at the end of the first Q with additional Qs coded as present/absent. Therefore, the~627-bp sequence fragment analysed herein encompassed the 5 0 end of the tRNA leu gene, the P element, the first Q element, the coding relative to the other Q elements, and the 3 0 end of the cox2 gene. Relationships among the sequences were inferred using the median-joining network algorithm (Bandelt et al. 1999), as implemented in the program NETWORK version 4.6.1.1 (Fluxus Engineering, Clare, UK; http://www.fluxusengineering.com). The trimmed~627-bp sequences examined in this study did not diverge from those downloaded from GenBank. Therefore, GenBank sequences were not included in the network analysis.

Estimation of structure by nonspatial approaches
Structure was inferred from the complete and the neutral SNP data sets using two approaches: the Bayesian model-based STRUCTURE and the model-free discriminant analysis of principal components (DAPC). These approaches were used to estimate the proportion of an individual's genome (Q) that originated from a given genetic group or cluster.
The Bayesian clustering approach was implemented in STRUCTURE 3.4 (Pritchard et al. 2000) for haploid data using the admixture ancestry and correlated allele frequency models run with the unsupervised option. The program was set up for 750 000 Markov chain   Monte Carlo (MCMC) iterations after an initial burn-in of 250 000, which was sufficient to reach convergence. Over 20 independent runs for each number of clusters (K), from 1 to 7, were performed to confirm consistency across runs. The Greedy algorithm, implemented in the software CLUMPP 1.1.2 (Jakobsson & Rosenberg 2007), was used to compute the pairwise 'symmetric similarity coefficient' between pairs of runs and to align the 20 runs for each K. The means of the permuted results were plotted using the software DISTRUCT 1.1 (Rosenberg 2004). The optimal K value was determined using Evanno's DK (Evanno et al. 2005) and Campana's DF st methods (Campana et al. 2011) in STRUCTURE HARVESTER web v0.6.93 (Earl & Von Holdt 2012) and CORRSIEVE 1.6-8 package (Campana et al. 2011), respectively.
The DAPC clustering approach was implemented in ADEGENET 1.3-9 package for R (Jombart 2008). Simulation studies have shown that DAPC performs as well or better than STRUCTURE, particularly under more complex structuring scenarios (Jombart et al. 2010;Klaassen et al. 2012). DAPC provides a description of the genetic clustering using coefficients of the alleles (loadings) in linear combinations and seeks to maximize betweengroups variance and minimize within-group variance in these loadings (Jombart et al. 2010). Successive K-means clustering runs (from 1 to 40) were also incorporated in the analysis to estimate the optimal number of distinct clusters (K) based on the Bayesian information criterion (BIC). The optimal K value is associated with the lowest BIC value (Jombart et al. 2010).

Estimation of structure by spatial approaches
Spatial structure was inferred from the complete and the neutral SNP data sets using two approaches that explicitly incorporate information on geographic coordinates for genotyped individuals: the model-free multivariate spatial principal component analysis (sPCA) and the Bayesian model-based TESS. The sPCA is a modification of PCA which takes into account not only the genetic variance of individuals or populations but also their spatial autocorrelation (measured by Moran's I). This approach disentangles global structures (clines, patches or intermediates) from local structures (strong genetic differences among neighbours), and from random noise (random distribution of allelic frequencies among individuals or populations on a connection network). While global structures display positive spatial autocorrelation (high positive eigenvalue), local structures display negative spatial autocorrelation (high negative eigenvalue) ).
The sPCA was performed in ADEGENET using the K-nearest neighbours to model the spatial connectivity among individuals. To test for global and local structures, a Monte Carlo test was implemented using 10 000 permutations. The Bayesian model-based clustering approach implemented by the software TESS (Chen et al. 2007) incorporates spatial population models assuming geographic continuity of allele frequencies by including the interaction parameter Ψ, which defines the intensity of two neighbouring individuals belonging to the same genetic cluster. The incorporation of trend surface and a Gaussian autoregressive residual term allows for capturing global and local patterns. The software TESS 2.3.1 was run for haploid data using the convolution admixture model (BYM), correlated allele frequency and a trend degree surface of 1. A Euclidean distance matrix was used to weight the spatial connectivity among individuals. Five runs were carried out at each K, from 2 to 7, with 5 000 000 MCMC total sweeps including a burn-in of 1 250 000 sweeps. For each run, the deviance information criterion (DIC) was calculated and the values of all runs were averaged and plotted against K. The first lower DIC value represents the optimal K for the data. As in STRUCTURE, the software programs CLUMPP and DISTRUCT were used to obtain the average matrix of membership proportions (Q) for each K and for graphical representation.

Geographic cline analysis
Geographic clines were estimated for mtDNA frequency, individual SNPs frequency, and mean Q per sampling site inferred from both the complete and the neutral SNP data sets by STRUCTURE at K = 2. Cline analysis of individual SNPs was performed on a subset of loci with an absolute allele frequency difference > 0.2. The rationale behind this choice was that the SNP panel used in this study was ascertained from the reference genome of A. mellifera (sequenced from the North American DH4 strain, which was primarily A. m. ligustica, a subspecies belonging to the C lineage) and genome sequence traces of Africanized honey bees (largely the African A. m. scutellata admixed with the genomes of A. m. ligustica and the M-lineage A. m. mellifera), and is thus underrepresented for diagnostic SNPs and SNPs with large allele frequency differences between the two maternal lineages identified in Iberian honey bees. Using a subset of loci with larger allele frequency difference between the groups, we expected to increase the ancestry information of individual loci for cline analysis. Of the 383 SNPs, 33 (17 neutral and 16 under selection, as identified by Ch avez-Galarza et al. 2013) conformed to the frequency criterion.
Sampling sites were arranged along a transect beginning at the westernmost location (Lisbon, Portugal) and ending at the easternmost location (Girona, Spain) ( Fig. 1). Each sampling site was assigned a distance along this transect, which corresponded to the shortest straight-line distance between it and Lisbon, calculated using the 'harversine' approach (www.movable-type. co.uk/scripts/latlong.html). The cline shape was modelled using the package HZAR v2.5 (Derryberry et al. 2014), which fits allele frequency data to equilibrium geographic cline models (Szymura & Barton 1986Barton & Gale 1993;Gay et al. 2008) using the Metropolis-Hastings Markov chain Monte Carlo algorithm. The following cline shape parameters were estimated: centre (c, distance from sampling location), width (w, 1/maximum slope), delta (d, distance between the centre and the tail) and tau (s, slope of the tail). The allele frequencies at the top and bottom of the cline (Pmin and Pmax) were either fixed or free to vary. Three sets of five cline models were fitted: model set 1 had no scaling (Pmin = 0, Pmax = 1), model set 2 had fixed scaling (Pmin = observed minimum, Pmax = observed maximum), and model set 3 allowed Pmin and Pmax to vary. Within each model set, scaling and tails were fixed or free to vary. These models were compared to a null model of no clinal transition using the Akaike Information Criterion corrected (AICc). The best-fitting model had the lowest AICc value. To evaluate coincidence among cline centre positions and concordance among cline widths, the composite likelihood method (Phillips et al. 2004) was used. Likelihood profiles were constructed for both c and w to compare alternative hypotheses across loci: H1, all loci are characterized by statistically indistinguishable c and w values and are likely to share a common c/w; and H2, each locus has its own independent c and w values. Composite log-likelihood profiles were constructed by summing log-likelihood (ML) profiles for all individual SNP loci ML(H1). This composite log-likelihood profile was compared to the sum of all maximum-likelihood estimates for individual SNP loci ML(H2) using a likelihood ratio test (LRT). If the clines of individual SNP loci coincide and have the same c/w values, ML(H1) is not significantly different from ML (H2) (ML = ML(H2) -ML(H1) % 0). Conversely, if the clines do not coincide, ML(H1) is significantly smaller than ML(H2) (ML > 0). The significance of any difference of ML(H1) and ML(H2) was determined using a chi-square test with nÀ1 degrees of freedom (a = 0.05). This approach was similarly employed to evaluate coincidence and concordance of mtDNA and both SNP data sets.

Linkage disequilibrium and genetic diversity
Linkage disequilibrium (LD) between all pairs of neutral SNPs was estimated using the statistic r 2 (Hill & Robertson 1968), as implemented by the software DNASP 5.10.1 (Librado & Rozas 2009). Significant LD was identified at the 5% level using Fisher's exact test. Unbiased haploid genetic diversity (uh) for the neutral SNPs was calculated using the program GENALEX 6.5 (Peakall & Smouse 2012). Values of LD and uh were calculated for each sampling site and then projected along the Lisbon-Girona transect (Fig. 1).

Statistical tests
Differences in individuals' Qs between clustering approaches were assessed using the Mann-Whitney-Wilcoxon test (Wilcoxon 1945;Mann & Whitney 1947). Genetic structure inferred by the different clustering approaches was compared using Pearson's correlation coefficient (r). Whenever applicable, statistical significance levels were adjusted for multiple comparisons using the Bonferroni procedure to correct for type I error (Weir 1996). These analyses were implemented in R (R Development Core Team 2013).

Structure estimated by nonspatial approaches
Genetic structure inferred from the 711 Iberian honey bee individuals and the 309 neutral SNP data set using the Bayesian model-based clustering algorithm, implemented in STRUCTURE, and the model-free DAPC clustering algorithm, implemented in ADEGENET, is shown in Fig. 2a (for K = 2) and Fig. S1 (Supporting information) (K = 3 to 5). The optimal number of clusters (K) varied between two, when estimated by DF st and BIC, and four, when estimated by DK (Fig. S2, Supporting information). Incongruent optimal K values are often obtained by different methods (Campana et al. 2011), especially in the presence of low levels of population differentiation (Waples & Gaggiotti 2006), which is the case of the Iberian honey bee (global Φ PT = 0.020; pairwise Φ PT values ranged from 0.000 to 0.046, but see Table S2 and Fig. S3, Supporting information for pairwise comparisons across the study area). Given that two of three measures agreed on an optimal K = 2 and the presence of two maternal lineages in Iberia (this and previous studies), it is likely that the number of clusters that best represents the maximum population structure is two.
At the optimal K = 2 (Fig. 2a, Fig. S4, Supporting information), a concordant geographic pattern was produced by DAPC and STRUCTURE (r = 0.79, P-value = 0.0000 for individual Q values), although a deeper subdivision was inferred by the former than by the latter clustering approach, as measured by Q (P-value = 0.0000 for comparisons of individual Q values, Mann-Whitney-Wilcoxon test). Membership proportions estimated by STRUCTURE showed that most individuals (ranging from 53.3% in CT2 to 88.9% in MT1; see the bar below each clustering plot in Fig. 2a) from sampling sites near the Pyrenees were assigned with high posterior probability (Q ≥ 0.80) to the blue cluster. The percentages increased considerably when Q was inferred by DAPC, ranging from 50.0% in CT3 to 93.3% in MT2. Individuals with Q ≥ 0.80 in the red cluster were common in the southern sampling sites of the Atlantic transect (ranging from 60.6% in AT5 to 63.3% in AT7) and rare in the Mediterranean transect (ranging from 0% in MT1-2 to 13.3% in MT6). However, again, the percentages increased considerably when Q was inferred by DAPC. Individuals exhibiting admixed proportions (Q to any cluster ≤ 0.80) prevailed in the northern part of the Atlantic transect and in the southern part of the central and Mediterranean transects when inferred by STRUCTURE, although they were rare when inferred by DAPC (Fig. 2a).
When genetic structure was inferred from the complete SNP data set (Fig. 2b), a deeper phylogeographical signal was captured by both clustering approaches, as measured by Q (P-value = 0.0000 for comparisons of individual Q values inferred from the complete and neutral data sets with STRUCTURE and DAPC; Mann-Whitney-Wilcoxon test). Nonetheless, inclusion of the 74 putatively selected SNPs in the data set did not qualitatively change the overall geographic patterns of hybridization across Iberia, while providing additional support for an optimal K = 2, this time simultaneously estimated by the three methods DK, DF ST and BIC (Fig. S2, Supporting information). The mean Q estimated with STRUCTURE from both the neutral and complete SNP data sets is shown at the sampling site level in Fig. 2c (see Fig. S5, Supporting information for the corresponding DAPC plot). While further confirming the nearly concordant patterns across Iberia, this representation revealed a more abrupt transition from the blue to the red cluster in the central and Mediterranean transects than in the Atlantic transect, which suggests a contact zone located towards the northeastern part of Iberia.

Comparing maternal pattern with neutral structure
A median-joining network of a~627-bp fragment of the tRNA leu -cox2 intergenic mitochondrial region confirms the presence of the two highly divergent African (A) and western European (M) lineages in Iberia (Fig. 3a). The two lineages show a highly structured geographic pattern of distribution in Iberia. Mitotypes belonging to the M lineage were predominant in the northeastern half, whereas mitotypes belonging to the A lineage were fixed or almost fixed in the southwestern half of Iberia. While some sampling sites displayed a mixture of M and A mitotypes, the geographic distribution of the maternal lineages reveals a sharp northeasternsouthwestern trend.
Membership proportions inferred by STRUCTURE (Fig. 3b, c) and DAPC (Fig. S6, Supporting information) from neutral SNPs were contrasted with mtDNA data, at both sampling site and individual levels. At the sampling site level, the partitioning of neutral SNP variation into two clusters corresponded remarkably to M and A maternal lineages (r = 0.81 for both DAPC and STRUCTURE vs mtDNA; P-value < 0.0000). At the individual level, the correlations were weaker (r = 0.46 for DAPC, r = 0.60 for STRUCTURE), yet significant (P-value < 0.0000), suggesting differential gene flow among genomic compartments. Genome partitioning of individuals at greater values of K produced increasingly complex patterns (Fig. 3c and Fig. S1, Supporting information). At K = 4, a pronounced east-west structuring of neutral variation was revealed. The Atlantic populations were clearly distinct from the other populations, and a north-south trend becomes more apparent in this transect. The nuclear pattern is consistent with maternal variation partitioned into African sublineages (Fig. 3c). Sublineage A III mitotypes were common in northern Atlantic populations and were gradually replaced by sublineage A I mitotypes towards the south. In contrast to Atlantic populations, sublineage A III mitotypes were virtually absent in populations of central and Mediterranean transects, which were dominated by sublineage A I mitotypes.

Structure estimated by spatial approaches
A number of studies have questioned the use of STRUC-TURE for studying populations exhibiting continuous spatial distribution of genetic diversity (Serre & P€ a€ abo 2004;Rosenberg et al. 2005). To address this issue, patterns of variation in Iberia were further investigated using spatially explicit approaches implemented by sPCA (Fig. 4) and TESS (Fig. 5).
Analysis of neutral SNPs using sPCA showed that one global axis and one local axis were retained, indicating the existence of both global and local spatial structures in Iberia. The interpolation of the first global score, which was associated with a strong autocorrelation (Moran's I = 0.639), detected two clusters forming a cline (Fig. 4a) concordant with nonspatial approaches. The second global score (Moran's I = 0.560) clearly differentiated the four northernmost sampling sites of the Atlantic transect and the southern half of central and Mediterranean transects (Fig. 4b). The third global score (Moran's I = 0.443) further partitioned the Atlantic populations into two groups (north and south) and the southern half of central transect (Fig. 4c). The northern half of the central transect was differentiated by the fourth global score producing a Moran's I = 0.392 (Fig. 4d).
While the global test corroborated the presence of global spatial structure (max(t) = 0.0017; P-value = 0.0001), there was also structure at the local level (max (t) = 0.0019; P-value = 0.0001). The first local score (Moran's I = -0.075) highlighted the differences among individuals of northern Atlantic and central transects (Fig. 4e), while the second local score (Moran's I = À0.071) differentiated the individuals from sampling sites in the middle part of the central transect (Fig. 4f).
The additional spatial approach performed using TESS further confirmed the neutral patterns obtained previously (Fig. 5). Two major clusters that largely overlapped those of nonspatial approaches (r = 0.76 for STRUCTURE vs TESS and r = 0.64 for DAPC vs TESS using individual Q values, P-value < 0.0000) were identified by TESS at each simulated K (Fig. S7, Supporting information). At the optimal K = 3 (Fig. S8, Supporting information), one additional minor cluster (mean Q = 0.023 in the white cluster) further partitioned the nuclear genomes of individuals mainly from the southern portion of the central transect (Fig. 5a). While TESS supported the major northeastern-southwestern cline and the contrasting patterns exhibited by Atlantic and Mediterranean populations, it did not capture the partitioning within the Atlantic transect, which was detected by the other clustering approaches and by mtDNA analysis.
The spatial patterns inferred from the complete SNP data set using both sPCA and TESS were largely concordant with those inferred from the neutral SNP data set, although, as observed with the nonspatial clustering approaches, a deeper phylogeographical signal was captured by the complete SNP data set (Figs S9 and S10, Supporting information).

Geographic cline analysis
The geographic clines were modelled for 33 (17 neutral and 16 selected) individual SNPs, mean Q obtained with the complete and neutral SNP data sets, and mtDNA (Fig. 6, Fig. S11, Supporting information). There was considerable variation in the identity of the best-fitting model among individual SNPs, SNP data sets and mtDNA (Table S3, Supporting information). The model 'Pmin/Pmax observedno tails' was fitted to 16 of the 33 SNPs and to the neutral SNP data set, whereas 'Pmin/Pmax fixedright tail' and 'Pmin/ Pmax fixedno tails' were fitted to the mtDNA and the complete SNP data set, respectively. The AICc values obtained for the null model of no clinal variation were higher for mtDNA (cline model = 255.1, null model = 516.3), complete SNP data set (cline model = 25.2, null model = 203.5) and neutral SNP data set (cline model = 33.2, null model = 137.2) than for any individual SNP (Table S3, Supporting information).
Estimates of cline centre positions and widths for the 33 individual SNPs, complete SNP data set, neutral SNP data set and mtDNA were highly variable (Table  S3, Supporting information). Coincidence analysis of the 33 SNPs revealed that 18 (Table S3, Supporting information), of which nine were neutral, could be constrained to share a common centre (LRT same-diff. = 26.88, 17 d.f., P-value > 0.05) at the consensus position of 665.2 km, as estimated by the likelihood profiles. The consensus centre of the 18 SNPs was coincident with those estimated for mtDNA (706.7 km) and for the complete (714.7 km) and neutral (725.7 km) SNP data sets (LRT same-diff. = 0-3.03, 1 d.f., P-value > 0.05 for all pairwise comparisons). Concordance analysis of the 33 SNPs revealed that 23 (Table S3, Supporting information), of which 11 were neutral, exhibited a similar width (LRT same-diff. = 1.60, 22 d.f., P-value > 0.05) of 1350 km. The consensus width of the 23 SNPs was concordant (LRT same-diff. = 0.3 -3.18, 1 d.f., P-value > 0.05 for all pairwise comparisons) with those estimated for the complete (1283.5 km) and neutral SNP data sets (1047.6 km), but not with that estimated for the mtDNA (580.9 km), which was significantly narrower (LRT same-diff. = 10.10-26.56, 1 d.f., P-value < 0.05 for all pairwise comparisons).

Linkage disequilibrium and genetic diversity
Genome-wide analysis of linkage disequilibrium (LD) between all possible pairs of neutral SNPs in each sampling site produced low levels of LD with mean r 2 values varying between 0.014 and 0.045 (Table S4, Supporting information). From a total of 701,362 pairwise comparisons, 14 687 pairs (2.09%), ranging from 1.49% to 3.41%, exhibited significant LD before Bonferroni correction (a single pair in CT4 remained significant after Bonferroni correction). Pairwise comparisons performed by linkage group also produced low LD values (data not shown). Levels of unbiased haploid diversity (uh) were low, ranging from 0.281 in the Atlantic transect (AT8) to 0.313 in the Mediterranean transect (M6 , Table S4, Supporting information). Interestingly, despite the low levels of LD and uh across the study area, a trend of elevated values was observed towards the centre of the cline and overlapping the consensus centre location (Fig. 7).

Discussion
Genetic studies of Iberian honey bees have revealed complex and often incongruent patterns of variation, which have led to competing hypothesis of primary intergradation (Ruttner et al. 1978) and secondary contact (Smith et al. 1991) as the leading mechanisms shaping patterns of variation. We examined a large number of individuals with a maternal locus and genome-wide SNPs and provided the most comprehensive portrait of clinal change across the entire Iberian honey bee distributional range. Our results support a signature of origin via secondary contact, which was still detectable despite intense beekeeping practices involving selective breeding and large-scale movement of colonies.

Nuclear and maternal patterns and cline origin
The multiple clustering approaches and the geographic cline analysis implemented on genome-wide SNPs collectively revealed a well-defined clinal pattern bisecting Iberia along a northeastern-southwestern axis, contrasting with the lack of microsatellite structure documented earlier (Franck et al. 1998;C anovas et al. 2011;Miguel et al. 2011).
The most commonly suggested mechanisms underlying clinal patterns in gene frequencies are random genetic drift with isolation-by-distance effects, selection across an environmental gradient (primary intergradation), and secondary contact between previously isolated and genetically divergent populations. The SNP patterns exhibited by the Iberian honey bees could be explained by any of these mechanisms. However, several aspects of our data are more consistent with an alternative model of secondary contact and introgression between divergent populations previously isolated in glacial refugia, as proposed for a growing list of other Iberian taxa (reviewed by G omez & Lunt 2007; Godinho et al. 2008;Gonc ßalves et al. 2009;Pinho et al. 2009;Miraldo et al. 2011;Carneiro et al. 2013;among others). First, while a deeper structure was retrieved by the complete SNP data set, excluding the 74 SNPs with signatures of selection (Ch avez-Galarza et al. 2013) did not qualitatively change the clinal pattern of variation. A similar historical signal emerged when selected loci were removed from the genome-wide SNP data set.
Second, the geographic cline analysis revealed that a large proportion (9 of 17) of the neutral individual SNPs and both SNP data sets share a common cline centre, indicating considerable genome-wide coincidence. Existence of multiple coincident clines argues for secondary contact (Barton & Hewitt 1981), especially if some of the clines reflect changes in selectively neutral loci (Durrett et al. 2000). Simulations on the origin of contact zones show that a signature of secondary Fig. 6 Map of the Iberian Peninsula with pie charts summarizing frequency data for each sampling site and plot of maximum-likelihood geographic cline for four neutral SNP loci (marked in bold), three selected SNP loci (see Fig. S11, Supporting information for the remaining 26 SNPs), mtDNA, and the Q values estimated with STRUCTURE from the neutral and the complete SNP data sets. The symbols * and + indicate the loci or data sets with coincident centre and concordant width, respectively (see Table S3, Supporting information). The dashed line placed in each map represents the transect traced from Lisbon (0 km) to Girona (1400 km) for the geographic cline analysis. contact, which is characterized by clinal variation at neutral loci and extensive disequilibrium at the centre of the contact zone, can persist for thousands of generations if neutral loci were tightly linked to selected loci (Durrett et al. 2000). If neutral loci were not tightly linked to selected loci, the initially steep clines, formed at the moment the two divergent groups meet, will gradually widen as the intermixing proceeds. In contrast, in the presence of primary intergradation, neutral loci will not vary clinally and disequilibrium between a neutral locus and a closely linked locus under selection will decay quickly (Durrett et al. 2000).
Further support for secondary contact is provided by LD and diversity patterns. Both parameters show a trend of elevated values towards the centre of the cline and overlapping the consensus centre location, as expected when two divergent populations meet. It should be noted, however, that the LD levels at the centre of the contact zone were unexpectedly low for recent contact, which may simply reflect the sparse distribution of the SNPs and the short scale of LD in honey bees. Indeed, the exceptionally high recombination rate in honey bees (Beye et al. 2006) would lead to a rapid decay of LD after admixture, as observed in the Africanization process in the New World (Pinto et al. 2005).
Finally, while the observed patterns can also be explained by alternative processes involving isolation by distance with some migration or by divergence in parapatry, the ultimate support for secondary contact comes from mtDNA. Congruent with previous surveys (Smith et al. 1991;Garnery et al. 1992Garnery et al. , 1995Franck et al. 1998;C anovas et al. 2008;Miguel et al. 2007), two highly divergent mtDNA lineages (African and western European) were identified and these lineages form a cline that closely parallels that of genome-wide SNPs, as revealed by both the clustering and the cline analyses. The mtDNA cline centre coincided with that of both the complete and the neutral SNP data sets and with most individual SNPs (18 of 33). In contrast, the mtDNA cline width was not concordant with any of the SNPs (either individual or data sets), which largely formed wider clines. Narrower maternal clines could arise from stronger drift on the smaller N e of the haploid marker (Polechova & Barton 2011), from divergent selection on mtDNA types (Yuri et al. 2009), or from the greater gene flow expected of nuclear loci (Endler 1977). Our results of narrower maternal clines compared to nuclear clines add to a body of research showing discordant cytonuclear transmission across contact zones (Gowen et al. 2014 and many references therein).
Whether secondary contact resulted from ancient range expansions from North Africa following climate amelioration of the last postglacial period (De la R ua et al. 2002;Pinto et al. 2013) or from recent introductions of the North African subspecies A. m. intermissa by the Arabs during Muslim occupation (Franck et al. 1998) is a matter of debate. A STRUCTURE analysis found no signs of A. m. intermissa genes, belonging to the African lineage, in Iberian populations, excepting for a residual component detected in sampling sites CT8 and CT9 nearby the Strait of Gibraltar (see Fig. S3, Supporting information in Ch avez-Galarza et al. 2013). This observation together with a report of deep differentiation between A. m. iberiensis and A. m. intermissa SNPs (Whitfield et al. 2006) means that a recent colonization event would have to be accompanied by a complete replacement of the nuclear, but not the mitochondrial, genomes of colonizers. This hypothesis assumes longterm male-biased gene flow, which would erode a signal of subdivision at the nuclear but not at the maternal level. The problem is that the latter scenario is not consistent with honey bee reproductive biology, as females have an important role in long-distance dispersal (Winston 1987). A much earlier event is therefore more likely to have been responsible for the patterns we see today.
The complexities and incongruences of Iberian honey bee patterns revealed by distinct genetic markers suggest an ancient history of allopatric divergence in Iberian refugia followed by postglacial range expansions and secondary contact. Iberia served as an important refuge during the cold periods of the Pleistocene in Europe (reviewed by . During this epoch, repeated cycles of contraction into and expansion out of multiple refugia shaped diversity patterns of great complexity in a variety of Iberian animal taxa (see Miraldo et al. 2011 and references therein), among which the honey bee is seemingly no exception. Estimates of genetic divergence from mtDNA (Arias & Sheppard 1996) and whole-genome nuclear DNA (Wallberg et al. 2014) suggest that the split among the four honey bee evolutionary lineages occurred between 670 000 and 300 000 years ago, respectively. Accordingly, colonization of Iberia across the Strait of Gibraltar (Ruttner et al. 1978;Whitfield et al. 2006;Han et al. 2012), from an origin in either Africa (Whitfield et al. 2006) or western Asia (Wallberg et al. 2014), likely occurred during Middle Pleistocene. Given dispersal abilities of the honey bees, it is plausible that they dispersed across the Iberian territory during interglacial periods and retreated to refugia during the glacial periods. Evidence from comparative phylogeography suggests that multiple refugia existed in Iberia ('refugia within refugia' paradigm of G omez & Lunt 2007). Two such refugia, one in the Mediterranean coast of northeastern Spain, possibly close to the Ebro valley, and another in the Betic ranges of southern Spain, were inferred from overlapping subdivision patterns exhibited by several Iberian taxa (G omez & Lunt 2007; see Fig. 1). The blue and the red clusters identified in Fig. 3 are consistent with the existence of these two putative refugia. Although the presence of multiple honey bee refugia is a tentative result, it is an idea that can be further explored and tested using the power of multiple gene genealogies analysis.

Influence of human-mediated processes in shaping variation
Complicating the interpretation of diversity patterns in honey bees are contemporary human-mediated processes. Honey bees native to Europe have long been subjected to human manipulation (Crane 1999), with a variable impact in their genetic composition (reviewed by De la R ua et al. 2009). In western Europe north of the Pyrenees, human-mediated movements of colonies between lineages (introduction of commercial queens) promoted variable levels of C-lineage introgression (Jensen et al. 2005;De la R ua et al. 2009;Soland-Reckeweg et al. 2009;Oleksa et al. 2011;Pinto et al. 2014) and even replacement of the native A. m. mellifera subspecies in some areas (Jensen et al. 2005). In contrast, the Iberian honey bee is relatively free of C-lineage genes (Miguel et al. 2007(Miguel et al. , 2011C anovas et al. 2008Pinto et al. 2013). A single colony harbouring a C-derived mitotype was scored in this study, and no signs of introgression were detected at the nuclear level in that colony and in the remaining 710 (but see Figs S3 and S4, Supporting information in Ch avez- Galarza et al. 2013).
While movements of colonies between lineages have not yet seriously threatened the Iberian honey bee genetic integrity, the lack of microsatellite structure (Franck et al. 1998;C anovas et al. 2011;Miguel et al. 2011) has been interpreted as an indication of high levels of gene flow aided by within-lineage movements associated with transhumance (C anovas et al. 2011). This interpretation, however, is inconsistent with our results that show congruent cytonuclear subdivision, local structure detected by the sPCA, and relatively low levels of LD, none of which support the large-scale influence of transhumance in the Iberian honey bee gene pool. A possible explanation for microsatellite patterns is that saturation of the mutation spectrum homogenized allele size distributions (Nauta & Weissing 1996). Such homogenization has been suggested before to explain a similar pattern in the European rabbit (Queney et al. 2001). Almost identical to what is observed for the Iberian honey bee, the European rabbit exhibits a northeastern-southwestern cline for mtDNA (Branco et al. 2000), allozymes (Campos et al. 2007;, and nuclear sequence data (Branco et al. 2002;Geraldes et al. 2008;Carneiro et al. 2013), but no clinal pattern for microsatellites (Queney et al. 2001).
The cytonuclear structure in Iberian honey bees is noteworthy given that in Spain, over one million colonies, representing~50% of existing colonies, have been yearly involved in wide-range movements, in the last decades (A. G. Pajuelo, personal communication). The fact that a marked clinal pattern in both the nuclear and mitochondrial genomes still persists indicates that human-mediated movements play a minor role in shaping Iberian honey bee genetic structure. Nonmutually exclusive explanations can be accounted for the observed pattern. Either transhumance takes place after the reproductive season, or some kind of reproductive barrier or local adaptation is preventing gene flow and long-term establishment of translocated colonies.

Concluding remarks
In this study, a well-defined northeastern-southwestern clinal pattern, revealed simultaneously by nuclear and maternal markers, provided support for the hypothesis of secondary contact proposed by earlier mtDNA studies. This finding, together with putative signatures of selection detected in a previous study (Ch avez-Galarza et al. 2013), suggests a complex interplay between adaptation and demography in shaping the Iberian honey bee patterns that we see today. Contemporary humanmediated processes do not seem to be dramatically changing these patterns, a scenario that might change if Spanish and Portuguese beekeepers adopt a strategy of using commercial C-lineage strains, as is occurring in several countries of western Europe (reviewed by De la R ua et al. 2009;Pinto et al. 2014). Iberian honey bees are providers of important environmental services through pollination and are number one honey producers in the European Union (European Commission 2013). More importantly, Iberian honey bees represent an important reservoir of diversity that not long ago colonized a broad territory in western Europe (Franck et al. 1998;Garnery et al. 1998a;Miguel et al. 2007). Understanding patterns and underlying processes shaping Iberian honey bee's diversity is an important first step towards preserving this subspecies and thereby the species Apis mellifera, an effort of unquestionable value as we face a worldwide honey bee crisis.

Data accessibility
SNP genotypes for the 711 Iberian honey bee individuals ordered by sampling location, from AT1 to MT6, in GenePop format: DRYAD entry doi 10.5061/ dryad.1kk2k.
Geographic coordinates of sampling locations and environmental data for the 711 Iberian honey bee individuals ordered by sampling location, from AT1 to MT6, in Excel format: DRYAD entry doi 10.5061/ dryad.1kk2k.
Aligned mtDNA sequences for 652 Iberian honey bee individuals ordered by sampling location, from AT1 to MT6, in Fasta format: DRYAD entry doi 10.5061/ dryad.21s3t.

Supporting information
Additional supporting information may be found in the online version of this article.

Table S1
Statistics of physical distances (bp) of the 309 neutral SNPs used in the genetic analysis of the Iberian honey bee.

Table S2
Pairwise Φ PT values among sampling sites estimated from the neutral SNP data set with GENALEX 6.5 (Peakall & Smouse 2012). Significance of φ PT estimates was assessed using 10,000 permutations. Global Φ PT value was 0.020 (P-value=0.001). Pairwise Φ PT values ranged from 0.000 to 0.046. Φ PT values marked in bold were significantly different from zero following Bonferroni correction. Sampling site codes are specified in Fig. 1.

Table S3
Cline parameter estimates for the best-fitting model of 33 SNP loci, complete SNP data set, neutral SNP data set and mtDNA. Cline width is presented as 1/maximum slope. Cline centre and width are measured in km, Pmin and Pmax are the allele frequencies at the ends of the cline, and d and s are the shape parameters for the mirror (M), left (L) and right (R) tails. Two log-likelihood unit support limits are presented in parentheses. The symbol * indicates coincident centre and the symbol + indicates concordant width, based on LRTs (Pvalue > 0.05). The left side AICc corresponds to the best-fitting model, and the right side AICc corresponds to the null model. Neutral SNP loci are marked in bold.

Table S4
Linkage disequilibrium (LD) and unbiased haploid genetic diversity (uh) estimated from the 309 neutral SNPs for the 23 sampling sites in the Iberian Peninsula (see Fig. 1 for location of sampling sites), as measured by r 2 and percentage of pairs of loci showing significant LD with Fisher's exact test [P LD (Fisher)] before Bonferroni correction (only a single pair remained significant after Bonferroni correction). None of the SNP pairs exhibiting significant LD before Bonferroni correction, for which there is genomic information, are physically linked as they are located in different chromosomes.  Graphical display of the three methods (DK, DF st and BIC) to predict the optimal K for the analysis of A. m. iberiensis population structure using the neutral SNP data set (309 loci) and the complete SNP data set (309 neutral plus 74 putatively selected loci identified by Ch avez-Galarza et al. 2013).

Fig. S3
Heat map of pairwise Φ PT values between Iberian sampling sites estimated from the neutral SNP data set (309 loci) using GENALEX 6.5 (Peakall & Smouse 2012). The heat map clearly highlights northeastern populations (CT1-3, MT1-2) and CT8 as the most differentiated across Iberia. (a) The 23 sampling sites are arranged from north to south in each of the three transects (see Fig. 1). (b) The 23 sampling sites are arranged along the west-east transect traced for the geographic cline analysis (see dashed line in Fig. 6).  Fig. 6). Plots represent each of the 711 individuals by a vertical bar partitioned into two coloured segments (blue and red) corresponding to membership proportions (Q) in each of the two clusters. Black lines separate indi-viduals from the 23 sampling sites, which are arranged from high Q (left) to low Q (right) in the blue cluster.      (a) Plot of individuals' Q at the optimal K = 3 clusters. Each of the 711 individuals included in the analysis is represented by a vertical bar partitioned into three coloured segments (blue, red, and white) corresponding to Q in each of the three clusters. Maternal data (M lineage in blue, C lineage in orange, and A lineage in red) are shown at the bottom. Sampling sites and individuals within sampling sites are arranged as in Fig. 3. (b) Map of the Iberian Peninsula showing the two major clusters (Q ≥ 0.5) interpolated by TESS. Dots represent the locations of sampled apiaries across the Atlantic (AT1-8), central (CT1-9) and Mediterranean (MT1-6) transects.

Fig. S11
Map of the Iberian Peninsula with pie charts summarizing frequency data for each sampling site and plot of maximum-likelihood geographic cline for neutral (marked in bold) and selected SNP loci. The symbols * and + indicate the loci or data sets with coincident centre and concordant width, respectively (see Table S3). The dashed line placed in each map represents the transect traced from Lisbon (0 km) to Girona (1400 km) for the geographic cline analysis.