Preservation methods of honey bee-collected pollen are not a source of bias in ITS2 metabarcoding

Pollen metabarcoding is emerging as a powerful tool for ecological research and offers unprecedented scale in citizen science projects for environmental monitoring via honey bees. Biases in metabarcoding can be introduced at any stage of sample processing and preservation is at the forefront of the pipeline. While in metabarcoding studies pollen has been preserved at − 20 °C (FRZ), this is not the best method for citizen scientists. Herein, we compared this method with ethanol (EtOH), silica gel (SG) and room temperature (RT) for preservation of pollen collected from hives in Austria and Denmark. After ~ 4 months of storage, DNAs were extracted with a food kit, and their quality and concentration measured. Most DNA extracts exhibited 260/280 absorbance ratios close to the optimal 1.8, with RT samples from Austria performing slightly worse than FRZ and SG samples (P < 0.027). Statistical differences were also detected for DNA concentration, with EtOH samples producing lower yields than RT and FRZ samples in both countries and SG in Austria (P < 0.042). Yet, qualitative and quantitative assessments of floral composition obtained using high-throughput sequencing with the ITS2 barcode gave non-significant effects of preservation methods on richness, relative abundance and Shannon diversity, in both countries. While freezing and ethanol are commonly employed for archiving tissue for molecular applications, desiccation is cheaper and easier to use regarding both storage and transportation. Since SG is less dependent on ambient humidity and less prone to contamination than RT, we recommend SG for preserving pollen for metabarcoding. SG is straightforward for laymen to use and hence robust for widespread application in citizen science studies.


Introduction
Pollen collected by honey bees (Apis mellifera L.), and further sampled from hives equipped with pollen traps, is frequently investigated in order to understand the floral environment (Bilisik et al., 2008; Danner et al., 2017; Drummond et al., 2018; Jones et al., 2021; Requier et al., 2015; Tosi et al., 2018). The collected pollen typically contains a wealth of information on biodiversity of different landscapes and on important plant sources and their seasonal variation (Brodschneider et al., 2019; Coffey & Breen, 1997; Danner et al., 2017; Donkersley et al., 2014; Lau et al., 2019). A honey bee forager usually prefers to collect pollen within a 1- to 1.5-km radius (3 to 7 km²), although longer foraging distances by honey bees have also been recorded (Beekman & Ratnieks, 2000; Garbuzov et al., 2015). This vast territory that can be explored by hundreds of foragers makes honey bee colonies powerful environmental samplers not only of the floral resources but also of the substances accumulated on bee forage plants, like pesticides or pollutants (Tosi et al., 2018).
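The foraging area quoted above follows from treating the foraging range as a circle around the hive. The short sketch below reproduces that arithmetic; the function name is ours, and the circular range is a simplifying assumption rather than a description of how bees actually cover the landscape.

```python
import math


def foraging_area_km2(radius_km: float) -> float:
    """Area (km^2) of a circular foraging range with the given radius (km)."""
    return math.pi * radius_km ** 2


# Radii quoted in the text: 1 km and 1.5 km.
for radius in (1.0, 1.5):
    print(f"radius {radius} km -> {foraging_area_km2(radius):.1f} km^2")
```

Rounded to whole numbers, the two radii give roughly 3 and 7 km², matching the range stated in the text.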
Forager bees collect pollen grains in specialised structures of their hind legs (corbiculae) and transport them to the hive. There, they are deposited in wax cells and the processed pollen (beebread) remains the only source of proteins and lipids for the colony (Brodschneider & Crailsheim, 2010). Numerous studies have sought to gain a broader understanding on honey bee biology and health by examining botanical diversity of pollen loads transported by foragers into the colony (Avni et al., 2014;Danner et al., 2016;Di Pasquale et al., 2013;Smart et al., 2016). For decades, this goal has been addressed by identifying honey bee-collected pollen loads through visual inspection of the grains' exine under a light microscope. However, because this method is time consuming, labour intensive, expert-knowledge dependent and frequently lacks taxonomic resolution, it is gradually being replaced by alternative approaches (Dunker et al., 2021) of which DNA metabarcoding is gaining increasing popularity (e.g. Bell et al., 2017;Cornman et al., 2015;Danner et al., 2017;Jones et al., 2021;Keller et al., 2015;Macgregor et al., 2019;Potter et al., 2019;Richardson et al., 2019;Smart et al., 2016).
Pollen DNA metabarcoding can achieve high taxonomic identification accuracy (Bell et al., 2018; Hawkins et al., 2015; Kraaijeveld et al., 2015; Richardson et al., 2015b) and, because it is based upon high-throughput sequencing, it allows simultaneous analysis of large numbers of samples at a much faster pace and lower cost than light microscopy (reviewed by Bell et al., 2016). While it is broadly acknowledged that metabarcoding can provide accurate lists of plant species (qualitative data) represented in bee-collected pollen mixtures, there is no clear consensus as to what extent it produces reliable estimates of their relative abundances (quantitative data) (Bell et al., 2018; Keller et al., 2015; Kraaijeveld et al., 2015; Pornon et al., 2016; Richardson et al., 2015b; Smart et al., 2017). Biases of varying sources can be introduced at any step of the sample processing pipeline, leading to inaccurate quantitative results (Bell et al., 2018). However, the major bias in DNA metabarcoding typically occurs in the polymerase chain reaction (PCR) step, when sequence variation in primer-binding sites or taxon-specific differences in amplicon length cause differential amplification rates between species (Bell et al., 2018; Pawluczyk et al., 2015; Piñol et al., 2019; Pompanon et al., 2012). In addition to PCR, biases can also be introduced downstream in the sample processing pipeline, during sequencing and taxonomic classification (Banchi et al., 2020; Richardson et al., 2017), and upstream, during the DNA extraction step (Brooks et al., 2015; Pornon et al., 2016; Schiebelhut et al., 2017; Swenson & Gemeinholzer, 2021) or even earlier, during the tissue storage step (Delavaux et al., 2020; Feinstein et al., 2009; Rubin et al., 2013; Weißbecker et al., 2017).
Poorly preserved tissue samples are more prone to DNA degradation and this may limit the length of fragments that can be successfully amplified by PCR, potentially influencing qualitative and quantitative outcomes of metabarcoding analysis (Pompanon et al., 2012). This type of bias is particularly important if species in a mixed sample are differentially sensitive to degradation, which in the case of pollen may be dependent on the protection provided by the exine (Pacini & Hesse, 2005). To minimize preservation bias, samples should be collected and stored using protocols that prevent active DNA degradation (Liu et al., 2020). However, this can be a challenging endeavour in projects that rely on citizen scientists for collecting pollen samples throughout the honey bee season and storing them until molecular analysis.
Citizen science is becoming increasingly applied in ecological studies and also in bee research (Koffler et al., 2021; Miller-Rushing et al., 2012; Moro et al., 2021). The involvement of interested citizens has many advantages. In studies of pollen diversity in different landscapes, the most striking advantage is the extension of the available range of sampling sites, including also private land. Beekeepers have already participated in several studies involving pollen sampling via pollen traps attached to the hives of their colonies (Brodschneider et al., 2019; Drummond et al., 2018; Tosi et al., 2018). A spatially and temporally comprehensive study involving beekeepers collecting pollen has recently been undertaken in the framework of a research project aiming at developing a citizen science protocol for honey bee colonies as environmental bio-samplers. Between April and September of 2020, over 81 citizen scientists collected trapped pollen every two weeks in each of nine apiaries located in ten European countries. In this project, citizen scientists diligently collected and stored trapped pollen in − 20 °C freezers at their own premises and later on shipped the samples to the laboratory for metabarcoding analysis, always maintaining a cold chain. To enhance and upscale citizen science, simple, safe and reliable methods for preserving pollen DNA are needed. Simple and safe in this context means that cost-effective preservation materials are used that can safely and easily be handled by laymen. For instance, cold chain preservation, handling of ethanol or buffers by citizen scientists should be avoided.
The main goal of tissue preservation is to avoid DNA denaturation and degradation, which is accomplished by freezing, desiccating or buffering. Freezing can be done at − 20 °C or at − 80 °C for long-term archiving. At − 20 °C, there is still enzymatic activity and degradation, which can be minimised by adding 95% ethanol (Nagy, 2010). This is the most frequently used method for archiving animal tissue but not plant tissue, especially leaves or twigs (Alsos et al., 2020; Bressan et al., 2014; Chase & Hills, 1991; Doyle & Dickson, 1987; Murray & Pitas, 1996). Samples immersed in ethanol can also be kept at room temperature, although in this case preservation is only good for short-term periods (Nagy, 2010). Desiccation can be achieved by using simple (e.g. air drying, calcium sulphate, silica gel beads) or more complex physical processes (e.g. lyophilisation, cryodesiccation) or by using chemical desiccants (e.g. amyl acetate, xylene). Buffering is an alternative to freezing and desiccation and involves preservation in different buffers, e.g. EDTA, SDS and CTAB (see Nagy, 2010; Prendini et al., 2002, where all these methods are thoroughly reviewed). Pollen preservation for metabarcoding applications has typically been achieved by storing samples at − 20 °C (e.g. Bell et al., 2017; Cornman et al., 2015; Danner et al., 2016; Smart et al., 2016). Whether this is the best method to keep preservation bias to a minimum or whether other methods of storing pollen are suited to metabarcoding applications was unknown until this study.
To facilitate storage at the citizen scientist's premises of a possibly large number of pollen samples collected throughout the honey bee season, while at the same time assuring sample integrity for downstream metabarcoding analysis, here we compared four different methods for preserving pollen, namely (1) freezing at − 20 °C, (2) storage in ethanol, (3) desiccation at room temperature and (4) desiccation with silica gel beads. Poorly preserved samples may hinder PCR amplification due to various forms of DNA damage, which reduce the average length of intact template for polymerase action (Pompanon et al., 2012). This may introduce a bias during PCR, affecting floral spectra determined for pollen samples from sequence reads generated by high-throughput sequencing. Therefore, it is important to test how the method of preserving bee-collected fresh pollen mixtures influences qualitative and quantitative performance of metabarcoding. The results will inform citizen science projects on the simplest and cheapest method for preserving pollen without compromising the accuracy of downstream botanical identification via metabarcoding.

Materials and methods
Preliminary test: determining the amount of silica gel for efficient pollen desiccation
Prior to comparing performance of the four different pollen preservation methods, we tested the amount of silica gel (SG) required for efficient desiccation of 5 g of freshly collected pollen. To this end, pollen was harvested on July 13th 2020 from traps placed at beehive entrances in an apiary in Denmark (Fig. 1). The pollen gathered for a ~ 12-h period was thoroughly mixed and split into 30 homogeneous replicates of 5 g each. Twenty-three replicates were transferred to individual porous paper 'tea' filters and placed into individual 125-mL capped vials together with a sachet containing 1 g (N = 7), 5 g (N = 8), or 10 g (N = 8) of SG. Six replicates were exposed at room temperature (RT). Finally, one replicate was set aside for determining the baseline values for water activity and content on the sampling day (Table 1).
Water activity was measured in a total of 13 replicates (Table 1) on July 13th (day 0), 15th (day 1.5), 17th (day 3.5) and 20th (day 6.5) using a LabSwift-aw (Novasina, Lachen, Switzerland) water activity meter (non-destructive method). Water activity is a measure of the amount of free water (available for microbial growth), which ranges from zero to one (pure water). Water content was determined for the same samples as weight percentage of water by drying at 105 °C for 90 min (destructive method) in a KERN DAB 100-3 (Balinger, Germany) moisture analyser. The remaining 17 samples were kept in a dark cabinet (to avoid DNA damage from UV light) for 10 days after sampling until DNA extraction, as described below.
Table 1 Sample sizes used in the pre-test experiment for determining the amount of silica gel (SG) for efficient pollen desiccation. One subset (N = 17) was used for assessing DNA parameters (yield and quality) and the other (N = 13) for water activity and content
Method      Water activity and content    DNA
Baseline    1                             0
RT          3                             3
1 g SG      3                             4
5 g SG      3                             5
10 g SG     3                             5
Total       13                            17

Comparison of four different pollen preservation methods: 96% ethanol, freezing, silica gel and room temperature
Pollen traps were set up in August and September of 2020 in two apiaries (one in Denmark and another one in Austria) aiming to collect enough pollen for comparing the four different preservation methods (Fig. 1), namely (1) placement in a 15-mL tube filled with 96% ethanol and kept at room temperature (EtOH), (2) placement in a 15-mL tube kept at − 20 °C (FRZ), (3) placement in a porous 'tea' filter inside a 125-mL capped vial supplied with 12 g of silica gel and kept at room temperature (SG), and (4) placement in a fine gauze/filter paper exposed to room temperature for 1 week followed by transfer to a 15-mL tube kept closed at room temperature (RT). Pollen harvested over a ~ 12-h period was thoroughly mixed and split into 88 replicates of 5 g each. Sample size per preservation method varied between countries and sampling dates (Table 2). In Denmark, there was a single sampling event on August 11th (N = 40). In Austria, due to pollen availability constraints, sampling took place on multiple dates in September (total N = 48). All 88 samples were stored for ~ 4 months at room temperature in the dark, except those frozen at − 20 °C, until molecular analysis.

Sample preparation and DNA extraction
DNA was isolated from only 87 samples because one of them contained a Lepidoptera larva feeding on the pollen, presumably a wax moth (Galleria mellonella or Achroia grisella). Prior to DNA extraction, a homogenous pollen solution was prepared in a magnetic stirrer using 2 g (for RT, SG and FRZ samples) or 3 g (for EtOH) of pollen sample and 4 mL of sterile ultrapure water. A volume of 200 μL of this solution (~ 50 mg of pollen) was placed in a 1.5-mL tube and centrifuged at 21,206 × g for 3 min. After centrifugation, the supernatant was discarded and 1 mL of absolute ethanol was added to all 87 samples, which were then stored at − 20 °C until DNA extraction.
Immediately before extraction with the Macherey-Nagel NucleoSpin Food Kit, the tubes containing ~ 50 mg of pollen were centrifuged at maximum speed for 3 min and the ethanol discarded. Pollen samples were then transferred to a 2.0-mL screwcap tube containing a mix of zirconia beads of varying sizes to target different pollen grain sizes. A volume of 550 μL of the NucleoSpin lysis buffer was added to the 2.0-mL tube and the mixture was ground in a Precellys 24 tissue homogeniser (Bertin Instruments) three times at 6200 rpm for 5 s. The following steps of DNA extraction were implemented according to manufacturer's instructions. After extraction, quality and yield were measured in a SPECTROstar Nano (BMG Labtech) and the DNA extracts were diluted to 10 ng/μL before PCR.
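The dilution of each extract to 10 ng/μL follows the usual C1·V1 = C2·V2 relation. The sketch below is only an illustration of that arithmetic: the function name and the 20 μL final volume are assumptions of ours, not values from the protocol.

```python
def dilution_volumes(stock_ng_per_ul: float,
                     target_ng_per_ul: float,
                     final_volume_ul: float) -> tuple[float, float]:
    """Return (stock volume, diluent volume) in microlitres.

    Uses C1 * V1 = C2 * V2, solved for V1. Raises ValueError when the
    stock is already more dilute than the target concentration.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    stock_volume = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_volume, final_volume_ul - stock_volume


# Hypothetical extract at 40 ng/uL, diluted to 10 ng/uL in an assumed 20 uL.
stock_ul, water_ul = dilution_volumes(40.0, 10.0, 20.0)
print(f"{stock_ul:.1f} uL extract + {water_ul:.1f} uL water")
```

Extracts below the target concentration cannot be diluted to it, which is why the sketch raises an error in that case rather than returning a negative volume.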

DNA metabarcoding
DNA metabarcoding was performed using the internal transcribed spacer 2 (ITS2). PCR was carried out in triplicate for each sample using the primers ITS-S2F (Chen et al., 2010) and ITS-S4R (White et al., 1990) and a two-stage process. Stage one PCR was performed in a 10-μL total volume containing 5 μL of Q5 High-Fidelity 2X Master Mix (New England Biolabs), 0.5 μL of each primer at 10 μM, and 1 μL of DNA at 10 ng/μL. Thermal cycling conditions were 98 °C for 3 min, 35 cycles of 98 °C for 10 s, 52 °C for 30 s and 72 °C for 40 s, and a final extension of 72 °C for 2 min. The amplicons were purified using 0.8 × reversible immobilization paramagnetic beads (Agencourt AMPure XP) per microlitre of PCR product and then subjected to the second-stage PCR for incorporation of the unique indexes. PCR was prepared in a 10-μL total volume containing 5 μL of KAPA HiFi HotStart ReadyMix PCR Kit (Kapa Biosystems), 0.5 μL of each oligonucleotide at 1 μM and 2 μL of 1:10 dilution of the purified amplicons. Thermal cycling conditions were 95 °C for 3 min, followed by 10 cycles of 95 °C for 30 s, 55 °C for 30 s, 72 °C for 30 s and a final extension of 72 °C for 5 min. Indexed amplicons were purified with the paramagnetic beads, as before, quantified in the Epoch Microplate Spectrophotometer (BioTek Instruments), normalised to a final concentration of 10 nM, and then pooled (one pool per 96-well plate). The amplicon size distribution was determined for each pool on a TapeStation 2200 using the HS D1000 kit (Agilent Technologies). Pools were quantified by a SYBR green quantitative PCR assay using the KAPA Library Quantification kit (Kapa Biosystems) and then combined equimolarly into one single sequencing library containing all samples. The sequencing library was diluted to 2 nM, spiked with 10% Illumina-generated PhiX control library and then sequenced on the Illumina MiSeq using the 2 × 250 cycles v2 chemistry, according to manufacturer's instructions.

Bioinformatics
Pools were de-multiplexed in BaseSpace Sequence Hub based on their unique indexes incorporated in the stage two PCR. Raw sequence reads (Fastq files) were processed using VSEARCH v2.15.2 (Rognes et al., 2016). Reads R1 and R2 were merged using the fastq_mergepairs command. Low-quality reads (length < 200 bp or > 500 bp, ambiguous base pairs) were discarded, and chimeras were removed using uchime3_denovo. After filtering, sequence reads were classified directly to the species level using usearch_global, with sequence similarity set at 97%. Unclassified reads were subjected to hierarchical classification, with a minimum level of bootstrap support for the taxonomic rank of 90%. The ITS2 reference database used in the taxonomic classification was updated from Sickel et al. (2015), with sequences retrieved from the National Center for Biotechnology Information (NCBI) platform (https://www.ncbi.nlm.nih.gov). A community matrix format table, with columns representing samples and rows species, and a file with the taxonomic lineage of each species were created and imported into R-Studio v1.2.5033 (Team, 2015) using the package Phyloseq v1.27.6 (McMurdie & Holmes, 2013).
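The read-quality criteria described above (a length window of 200 to 500 bp and no ambiguous bases) can be illustrated with a few lines of plain Python. This is a didactic sketch only: the study used VSEARCH for this step, and the function name and the treatment of the window limits as inclusive are assumptions of ours.

```python
UNAMBIGUOUS_BASES = set("ACGT")


def passes_filter(sequence: str, min_len: int = 200, max_len: int = 500) -> bool:
    """Keep a merged read only if its length lies within [min_len, max_len]
    and it contains no ambiguous bases (anything other than A, C, G or T)."""
    seq = sequence.upper()
    if not (min_len <= len(seq) <= max_len):
        return False
    return set(seq) <= UNAMBIGUOUS_BASES


# Toy reads: one acceptable, one too short, one containing an ambiguous base.
reads = {
    "read_ok": "ACGT" * 60,            # 240 bp, unambiguous
    "read_short": "ACGT" * 40,         # 160 bp
    "read_ambiguous": "ACGT" * 60 + "N",
}
kept = [name for name, seq in reads.items() if passes_filter(seq)]
print(kept)
```

Chimera removal and taxonomic assignment are not covered here, as they depend on the reference database and the VSEARCH algorithms.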

Data analysis
The relative abundances (RA) of high-quality reads were used as a proxy of pollen quantity estimates at both family and species levels, after discarding taxa below 1% abundance. A species accumulation curve was used to assess the sequence depth of each sample using the package ranacapa (Kandlikar et al., 2018) in R-Studio v1.2.5033 (Team, 2015). As a measure of pollen diversity, the Shannon-Wiener index (H′) was calculated for each sample replicate from both family and species classification data. H′ values, which combine a component on the number of taxa (richness) with another on their relative abundances (evenness), were computed using the BiodiversityR package v 2.13-1 (Kindt & Coe, 2005) in R-Studio v1.2.5033 (Team, 2015). Statistical comparisons among the four preservation methods were also performed in R-Studio by using Kruskal-Wallis and χ² tests. When statistical differences were found, multiple comparisons were performed using Dunn's test with Bonferroni P value adjustment, using the FSA package v0.8.32 (Ogle et al., 2021) in R-Studio v1.2.5033 (Team, 2015). All descriptive statistics are reported as the median and interquartile range (IQR). The level of statistical significance for all tests was set at α = 0.05.
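As an illustration of the two quantities defined above, the sketch below computes relative abundances with a 1% cut-off and the Shannon-Wiener index, H′ = −Σ p·ln(p). It is not the R code used in the study: the function names and read counts are invented, and both the natural logarithm and the renormalisation of proportions after dropping rare taxa are assumptions.

```python
import math


def relative_abundances(read_counts: dict[str, int],
                        min_fraction: float = 0.01) -> dict[str, float]:
    """Drop taxa below min_fraction of all reads, then rescale the rest
    so that the remaining proportions sum to 1 (an assumption)."""
    total = sum(read_counts.values())
    kept = {taxon: n for taxon, n in read_counts.items()
            if total > 0 and n / total >= min_fraction}
    kept_total = sum(kept.values())
    return {taxon: n / kept_total for taxon, n in kept.items()}


def shannon_index(proportions) -> float:
    """Shannon-Wiener index H' = -sum(p * ln p) over non-zero proportions."""
    return -sum(p * math.log(p) for p in proportions if p > 0)


# Invented read counts for one sample; "Taxon D" falls below the 1% cut-off.
counts = {"Taxon A": 500, "Taxon B": 300, "Taxon C": 195, "Taxon D": 5}
ra = relative_abundances(counts)
print(sorted(ra), round(shannon_index(ra.values()), 3))
```

H′ rises both with the number of taxa and with how evenly reads are spread among them, which is why a sample dominated by a few species can have a low index despite a long species list.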

Results and discussion
Determining the amount of silica gel for efficient pollen desiccation
The baseline values for water content and activity measured soon after sampling (day 0) were 21.5% and 0.69, respectively (Table 3). These values decreased substantially after a week of pollen desiccation with silica gel (SG) or at room temperature (RT). The lowest values of water content were observed when pollen was dried with 10 g of SG, reaching 9.4% at the last measurement on July 20th (day 6.5). This was lower than the water content obtained for pollen dried at RT (14.8%) and almost half of the value obtained with 1 g of SG (18.0%) for the same period. Water activity < 0.6 is required to prevent microbial growth in pollen (Beuchat, 1983). Water activity had already fallen below this threshold at day 1.5 for RT and with 5 and 10 g of SG. However, while water activity levelled out around 0.5 for RT, with some variation probably due to changes in environmental humidity, it continued steadily decreasing with time for pollen dried with 5 and 10 g of SG. In contrast, it was only at day 6.5 that 1 g of SG was able to lower water activity below 0.6. While 1 g of SG was clearly insufficient for drying 5 g of pollen, 10 g of SG enabled the fastest and most efficient desiccation, preventing moulds from developing. Fungal contamination can be a problem for pollen samples requiring botanical identification via ITS2 metabarcoding. The DNA of fungi will be extracted alongside that of pollen and might eventually be co-amplified in the PCR. Due to high taxonomic resolution, ITS2 has been one of the markers of choice in pollen metabarcoding (e.g. Bell et al., 2017; Cornman et al., 2015; Danner et al., 2017; Jones et al., 2021; Keller et al., 2015; Milla et al., 2021; Nürnberger et al., 2019; Richardson et al., 2015b; Sickel et al., 2015). The problem is that PCR using ITS2 barcodes may suffer from non-specificity, resulting in co-amplification of fungal contaminants (Cheng et al., 2016).
While these contaminants will not prevent ITS2 plant fragments from being sequenced in high-throughput sequencing platforms, they will consume sequence reads and thereby limit the number of pollen samples that can be pooled in a sequencing run (Bell et al., 2016;Cornman et al., 2015).
Yield and quality of the DNA isolated from the 17 samples, stored for 10 days at RT and with varying amounts of SG, are shown in Fig. 2 (see Online Resource 1 for details). The highest DNA concentration was observed for pollen desiccated with 10 g of SG (41.69 (6.85) ng/μL) and the lowest for RT (29.48 (4.23) ng/μL), although the differences across the four methods were non-significant (P value = 0.080, Kruskal-Wallis test). Likewise, there were no statistical differences in DNA quality (P = 0.408), with all methods showing 260/280 absorbance ratios slightly below the optimal 1.8 value (median = 1.7; Online Resource 1). These results suggest that all pollen desiccation methods are able to properly preserve DNA for downstream metabarcoding analysis. Using more samples might lead to a significant difference between desiccation with 10 g of SG and RT in terms of DNA concentration.
Although 10 g of SG was sufficient for desiccating 5 g of bee-collected fresh pollen, in the ensuing experiment pollen samples were preserved with two pre-prepared sachets of 6 g each available on the market. According to the SG manufacturer, one 6-g sachet of well-dried SG can absorb 1 g of water, corresponding to 100% water in 5 g of pollen with 20% water content. Since water content in bee-collected fresh pollen typically varies between 15 and 30% (Canale et al., 2016; Herbert & Shimanuki, 1978), one sachet would suffice to desiccate 5 g of pollen and two would allow over-absorbance, hence assuring proper DNA preservation in a wide range of environmental conditions.
Fig. 2 Boxplots for a DNA concentration (ng/µL) and b quality (260/280 absorbance ratio) measured for pollen desiccated at room temperature (RT; N = 3) and using 1 g (N = 4), 5 g (N = 5) and 10 g (N = 5) of silica gel (SG). The optimal DNA quality is obtained when the 260/280 absorbance ratio reaches 1.8
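The sachet reasoning above can be checked with a small calculation. The sketch assumes the manufacturer's figure quoted in the text (1 g of water per 6-g sachet) and the 15 to 30% range of pollen water content; the function names are ours.

```python
def water_mass_g(pollen_g: float, water_content_pct: float) -> float:
    """Mass of water (g) held in a pollen sample of given mass and water content."""
    return pollen_g * water_content_pct / 100.0


def capacity_ratio(pollen_g: float, water_content_pct: float,
                   n_sachets: int, capacity_per_sachet_g: float = 1.0) -> float:
    """Absorption capacity of the sachets divided by the water in the sample.
    Values >= 1 mean the sachets could, in principle, take up all the water."""
    return n_sachets * capacity_per_sachet_g / water_mass_g(pollen_g, water_content_pct)


# 5 g of pollen across the range of water contents reported for fresh pollen.
for pct in (15, 20, 30):
    print(f"{pct}% water: {water_mass_g(5, pct):.2f} g water, "
          f"1 sachet ratio {capacity_ratio(5, pct, 1):.2f}, "
          f"2 sachets ratio {capacity_ratio(5, pct, 2):.2f}")
```

With two sachets the ratio stays above 1 across the whole 15 to 30% range, which is the safety margin the text refers to as over-absorbance.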
Comparing concentration and quality of DNA across preservation methods
DNA quality and concentration measured for extracts prepared from the 87 pollen samples stored in ethanol (EtOH), with 12 g of silica gel (SG), at − 20 °C (FRZ), and at room temperature (RT) are shown for Austria and Denmark in Fig. 3. Statistical differences were also found for the comparisons between EtOH and FRZ in both countries (P > 2.012 × 10⁻⁵) and EtOH and SG in Austria (P = 0.042).
The downward bias observed for DNA extracted from pollen preserved in EtOH could be due to a technical artefact. In the homogenisation step, a lower amount of pollen was often collected at the bottom of the 1.5-mL tube upon centrifugation, despite the excess of starting material (3 g instead of 2 g of sample, i.e. 50% more) used to compensate for the ethanol mass. It is also possible that samples exposed at RT were contaminated by organisms that feed on pollen. The observation of a Lepidoptera larva developing in an RT sample is living proof of such a circumstance. While insect larvae can be detected during sample preparation for DNA extraction, eggs would go unnoticed and would therefore be extracted alongside the pollen.
Pollen contamination may result in higher DNA yields, depending on the amount of tissue and genome size of contaminants (Bell et al., 2018; Pornon et al., 2016). This is especially the case when employing kits for isolating DNA from food matrices, as here, which are tailored for efficient DNA recovery from a wide range of organisms. It should be noted, however, that non-plant contaminants are unlikely to hamper downstream metabarcoding analysis, as they will rarely be co-amplified with pollen when using plant-specific ITS2 primers (Bell et al., 2016; Cheng et al., 2016). The problem for metabarcoding studies on bee-collected pollen arises when contamination originates from airborne pollen grains (Kraaijeveld et al., 2015), as it may comprise an important proportion of the sequencing reads. Therefore, while desiccation at RT offers an attractive alternative for preserving pollen in citizen science projects, the risk of contamination (plant or non-plant) makes this method less suitable for recommendation for use in metabarcoding studies.
Fig. 3 Boxplots for a DNA concentration (ng/µL) and b quality (260/280 absorbance ratio) measured for pollen stored in ethanol (EtOH; Austria N = 11, Denmark N = 10), frozen at − 20 °C (FRZ; Austria N = 11, Denmark N = 10), at room temperature (RT; Austria N = 12, Denmark N = 10) and with 12 g of silica gel (SG; Austria N = 13, Denmark N = 10). The optimal DNA quality is obtained when the 260/280 absorbance ratio is 1.8. Different letters above boxplots indicate significant differences (P < 0.05) in DNA quality and concentration among preservation methods

Comparing pollen preservation methods by ITS2 metabarcoding
The sequencing run in the Illumina MiSeq platform produced a total of 2,549,419 high-quality reads (calculated over all samples and countries, N = 87) and all of them were classified as Viridiplantae. Of these, 2,437,347 reads were classified at the species level and 112,072 did not pass beyond the genus level. The median number of reads per sample was 29,507.0 (14,038.0) at the family level (Online Resource 3) and 28,360.0 (12,970.0) at the species level (Online Resource 4). This sequencing depth allowed sequencing each sample to saturation, with the lowest number of reads (1660), obtained for an RT sample (Online Resource 4), well above the plateau reached at ~ 1000 reads in the species accumulation curve (Fig. 4). These results suggest that a relatively complete list of species-level assignments was recovered for all the samples with ~ 1000 high-quality reads, not far from the 3000 reads reported by Sickel et al. (2015) and below the ~ 50,000 reads reported by Cornman et al. (2015).
The distribution of the total number of reads was similar across methods for both Austria (P = 0.238, Kruskal-Wallis test) and Denmark (P = 0.120), as shown in Fig. 5. This sequencing result suggests that DNA integrity was not differentially affected by the mode of preserving pollen samples.
The classification algorithm identified a total of 51 plant species, belonging to 40 genera and 22 families, in the 87 pollen samples. Floral spectra sampled by honey bees differed between Austria and Denmark, with only nine families and seven species overlapping (Online Resources 5 and 6). Nonetheless, all detected taxa have been listed elsewhere as foraging sources for honey bees (Brodschneider et al., 2019;Danner et al., 2017;Lau et al., 2019), suggesting that pollen preserved using a range of methods can be reliably identified via metabarcoding. Plant diversity was also distinct between the two countries with the Shannon index (H′) showing higher values, and therefore higher diversity, in Denmark, with median of 2.14 (0.10) for families and species, than in Austria, with median of 1.80 (0.10) for families and 1.89 (0.12) for species (calculated over all samples).
Relative abundances (RA) are depicted for the 22 families and 51 plant species in Fig. 6, facilitating comparisons on the performance of the four preservation methods at both taxonomic levels (see Online Resources 7 and 8 for details). A total of 15 families were detected in Austria and 16 in Denmark, and these numbers lie within the range that has been reported for single sampling events in central Europe (Brodschneider et al., 2019;Danner et al., 2017). Most Austrian (9) and Danish (10)   Asteraceae, Brassicaceae and Papaveraceae are often listed as ubiquitous pollen sources for honey bees, Balsaminaceae is less common (Brodschneider et al., 2019;Coffey & Breen, 1997;Danner et al., 2017;Dimou & Thrasyvoulou, 2007;Potter et al., 2019;Richardson et al., 2019). However, the relative importance of forage varies with land cover and season, and all these herb families include late summer bloomers (Bilisik et al., 2008;Coffey & Breen, 1997;Dimou & Thrasyvoulou, 2007;Donkersley et al., 2014;Lau et al., 2019). Classification at a lower taxonomic level identified 25 plant species (24 genera) in Austria and 33 (24 genera) in Denmark (Online Resource 8). This is a notable number of taxa, given the temporally limited sampling undertaken herein, with pollen traps activated < 12 h at the single sampling event in August, in Denmark, and at any of the three sampling events in September, in Austria. Yet, this level of richness is supported by reports that honey bees tend to collect greater diversity of pollen later in the season, to compensate for lower availability of mass-blooming sources (Danner et al., 2017;Rasmussen et al., 2021). Despite the high number of visited plants, most pollen was collected from only three species in Austria and four in Denmark, which accounted for 60.2% (14.0) and 54.6% (9.1) of the total abundance in the former and latter country, respectively (median calculated over all samples; Online Resource 8). 
While our results are consistent with studies that have observed foraging preferences for a few plant sources (Bilisik et al., 2008; Brodschneider et al., 2019; Danner et al., 2017), they also further support the claim that honey bees forage for a diverse pollen diet to ensure colony health (Alaux et al., 2010; Di Pasquale et al., 2013; Goulson et al., 2015; Kaluza et al., 2018; Omar et al., 2017).
Among the 22 families, 19 (11 in Austria and 15 in Denmark; Figs. 6 and 7) were detected across all methods whereas three very rare families [median of 0% (0), Online Resources 5 and 7] were detected only in pollen preserved in (1) EtOH and SG (Tropaeolaceae), (2) SG and RT (Boraginaceae), and (3) EtOH, SG and FRZ (Chenopodiaceae). The same pattern was observed at the species level; of the 51 species, 17 occurring at very low abundances [median of 0% (0), Online Resources 6 and 8] could only be identified in samples preserved by one (four species), two (seven species) or three (five species) methods (Fig. 7a; Online Resource 8). Notably, the highest richness was detected in SG (46 species) and RT (47 species) samples whereas the lowest was detected in FRZ (40 species) and EtOH (41 species) samples, although the differences were not statistically significant (P = 0.474 for Austria and P = 0.744 for Denmark, χ² test).
Fig. 5 Boxplots for the total number of high-quality sequence reads obtained for pollen samples stored in ethanol (EtOH; Austria N = 11, Denmark N = 10), frozen at − 20 °C (FRZ; Austria N = 11, Denmark N = 10), at room temperature (RT; Austria N = 12, Denmark N = 10) and with 12 g of silica gel (SG; Austria N = 13, Denmark N = 10)
Fig. 6 Relative abundances (%) estimated from 87 pollen samples collected in a Austria and b Denmark and preserved in ethanol (EtOH; Austria N = 11, Denmark N = 10), frozen at − 20 °C (FRZ; Austria N = 11, Denmark N = 10), at room temperature (RT; Austria N = 12, Denmark N = 10) and with 12 g of silica gel (SG; Austria N = 13, Denmark N = 10). Relative abundances, shown here for species and families, were inferred from sequence reads obtained by ITS2 metabarcoding. In Austria, Chenopodiaceae was only detected in 05/09/20, Hydrangeaceae in 10/09/20, and Papaveraceae and Tropaeolaceae in 15/09/20 samples
Environ Monit Assess (2021) 193:785
Eight low-abundance plant species, exclusively detected in RT and/or SG replicates, were major contributors to richness differences among preservation methods (Online Resource 6). Four species were singletons in samples preserved at RT (Ageratum houstonianum, Hydrangea sargentiana and Rumex stenophyllus) or with SG (Rosa multiflora), whereas four other species were simultaneously detected in several RT and SG samples (Centaurea cyanus, Chelidonium majus, Echium plantagineum and Helianthus annuus).
At least three non-mutually exclusive factors could explain the greater species richness detected in samples desiccated at RT and with SG. First, it is possible that these samples were contaminated by airborne pollen, and any trace of contamination in a typically small pollen sample can generate a misleading result when using high-throughput sequencing (Bell et al., 2016, 2018; Pornon et al., 2016). Contrary to FRZ and EtOH samples, which were placed in capped vials soon after sampling, preparation of SG samples involved greater manipulation and RT samples were more exposed to airborne pollen. In a metabarcoding study of airborne pollen, Kraaijeveld et al. (2015) identified over 85% of honey bee plant taxa in the samples analysed, suggesting that contamination could have occurred during SG preparation and, to a greater extent, during ambient exposure of RT samples.
Second, it is possible that the bioinformatics pipeline assigned a wrong taxonomy to the sequence reads generated for those replicates, resulting in false-positive species. Such bias may occur when sequences are misidentified in the reference database (Bell et al., 2018). We tested this hypothesis for the sequence reads generated by the MiSeq platform for the eight species detected exclusively in SG and/or RT preserved samples, using the Basic Local Alignment Search Tool (BLAST) available from the NCBI platform. While BLAST confirmed the identity of seven plant species, strikingly, the sequence reads classified by the pipeline as Rumex stenophyllus for one RT replicate aligned with the airborne fungus Alternaria infectoria, with 99.38% identity. This finding highlights the importance of having accurately curated sequences in the reference databases used in metabarcoding analysis (Banchi et al., 2020) and suggests a contamination event for that replicate preserved at RT.
Third, and most likely, because samples desiccated at RT and with SG have lower water content than those placed fresh in the freezer, and samples preserved in EtOH are heavier, it is possible that the higher amount of starting material observed for several RT and SG preserved samples during the DNA extraction step allowed rare pollen grains to be represented in the extracts. While studies of pollen metabarcoding have employed variable amounts of starting material during DNA extraction, with values ranging from 3 to 50 mg (Danner et al., 2017; Keller et al., 2015; Richardson et al., 2015a, 2015b; Sickel et al., 2015), it is unclear to what extent this factor influences the number of plant taxa identified in the extracts. Our results suggest that detection of rare taxa might benefit from using greater amounts of starting material, although investigation is necessary for further understanding this factor.
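The reasoning behind this third factor can be illustrated with a simple sampling calculation. The sketch below is not part of the analysis performed in this study; it assumes, for illustration only, that pollen grains enter the extract independently and that a taxon accounts for a fixed fraction p of grains, so that the probability of at least one of its grains being present among n grains is 1 − (1 − p)^n. The function name, the example fraction and the grain counts are hypothetical.

```python
def detection_probability(taxon_fraction, grains_sampled):
    """Probability that at least one grain of a taxon is present in the extract.

    Assumes independent sampling of grains (a simplification); presence of a
    grain does not guarantee that the taxon is amplified and sequenced.
    """
    if not 0.0 <= taxon_fraction <= 1.0:
        raise ValueError("taxon_fraction must lie between 0 and 1")
    if grains_sampled < 0:
        raise ValueError("grains_sampled must be non-negative")
    return 1.0 - (1.0 - taxon_fraction) ** grains_sampled


# Hypothetical rare taxon making up 1 in 10,000 grains, and hypothetical
# grain counts standing in for smaller and larger amounts of starting material.
rare_fraction = 1e-4
for grains in (2_000, 10_000, 50_000):
    p = detection_probability(rare_fraction, grains)
    print(f"{grains:>6} grains: P(at least one grain) = {p:.3f}")
```

Under these assumptions the probability rises monotonically with the number of grains, which is consistent with the suggestion that larger amounts of starting material favour the detection of rare taxa.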
As illustrated in Fig. 6, deviations in relative abundances across preservation methods were greater in Denmark than in Austria. This finding might be due to more efficient blending of the pollen harvest, performed prior to its splitting into replicates or earlier sampling in Denmark. Yet, the results cannot be directly compared between the two countries because floral spectra at the different sampling dates differed substantially, with only a few families and species overlapping, and taxa-specific biases might occur (Bell et al., 2018). It is, however, noteworthy that two shared families, Brassicaceae and Ranunculaceae, were consistently over- and underrepresented, respectively, in EtOH samples from both countries. Whether the relative abundances estimated from EtOH samples for these two families are more accurate than those obtained for the other methods deserves further scrutiny.
Among the top six families, the greatest differences in median relative abundances were found in Brassicaceae comparing between EtOH [16.9% (3.5); Online Resource 7] and SG [9.7% (2.5)] and in Hydrangeaceae comparing between RT [10.7% (6.8)] and SG [4.8% (5.4)], with deviations between the two most dissimilar methods nearly doubling. On the other hand, for the top three families and species, deviations were lower than 3% in both countries. Of these, Papaveraceae/Papaver rhoeas displayed the greatest homogeneity across methods, with only 0.5% difference between EtOH [19.5% (1.0)] and SG [19.0% (1.3)] samples. Regardless of the magnitude of the deviations, no statistical differences could be detected among preservation methods for any family or species either for Austria (P > 0.161 for families and species, Kruskal-Wallis tests) or Denmark (P > 0.206 for families and P > 0.100 for species).
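For readers who wish to reproduce this kind of comparison, the Kruskal-Wallis statistic can be computed from the per-sample relative abundances of one taxon grouped by preservation method. The following is a minimal sketch with a tie correction, not the code used in this study; the function name and the abundance values are hypothetical. The P value would then be obtained by comparing H with a χ² distribution with k − 1 degrees of freedom (three, for four methods), for example with a statistics package.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie-corrected) for a list of groups of values."""
    pooled = sorted(value for group in groups for value in group)
    n = len(pooled)

    # Assign each distinct value the mean of the ranks it occupies.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        ties = j - i
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += ties ** 3 - ties
        i = j

    correction = 1 - tie_term / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are identical; H is undefined")

    rank_sum_term = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group)
        for group in groups
    )
    h = 12 / (n * (n + 1)) * rank_sum_term - 3 * (n + 1)
    return h / correction


# Hypothetical relative abundances (%) of one family in replicate samples
# for each preservation method (not data from this study).
abundances = {
    "EtOH": [16.1, 17.4, 15.8],
    "FRZ": [12.9, 14.2, 13.5],
    "RT": [11.0, 12.3, 13.1],
    "SG": [9.5, 10.2, 11.4],
}
h_statistic = kruskal_wallis_h(list(abundances.values()))
print(f"H = {h_statistic:.2f} with {len(abundances) - 1} degrees of freedom")
```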
Cumulative abundances, depicted in Fig. 8 for families and species, reveal close resemblances in floral spectra among preservation methods in both countries. This finding is further supported by the parameter H′, in which diversity is expressed by combining richness and evenness (Fig. 9). When computed from family data, the median H′ ranged from 1.76 (0.11), for SG, to 1.82 (0.14), for RT, in Austria and from 2.10 (0.08), for EtOH, to 2.17 (0.13), for RT, in Denmark (Online Resource 9). A similar pattern was found at the species level, with median H′ values ranging from 1.89 (0.10), for SG, to 1.91 (0.10), for RT, in Austria and from 2.10 (0.08), for EtOH, to 2.17 (0.13), for RT, in Denmark (Online Resource 10). However, H′ differences among preservation methods were found to be non-significant for both Austria (P = 0.471 for families and species, Kruskal-Wallis test) and Denmark (P = 0.470 for families and species), suggesting that bee-collected fresh pollen can be adequately preserved for ITS2 metabarcoding applications by any of the methods tested herein.
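As a reminder of how H′ combines richness and evenness, the sketch below computes the Shannon index from read counts per taxon as H′ = −Σ p_i ln p_i. It is an illustration rather than the pipeline used in this study; the natural logarithm is an assumption (the base is not stated here), and the function name and read counts are hypothetical.

```python
import math


def shannon_index(read_counts):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over taxa with non-zero reads."""
    total = sum(read_counts)
    if total <= 0:
        raise ValueError("at least one taxon must have reads")
    proportions = [count / total for count in read_counts if count > 0]
    return -sum(p * math.log(p) for p in proportions)


# Hypothetical read counts per plant family for two samples with the same
# richness (four families) but different evenness (not data from this study).
even_sample = [250, 250, 250, 250]
uneven_sample = [850, 100, 40, 10]

for label, counts in (("even", even_sample), ("uneven", uneven_sample)):
    print(f"{label}: H' = {shannon_index(counts):.2f}")
```

With equal richness, the more even sample yields the higher H′, and H′ reaches its maximum of ln(S) when all S taxa are equally abundant.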
Ultra-low temperature freezing (e.g. −80 °C) is the gold standard for high-quality tissue archiving of a wide range of samples for use in molecular genetic analyses (Nagy, 2010; Prendini et al., 2002). While long-term tissue preservation at higher temperatures can be compromised by enzymatic activity and DNA degradation, ultra-low temperature freezers are usually not available at citizen scientists' premises and storing at −20 °C in a household freezer becomes the only alternative. However, here this method was revealed to be effective for pollen, at least for medium-term preservation, and most studies on pollen metabarcoding have typically used material stored at −20 °C (Bell et al., 2017; Danner et al., 2017; Smart et al., 2016). The problem arises when sample size becomes large, as large numbers of samples will take too much storage space and will involve electricity expenses for the citizen scientist. Besides, to avoid a cycle of freezing-thawing-freezing, which can be harmful to sample integrity, transportation to the analytical laboratory requires a cold chain, further increasing sample handling costs.
Fig. 8 Cumulative abundances across the four preservation methods. Relative abundances (%) were calculated for (a) families and (b) species from classifying sequence reads obtained by ITS2 metabarcoding. Relative abundances were calculated for pollen samples collected in Austria (left) and Denmark (right) and preserved in ethanol (EtOH; Austria N = 11, Denmark N = 10), frozen at −20 °C (FRZ; Austria N = 11, Denmark N = 10), at room temperature (RT; Austria N = 12, Denmark N = 10) and with 12 g of silica gel (SG; Austria N = 13, Denmark N = 10) until metabarcoding analysis
Fig. 9 Boxplot for Shannon index (H′) estimated for (a) families and (b) species from the 87 pollen samples collected in Austria and Denmark and preserved in ethanol (EtOH; Austria N = 11, Denmark N = 10), frozen at −20 °C (FRZ; Austria N = 11, Denmark N = 10), at room temperature (RT; Austria N = 12, Denmark N = 10) and with 12 g of silica gel (SG; Austria N = 13, Denmark N = 10) until metabarcoding analysis
Ethanol is one of the most popular methods for storing animal tissue, allowing long-term preservation especially when samples are placed at −20 °C (Nagy, 2010). Although EtOH has not been commonly used for archiving plant tissues such as leaves, shoots and seeds (reviewed by Bressan et al., 2014; Murray & Pitas, 1996; Nagy, 2010; Prendini et al., 2002), here it is revealed to be adequate for preserving pollen for downstream metabarcoding analysis. However, this method might not be affordable in citizen science projects because of costs of obtaining ethanol or difficulty for a layman to obtain it, and, more importantly, because (national or international) transportation of flammable or hazardous fluids is strictly regulated, further increasing shipping costs and burden of paperwork requirements.
Of the four methods tested here, desiccation at RT has no costs and does not require any material or equipment. However, while this study indicates that pollen stored at RT provides quality data comparable to the other preservation methods, there is a risk of sample contamination that cannot be overlooked. The finding of a Lepidoptera larva feeding on an RT sample is a living example of such risk. Moreover, environmental temperature and humidity may vary across seasons and regions, influencing the efficiency and speed of the desiccation process, which may compromise sample stability.
Finally, SG offers the best solution for pollen desiccation and storage, emerging as the most promising method for pollen preservation in citizen science studies. When dispensed in adequate silica-to-sample ratios (here 12 g:5 g), it provides effective drying required for archival storage (Alsos et al., 2020; Chase & Hills, 1991), although silica beads should be monitored regularly for dryness when samples are kept at room temperature. The use of commercially available 6-g sachets makes use of SG straightforward for citizen scientists. Besides, SG is inexpensive and shipping to the analytical laboratory is less costly than with the freezing and ethanol methods because dry material is lighter and no special regulations or conditions are applicable.

Conclusion
While pollen metabarcoding studies have typically worked with samples preserved at −20 °C, this is the first study to examine how different preservation methods affect molecular identification of mixed pollen samples by high-throughput sequencing. Overall, the results obtained in this experiment suggest that the methods involving desiccation, which are cheaper than use of ethanol and freezing, can be used by the citizen scientist for medium-term pollen storage for downstream applications involving DNA metabarcoding. Given that relative humidity at room temperature may vary temporally and geographically, and one sample dried at room temperature in this study was lost due to moth infestation, we recommend using silica gel for preserving bee-collected fresh mixed pollen samples. The method is also straightforward for laymen to use in practice, and therefore it is a robust option for widespread use in citizen science studies involving collection of pollen.