Publication
Semi-automated sequence curation for reliable reference datasets in ITS2 vascular plant DNA (meta-)barcoding
| dc.contributor.author | Quaresma, Andreia | |
| dc.contributor.author | Ankenbrand, Markus J. | |
| dc.contributor.author | Garcia, Carlos Ariel Yadró | |
| dc.contributor.author | Rufino, José | |
| dc.contributor.author | Honrado, Mónica | |
| dc.contributor.author | Amaral, Joana S. | |
| dc.contributor.author | Brodschneider, Robert | |
| dc.contributor.author | Brusbardis, Valters | |
| dc.contributor.author | Gratzer, Kristina | |
| dc.contributor.author | Hatjina, Fani | |
| dc.contributor.author | Kilpinen, Ole | |
| dc.contributor.author | Pietropaoli, Marco | |
| dc.contributor.author | Roessink, Ivo | |
| dc.contributor.author | Steen, Jozef van der | |
| dc.contributor.author | Vejsnæs, Flemming | |
| dc.contributor.author | Pinto, M. Alice | |
| dc.contributor.author | Keller, Alexander | |
| dc.date.accessioned | 2024-05-03T13:16:53Z | |
| dc.date.available | 2024-05-03T13:16:53Z | |
| dc.date.issued | 2024 | |
| dc.description.abstract | One of the most critical steps for accurate taxonomic identification in DNA (meta-)barcoding is to have an accurate DNA reference sequence dataset for the marker of choice. Therefore, developing such a dataset has been a long-term ambition, especially in the Viridiplantae kingdom. Typically, reference datasets are constructed with sequences downloaded from general public databases, which can carry taxonomic and other relevant errors. Herein, we constructed a curated (i) global dataset, (ii) European crop dataset, and (iii) 27 datasets for the EU countries for the ITS2 barcoding marker of vascular plants. To that end, we first developed a pipeline script that entails (i) an automated curation stage comprising five filters, (ii) manual taxonomic correction for misclassified taxa, and (iii) manual addition of newly sequenced species. The pipeline allows easy updating of the curated datasets. With this approach, 13% of the sequences, corresponding to 7% of species originally imported from GenBank, were discarded. Further, 259 sequences were manually added to the curated global dataset, which now comprises 307,977 sequences of 111,382 plant species. | pt_PT |
| dc.description.sponsorship | AQ acknowledges the PhD scholarship (2020.05155.BD), funded by the Portuguese Foundation for Science and Technology (FCT). This work was developed in the framework of INSIGNIA – Environmental monitoring of pesticide use through honeybees (SANTE/E4/SI2.788418-SI2.788452-INSIGINIA-PP-1-1-2018) and INSIGNIA-EU - Preparatory action for monitoring of environmental pollution using honey bees (Procurement procedure ENV/2021/OP/0014 of 28-09-2021). FCT provided financial support by national funds (FCT/MCTES) to CIMO (UIDB/00690/2020 and UIDP/00690/2020) and SusTEC (LA/P/0007/2021). | pt_PT |
| dc.description.version | info:eu-repo/semantics/publishedVersion | pt_PT |
| dc.identifier.citation | Quaresma, Andreia; Ankenbrand, Markus J.; Garcia, Carlos Ariel Yadró; Rufino, José; Honrado, Mónica; Amaral, Joana S.; Brodschneider, Robert; Brusbardis, Valters; Gratzer, Kristina; Hatjina, Fani; Kilpinen, Ole; Pietropaoli, Marco; Roessink, Ivo; Steen, Jozef van der; Vejsnæs, Flemming; Pinto, M. Alice; Keller, Alexander (2024). Semi-automated sequence curation for reliable reference datasets in ITS2 vascular plant DNA (meta-)barcoding. Scientific Data. EISSN 2052-4463. 11:1, p. 1-11 | pt_PT |
| dc.identifier.doi | 10.1038/s41597-024-02962-5 | pt_PT |
| dc.identifier.eissn | 2052-4463 | |
| dc.identifier.uri | http://hdl.handle.net/10198/29711 | |
| dc.language.iso | eng | pt_PT |
| dc.peerreviewed | yes | pt_PT |
| dc.publisher | Nature Portfolio | pt_PT |
| dc.relation | LA/P/0007/2021 | pt_PT |
| dc.relation | DNA metabarcoding of pollen mixtures for environmental monitoring: qualitative and quantitative robustness based on mock mixtures and honeybee-collected samples from across Europe | |
| dc.relation | Mountain Research Center | |
| dc.relation | Mountain Research Center | |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | pt_PT |
| dc.subject | Internal transcribed spacer | pt_PT |
| dc.subject | Barcode | pt_PT |
| dc.subject | Biodiversity | pt_PT |
| dc.title | Semi-automated sequence curation for reliable reference datasets in ITS2 vascular plant DNA (meta-)barcoding | pt_PT |
| dc.type | journal article | |
| dspace.entity.type | Publication | |
| oaire.awardTitle | DNA metabarcoding of pollen mixtures for environmental monitoring: qualitative and quantitative robustness based on mock mixtures and honeybee-collected samples from across Europe | |
| oaire.awardTitle | Mountain Research Center | |
| oaire.awardTitle | Mountain Research Center | |
| oaire.awardURI | info:eu-repo/grantAgreement/FCT//2020.05155.BD/PT | |
| oaire.awardURI | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDB%2F00690%2F2020/PT | |
| oaire.awardURI | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F00690%2F2020/PT | |
| oaire.citation.endPage | 11 | pt_PT |
| oaire.citation.issue | 1 | pt_PT |
| oaire.citation.startPage | 1 | pt_PT |
| oaire.citation.title | Scientific Data | pt_PT |
| oaire.citation.volume | 11 | pt_PT |
| oaire.fundingStream | 6817 - DCRRNI ID | |
| oaire.fundingStream | 6817 - DCRRNI ID | |
| person.familyName | Quaresma | |
| person.familyName | Rufino | |
| person.familyName | Honrado | |
| person.familyName | Amaral | |
| person.familyName | Pinto | |
| person.givenName | Andreia | |
| person.givenName | José | |
| person.givenName | Mónica | |
| person.givenName | Joana S. | |
| person.givenName | M. Alice | |
| person.identifier.ciencia-id | 4F1A-4E4A-3F23 | |
| person.identifier.ciencia-id | C414-F47F-6323 | |
| person.identifier.ciencia-id | 4712-B40B-4B0E | |
| person.identifier.ciencia-id | 5319-7DE8-BEDA | |
| person.identifier.ciencia-id | F814-A1D0-8318 | |
| person.identifier.orcid | 0000-0002-8678-5800 | |
| person.identifier.orcid | 0000-0002-1344-8264 | |
| person.identifier.orcid | 0000-0002-5126-4693 | |
| person.identifier.orcid | 0000-0002-3648-7303 | |
| person.identifier.orcid | 0000-0001-9663-8399 | |
| person.identifier.scopus-author-id | 57119742600 | |
| person.identifier.scopus-author-id | 55947199100 | |
| person.identifier.scopus-author-id | 8085507800 | |
| project.funder.identifier | http://doi.org/10.13039/501100001871 | |
| project.funder.identifier | http://doi.org/10.13039/501100001871 | |
| project.funder.identifier | http://doi.org/10.13039/501100001871 | |
| project.funder.name | Fundação para a Ciência e a Tecnologia | |
| project.funder.name | Fundação para a Ciência e a Tecnologia | |
| project.funder.name | Fundação para a Ciência e a Tecnologia | |
| rcaap.rights | openAccess | pt_PT |
| rcaap.type | article | pt_PT |
| relation.isAuthorOfPublication | d417b0ac-c8ee-473a-a355-820b5b9a3f55 | |
| relation.isAuthorOfPublication | 1e24d2ce-a354-442a-bef8-eebadd94b385 | |
| relation.isAuthorOfPublication | 87f8840d-04b1-427a-bca9-d37eadfc0e9b | |
| relation.isAuthorOfPublication | 42be2cf4-adc4-4e7f-ac60-7aab515b38cd | |
| relation.isAuthorOfPublication | 0667fe04-7078-483d-9198-56d167b19bc5 | |
| relation.isAuthorOfPublication.latestForDiscovery | d417b0ac-c8ee-473a-a355-820b5b9a3f55 | |
| relation.isProjectOfPublication | e0a6e4aa-533f-4118-baeb-96fc5e870ed8 | |
| relation.isProjectOfPublication | 29718e93-4989-42bb-bcbc-4daff3870b25 | |
| relation.isProjectOfPublication | 0aac8939-28c2-46f4-ab6b-439dba7f9942 | |
| relation.isProjectOfPublication.latestForDiscovery | 29718e93-4989-42bb-bcbc-4daff3870b25 |
