Publication
Mining GitHub software repositories to look for programming language cocktails
| datacite.subject.fos | Engineering and Technology::Electrical Engineering, Electronic Engineering, Information Engineering | |
| datacite.subject.fos | Humanities::Languages and Literature | |
| datacite.subject.sdg | 04:Quality Education | |
| datacite.subject.sdg | 09:Industry, Innovation and Infrastructure | |
| dc.contributor.author | Loureiro, João | |
| dc.contributor.author | Costa Neto, Alvaro | |
| dc.contributor.author | Pereira, Maria João | |
| dc.contributor.author | Henriques, Pedro Rangel | |
| dc.date.accessioned | 2026-03-18T15:04:12Z | |
| dc.date.available | 2026-03-18T15:04:12Z | |
| dc.date.issued | 2025 | |
| dc.description.abstract | In light of specific development needs, it is common to concurrently apply different technologies to build complex applications. Given that lowering risks, costs, and other negative factors, while improving their positive counterparts, is paramount to a better development environment, it becomes relevant to find out which technologies work best for each intended purpose in a project. To reach these findings, it is necessary to analyse the technologies applied in these projects and how they interconnect and relate to each other. The theory behind Programming Cocktails (meaning the set of programming technologies - Ingredients - that are used to develop complex systems) can support these analyses. However, due to the sheer amount of data required to construct and analyse these Cocktails, obtaining them manually is unsustainable. From the desire to accelerate this process comes the need for a tool that automates the data collection and its conversion into an appropriate format for analysis. As such, the project proposed in this paper revolves around the development of a web-scraping application that can generate Cocktail Identity Cards (CICs) from source code repositories hosted on GitHub. These CICs contain the Ingredients (programming languages, libraries and frameworks) used in the corresponding GitHub repository and follow the ontology previously established in a larger research project to model each Programming Cocktail. This paper presents a survey of current Source Version Control Systems (SVCSs) and web-scraping technologies, an overview of Programming Cocktails and their current foundations, and the design of a tool that can automate the gathering of CICs from GitHub repositories. | eng |
| dc.description.sponsorship | This work has been supported by FCT – Fundação para a Ciência e a Tecnologia within the R&D Units Project Scope: UID/00319/2023. The work of Maria João and Alvaro was supported by national funds: UID/05757 - Research Centre in Digitalization and Intelligent Robotics (CeDRI); and SusTEC, LA/P/0007/2020 (DOI: 10.54499/LA/P/0007/2020). | |
| dc.identifier.citation | Loureiro, João; Costa Neto, Alvaro; Pereira, Maria João; Henriques, Pedro Rangel (2025). Mining GitHub Software Repositories to Look for Programming Language Cocktails. In 14th Symposium on Languages, Applications and Technologies, SLATE 2025. 135:13, p. 1-16. ISBN 978-395977387-4. DOI: 10.4230/2025.13 | |
| dc.identifier.doi | 10.4230/2025.13 | |
| dc.identifier.isbn | 978-395977387-4 | |
| dc.identifier.issn | 2190-6807 | |
| dc.identifier.uri | http://hdl.handle.net/10198/36135 | |
| dc.language.iso | eng | |
| dc.peerreviewed | yes | |
| dc.publisher | Schloss Dagstuhl - Leibniz-Zentrum für Informatik GmbH, Dagstuhl Publishing | |
| dc.relation | Research Centre in Digitalization and Intelligent Robotics | |
| dc.relation | Associate Laboratory for Sustainability and Technology in Mountain Regions | |
| dc.rights.uri | http://creativecommons.org/licenses/by-sa/4.0/ | |
| dc.subject | Software repository mining | |
| dc.subject | Source version control | |
| dc.subject | GitHub scraping | |
| dc.subject | Programming cocktails | |
| dc.title | Mining GitHub software repositories to look for programming language cocktails | eng |
| dc.type | conference paper | |
| dspace.entity.type | Publication | |
| oaire.awardNumber | UIDP/05757/2020 | |
| oaire.awardNumber | LA/P/0007/2020 | |
| oaire.awardTitle | Research Centre in Digitalization and Intelligent Robotics | |
| oaire.awardTitle | Associate Laboratory for Sustainability and Technology in Mountain Regions | |
| oaire.awardURI | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/UIDP%2F05757%2F2020/PT | |
| oaire.awardURI | info:eu-repo/grantAgreement/FCT/6817 - DCRRNI ID/LA%2FP%2F0007%2F2020/PT | |
| oaire.citation.conferencePlace | Faro, Portugal | |
| oaire.citation.endPage | 16 | |
| oaire.citation.issue | 13 | |
| oaire.citation.startPage | 1 | |
| oaire.citation.title | 14th Symposium on Languages, Applications and Technologies, SLATE 2025 | |
| oaire.citation.volume | 135 | |
| oaire.fundingStream | 6817 - DCRRNI ID | |
| oaire.fundingStream | 6817 - DCRRNI ID | |
| oaire.version | http://purl.org/coar/version/c_970fb48d4fbd8a85 | |
| person.familyName | Pereira | |
| person.givenName | Maria João | |
| person.identifier.ciencia-id | C912-4A49-A3B3 | |
| person.identifier.orcid | 0000-0001-6323-0071 | |
| person.identifier.rid | G-5999-2011 | |
| person.identifier.scopus-author-id | 13907870300 | |
| project.funder.identifier | http://doi.org/10.13039/501100001871 | |
| project.funder.identifier | http://doi.org/10.13039/501100001871 | |
| project.funder.name | Fundação para a Ciência e a Tecnologia | |
| project.funder.name | Fundação para a Ciência e a Tecnologia | |
| relation.isAuthorOfPublication | a20ccfa6-4e84-4c25-ab0d-8d6ba196ffc2 | |
| relation.isAuthorOfPublication.latestForDiscovery | a20ccfa6-4e84-4c25-ab0d-8d6ba196ffc2 | |
| relation.isProjectOfPublication | d0a17270-80a8-4985-9644-a04c2a9f2dff | |
| relation.isProjectOfPublication | 6255046e-bc79-4b82-8884-8b52074b4384 | |
| relation.isProjectOfPublication.latestForDiscovery | d0a17270-80a8-4985-9644-a04c2a9f2dff |
