Publication

Efficient partitioning strategies for distributed web crawling

dc.contributor.author Exposto, José
dc.contributor.author Macedo, Joaquim
dc.contributor.author Pina, António
dc.contributor.author Alves, Albano
dc.contributor.author Rufino, José
dc.date.accessioned 2011-05-19T13:08:25Z
dc.date.available 2011-05-19T13:08:25Z
dc.date.issued 2008
dc.description.abstract This paper presents a multi-objective approach to Web space partitioning, aimed at improving distributed crawling efficiency. The investigation is supported by the construction of two different weighted graphs. The first models the topological communication infrastructure between crawlers and Web servers, and the second represents the number of link connections between servers’ pages. The values of the graph edges represent, respectively, computed RTTs and page links between nodes. The two graphs are further combined, using a multi-objective partitioning algorithm, to support Web space partitioning and load allocation for an adaptable number of geographically distributed crawlers. Partitioning strategies were evaluated by varying the number of partitions (crawlers) to obtain merit figures for: i) download time, ii) exchange time and iii) relocation time. Evaluation has shown that our partitioning schemes outperform traditional hostname-hash-based counterparts in all evaluated metrics, achieving on average an 18% reduction in download time, a 78% reduction in exchange time and a 46% reduction in relocation time. por
dc.identifier.citation Exposto, José; Macedo, Joaquim; Pina, António; Alves, Albano; Rufino, José (2008). Efficient partitioning strategies for distributed web crawling. Lecture Notes in Computer Science. 5200, p. 544-553. por
dc.identifier.uri http://hdl.handle.net/10198/4406
dc.language.iso eng por
dc.peerreviewed yes por
dc.title Efficient partitioning strategies for distributed web crawling por
dc.type conference object
dspace.entity.type Publication
oaire.citation.endPage 553 por
oaire.citation.issue Volume 5200/2008 por
oaire.citation.startPage 544 por
oaire.citation.title Lecture Notes in Computer Science por
person.familyName Exposto
person.familyName Alves
person.familyName Rufino
person.givenName José
person.givenName Albano
person.givenName José
person.identifier.ciencia-id DA10-808F-99EA
person.identifier.ciencia-id 281A-DD4A-2605
person.identifier.ciencia-id C414-F47F-6323
person.identifier.orcid 0000-0003-3857-6083
person.identifier.orcid 0000-0001-9796-6810
person.identifier.orcid 0000-0002-1344-8264
person.identifier.scopus-author-id 56619498700
person.identifier.scopus-author-id 55947199100
rcaap.rights openAccess por
rcaap.type conferenceObject por
relation.isAuthorOfPublication 66fd8128-90b1-4754-936e-2d9e9e0829ec
relation.isAuthorOfPublication 80d7f985-d700-4911-8974-b2678816db35
relation.isAuthorOfPublication 1e24d2ce-a354-442a-bef8-eebadd94b385
relation.isAuthorOfPublication.latestForDiscovery 1e24d2ce-a354-442a-bef8-eebadd94b385
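
The abstract contrasts hostname-hash partitioning with a partitioning that weighs crawler-to-server RTTs against inter-server page links. The sketch below is only an illustration of that contrast, not the paper's algorithm: the greedy heuristic, the `alpha` weight, the host names and all numbers are assumptions made up for this example, and load balancing and relocation time are left out.

```python
import hashlib

def hash_partition(hosts, k):
    """Baseline: assign each hostname to one of k crawlers by hashing its name."""
    return {h: int(hashlib.md5(h.encode()).hexdigest(), 16) % k for h in hosts}

def download_cost(assign, rtt):
    """Sum of RTTs between each server and the crawler it is assigned to."""
    return sum(rtt[h][c] for h, c in assign.items())

def exchange_cost(assign, links):
    """Total weight of page links whose endpoints fall in different partitions."""
    return sum(w for (a, b), w in links.items() if assign[a] != assign[b])

def greedy_partition(hosts, k, rtt, links, alpha=0.5):
    """Toy scalarised heuristic (an assumption, not the paper's method):
    place each host on the crawler minimising a weighted sum of its RTT
    and the links it would cut to hosts that are already placed."""
    assign = {}
    for h in hosts:
        def cost(c):
            cut = sum(w for (a, b), w in links.items()
                      if (a == h and b in assign and assign[b] != c)
                      or (b == h and a in assign and assign[a] != c))
            return alpha * rtt[h][c] + (1 - alpha) * cut
        assign[h] = min(range(k), key=cost)
    return assign

# Hypothetical data: rtt[host][crawler] in ms, links[(host, host)] = link count.
hosts = ["a.example", "b.example", "c.example", "d.example"]
rtt = {"a.example": [10, 80], "b.example": [12, 90],
       "c.example": [85, 15], "d.example": [70, 20]}
links = {("a.example", "b.example"): 30,
         ("c.example", "d.example"): 25,
         ("b.example", "c.example"): 2}

baseline = hash_partition(hosts, 2)
combined = greedy_partition(hosts, 2, rtt, links)
for name, part in (("hash", baseline), ("combined", combined)):
    print(name, download_cost(part, rtt), exchange_cost(part, links))
```

On this toy input the combined heuristic keeps tightly linked hosts together on their nearest crawler; the hash baseline ignores both graphs, so its costs depend only on how the names happen to hash.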

Files

Original bundle
Name: LNCS2008.pdf
Size: 250.89 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: