Use this identifier to cite or link to this record: http://hdl.handle.net/10198/4406
Title: Efficient partitioning strategies for distributed web crawling
Authors: Exposto, José
Macedo, Joaquim
Pina, António
Alves, Albano
Rufino, José
Date: 2008
Citation: Exposto, José; Macedo, Joaquim; Pina, António; Alves, Albano; Rufino, José (2008) - Efficient partitioning strategies for distributed web crawling. Lecture Notes in Computer Science. 5200, p.544-553.
Abstract: This paper presents a multi-objective approach to Web space partitioning, aimed at improving distributed crawling efficiency. The investigation is supported by the construction of two different weighted graphs. The first models the topological communication infrastructure between crawlers and Web servers; the second represents the number of link connections between the servers' pages. The edge values of the two graphs represent, respectively, computed RTTs and page links between nodes. The two graphs are then combined, using a multi-objective partitioning algorithm, to support Web space partitioning and load allocation for an adaptable number of geographically distributed crawlers. Partitioning strategies were evaluated by varying the number of partitions (crawlers) to obtain figures of merit for: i) download time, ii) exchange time and iii) relocation time. The evaluation showed that our partitioning schemes outperform traditional hostname-hash-based counterparts in all evaluated metrics, achieving average reductions of 18% in download time, 78% in exchange time and 46% in relocation time.
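The abstract contrasts the proposed schemes with traditional hostname-hash partitioning. The minimal Python sketch below is not the authors' implementation; it only illustrates that baseline together with two simple cost proxies over the two weighted graphs: the sum of crawler-to-server RTTs for the assigned servers (a stand-in for download cost) and the number of inter-server page links cut by the partition (a stand-in for exchange cost). The function names, the dictionary-based graph encoding and the use of MD5 as the hash are assumptions for illustration; the paper's actual metrics and its multi-objective partitioning algorithm are not reproduced here.

```python
import hashlib

# Hypothetical graph encodings (assumptions, not the paper's data format):
#   rtt[(crawler, host)] -> measured round-trip time (crawler/server graph)
#   links[(src, dst)]    -> number of page links from src host to dst host


def hash_partition(hosts, k):
    """Baseline: assign each hostname to one of k crawlers by hashing it.

    MD5 is used only because it is stable across runs; Python's built-in
    hash() for str is salted per process."""
    assignment = {}
    for host in hosts:
        digest = hashlib.md5(host.encode("utf-8")).digest()
        assignment[host] = int.from_bytes(digest[:8], "big") % k
    return assignment


def download_cost(assignment, rtt):
    """Proxy for download cost: sum of RTTs between each server and its crawler."""
    return sum(rtt[(crawler, host)] for host, crawler in assignment.items())


def exchange_cost(assignment, links):
    """Proxy for exchange cost: links whose endpoints fall in different
    partitions, i.e. URLs one crawler would have to forward to another."""
    return sum(
        n for (src, dst), n in links.items() if assignment[src] != assignment[dst]
    )


if __name__ == "__main__":
    hosts = ["a.pt", "b.pt", "c.pt"]
    links = {("a.pt", "b.pt"): 5, ("a.pt", "c.pt"): 2, ("c.pt", "b.pt"): 1}
    # Toy RTTs: crawler 1 is uniformly twice as far as crawler 0.
    rtt = {(c, h): 10.0 * (c + 1) for c in range(2) for h in hosts}
    baseline = hash_partition(hosts, 2)
    manual = {"a.pt": 0, "b.pt": 0, "c.pt": 1}  # keeps the heavy a->b link local
    for name, part in (("hash", baseline), ("manual", manual)):
        print(name, download_cost(part, rtt), exchange_cost(part, links))
```

The hash baseline ignores both graphs, so nothing prevents it from separating heavily linked servers or assigning a server to a distant crawler; a multi-objective partitioner would instead search for assignments that keep both costs low while balancing load across crawlers.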
Peer review: yes
URI: http://hdl.handle.net/10198/4406
Appears in collections: ESTiG - Publications in Proceedings Indexed in WoS/Scopus

Files in this record:
LNCS2008.pdf (250.89 kB, Adobe PDF)

All items in the repository are protected by copyright, with all rights reserved.