Title: Geographical partition for distributed web crawling
Author: Exposto, José
Macedo, Joaquim
Pina, António
Alves, Albano
Rufino, José
Keywords: Web mining
Parallel crawling
Web partitioning
Issue Date: 2005
Publisher: ACM
Citation: Exposto, José; Macedo, Joaquim; Pina, António; Alves, Albano; Rufino, José (2005) - Geographical partition for distributed web crawling. In International Conference on Information and Knowledge Management. Bremen, Germany. ISBN 1-59593-140-6
Abstract: This paper evaluates scalable distributed crawling by means of a geographical partition of the Web. The approach relies on multiple distributed crawlers, each responsible for the pages belonging to one or more previously identified geographical zones. The work considers a distributed crawler in which pages are assigned to crawlers according to the geographical scope of their content. For the initial assignment of a page to a partition, we use a simple heuristic that places the page in the same scope as the geographical location of its hosting web server. During download, if the analysis of a page's contents recommends a different geographical scope, the page is forwarded to the appropriately located server. A sample of Portuguese Web pages, extracted during 2005, was used to evaluate: a) page download communication times and b) the overhead of page exchange among servers. The evaluation results allow our approach to be compared with conventional hash partitioning strategies.
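The assignment scheme described in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the zone names, the HOST_ZONE lookup table (standing in for a real geolocation of the hosting server), the content_zone argument (standing in for whatever content analysis the crawler performs), and the particular hash baseline (MD5 of the host name modulo the number of zones).

```python
import hashlib
from typing import Optional, Tuple
from urllib.parse import urlsplit

# Hypothetical zones and host-to-zone table. A real crawler would
# geolocate the hosting server instead of using a fixed dictionary.
ZONES = ["north", "south"]
HOST_ZONE = {
    "www.example-porto.pt": "north",
    "www.example-lisboa.pt": "south",
}


def host_of(url: str) -> str:
    """Return the host name of a URL, or an empty string if absent."""
    return urlsplit(url).hostname or ""


def initial_zone(url: str, default: str = "north") -> str:
    """Initial heuristic: a page gets the zone of its hosting server."""
    return HOST_ZONE.get(host_of(url), default)


def zone_after_analysis(current: str, content_zone: Optional[str]) -> Tuple[str, bool]:
    """Return (zone, forwarded).

    If content analysis suggests a different zone, the page is
    forwarded to the crawler responsible for that zone.
    """
    if content_zone is not None and content_zone != current:
        return content_zone, True
    return current, False


def hash_zone(url: str) -> str:
    """Baseline for comparison: assign by a hash of the host name."""
    digest = hashlib.md5(host_of(url).encode("utf-8")).digest()
    return ZONES[int.from_bytes(digest[:4], "big") % len(ZONES)]


if __name__ == "__main__":
    url = "http://www.example-porto.pt/turismo/algarve.html"
    zone = initial_zone(url)
    # Assume content analysis decided the page is about the south.
    print(zone, zone_after_analysis(zone, "south"), hash_zone(url))
```

In this sketch a page is exchanged between crawlers only when the content-based scope disagrees with the server-based scope, which corresponds to the exchange overhead the abstract says was measured. The hash baseline ignores geography entirely.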
Peer review: yes
Appears in Collections: IC - Articles in Proceedings Not Indexed in ISI/Scopus

Files in This Item:
File: GIR2005-exp.pdf
Size: 180.21 kB
Format: Adobe PDF
