Network Size Estimation In A Peer-to-Peer Network

Date of Submission: 
September 15, 2005
Report Number: 
05-030
Report PDF: 
Abstract: 
The emergence of peer-to-peer networks over the last decade has changed user's perspective about information available on the Web. But, with thousand of nodes joining and leaving a peer-to-peer network within a short span of time, it has become practically impossible for a node (or peer) to keep track of complete network. Often times, however, a node needs to at least have an estimate of number of nodes in such a network. For example, in determining time-to-live for a search query packet, a node must have a good estimate of network size. Previous deterministic approaches require a complete walk on the network, since such networks usually lack a central authority. Such approaches hence do not scale well to large networks. A few approaches, which collect partial information about the network, have been proposed as an alternative to address the scalability issues. This paper presents a novel approach for size estimation of a peer-to-peer network. The basic idea is to sample nodes in the network and then using this partial information about the network, an estimate of the network size is obtained using capture-recapture method. The capture-recapture method is a statistical method, which has been widely used for estimation of size of a closed population in oceanography and epidemiology. For a better estimate, the capture-recapture method requires two or more random (independent) samplings (sets of detected nodes) of the network. In our case, for independent sampling, we use random walks on the peer-to-peer network, since a random walk can achieve same statistical properties as independent samplings for a peer-to-peer network (see Gkantsidis et al [1]). Experimental results show that the proposed random walk based capture-recapture approach gives a good estimate of network size. In addition, results of using proposed method as well as three other size estimation methods on scale-free and random networks shows that the former algorithm gives a better estimate (lesser error) with a slight overhead on computation. This research motivates further study of estimation techniques for open networks (i.e. networks whose size changes during the estimation process).