Foundations — On Peer-to-peer — Content Distribution, Acyclic Preference Networks

4.2 Foundations

The purpose of this section is to present the field of preference-based networks, in broad strokes, paved and marked out after three years of exploration: the notations and definitions, as they have stabilized over time (Section 4.2.1); the grand theorem of acyclic preferences, which despite its simplicity is the backbone of the model (Section 4.2.2); and finally, the taxonomy of acyclic preferences, which shows how to relate the model to real-world situations (Section 4.2.3).

4.2.1 Notations and definitions

As mentioned in the introduction, preference-based networks originate from the theory of -matchings [20, 24], and the notation reflects this, even though a specific sub-vocabulary has been introduced over the course of various works [36, 37, 50, 55, 56]. Regarding the French version in particular, the often colorful vocabulary owes much to Dominique Dumont’s popular science article Mariages Stables (Stable Marriages) [28].

Thus, an instance of a preference-based network consists of a set of peers, an acceptability graph, preferences, and quotas. A configuration is a set of interactions, whose dynamics are described by peer initiatives.

Preference-based network

The preference-based network itself thus consists of a set of peers (or nodes) and a graph . Preferences are indicated by a value matrix and quotas by a vector .

The acceptability graph is an undirected and non-reflexive graph. It describes compatibilities: two peers and can interact if, and only if, (iff) . We then say that is acceptable for , and vice versa. For example, can represent knowledge of other peers (it is rare for a peer to know all other peers in a network; its view is generally limited to a certain number of neighbors), or the existence of a common interest (searching for the same file, belonging to the same group, …), or simply an imposed overlay structure. The graph can in principle be arbitrary; however, we will often use in this chapter Erdős-Rényi graphs , where each edge exists with probability independently of the others (the average degree is thus ). This choice is justified by the facilities that graphs offer for the theoretical analysis of preference-based network properties.

The value matrix indicates the interest that peers have in one another. It thus determines preferences. For every acceptable pair of peers , is the value that assigns to . Unless otherwise stated, lower values are preferred¹. Thus, means that prefers to . An assumption made throughout this chapter is that there are never ties, that is, for each row of , the acceptable values are pairwise distinct². The matrix may possess certain special properties that will be described in Section 4.2.3. I also note that while it is always possible, and convenient, to assume the existence of a value matrix , this is not essential from a theoretical standpoint, since only the orders (preferences) induced by matter [10, 36]. It is not necessary for the peers to know . In the worst case, a peer may not even know the values in its own row. However, in order to preserve the meaning of the value matrix concept, we assume that it is always possible for a peer to compare the values of two neighbors by contacting them, for example during an initiative (see below).

Finally, the quota vector, , limits the number of collaborations: a peer cannot collaborate with more than neighbors simultaneously. One can of course always assume that is no larger than the degree of in . Just as is often assumed to be Erdős-Rényi, is generally assumed to be constant when analyzing a network, even though any distribution is allowed in principle.

Configurations

The configuration of a system describes the state of collaborations. Formally, a configuration is a subset of . The neighbors of in are the current partners of . If denotes the number of partners of in , the quotas imply . If , we say that is undermatched.

A configuration can evolve through the resolution of adulterous edges (the term is a tribute to the original theory of stable marriages). An acceptable edge is adulterous for a configuration if it does not belong to and if each peer of the edge has an interest in establishing the collaboration , even if this means abandoning a collaboration in to satisfy the quotas. Formally, an edge is adulterous iff:

( is undermatched), or ( prefers to one of its partners);
or (symmetric condition).

A peer that is adjacent to at least one adulterous edge is eligible (it is likely to modify the configuration). A configuration that has no adulterous edge (or equivalently no eligible peer) is stable.

Initiatives

Adulterous edges are at the basis of configuration dynamics. More precisely, evolution always comes from an eligible peer that tries to resolve an adulterous edge to which it belongs. This process, assumed to be atomic, is called an initiative. An initiative is active if it results in the resolution of an adulterous edge.

Formally, the initiative is a selection “function” (which may depend on time, the system state, be random, …) that assigns to each peer in a neighbor to “try”, or more formally an edge of incident to . While by default, all such edges may be considered by this selection function, in the case where peers have knowledge of the adulterous edges to which they belong, it is possible to restrict the selection to these edges only.

Examples of initiatives include best partner selection (among adulterous edges), decremental selection (round-robin choice among the list of neighbors), or simply random selection. Note that the different types of selection implicitly require more or less knowledge: for instance, best partner and decremental require the ability to sort one’s neighbors (and even to identify adulterous edges for the former), whereas a purely random selection only requires the ability to evaluate a neighbor on the fly. Finally, it is always possible to hybridize several selections to obtain a more efficient one. An example of a hybrid selection will be described in Section 4.3.2 (after [55]).

Starting from a given initial configuration, the evolution of a system is thus described by the sequence of initiatives performed by the peers. A classical sequence is the Round-Robin sequence, suitable for modeling single-period peer behavior, or the uniformly random sequence, used to model a homogeneous Poisson process (each peer follows an i.i.d. Poisson process). Finally, there are adversarial sequences, which try to behave in the “worst” possible way.

For convenience, time is measured directly in terms of the sequence of initiatives (there is thus no need to explicitly introduce temporal initiative processes). It is nevertheless possible to consider several measures depending on the context. For example, in the context of self-stabilization in Dijkstra’s sense [25], only active initiatives are counted, and sequences are divided into rounds (a round is a subsequence such that each peer eligible at the beginning of the round takes an initiative or becomes non-eligible during that round). For simulations, I prefer to define the time unit (t.u.) as initiatives (active or not). Thus, each atomic initiative takes t.u., and after t.u., the number of initiatives per peer is on average . When the initiative sequence is round-robin, round and time unit can be considered synonymous, but this is not true in the general case.

Local stability

The notion of initiative leads to an alternative definition of stability: a configuration is stable if the only configuration reachable by initiatives from is itself. It is easy to verify that for arbitrary initiatives, this definition is equivalent to the one given in Section 4.2.1.2 (no adulterous edge). The interest of an initiative-based approach to stability is that it allows a local definition: an edge of a configuration is stable iff exists in all configurations reachable from . In other words, a stable edge is a collaboration that initiatives cannot break.

When a peer is incident to a stable edge, its degree of freedom is reduced. The free quota of a peer , denoted , is the quota minus the number of stable edges incident to . The free quotas form a vector that is a function of the configuration and that only decreases over time. It is thus possible to extend the definition of stability to the peer level: a peer is stable (or deactivated) if or if shares a stable edge with all its non-stable neighbors.

Finally, a last fundamental notion regarding stability is that of heat. In a given configuration , an acceptable edge outside of () is hot iff:

belongs to the best non-stable neighbors of ,
belongs to the best non-stable neighbors of .

There also exists a stricter definition of heat: an acceptable edge between two non-stable peers that are not collaborating is said to be boiling for iff is the best non-stable neighbor of and vice versa.

By extension, a peer is said to be hot (resp. boiling) if it is adjacent to a hot (resp. boiling) edge.

Intuitively, hot edges, and even more so boiling ones, are super-adulterous edges that only need a well-placed initiative to become stable edges. They play a central role in acyclic preference-based networks, because of Lemma 4.1, which is used in many proofs:

Lemma 4.1 ([36, 50]).

If is a non-stable configuration of an acyclic preference-based network, then there exists at least one boiling edge with respect to .

Obviously, this lemma only makes sense once the notion of acyclicity has been defined. This is in fact the subject of the remainder of this section on the foundations of preference-based networks.

4.2.2 Grand theorem of acyclic preferences

A preference cycle, or Kieschnick cycle, is a cycle of peers such that each peer prefers its successor to its predecessor: prefers to , prefers to , …, prefers to (or, expressed in terms of values, , , …, ).

The main subject of this chapter is acyclic preference-based networks, that is, networks that do not contain any Kieschnick cycle. Note that acyclicity is entirely defined by , and depends neither on the quotas nor on the initiatives. Similarly, if produces acyclic preferences for complete acceptability, then the network will be acyclic for any acceptability graph . By convention, such a value matrix is also called acyclic. A classification of acyclic matrices, and their connection to P2P systems, are proposed in Section 4.2.3.

Acyclic preference-based networks are characterized by a property that makes them unique among preference-based networks:

Theorem 4.1 ([36]).

An acyclic preference-based network admits one, and only one, stable configuration. Moreover, it is self-stabilizing by initiatives.

Proof.

The complete proof is available in [36], but I give the idea here, because of the importance of this theorem and the typical nature of the techniques employed. We proceed in two steps: first show self-stabilization (which will prove the existence of a stable configuration), then uniqueness.

Self-stabilization comes from the following property: regardless of the initiative sequence considered and the starting configuration, the corresponding sequence of configurations is irreversible. That is, if the system was in a configuration in the past and is now in a configuration , then is not reachable from . Irreversibility is proved by contradiction, by extracting a Kieschnick cycle from a cycle of configurations. Since is assumed finite, the number of possible configurations is also finite, which means that any trajectory of configurations necessarily leads to a configuration that no longer evolves, i.e., a stable configuration.

Uniqueness is also proved by contradiction: if one considers two distinct stable configurations and of the same system, and a peer whose collaborations differ from to , then one can construct a Kieschnick cycle starting from (even if does not necessarily belong to the cycle ultimately obtained).

□

I emphasize once again the fundamental nature of this existence/uniqueness/self-stabilization theorem, as it allows us to dispense with the stability analysis performed for other types of preferences, and to focus more closely on the configurations themselves.

The first application of this theorem appeared in the article Stratification³ in P2P Networks: Application to BitTorrent [37]. In this article, which sought to model a BitTorrent network as an acyclic preference-based network (I will discuss this in more detail later in the chapter), we measured the effective convergence between actual and stable configurations. To estimate the practical importance of self-stabilization, we had considered three types of scenarios (cf Figure 4.1):

Static (Figure 4.1a): Starting from the empty configuration (often denoted ), we observed convergence toward the stable configuration. We observed good convergence in all cases, while noting that the system parameters, and the average degree of the acceptability graph in particular, played a major role;
Atomic alteration (Figure 4.1b): A second elementary scenario consists of removing a peer from a stable configuration (which modifies the stable configuration), and letting the system converge again. We observe that the actual configuration does not deviate much from the stable configuration, and eventually re-converges, even though, due to a possible domino effect, the convergence time may be comparable to the static case.
Continuous churn (Figure 4.1c): Finally, a third scenario consists of having peers join and leave the system at a certain rate. This scenario shows that the actual configuration does not always manage to catch up with the evolution of the stable configuration, especially when churn is extreme. However, self-stabilization ensures that the actual configuration remains reasonably close to the stable configuration, the distance between the two being roughly proportional to the churn.

Following this study, I began working under the assumption that, provided the system’s convergence is sufficiently fast compared to its evolution, the stable configuration should be a good approximation of actual configurations. This is the spring metaphor (cf Figure 4.2). This metaphor suggests decomposing the study of acyclic preference-based systems into two distinct problems:

Figure 4.2: The spring metaphor: self-stabilization can be seen as a continuous attraction of the current configuration toward the stable configuration, as if a spring connected these two configurations. This principle holds even if the stable configuration moves over time.

What is the convergence speed?: Indeed, Theorem 4.1 only gives us the number of possible configurations as a bound. This number grows factorially [21] and thus offers rather limited practical interest. A first area of study therefore consists of characterizing more precisely the effective stiffness of the spring.
What are the properties of stable configurations?: The previous question should help determine to what extent a stable configuration is a good approximation of an actual configuration. When this is the case, the generic properties of stable solutions can help estimate system performance.

These two questions will be the subject of the next two sections of this chapter. But before answering them, I propose to take a closer look at these acyclic preferences, of which I have so far given only a summary and impractical definition.

4.2.3 Taxonomy of acyclic preferences

Among all possible real matrices, few are acyclic. In fact, if one takes a matrix whose coefficients are chosen at random, it is very likely that the resulting preferences contain at least one cycle of length ⁴. On the other hand, many matrix properties are synonymous with acyclicity. For example ([36]):

Global matrices: A value matrix is global if all its rows are identical when restricted to acceptable entries. Intuitively, this is a special case of a rank- matrix, and formally, for all that accept . The corresponding preferences, also called global, reflect a total order on the peers. One can moreover note that because of this total order, up to acceptability and permutation of peers, there is only one global preference.
Symmetric matrices: If is symmetric on its acceptable entries ( for every acceptable pair), is acyclic. An interesting property of symmetric matrices is that they describe the set of all acyclic preferences (cf [10, 36]). The relationship between acyclic, symmetric, and global matrices, and the corresponding preferences, is illustrated in Figure 4.3 ⁵.
Linear combination of global and symmetric matrices: The resulting matrices are acyclic as long as they do not generate ties (but as noted in Section 4.2.1.1, it is always possible to eliminate ties). On the other hand, a linear combination of acyclic matrices is not necessarily acyclic: if the input matrices are not themselves already linear combinations of symmetric and global matrices, they must be symmetrized, which is fairly easy since the proof that symmetric matrices surject onto acyclic preferences is constructive (cf [10, 36]).

Figure 4.3: Correspondences between matrices and preferences: symmetric and global matrices form two distinct subsets of acyclic matrices, but symmetric matrices can describe the entire set of acyclic preferences (which includes global preferences).

Preferences suited to P2P

Having outlined what makes preferences acyclic, I now propose to refocus on the peer-to-peer context and consider acyclic peer-to-peer preferences.

Capacities: A peer in a P2P network possesses many intrinsic scalar characteristics that can create preferences: access bandwidth, storage or computing capacity, average uptime, … Taking the example of the file-sharing protocol BitTorrent [23], a tit-for-tat algorithm causes a peer to preferentially collaborate with neighbors that provide it the highest download speed. This can be seen, as a first approximation, as a preference-based network where the value is the upload bandwidth divided by the quota (the number of simultaneous uploads).
Proximities: All values that can be assimilated to some kind of distance (physical or virtual) or similarity are symmetric by nature. Thus, many P2P systems try to minimize latencies⁶, such as the DHT Pastry [68] or real-time online gaming applications [51]. Similarly, massively multiplayer online games (MMOGs) must connect players that are close to each other in some virtual space [33, 47, 48]. Some authors also propose connecting participants of a file-sharing system based on their common interests [29, 71], which remains a symmetric measure. Co-uptime, that is, the (average) common active time, is a last example of a symmetric measure of interest for collaborative applications.
Complementarities: Measuring differences between participants’ resources can also prove interesting. For example, in a distributed file storage application, it is useful to find machines that are on when my own machine is off, in order to keep my data available at all times. The corresponding measure is complementary uptime. Similarly, in a system like BitTorrent, all participants seek to have the same file, which is divided into blocks⁷. Finding neighbors that have the most blocks I have not yet obtained is of interest. All these so-called complementary preferences are a special case of a linear combination of a global matrix and a symmetric matrix, and are therefore acyclic [36].

Preference classes studied

As we have just seen, there exist many acyclic preferences of interest from the P2P perspective. Over the course of my work, I focused on four classes of acyclic preferences, which avoids getting lost in a maze of special cases while giving a precise idea of the links between preferences and system properties.

Global: Although representing a negligible subset (in terms of cardinality) of acyclic preferences, global preferences are among the most important acyclic preferences (they model capacities), and must receive particular attention. Since there is in fact only one unique global preference system, induced by a total order on the peers, I will use this order instead of making the value matrix explicit. Peers will thus be labeled from to , being the best (it is preferred by all its acceptable neighbors), and so on…
Geometric: For preferences derived from distances, the properties of preferences depend greatly on how the points are positioned in the underlying space. One can moreover easily show that all acyclic preferences can be generated by placing the peers at the vertices of a simplex and perturbing it (by slightly moving the vertices) in an appropriate manner. Nevertheless, experience seems to show that certain characteristics occur frequently. To obtain typical and analyzable proximity preferences, I propose to consider preferences obtained by taking distances between random points on a unit -torus of dimension ().
Real latencies: Having real measurements to validate results is obviously necessary, even though theoretical analysis then becomes difficult. For , I use, to check the validity of certain geometric results, subsets of size extracted from the Meridian project dataset [6]. The values are the (symmetric) latencies between the selected nodes.
Random acyclic: A last class of preferences is obtained by assigning to each acceptable edge a uniform random value between and . I call the resulting preferences random acyclic (or simply acyclic when there is no ambiguity), because it can be shown that this approach produces a uniform sampling over the set of all possible acyclic preferences [10, 36].

¹This choice is purely conventional and does not affect any of the results presented here. It is natural when considering latencies or distances. In the case of values such as bandwidth, or more generally capacities, the opposite would be more appropriate.
²Ties in preferences are a source of problems that I do not wish to discuss in this chapter [43–45, 52, 53, 65]. We will therefore assume that ties can always be broken, for example by assigning each peer a unique identifier to resolve any ambiguity.
³Many people ask me why stratification, instead of clustering for example. Well, first of all, because clustering is already semantically overloaded in computer science; then because the image of strata is at least as evocative; and finally because the word is elegant.
⁴The probability that three mutually acceptable peers form a cycle is , for uniform random values. The probability of having no such cycle is therefore at most (if one neglects possible correlations), which is not worth much as soon as is large enough.
⁵ Figure 4.3 is of course incomplete. It is missing, for example, linear combinations in general, and complementary matrices in particular, as well as the generation of the set of all acyclic preferences by permutations of the entries of a symmetric matrix with distinct coefficients. But graphical readability would suffer.
⁶The question of whether latencies are distances, or even symmetric, naturally arises. In any case, empirically, they do not produce Kieschnick cycles.
⁷It even appears that sometimes, some blocks may be rarer than others [59].