Stable Configurations — On Peer-to-peer — Content Distribution, Acyclic Preference Networks

4.4 Stable configurations

Before concluding this chapter on networks with acyclic preferences, I propose to answer, at least partially and for Erdős-Rényi acceptability graphs, the second major question of acyclic preference systems: what are the properties of the stable configuration?

For , it is possible to study the distribution of partners through a (simplified) mean field approach. For global preferences, one can prove the existence of a fluid limit with an explicit solution, which shows in particular that the distribution of the value of a peer’s partner is centered around the value of that peer, with exponential decay: this is the stratification effect [37]. For random acyclic or geometric preferences, the solution of the fluid limit decays as a power law. Although I am unable to prove the mean field approach, I propose to validate it by comparing it with the exact solution (for global preferences) or using simulations.

Finally, I propose to extend the results to . The fluid equations then no longer seem to admit explicit solutions, but the same asymptotic behavior as for is observed (exponential for global preferences, power law otherwise). An unexpected consequence is that for geometric preferences, the stable configuration is a small-world if the quotas are sufficient.

Most of the results presented here are gathered in the article The Stable Configuration in Acyclic Preference-Based Systems [57], of which an extended version is available as a research report [58].

4.4.1 Specific notation

Since I propose to study the stable configuration in detail in this section, some additional notation is necessary, particularly to describe the distribution of partners. If and are acceptable, denotes the acceptable rank of according to ( being the best rank). is the acceptable ranking of . If has more than (acceptable) neighbors, is the th best acceptable neighbor of . Similarly, for any , denotes the rank of in the complete graph, without taking into account the notion of acceptability⁸. is the complete ranking of . For , is the th best neighbor, acceptable or not, of .

In all that follows, denotes the distribution of the stable partner or partners. To lighten the notation, I propose to use a loose notation, where the meaning of is specified using subscripts and superscripts whenever necessary. Thus, indicates the probability that has its peer of complete rank as a partner; is the probability that and are partners if there are peers with Erdős-Rényi acceptability of average degree ; if , is the probability that the th best stable partner of has acceptable rank …

The complementary cumulative distribution function (CCDF) of is denoted , and the normalized versions of and are respectively denoted and .

4.4.2 Acyclic equations

For the case of simple matching (), I propose a generic method to describe the complete rank of , if it exists, in the stable configuration . The generalization to -matching will be done in Section 4.4.5.

Exact equation

Let be the probability that , i.e. that the partner of , if it exists, has complete rank . The CCDF of is defined by , i.e. the probability that has a partner of complete rank greater than or equal to () or has no partner at all (shorthand notation for the disjunction of both events: ). Inspired by the approach used in [37], I propose an exact equation describing before giving a simplified version derived from a mean field approach.

To express , one can observe that for to be stable with its peer of complete rank , denoted , the following three conditions must be (and suffice to be) satisfied:

must be an acceptable edge, which occurs with probability if is an Erdős-Rényi graph ;
must not have a better stable partner than ();
must not have a better stable partner than ().

This leads to the following exact equation:

Approximate equation

The main difficulty of Equation (4.1) is the conditional probability, which is delicate to handle because of the correlations that may exist between and . The solution is to consider these correlations as negligible:

Approximation 4.1.

The events “ is not with someone better than ” and “ is not with someone better than ” are independent.

This approximation, which I sometimes somewhat loosely call “mean field hypothesis”, is reasonable for small enough⁹. Equation (4.1) can then be simplified to

To go further, one must take into account the type of preferences.

4.4.3 Global preferences

Since global preferences are characterized by a total order on the peers, one can dispense with the value matrix by labeling the peers from to , being the best. We thus directly consider the probability that and are partners. Noting that the complete rank of for is if and if (a peer does not rank itself), we obtain the relation between and :

Setting , we obtain the version of Equation (4.2) for global preferences:

This equation¹⁰, which is solved numerically by a double iteration, gives a very good approximation of the empirical distribution [37].

Normalization

An explicit fluid limit toward which the discrete distributions converge is a definite asset for describing the stable configuration. It indeed allows giving a complete, immediate and global description of the distribution for all and , which is not the case of Equation (4.4).

In order to obtain this limit, one must be able to compare distributions for arbitrary values of . It is therefore necessary to normalize . A fairly simple way to do this is to represent each peer by a normalized rank , with . More precisely, we associate to each the real number , and conversely to each positive real the integer . The normalized version of , denoted , is then defined by

is a step function (of two variables), which takes the values . The factor allows expressing simply as an integral of :

The normalized complementary cumulative distribution is defined by

and the relation between and is

Convergence of normalized distributions

If the average degree remains constant, I have shown the existence of a continuous limit of the distributions . The first step is to address the problem of an intrinsic discontinuity along the main diagonal (), due to the fact that . This is a minor problem since this discontinuity is simply there to recall the non-reflexivity of the matching. It is resolved by introducing a more “continuous” function , obtained by extending on the main diagonal using Equation (4.4):

The fluid limit, at constant degree, of the functions is then given by the following theorem:

Theorem 4.9.

Soit une constante. Si , avec , les fonctions convergent uniformément vers

This result shows that asymptotically, the distribution of partners depends only on the average degree of (assumed Erdős-Rényi). This allows quantitatively describing the stratification effect [37]: for fixed , the distribution of the stable partner of decays exponentially in , with intensity ( for large enough). In other words, a peer of normalized rank tends to have as partner a peer of the same rank, within plus or minus .

Proof.

The proof of Theorem 4.9 consists of four steps [58]:

show that the functions are uniformly Cauchy on ;
use Cauchy convergence to show that and admit limits and ;
give a PDE satisfied by ;
solve the PDE¹¹, and deduce from the solution.

□

The existence of this fluid limit had been proposed as a conjecture as early as [37], and proven for the case , but it was only later that the complete proof and expression were found [58].

Theorem 4.9 yields three immediate corollaries that complete the understanding of the stable configuration.

Corollary 4.1.

Considering , the probability that a peer of normalized rank is unmatched in the stable configuration is .

Corollary 4.2.

For (discrete case), a good approximation of is

Corollary 4.3.

Let a sequence of normalized distributions with unbounded increasing degree (). We have

This last corollary generalizes a theorem proposed in [37], which showed the existence of a weak Dirac limit in the case of a sequence of distributions with constant. It means that as soon as the degree tends to infinity (for instance as ), then asymptotically, a peer has as partner a peer of the same normalized rank (no deviation), and the probability of being unmatched becomes zero.

Validation

To validate the previous results, one must compare simulation results with Equation (4.4) (recurrence under the independence approximation), then with Equation (4.11). This is what was done in [37, 58].

As indicated previously, we observe for Equation (4.4) a very good precision, of order .

For the fluid limit, apart from the continuous extension on the main diagonal, the precision is also very good, except for small values of and high values of . These observations are consistent with the complete proof of Theorem 4.9 [58], which shows convergence in . They would even rather support convergence in , which is in my opinion the true bound, although this remains to be proven¹².

Exact resolution

In the situation I have just proposed (, global preferences, acceptability ), it is in fact unnecessary to resort to the independence approximation, since there exists a recurrence that yields the exact distribution [58].

I was able to obtain this recurrence by conditioning the fact that and are partners according to the rank of the partner of peer (less than , between and , greater than , or no partner).

Just as for the approximate formula, the exact formula admits a fluid limit, which obeys a certain PDE. Although the two PDEs (exact and approximate) are extremely different¹³, they give the same solution, namely Equation (4.10).

The fact that in this particular case, the recurrence given by the independence approximation has exactly the same fluid limit as the exact recurrence is a strong argument (albeit not a rigorous one) for justifying the use of this approximation in other cases. For a major drawback of the exact recurrence is that the “trick” used does not apply to other preferences, nor for . However elegant this recurrence may be, the independence hypothesis therefore remains indispensable if one wants to generalize the results.

4.4.4 Acyclic and geometric preferences

I now propose to turn to (random) acyclic and geometric preferences. For these preferences, one can note that the peers are undifferentiated. Over the set of possible realizations, one can therefore consider that all peers follow the same partner distribution: is independent of , and can therefore be denoted .

As for global preferences, the goal is to find the distribution of the complete rank by simplifying Equation (4.1) before solving it. I also briefly give the distribution of distances and elements for solving the distribution of the relative rank.

Distribution of the complete rank

For geometric or random preferences, the Approximation 4.1 is not sufficient, which is why I propose an additional approximation:

Approximation 4.2.

The complete rank is symmetric: .

I thus assume that is not too bad an approximation of for the preferences considered. This gives a very simple equation for :

can thus be obtained by a simple recurrence:

And is directly given by .

Fluid limit.

As for global preferences, can be normalized. For , we set . The normalization factor is now since this is the maximum value of ( was that of and in Section 4.4.3). is expressed as an integral of :

is then naturally defined by:

Theorem 4.10.

Soit une constante. Si , avec , les fonctions convergent uniformément vers

En particulier, la probabilité qu’un pair soit célibataire dans la configuration stable est asymptotiquement , et une bonne approximation de est

Proof.

The proof is the same as for Theorem 4.9, but easier. One must first show that the are uniformly Cauchy (which is easier here since there is only one variable and no need to introduce an extension on the main diagonal). We then have uniform convergence toward a continuous function . We then deduce from Equation (4.14) an ordinary differential equation satisfied by :

with the initial condition . The solution of Equation (4.19) is Equation (4.17), which completes the proof.

□

Validation.

Given the approximations made, it is necessary to verify on a few examples the precision of Equation (4.18). I have therefore tested its validity for a few values of and [58].

For , one cannot really say that the independence hypothesis is satisfied. Simulations show that the distributions are affected by the particular type of preferences, and the fluid limit is not very precise, particularly at the boundaries ( close to or ). One can among other things note that the probability of being unmatched given by Equation (4.18) is clearly overestimated if is even (since it is then actually zero), but exact for odd . The fluid limit nevertheless manages to give the behavior common to all preferences considered. From this point of view, it is better than the recursive (4.14) (from which it derives), which gives for .

As decreases, the empirical curves and the limit given by Theorem 4.10 converge very quickly. By , they are almost indistinguishable, which validates the use of the fluid limit as an approximation of the complete rank distribution.

Distribution of distances

It may be interesting to consider distributions other than that of the complete rank. For example, for geometric preferences, the distribution of distances between stable partners can be interesting if the performance of a matching is related to distances. Now, this distribution can be deduced from that of the complete rank:

Theorem 4.11.

Soit la probabilité qu’un pair n’ait pas de partenaire stable à distance inférieure à dans le tore unitaire de dimension . Soit le volume d’une boule de rayon dans ce tore. Sous la limite fluide, on a

Proof.

A ball of radius , centered on any peer, contains approximately peers since it occupies a proportion of the torus (which is assumed to be unit). The most distant peer inside this ball should therefore have a complete rank of approximately for the central peer, while being at a distance of approximately . We thus obtain the relation , and it only remains to use Equation (4.18) to conclude.

□

The function depends on the dimension and the norm used. For the infinity norm, we simply have . The expression for other norms can be somewhat more complicated due to possible wrap-around effects at the boundaries. Note that if I had chosen (with homogeneous density) instead of the unit torus, would simply have been the volume of a ball of radius , without boundary effects.

As usual, the precision of Equation (4.20) has been verified empirically, and the result is that it has virtually the same region of validity as Equation (4.18) [58].

Distribution of the relative rank

For random acyclic and geometric preferences, I have also studied the distribution of the relative rank, given by the function and its complementary cumulative distribution .

I therefore tried to adapt the method used for the complete rank. The conditions to be fulfilled for the stable partner of to be its th acceptable neighbor are the following:

must have at least neighbors (if the th neighbor exists, it is acceptable by definition),
must not be with a peer better than ,
must not be with a peer better than .

By adapting the independence and symmetry approximations to the relative rank, we then obtain the following recurrence formula [58]:

where is the regularized incomplete beta function.

Unfortunately, Equation (4.21) is much less precise than Equation (4.18) is for the complete rank, particularly for [58]. The reason is that correlations are much more present when it is the acceptable rank that is considered (particularly if one considers the close neighborhood).

As an exercise, I therefore asked myself whether it was possible to refine the estimate of the relative rank, and I partially succeeded: for , it is possible to have a better estimate in the fluid limit, which is obtained by conditioning on the normalized complete rank of the first acceptable partner [58]. We then obtain

where is the exponential integral.

This value is very good in almost all cases for random acyclic preferences. For geometric preferences, it is however necessary to be in good conditions, i.e. small and large [58].

4.4.5 Generalization to -matching

I now propose to extend the previous results to the -matching problem. is still assumed to be constant across peers for simplicity. I only propose here results on the complete rank, although it is a priori possible to reuse the techniques seen previously for the relative rank or distances.

Equations under the independence hypothesis

A peer can now have up to partners. We therefore denote, for , the distribution of the complete rank of the th best stable partner, and the corresponding CCDF. Just as for the case , it is possible to give the necessary and sufficient conditions for to be the th best stable partner of :

the pair must be acceptable,
the th best partner of (if ) must be strictly better than , while the th (if it exists) must not be,
the th partner of , if it exists, must not be better than .

By extending the independence hypothesis Approximation 4.1, we thus obtain a generic formula for -matching:

To be usable, this formula must be adapted to the type of preferences considered.

For global preferences, if denotes the probability that the th stable partner of is ¹⁴, we obtain the following recursive system, which is solved by a triple iteration over , and [37]:

Similarly, for random acyclic and geometric preferences, we obtain the following system from Approximation 4.2 (symmetry of the complete rank) [58]:

Using and , system Equation (4.25) is easily solved by a double iteration over and .

Simulations show that both systems agree very well with the empirical distributions as long as and satisfy the usual conditions [58]. The behavior is qualitatively very similar to that of simple matching: exponential decay for global preferences (but because of the multiplicity of partners, shifts appear between the density peaks of the and the value of the original peer), heavy tail distribution for geometric and random acyclic preferences.

For , fluid limits also exist. Unfortunately, unlike simple matching, I have not managed to find explicit solutions to describe them (and as time passes, I increasingly doubt that such solutions exist). It is nevertheless possible to give the PDEs satisfied by these limits.

For global preferences, the fluid limits satisfy

with the initial conditions .

Similarly, for acyclic and geometric preferences, the limits satisfy

with the initial conditions .

Even if these equations cannot be solved completely, the fluid limit still has several advantages.

First, the very existence of a limit allows computing it numerically with precision and using the result for multiple values of and . Take for example the case of global preferences (the same reasoning can be applied to acyclic and geometric preferences). We assume is fixed. Let be the maximum average degree of the distributions we want to evaluate, and a fixed sampling size of the fluid limit (the larger , the more precise the evaluation). We can then set and compute the (once and for all) using Equation (4.24). For a degree , we set ( is not necessarily an integer). Inspired by the normalization Equation (4.5), we obtain the following approximation for :

We can then in turn use this estimate of the fluid limit for discrete distributions. Thus, for any integer and for any , we have, for (which amounts to if )

Another advantage of fluid limits is that the PDEs they satisfy give us information about their behavior. One can for example use them to show that (and the equivalent for global preferences) and thus understand why the behavior in -matching remains similar to that of simple matching.

4.4.6 Some applications

Having put considerable effort into the mere study of distributions, I must admit I have not yet devoted much time to advanced properties of stable configurations, i.e. those likely to help understand existing systems and develop new ones. Here are nevertheless two: the stratification of global preferences and the “small-worldification” of geometric preferences.

Stratification

As we have just seen, with global preferences, the stable partners of a given peer have, on average, the same rank as . This is stratification, which guarantees a certain fairness in the stable configuration [37]: in terms of rank, what a peer gives should be roughly equal to what it receives. One must also remember that the have exponential decay with deviation of , being the average degree of the acceptability graph. One then realizes that the following trade-off must be resolved:

if is too small the deviation is high. In particular, if the entries of the value matrix (for example the bandwidths of a BitTorrent-like system) follow a non-uniform distribution, there can be a very large difference between the expected gain and what one gives. This problem was highlighted in [37] to explain potential flaws in the Tit-for-Tat technique employed by BitTorrent;
a large on the other hand strengthens fairness. But in practice, increasing the size of the acceptability graph has a cost for the peers: convergence time, memory space, graph maintenance… A large also shortens the deviation (down to the Dirac in the fluid limit), and therefore increases the diameter of the stable configuration, which can be problematic if one wants to propagate information through stable edges.

For the quota , which represents the maximum degree in the stable configuration, a similar trade-off exists: a large can improve fairness and decrease the diameter, but will be costly in resources.

This suggests that for most systems with global preferences (i.e. based on the sharing of bandwidth, storage capacity, computing power, uptime…), there should exist a pair (or more generally a coupling between an acceptability graph and a vector of quotas) optimal for the stable configuration, whose exact value would depend on the importance given to parameters such as diameter, fairness, convergence time, or maintenance cost.

Small-worldification

A small-world is a sparse graph (average degree in or even ) with an average shortest path length (ASPL) in and a high clustering coefficient (there are far more short cycles than for a random graph of the same size). For example, Kleinberg showed some years ago that an -dimensional grid could be transformed into a small-world by adding long-range edges following a distribution in [49].

Let us now look at what happens for a stable configuration, with in so that the configuration is “sparse”. If it is global preferences, the clustering is indeed there as a consequence of stratification, but the diameter tends to grow linearly [36]. Similarly, for random acyclic preferences, the small diameter is verified but there is no high clustering (the stable configuration behaves like an incomplete random -graph). On the other hand, for geometric preferences, the heavy tail distribution allows having both properties: on the one hand, most stable partners have a small complete rank; they are therefore “geographically” close which gives clustering; on the other hand, there exist long-range links that make the diameter small. Geometric preferences are therefore conducive to generating small-world configurations, and this is indeed what occurs [36]. This opens a number of perspectives on the use of the stable configuration, starting with an easy-to-use small-world generator.

What is surprising about this small-worldification with geometric preferences is that it is solely created by the way peers rank each other: actual distances are used to construct these rankings, but the values themselves play no direct role in the preference system in general and in its stable configuration in particular. This leads one to think that topological characteristics of a system can therefore be contained in a set of preferences.

As an example of characteristics, I computed numerically the small-world parameters of preferences on tori for a few dimensions, the result being that the diameter and clustering tend to decrease when the dimension increases [58]. I then looked at what the Meridian latencies gave, and the parameters obtained turned out to be close to those of the 3-dimensional torus. I particularly like this unexpected result, because it seems to suggest that there exists a dimension of the Internet in the sense of preferences, which is about . I can thus add my brick to the wall of efforts made everywhere to estimate a dimension of the Internet (see for example [11]).

⁸I assume the existence of a natural extension of the value matrix to non-acceptable pairs. This extension is immediate for global preferences (the intrinsic value) and geometric preferences (the distance). For random acyclic preferences, it is obtained by assigning random “phantom” values to non-acceptable edges.
⁹A few simple examples seem to indicate an error of order , confirmed by the simulations [37].
¹⁰For the record, Equation (4.4) was proposed by Julien Reynier based on an original idea by Fabien de Montgolfier. Chronologically it is the origin of Equation (4.2) (general equation), and not the other way around.
¹¹For the reader eager for “exotic” equations, this involves solving , with the initial condition . Although seemingly innocent, solving this non-local equation gave me quite a hard time, even though I already knew its solution by other means. I take this opportunity to thank François Baccelli who pointed me toward the approach of de-non-localization.
¹²The factor comes from the use of Grönwall’s lemma [12], which amplifies an incompressible quantization error of . But in practice, Equation (4.4), which serves as the basis for constructing the distributions, is self-stabilizing (an error in one direction at a given moment is compensated in the other direction at the next iteration), which Grönwall’s lemma does not allow to take into account.
¹³The exact PDE is relatively classical and is solved by the method of characteristics, whereas the approximate PDE, described in an earlier note, had to be solved in a less conventional manner.
¹⁴I take the opportunity to point out that is no longer symmetric, whereas it was for .