-
Notifications
You must be signed in to change notification settings - Fork 102
Description
Hi all,
I tried the static PageRank computation on GraphX (in Spark 1.0.0) with the following code:
import org.apache.spark._
import org.apache.spark.graphx._
import org.apache.spark.rdd.RDD
object ord extends Ordering[(VertexId, Double)] { def compare(a:(VertexId, Double), b:(VertexId, Double)) = a._2 compare b._2}
val graph = GraphLoader.edgeListFile(sc, "/lhome/zma/test.txt")
for (i <- 0 to 30) {
println("iter: " + i)
val ranks = graph.staticPageRank(i)
ranks.vertices.top(100)(ord).foreach(println(_))
}
The /lhome/zma/test.txt contains a simple graph:
0 1
1 3
2 3
3 3
The PageRank implemenation ( https://github.com/amplab/graphx/blob/master/graphx/src/main/scala/org/apache/spark/graphx/lib/PageRank.scala ) uses this PageRank definition:
PR[i] = alpha + (1 - alpha) * inNbrs[i].map(j => oldPR[j] / outDeg[j]).sum
However, the results I get are:
iter: 0
(0,0.15)
(1,0.15)
(3,0.15)
(2,0.15)
iter: 1
(3,0.5325)
(1,0.27749999999999997)
(0,0.15)
(2,0.15)
iter: 2
(3,0.8384999999999999)
(1,0.27749999999999997)
(0,0.15)
(2,0.15)
iter: 3
(3,0.862725)
(1,0.27749999999999997)
(0,0.15)
(2,0.15)
...
The results with iterNum as 0 and 1 are correct. However, for "iter: 2":
The PageRank value for 3 should be:
0.15 + 0.85 * (0.27749999999999997 + 0.15 + 0.5325)
= 0.966
while the result is (3,0.8384999999999999). It seems the PageRank of vertex 2 is not passed to the PageRank of vertex 3 (0.8384999999999999 == 0.15 + 0.85 * (0.27749999999999997 + 0.5325)).
I noticed that the Pregel implementation (https://github.com/amplab/graphx/blob/master/graphx/src/main/scala/org/apache/spark/graphx/Pregel.scala) only sends messages from vertices changed in last superstep. I can understand this performance optimization for less messages. However, is this PageRank implementation on top of this model wrong? Will passing the delta out from changed vertices be more suitable for implementation on this "Pregel" model in GraphX?
Please correct me if I am wrong at understanding some parts. I will appreciate it if additional information or pointers to them are provided.