low efficiency of "from_networkx" compared to TupleList and the other way #401

Closed
@geowcz

I ran into a performance problem with "from_networkx" while trying to run the Leiden algorithm on a very large network of around 33,000 nodes and 11.3 million edges. I reported the problem on the leidenalg GitHub, and the author, Traag, suggested that I report it here. Here are my experiments with three ways of loading the network:

"from_networkx":

G1 = ig.Graph.from_networkx(G)
prepartition = la.find_partition(G1, la.RBConfigurationVertexPartition, None, G1.es["weight"], 20, 0, 1, resolution_parameter = inputResolution)
partition_dict = {}
for name, membership in zip(G1.vs["_nx_name"], prepartition.membership):
	partition_dict[int(name)] = int(membership)

"TupleList":

G1 = ig.Graph.TupleList(edges, directed=False, vertex_name_attr="name", edge_attrs= ["weight", "estTime"], weights= False)
prepartition = la.find_partition(G1, la.RBConfigurationVertexPartition, None, G1.es["weight"], 20, 0, 1, resolution_parameter = inputResolution)
for name, membership in zip(G1.vs["name"], prepartition.membership):
	partition_dict[int(name)] = int(membership)

"the third method":

G1 = ig.Graph(directed = False)
G1.add_vertices(list(set(G.nodes)))
G1.vs["name"] = list(set(G.nodes))
G1.add_edges([(x, y) for (x, y, z, w) in edges])
G1.es['weight'] = [z for (x, y, z, w) in edges]
G1.es['estTime'] = [w for (x, y, z, w) in edges]
prepartition = la.find_partition(G1, la.RBConfigurationVertexPartition, None, G1.es["weight"], 20, 0, 1, resolution_parameter = inputResolution)
for name, membership in zip(G1.vs["name"], prepartition.membership):
	partition_dict[int(name)] = int(membership)

Loading with "from_networkx", "TupleList", and "creating a graph by adding vertices and edges" took around 14.7 minutes, 0.82 seconds, and 0.14 seconds, respectively. Such a large difference may stem from "networkx".
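To make timings like these easier to reproduce, here is a minimal sketch of a timing helper that measures only the graph construction step, separately from the Leiden run. The name `time_loader` and its parameters are hypothetical, not part of igraph or leidenalg, and the commented usage lines assume `ig`, `G`, and `edges` are defined as in the snippets above.

```python
import time

def time_loader(label, build, repeats=3):
    """Run `build()` `repeats` times and return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        build()  # only the construction call is timed
        best = min(best, time.perf_counter() - start)
    print(f"{label}: {best:.3f} s")
    return best

# Hypothetical usage with the loaders from this report (requires igraph and networkx):
# time_loader("from_networkx", lambda: ig.Graph.from_networkx(G))
# time_loader("TupleList", lambda: ig.Graph.TupleList(
#     edges, directed=False, edge_attrs=["weight", "estTime"]))
```

Taking the best of several runs reduces noise from caching and other processes; for a loader that takes minutes, `repeats=1` is probably enough.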

Also, I found that "from_networkx" loads all of my nodes whether or not they have edges, whereas "TupleList" loads only the nodes that have edges, and the third way loads all nodes in ascending order.
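One way to keep isolated nodes and a fixed ascending order without going through "from_networkx" is to build the label-to-index mapping by hand. The sketch below is plain Python and only illustrates the idea; `build_index` is a hypothetical helper, and it assumes the edges are `(source, target, weight, estTime)` tuples as in the snippets above and that the node labels are sortable.

```python
def build_index(nodes, edges):
    """Map arbitrary node labels to 0..n-1 and translate the edge list.

    nodes: iterable of hashable, sortable labels (isolated nodes included)
    edges: iterable of (source, target, weight, est_time) tuples
    """
    edges = list(edges)                      # allow generators to be passed in
    names = sorted(set(nodes))               # deterministic ascending order
    index = {name: i for i, name in enumerate(names)}
    pairs = [(index[u], index[v]) for u, v, _w, _t in edges]
    weights = [w for _u, _v, w, _t in edges]
    est_times = [t for _u, _v, _w, t in edges]
    return names, pairs, weights, est_times

names, pairs, weights, est_times = build_index(
    nodes=[10, 30, 20, 40],                  # node 40 has no edges
    edges=[(10, 20, 1.5, 3.0), (20, 30, 2.0, 4.0)],
)
print(names)   # [10, 20, 30, 40]
print(pairs)   # [(0, 1), (1, 2)]
```

With python-igraph installed, the result could then be passed on as `ig.Graph(n=len(names), edges=pairs, directed=False)`, followed by assigning `vs["name"] = names`, `es["weight"] = weights`, and `es["estTime"] = est_times`. Translating labels to indices first also avoids igraph interpreting integer labels as vertex indices when edges are added.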

BTW, I used python-igraph 0.8.3, which I obtained from your site.
