Skip to content

low efficiency of "from_networkx" compared to TupleList and the other way #401

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
geowcz opened this issue May 27, 2021 · 12 comments
Closed
Milestone

Comments

@geowcz
Copy link

geowcz commented May 27, 2021

I encounter the slow efficiency problem of "from_networkx" when I was trying to use the Leiden algorithm on a very large network which is composed of around 33,000 nodes and 11.3 million edges. I reported the problem in leidenalg GitHub and the author Traag suggested me to report the issue here. So here are my experiments about the three ways to load the network:

"from_networkx":

G1 = ig.Graph.from_networkx(G)
prepartition = la.find_partition(G1, la.RBConfigurationVertexPartition, None, G1.es["weight"], 20, 0, 1, resolution_parameter = inputResolution)
partition_dict = {}
for name, membership in zip(G1.vs["_nx_name"], prepartition.membership):
	partition_dict[int(name)] = int(membership)

"TupeList":

G1 = ig.Graph.TupleList(edges, directed=False, vertex_name_attr="name", edge_attrs= ["weight", "estTime"], weights= False)
prepartition = la.find_partition(G1, la.RBConfigurationVertexPartition, None, G1.es["weight"], 20, 0, 1, resolution_parameter = inputResolution)
for name, membership in zip(G1.vs["name"], prepartition.membership):
	partition_dict[int(name)] = int(membership)

"the third method":

G1 = ig.Graph(directed = False)
G1.add_vertices(list(set(G.nodes)))
G1.vs["name"] = list(set(G.nodes))
G1.add_edges([(x, y) for (x, y, z, w) in edges])
G1.es['weight'] = [z for (x, y, z, w) in edges]
G1.es['estTime'] = [w for (x, y, z, w) in edges]
prepartition = la.find_partition(G1, la.RBConfigurationVertexPartition, None, G1.es["weight"], 20, 0, 1, resolution_parameter = inputResolution)
for name, membership in zip(G1.vs["name"], prepartition.membership):
	partition_dict[int(name)] = int(membership)

The "from_networkx", "TupleList", and "creating a graph by adding vertices and edges" took around 14.7 minutes, 0.82 seconds, and 0.14 seconds accordingly. Such a distinct difference may be stem from the "networkx".

Also, I found the "from_networkx" can load all my nodes no matter the nodes have edges or not, but "TupleList" can only load the nodes that have edges, and the third way can load all nodes in ascending order.

BTW, I used the python-igraph 0.8.3 and I obtained it from your site.

@szhorvat
Copy link
Member

BTW, I used the python-igraph 0.8.3 and I obtained it from your site.

While in this case it probably makes no difference, I wanted to note that 0.9.1 has been available for quite a while now, and that it is unclear what you mean by "from your site".

It is indeed important to make this conversion as fast as possible.

@geowcz
Copy link
Author

geowcz commented May 27, 2021

Well, when I was trying to find a place to post the issue, I clicked the bug report button and there was a sentence at the bottom to let me give my version information, that is why I posted it. My main point is the efficiency problem of "from_networkx".

@szhorvat
Copy link
Member

@jwang-19 I understand why you tried to provide the information, but we still don't know what you meant by "your site".

@szhorvat
Copy link
Member

szhorvat commented May 27, 2021

@iosonofabio I am looking at this line:

https://github.com/igraph/python-igraph/blob/master/src/igraph/__init__.py#L1949

Adding edges one-by-one is extremely inefficient in igraph (regardless of interface). Graphs should never be built this way. Each add_edge will rebuild the entire graph from scratch. The usual way around this is to first record all edges, then build the graph in a single go using add_edges.

@szhorvat
Copy link
Member

If that is indeed the problem (it has to be, but it seems strange that it would cause a slowdown of only 100x with 10^7 edges), we need to fix it here as well:

https://github.com/igraph/python-igraph/blob/master/src/igraph/__init__.py#L2046

@geowcz
Copy link
Author

geowcz commented May 27, 2021

@jwang-19 I understand why you tried to provide the information, but we still don't know what you meant by "your site".

oh, okay, so I installed the package from this site: https://pypi.org/project/python-igraph/#files last year.

@geowcz
Copy link
Author

geowcz commented May 27, 2021

If that is indeed the problem (it has to be, but it seems strange that it would cause a slowdown of only 100x with 10^7 edges), we need to fix it here as well:

https://github.com/igraph/python-igraph/blob/master/src/igraph/__init__.py#L2046

Yes, I was also curious why it took such a long time. It seemed to be fast when I used the function on a small network that is composed of around 6000 nodes, but it became slower when I used a large one.

@geowcz
Copy link
Author

geowcz commented May 27, 2021

@iosonofabio I am looking at this line:

If that is indeed the problem (it has to be, but it seems strange that it would cause a slowdown of only 100x with 10^7 edges), we need to fix it here as well:

https://github.com/igraph/python-igraph/blob/master/src/igraph/__init__.py#L2046

Yes, you are right.

@ntamas
Copy link
Member

ntamas commented May 27, 2021

from_networkx performance issue is now fixed by @szhorvat in f6eb8cf . I'll probably release a new version after sorting out from_graph_tool as well.

@iosonofabio
Copy link
Member

Thanks guys, happy to edit the graph tool function @ntamas

@geowcz
Copy link
Author

geowcz commented May 27, 2021

Thank you for the great work, I am looking forward to your new version.

@ntamas
Copy link
Member

ntamas commented Jun 2, 2021

Fix released in 0.9.4 so I'm closing this.

@ntamas ntamas closed this as completed Jun 2, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants