Description
Feature or enhancement
Please consider extending the interface of hashlib.file_digest() so that it can calculate a file's hash sums for multiple algorithms efficiently, that is, without reading the file multiple times.
One idea would be that if its digest parameter is a list (or perhaps even any iterable), it would simply create multiple digest objects (one per algorithm) and call .update() on each of those.
In the end it might e.g. return a dict, where the key is the algorithm and the value the hash value, though I guess this wouldn't work properly if digest isn't a string (e.g. if it is a hash constructor).
So maybe just return an (ordered) list of hash values and make it the caller's responsibility to know the order of the algorithms as passed in digest.
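A minimal sketch of what such an interface could look like, written as a standalone helper rather than a change to hashlib itself. The name multi_file_digest and its bufsize parameter are hypothetical; it returns a list of hash objects in the order given (mirroring how file_digest() returns a hash object), and the chunked readinto() loop is only an approximation of what file_digest() does internally.

```python
import hashlib
import io


def multi_file_digest(fileobj, digests, *, bufsize=2**18):
    """Hypothetical helper: hash a binary file object once for several algorithms.

    `digests` is an iterable of algorithm names (str) or hash constructors.
    Returns a list of hash objects in the same order as `digests`.
    """
    # Accept names or constructors, similar to what file_digest() accepts.
    hashers = [hashlib.new(d) if isinstance(d, str) else d() for d in digests]

    buf = bytearray(bufsize)   # reusable buffer, filled in place by readinto()
    view = memoryview(buf)
    while True:
        size = fileobj.readinto(buf)
        if not size:           # 0 means EOF (None from a non-blocking source also stops)
            break
        chunk = view[:size]    # slicing a memoryview does not copy the data
        for h in hashers:
            h.update(chunk)    # every algorithm sees the same chunk, read only once
    return hashers


# Usage: the caller keeps track of the order of the algorithms.
results = multi_file_digest(io.BytesIO(b"some data"), ["sha256", hashlib.sha512])
hex_values = [h.hexdigest() for h in results]
```

Returning hash objects instead of a dict sidesteps the question of what the key should be when digest is not a string; callers who want a dict can build one from h.name.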
In principle, implementation seems easy at first glance, but at second glance it may be more complex (well, at least for me, being a Python noob):
file_digest() calls update() in two places, depending on the object type, I guess: Line 215 in 4849a80 and Line 236 in 4849a80.
In both cases I don't really know whether it's possible (and if so, whether it's efficient) to simply use the source (i.e. fileobj.getbuffer() and view[:size], respectively) multiple times and always get the same data without any additional reading.
Probably not so in the first case?
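A small experiment for both cases, assuming an io.BytesIO object stands in for the in-memory case and a deliberately tiny 8-byte buffer stands in for the chunked case. Since update() only reads from the memoryview it is given, passing the same view to several hash objects should not trigger any further reads. The variable names are made up for illustration.

```python
import hashlib
import io

payload = b"example payload"

# Case 1: in-memory object that exposes getbuffer() (here io.BytesIO).
fileobj = io.BytesIO(payload)
buf = fileobj.getbuffer()            # memoryview over the internal buffer, no copy
sha256, sha512 = hashlib.sha256(), hashlib.sha512()
for h in (sha256, sha512):
    h.update(buf)                    # the view is read, not consumed
buf.release()                        # BytesIO cannot be resized or closed while a view exists

# Case 2: chunked readinto() into a reusable buffer.
fileobj = io.BytesIO(payload)
chunk_buf = bytearray(8)             # tiny on purpose, to force several iterations
view = memoryview(chunk_buf)
sha3, blake = hashlib.sha3_256(), hashlib.blake2b()
while size := fileobj.readinto(chunk_buf):
    for h in (sha3, blake):
        h.update(view[:size])        # same slice for every algorithm, one read per chunk
```

In both cases each hash object ends up with the same digest as when hashing payload directly.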
Pitch
Admittedly, most use cases need only one hash algorithm. But there are some more cases, beyond a utility that prints various hashes for a given file 😉, that could benefit from this. For example, in security it's not uncommon to verify files against multiple different hash algorithms. E.g. Debian's secure APT files (Release and Packages files) typically contain hashes from several algorithms for a given file.
Of course one can simply read the file manually in binary mode and .update() a number of digests, without using file_digest() at all.
But this loses any optimisations done by it (like the zero-copy buffer, or, should it ever get it, the already suggested method using AF_ALG sockets and sendfile() for zero-copy hashing with hardware acceleration). For users it would just be nice to have a function that does it right out of the box.
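For comparison, the manual approach mentioned above could look roughly like this when verifying a file against several expected hashes. verify_file and its expected mapping (algorithm name to hex digest) are hypothetical names for illustration; the plain read() loop allocates a new bytes object per chunk, which is exactly the kind of overhead file_digest() avoids.

```python
import hashlib
import hmac


def verify_file(path, expected, *, chunk_size=65536):
    """Hypothetical helper: return True if the file matches every expected digest.

    `expected` maps algorithm names (e.g. "sha256") to hex digest strings.
    """
    hashers = {name: hashlib.new(name) for name in expected}
    with open(path, "rb") as f:
        # One pass over the file; each chunk is fed to every hash object.
        while chunk := f.read(chunk_size):
            for h in hashers.values():
                h.update(chunk)
    # compare_digest() compares without short-circuiting on the first mismatch.
    return all(
        hmac.compare_digest(h.hexdigest(), expected[name].lower())
        for name, h in hashers.items()
    )
```

A caller would pass something like {"sha256": "...", "sha512": "..."}, with the hex values taken from a trusted source.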
Previous discussion
No real discussion I guess, but I've asked for opinions in #89313 (comment) and was recommended to open a new issue.