Skip to content

Conversation

@urazoff
Copy link
Contributor

@urazoff urazoff commented Mar 26, 2021

This allows to compute non-cryptographic hash
of mp_int which can be used as a key in a hash table.

@urazoff urazoff force-pushed the develop branch 3 times, most recently from 12730b2 to ad46b9e Compare March 26, 2021 10:01
@czurnieden
Copy link
Contributor

Ah, that's how you meant it (was unsure what and how much of the mp_int you wanted in the hash and forgot to ask).

One main point: it needs a version for MP_16BIT where the largest integer data type that can be safely assumed is uint32_t, so it needs the prime and start point for 32 bit FNV-1a (e.g.: 0x01000193 and 0x811c9dc5 respectively as listed on the Wiki-page) and an uint32_t data type for the hash.

The implementation is a bit slow. You can do it in one loop and because using mp_div_2d is a bit expensive here you could simply parse the mp_int.dp array and tackle each limb separately. Either ignore the leading zeros and just read the whole limb,, overwrite them with bits from the MSB side from the last hash, or do something more complicated.

And as much as I hate to insist on it but we really need a complete unit test. Just a couple of mp_ints (edge-cases and some random numbers) to compare the output of your function with the precomputed hashes will do, nothing fancy needed.

@urazoff urazoff force-pushed the develop branch 3 times, most recently from 0212975 to 911b6df Compare September 12, 2021 16:51
@urazoff
Copy link
Contributor Author

urazoff commented Sep 12, 2021

I updated the patch. Please, take a look at it. However, I am still not sure about some details e.g. signs handling and test cases.

@czurnieden
Copy link
Contributor

Looks good to me at first glance!
Tests should include edge cases, too: one with zero and one with all all-ones (all bits set) should suffice, I think.
Some of the literals might need suffixes (e.g.: UL) but we have a member here who has a better eye for these details.

@urazoff urazoff force-pushed the develop branch 2 times, most recently from 58ef57e to 75fc980 Compare September 13, 2021 20:17
@czurnieden
Copy link
Contributor

Looks good to me, thanks for the work!

Travis is still/again not working? Great! *sigh*

Copy link
Member

@sjaeckel sjaeckel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really like where this is heading towards 👍

The only things I think we should take into account are that hashes should be stable for each MPI, i.e. the implementation should return the same results no matter what MP_DIGIT_BIT says. Or am I wrong there?

Also I can see the need for a smaller hash, so that's fine for me as well but that one should (maybe) also be stable independent of MP_DIGIT_BIT.

Maybe it'd make sense to have mp_hash32(mp_int *a, uint32_t *hash) and mp_hash64(mp_int *a, uint64_t *hash)?

@sjaeckel
Copy link
Member

Travis is still/again not working? Great! sigh

Yeah, we should really put some effort into migrating away from travis, but I don't know where we should migrate to ... :( ... maybe GH actions?

@urazoff
Copy link
Contributor Author

urazoff commented Sep 15, 2021

I thought that MP_DIGIT_BIT controls the size of mp_digit and, accordingly, the base. So the same number with a different MP_DIGIT_BIT set would be represented with a different set of bytes so FNV would give a different result. So I am not sure that FNV can give a stable result.

If we would have mp_hash32(mp_int *a, uint32_t *hash) and mp_hash64(mp_int *a, uint64_t *hash) the latter would be impossible for compilers that do not implement the unsigned long long type so we would need to simulate it with two uint32_t.

So I think the result of hashing and the size of hash (respectively type of algorithm) depends on platform.

Please correct me if I understand something wrong.

Also, the only reference for bignum hashing I found is CPython's longobject hashing and they use some small custom algorithm.

@czurnieden
Copy link
Contributor

Yeah, we should really put some effort into migrating away from travis, but I don't know where we should migrate to ... :( ... maybe GH actions?

Took a look.
Got headaches.
Won't go there again.

But there is documentation. Sort of. And it seems as if we could implement a Travis clone with it if I understood it correctly. There is a time limit of 3,000 minutes/month (of what? CPU time?) for free accounts (or is it?). Travis runs for about 10 CPU-minutes (hard to tell exactly, of course) with our long list of tests. That are about 300 tests per month and with the current method of one automatic test per commit, there is a chance of a DoS but I think that can be restricted.

@urazoff
How often do you want/need to compute the hash?

You could take the mp_hash_fnv(mp_int *a, mp_hash *hash) route and use something like typedef struct {uint32_t hi;uint32_t lo;} mp_hash; for the costs of a small memory overhead and a slightly larger computation overhead. There is also a chance that uint32_t is implemented in software for 16-bit MCUs.

Or you ask yourself if uint32_t would be sufficient for all of your enumeration needs.

@urazoff
Copy link
Contributor Author

urazoff commented Sep 23, 2021

If you agree that it is impossible to do stable hashing (and I am not sure if it even could be needed in any case) using FNV then I think the patch is ready because I don't see why platform-dependent mp_hash is bad.

@sjaeckel sjaeckel force-pushed the develop branch 3 times, most recently from 8c800e1 to 68d818b Compare February 16, 2022 09:25
This allows to compute non-cryptographic hash
of mp_int which can be used as a key in a hash table.
@sjaeckel
Copy link
Member

Sorry for the long silence ... I hope you still have time&passion to follow up and finish this :)

Took a look.
Got headaches.
Won't go there again.

Yeah, the same for me.

Then I saw some examples of other projects and suddenly it wasn't that bad anymore :)

If you agree that it is impossible to do stable hashing (and I am not sure if it even could be needed in any case) using FNV then I think the patch is ready because I don't see why platform-dependent mp_hash is bad.

I agree that my previous point regarding stable hashing was more wish than reality. So yes, stable hashing doesn't make sense in this case/isn't possible/necessary and I think as well that the patch itself is ready, but CI says it isn't 😄

It's still missing documentation in docs/bn.tex ... and I'm not sure in which chapter it suits best ...

  • sub-section under "Comparisons"
  • section under "Basic operations"
  • section under "Little Helpers"
  • something else I didn't think of!?

Would it be sufficient to describe what the function does and is useful for? or could this maybe also include some minimal abstract "real application" example?

I started with this, feel free to improve&extend

\section{Hashing}
To get a non-cryptographic hash of an \texttt{mp\_int} use the following function.

\index{mp\_hash}
\begin{alltt}
mp_err mp_hash (mp_int *a, mp_hval *hash);
\end{alltt}

This will create the hash of $a$ following the FNV-1a algorithm as described on
\url{http://www.isthe.com/chongo/tech/comp/fnv/index.html#FNV-1a}.

NB: The hashing is not stable over different widths of an \texttt{mp\_digit}.

@urazoff
Copy link
Contributor Author

urazoff commented Mar 8, 2022

Yes, I still have passion to finish this!

I guess, it could be placed somewhere under chapter "Basic operations" as a section. May be it's sufficient to describe what the function does and is useful for so the description won't be too heavy.

\section{Hashing}
To get a non-cryptographic hash of an \texttt{mp\_int} use the following function.

\index{mp\_hash}
\begin{alltt}
mp_err mp_hash (mp_int *a, mp_hval *hash);
\end{alltt}

This will create the hash of $a$ following the FNV-1a algorithm as described on
\url{http://www.isthe.com/chongo/tech/comp/fnv/index.html#FNV-1a}. With the
help of this function one can use an \texttt{mp\_int} as a key in a hash table.

NB: The hashing is not stable over different widths of an \texttt{mp\_digit}.

@sjaeckel
Copy link
Member

@czurnieden something to add to that documentation? If not I'll commit it and merge this PR soon'ish.

@czurnieden
Copy link
Contributor

@sjaeckel I think FNV-1a should get a box \mbox{FNV-1a} (or \nobreakdash from amsmath but the LaTex community seems to prefer the former ). But that's "hard nitpicking", I guess ;-)

No, it's good, can go into develop. Thanks!

Signed-off-by: Steffen Jaeckel <[email protected]>
@sjaeckel sjaeckel merged commit 66de864 into libtom:develop Mar 14, 2022
@sjaeckel sjaeckel linked an issue Mar 10, 2024 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Enhancement proposal: add mp_int hash function

3 participants