Dict implementation design #983
Comments
Adding TODOs here after #975 is merged (in decreasing priority order),
Question: should we allow floats/reals as keys? I ask because floats are usually compared only approximately, so converting them to an int of the same size with memcpy and hashing that won't work well. It's also not good practice to use floats as keys. 123.000 and 123.000001 are approximately the same, and a user may expect them to act as a single key, but the slight difference gives them completely different hash values. Disallowing floats as keys would prevent a lot of bad practice. Also, on a collision we fall back to exact equality checks on keys, and since floats are mostly only approximately equal, users may see seemingly random results. Even if we do want to allow floats as keys, we should ask the user to provide an equality function in the dict type, which we would use to compare floats on collisions. See https://softwareengineering.stackexchange.com/a/391107 and https://diego.assencio.com/?index=67e5393c40a627818513f9bcacd6a70d for details.
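To make the concern concrete, here is a minimal sketch in plain CPython (not LPython). The helper name `float_bits` is hypothetical, and it only imitates the memcpy approach by reinterpreting a 64-bit float as an integer of the same size. It suggests that two nearly equal values carry different bit patterns and end up as distinct keys under exact equality.

```python
import math
import struct


def float_bits(x: float) -> int:
    """Reinterpret a 64-bit float as an unsigned 64-bit integer (like memcpy)."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


a = 123.000
b = 123.000001

# The two values are approximately equal under a relative tolerance ...
close = math.isclose(a, b, rel_tol=1e-6)

# ... but their bit patterns differ, so a hash computed from the bits
# will generally differ as well.
same_bits = float_bits(a) == float_bits(b)

# A dict that uses exact equality keeps them as two separate keys.
d = {a: "first", b: "second"}

print(close, same_bits, len(d))
```

A user-supplied equality function, as suggested above, would have to be paired with a hash that maps "equal" floats to the same bucket; otherwise the equality check is never reached.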
For now I would not. Later, once we allow a user-defined hash, we can, but this is rarely used in Python anyway, so for now let's just give a compiler error.
Hey everyone. I want to work on the development and enhancement of data structures for LPython. Can anyone suggest a [...] #983 seems to have been dormant for some time. So, is there anything I can help with?
This one may be.
For
For the above code, when I sequentially ran LPython on it with flags
For the above code, when I ran it again sequentially with the above-mentioned flags, it fails on [...] @czgdp1807 Sounds good? If possible, can you point me to the files associated with the above issue?
Yes. Makes sense to me.
This issue is to be handled in It seems the lpython/src/libasr/codegen/asr_to_llvm.cpp Line 3756 in c792719
Hey. I was able to figure out a few things about how to implement the solution, at least for the return type. First of all, there are two APIs that LLVM exposes to us - In the place mentioned above by @Thirumalai-Shaktivel, I added the following logic for a Dictionary return type.
@certik @czgdp1807 After this, what else needs to be changed? I guess
It seems like you have provided a description of the different freedoms we have in implementing dictionaries and some suggestions on how to approach implementing them. There are different choices that can be made in implementing dictionaries, such as the hash function, load factor, resizing strategy, collision strategy, memory allocation, ordered vs unordered, and hash table vs red-black tree. A good starting point is to choose default options for all of the above choices. To allow users to select their own options, a type annotation can be used, such as x: Dict[hash3, loadfactor(5), resizing("x"), collision_strategy("list")]. Benchmarking can be used to choose good solid defaults. A user-defined hash function can be added later for optimal performance. Ideally, the hash function should be defined in the frontend language as well. Ordered dictionaries can be implemented on top of unordered dictionaries by maintaining a data structure to track the order of key insertion nodes and another hash table of key -> node, but this may result in slower insertion and deletion due to the additional data structures.
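As an illustration of how those choices interact, here is a rough sketch in plain Python of a hash table where the hash function and the load factor are parameters. The collision strategy (separate chaining with lists) and the resizing strategy (doubling) are fixed here for brevity. The class `ChainedDict` and its parameter names are hypothetical and are not part of LPython or of #975.

```python
from typing import Any, Callable, Hashable, List, Tuple

Bucket = List[Tuple[Hashable, Any]]


class ChainedDict:
    """Toy hash table: separate chaining, doubling when the load factor is exceeded."""

    def __init__(
        self,
        hash_fn: Callable[[Hashable], int] = hash,
        max_load_factor: float = 0.75,
        initial_capacity: int = 8,
    ) -> None:
        self._hash_fn = hash_fn
        self._max_load_factor = max_load_factor
        self._buckets: List[Bucket] = [[] for _ in range(initial_capacity)]
        self._size = 0

    def _bucket_for(self, key: Hashable) -> Bucket:
        # Python's % yields a non-negative index even for negative hashes.
        return self._buckets[self._hash_fn(key) % len(self._buckets)]

    def __setitem__(self, key: Hashable, value: Any) -> None:
        bucket = self._bucket_for(key)
        for i, (existing_key, _) in enumerate(bucket):
            if existing_key == key:  # exact equality check within the chain
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self._size += 1
        if self._size / len(self._buckets) > self._max_load_factor:
            self._resize(2 * len(self._buckets))

    def __getitem__(self, key: Hashable) -> Any:
        for existing_key, value in self._bucket_for(key):
            if existing_key == key:
                return value
        raise KeyError(key)

    def __len__(self) -> int:
        return self._size

    def _resize(self, new_capacity: int) -> None:
        # Rehash every entry into a larger bucket array.
        old_buckets = self._buckets
        self._buckets = [[] for _ in range(new_capacity)]
        for bucket in old_buckets:
            for key, value in bucket:
                self._bucket_for(key).append((key, value))


squares = ChainedDict(max_load_factor=0.75, initial_capacity=4)
for i in range(10):
    squares[i] = i * i
print(len(squares), squares[3])
```

Swapping `hash_fn` or `max_load_factor` changes performance but not results, which is why benchmarking is a reasonable way to pick the defaults.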
Hey! Can you give an example of [...] I feel that I should first support nested dicts as values and then try other data types. As far as I know, the keys of a dictionary must be unique and of an immutable data type such as strings, integers, floats, tuples (
@czgdp1807 I have completed the task
Hello @czgdp1807, is this project still available for GSoC, or has it been completed?
Here are the freedoms that we have in implementing dictionaries:
Here is how we can do this:
x: Dict[hash3, loadfactor(5), resizing("x"), collision_strategy("list")]
Note: #975 implements an unordered dict. An ordered dict can be implemented on top of it by adding a data structure that maintains the order of key insertion nodes (such as a linked list) and another hash table of key -> node, so that insertion, lookup and deletion are all O(1). Lookup seems to be as fast as for the unordered dict, and so is unordered iteration, but insertion and deletion are slower because the two extra data structures must be updated. There is also a new feature, "ordered iteration", which did not exist in the unordered dict but is slower than "unordered iteration". More details at https://github.com/Aqcurate/OrderedDict#ordered-dictionary-implementation.
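The layering described in the note can be sketched roughly as follows, in plain Python. The names `_Node` and `OrderedDictSketch` are hypothetical. Python's built-in `dict` stands in for the unordered hash table of key -> node, and its own ordering is never relied on. A doubly linked list with sentinel nodes tracks insertion order.

```python
from typing import Any, Dict, Hashable, Iterator, Optional, Tuple


class _Node:
    """One entry in the doubly linked list that records insertion order."""

    __slots__ = ("key", "value", "prev", "next")

    def __init__(self, key: Any = None, value: Any = None) -> None:
        self.key = key
        self.value = value
        self.prev: Optional["_Node"] = None
        self.next: Optional["_Node"] = None


class OrderedDictSketch:
    def __init__(self) -> None:
        self._nodes: Dict[Hashable, _Node] = {}  # key -> node (unordered table)
        self._head = _Node()  # sentinel before the first entry
        self._tail = _Node()  # sentinel after the last entry
        self._head.next = self._tail
        self._tail.prev = self._head

    def __setitem__(self, key: Hashable, value: Any) -> None:
        node = self._nodes.get(key)
        if node is not None:
            node.value = value  # existing key keeps its original position
            return
        node = _Node(key, value)
        last = self._tail.prev
        last.next = node
        node.prev = last
        node.next = self._tail
        self._tail.prev = node
        self._nodes[key] = node  # second structure to update on insert

    def __getitem__(self, key: Hashable) -> Any:
        return self._nodes[key].value  # lookup touches only the hash table

    def __delitem__(self, key: Hashable) -> None:
        node = self._nodes.pop(key)  # raises KeyError if the key is absent
        node.prev.next = node.next   # unlink from the order list in O(1)
        node.next.prev = node.prev

    def __len__(self) -> int:
        return len(self._nodes)

    def items(self) -> Iterator[Tuple[Hashable, Any]]:
        """Ordered iteration: walk the linked list from oldest to newest."""
        node = self._head.next
        while node is not self._tail:
            yield node.key, node.value
            node = node.next


od = OrderedDictSketch()
od["b"] = 1
od["a"] = 2
od["c"] = 3
del od["a"]
od["a"] = 4   # re-inserted, so it moves to the end
od["b"] = 10  # updated in place, so it stays first
print(list(od.items()))
```

Insertion and deletion each touch both the hash table and the linked list, which matches the note's point that they cost more than in the unordered dict while lookup does not.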