Skip to content

Scaling up lists to different types and supporting ListConstant in LLVM #835

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Jul 29, 2022

Conversation

czgdp1807
Copy link
Collaborator

No description provided.

@czgdp1807 czgdp1807 added the llvm LLVM related changes label Jul 28, 2022
@czgdp1807 czgdp1807 marked this pull request as ready for review July 28, 2022 15:08
@czgdp1807 czgdp1807 requested a review from certik July 28, 2022 15:08
@czgdp1807
Copy link
Collaborator Author

@certik I have revamped lists to support lists and list constants for i32. Do you want anything to be changed? If not then I will go ahead and implement features for other element types in lists (except structures a.k.a dataclasses which I will do in a separate PR). Please let me know.


struct _lcompilers_list {
struct _lcompilers_list_info* info;
void *p;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would put n, capacity and type_size directly into _lcompilers_list. I would not use extra pointer indirection.

Copy link
Collaborator Author

@czgdp1807 czgdp1807 Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I did this is because lists are shallow copied in python, so while shallow copying one list to another if you do the following,

dest->n = src->n;
dest->capacity = src->capacity;
dest->type_size = src->type_size;
dest->p = src->p;

and then call append on dest,

dest.append(1);

and then do,

print(src[0]);

then src->n, src->capacity, src->type_size would be unchanged and it would give Out of bounds error which would be incorrect from Python perspective. So, the question being, do we want to follow Python's rule of shallow copying lists or LPython should always deep copy lists. If you want to follow Python's rule then other than using another struct we can use array of 3 elements to store the information.

Python 3.8.9 (default, Apr 13 2022, 08:48:06) 
[Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> src = []
>>> dest = src
>>> dest.append(1)
>>> src
[1] # calling dest.append updates src as well.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I think this is the same issue as with NumPy arrays, which we implement using deep copy, not shallow copy.

I would also deep copy lists.

Note that your approach would not work with freeing the list, as you would not know when to free it. You would have to use reference counting. But that is slow.

So the approach to this is to always deep copy everything, and in order to achieve Python semantics (which we need), we need to disallow the following case:

>>> src = [] # fine
>>> dest = src # fine
>>> dest.append(1) # fine
>>> src # error --- LPython and CPython would differ here, so we must disallow at compile time

The above (in general) can only be made with reference counting (or garbage collection), which we can implement, but then the user would need to ask explicitly, something like:

>>> src = rcp([]) # fine
>>> dest = rcp(src) # fine
>>> dest.append(1) # fine
>>> src # fine

Here rcp would be a no-op in CPython, but in LPython it would use reference counted pointers to track src and dest, and both can then point to the same list, and they will be freed when the reference count reaches zero.

Copy link
Collaborator Author

@czgdp1807 czgdp1807 Jul 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well then this is not a problem anymore. Deep copy means 2 variables will diverge so doesn't matter what happens to the source variable. Just calling memcpy will work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that your approach would not work with freeing the list, as you would not know when to free it. You would have to use reference counting. But that is slow.

Yes a one of the reasons why one cannot do parallel programming in Python because of locks being applied on reference count attribute of Python objects. So deep copy should be the best way for compiled python.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. And later, we add an optimization ASR->ASR pass that rewrites this:

src = []
dest = src # deep copy
dest.append(1)
print(dest)

To this:

src = []
src.append(1)
print(src)

To avoid the extra copy.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note sure about removing dest altogether. May be someone might use dest and src as two separate lists?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In CPython with the above syntax (i.e., no explicit call to deepcopy) you can't use dest and src as two separate lists.

If there is explicit "copy" being done in CPython, then of course we can (and should) support that in LPython as well. That's not a problem.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. So you mean we eventually will follow CPython semantics by the above removal of dest variable and hence removing any kind of copy altogether. Makes sense.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@certik
Copy link
Contributor

certik commented Jul 28, 2022

Also, while we are refactoring this: I wonder if we should do it now and just implement this in LLVM itself, and not store type_size in the descriptor (since we know it at compile time), just a pointer, size and capacity. We pass the descriptor by reference (not by value), since inside the function, if we "append" to the list, we might need to reallocate the pointer. That should be very efficient and LLVM has a chance to further optimize things out. Since lists are used all over in Python, we want to make them as efficient as we possibly can. (I think dictionaries and sets can be implemented using the approach of this PR, but lists are used much more often, so they need to be top performing.)

@czgdp1807
Copy link
Collaborator Author

czgdp1807 commented Jul 28, 2022

Well implementing this in LLVM would be preferable (was my first choice but since I saw lists already implemented in C so refactored there only). However malloc calls will be made through C (for allocatable arrays we do the same, if LLVM provides an easy way to make malloc calls then I will use that). Let me know if you want LLVM implementation of lists in this PR or later and for now keep using the C implementation.

@certik
Copy link
Contributor

certik commented Jul 28, 2022

Yes, malloc must be called from C.

Since we are working on lists, let's get them done right, and implement them in LLVM. Whether it is easier to merge this PR, and then send another one, or directly, I'll leave up to you.

Copy link
Contributor

@certik certik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is fine to merge as is. If it is easier for you to merge this, and then send another PR, then go ahead and do that.

@czgdp1807
Copy link
Collaborator Author

czgdp1807 commented Jul 28, 2022

I will do the LLVM implementation here. Shouldn't be that hard, just a bit lengthy but I would let it be. I agree that implementing in LLVM would be the best way to go as we implement arrays as well in LLVM so lists are just slightly different.

The point being the lists are anyways not supported to a usable extent so let's do it here and make them usable.

@certik
Copy link
Contributor

certik commented Jul 28, 2022

I think you can hide the LLVM code in some class or namespace in a separate header file, so that it's not a big deal I think. Yes, it might be maybe a couple hundreds lines, but it should not be that hard and it will be very high performing.

@czgdp1807
Copy link
Collaborator Author

Yes. I am planning to do something similar to what we did for arrays (a class with methods hiding the list manipulation/creation logic). By the way are we planning to support nested lists? Like, list[list[i32]]?

@certik
Copy link
Contributor

certik commented Jul 28, 2022

I think we should support nested lists.

@czgdp1807
Copy link
Collaborator Author

Great. Will take care of this while implementing lists in LLVM.

@czgdp1807 czgdp1807 requested a review from certik July 29, 2022 15:18
@czgdp1807
Copy link
Collaborator Author

@certik I have implemented lists in LLVM. Let me know if you need any change. If not then I will test for a bunch of other types and clean up the git history.

Copy link
Contributor

@certik certik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this looks really good, well done.

@certik
Copy link
Contributor

certik commented Jul 29, 2022

This is the best way to do it. LLVM will be able to optimize things out in many cases. Later on, we can add more optimizations, such as predicting the size of the list and just allocating it initially with the right size, then we do not even need to check for resizing, etc.

@czgdp1807
Copy link
Collaborator Author

Great. I will clean up the git history by tomorrow and merge it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm LLVM related changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants