Scaling up lists to different types and supporting `ListConstant` in LLVM #835

czgdp1807 · 2022-07-28T10:37:25Z

No description provided.

czgdp1807 · 2022-07-28T15:10:53Z

@certik I have revamped lists to support lists and list constants for i32. Do you want anything to be changed? If not then I will go ahead and implement features for other element types in lists (except structures a.k.a dataclasses which I will do in a separate PR). Please let me know.

certik · 2022-07-28T16:35:11Z

src/runtime/impure/lfortran_intrinsics.c

+
+struct _lcompilers_list {
+    struct _lcompilers_list_info* info;
+    void *p;


I would put n, capacity and type_size directly into _lcompilers_list. I would not use extra pointer indirection.

The reason I did this is because lists are shallow copied in python, so while shallow copying one list to another if you do the following,

dest->n = src->n; dest->capacity = src->capacity; dest->type_size = src->type_size; dest->p = src->p;

and then call append on dest,

dest.append(1);

and then do,

print(src[0]);

then src->n, src->capacity, src->type_size would be unchanged and it would give Out of bounds error which would be incorrect from Python perspective. So, the question being, do we want to follow Python's rule of shallow copying lists or LPython should always deep copy lists. If you want to follow Python's rule then other than using another struct we can use array of 3 elements to store the information.

Python 3.8.9 (default, Apr 13 2022, 08:48:06) [Clang 13.1.6 (clang-1316.0.21.2.5)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> src = [] >>> dest = src >>> dest.append(1) >>> src [1] # calling dest.append updates src as well.

I see. I think this is the same issue as with NumPy arrays, which we implement using deep copy, not shallow copy.

I would also deep copy lists.

Note that your approach would not work with freeing the list, as you would not know when to free it. You would have to use reference counting. But that is slow.

So the approach to this is to always deep copy everything, and in order to achieve Python semantics (which we need), we need to disallow the following case:

>>> src = [] # fine >>> dest = src # fine >>> dest.append(1) # fine >>> src # error --- LPython and CPython would differ here, so we must disallow at compile time

The above (in general) can only be made with reference counting (or garbage collection), which we can implement, but then the user would need to ask explicitly, something like:

>>> src = rcp([]) # fine >>> dest = rcp(src) # fine >>> dest.append(1) # fine >>> src # fine

Here rcp would be a no-op in CPython, but in LPython it would use reference counted pointers to track src and dest, and both can then point to the same list, and they will be freed when the reference count reaches zero.

Well then this is not a problem anymore. Deep copy means 2 variables will diverge so doesn't matter what happens to the source variable. Just calling memcpy will work.

Note that your approach would not work with freeing the list, as you would not know when to free it. You would have to use reference counting. But that is slow.

Yes a one of the reasons why one cannot do parallel programming in Python because of locks being applied on reference count attribute of Python objects. So deep copy should be the best way for compiled python.

Right. And later, we add an optimization ASR->ASR pass that rewrites this:

src = [] dest = src # deep copy dest.append(1) print(dest)

To this:

src = [] src.append(1) print(src)

To avoid the extra copy.

Note sure about removing dest altogether. May be someone might use dest and src as two separate lists?

In CPython with the above syntax (i.e., no explicit call to deepcopy) you can't use dest and src as two separate lists.

If there is explicit "copy" being done in CPython, then of course we can (and should) support that in LPython as well. That's not a problem.

I see. So you mean we eventually will follow CPython semantics by the above removal of dest variable and hence removing any kind of copy altogether. Makes sense.

I documented this at https://github.com/lcompilers/lpython/wiki/LPython-Semantics

certik · 2022-07-28T16:43:21Z

Also, while we are refactoring this: I wonder if we should do it now and just implement this in LLVM itself, and not store type_size in the descriptor (since we know it at compile time), just a pointer, size and capacity. We pass the descriptor by reference (not by value), since inside the function, if we "append" to the list, we might need to reallocate the pointer. That should be very efficient and LLVM has a chance to further optimize things out. Since lists are used all over in Python, we want to make them as efficient as we possibly can. (I think dictionaries and sets can be implemented using the approach of this PR, but lists are used much more often, so they need to be top performing.)

czgdp1807 · 2022-07-28T16:46:30Z

Well implementing this in LLVM would be preferable (was my first choice but since I saw lists already implemented in C so refactored there only). However malloc calls will be made through C (for allocatable arrays we do the same, if LLVM provides an easy way to make malloc calls then I will use that). Let me know if you want LLVM implementation of lists in this PR or later and for now keep using the C implementation.

certik · 2022-07-28T16:53:09Z

Yes, malloc must be called from C.

Since we are working on lists, let's get them done right, and implement them in LLVM. Whether it is easier to merge this PR, and then send another one, or directly, I'll leave up to you.

certik

This is fine to merge as is. If it is easier for you to merge this, and then send another PR, then go ahead and do that.

czgdp1807 · 2022-07-28T16:54:38Z

I will do the LLVM implementation here. Shouldn't be that hard, just a bit lengthy but I would let it be. I agree that implementing in LLVM would be the best way to go as we implement arrays as well in LLVM so lists are just slightly different.

The point being the lists are anyways not supported to a usable extent so let's do it here and make them usable.

certik · 2022-07-28T17:05:34Z

I think you can hide the LLVM code in some class or namespace in a separate header file, so that it's not a big deal I think. Yes, it might be maybe a couple hundreds lines, but it should not be that hard and it will be very high performing.

czgdp1807 · 2022-07-28T17:08:39Z

Yes. I am planning to do something similar to what we did for arrays (a class with methods hiding the list manipulation/creation logic). By the way are we planning to support nested lists? Like, list[list[i32]]?

certik · 2022-07-28T17:11:08Z

I think we should support nested lists.

czgdp1807 · 2022-07-28T17:12:00Z

Great. Will take care of this while implementing lists in LLVM.

czgdp1807 · 2022-07-29T15:19:23Z

@certik I have implemented lists in LLVM. Let me know if you need any change. If not then I will test for a bunch of other types and clean up the git history.

certik

I think this looks really good, well done.

certik · 2022-07-29T16:30:10Z

This is the best way to do it. LLVM will be able to optimize things out in many cases. Later on, we can add more optimizations, such as predicting the size of the list and just allocating it initially with the right size, then we do not even need to check for resizing, etc.

czgdp1807 · 2022-07-29T16:34:23Z

Great. I will clean up the git history by tomorrow and merge it.

…mespace

…amespace

czgdp1807 added the llvm LLVM related changes label Jul 28, 2022

czgdp1807 marked this pull request as ready for review July 28, 2022 15:08

czgdp1807 requested a review from certik July 28, 2022 15:08

certik reviewed Jul 28, 2022

View reviewed changes

certik approved these changes Jul 28, 2022

View reviewed changes

czgdp1807 requested a review from certik July 29, 2022 15:18

certik approved these changes Jul 29, 2022

View reviewed changes

czgdp1807 added 10 commits July 29, 2022 22:08

Made test_list_01 robust and natural

0d1e5b9

Added get_type_code for encoding types for caching

46c3808

Added LLVMList, lfortran_realloc and moved lfortran_malloc to LLVM na…

ccf3b22

…mespace

Removed lfortran_malloc from array utils

e4737c5

Implemented LLVMLists, lfortran_malloc and lfortran_realloc

d7030c8

Removed lfortran_malloc and update use lfortran_malloc from correct n…

60ee963

…amespace

Use LLVM list implementation in LLVM backend

32e1d0a

Added _lfortran_relloc and removed list implementation from C runtime

f3812b8

Test for f64 typed lists

f7fe3ff

Update capacity after resizing

cf91d9d

czgdp1807 force-pushed the list01 branch from b9c7768 to cf91d9d Compare July 29, 2022 17:11

czgdp1807 enabled auto-merge July 29, 2022 17:18

czgdp1807 merged commit fd17b7d into lcompilers:main Jul 29, 2022

czgdp1807 deleted the list01 branch July 29, 2022 17:37

czgdp1807 mentioned this pull request Jul 30, 2022

[Draft] Add str to visit_ListItem #800

Closed

certik mentioned this pull request Jul 31, 2022

List append benchmark #857

Open

czgdp1807 mentioned this pull request Aug 4, 2022

Raise error while changing immutable objects #878

Merged

czgdp1807 mentioned this pull request Aug 13, 2022

Supporting list of lists, list of tuples and tuples with lists #952

Merged

Scaling up lists to different types and supporting ListConstant in LLVM #835

Scaling up lists to different types and supporting ListConstant in LLVM #835

Uh oh!

Conversation

czgdp1807 commented Jul 28, 2022

Uh oh!

czgdp1807 commented Jul 28, 2022

Uh oh!

Choose a reason for hiding this comment

Uh oh!

czgdp1807 Jul 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

czgdp1807 Jul 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

certik commented Jul 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

czgdp1807 commented Jul 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

certik commented Jul 28, 2022

Uh oh!

certik left a comment

Choose a reason for hiding this comment

Uh oh!

czgdp1807 commented Jul 28, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

certik commented Jul 28, 2022

Uh oh!

czgdp1807 commented Jul 28, 2022

Uh oh!

certik commented Jul 28, 2022

Uh oh!

czgdp1807 commented Jul 28, 2022

Uh oh!

czgdp1807 commented Jul 29, 2022

Uh oh!

certik left a comment

Choose a reason for hiding this comment

Uh oh!

certik commented Jul 29, 2022

Uh oh!

czgdp1807 commented Jul 29, 2022

Uh oh!

Uh oh!

Scaling up lists to different types and supporting `ListConstant` in LLVM #835

Scaling up lists to different types and supporting `ListConstant` in LLVM #835

czgdp1807 Jul 28, 2022 •

edited

Loading

czgdp1807 Jul 28, 2022 •

edited

Loading

certik commented Jul 28, 2022 •

edited

Loading

czgdp1807 commented Jul 28, 2022 •

edited

Loading

czgdp1807 commented Jul 28, 2022 •

edited

Loading