Small improvements to the dictionary compression #3

Merged: 1 commit into jepler:better-dictionary-compression, Sep 13, 2020

ciscorn commented Sep 13, 2020

Many thanks for implementing my suggestion! It looks great.

I've modified some parts to do further optimization:

  • Completely remove the binary/linear search by looking up the canonical dict.
  • Precompute the end position of each word at compile time, which removes the runtime wlen reduction.
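
As a rough illustration of the second point (not the code from this PR), here is a minimal sketch of a word table with precomputed end offsets. The names `words`, `wends`, and `get_word`, and the three sample words, are hypothetical. The assumption is that the build step emits the cumulative end offsets, so the runtime finds a word with two array reads instead of summing the lengths of the preceding words.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical dictionary: the words "in", "the", "on" stored back to back. */
static const char words[] = "intheon";

/* End offset of each word inside `words`, as a build script could emit it.
 * Word i occupies the half-open range [wends[i - 1], wends[i]). */
static const uint8_t wends[] = {2, 5, 7};

#define WORD_COUNT (sizeof(wends) / sizeof(wends[0]))

/* Copy word `i` into `out` as a NUL-terminated string and return its length.
 * Preconditions are checked with assert in this sketch. */
size_t get_word(size_t i, char *out, size_t out_size) {
    assert(i < WORD_COUNT);
    size_t start = (i == 0) ? 0 : wends[i - 1];
    size_t len = wends[i] - start;   /* no loop over earlier word lengths */
    assert(len < out_size);
    memcpy(out, words + start, len);
    out[len] = '\0';
    return len;
}
```

With this layout, `get_word(1, buf, sizeof buf)` fills `buf` with "the". The cost is one extra table entry per word, which the build step can compute for free.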

jepler (Owner) commented Sep 13, 2020

Nice, it seems to save an additional 40 bytes or so on trinket m0 english translation. Thanks @ciscorn

jepler merged commit 9abfc51 into jepler:better-dictionary-compression on Sep 13, 2020
jepler added a commit that referenced this pull request Oct 1, 2020
It was incorrect to NULL out the pointer to our heap allocated buffer in
`reset`, because subsequent to framebuffer_reset, but while
the heap was still active, we could call `get_bufinfo` again,
leading to a fresh allocation on the heap that is about to be destroyed.

Typical stack trace:
```
#1  0x0006c368 in sharpdisplay_framebuffer_get_bufinfo
#2  0x0006ad6e in _refresh_display
#3  0x0006b168 in framebufferio_framebufferdisplay_background
#4  0x00069d22 in displayio_background
#5  0x00045496 in supervisor_background_tasks
#6  0x000446e8 in background_callback_run_all
#7  0x00045546 in supervisor_run_background_tasks_if_tick
#8  0x0005b042 in common_hal_neopixel_write
#9  0x00044c4c in clear_temp_status
#10 0x000497de in spi_flash_flush_keep_cache
#11 0x00049a66 in supervisor_external_flash_flush
#12 0x00044b22 in supervisor_flash_flush
#13 0x0004490e in filesystem_flush
#14 0x00043e18 in cleanup_after_vm
#15 0x0004414c in run_repl
#16 0x000441ce in main
```
When this happened -- which was inconsistent -- the display would keep
some heap allocation across reset which is exactly what we need to avoid.

NULLing the pointer in reconstruct follows what RGBMatrix does, and that
code is a bit more battle-tested anyway.
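
The ordering problem described above can be modeled with a toy example. This is only a sketch under stated assumptions, not the SharpMemory or RGBMatrix code: every `toy_` name is hypothetical, and a small bump allocator stands in for the VM heap. It shows the pointer being kept in `reset` and cleared only in `reconstruct`, so a refresh between the two reuses the existing buffer instead of allocating a new one.

```c
#include <assert.h>
#include <stddef.h>

/* Toy bump allocator standing in for the VM heap (hypothetical). */
typedef struct {
    unsigned char mem[256];
    size_t used;
} toy_heap_t;

unsigned char *toy_heap_alloc(toy_heap_t *heap, size_t n) {
    assert(heap->used + n <= sizeof(heap->mem));
    unsigned char *p = heap->mem + heap->used;
    heap->used += n;
    return p;
}

/* Tearing the heap down invalidates every pointer handed out so far. */
void toy_heap_destroy(toy_heap_t *heap) {
    heap->used = 0;
}

/* Hypothetical display object with a lazily allocated buffer. */
typedef struct {
    toy_heap_t *heap;
    unsigned char *bufinfo;
    size_t size;
} toy_framebuffer_t;

/* Allocate on first use only; later calls reuse the same buffer. */
unsigned char *toy_get_bufinfo(toy_framebuffer_t *fb) {
    if (fb->bufinfo == NULL) {
        fb->bufinfo = toy_heap_alloc(fb->heap, fb->size);
    }
    return fb->bufinfo;
}

/* Runs while the old heap is still live, so the pointer is kept:
 * a background refresh after this point reuses the existing buffer. */
void toy_reset(toy_framebuffer_t *fb) {
    (void)fb;
}

/* Runs once the old heap is gone: only now is the stale pointer dropped. */
void toy_reconstruct(toy_framebuffer_t *fb) {
    fb->bufinfo = NULL;
}
```

In this model, calling `toy_get_bufinfo` after `toy_reset` returns the same pointer and leaves `heap.used` unchanged. If `toy_reset` cleared the pointer instead, the same call would allocate again on the heap that is about to be destroyed.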

If I had a motivation for structuring the SharpMemory code differently,
I can no longer recall it.

Testing performed: Ran my complicated calculator program over multiple
iterations without observing signs of heap corruption.

Closes: adafruit#3473
jepler pushed a commit that referenced this pull request Jan 4, 2021
jepler pushed a commit that referenced this pull request Feb 10, 2021
jepler pushed a commit that referenced this pull request Apr 28, 2021
jepler added a commit that referenced this pull request May 10, 2021
asan considers that memcmp(p, q, N) is permitted to access N bytes at
each of p and q, even for values of p and q that have a difference
earlier.  Accessing additional values is frequently done in practice,
reading 4 or more bytes from each input at a time for efficiency, so
when completing "non_exist<TAB>" in the repl, this causes a diagnostic:

```
==16938==ERROR: AddressSanitizer: global-buffer-overflow on address 0x555555cd8dc8 at pc 0x7ffff726457b bp 0x7fffffffda20 sp 0x7fffffffd1d0
READ of size 9 at 0x555555cd8dc8 thread T0
    #0 0x7ffff726457a  (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xb857a)
    #1 0x555555b0e82a in mp_repl_autocomplete ../../py/repl.c:301
    #2 0x555555c89585 in readline_process_char ../../lib/mp-readline/readline.c:225
    #3 0x555555c8ac6e in readline ../../lib/mp-readline/readline.c:513
    #4 0x555555b8dcbd in do_repl /home/jepler/src/micropython/ports/unix/main.c:194
    #5 0x555555b90859 in main_ /home/jepler/src/micropython/ports/unix/main.c:673
    #6 0x555555b90a3a in main /home/jepler/src/micropython/ports/unix/main.c:436
    #7 0x7ffff619a09a in __libc_start_main ../csu/libc-start.c:308
    #8 0x55555595fd69 in _start (/home/jepler/src/micropython/ports/unix/micropython-coverage+0x40bd69)

0x555555cd8dc8 is located 0 bytes to the right of global variable 'import_str' defined in '../../py/repl.c:285:23' (0x555555cd8dc0) of size 8
  'import_str' is ascii string 'import '
SUMMARY: AddressSanitizer: global-buffer-overflow (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xb857a)
Shadow bytes around the buggy address:
  0x0aab2ab93160: 04 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9
  0x0aab2ab93170: 05 f9 f9 f9 f9 f9 f9 f9 04 f9 f9 f9 f9 f9 f9 f9
  0x0aab2ab93180: 06 f9 f9 f9 f9 f9 f9 f9 06 f9 f9 f9 f9 f9 f9 f9
  0x0aab2ab93190: 05 f9 f9 f9 f9 f9 f9 f9 00 00 f9 f9 f9 f9 f9 f9
  0x0aab2ab931a0: 00 00 00 00 00 00 00 00 04 f9 f9 f9 f9 f9 f9 f9
=>0x0aab2ab931b0: 00 00 00 00 00 00 00 00 00[f9]f9 f9 f9 f9 f9 f9
  0x0aab2ab931c0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0aab2ab931d0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0aab2ab931e0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 f9
  0x0aab2ab931f0: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 f9 f9 f9
  0x0aab2ab93200: f9 f9 f9 f9 00 00 00 00 00 00 00 00 00 00 00 00
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==16938==ABORTING
```
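
For context, the report shows a 9-byte read (the length of "non_exist") from the 8-byte `import_str`, which holds 'import ' plus its terminating NUL. One way to keep such a comparison in bounds is to check the lengths before calling `memcmp`. This is a minimal sketch of that idea, not necessarily the change made in the referenced commit. The helper name `is_prefix_of_keyword` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Return true if the first `len` bytes of `typed` are a prefix of the
 * NUL-terminated keyword `kw`. The length check runs first, so memcmp is
 * never asked to read more bytes than `kw` holds. */
bool is_prefix_of_keyword(const char *typed, size_t len, const char *kw) {
    assert(typed != NULL && kw != NULL);
    size_t kw_len = strlen(kw);
    if (len > kw_len) {
        return false;   /* longer than the keyword, so it cannot be a prefix */
    }
    return memcmp(typed, kw, len) == 0;
}
```

Here `is_prefix_of_keyword("non_exist", 9, "import ")` returns false without touching memory past the keyword, while `is_prefix_of_keyword("imp", 3, "import ")` returns true.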
jepler added a commit that referenced this pull request May 10, 2021
asan considers that memcmp(p, q, N) is permitted to access N bytes at
each of p and q, even for values of p and q that have a difference
earlier.  Accessing additional values is frequently done in practice,
reading 4 or more bytes from each input at a time for efficiency, so
when completing "non_exist<TAB>" in the repl, this causes a diagnostic:

```
==16938==ERROR: AddressSanitizer: global-buffer-overflow on
address 0x555555cd8dc8 at pc 0x7ffff726457b bp 0x7fffffffda20 sp 0x7fff
READ of size 9 at 0x555555cd8dc8 thread T0
    #0 0x7ffff726457a  (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xb857a)
    #1 0x555555b0e82a in mp_repl_autocomplete ../../py/repl.c:301
    #2 0x555555c89585 in readline_process_char ../../lib/mp-readline/re
    #3 0x555555c8ac6e in readline ../../lib/mp-readline/readline.c:513
    #4 0x555555b8dcbd in do_repl /home/jepler/src/micropython/ports/uni
    #5 0x555555b90859 in main_ /home/jepler/src/micropython/ports/unix/
    #6 0x555555b90a3a in main /home/jepler/src/micropython/ports/unix/m
    #7 0x7ffff619a09a in __libc_start_main ../csu/libc-start.c:308
    #8 0x55555595fd69 in _start (/home/jepler/src/micropython/ports/uni

0x555555cd8dc8 is located 0 bytes to the right of global variable
'import_str' defined in '../../py/repl.c:285:23' (0x555555cd8dc0) of
size 8
  'import_str' is ascii string 'import '
```

Signed-off-by: Jeff Epler <[email protected]>
jepler added a commit that referenced this pull request Jun 8, 2021
asan considers that memcmp(p, q, N) is permitted to access N bytes at each
of p and q, even for values of p and q that have a difference earlier.
Accessing additional values is frequently done in practice, reading 4 or
more bytes from each input at a time for efficiency, so when completing
"non_exist<TAB>" in the repl, this causes a diagnostic:

    ==16938==ERROR: AddressSanitizer: global-buffer-overflow on
    address 0x555555cd8dc8 at pc 0x7ffff726457b bp 0x7fffffffda20 sp 0x7fff
    READ of size 9 at 0x555555cd8dc8 thread T0
        #0 0x7ffff726457a  (/usr/lib/x86_64-linux-gnu/libasan.so.5+0xb857a)
        #1 0x555555b0e82a in mp_repl_autocomplete ../../py/repl.c:301
        #2 0x555555c89585 in readline_process_char ../../lib/mp-readline/re
        #3 0x555555c8ac6e in readline ../../lib/mp-readline/readline.c:513
        #4 0x555555b8dcbd in do_repl /home/jepler/src/micropython/ports/uni
        #5 0x555555b90859 in main_ /home/jepler/src/micropython/ports/unix/
        #6 0x555555b90a3a in main /home/jepler/src/micropython/ports/unix/m
        #7 0x7ffff619a09a in __libc_start_main ../csu/libc-start.c:308
        #8 0x55555595fd69 in _start (/home/jepler/src/micropython/ports/uni

    0x555555cd8dc8 is located 0 bytes to the right of global variable
    'import_str' defined in '../../py/repl.c:285:23' (0x555555cd8dc0) of
    size 8
      'import_str' is ascii string 'import '

Signed-off-by: Jeff Epler <[email protected]>
jepler pushed a commit that referenced this pull request Jan 5, 2024