bugfix: fixed incorrect bytestring encoding PlutusData #269

Merged
merged 9 commits into from
Oct 13, 2023
27 changes: 27 additions & 0 deletions pycardano/serialization.py
@@ -5,6 +5,7 @@
import re
import typing
from collections import OrderedDict, UserList, defaultdict
from collections.abc import Sequence
from copy import deepcopy
from dataclasses import Field, dataclass, fields
from datetime import datetime
@@ -60,6 +61,20 @@ class IndefiniteFrozenList(FrozenList, IndefiniteList): # type: ignore
    pass


@dataclass
class ByteString:
    value: bytes

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other: Union[bytes, ByteString]):
        if isinstance(other, ByteString):
            return self.value == other.value
        else:
            return self.value == other


@dataclass
class RawCBOR:
    """A wrapper class for bytes that represents a CBOR value."""
@@ -160,6 +175,7 @@ def default_encoder(
    assert isinstance(
        value,
        (
            ByteString,
            CBORSerializable,
            IndefiniteList,
            RawCBOR,
@@ -178,6 +194,15 @@ def default_encoder(
        for item in value:
            encoder.encode(item)
        encoder.write(b"\xff")
    elif isinstance(value, ByteString):
        if len(value.value) > 64:
            encoder.write(b"\x5f")
            for i in range(0, len(value.value), 64):
                imax = min(i + 64, len(value.value))
                encoder.encode(value.value[i:imax])
            encoder.write(b"\xff")
        else:
            encoder.encode(value.value)
    elif isinstance(value, RawCBOR):
        encoder.write(value.cbor)
    elif isinstance(value, FrozenList):
@@ -240,6 +265,8 @@ def to_primitive(self) -> Primitive:
        def _dfs(value, freeze=False):
            if isinstance(value, CBORSerializable):
                return _dfs(value.to_primitive(), freeze)
            elif isinstance(value, bytes):
                return ByteString(value)

Collaborator

IMO, it seems incorrect to replace every bytes value with ByteString. Instead, we should just offer ByteString for users to use in PlutusData or Metadata.
Some internal implementations generate bytes as intermediate values, e.g. script_data_hash, and we don't want to change their type arbitrarily.

Contributor

Should be implementable by a simple parameter?

Contributor Author

That was one of my original points and how I originally had it implemented.

Can someone please make a definitive final decision so I can fix and be done? I've implemented and reimplemented this multiple times.

Collaborator

@nielstron Could you elaborate on how adding a parameter would work?

Contributor (@nielstron, Oct 6, 2023)

It might not be as straightforward as I imagined. @theeldermillenial was right, maybe we should just roll with the initial design. I appreciate the excursion though, because now we know precisely which bytes to encode this way 😅 Sorry for the divergence.

Maybe we can document this (and ideally find some supporting documentation on the discrepancy in the implementation).

Collaborator

My preferred approach is to offer users a ByteString class, which the encoder can automatically break down into an array of byte chunks. If a plain bytes object longer than 64 bytes is found instead, pycardano should raise an exception and recommend that users use ByteString.
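
The chunking described here can be sketched without any CBOR library. This is a minimal illustration and not pycardano's implementation: `encode_bytes_chunked` and `_bytes_header` are hypothetical names, and the 64-byte limit is taken from the diff above. Byte strings of at most 64 bytes keep the usual definite-length encoding; longer ones become an indefinite-length byte string (`0x5f`, chunks of up to 64 bytes, then the `0xff` break).

```python
CHUNK_SIZE = 64  # limit used in the diff above


def _bytes_header(length: int) -> bytes:
    """CBOR major type 2 (byte string) header for lengths up to 255."""
    if length < 24:
        return bytes([0x40 | length])
    if length < 256:
        return bytes([0x58, length])
    raise ValueError("this sketch only handles lengths below 256 bytes")


def encode_bytes_chunked(data: bytes) -> bytes:
    """Hypothetical helper mirroring the ByteString branch of the encoder."""
    if len(data) <= CHUNK_SIZE:
        # Short values keep the ordinary definite-length encoding.
        return _bytes_header(len(data)) + data
    out = bytearray(b"\x5f")  # start of an indefinite-length byte string
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i : i + CHUNK_SIZE]
        out += _bytes_header(len(chunk)) + chunk
    out += b"\xff"  # "break" marker closing the indefinite-length string
    return bytes(out)


quote = b"The line separating good and evil passes ... right through every human heart."
encoded = encode_bytes_chunked(quote)
print(encoded.hex())
```

For the 77-byte quote used in `test_plutus_data_long_bytes`, this yields `5f`, then `5840` plus the first 64 bytes, then `4d` plus the remaining 13 bytes, then `ff`, which corresponds to the inner portion of `quote_hex` in that test.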

Contributor Author

@cffls But should this error only be thrown for PlutusData and Metadata? Or should we apply it globally?

My two cents is "only implement exactly what is defined". The bytes length restriction appears to be limited to "metadata", so maybe we only apply it to metadata.

Collaborator

We already have this check for metadata: https://github.com/Python-Cardano/pycardano/blob/main/pycardano/metadata.py#L40-L49

I thought this should also be enforced in PlutusData, which was the reason this PR was raised. If not, I am fine with only providing ByteString as an option for users in this PR.

Contributor Author

My apologies. I misspoke. When I said Metadata, I also meant PlutusData.

Also, I now see exactly what you're saying, and I think your solution makes the most sense. You are saying we should inject a check for long byte strings in PlutusData and throw an error similar to what is seen in Metadata. Part of that error message should indicate that the user can use the new ByteString class to allow longer bytes.

I think this is the most transparent approach, and it keeps in line with what I see to be pycardano's philosophy of being very unbiased.

If this is what you mean, I'll make the changes and we can be done. I will revert any hashes I altered, since this should really only affect the test I created. If there are any other unit tests you would like to see, I'm happy to add them.

Collaborator

Yes, this is exactly what I meant. Please go ahead with this approach. Thank you for confirming. ☺️
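
A rough sketch of the agreed behaviour, under stated assumptions: `ByteString` below is a stand-in for the class from this diff, while `check_bytes_length`, `MAX_BYTES_LENGTH`, and the use of `ValueError` are hypothetical choices made for illustration (the exception type pycardano ends up raising is not shown in this thread). Plain `bytes` longer than 64 are rejected with a message pointing at `ByteString`; wrapped values pass.

```python
from dataclasses import dataclass
from typing import Any

MAX_BYTES_LENGTH = 64  # assumed limit, matching the chunk size in the diff


@dataclass
class ByteString:
    """Stand-in for the wrapper class added in this PR."""

    value: bytes


def check_bytes_length(value: Any) -> None:
    """Hypothetical check: reject long plain bytes, allow ByteString."""
    if isinstance(value, ByteString):
        return  # the encoder is expected to chunk these
    if isinstance(value, bytes):
        if len(value) > MAX_BYTES_LENGTH:
            raise ValueError(
                f"bytes of length {len(value)} exceeds {MAX_BYTES_LENGTH}; "
                "wrap the value in ByteString to allow longer byte strings"
            )
    elif isinstance(value, dict):
        # Both keys and values may carry bytes.
        for key, item in value.items():
            check_bytes_length(key)
            check_bytes_length(item)
    elif isinstance(value, (list, tuple)):
        for item in value:
            check_bytes_length(item)


check_bytes_length([b"short", ByteString(b"x" * 100)])  # passes silently
try:
    check_bytes_length({"datum": b"x" * 65})
except ValueError as err:
    print(err)
```

Keeping the check separate from the encoder means plain `bytes` are never silently re-encoded, which is the transparency argument made above.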

            elif isinstance(value, (dict, OrderedDict, defaultdict)):
                _dict = type(value)()
                if hasattr(value, "default_factory"):
45 changes: 43 additions & 2 deletions test/pycardano/test_plutus.py
@@ -245,7 +245,7 @@ def test_redeemer_empty_datum():


def test_cost_model():
- assert (
+ print(
"a141005901d59f1a000302590001011a00060bc719026d00011a000249f01903e800011"
"a000249f018201a0025cea81971f70419744d186419744d186419744d186419744d1864"
"19744d186419744d18641864186419744d18641a000249f018201a000249f018201a000"
@@ -259,7 +259,26 @@ def test_cost_model():
"7e2318760001011a000242201a00067e2318760001011a0025cea81971f704001a00014"
"1bb041a000249f019138800011a000249f018201a000302590001011a000249f018201a"
"000249f018201a000249f018201a000249f018201a000249f018201a000249f018201a0"
- "00249f018201a00330da70101ff" == COST_MODELS.to_cbor_hex()
+ "00249f018201a00330da70101ff"
)
print()
print(COST_MODELS.to_cbor_hex())
assert (
"a141005f58409f1a000302590001011a00060bc719026d00011a000249f01903e800011"
"a000249f018201a0025cea81971f70419744d186419744d186419744d186419744d1858"
"406419744d186419744d18641864186419744d18641a000249f018201a000249f018201"
"a000249f018201a000249f01903e800011a000249f018201a000249f019584003e80008"
"1a000242201a00067e2318760001011a000249f01903e800081a000249f01a0001b7981"
"8f7011a000249f0192710011a0002155e19052e011903e81a5840000249f01903e8011a"
"000249f018201a000249f018201a000249f0182001011a000249f0011a000249f0041a0"
"00194af18f8011a000194af18f8011a0002377c5840190556011a0002bdea1901f1011a"
"000249f018201a000249f018201a000249f018201a000249f018201a000249f018201a0"
"00249f018201a000242201a00067e584023187600010119f04c192bd200011a000249f0"
"18201a000242201a00067e2318760001011a000242201a00067e2318760001011a0025c"
"ea81971f704001a0001584041bb041a000249f019138800011a000249f018201a000302"
"590001011a000249f018201a000249f018201a000249f018201a000249f018201a00024"
"9f018201a55000249f018201a000249f018201a00330da70101ffff"
== COST_MODELS.to_cbor_hex()
)


@@ -396,3 +415,25 @@ class A(PlutusData):
    assert (
        res == res2
    ), "Same class has different default constructor id in two consecutive runs"


def test_plutus_data_long_bytes():
    @dataclass
    class A(PlutusData):
        a: bytes

    quote = (
        "The line separating good and evil passes ... right through every human heart."
    )

    quote_hex = (
        "d866821a51e835649f5f5840546865206c696e652073657061726174696e6720676f6f6420616e"
        + "64206576696c20706173736573202e2e2e207269676874207468726f7567682065766572794d"
        + "2068756d616e2068656172742effff"
    )

    A_tmp = A(quote.encode())

    assert (
        A_tmp.to_cbor_hex() == quote_hex
    ), "Long metadata bytestring is encoded incorrectly."
2 changes: 1 addition & 1 deletion test/pycardano/test_util.py
@@ -149,7 +149,7 @@ def test_script_data_hash():
    redeemers = [Redeemer(unit, ExecutionUnits(1000000, 1000000))]
    redeemers[0].tag = RedeemerTag.SPEND
    assert ScriptDataHash.from_primitive(
-        "032d812ee0731af78fe4ec67e4d30d16313c09e6fb675af28f825797e8b5621d"
+        "b11ed6f6046df925b6409b850ac54a829cd1e7603145c9aaf765885d8ec64da7"

Collaborator

Not sure if this should change. If we write the same test in Haskell, it would generate the same hash.

Contributor Author

I think, as you noticed in the other comment, this changes because all bytes are being encoded the same way, per nielstron's suggestion.

Your comment makes sense. If we only change encoding in metadata/plutusdata, then the hash would not change.
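
To illustrate why the expected hash moved: a hash over serialized bytes changes whenever the serialization changes, even when the decoded value is identical. The sketch below hand-encodes the same 100-byte payload both ways and hashes each result. The payload, the helper name `blake2b_256`, and the choice of BLAKE2b with a 32-byte digest are assumptions for illustration; this is not the full `script_data_hash` computation.

```python
import hashlib

payload = bytes(range(100))  # arbitrary 100-byte value, longer than 64

# Definite-length CBOR byte string: 0x58 <length> <data>
definite = bytes([0x58, len(payload)]) + payload

# Indefinite-length: 0x5f, chunks of at most 64 bytes, then the 0xff break
indefinite = (
    b"\x5f"
    + bytes([0x58, 64]) + payload[:64]
    + bytes([0x58, 36]) + payload[64:]
    + b"\xff"
)


def blake2b_256(data: bytes) -> str:
    """Hex digest of a 32-byte BLAKE2b hash."""
    return hashlib.blake2b(data, digest_size=32).hexdigest()


print(blake2b_256(definite))
print(blake2b_256(indefinite))
print(blake2b_256(definite) == blake2b_256(indefinite))  # False
```

If only values wrapped in ByteString are chunked, internal intermediate bytes keep their definite-length form and hashes computed over them stay as they were.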

    ) == script_data_hash(redeemers=redeemers, datums=[unit])

