
Conversation

voith
Contributor

@voith voith commented Dec 10, 2016

The issue is discussed here: #211 (comment)
Todo List

  • Fix msgpack codec
  • Fix json codec
  • Integration test with HBase backend (manually)

This PR fixes #211

Other things done in this PR besides the todo list:

  • Added two methods, _convert and _reconvert, to the json codec. These are needed because JSONEncoder accepts strings only as unicode. _convert recursively converts objects to unicode while saving their types (a sketch follows after this list).
  • Made the msgpack requirement >=0.4, since only versions from 0.4 onwards support the changes made in this PR.
  • Fixed a buggy test case in test_message_bus_backend that was exposed by fixing the codecs.
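
For reference, a minimal sketch of the pair, based on the snippets discussed later in this thread (the helpers were later renamed; six and w3lib are dependencies the codec files already use):

import six
from w3lib.util import to_unicode, to_bytes

def _convert(obj):
    # tag every value with its original type so it survives JSON, which
    # would otherwise coerce bytes keys/values and turn tuples into lists
    if isinstance(obj, bytes):
        return 'bytes', to_unicode(obj)
    if isinstance(obj, dict):
        return 'dict', [(_convert(k), _convert(v)) for k, v in six.iteritems(obj)]
    if isinstance(obj, (list, tuple)):
        return type(obj).__name__, [_convert(item) for item in obj]
    return 'other', obj

def _reconvert(obj):
    # rebuild the original value from a (type, value) pair; unpacking also
    # works on the lists that json.loads produces instead of tuples
    obj_type, obj_value = obj
    if obj_type == 'bytes':
        return to_bytes(obj_value)
    if obj_type == 'dict':
        return dict((_reconvert(k), _reconvert(v)) for k, v in obj_value)
    if obj_type in ('list', 'tuple'):
        seq = [_reconvert(item) for item in obj_value]
        return seq if obj_type == 'list' else tuple(seq)
    return obj_value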

@voith voith force-pushed the fix-msgpack-codec branch from 85bfcb3 to 9448930 on December 12, 2016 17:33
@voith
Contributor Author

voith commented Dec 12, 2016

The build is still failing; #235 is needed to fix the build error.

@codecov-io

codecov-io commented Dec 12, 2016

Codecov Report

Merging #233 into master will decrease coverage by 0.05%.

@@            Coverage Diff             @@
##           master     #233      +/-   ##
==========================================
- Coverage    70.2%   70.15%   -0.05%     
==========================================
  Files          68       68              
  Lines        4692     4715      +23     
  Branches      623      632       +9     
==========================================
+ Hits         3294     3308      +14     
- Misses       1258     1267       +9     
  Partials      140      140
Impacted Files Coverage Δ
frontera/contrib/backends/remote/codecs/msgpack.py 85.33% <92.85%> (+1.54%)
frontera/contrib/backends/remote/codecs/json.py 95.09% <97.87%> (+1.34%)
frontera/utils/misc.py 68.18% <ø> (-22.73%)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ba47302...9c1b873.

@voith
Contributor Author

voith commented Dec 12, 2016

Managed to get the build passing by temporarily merging #235; here is the build: https://travis-ci.org/scrapinghub/frontera/builds/183326040
The msgpack and json codecs have been fixed. What remains now is to run an integration test manually on the HBase backend.

@voith voith force-pushed the fix-msgpack-codec branch from 2a5aeb0 to 409b49a on December 12, 2016 18:11
@voith voith changed the title from "[WIP] Fix codecs" to "Fix codecs" on Dec 13, 2016
@voith voith force-pushed the fix-msgpack-codec branch from e8fb2e7 to 31777f6 on December 13, 2016 10:14
Member
@sibiryakov sibiryakov left a comment

Great contribution for this annoying problem, and I'm happy that it will be solved soon! Thank you.

from w3lib.util import to_unicode, to_bytes


def _convert(obj):
Member

this is not entirely about conversion, but about preserving type information in the encoding; I propose changing the name to _encode_recursively()

Contributor Author

I agree. Will do.

Contributor Author

done

return 'other', obj


def _reconvert(obj):
Member

the same here

Contributor Author

will do

Contributor Author

done

elif hasattr(obj, '__dict__'):
return serialize(obj.__dict__)
else:
return None
Member

This function silently skips the object if it can't be serialized. I propose keeping this behavior, but making it emit a warning.

Contributor Author

I have added this function back with a minor change: I excluded the encoding of six.text_type, as msgpack will now handle that.


def encode_add_seeds(self, seeds):
return packb([b'as', [_prepare_request_message(seed) for seed in seeds]])
return packb([b'as', [_prepare_request_message(seed) for seed in seeds]], use_bin_type=True, encoding="utf-8")
Member

according to the short msgpack-python documentation, it's recommended to use the encoding argument for unpackb, not for packb
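
A hedged sketch of that recommendation, assuming msgpack-python >= 0.4 and < 1.0 (where unpackb still accepts encoding=):

import msgpack

# use_bin_type=True packs bytes as the bin type, keeping bytes and
# unicode strings distinct on the wire
packed = msgpack.packb([b'as', u'http://example.com'], use_bin_type=True)

# decoding belongs on the unpack side: str values come back as unicode,
# bin values stay bytes
unpacked = msgpack.unpackb(packed, encoding='utf-8')
assert unpacked == [b'as', u'http://example.com']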

Contributor Author

done

super(Encoder, self).__init__(request_model, *a, **kw)

def encode(self, obj):
converted = _convert(obj)
Member

Currently, only meta requires type information to be stored along with the content. Type information causes message-size growth and introduces CPU overhead on parsing. So I suggest reimplementing this to make it efficient; the same is (especially) true for msgpack. I don't care much about json codec performance, but it's very important that msgpack is efficient, because it's used in large deployments.
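
One possible shape of that suggestion, heavily hedged (this is not what the PR implements; _request_to_message and its field names are illustrative): confine the type tags to meta and leave the rest of the message untouched.

def _request_to_message(request):
    # only meta carries arbitrary user types, so only meta pays the
    # type-tagging cost; the fixed fields keep their known types
    return {
        'url': request.url,
        'method': request.method,
        'headers': request.headers,
        'cookies': request.cookies,
        'meta': _convert(request.meta),
    }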

Contributor Author

Won't this error occur for headers, cookies, or in fact all dict-type objects? Right now issues are only reported for meta.

Also, any suggestions on making this efficient for msgpack?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is something I haven't fixed yet. @Preetwinder, your input here would be of great value.

enc = encoder(Request, send_body=True)
dec = decoder(Request, Response)
req = Request(url="http://www.yandex.ru",method=b'GET', meta={b"test": b"shmest"}, headers={b'reqhdr': b'value'})
req = Request(url="http://www.yandex.ru", method=b'GET',
Member

I suggest explicitly checking the types of string objects in meta, to prove that your updated codec code really respects and preserves them.
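
A sketch of the kind of check being asked for (decode_request as the counterpart of encode_request is an assumption; enc, dec, and Request follow the test snippet above):

req = Request(url="http://www.yandex.ru", method=b'GET',
              meta={b'test': b'shmest', u'text': u'value'})
decoded = dec.decode_request(enc.encode_request(req))
# not just equality: the exact string types must survive the round trip
assert decoded.meta[b'test'] == b'shmest'
assert isinstance(decoded.meta[b'test'], bytes)
assert isinstance(decoded.meta[u'text'], type(u''))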

Contributor Author

Will do

Contributor Author

done

@sibiryakov
Member

Also @Preetwinder, please have a look. Your opinion would be much appreciated.

@voith
Contributor Author

voith commented Dec 19, 2016

@Preetwinder Since this PR overwrites some of your changes, your input here would be highly appreciated.

@chazkii

chazkii commented Dec 20, 2016

@voith Thanks for your hard work! I've reviewed the code and it looks good; I like the idea of using tuples for the JSON encoding/decoding to reduce the payload size as opposed to a dict (or at least that's what I think you're doing).

@voith voith force-pushed the fix-msgpack-codec branch from a420692 to 9ac2b49 on December 22, 2016 08:55
@voith
Contributor Author

voith commented Dec 22, 2016

@chuckus Thanks for your feedback. My idea in using tuples was to create a generator, but unfortunately JSONEncoder doesn't support generators, so I had to use lists in some places. The code therefore doesn't really achieve the desired optimization.

@voith
Contributor Author

voith commented Dec 22, 2016

Rebased. @sibiryakov, can I get a second review? I still haven't implemented your suggestion for optimization; I'm waiting for your reply to the comments I left in the code review.

elif hasattr(obj, '__dict__'):
return serialize(obj.__dict__)
else:
return None
Member

let's make it generate a warning that an object of some non-serializable type was skipped.
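
A minimal sketch of the requested behavior (serialize is paraphrased from the diff context above; the warning is the new part):

import warnings

def serialize(obj):
    if isinstance(obj, (bytes, str, int, float, bool, type(None))):
        return obj
    if isinstance(obj, dict):
        return dict((serialize(k), serialize(v)) for k, v in obj.items())
    if isinstance(obj, (list, tuple)):
        return [serialize(item) for item in obj]
    if hasattr(obj, '__dict__'):
        return serialize(obj.__dict__)
    # keep skipping unserializable objects, but say so out loud
    warnings.warn("can't serialize object of type %s, skipping it" % type(obj))
    return None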

import pytest


def _compare_dicts(dict1, dict2):
Member

is there any reason to have such complex machinery for comparing dicts?

Contributor Author

Just wanted to be a little more precise.

Member

For me it was not immediately obvious, but I think the main difference from built-in dict comparison is this line https://github.com/scrapinghub/frontera/pull/233/files#diff-252cc4202700183e8f0c782fb28dbfc2R34, so it will catch e.g. python 2 dicts with equal str and unicode values, int/float, etc.

dicts_are_equal = True
for key in dict1.keys():
if type(dict1[key]) is dict:
dicts_are_equal = dicts_are_equal and _compare_dicts(dict1[key], dict2[key])
Member

here you could return False early on inequality; there is no need to iterate to the end
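
A sketch of the early-return version (keeping the strict type check that, as noted above, is the point of the helper):

def _compare_dicts(dict1, dict2):
    if len(dict1) != len(dict2):
        return False
    for key, value in dict1.items():
        if key not in dict2:
            return False
        # stricter than ==: catches str vs unicode and int vs float on python 2
        if type(value) is not type(dict2[key]):
            return False
        if isinstance(value, dict):
            if not _compare_dicts(value, dict2[key]):
                return False
        elif value != dict2[key]:
            return False
    return True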

@sibiryakov
Member

otherwise looks good!

@Preetwinder
Contributor

LGTM.

@voith
Contributor Author

voith commented Jan 3, 2017

@Preetwinder Thanks for your review. @sibiryakov Is there anything else that needs to be done to get this merged?

Contributor
@Preetwinder Preetwinder left a comment

Are there any changes needed in the documentation? For example here - http://frontera.readthedocs.io/en/latest/topics/frontier-objects.html#frontera.core.models.Request.meta

def _response_from_object(self, obj):
url = to_native_str(obj[b'url'])
url = obj['url']
request = self._request_model(url=url,
Contributor

Are the keys guaranteed to be native strings for this and all the following obj accesses?

Contributor Author

Have a look at this line: https://github.com/scrapinghub/frontera/pull/233/files?diff=split#diff-6cd48fdd254557ffc1b50d60a0323364R154. The method previously called in the decoder was dict_to_bytes, which changed native-string types to bytes. But the encoder was never changed and used native strings. Since the encoder takes care to use native strings, the decoder need not worry about any native-string conversion.

Member
@sibiryakov sibiryakov Jan 23, 2017

If url had type 'bytes', then after decoding it will have type bytes, and in Frontera URLs have to be native strings (when in process). So maybe we still need a preventive conversion to native strings.

Contributor Author

If url had type 'bytes' then after decoding it will have type bytes

You are right that this PR saves the type of the object originally given to the codec, and on decoding it returns a value of the original type.

and in Frontera URLs have to be native strings

If this is the case, then such type checking and conversion should be done somewhere else, not in a codec.

so maybe we still need a preventive conversion to native strings

A codec should never change the type of the object it was originally given. Forcing such a mechanism into a codec is a hacky way of getting around this problem.

Member
@sibiryakov sibiryakov Jan 24, 2017

A codec should never change the type of the object it was originally given. Forcing such a mechanism into a codec is a hacky way of getting around this problem.

Agree, it's not a codec's responsibility. But then we would need to check the rest of the code and make sure we're using native strings for URLs. URLs could appear in the crawling strategy (sw), backend, message bus, or converters, and such a change could break code that relies on URLs being native strings.

Contributor

Yes, URLs shouldn't be a problem; they are guaranteed to be native strings. @voith, I understand. The reason for using dict_to_bytes was that the JSON encoder returns unicode strings and we wanted meta, headers, etc. to be bytes. But it seems the new function takes care of that.

elif isinstance(obj, (list, tuple)):
return type(obj).__name__, [_encode_recursively(item) for item in obj]
return 'other', obj

Contributor

Isn't storing the types 'other' and 'dict' redundant, since the JSON encoder will take care of these types?

Contributor Author

A dict can have nested dicts or other kinds of objects. Since we are doing a conversion here, we want to save the type for successful de-conversion; the JSON encoder taking care of it is not sufficient, because we need to decode the result back to its original form, and for that we need the saved type.
Although this implementation currently only converts strings, lists, and dicts, it can easily be extended to something like sets.
If you have any other implementation in mind, please let me know.

Contributor Author

Oh, and I borrowed this code from: https://github.com/samuel/python-bert/blob/master/bert/codec.py#L93-L123. I just simplified it for this use case

Member

It's still not clear why 'dict' and 'other' are needed; please be more specific. My concern is integration with other components: imagine somebody else decides to implement this codec in another language. The less type-related information we store here, the easier that would be.

Contributor Author

Okay, this is a bit hard to explain, but I'll try my best through a series of examples. Let's take Preetwinder's suggestion and see its potential problems.

Isn't storing the types 'other' and 'dict' redundant, since the JSON encoder will take care of these types?

If we had to implement this suggestion, the code would look somewhat like this:

def _encode_recursively(obj):
    if isinstance(obj, bytes):
        return ('bytes', to_unicode(obj))
    elif isinstance(obj, dict):
        return {_encode_recursively(k): _encode_recursively(v) for k, v in six.iteritems(obj)}
    elif isinstance(obj, (list, tuple)):
        return obj.__class__(_encode_recursively(item) for item in obj)
    return obj

Now consider this:

In [59]: import json

In [60]: encoder = json.JSONEncoder()
    ...: decoder = json.JSONDecoder()
    ...:

In [61]: meta = {b'byte_key': 'str_value'}

In [62]: encoded_value = _encode_recursively(meta)  # using the above defined code

In [63]: print(encoded_value)
{('bytes', 'byte_key'): 'str_value'}

In [64]: encoder.encode(encoded_value) 
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-65-abb76ad6421f> in <module>()
----> 1 encoder.encode(encoded_value)

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/encoder.py in encode(self, o)
    196         # exceptions aren't as detailed.  The list call should be roughly
    197         # equivalent to the PySequence_Fast that ''.join() would do.
--> 198         chunks = self.iterencode(o, _one_shot=True)
    199         if not isinstance(chunks, (list, tuple)):
    200             chunks = list(chunks)

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/json/encoder.py in iterencode(self, o, _one_shot)
    254                 self.key_separator, self.item_separator, self.sort_keys,
    255                 self.skipkeys, _one_shot)
--> 256         return _iterencode(o, 0)
    257
    258 def _make_iterencode(markers, _default, _encoder, _indent, _floatstr,

TypeError: keys must be a string

So the goal here is to use a data structure that serializes well with JSONEncoder and also helps store the type.
After a lot of thinking, and with reference to this code, I decided that a tuple would be the best way to store this. But a thing to note with this design is that a dict object is converted to a tuple (for serialization purposes), so its type has to be stored, else the information needed for decoding is lost.
Even a tuple has to be converted and its type stored, so that we can clearly differentiate a tuple used by the encoder from one actually received and converted.
So the code I came up with converts any basic Python object to the form (original_object_type, converted_object_value), applied recursively since objects can be nested.
The way this works for decoding is that the decoder knows the first element of an encoded object always contains the type information, and the second always contains the value, which again should be decoded recursively to recover the original object.
I hope this at least explains why the type of a dict has to be saved.
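
A short round trip pulling this together (using the _convert/_reconvert pair sketched near the top of this thread, which implements exactly this scheme):

import json

meta = {b'byte_key': 'str_value'}
payload = json.dumps(_convert(meta))
# json turns every tuple into a list, but each node still leads with its
# saved type, so decoding rebuilds the dict, bytes key included
assert _reconvert(json.loads(payload)) == meta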

Member
@sibiryakov sibiryakov Jan 24, 2017

Yes, thanks for the explanation, now it's much clearer. OK, let's leave the dict there. But could you leave a comment in the code explaining why, at least in short form?

Contributor

Thanks for the explanation, I get it.

Contributor Author

@sibiryakov will add a docstring.

@voith
Contributor Author

voith commented Jan 11, 2017

@Preetwinder thank you for your review. I will address your concerns tonight.

@voith
Contributor Author

voith commented Jan 16, 2017

Sorry for keeping this PR on hold; I have been a little busy lately.

@sibiryakov
Member

NP, I had vacations too ;)


def _encode_recursively(obj):
"""
recursively encodes an object to unicode and store its type
Member

a nitpick: you can't encode to unicode because unicode is not an encoding (at least in Python), you can only decode to unicode

Contributor Author

@kmike Thanks for the review.
I had previously named this method convert: 33347d9#diff-6cd48fdd254557ffc1b50d60a0323364R12.

Side Note: I think I should start contributing to scrapy. There's a lot I can learn from your reviews :)

Contributor Author
@voith voith left a comment

@Preetwinder
I hope I've addressed your concerns.
I do agree that some doc changes are needed. But after having a look at #243, it seems like we need a lot of refactoring.


@voith
Contributor Author

voith commented Jan 19, 2017

I've also tested this PR manually with the HBase backend, and it works fine for me.

@voith
Contributor Author

voith commented Jan 23, 2017

@sibiryakov Please don't merge this. I have to apply kmike's suggestion.

@voith voith changed the title from "Fix codecs" to "[WIP] Fix codecs" on Jan 23, 2017
@sibiryakov
Member

I think we're close. Let's clarify all the issues and try to merge it.

@voith
Contributor Author

voith commented Jan 23, 2017

@sibiryakov your concerns are real. I'll try to address these issues when I get some more time; finding it hard to manage time! But I'll get back to this as soon as possible.

@voith
Contributor Author

voith commented Jan 23, 2017

@sibiryakov I have tried to address your comments as best as I possibly could; please let me know if there's still any doubt. It's a bit late here now (IST), so I can't apply kmike's suggestion, but I hope I have at least addressed your concerns.

@sibiryakov
Member

Have you tried crawling in distributed-backends mode with the JSON codec, @voith?

@voith
Contributor Author

voith commented Jan 24, 2017

@sibiryakov No, I haven't; I only checked that it's not blowing anything up. But I have tested msgpack in distributed mode. I will let you know when I finish testing the json codec in distributed mode.

@voith voith force-pushed the fix-msgpack-codec branch from 635388d to 29d0310 on January 29, 2017 20:20
@voith voith changed the title from "[WIP] Fix codecs" to "Fix codecs" on Jan 29, 2017
@voith
Contributor Author

voith commented Jan 30, 2017

I have changed the names of the methods _encode_recursively and _decode_recursively as pointed out by @kmike. I have also added a docstring explaining in brief the purpose of these methods and their design considerations.

But I didn't really get the time to test the json codec with the distributed backend. If anyone in the community could help me test it, that would be great. In any case, it will take some more time to test this thoroughly.



def _decode_recursively(obj):
def _revert_from_saved_type(obj):
Member

IMO revert is used in the wrong context here (http://dictionary.cambridge.org/dictionary/english/revert-to-sth); convert_from_saved_type would be better.

Contributor Author

will do

Contributor Author

renamed

@sibiryakov
Member

sibiryakov commented Feb 3, 2017

OK, final call before I merge it. @voith, please rename the revert_* method.
Does anyone have any ideas about what this could break, @lopuhin @kmike @Preetwinder?

first element of the nested tuple as the original type information and
the second value to be the converted value. It applies the original type
recursively on the object to retrieve the original form of the object.
"""
Member

This is super minor, but I would maybe add a check that obj has exactly two items, e.g. with obj_type, obj_value = obj, just in case some list is passed instead of an object returned by _convert_and_save_type
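
The suggested guard, sketched (the dispatch body is elided):

def _convert_from_saved_type(obj):
    # a malformed payload (e.g. a bare list) now fails fast instead of
    # being silently misread
    assert len(obj) == 2, 'expected a (type, value) pair'
    obj_type, obj_value = obj
    ...  # dispatch on obj_type as in the _reconvert sketch earlier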

Contributor Author

@lopuhin thanks for the review. Please see if this looks good to you

Member

Great, thanks @voith! 👍

Member
@lopuhin lopuhin left a comment

Looks good to me! I didn't review the changes related to msgpack, as I don't know it well enough.

@voith
Contributor Author

voith commented Feb 5, 2017

@sibiryakov Here's a suspicious line I found. I haven't really tested it, but I have a feeling it will break. I think some cleanup is also needed as part of this PR: all strings related to scrapy should be native strings.

Also, I did try to test the json codec with the distributed backend, but I was having some problems with my HBase setup: I recently switched to a Mac, and hbase-docker doesn't seem to work well with it. Allow me some time to work on this, but I can't promise anything. If anyone else has the time, feel free to continue this PR.

@sibiryakov
Member

Yeah @voith I noticed that. But that was broken before you, and I don't see how it is connected to this PR.

@voith
Contributor Author

voith commented Feb 6, 2017

@sibiryakov You're probably right; I wasn't entirely sure. But give me a day to manually test everything. I managed to set up HBase locally. I will try to verify it tonight and then will let you know.

@sibiryakov
Member

Sure @voith

@voith
Contributor Author

voith commented Feb 6, 2017

@sibiryakov I have finished testing the json codec in distributed mode and can confirm that everything works. The only error I was getting has been fixed in #252; if I apply that patch to this branch, then everything works fine. That patch has nothing to do with this PR, and the bug existed before I made this PR. This PR should be merged after #252 is merged.

@voith
Contributor Author

voith commented Feb 6, 2017

I expect some merge conflicts in this branch if #252 gets merged first. I will resolve them later.

@sibiryakov
Member

ok, rebase it pls.

voith added 2 commits February 9, 2017 11:01

  • added test for testing encoded unicode values in meta using msgpack codec
  • fixed message_bus_backend_test
  • added tests for _convert, _reconvert
  • make msgpack requirement >=0.4
  • made suggested changes
  • added warning message for non serializable objects and added an early return
  • renamed method names for encode-decode to convert-revert and added docstring to these methods
  • renamed methdod _revert_from_saved_type to _convert_from_saved_type
  • replaced revert with restores in a doc string
  • added assertion to check object length in _convert_from_saved_type
  • resolved merge conflicts
@voith voith force-pushed the fix-msgpack-codec branch from b7f5a2d to 9c1b873 on February 9, 2017 05:42
@voith
Contributor Author

voith commented Feb 9, 2017

@sibiryakov I have rebased the PR and fixed the merge conflicts.
I have also squashed the commits into two: one for msgpack and the other for json.

@sibiryakov sibiryakov merged commit a442055 into scrapinghub:master Feb 9, 2017
@sibiryakov
Member

Congratulations and thank you @voith @Preetwinder and others! 🍺 🍺 🍕

@voith voith deleted the fix-msgpack-codec branch February 9, 2017 13:49
@voith
Contributor Author

voith commented Feb 9, 2017

Thanks to everyone who reviewed this PR and helped me make the code better! :)

@sibiryakov
Member

thank you for bearing all this!

@voith
Contributor Author

voith commented Feb 9, 2017

No problem, @sibiryakov. I personally learnt a lot from this PR.
In fact, if any further issues arise from this PR, I will fix them myself.

@voith voith changed the title from "Fix codecs" to "Redesign codecs" on May 24, 2018
Development

Successfully merging this pull request may close these issues.

frontera converts keys in scrapy's response meta to bytes
7 participants