Closed
Description
While running the crawler in https://github.com/sibiryakov/frontera-google, I get the following error:
2016-10-04 23:38:19 [scrapy] ERROR: Spider error processing <GET http://zpravy.idnes.cz/madarsko-zmena-ustavy-zakaz-kvot-migrace-f1c-/zahranicni.aspx?c=A161004_175512_zahranicni_mlb> (referer: None)
Traceback (most recent call last):
File "/home/voith/Projects/frontera-google/env-3.5/lib/python3.5/site-packages/twisted/internet/defer.py", line 587, in _runCallbacks
current.result = callback(current.result, *args, **kw)
File "/home/voith/Projects/frontera-google/env-3.5/lib/python3.5/site-packages/scrapy/spiders/crawl.py", line 68, in _response_downloaded
rule = self._rules[response.meta['rule']]
KeyError: 'rule'
I ran the crawler without Frontera (using only Scrapy) and it ran fine, which suggests that Frontera is causing the error.
From the little debugging that I did, I found the following:
> /home/voith/Projects/frontera-google/env-3.5/lib/python3.5/site-packages/scrapy/spiders/crawl.py(68)_response_downloaded()
-> rule = self._rules[response.meta['rule']]
(Pdb) response.meta
{'download_timeout': 180.0, b'frontier_request': <Request at 0x7f1b39a9cc50 https://www.alza.cz/nejobchod meta={b'scrapy_callback': b'_response_downloaded', b'score': 0.16666666666666666, b'scrapy_errback': None, b'scrapy_meta': {b'rule': 0, b'frontier_request': <Request at 0x7f1b39a9cc50 https://www.alza.cz/nejobchod meta={...} body=... cookies={}, headers={}>, b'link_text': b'\r\n\r\n'}, b'domain': {b'netloc': b'www.alza.cz', b'tld': b'', b'subdomain': b'', b'fingerprint': b'4273593b6eda0469d3db862aa0fda848d7a89974', b'name': b'www.alza.cz', b'scheme': b'https', b'sld': b''}, b'fingerprint': b'7ee99dd119df89a7a103e1001b97f1255ea6e466', b'state': 1, b'origin_is_frontier': True, b'jid': 0} body=... cookies={}, headers={}>, b'rule': 0, 'download_latency': 0.33022451400756836, b'link_text': b'\r\n\r\n', 'download_slot': 'www.alza.cz'}
(Pdb) response.meta['rule']
*** KeyError: 'rule'
(Pdb) response.meta[b'rule']
0
Since some of the meta keys (such as 'rule' and 'link_text') have been converted to bytes, they can no longer be looked up with the plain str key name: on Python 3, b'rule' and 'rule' are different dictionary keys.
I have yet to figure out exactly which part of the code is causing this. If I do, I'll definitely submit a fix. For now I'm logging it here so that it can be tracked.
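To illustrate the mismatch outside the crawler, here is a minimal sketch in plain Python 3. The `meta` dict is a trimmed copy of the one in the pdb output above; `_to_str` and `decode_meta` are hypothetical helpers written for this example, not part of Scrapy or Frontera, and UTF-8 is an assumed encoding. This is not a proposed fix, only a demonstration of why the lookup fails and what a decoded dict would look like.

```python
def _to_str(value, encoding="utf-8"):
    """Hypothetical helper: decode bytes to str, return anything else unchanged."""
    return value.decode(encoding) if isinstance(value, bytes) else value


def decode_meta(obj):
    """Hypothetical helper: recursively decode bytes keys and values to str."""
    if isinstance(obj, dict):
        return {_to_str(key): decode_meta(val) for key, val in obj.items()}
    if isinstance(obj, list):
        return [decode_meta(item) for item in obj]
    return _to_str(obj)


# Trimmed copy of response.meta from the debugging session: a mix of
# str keys (set by Scrapy) and bytes keys (as they came back from the frontier).
meta = {"download_timeout": 180.0, b"rule": 0, b"link_text": b"\r\n\r\n"}

# On Python 3, 'rule' and b'rule' hash and compare as different keys,
# so meta['rule'] raises KeyError while meta[b'rule'] works.
assert "rule" not in meta
assert meta[b"rule"] == 0

# After decoding, the str lookup that CrawlSpider performs would succeed.
fixed = decode_meta(meta)
assert fixed["rule"] == 0
```

On Python 2 this problem would not show up, because there `b'rule'` and `'rule'` are the same `str` object type.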