Closed
Description
When a page download fails and the request is retried, page.getRawText() returns null. However, when the spawnUrl parameter is false, the Spider skips the retry. I don't think spawnUrl should have anything to do with the retry action. Is the current behavior intended?
The processRequest method in Spider:
    protected void processRequest(Request request) {
        Page page = downloader.download(request, this);
        if (page == null) {
            sleep(site.getSleepTime());
            return;
        }
        // for cycle retry
        if (page.getRawText() == null) {
            extractAndAddRequests(page);
            sleep(site.getSleepTime());
            return;
        }
        ...
    }
The extractAndAddRequests function:
    protected void extractAndAddRequests(Page page) {
        if (spawnUrl && CollectionUtils.isNotEmpty(page.getTargetRequests())) {
            for (Request request : page.getTargetRequests()) {
                addRequest(request);
            }
        }
    }
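
To make the suggestion concrete, here is a minimal sketch of what decoupling the retry path from spawnUrl could look like. It is not WebMagic code: RetrySketch, MiniSpider, the simplified Page, the String-based queue and the addRetryRequests helper are all hypothetical stand-ins I made up for illustration. The only assumption carried over from the snippets above is that a null raw text marks a failed download whose retry request sits in the page's target requests.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-ins for illustration only; these are NOT the real WebMagic classes.
public class RetrySketch {

    static class Page {
        String rawText;                                        // null means the download failed
        final List<String> targetRequests = new ArrayList<>(); // retry or discovered URLs
    }

    static class MiniSpider {
        final Queue<String> scheduler = new ArrayDeque<>();
        final boolean spawnUrl;

        MiniSpider(boolean spawnUrl) {
            this.spawnUrl = spawnUrl;
        }

        // Retry path (hypothetical helper): re-queue unconditionally, ignoring spawnUrl.
        void addRetryRequests(Page page) {
            scheduler.addAll(page.targetRequests);
        }

        // Normal path: only follow discovered links when spawnUrl is true.
        void extractAndAddRequests(Page page) {
            if (spawnUrl && !page.targetRequests.isEmpty()) {
                scheduler.addAll(page.targetRequests);
            }
        }

        void processPage(Page page) {
            if (page.rawText == null) {
                addRetryRequests(page);    // failed download: always retry
                return;
            }
            extractAndAddRequests(page);   // successful download: respect spawnUrl
        }
    }

    public static void main(String[] args) {
        MiniSpider spider = new MiniSpider(false);   // spawnUrl disabled
        Page failed = new Page();                    // rawText stays null
        failed.targetRequests.add("http://example.com/retry-me");
        spider.processPage(failed);
        System.out.println(spider.scheduler.size()); // prints 1
    }
}
```

With this split, spawnUrl = false still stops the spider from following newly discovered links, but a failed download is re-queued either way.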