A little confused about the parameter spawnUrl of Spider #62

Closed
@yxssfxwzy

Description

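When a page download fails and is scheduled for retry, page.getRawText() is null. But when the spawnUrl parameter is false, the Spider skips the retry, because the retry request is only re-added through extractAndAddRequests, which checks spawnUrl. I think spawnUrl should not be related to the retry action.
I wonder whether this behavior is intended.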

The processRequest method in Spider:

protected void processRequest(Request request) {
    Page page = downloader.download(request, this);
    if (page == null) {
        sleep(site.getSleepTime());
        return;
    }
    // for cycle retry
    if (page.getRawText() == null) {
        extractAndAddRequests(page);
        sleep(site.getSleepTime());
        return;
    }
    ...
}

The extractAndAddRequests function:

protected void extractAndAddRequests(Page page) {
    if (spawnUrl && CollectionUtils.isNotEmpty(page.getTargetRequests())) {
        for (Request request : page.getTargetRequests()) {
            addRequest(request);
        }
    }
}
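
To make the concern concrete, here is a minimal, self-contained sketch. It does not use the real WebMagic classes: RetrySketch, MiniSpider, Page, onFailedDownloadAsReported and onFailedDownloadDecoupled are all hypothetical stand-ins that I made up for illustration, and the scheduler is just a queue of URL strings. It only models the failed-download branch shown above, first gated by spawnUrl as in the snippets, then with one possible alternative where retry requests are re-queued regardless of spawnUrl.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-ins, NOT the real WebMagic API.
class RetrySketch {

    static class Page {
        String rawText; // null models a failed download
        final List<String> targetRequests = new ArrayList<>(); // holds the retry request
    }

    static class MiniSpider {
        final boolean spawnUrl;
        final Queue<String> scheduler = new ArrayDeque<>();

        MiniSpider(boolean spawnUrl) {
            this.spawnUrl = spawnUrl;
        }

        // Mirrors the snippet in this issue: adding requests is gated by spawnUrl.
        void extractAndAddRequests(Page page) {
            if (spawnUrl && !page.targetRequests.isEmpty()) {
                scheduler.addAll(page.targetRequests);
            }
        }

        // Failed-download branch as reported: the retry goes through the spawnUrl gate.
        void onFailedDownloadAsReported(Page page) {
            if (page.rawText == null) {
                extractAndAddRequests(page);
            }
        }

        // Assumed alternative: re-queue retry requests without consulting spawnUrl.
        void onFailedDownloadDecoupled(Page page) {
            if (page.rawText == null) {
                scheduler.addAll(page.targetRequests);
            }
        }
    }

    public static void main(String[] args) {
        Page failed = new Page(); // rawText stays null
        failed.targetRequests.add("http://example.com/retry-me");

        MiniSpider reported = new MiniSpider(false);
        reported.onFailedDownloadAsReported(failed);
        System.out.println(reported.scheduler.size()); // 0: the retry is dropped

        MiniSpider decoupled = new MiniSpider(false);
        decoupled.onFailedDownloadDecoupled(failed);
        System.out.println(decoupled.scheduler.size()); // 1: the retry is queued
    }
}
```

With spawnUrl set to false, the first variant leaves the queue empty, which is the behavior I am asking about. The second variant is only one way to separate the two concerns, not a claim about how the project should fix it.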
