[Important] Worker thread will block main loop / The main dispatch thread gets blocked by worker threads #110

Closed
@code4craft

In Spider, the main loop polls all URLs from the scheduler and dispatches them to worker threads. But the thread pool that backs the ExecutorService has a bug:

public static ExecutorService newFixedThreadPool(int threadSize) {
    if (threadSize <= 0) {
        throw new IllegalArgumentException("ThreadSize must be greater than 0!");
    }
    if (threadSize == 1) {
        return MoreExecutors.sameThreadExecutor();
    }
    return new ThreadPoolExecutor(threadSize - 1, threadSize - 1, 0L, TimeUnit.MILLISECONDS,
            new SynchronousQueue<Runnable>(), new ThreadPoolExecutor.CallerRunsPolicy());
}

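When the pool is saturated, ThreadPoolExecutor.CallerRunsPolicy runs the rejected task on the calling thread, which here is the main thread. While the main thread is busy processing that request, URL dispatching stops, so worker threads that finish in the meantime sit idle until it returns.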


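In WebMagic's multithreaded implementation, one main thread is responsible for dispatching URLs and multiple worker threads process the requests. But there is a problem: the thread pool WebMagic uses is configured with ThreadPoolExecutor.CallerRunsPolicy, which means that once the pool is full, the main thread itself is used to run a request. As a result, the other threads are left waiting after they finish their work. This has a huge impact on performance.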
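
The behaviour described above can be reproduced in isolation. The following is a minimal sketch, not WebMagic code: the class name CallerRunsDemo and its method are hypothetical, and the pool is built with the same arguments as the snippet above for threadSize = 2 (one worker, a SynchronousQueue, CallerRunsPolicy). It occupies the only worker, submits a second task, and records which thread ends up running it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class, not part of WebMagic.
public class CallerRunsDemo {

    /** Returns true if the task submitted to a saturated pool ran on the submitting thread. */
    static boolean overflowTaskRanOnCaller() {
        // Same shape as the pool in the issue for threadSize = 2: one worker, no queue capacity.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<Runnable>(), new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch workerStarted = new CountDownLatch(1);
        CountDownLatch releaseWorker = new CountDownLatch(1);
        AtomicReference<Thread> overflowThread = new AtomicReference<>();
        try {
            // Occupy the single worker until releaseWorker is counted down.
            pool.execute(() -> {
                workerStarted.countDown();
                try {
                    releaseWorker.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workerStarted.await();

            // The worker is busy and the SynchronousQueue cannot hold the task,
            // so the pool rejects it and CallerRunsPolicy runs it right here,
            // on the submitting thread, before execute() returns.
            pool.execute(() -> overflowThread.set(Thread.currentThread()));

            return overflowThread.get() == Thread.currentThread();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the worker", e);
        } finally {
            releaseWorker.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("overflow task ran on caller: " + overflowTaskRanOnCaller());
    }
}
```

In this sketch the overflow task is trivial, so the caller is held up only briefly. In Spider the rejected task is a full page download and processing step, so the dispatching thread cannot hand out new URLs for that whole time, even if the worker threads become free.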
