
What's the best way of diagnosing scheduler memory issues? #4998

@max-sixty


Unfortunately I don't have a reproducer, so my question is only about how to investigate the problem.

I have a job that frequently causes the scheduler to accumulate memory, eventually reaching 400GB and then failing (the scheduler machine has that much memory).

I think I've read the docs thoroughly, but I can't find any reason the scheduler should do this other than large values in tasks. I'm fairly confident that the client isn't submitting large values in tasks: the job starts with bag.from_sequence(date_list) followed by a series of .map calls.
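One way to check the claim that the client isn't submitting large values is to measure the serialized size of each entry in the task graph before computing it. This is a minimal sketch, assuming the graph can be converted to a plain dict and that cloudpickle (which dask uses for serialization) is installed, with the standard pickle module as a fallback; the function name largest_graph_entries is hypothetical.

```python
import pickle
from typing import Any, Hashable, List, Mapping, Tuple

try:
    # cloudpickle can serialize lambdas and closures that plain pickle cannot
    import cloudpickle as _pickler
except ImportError:  # fall back to the standard library
    _pickler = pickle


def largest_graph_entries(
    graph: Mapping[Hashable, Any], n: int = 5
) -> List[Tuple[Hashable, int]]:
    """Return the n graph entries with the largest serialized size, in bytes.

    Hypothetical helper: entries that cannot be serialized get a size of -1
    instead of raising, so they still show up in the output.
    """
    sizes = []
    for key, task in graph.items():
        try:
            size = len(_pickler.dumps(task))
        except Exception:
            size = -1  # flag unserializable entries rather than failing
        sizes.append((key, size))
    # Largest entries first
    sizes.sort(key=lambda item: item[1], reverse=True)
    return sizes[:n]
```

For example, with a hypothetical per-date function process, largest_graph_entries(dict(db.from_sequence(date_list).map(process).__dask_graph__())) would list the biggest graph entries; if none of them is large, that supports the idea that the memory growth comes from somewhere other than the submitted values.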

Are there other possible causes? How can I see what the scheduler is holding?
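To see what the scheduler is holding, one option is to run a function inside the scheduler process with Client.run_on_scheduler, which passes in the live Scheduler object when the function has a parameter named dask_scheduler. This is a minimal sketch, assuming the scheduler exposes its tasks as dask_scheduler.tasks (a mapping from key to a TaskState with state, key and nbytes attributes, as in recent dask.distributed releases); summarize_scheduler is a hypothetical helper name. Note that nbytes describes results held on workers, so the task count per state may be the more telling number for scheduler-side growth.

```python
from collections import Counter
from typing import Any, Dict


def summarize_scheduler(dask_scheduler: Any, top: int = 5) -> Dict[str, Any]:
    """Summarize the task bookkeeping held by a dask.distributed Scheduler.

    Hypothetical helper meant for client.run_on_scheduler, which supplies
    the Scheduler object through the `dask_scheduler` keyword argument.
    """
    tasks = dask_scheduler.tasks  # assumed: mapping of key -> TaskState
    # How many tasks are in each state (e.g. "memory", "released", "waiting")
    by_state = Counter(ts.state for ts in tasks.values())
    # nbytes may be unknown (-1 or None); count unknown sizes as 0
    sized = [(ts.key, max(ts.nbytes or 0, 0)) for ts in tasks.values()]
    sized.sort(key=lambda item: item[1], reverse=True)
    return {
        "n_tasks": len(tasks),
        "by_state": dict(by_state),
        "total_nbytes": sum(size for _, size in sized),
        "largest": sized[:top],
    }
```

Calling client.run_on_scheduler(summarize_scheduler) periodically while the job runs would show whether the number of tasks the scheduler tracks grows alongside its memory.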

Thanks in advance!
