Adding GPU metrics detection logic to seff script #5
base: main
This is for the GH200s on Alpine and the A40s on Blanca.

Confluence page for the tests I ran: https://colorado.atlassian.net/wiki/spaces/RC/pages/1431601158/Slurm+seff+update+Tests
b-reyes left a comment
@mohalkh5 thank you very much for putting this together. After getting more familiar with this script, I noticed that seff is actually querying slurmdb using the Perl API, rather than using sacct. For this reason, I decided to provide an alternative approach where we also use the Perl API to obtain the information we want.
We do not have to go with this alternative approach, but we should discuss whether we want to. Please see PR 1, where I provide this alternative approach. That PR merges into your add-seff-gpu branch, so if we decide to go with it, we can merge it in and it will be reflected here.
Additionally, here are some comments on general items.
- Can we please test this for job array IDs?
- Did you create a task with RIT to fix the GPUs missing from AccountingStorageTRES?
- Please add the modifications we discussed for seff-array.
- Before we merge this in, we need to add a link to the documentation.
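The AccountingStorageTRES item matters because Slurm only records allocations for trackable resources listed in that setting, so if `gres/gpu` is missing there, no GPU entries appear in a job's accounting record. The sketch below is a minimal illustration, assuming the job's allocated TRES arrive as a comma-separated `name=value` string in the style of the `AllocTRES` field that `sacct` prints (for example `cpu=4,gres/gpu:a40=1,gres/gpu=1,mem=16G`). The helpers `parse_tres` and `gpu_count_and_types` are hypothetical and written in Python for illustration only; they are not part of seff or the Slurm Perl API, which may return TRES keyed by numeric ID rather than by name.

```python
from typing import Dict, List, Tuple


def parse_tres(tres: str) -> Dict[str, str]:
    """Split a comma-separated TRES string into a {name: value} dict."""
    fields: Dict[str, str] = {}
    for item in tres.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate empty strings and trailing commas
        name, _, value = item.partition("=")
        fields[name] = value
    return fields


def gpu_count_and_types(tres: str) -> Tuple[int, List[str]]:
    """Return (total GPUs, GPU type names) found in a TRES string.

    Typed entries look like 'gres/gpu:a40=1'. If the untyped 'gres/gpu'
    entry is present it is used as the total; otherwise the typed counts
    are summed. Returns (0, []) when no GPU entries are present, e.g.
    when gres/gpu is not tracked in AccountingStorageTRES.
    """
    fields = parse_tres(tres)
    types: List[str] = []
    typed_total = 0
    for name, value in fields.items():
        # Match only typed GPU entries, not e.g. 'gres/gpumem' or 'gres/gpuutil'.
        if name.startswith("gres/gpu:"):
            types.append(name.split(":", 1)[1])
            typed_total += int(value)
    if "gres/gpu" in fields:
        return int(fields["gres/gpu"]), types
    return typed_total, types


if __name__ == "__main__":
    # Made-up example in the style of sacct's AllocTRES field.
    print(gpu_count_and_types("billing=4,cpu=4,gres/gpu:a40=1,gres/gpu=1,mem=16G,node=1"))
```

Running it prints `(1, ['a40'])`; a string with no `gres/gpu` entries yields `(0, [])`, which is the symptom the RIT task is meant to fix.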
@b-reyes I have added a comment to the Perl API PR (PR 1) with changes for identifying and printing the GPU type. For PR 5:
* Instead of calling salloc, use the Slurm Perl API to get GPU metrics
* Add a space above ──────── CPU Metrics ────────
* Modify the units displayed in kbytes2str so they reflect the true units, e.g. iB
* Call a function to get the GPU type
* Apply @mohalkh5's suggestion for a function that gets the GPU type

Co-authored-by: mohalkh5 <[email protected]>
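The kbytes2str change relabels sizes so the units reflect binary multiples (the `iB` suffix: KiB, MiB, GiB). The sketch below shows the idea in Python, assuming the input is a size in KiB and each step up divides by 1024. `kbytes_to_str` is a hypothetical name; seff's own kbytes2str is Perl and may differ in rounding and edge cases.

```python
def kbytes_to_str(kbytes: float) -> str:
    """Format a size given in KiB with binary (IEC) units.

    Each step up divides by 1024, so the labels are KiB, MiB, GiB, ...
    rather than KB, MB, GB (which would imply powers of 1000).
    """
    units = ["KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    value = float(kbytes)
    i = 0
    # Move to the next unit while the value is at least one of it.
    while value >= 1024 and i < len(units) - 1:
        value /= 1024
        i += 1
    return f"{value:.2f} {units[i]}"


if __name__ == "__main__":
    print(kbytes_to_str(16 * 1024 * 1024))  # 16 GiB expressed in KiB
```

The example call prints `16.00 GiB`; a value below 1024 stays in KiB, so `kbytes_to_str(512)` gives `512.00 KiB`.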
@mohalkh5 thank you for reviewing my PR and for adding the additional items for gathering the GPU type. After the break, I will do a second round of review.