Add bucket calibration, allow reading/writing bucketing configs to file #345
import pandas as pd

def yaml_serializer(df, bucket_cfg_file):
    import yaml
Do we need to add pyyaml to requirements?
return {'min': cfg[0], 'step': cfg[1], 'max': cfg[2]}

data: Dict[str, Any] = {} # type: ignore
#data['buckets'] = df.to_dict(orient='records')
commented code
logger.warning("Configuration: (%s, %s, %s) was not warmed-up!",
               phase, batch_size, seq_len)
if not self.calibrate_buckets:
    logger.warning(
Do we need to divide the code into 3 lines? We now have wide displays and it doesn't make it more readable.
Do we need to divide code into 3 lines?
I wish we didn't, but format.sh made this into such an abomination.
Hi @kzawora-intel,
Could you please add documentation to README_GAUDI.md?
I don't understand the motivation behind this feature or how it can be used in production.
Do you plan to upstream it?
Let's have a look at your example:
- (decode, [128, 1024]) => fine, it will be warmed up
- (decode, [128, 1152]) => fine, it will be warmed up
- (decode, [128, 896]) => it will not be warmed up
What will happen in the third example:
- vllm will use warmed up (decode, [128, 1024])?
- vllm will compile (decode, [128, 896])?
IMHO the second option would be non-intuitive and would make this feature unusable in production.
This pull request has been automatically marked as stale because it has not had any activity within 90 days. It will be automatically closed if no further activity occurs within 30 days. Leave a comment if you feel this pull request should remain open. Thank you!
This pull request has been automatically closed due to inactivity. Please feel free to reopen if you intend to continue working on it. Thank you!
This PR allows users to calibrate bucket usage and load bucket configuration from a file.
Design
When the VLLM_HPU_CALIBRATE_BUCKETS=true env var is passed, warmup will be disabled, and upon destruction, the server will store bucket configs and utilized buckets in a YAML file (optionally defined in the VLLM_HPU_BUCKET_CFG env var).
An example YAML file looks as follows:
Optionally, the user can also emit a CSV file with buckets (useful for data analysis using external tools):
In CSV mode there is no way to dump the bucket_cfg data.
VLLM_{phase}_{dim}_BUCKET_{param} environment variables.
VLLM_{phase}_{dim}_BUCKET_{param} environment variables override values provided in the YAML file.
Usage
- If VLLM_HPU_CALIBRATE_BUCKETS is true or 1 and VLLM_HPU_BUCKET_CFG is not provided, calibration will happen, and calibration results will be saved to hpu-buckets-{vllm_instance_id}.yaml.
- If VLLM_HPU_CALIBRATE_BUCKETS is true or 1 and VLLM_HPU_BUCKET_CFG is provided, calibration will happen, and calibration results will be saved to the file path defined by VLLM_HPU_BUCKET_CFG. If the extension of VLLM_HPU_BUCKET_CFG is .csv (case insensitive), buckets will be saved in CSV format; if the extension is .yml or .yaml (case insensitive), buckets and their ranges will be saved in YAML format.
- If VLLM_HPU_CALIBRATE_BUCKETS is not true or 1 and VLLM_HPU_BUCKET_CFG is provided, calibration will not happen, and bucket settings will be loaded from the file path defined by VLLM_HPU_BUCKET_CFG (both YAML and CSV are supported).
- If VLLM_HPU_CALIBRATE_BUCKETS is not true or 1 and VLLM_HPU_BUCKET_CFG is not provided, calibration will not happen, and bucket generation will not be altered in any way (default behavior).
Examples:
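Before the concrete runs, here is a minimal sketch of how the four usage cases above could be resolved from the two environment variables. It is not the code from this PR: `BucketMode` and `resolve_bucket_mode` are hypothetical names, matching "true" case-insensitively is an assumption, and raising on an unrecognized extension is an assumption (the description does not say what happens in that case).

```python
import os
from typing import Mapping, NamedTuple, Optional


class BucketMode(NamedTuple):
    calibrate: bool          # record bucket usage instead of warming up
    cfg_path: Optional[str]  # file to write (calibration) or read (loading)
    fmt: Optional[str]       # "yaml", "csv", or None when no file is involved


def resolve_bucket_mode(env: Mapping[str, str], instance_id: str) -> BucketMode:
    """Map the two env vars onto the four cases from the Usage list."""
    # Assumption: "true" is matched case-insensitively; the PR only says "true or 1".
    calibrate = env.get("VLLM_HPU_CALIBRATE_BUCKETS", "").lower() in ("true", "1")
    cfg_path = env.get("VLLM_HPU_BUCKET_CFG") or None

    if calibrate and cfg_path is None:
        # Case 1: calibration with the default output file name.
        cfg_path = f"hpu-buckets-{instance_id}.yaml"
    if cfg_path is None:
        # Case 4: default behavior, bucket generation is left untouched.
        return BucketMode(False, None, None)

    # Cases 2 and 3: the file format is chosen by extension, case-insensitively.
    ext = os.path.splitext(cfg_path)[1].lower()
    if ext == ".csv":
        fmt = "csv"
    elif ext in (".yml", ".yaml"):
        fmt = "yaml"
    else:
        # Assumption: unsupported extensions are rejected.
        raise ValueError(f"unsupported bucket config extension: {ext!r}")
    return BucketMode(calibrate, cfg_path, fmt)
```

Passing the environment as a plain mapping (for example `os.environ`) keeps the function easy to exercise without touching the real process environment.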
Calibration with unspecified output file
Input:
Output:
Calibration with specified output file
Input:
Output log:
Calibration yaml:
Loading calibration YAML
Input:
Output log:
Default behavior:
Input:
Output log:
Warmup time and the number of buckets have decreased drastically, as only buckets used at least once by the given workload are kept, and the remaining ones are discarded.
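The pruning idea can be illustrated with a small sketch: count how often each bucket is hit during calibration, then warm up only the candidates that were hit at least once. This is an illustration under assumptions, not the PR's implementation; `Bucket`, `bucket_usage`, and `prune_warmup` are hypothetical names, and a bucket is assumed to be a (phase, batch_size, seq_len) tuple as in the warning message quoted in the review.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# Hypothetical bucket key: (phase, batch_size, seq_len).
Bucket = Tuple[str, int, int]


def bucket_usage(observed: Iterable[Bucket]) -> Dict[Bucket, int]:
    """Count how many times each bucket was used during calibration."""
    return dict(Counter(observed))


def prune_warmup(candidates: Iterable[Bucket],
                 usage: Dict[Bucket, int]) -> List[Bucket]:
    """Keep only candidate buckets that were used at least once."""
    return [b for b in candidates if usage.get(b, 0) >= 1]


candidates = [("decode", 128, 896), ("decode", 128, 1024), ("decode", 128, 1152)]
observed = [("decode", 128, 1024), ("decode", 128, 1152), ("decode", 128, 1024)]

usage = bucket_usage(observed)
warmup_list = prune_warmup(candidates, usage)
# warmup_list keeps the 1024 and 1152 buckets and drops the unused 896 bucket.
```

With this shape of data, a workload that later hits the dropped bucket would not find it warmed up, which is the situation raised in the review comment above.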