Refactoring of benchmarks #133
Merged
Commits (36):
23a7df5 Refactor draft (Alexsandruss)
f865330 Fix dev.guide link (Alexsandruss)
94a2add Configs update and minor fixes (Alexsandruss)
6831628 Copyright and doc fixes (Alexsandruss)
2cb6fa1 Add argument aliases (Alexsandruss)
5299104 Update configs and docs with corresponding code changes (Alexsandruss)
6a73712 Change INCLUDE directive in config spec (Alexsandruss)
ceb813e Basic daal4py modelbuilders support (Alexsandruss)
2a1e134 Correction of configs (Alexsandruss)
610dc8e Add basic sklearn-like emulation of approx. kNN (Alexsandruss)
80ce836 Change configs structure (add common sets); (Alexsandruss)
11d4a56 Linting (Alexsandruss)
6dd91ff Update online computation mode (Alexsandruss)
780a141 Update for ANN emulators (Alexsandruss)
c44943c Update xgboost configs; (Alexsandruss)
917cc32 Remove mutex from envs (Alexsandruss)
509dbba Add modin format; fix for faiss ivf_pq compatibility (Alexsandruss)
37b21d3 Add modin support; fixes for ANN emulators (Alexsandruss)
22ce12e Add SVS NearestNeighbors emulator (Alexsandruss)
1c6bd66 Update CI and minor code rework (Alexsandruss)
e318f64 Add dpnp and dpctl support (Alexsandruss)
6c8a08b Intermediate changes: apply comments, bug fixes (Alexsandruss)
836ffcc Pin CI Python version to 3.10 (Alexsandruss)
8453243 CI command fix and doc links fix (Alexsandruss)
06ae1ab Shell usage fix (Alexsandruss)
9d62d5d SPMD support (Alexsandruss)
765a5c3 CI fixes (Alexsandruss)
6798d01 Update sklbench args info (Alexsandruss)
1510e96 Conda envs and CI conf update (Alexsandruss)
ef0b7c5 Example configs update and fixes: (Alexsandruss)
831df21 Fixes and comments applying: (Alexsandruss)
8359dea Fix doctree link and add missing config warning (Alexsandruss)
96b7e15 CI matrix update and doc fixes (Alexsandruss)
dc00f5f Docs, configs and codeowners changes (Alexsandruss)
f2fd91e Update codeowners and doc fix (Alexsandruss)
7d144bc Add examples run to CI (Alexsandruss)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,7 @@ | ||
#owners and reviewers | ||
cuml_bench/* @Alexsandruss | ||
daal4py_bench/* @Alexsandruss @samir-nasibli | ||
datasets/* @Alexsandruss | ||
modelbuilders_bench/* @Alexsandruss | ||
report_generator/* @Alexsandruss | ||
sklearn_bench/* @Alexsandruss @samir-nasibli | ||
xgboost_bench/* @Alexsandruss | ||
*.md @Alexsandruss @maria-Petrova | ||
# owners and reviewers | ||
configs @Alexsandruss | ||
configs/spmd* @Alexsandruss @ethanglaser | ||
sklbench @Alexsandruss | ||
*.md @Alexsandruss @samir-nasibli | ||
requirements*.txt @Alexsandruss | ||
conda-env-*.yml @Alexsandruss |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,18 +1,18 @@ | ||
# Logs | ||
*.log | ||
|
||
# Release and work directories | ||
__pycache__* | ||
__work* | ||
|
||
# Visual Studio related files, e.g., ".vscode" | ||
.vs* | ||
|
||
# Datasets | ||
data | ||
# Dataset files | ||
data_cache | ||
*.csv | ||
*.npy | ||
*.npz | ||
|
||
# Results | ||
results*.json | ||
*.xlsx | ||
# Results at repo root | ||
vtune_results | ||
/*.json | ||
/*.xlsx | ||
/*.ipynb |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
#=============================================================================== | ||
# Copyright 2024 Intel Corporation | ||
# | ||
# Licensed under the Apache License, Version 2.0 (the "License"); | ||
# you may not use this file except in compliance with the License. | ||
# You may obtain a copy of the License at | ||
# | ||
# http://www.apache.org/licenses/LICENSE-2.0 | ||
# | ||
# Unless required by applicable law or agreed to in writing, software | ||
# distributed under the License is distributed on an "AS IS" BASIS, | ||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
# See the License for the specific language governing permissions and | ||
# limitations under the License. | ||
#=============================================================================== | ||
|
||
repos: | ||
- repo: https://github.com/psf/black | ||
rev: 23.7.0 | ||
hooks: | ||
- id: black | ||
language_version: python3.10 | ||
- repo: https://github.com/PyCQA/isort | ||
rev: 5.12.0 | ||
hooks: | ||
- id: isort | ||
language_version: python3.10 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,147 +1,105 @@ | ||
|
||
# Machine Learning Benchmarks <!-- omit in toc --> | ||
# Machine Learning Benchmarks | ||
|
||
|
||
[Build Status](https://dev.azure.com/daal/scikit-learn_bench/_build/latest?definitionId=8&branchName=main) | ||
|
||
|
||
**Machine Learning Benchmarks** contains implementations of machine learning algorithms | ||
across data analytics frameworks. Scikit-learn_bench can be extended to add new frameworks | ||
and algorithms. It currently supports the [scikit-learn](https://scikit-learn.org/), | ||
[DAAL4PY](https://intelpython.github.io/daal4py/), [cuML](https://github.com/rapidsai/cuml), | ||
and [XGBoost](https://github.com/dmlc/xgboost) frameworks for commonly used | ||
[machine learning algorithms](#supported-algorithms). | ||
|
||
## Follow us on Medium <!-- omit in toc --> | ||
|
||
We publish blogs on Medium, so [follow us](https://medium.com/intel-analytics-software/tagged/machine-learning) to learn tips and tricks for more efficient data analysis. Here are our latest blogs: | ||
**Scikit-learn_bench** is a benchmark tool for libraries and frameworks implementing Scikit-learn-like APIs and other workloads. | ||
|
||
- [Save Time and Money with Intel Extension for Scikit-learn](https://medium.com/intel-analytics-software/save-time-and-money-with-intel-extension-for-scikit-learn-33627425ae4) | ||
- [Superior Machine Learning Performance on the Latest Intel Xeon Scalable Processors](https://medium.com/intel-analytics-software/superior-machine-learning-performance-on-the-latest-intel-xeon-scalable-processor-efdec279f5a3) | ||
- [Leverage Intel Optimizations in Scikit-Learn](https://medium.com/intel-analytics-software/leverage-intel-optimizations-in-scikit-learn-f562cb9d5544) | ||
- [Optimizing CatBoost Performance](https://medium.com/intel-analytics-software/optimizing-catboost-performance-4f73f0593071) | ||
- [Intel Gives Scikit-Learn the Performance Boost Data Scientists Need](https://medium.com/intel-analytics-software/intel-gives-scikit-learn-the-performance-boost-data-scientists-need-42eb47c80b18) | ||
- [From Hours to Minutes: 600x Faster SVM](https://medium.com/intel-analytics-software/from-hours-to-minutes-600x-faster-svm-647f904c31ae) | ||
- [Improve the Performance of XGBoost and LightGBM Inference](https://medium.com/intel-analytics-software/improving-the-performance-of-xgboost-and-lightgbm-inference-3b542c03447e) | ||
- [Accelerate Kaggle Challenges Using Intel AI Analytics Toolkit](https://medium.com/intel-analytics-software/accelerate-kaggle-challenges-using-intel-ai-analytics-toolkit-beb148f66d5a) | ||
- [Accelerate Your scikit-learn Applications](https://medium.com/intel-analytics-software/improving-the-performance-of-xgboost-and-lightgbm-inference-3b542c03447e) | ||
- [Optimizing XGBoost Training Performance](https://medium.com/intel-analytics-software/new-optimizations-for-cpu-in-xgboost-1-1-81144ea21115) | ||
- [Accelerate Linear Models for Machine Learning](https://medium.com/intel-analytics-software/accelerating-linear-models-for-machine-learning-5a75ff50a0fe) | ||
- [Accelerate K-Means Clustering](https://medium.com/intel-analytics-software/accelerate-k-means-clustering-6385088788a1) | ||
- [Fast Gradient Boosting Tree Inference](https://medium.com/intel-analytics-software/fast-gradient-boosting-tree-inference-for-intel-xeon-processors-35756f174f55) | ||
Benefits: | ||
- Full control of benchmarks suite through CLI | ||
- Flexible and powerful benchmark config structure | ||
- Available with advanced profiling tools, such as Intel(R) VTune* Profiler | ||
- Automated benchmarks report generation | ||
|
||
## Table of content <!-- omit in toc --> | ||
### 📜 Table of Contents | ||
|
||
- [How to create conda environment for benchmarking](#how-to-create-conda-environment-for-benchmarking) | ||
- [Running Python benchmarks with runner script](#running-python-benchmarks-with-runner-script) | ||
- [Benchmark supported algorithms](#benchmark-supported-algorithms) | ||
- [Scikit-learn benchmakrs](#scikit-learn-benchmakrs) | ||
- [Algorithm parameters](#algorithm-parameters) | ||
- [Machine Learning Benchmarks](#machine-learning-benchmarks) | ||
- [🔧 Create a Python Environment](#-create-a-python-environment) | ||
- [🚀 How To Use Scikit-learn\_bench](#-how-to-use-scikit-learn_bench) | ||
- [Benchmarks Runner](#benchmarks-runner) | ||
- [Report Generator](#report-generator) | ||
- [Scikit-learn\_bench High-Level Workflow](#scikit-learn_bench-high-level-workflow) | ||
- [📚 Benchmark Types](#-benchmark-types) | ||
- [📑 Documentation](#-documentation) | ||
|
||
## How to create conda environment for benchmarking | ||
## 🔧 Create a Python Environment | ||
|
||
Create a suitable conda environment for each framework to test. Each item in the list below links to instructions to create an appropriate conda environment for the framework. | ||
Create a usable Python environment with the required frameworks: | ||
|
||
- [**scikit-learn**](sklearn_bench#how-to-create-conda-environment-for-benchmarking) | ||
- **sklearn, sklearnex, and gradient boosting frameworks**: | ||
|
||
```bash | ||
pip install -r sklearn_bench/requirements.txt | ||
# or | ||
conda install -c intel scikit-learn scikit-learn-intelex pandas tqdm | ||
# with pip | ||
pip install -r envs/requirements-sklearn.txt | ||
# or with conda | ||
conda env create -n sklearn -f envs/conda-env-sklearn.yml | ||
``` | ||
|
||
- [**daal4py**](daal4py_bench#how-to-create-conda-environment-for-benchmarking) | ||
- **RAPIDS**: | ||
|
||
```bash | ||
conda install -c conda-forge scikit-learn daal4py pandas tqdm | ||
conda env create -n rapids --solver=libmamba -f envs/conda-env-rapids.yml | ||
``` | ||
|
||
- [**cuml**](cuml_bench#how-to-create-conda-environment-for-benchmarking) | ||
## 🚀 How To Use Scikit-learn_bench | ||
|
||
```bash | ||
conda install -c rapidsai -c conda-forge cuml pandas cudf tqdm | ||
``` | ||
### Benchmarks Runner | ||
|
||
- [**xgboost**](xgboost_bench#how-to-create-conda-environment-for-benchmarking) | ||
How to run benchmarks using the `sklbench` module and a specific configuration: | ||
|
||
```bash | ||
pip install -r xgboost_bench/requirements.txt | ||
# or | ||
conda install -c conda-forge xgboost scikit-learn pandas tqdm | ||
python -m sklbench --config configs/sklearn_example.json | ||
``` | ||
|
||
## Running Python benchmarks with runner script | ||
|
||
Run `python runner.py --configs configs/config_example.json [--output-file result.json --verbose INFO --report]` to launch benchmarks. | ||
|
||
Options: | ||
|
||
- ``--configs``: specify the path to a configuration file or a folder that contains configuration files. | ||
- ``--no-intel-optimized``: use Scikit-learn without [Intel(R) Extension for Scikit-learn*](#intelr-extension-for-scikit-learn-support). Now available for [scikit-learn benchmarks](https://github.com/IntelPython/scikit-learn_bench/tree/main/sklearn_bench). By default, the runner uses Intel(R) Extension for Scikit-learn. | ||
- ``--output-file``: specify the name of the output file for the benchmark result. The default name is `result.json` | ||
- ``--report``: create an Excel report based on benchmark results. The `openpyxl` library is required. | ||
- ``--dummy-run``: run configuration parser and dataset generation without benchmarks running. | ||
- ``--verbose``: *WARNING*, *INFO*, *DEBUG*. Print out additional information when the benchmarks are running. The default is *INFO*. | ||
|
||
| Level | Description | | ||
|-----------|---------------| | ||
| *DEBUG* | Detailed information, typically of interest only when diagnosing problems. Usually at this level the logging output is so low level that it’s not useful to users who are not familiar with the software’s internals. | | ||
| *INFO* | Confirmation that things are working as expected. | | ||
| *WARNING* | An indication that something unexpected happened, or indicative of some problem in the near future (e.g. ‘disk space low’). The software is still working as expected. | | ||
|
||
Benchmarks currently support the following frameworks: | ||
The default output is a file with JSON-formatted results of benchmarking cases. To generate a more human-readable report, use the following command: | ||
|
||
- **scikit-learn** | ||
- **daal4py** | ||
- **cuml** | ||
- **xgboost** | ||
```bash | ||
python -m sklbench --config configs/sklearn_example.json --report | ||
``` | ||
|
||
The configuration of benchmarks allows you to select the frameworks to run, select datasets for measurements and configure the parameters of the algorithms. | ||
By default, output and report file paths are `result.json` and `report.xlsx`. To specify custom file paths, run: | ||
|
||
You can configure benchmarks by editing a config file. Check [config.json schema](https://github.com/IntelPython/scikit-learn_bench/blob/main/configs/README.md) for more details. | ||
```bash | ||
python -m sklbench --config configs/sklearn_example.json --report --result-file result_example.json --report-file report_example.xlsx | ||
``` | ||
|
||
## Benchmark supported algorithms | ||
For a description of all benchmarks runner arguments, refer to the [documentation](sklbench/runner/README.md#arguments). | ||
|
||
| algorithm | benchmark name | sklearn (CPU) | sklearn (GPU) | daal4py | cuml | xgboost | | ||
|---|---|---|---|---|---|---| | ||
|**[DBSCAN](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html)**|dbscan|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:x:| | ||
|**[RandomForestClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html)**|df_clfs|:white_check_mark:|:x:|:white_check_mark:|:white_check_mark:|:x:| | ||
|**[RandomForestRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html)**|df_regr|:white_check_mark:|:x:|:white_check_mark:|:white_check_mark:|:x:| | ||
|**[pairwise_distances](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise_distances.html)**|distances|:white_check_mark:|:x:|:white_check_mark:|:x:|:x:| | ||
|**[KMeans](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html)**|kmeans|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:x:| | ||
|**[KNeighborsClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.neighbors.KNeighborsClassifier.html)**|knn_clsf|:white_check_mark:|:x:|:x:|:white_check_mark:|:x:| | ||
|**[LinearRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html)**|linear|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:x:| | ||
|**[LogisticRegression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)**|log_reg|:white_check_mark:|:white_check_mark:|:white_check_mark:|:white_check_mark:|:x:| | ||
|**[PCA](https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html)**|pca|:white_check_mark:|:x:|:white_check_mark:|:white_check_mark:|:x:| | ||
|**[Ridge](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html)**|ridge|:white_check_mark:|:x:|:white_check_mark:|:white_check_mark:|:x:| | ||
|**[SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html)**|svm|:white_check_mark:|:x:|:white_check_mark:|:white_check_mark:|:x:| | ||
|**[TSNE](https://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html)**|tsne|:white_check_mark:|:x:|:x:|:white_check_mark:|:x:| | ||
|**[train_test_split](https://scikit-learn.org/stable/modules/generated/sklearn.model_selection.train_test_split.html)**|train_test_split|:white_check_mark:|:x:|:x:|:white_check_mark:|:x:| | ||
|**[GradientBoostingClassifier](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html)**|gbt|:x:|:x:|:x:|:x:|:white_check_mark:| | ||
|**[GradientBoostingRegressor](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingRegressor.html)**|gbt|:x:|:x:|:x:|:x:|:white_check_mark:| | ||
### Report Generator | ||
|
||
### Scikit-learn benchmakrs | ||
To combine raw result files gathered from different environments, call the report generator: | ||
|
||
When you run scikit-learn benchmarks on CPU, [Intel(R) Extension for Scikit-learn](https://github.com/intel/scikit-learn-intelex) is used by default. Use the ``--no-intel-optimized`` option to run the benchmarks without the extension. | ||
```bash | ||
python -m sklbench.report --result-files result_1.json result_2.json --report-file report_example.xlsx | ||
``` | ||
|
||
For the algorithms with both CPU and GPU support, you may use the same [configuration file](https://github.com/IntelPython/scikit-learn_bench/blob/main/configs/skl_xpu_config.json) to run the scikit-learn benchmarks on CPU and GPU. | ||
For a description of all report generator arguments, refer to the [documentation](sklbench/report/README.md#arguments). | ||
|
||
## Algorithm parameters | ||
### Scikit-learn_bench High-Level Workflow | ||
|
||
You can launch benchmarks for each algorithm separately. | ||
To do this, go to the directory with the benchmark: | ||
```mermaid | ||
flowchart TB | ||
A[User] -- High-level arguments --> B[Benchmarks runner] | ||
B -- Generated benchmarking cases --> C["Benchmarks collection"] | ||
C -- Raw JSON-formatted results --> D[Report generator] | ||
D -- Human-readable report --> A | ||
|
||
```bash | ||
cd <framework> | ||
classDef userStyle fill:#44b,color:white,stroke-width:2px,stroke:white; | ||
class A userStyle | ||
``` | ||
|
||
Run the following command: | ||
## 📚 Benchmark Types | ||
|
||
```bash | ||
python <benchmark_file> --dataset-name <path to the dataset> <other algorithm parameters> | ||
``` | ||
**Scikit-learn_bench** supports the following types of benchmarks: | ||
|
||
The list of supported parameters for each algorithm you can find here: | ||
- **Scikit-learn estimator** - Measures performance and quality metrics of the [sklearn-like estimator](https://scikit-learn.org/stable/glossary.html#term-estimator). | ||
- **Function** - Measures performance metrics of a specified function. | ||
|
||
- [**scikit-learn**](sklearn_bench#algorithms-parameters) | ||
- [**daal4py**](daal4py_bench#algorithms-parameters) | ||
- [**cuml**](cuml_bench#algorithms-parameters) | ||
- [**xgboost**](xgboost_bench#algorithms-parameters) | ||
## 📑 Documentation | ||
[Scikit-learn_bench](README.md): | ||
- [Configs](configs/README.md) | ||
- [Benchmarks Runner](sklbench/runner/README.md) | ||
- [Report Generator](sklbench/report/README.md) | ||
- [Benchmarks](sklbench/benchmarks/README.md) | ||
- [Data Processing](sklbench/datasets/README.md) | ||
- [Emulators](sklbench/emulators/README.md) | ||
- [Developer Guide](docs/README.md) | ||
|
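The runner and report generator invocations in the README diff above can be scripted. The following is a minimal sketch that only composes the command lines, using the flags that appear in the diff (`--config`, `--report`, `--result-file`, `--report-file`, `--result-files`). The helper names `runner_command` and `report_command` are hypothetical and not part of `sklbench`, and whether the flags may be combined in other ways is not confirmed by the diff.

```python
import shlex
import sys


def runner_command(config, report=False, result_file=None, report_file=None):
    """Compose a `python -m sklbench` call (hypothetical helper, not part of sklbench)."""
    cmd = [sys.executable, "-m", "sklbench", "--config", config]
    if report:
        # Flag taken from the README diff: also produce a report.
        cmd.append("--report")
    if result_file is not None:
        cmd += ["--result-file", result_file]
    if report_file is not None:
        cmd += ["--report-file", report_file]
    return cmd


def report_command(result_files, report_file):
    """Compose a `python -m sklbench.report` call that combines raw result files."""
    if not result_files:
        raise ValueError("at least one result file is required")
    return [
        sys.executable, "-m", "sklbench.report",
        "--result-files", *result_files,
        "--report-file", report_file,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them; pass a list to
    # subprocess.run(..., check=True) to execute one in a prepared environment.
    print(shlex.join(runner_command("configs/sklearn_example.json", report=True)))
    print(shlex.join(report_command(["result_1.json", "result_2.json"],
                                    "report_example.xlsx")))
```

Building the command as a list rather than a single string avoids shell quoting issues when file paths contain spaces.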