@@ -5,8 +5,29 @@
[MLflow](https://mlflow.org) is an excellent open source platform to manage the ML lifecycle, including experimentation,
reproducibility, deployment, and a central model registry.
-Mlflow task is used to perform mlflow project tasks, which include basic algorithmic and autoML capabilities (
-User-defined MLFlow project task execution will be supported in the near future)
+The MLflow task plugin is used to execute MLflow tasks. It currently covers MLflow Projects and MLflow Models (Model Registry support will be added soon).
+
+- MLflow Projects: Package data science code in a format that makes runs reproducible on any platform.
+- MLflow Models: Deploy machine learning models in diverse serving environments.
+- Model Registry: Store, annotate, discover, and manage models in a central repository.
+
+The MLflow plugin currently supports, or plans to support, the following:
+
+- [ ] MLflow Projects
+  - [x] BasicAlgorithm: basic algorithms, including lr, svm, lightgbm, and xgboost
+  - [x] AutoML: AutoML tools, including autosklearn and flaml
+  - [ ] Custom projects: support for running user-defined MLflow projects
+- [ ] MLflow Models
+  - [x] MLFLOW: deploy a model service with `mlflow models serve`
+  - [x] Docker: build a Docker image, then run the container
+  - [ ] Docker Compose: run the container with Docker Compose, replacing the `docker run` mode above
+  - [ ] Seldon Core: deploy the model to a Kubernetes cluster with Seldon Core
+  - [ ] k8s: deploy the container directly to Kubernetes
+  - [ ] MLflow deployments: MLflow's built-in deployment modules, such as deployment to SageMaker
+- [ ] Model Registry
+  - [ ] Register Model: register artifacts (the model together with its parameters and metrics) directly in the model registry
+
## Create Task
@@ -14,68 +35,110 @@ User-defined MLFlow project task execution will be supported in the near future)
DAG editing page.
- Drag from the toolbar <img src="/img/tasks/icons/mlflow.png" width="15"/> task node to canvas.
-## Task Parameter
-
-- DolphinScheduler common parameters
- - **Node name**: The node name in a workflow definition is unique.
- - **Run flag**: Identifies whether this node schedules normally, if it does not need to execute, select
- the `prohibition execution`.
- - **Descriptive information**: Describe the function of the node.
- - **Task priority**: When the number of worker threads is insufficient, execute in the order of priority from high
- to low, and tasks with the same priority will execute in a first-in first-out order.
- - **Worker grouping**: Assign tasks to the machines of the worker group to execute. If `Default` is selected,
- randomly select a worker machine for execution.
- - **Environment Name**: Configure the environment name in which run the script.
- - **Times of failed retry attempts**: The number of times the task failed to resubmit.
- - **Failed retry interval**: The time interval (unit minute) for resubmitting the task after a failed task.
- - **Delayed execution time**: The time (unit minute) that a task delays in execution.
- - **Timeout alarm**: Check the timeout alarm and timeout failure. When the task runs exceed the "timeout", an alarm
- email will send and the task execution will fail.
- - **Custom parameter**: It is a local user-defined parameter for mlflow, and will replace the content
- with `${variable}` in the script.
- - **Predecessor task**: Selecting a predecessor task for the current task, will set the selected predecessor task as
- upstream of the current task.
-
-- MLflow task specific parameters
- - **mlflow server tracking uri** :MLflow server uri, default http://localhost:5000.
- - **experiment name** :The experiment in which the task is running, if none, is created.
- - **register model** :Register the model or not. If register is selected, the following parameters are expanded.
- - **model name** : The registered model name is added to the original model version and registered as
- Production.
- - **job type** : The type of task to run, currently including the underlying algorithm and AutoML. (User-defined
- MLFlow project task execution will be supported in the near future)
- - BasicAlgorithm specific parameters
- - **algorithm** :The selected algorithm currently supports `LR`, `SVM`, `LightGBM` and `XGboost` based
- on [scikit-learn](https://scikit-learn.org/) form.
- - **Parameter search space** : Parameter search space when running the corresponding algorithm, which can be
- empty. For example, the parameter `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm 。The convention
- will be passed with '; 'shards each parameter, using the name before the equal sign as the parameter name,
- and using the name after the equal sign to get the corresponding parameter value through `python eval()`.
- - AutoML specific parameters
- - **AutoML tool** : The AutoML tool used, currently
- supports [autosklearn](https://github.com/automl/auto-sklearn)
- and [flaml](https://github.com/microsoft/FLAML)
- - Parameters common to BasicAlgorithm and AutoML
- - **data path** : The absolute path of the file or folder. Ends with .csv for file or contain train.csv and
- test.csv for folder(In the suggested way, users should build their own test sets for model evaluation)。
- - **parameters** : Parameter when initializing the algorithm/AutoML model, which can be empty. For example
- parameters `"time_budget=30;estimator_list=['lgbm']"` for flaml 。The convention will be passed with '; 'shards
- each parameter, using the name before the equal sign as the parameter name, and using the name after the equal
- sign to get the corresponding parameter value through `python eval()`.
- - BasicAlgorithm
- - [lr](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
- - [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
- - [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
- - [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
- - AutoML
- - [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
- - [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
-
## Task Example
-### Preparation
+First, let's introduce some general DolphinScheduler parameters:
+
+- **Node name**: The node name in a workflow definition must be unique.
+- **Run flag**: Identifies whether this node can be scheduled normally. If the node does not need to execute, select `prohibition execution`.
+- **Descriptive information**: Describes the function of the node.
+- **Task priority**: When the number of worker threads is insufficient, tasks execute in order of priority from high to low; tasks with the same priority execute in first-in, first-out order.
+- **Worker grouping**: Assigns the task to a machine in the worker group. If `Default` is selected, a worker machine is chosen at random.
+- **Environment Name**: Configures the environment in which the script runs.
+- **Times of failed retry attempts**: The number of times to resubmit the task after it fails.
+- **Failed retry interval**: The interval (in minutes) between resubmissions after a task fails.
+- **Delayed execution time**: The time (in minutes) by which the task execution is delayed.
+- **Timeout alarm**: Check timeout alarm and timeout failure. When the task run time exceeds the timeout threshold, an alarm email is sent and the task execution fails.
+- **Predecessor task**: Selecting a predecessor task sets it as upstream of the current task.
+
+### MLflow Projects
+
+#### BasicAlgorithm
+
+**Task Parameters**
+
+- **mlflow server tracking uri**: The MLflow server URI, by default http://localhost:5000.
+- **job type**: The type of job to run, currently either the basic algorithms or AutoML. (User-defined MLflow project execution will be supported in the near future.)
+- **experiment name**: The experiment under which the task runs; it is created if it does not exist.
+- **register model**: Whether to register the model. If selected, the following parameter is shown.
+  - **model name**: The name under which the model is registered. A new version is added to the model and transitioned to the Production stage.
+- **data path**: The absolute path of the data file or folder. A file must end with `.csv`; a folder must contain `train.csv` and `test.csv` (the recommended way, so that users build their own test set for model evaluation).
+- **parameters**: Parameters passed when initializing the algorithm model; may be empty. For example, `n_estimators=200;learning_rate=0.2` for lightgbm. By convention, parameters are separated by `;`; the name before the equal sign is used as the parameter name, and the text after the equal sign is evaluated with Python `eval()` to obtain the parameter value. See the detailed parameter lists:
+  - [Logistic Regression](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html#sklearn.linear_model.LogisticRegression)
+  - [SVM](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html?highlight=svc#sklearn.svm.SVC)
+  - [lightgbm](https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.LGBMClassifier.html#lightgbm.LGBMClassifier)
+  - [xgboost](https://xgboost.readthedocs.io/en/stable/python/python_api.html#xgboost.XGBClassifier)
+- **algorithm**: The algorithm to use, currently supporting `LR`, `SVM`, `LightGBM`, and `XGBoost`, based on [scikit-learn](https://scikit-learn.org/).
+- **Parameter search space**: The hyperparameter search space for the chosen algorithm; may be empty. For example, `max_depth=[5, 10];n_estimators=[100, 200]` for lightgbm. The same `;`-separated, `eval()`-based convention as **parameters** applies.
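The `;`-separated, `eval()`-based convention described for **parameters** and **Parameter search space** can be sketched in a few lines of Python. This is a simplified illustration of the convention only, not the plugin's actual code, and `parse_params` is a made-up name:

```python
def parse_params(spec: str) -> dict:
    """Parse a ';'-separated parameter string into a dict.

    Each entry has the form 'name=value'; the text before '=' becomes
    the parameter name, and the text after '=' is turned into a Python
    object with eval(), per the convention described above.
    """
    params = {}
    for item in spec.split(";"):
        if not item.strip():
            continue  # tolerate empty entries such as a trailing ';'
        name, _, value = item.partition("=")
        params[name.strip()] = eval(value)
    return params

print(parse_params("max_depth=[5, 10];n_estimators=[100, 200]"))
# {'max_depth': [5, 10], 'n_estimators': [100, 200]}
```

Because the value text goes through `eval()`, lists, numbers, and strings (e.g. `estimator_list=['lgbm']`) all work with the same syntax.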
+
+#### AutoML
+
+**Task Parameters**
+
+- **mlflow server tracking uri**: The MLflow server URI, by default http://localhost:5000.
+- **job type**: The type of job to run, currently either the basic algorithms or AutoML. (User-defined MLflow project execution will be supported in the near future.)
+- **experiment name**: The experiment under which the task runs; it is created if it does not exist.
+- **register model**: Whether to register the model. If selected, the following parameter is shown.
+  - **model name**: The name under which the model is registered. A new version is added to the model and transitioned to the Production stage.
+- **data path**: The absolute path of the data file or folder. A file must end with `.csv`; a folder must contain `train.csv` and `test.csv` (the recommended way, so that users build their own test set for model evaluation).
+- **parameters**: Parameters passed when initializing the AutoML model; may be empty. For example, `"time_budget=30;estimator_list=['lgbm']"` for flaml. By convention, parameters are separated by `;`; the name before the equal sign is used as the parameter name, and the text after the equal sign is evaluated with Python `eval()` to obtain the parameter value. The detailed parameter lists are as follows:
+  - [flaml](https://microsoft.github.io/FLAML/docs/reference/automl#automl-objects)
+  - [autosklearn](https://automl.github.io/auto-sklearn/master/api.html)
+- **AutoML tool**: The AutoML tool to use, currently supporting [autosklearn](https://github.com/automl/auto-sklearn) and [flaml](https://github.com/microsoft/FLAML).
+
+### MLflow Models
+
+#### MLFLOW
+
-#### Conda env
+**Task Parameters**
+
+- **mlflow server tracking uri**: The MLflow server URI, by default http://localhost:5000.
+- **model-uri**: The MLflow model URI, supporting the `models:/<model_name>/suffix` format and the `runs:/` format. See https://mlflow.org/docs/latest/tracking.html#artifact-stores.
+- **Port**: The port on which the model service listens.
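Under the hood this job type corresponds to serving a model with the MLflow CLI. A rough sketch of the equivalent commands, where the model name `mymodel`, the stage `Production`, the port `7000`, and the request columns are illustrative placeholders only:

```shell
# Serve a registered model over HTTP (example values, adjust to your registry).
mlflow models serve -m "models:/mymodel/Production" --host 0.0.0.0 --port 7000

# Query the service; the exact JSON payload layout depends on your MLflow version.
curl http://127.0.0.1:7000/invocations \
    -H 'Content-Type: application/json' \
    -d '{"dataframe_split": {"columns": ["x1", "x2"], "data": [[1.0, 2.0]]}}'
```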
+
+#### Docker
+
+**Task Parameters**
+
+- **mlflow server tracking uri**: The MLflow server URI, by default http://localhost:5000.
+- **model-uri**: The MLflow model URI, supporting the `models:/<model_name>/suffix` format and the `runs:/` format. See https://mlflow.org/docs/latest/tracking.html#artifact-stores.
+- **Port**: The port on which the container listens.
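This job type corresponds to packaging the model as a Docker image and running the container. A sketch of the equivalent MLflow CLI usage, where the model name, image name, and host port are illustrative placeholders:

```shell
# Build a Docker image that serves the model (example names).
mlflow models build-docker -m "models:/mymodel/Production" -n mymodel-image

# The image serves on port 8080 inside the container; map it to host port 7000.
docker run -p 7000:8080 mymodel-image
```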
+
+## Environment Preparation
+
+### Conda env
+
You need to enter the admin account to configure a conda environment variable(Please
install [anaconda](https://docs.continuum.io/anaconda/install/)
@@ -88,7 +151,7 @@ Conda environment.
![mlflow-conda-env](/img/tasks/demo/mlflow-conda-env.png)
-#### Start the mlflow service
+### Start the mlflow service
Make sure you have installed MLflow, using `pip install mlflow`.
@@ -102,16 +165,6 @@ mlflow server -h 0.0.0.0 -p 5000 --serve-artifacts --backend-store-uri sqlite://

After running, an MLflow service is started
-### Run BasicAlgorithm task
-
-The following example shows how to create an MLflow BasicAlgorithm task.
-
-
-
After this, you can visit the MLFlow service (`http://localhost:5000`) page to view the experiments and models.
![mlflow-server](/img/tasks/demo/mlflow-server.png)
-
-### Run AutoML task
-
-