- .. Licensed to the Apache Software Foundation (ASF) under one
- or more contributor license agreements. See the NOTICE file
- distributed with this work for additional information
- regarding copyright ownership. The ASF licenses this file
- to you under the Apache License, Version 2.0 (the
- "License"); you may not use this file except in compliance
- with the License. You may obtain a copy of the License at
- .. http://www.apache.org/licenses/LICENSE-2.0
- .. Unless required by applicable law or agreed to in writing,
- software distributed under the License is distributed on an
- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- KIND, either express or implied. See the License for the
- specific language governing permissions and limitations
- under the License.
- Tutorial
- ========
- This tutorial introduces the basic concepts of *PyDolphinScheduler* and covers everything
- you should know before you submit or run your first workflow. If you have not yet
- installed *PyDolphinScheduler* and started DolphinScheduler, see
- :ref:`getting started with PyDolphinScheduler <start:getting started>` first.
- Overview of Tutorial
- --------------------
- Here is an overview of the tutorial. It may look a little complex, but do not
- worry: we explain this example below in as much detail as possible.
- There are two types of tutorials: traditional and task decorator.
- **Traditional Way**: More general; it supports many :doc:`built-in task types <tasks/index>` and is convenient
- when you build your workflow from the beginning.
- **Task Decorator**: A Python decorator that lets you wrap your function into a pydolphinscheduler task. It is
- less versatile than the traditional way because it only supports Python functions and does not support
- built-in tasks. But it is helpful if your workflow is built entirely with Python or if you already have some
- Python workflow code and want to migrate it to pydolphinscheduler.
- .. tab:: Tradition
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
- :dedent: 0
- :start-after: [start tutorial]
- :end-before: [end tutorial]
- .. tab:: Task Decorator
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
- :dedent: 0
- :start-after: [start tutorial]
- :end-before: [end tutorial]
- Import Necessary Module
- -----------------------
- First of all, we import the modules that we will use later, just as we would with any other Python package.
- .. tab:: Tradition
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
- :dedent: 0
- :start-after: [start package_import]
- :end-before: [end package_import]
- In the traditional tutorial we import :class:`pydolphinscheduler.core.process_definition.ProcessDefinition` and
- :class:`pydolphinscheduler.tasks.shell.Shell`.
- If you want to use other task types, see :doc:`all the tasks we support <tasks/index>`.
- .. tab:: Task Decorator
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
- :dedent: 0
- :start-after: [start package_import]
- :end-before: [end package_import]
- In the task decorator tutorial we import :class:`pydolphinscheduler.core.process_definition.ProcessDefinition` and
- :func:`pydolphinscheduler.tasks.func_wrap.task`.
- Process Definition Declaration
- ------------------------------
- We should instantiate a :class:`pydolphinscheduler.core.process_definition.ProcessDefinition` object after we
- import it in `import necessary module`_. Here we declare the basic arguments for the process definition (aka workflow).
- We define the name of :code:`ProcessDefinition` using a `Python context manager`_, and it is **the only required argument**
- for `ProcessDefinition`. Besides, we also declare three more arguments: :code:`schedule` and :code:`start_time`,
- which set the workflow schedule interval and the schedule start time, and :code:`tenant`, which defines the tenant
- that will run this task on the DolphinScheduler worker. See :ref:`section tenant <concept:tenant>` in
- *PyDolphinScheduler* :doc:`concept` for more information.
- .. tab:: Tradition
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
- :dedent: 0
- :start-after: [start workflow_declare]
- :end-before: [end workflow_declare]
- .. tab:: Task Decorator
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
- :dedent: 0
- :start-after: [start workflow_declare]
- :end-before: [end workflow_declare]
- You can find more detail about :code:`ProcessDefinition` in :ref:`concept about process definition <concept:process definition>`
- if you are interested. For all arguments of the process definition object, see the
- :class:`pydolphinscheduler.core.process_definition` API documentation.
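The context-manager pattern described in this section can be illustrated with a small, self-contained sketch. The ``Workflow`` and ``Task`` classes below are hypothetical stand-ins written only for this illustration; they are not the PyDolphinScheduler classes and never contact DolphinScheduler, and the argument values are placeholders. The sketch only shows one way a ``with`` block can collect the tasks created inside it.

```python
class Workflow:
    """Hypothetical stand-in for a workflow object (illustration only)."""

    _current = None  # the workflow whose ``with`` block is currently active

    def __init__(self, name, schedule=None, start_time=None, tenant=None):
        self.name = name  # the only required argument in this sketch
        self.schedule = schedule
        self.start_time = start_time
        self.tenant = tenant
        self.tasks = []

    def __enter__(self):
        # Entering the ``with`` block makes this workflow the active one.
        Workflow._current = self
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Leaving the block clears the active workflow; exceptions propagate.
        Workflow._current = None
        return False


class Task:
    """Hypothetical task that registers itself with the active workflow."""

    def __init__(self, name):
        self.name = name
        if Workflow._current is not None:
            Workflow._current.tasks.append(self)


with Workflow(name="tutorial", tenant="placeholder_tenant") as workflow:
    Task(name="task_parent")
    Task(name="task_child_one")

# Created outside the ``with`` block, so it is not attached to any workflow.
outside = Task(name="not_in_workflow")

print([task.name for task in workflow.tasks])
```

Only the two tasks created inside the ``with`` block end up in ``workflow.tasks``, which is the convenience a context manager offers here: tasks do not need to be passed the workflow explicitly.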
- Task Declaration
- ----------------
- .. tab:: Tradition
- We declare four tasks to show how to create tasks, and all of them are simple tasks of
- :class:`pydolphinscheduler.tasks.shell` which run the `echo` command in the terminal. Besides the argument
- `command` with the :code:`echo` command, we also need to set the argument `name` for each task
- *(not only shell tasks; `name` is required for every type of task)*.
-
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
- :dedent: 0
- :start-after: [start task_declare]
- :end-before: [end task_declare]
- Besides the shell task, *PyDolphinScheduler* supports many other task types, which you can find in :doc:`tasks/index`.
- .. tab:: Task Decorator
- We declare four tasks to show how to create tasks, and all of them are created by the task decorator
- :func:`pydolphinscheduler.tasks.func_wrap.task`. All we have to do is add a decorator named
- :code:`@task` to an existing Python function, and then use it inside :class:`pydolphinscheduler.core.process_definition`.
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
- :dedent: 0
- :start-after: [start task_declare]
- :end-before: [end task_declare]
- It makes our workflow more Pythonic, but be aware that in task decorator mode we can only use
- Python functions as tasks and, in most cases, cannot use the :doc:`built-in tasks <tasks/index>`.
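To make the idea of wrapping a function into a task more concrete, here is a minimal sketch of the general decorator technique. The ``task`` decorator and the ``FunctionTask`` class below are hypothetical and written only for this illustration; they are not the implementation of :func:`pydolphinscheduler.tasks.func_wrap.task` and make no claim about how that function works internally.

```python
import functools


class FunctionTask:
    """Hypothetical task object that holds a Python callable (illustration only)."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def run(self):
        # In this sketch, running the task just calls the wrapped function.
        return self.func()


def task(func):
    """Hypothetical decorator: calling the decorated name builds a task object."""

    @functools.wraps(func)
    def build(name=None):
        # The task name defaults to the function name when none is given.
        return FunctionTask(name=name or func.__name__, func=func)

    return build


@task
def task_parent():
    return "hello from task_parent"


parent = task_parent()  # builds a FunctionTask instead of calling the body
print(parent.name)
print(parent.run())
```

In this sketch, calling ``task_parent()`` returns a task object named after the function rather than executing the function body, which is why decorated functions can be treated like any other task when building a workflow.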
- Setting Task Dependence
- -----------------------
- After we declare both the process definition and the tasks, we have four tasks that are independent and will run
- in parallel. If you want a task to start only after some other task has finished, you have to set a dependence
- between those tasks.
- Setting task dependence is quite easy with the task's methods :code:`set_downstream` and :code:`set_upstream`, or
- with the bitwise operators :code:`>>` and :code:`<<`.
- In this tutorial, task `task_parent` is the leading task of the whole workflow, and tasks `task_child_one` and
- `task_child_two` are its downstream tasks. Task `task_union` will not run until both `task_child_one`
- and `task_child_two` are done, because both tasks are upstream of `task_union`.
- .. tab:: Tradition
-
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
- :dedent: 0
- :start-after: [start task_relation_declare]
- :end-before: [end task_relation_declare]
- .. tab:: Task Decorator
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
- :dedent: 0
- :start-after: [start task_relation_declare]
- :end-before: [end task_relation_declare]
- .. note::
- We can set task dependence in batch mode if tasks share the same downstream or upstream by declaring those
- tasks as a task group. In the tutorial, we declare tasks `task_child_one` and `task_child_two` as a task group named
- `task_group`, then set `task_group` as downstream of task `task_parent`. See
- :ref:`concept:Tasks Dependence` for more detail about how to set task dependence.
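The relationship between :code:`set_downstream`, :code:`set_upstream` and the operators :code:`>>` and :code:`<<` can be sketched with Python's operator overloading. The ``Task`` class below is a hypothetical toy model written for this illustration, not the PyDolphinScheduler class; it only records task names and assumes that a task group is a plain list of tasks.

```python
def as_list(tasks):
    """Accept either a single task or a list/tuple of tasks (a task group)."""
    return list(tasks) if isinstance(tasks, (list, tuple)) else [tasks]


class Task:
    """Hypothetical toy task that records its upstream and downstream names."""

    def __init__(self, name):
        self.name = name
        self.upstream = set()
        self.downstream = set()

    def set_downstream(self, tasks):
        for other in as_list(tasks):
            self.downstream.add(other.name)
            other.upstream.add(self.name)

    def set_upstream(self, tasks):
        for other in as_list(tasks):
            other.set_downstream(self)

    def __rshift__(self, tasks):
        # ``self >> tasks`` is shorthand for ``self.set_downstream(tasks)``.
        self.set_downstream(tasks)
        return tasks

    def __lshift__(self, tasks):
        # ``self << tasks`` is shorthand for ``self.set_upstream(tasks)``.
        self.set_upstream(tasks)
        return tasks


task_parent = Task("task_parent")
task_child_one = Task("task_child_one")
task_child_two = Task("task_child_two")
task_union = Task("task_union")

# Batch mode: both children share the same upstream and the same downstream.
task_group = [task_child_one, task_child_two]
task_parent.set_downstream(task_group)
task_union << task_group

print(sorted(task_parent.downstream))
print(sorted(task_union.upstream))
```

Both printed lists contain ``task_child_one`` and ``task_child_two``: the children are downstream of ``task_parent`` and upstream of ``task_union``, matching the dependence described in this section. The toy model only handles a task on the left-hand side of the operator.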
- Submit Or Run Workflow
- ----------------------
- We have now finished our workflow definition, with four tasks and their dependence, but all of this is still
- local. We need to let the DolphinScheduler daemon know about the workflow definition, so the last thing we
- have to do is submit the workflow to the DolphinScheduler daemon.
- Fortunately, we have a convenient way to submit the workflow: the `ProcessDefinition` method :code:`run`, which
- creates the workflow definition as well as the workflow schedule.
- .. tab:: Tradition
-
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
- :dedent: 0
- :start-after: [start submit_or_run]
- :end-before: [end submit_or_run]
- .. tab:: Task Decorator
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
- :dedent: 0
- :start-after: [start submit_or_run]
- :end-before: [end submit_or_run]
- Finally, we can execute this workflow code in the terminal like any other Python script, running
- :code:`python tutorial.py` to trigger and execute it.
- .. note::
- If you have not started your DolphinScheduler API server, see
- :ref:`start:start Python gateway service` for how to start it. Besides :code:`run`, the `ProcessDefinition`
- object has the method :code:`submit`, which only submits the workflow to the daemon and does not set
- the workflow schedule information. For more detail, see :ref:`concept:process definition`.
- DAG Graph After Tutorial Run
- ----------------------------
- After we run the tutorial code, you can log in to the DolphinScheduler web UI and go to the
- `DolphinScheduler project page`_. There is a new process definition created by *PyDolphinScheduler*,
- named "tutorial" or "tutorial_decorator". The task graph of the workflow looks like this:
- .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
- :language: text
- :lines: 24-28
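As a rough illustration of what this graph implies for execution order, the sketch below groups the four tutorial tasks into stages using ``graphlib.TopologicalSorter`` from the Python standard library (Python 3.9 or later). This is independent of DolphinScheduler and does not describe how its scheduler works; the ``upstream`` dictionary is written by hand from the dependence described in this tutorial.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it can start.
upstream = {
    "task_parent": set(),
    "task_child_one": {"task_parent"},
    "task_child_two": {"task_parent"},
    "task_union": {"task_child_one", "task_child_two"},
}

sorter = TopologicalSorter(upstream)
sorter.prepare()

stages = []
while sorter.is_active():
    # Every task returned together has all of its upstream tasks finished,
    # so the tasks within one stage could run in parallel.
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)

for number, stage in enumerate(stages, start=1):
    print(number, stage)
```

The result has three stages: ``task_parent`` alone, then ``task_child_one`` and ``task_child_two`` together, then ``task_union``.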
- .. _`DolphinScheduler project page`: https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/project.html
- .. _`Python context manager`: https://docs.python.org/3/library/stdtypes.html#context-manager-types