.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements.  See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership.  The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License.  You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied.  See the License for the
   specific language governing permissions and limitations
   under the License.

Tutorial
========

This tutorial introduces the basic concepts of *PyDolphinScheduler* and covers
everything you need to know before you submit or run your first workflow. If you
have not yet installed *PyDolphinScheduler* and started DolphinScheduler, see
:ref:`getting started with PyDolphinScheduler <start:getting started>` first.

Overview of Tutorial
--------------------

Here is an overview of the tutorial code. It may look a little complex at first, but do not
worry: each part is explained in detail below.

There are two styles of tutorial: traditional and task decorator.

- **Traditional Way**: More general and supports many :doc:`built-in task types <tasks/index>`. It is a
  convenient choice when you start building a workflow.
- **Task Decorator**: A Python decorator that lets you wrap a function into a pydolphinscheduler task. It is
  less versatile than the traditional way because it only supports Python functions and not the built-in
  tasks. It is helpful if your workflow is built entirely in Python, or if you already have Python
  workflow code that you want to migrate to pydolphinscheduler.

.. tab:: Tradition

   .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
      :dedent: 0
      :start-after: [start tutorial]
      :end-before: [end tutorial]

.. tab:: Task Decorator

   .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
      :dedent: 0
      :start-after: [start tutorial]
      :end-before: [end tutorial]

Import Necessary Module
-----------------------

First of all, we import the modules we will use later, just as with any other Python package.

.. tab:: Tradition

   .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
      :dedent: 0
      :start-after: [start package_import]
      :end-before: [end package_import]

   In the traditional tutorial we import :class:`pydolphinscheduler.core.process_definition.ProcessDefinition` and
   :class:`pydolphinscheduler.tasks.shell.Shell`.

   If you want to use another task type, :doc:`see all tasks we support <tasks/index>`.

.. tab:: Task Decorator

   .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
      :dedent: 0
      :start-after: [start package_import]
      :end-before: [end package_import]

   In the task decorator tutorial we import :class:`pydolphinscheduler.core.process_definition.ProcessDefinition` and
   :func:`pydolphinscheduler.tasks.func_wrap.task`.

Process Definition Declaration
------------------------------

After importing :class:`pydolphinscheduler.core.process_definition.ProcessDefinition` in
`import necessary module`_, we instantiate it. Here we declare the basic arguments of the process
definition (also known as a workflow). We create the :code:`ProcessDefinition` with a
`Python context manager`_ and give it a name, which is **the only required argument** of
:code:`ProcessDefinition`. We also declare three more arguments: :code:`schedule` and :code:`start_time`
set the workflow's schedule interval and schedule start time, and :code:`tenant` defines which tenant
runs the tasks on the DolphinScheduler worker. See :ref:`section tenant <concept:tenant>` in the
*PyDolphinScheduler* :doc:`concept` for more information.

.. tab:: Tradition

   .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
      :dedent: 0
      :start-after: [start workflow_declare]
      :end-before: [end workflow_declare]

.. tab:: Task Decorator

   .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
      :dedent: 0
      :start-after: [start workflow_declare]
      :end-before: [end workflow_declare]

You can find more detail about :code:`ProcessDefinition` in
:ref:`concept about process definition <concept:process definition>`. For the full list of its arguments, see the
:class:`pydolphinscheduler.core.process_definition` API documentation.
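
To see why declaring the workflow inside a ``with`` block is convenient, here is a minimal, self-contained
sketch of the general pattern: a context manager that collects the objects created inside it. The names
``ToyWorkflow`` and ``ToyTask`` are hypothetical and exist only for this illustration. This is not how
*PyDolphinScheduler* is implemented; it only shows the idea.

```python
"""Toy illustration of the context-manager pattern; not PyDolphinScheduler code."""
from typing import List, Optional


class ToyWorkflow:
    """Hypothetical workflow that collects tasks declared inside its ``with`` block."""

    # The workflow whose ``with`` block is currently active, if any.
    _current: Optional["ToyWorkflow"] = None

    def __init__(self, name: str, tenant: Optional[str] = None) -> None:
        self.name = name  # the only required argument in this sketch
        self.tenant = tenant
        self.tasks: List["ToyTask"] = []

    def __enter__(self) -> "ToyWorkflow":
        ToyWorkflow._current = self
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        # Returning None lets any exception raised in the block propagate.
        ToyWorkflow._current = None


class ToyTask:
    """Hypothetical task that registers itself with the active workflow."""

    def __init__(self, name: str) -> None:
        self.name = name
        if ToyWorkflow._current is not None:
            ToyWorkflow._current.tasks.append(self)


with ToyWorkflow(name="tutorial", tenant="tenant_exists") as workflow:
    ToyTask(name="task_parent")
    ToyTask(name="task_child_one")

# Created after the block has exited, so it is not collected.
outside = ToyTask(name="not_registered")
registered = [task.name for task in workflow.tasks]
```

In this sketch only the two tasks created inside the ``with`` block end up in ``workflow.tasks``, which is why
a task never has to be told explicitly which workflow it belongs to.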

Task Declaration
----------------

.. tab:: Tradition

   We declare four tasks to show how tasks are created. All of them are simple
   :class:`pydolphinscheduler.tasks.shell` tasks that run an `echo` command in the terminal. Besides the
   argument `command`, which holds the :code:`echo` command, we also need to set the argument `name` for each task
   *(not only for shell tasks: `name` is required for every type of task)*.

   .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
      :dedent: 0
      :start-after: [start task_declare]
      :end-before: [end task_declare]

   Besides the shell task, *PyDolphinScheduler* supports many other task types, listed in :doc:`tasks/index`.

.. tab:: Task Decorator

   We declare four tasks to show how tasks are created. All of them are created with the task decorator
   :func:`pydolphinscheduler.tasks.func_wrap.task`. All we have to do is add the decorator
   :code:`@task` to an existing Python function and then use the function inside
   :class:`pydolphinscheduler.core.process_definition`.

   .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
      :dedent: 0
      :start-after: [start task_declare]
      :end-before: [end task_declare]

   This makes the workflow more Pythonic, but be aware that in task decorator mode only Python functions
   can be used as tasks, and in most cases the :doc:`built-in tasks <tasks/index>` are not available.
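
The general idea behind a task decorator can be sketched in a few lines of plain Python. The names
``toy_task``, ``ToyFuncTask`` and ``REGISTRY`` below are hypothetical and made up for this illustration. The
real :code:`@task` decorator may behave differently, so treat this as a sketch of the pattern rather than of
the library.

```python
"""Toy illustration of a task decorator; not PyDolphinScheduler code."""
from typing import Callable, Dict

# Hypothetical registry mapping task names to wrapped functions.
REGISTRY: Dict[str, "ToyFuncTask"] = {}


class ToyFuncTask:
    """Hypothetical task object that wraps a plain Python function."""

    def __init__(self, func: Callable[[], str]) -> None:
        self.name = func.__name__  # the function name doubles as the task name
        self.func = func
        REGISTRY[self.name] = self

    def run(self) -> str:
        """Execute the wrapped function and return its result."""
        return self.func()


def toy_task(func: Callable[[], str]) -> ToyFuncTask:
    """Decorator that turns a function into a ``ToyFuncTask``."""
    return ToyFuncTask(func)


@toy_task
def task_parent() -> str:
    return "hello pydolphinscheduler"


@toy_task
def task_child_one() -> str:
    return "child one"


# Run every registered task and collect what each one returned.
outputs = {name: task.run() for name, task in REGISTRY.items()}
```

After decoration, ``task_parent`` is no longer a bare function but a task object that carries a name, which is
what allows existing Python functions to be reused as tasks with almost no changes.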

Setting Task Dependence
-----------------------

After declaring the process definition and the tasks, we have four tasks that are independent of each other
and would run in parallel. If a task should start only after some other task has finished, you have to set
a dependence between them.

Setting task dependence is easy: use the task methods :code:`set_downstream` and :code:`set_upstream`, or the
bitwise operators :code:`>>` and :code:`<<`.

In this tutorial, task `task_parent` is the leading task of the whole workflow, and tasks `task_child_one` and
`task_child_two` are its downstream tasks. Task `task_union` will not run until both `task_child_one`
and `task_child_two` are done, because both of them are upstream of `task_union`.

.. tab:: Tradition

   .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
      :dedent: 0
      :start-after: [start task_relation_declare]
      :end-before: [end task_relation_declare]

.. tab:: Task Decorator

   .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
      :dedent: 0
      :start-after: [start task_relation_declare]
      :end-before: [end task_relation_declare]

.. note::

   Tasks that share the same downstream or upstream can have their dependence set in one batch by declaring
   them as a task group. In the tutorial, we declare `task_child_one` and `task_child_two` as a task group named
   `task_group`, then set `task_group` as downstream of `task_parent`. See
   :ref:`concept:Tasks Dependence` for more detail about how to set task dependence.
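
How :code:`>>` and :code:`<<` can express dependence, including the batch form with a list of tasks, is
sketched below with Python operator overloading. ``ToyTask`` and ``_as_list`` are hypothetical names used only
here, and the sketch assumes nothing about how *PyDolphinScheduler* implements these operators internally.

```python
"""Toy illustration of dependency operators; not PyDolphinScheduler code."""
from typing import List, Sequence, Set, Union


class ToyTask:
    """Hypothetical task that records the names of its neighbours."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.upstream: Set[str] = set()
        self.downstream: Set[str] = set()

    def set_downstream(self, others: "TaskOrTasks") -> None:
        """Make ``others`` run after this task."""
        for other in _as_list(others):
            self.downstream.add(other.name)
            other.upstream.add(self.name)

    def set_upstream(self, others: "TaskOrTasks") -> None:
        """Make ``others`` run before this task."""
        for other in _as_list(others):
            other.set_downstream(self)

    def __rshift__(self, others: "TaskOrTasks") -> "TaskOrTasks":
        self.set_downstream(others)  # self >> others
        return others

    def __lshift__(self, others: "TaskOrTasks") -> "TaskOrTasks":
        self.set_upstream(others)  # self << others
        return others

    def __rrshift__(self, others: "TaskOrTasks") -> "ToyTask":
        self.set_upstream(others)  # [a, b] >> self
        return self

    def __rlshift__(self, others: "TaskOrTasks") -> "ToyTask":
        self.set_downstream(others)  # [a, b] << self
        return self


TaskOrTasks = Union[ToyTask, Sequence[ToyTask]]


def _as_list(others: TaskOrTasks) -> List[ToyTask]:
    """Accept a single task or a group of tasks."""
    return [others] if isinstance(others, ToyTask) else list(others)


task_parent = ToyTask("task_parent")
task_child_one = ToyTask("task_child_one")
task_child_two = ToyTask("task_child_two")
task_union = ToyTask("task_union")

task_group = [task_child_one, task_child_two]
task_parent.set_downstream(task_group)  # same effect as: task_parent >> task_group
task_union << task_group  # both children are upstream of task_union
```

Because a list has no :code:`>>` of its own, Python falls back to the reflected method on the task, which is
what makes an expression such as ``task_group >> task_union`` work in this sketch as well.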

Submit Or Run Workflow
----------------------

The workflow definition is now complete, with four tasks and their dependence, but all of it still exists only
locally. The DolphinScheduler daemon does not know about the workflow yet, so the last step is to submit the
workflow to the daemon.

Fortunately, there is a convenient way to do so: the `ProcessDefinition` method :code:`run`, which
creates the workflow definition as well as the workflow schedule.

.. tab:: Tradition

   .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
      :dedent: 0
      :start-after: [start submit_or_run]
      :end-before: [end submit_or_run]

.. tab:: Task Decorator

   .. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial_decorator.py
      :dedent: 0
      :start-after: [start submit_or_run]
      :end-before: [end submit_or_run]

Finally, execute the workflow code in your terminal like any other Python script: run
:code:`python tutorial.py` to trigger and execute it.

.. note::

   If you have not started your DolphinScheduler API server yet, see
   :ref:`start:start Python gateway service` for how to start it. Besides :code:`run`, `ProcessDefinition`
   also has the method :code:`submit`, which only submits the workflow to the daemon and does not set
   the workflow schedule information. For more detail, see :ref:`concept:process definition`.
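
Putting the steps of this tutorial together, the traditional style could look roughly like the sketch below.
It uses only the classes, arguments and methods named in this tutorial. The concrete values (the cron
expression, the start date, the tenant name and the :code:`echo` commands) are assumptions for illustration,
and the helper ``build_tutorial_workflow`` with its ``submit_only`` flag is hypothetical. Check the API
documentation of your installed version before relying on it.

```python
"""Sketch of the traditional tutorial workflow; argument values are assumptions."""


def build_tutorial_workflow(submit_only: bool = False) -> None:
    """Declare the four tutorial tasks, wire them up, and send them to the daemon."""
    # Imported inside the function so that merely loading this file does not
    # require pydolphinscheduler to be installed.
    from pydolphinscheduler.core.process_definition import ProcessDefinition
    from pydolphinscheduler.tasks.shell import Shell

    with ProcessDefinition(
        name="tutorial",  # the only required argument
        schedule="0 0 0 * * ? *",  # assumed cron expression
        start_time="2021-01-01",  # assumed schedule start time
        tenant="tenant_exists",  # assumed tenant; it must exist on the worker
    ) as pd:
        task_parent = Shell(name="task_parent", command="echo hello pydolphinscheduler")
        task_child_one = Shell(name="task_child_one", command="echo 'child one'")
        task_child_two = Shell(name="task_child_two", command="echo 'child two'")
        task_union = Shell(name="task_union", command="echo union")

        # Batch dependence: both children follow the parent and precede the union.
        task_group = [task_child_one, task_child_two]
        task_parent.set_downstream(task_group)
        task_union << task_group

        if submit_only:
            pd.submit()  # submit the definition without schedule information
        else:
            pd.run()  # create the workflow definition and its schedule
```

Calling ``build_tutorial_workflow()`` requires a running Python gateway service, so the function is only
defined here and not invoked.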

DAG Graph After Tutorial Run
----------------------------

After running the tutorial code, log in to the DolphinScheduler web UI and open the
`DolphinScheduler project page`_. There you will find a new process definition created by *PyDolphinScheduler*,
named "tutorial" or "tutorial_decorator". The task graph of the workflow looks like this:

.. literalinclude:: ../../src/pydolphinscheduler/examples/tutorial.py
   :language: text
   :lines: 24-28
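
The order implied by this graph can be checked with the standard library alone. The sketch below encodes the
upstream relations described in this tutorial and uses :code:`graphlib.TopologicalSorter` (Python 3.9+) to
group tasks into stages that may run in parallel. It is independent of DolphinScheduler and only illustrates
the shape of the DAG.

```python
"""Compute the execution stages of the tutorial DAG with the standard library."""
from graphlib import TopologicalSorter  # available since Python 3.9

# Map each task to the set of tasks that must finish before it (its upstream).
upstream = {
    "task_parent": set(),
    "task_child_one": {"task_parent"},
    "task_child_two": {"task_parent"},
    "task_union": {"task_child_one", "task_child_two"},
}

sorter = TopologicalSorter(upstream)
sorter.prepare()

stages = []
while sorter.is_active():
    # Every task returned here has all of its upstream tasks done.
    ready = sorted(sorter.get_ready())
    stages.append(ready)
    sorter.done(*ready)
```

The result has three stages: ``task_parent`` alone, then ``task_child_one`` and ``task_child_two`` together,
then ``task_union``.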

.. _`DolphinScheduler project page`: https://dolphinscheduler.apache.org/en-us/docs/latest/user_doc/guide/project.html
.. _`Python context manager`: https://docs.python.org/3/library/stdtypes.html#context-manager-types