.. Licensed to the Apache Software Foundation (ASF) under one
   or more contributor license agreements. See the NOTICE file
   distributed with this work for additional information
   regarding copyright ownership. The ASF licenses this file
   to you under the Apache License, Version 2.0 (the
   "License"); you may not use this file except in compliance
   with the License. You may obtain a copy of the License at

..   http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing,
   software distributed under the License is distributed on an
   "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
   KIND, either express or implied. See the License for the
   specific language governing permissions and limitations
   under the License.

Configuration
=============

pydolphinscheduler has a built-in module that sets the configuration necessary to start and run your workflow code.
You can use it directly if you only want a quick start or a simple job such as a POC. But if you want to use
pydolphinscheduler in depth, or even in production, you will probably need to modify the built-in configuration.

There are two ways to modify the configuration:

- `Using Environment Variables`_: The more lightweight way to modify the configuration. It is useful in
  containerization scenarios, like Docker and Kubernetes, or when you want to temporarily override values in the
  configuration file.
- `Using Configuration File`_: The more general way to modify the configuration. It is useful when you want
  to persist and manage the configuration in a single file.


Using Environment Variables
---------------------------

You can change the configuration by adding or modifying the operating system's environment variables. Any
method works, as long as the environment variables are actually set. We use two common
ways, `Bash <by bash>`_ and `Python OS Module <by python os module>`_, as examples:


By Bash
^^^^^^^

Setting environment variables via Bash is the most straightforward way. Here are some examples of
how to change them with Bash.

.. code-block:: bash

    # Modify Java Gateway Address
    export PYDS_JAVA_GATEWAY_ADDRESS="192.168.1.1"

    # Modify Workflow Default User
    export PYDS_WORKFLOW_USER="custom-user"

After executing the commands above, both ``PYDS_JAVA_GATEWAY_ADDRESS`` and ``PYDS_WORKFLOW_USER`` are set.
The next time you run and submit your workflow, it will be submitted to host ``192.168.1.1`` with the workflow
user ``custom-user``.


By Python OS Module
^^^^^^^^^^^^^^^^^^^

pydolphinscheduler is a Python API for Apache DolphinScheduler, so you can also modify or add system environment
variables via the Python ``os`` module. In this example, we set the variables to the same values as in
`Bash <by bash>`_. The change takes effect the next time you run your workflow, provided that the workflow's
``run`` or ``submit`` method is called after the ``os.environ`` statements.

.. code-block:: python

    import os

    # Modify Java Gateway Address
    os.environ["PYDS_JAVA_GATEWAY_ADDRESS"] = "192.168.1.1"

    # Modify Workflow Default User
    os.environ["PYDS_WORKFLOW_USER"] = "custom-user"

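
If you only want an override to last for one block of code, a small context manager can set the variables and
restore the previous values afterwards. The following is a minimal sketch using only the standard library;
``temporary_env`` is a hypothetical helper written for this example and is not part of pydolphinscheduler.

.. code-block:: python

    import os
    from contextlib import contextmanager


    @contextmanager
    def temporary_env(**overrides):
        """Set environment variables inside a ``with`` block, then restore them."""
        # Remember the previous values (None means the variable was unset).
        saved = {name: os.environ.get(name) for name in overrides}
        os.environ.update(overrides)
        try:
            yield
        finally:
            for name, old_value in saved.items():
                if old_value is None:
                    os.environ.pop(name, None)
                else:
                    os.environ[name] = old_value


    before = os.environ.get("PYDS_WORKFLOW_USER")
    with temporary_env(PYDS_WORKFLOW_USER="custom-user"):
        # Define and submit your workflow here so it sees the override.
        inside = os.environ["PYDS_WORKFLOW_USER"]
    after = os.environ.get("PYDS_WORKFLOW_USER")

Once the ``with`` block exits, ``PYDS_WORKFLOW_USER`` has whatever value (or absence of value) it had before.
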

All Configurations in Environment Variables
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

All environment variables are listed below. You can modify their values via `Bash <by bash>`_ or
`Python OS Module <by python os module>`_.

+------------------+------------------------------------+----------------------------------------------------------------------------------------------------------+
| Variable Section | Variable Name                      | Description                                                                                              |
+==================+====================================+==========================================================================================================+
|                  | ``PYDS_JAVA_GATEWAY_ADDRESS``      | Java gateway address. Its value is used when it is set.                                                  |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
|   Java Gateway   | ``PYDS_JAVA_GATEWAY_PORT``         | Java gateway port. Its value is used when it is set.                                                     |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
|                  | ``PYDS_JAVA_GATEWAY_AUTO_CONVERT`` | Java gateway auto convert flag (boolean). Its value is used when it is set.                              |
+------------------+------------------------------------+----------------------------------------------------------------------------------------------------------+
|                  | ``PYDS_USER_NAME``                 | Default user name, used when the user's ``name`` is not specified.                                       |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
|                  | ``PYDS_USER_PASSWORD``             | Default user password, used when the user's ``password`` is not specified.                               |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
|   Default User   | ``PYDS_USER_EMAIL``                | Default user email, used when the user's ``email`` is not specified.                                     |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
|                  | ``PYDS_USER_PHONE``                | Default user phone, used when the user's ``phone`` is not specified.                                     |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
|                  | ``PYDS_USER_STATE``                | Default user state, used when the user's ``state`` is not specified.                                     |
+------------------+------------------------------------+----------------------------------------------------------------------------------------------------------+
|                  | ``PYDS_WORKFLOW_PROJECT``          | Default workflow project name, used when the workflow does not specify the attribute ``project``.        |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
|                  | ``PYDS_WORKFLOW_TENANT``           | Default workflow tenant, used when the workflow does not specify the attribute ``tenant``.               |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
| Default Workflow | ``PYDS_WORKFLOW_USER``             | Default workflow user, used when the workflow does not specify the attribute ``user``.                   |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
|                  | ``PYDS_WORKFLOW_QUEUE``            | Default workflow queue, used when the workflow does not specify the attribute ``queue``.                 |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
|                  | ``PYDS_WORKFLOW_WORKER_GROUP``     | Default workflow worker group, used when the workflow does not specify the attribute ``worker_group``.   |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
|                  | ``PYDS_WORKFLOW_RELEASE_STATE``    | Default workflow release state, used when the workflow does not specify the attribute ``release_state``. |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
|                  | ``PYDS_WORKFLOW_TIME_ZONE``        | Default workflow time zone, used when the workflow does not specify the attribute ``timezone``.          |
+                  +------------------------------------+----------------------------------------------------------------------------------------------------------+
|                  | ``PYDS_WORKFLOW_WARNING_TYPE``     | Default workflow warning type, used when the workflow does not specify the attribute ``warning_type``.   |
+------------------+------------------------------------+----------------------------------------------------------------------------------------------------------+

.. note::

   The scope of configuration set via environment variables is the workflow, and it does not change the
   values in the configuration file. The :doc:`CLI <cli>` commands ``config --get`` and ``config --set`` operate
   on the configuration file, so ``config --get`` may return a different value from what
   you set in the environment variable, and neither command ever changes your environment variables.

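
Environment variables belong to a process and are inherited by the processes it starts, which is why setting
them never touches the configuration file. The sketch below illustrates that scope with the standard library
only. It passes ``PYDS_WORKFLOW_USER`` to a child Python process, which stands in for a workflow script (an
assumption of this example), without changing the parent's environment.

.. code-block:: python

    import os
    import subprocess
    import sys

    parent_before = os.environ.get("PYDS_WORKFLOW_USER")

    # Copy the current environment and add the override for the child only.
    child_env = dict(os.environ, PYDS_WORKFLOW_USER="custom-user")

    # The inline script stands in for ``python my_workflow.py``.
    script = "import os; print(os.environ['PYDS_WORKFLOW_USER'])"
    result = subprocess.run(
        [sys.executable, "-c", script],
        env=child_env,
        capture_output=True,
        text=True,
        check=True,
    )
    child_value = result.stdout.strip()
    parent_after = os.environ.get("PYDS_WORKFLOW_USER")

Only the child process sees ``custom-user``; the parent's environment and any configuration file on disk stay
as they were.
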

Using Configuration File
------------------------

If you want to persist and manage configuration in a file instead of environment variables, or you
want to keep your configuration file in a version control system such as Git or SVN, changing the
configuration by file is the best choice.


Export Configuration File
^^^^^^^^^^^^^^^^^^^^^^^^^

pydolphinscheduler allows you to change the built-in configuration via the CLI or any editor you like.
pydolphinscheduler ships the built-in configuration inside its package, but you can also export it locally
with the CLI:

.. code-block:: bash

    pydolphinscheduler config --init

This creates a new YAML file at ``~/pydolphinscheduler/config.yaml`` by default. If you want to export
it to another path, set ``PYDS_HOME`` before you run the command :code:`pydolphinscheduler config --init`.

.. code-block:: bash

    export PYDS_HOME=<CUSTOM_PATH>
    pydolphinscheduler config --init

After that, your configuration file is exported to ``<CUSTOM_PATH>/config.yaml`` instead of the default path.

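
The path rule described above can be sketched in a few lines of Python. This is only an illustration of the
documented behavior, not the library's actual implementation; ``config_file_path`` is a hypothetical helper
name, and treating an empty ``PYDS_HOME`` as unset is an assumption of this sketch.

.. code-block:: python

    import os
    from pathlib import Path


    def config_file_path(environ=os.environ):
        """Return where ``config.yaml`` is expected, honoring ``PYDS_HOME``."""
        home = environ.get("PYDS_HOME")
        # Fall back to ~/pydolphinscheduler when PYDS_HOME is unset or empty.
        base = Path(home) if home else Path.home() / "pydolphinscheduler"
        return base / "config.yaml"


    custom = config_file_path({"PYDS_HOME": "/tmp/pyds"})
    default = config_file_path({})

Passing a plain dictionary instead of ``os.environ`` makes the rule easy to check without touching your real
environment.
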

Change Configuration
^^^^^^^^^^^^^^^^^^^^

In section `export configuration file`_ you exported the configuration file locally. As a local file, you can
edit it with any editor you like. After you save your changes, the latest configuration takes effect
the next time you run your workflow code.

You can also query or change the configuration via the CLI with :code:`config --get <config>` or
:code:`config --set <config> <val>`. Both ``--get`` and ``--set`` can be passed one or more times in a single
command. You can only set a leaf node of the configuration, but you can get a parent node. Some simple
examples follow:

.. code-block:: bash

    # Get a single configuration in a leaf node.
    # The output looks like this:
    # java_gateway.address = 127.0.0.1
    pydolphinscheduler config --get java_gateway.address

    # Get multiple configurations in leaf nodes.
    # The output looks like this:
    # java_gateway.address = 127.0.0.1
    # java_gateway.port = 25333
    pydolphinscheduler config --get java_gateway.address --get java_gateway.port

    # Get a parent configuration that contains multiple leaf nodes.
    # The output looks like this:
    # java_gateway = ordereddict([('address', '127.0.0.1'), ('port', 25333), ('auto_convert', True)])
    pydolphinscheduler config --get java_gateway

    # Set a single configuration.
    # The output looks like this:
    # Set configuration done.
    pydolphinscheduler config --set java_gateway.address 192.168.1.1

    # Set multiple configurations.
    # The output looks like this:
    # Set configuration done.
    pydolphinscheduler config --set java_gateway.address 192.168.1.1 --set java_gateway.port 25334

    # Setting a configuration that is not a leaf node fails.
    # The output looks like this:
    # Raise error.
    pydolphinscheduler config --set java_gateway 192.168.1.1,25334,True

For more information about the CLI, see the document :doc:`cli`.

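
To make the leaf and parent distinction concrete, here is a minimal Python sketch of dot-delimited lookups on a
nested dictionary. It only mimics the behavior described above; ``get_config`` and ``set_config`` are
hypothetical names for this example and do not reflect how the CLI is implemented.

.. code-block:: python

    def get_config(config, key):
        """Return the value at a dot-delimited key; a parent key returns a mapping."""
        node = config
        for part in key.split("."):
            node = node[part]
        return node


    def set_config(config, key, value):
        """Set a leaf value and refuse to overwrite a parent node."""
        *parents, leaf = key.split(".")
        node = config
        for part in parents:
            node = node[part]
        if isinstance(node[leaf], dict):
            raise ValueError(f"{key!r} is not a leaf node")
        node[leaf] = value


    config = {"java_gateway": {"address": "127.0.0.1", "port": 25333, "auto_convert": True}}
    set_config(config, "java_gateway.address", "192.168.1.1")
    address = get_config(config, "java_gateway.address")
    parent = get_config(config, "java_gateway")

Calling ``set_config(config, "java_gateway", ...)`` raises ``ValueError`` in this sketch, mirroring the failing
``--set java_gateway`` command above.
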

All Configurations in File
^^^^^^^^^^^^^^^^^^^^^^^^^^

Here are all our configurations for pydolphinscheduler.

.. literalinclude:: ../../src/pydolphinscheduler/core/default_config.yaml
   :language: yaml
   :lines: 18-


Priority
--------

There are two ways to modify the configuration, and pydolphinscheduler also has a built-in configuration. It is
important to understand their priority when you use them. The overall priority is:

``Environment Variables > Configurations File > Built-in Configurations``

This means that settings in environment variables or the configuration file override the built-in ones,
and you can temporarily modify a configuration by setting an environment variable without modifying the global
value in the configuration file.

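
The priority order can be read as a three-step lookup. The following sketch shows that order under simplifying
assumptions: values are plain strings, the file configuration is a flat dictionary keyed by dotted names, and
``resolve``, ``BUILT_IN`` and ``ENV_NAMES`` are hypothetical names used only for illustration.

.. code-block:: python

    # Illustrative built-in values and the environment variable for each key.
    BUILT_IN = {"java_gateway.address": "127.0.0.1", "java_gateway.port": "25333"}
    ENV_NAMES = {
        "java_gateway.address": "PYDS_JAVA_GATEWAY_ADDRESS",
        "java_gateway.port": "PYDS_JAVA_GATEWAY_PORT",
    }


    def resolve(key, file_config, environ):
        """Return the effective value: environment, then file, then built-in."""
        env_name = ENV_NAMES[key]
        if env_name in environ:
            return environ[env_name]
        if key in file_config:
            return file_config[key]
        return BUILT_IN[key]


    file_config = {"java_gateway.address": "10.0.0.1"}
    environ = {"PYDS_JAVA_GATEWAY_ADDRESS": "192.168.1.1"}

    from_env = resolve("java_gateway.address", file_config, environ)
    from_file = resolve("java_gateway.address", file_config, {})
    from_built_in = resolve("java_gateway.port", file_config, environ)

Here the address comes from the environment when the variable is set, from the file when it is not, and the
port falls back to the built-in value because neither source defines it.
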