A: DolphinScheduler
A: DolphinScheduler consists of five services (MasterServer, WorkerServer, ApiServer, AlertServer and LoggerServer) plus the UI.
Service | Description |
---|---|
MasterServer | Mainly responsible for DAG segmentation and task status monitoring |
WorkerServer/LoggerServer | WorkerServer is mainly responsible for submitting and executing tasks and updating task status. LoggerServer lets the Rest Api view logs through RPC |
ApiServer | Provides the Rest Api service for the UI to call |
AlertServer | Provides the alarm service |
UI | Front page display |
Note: Because there are many services, a single-machine deployment should preferably have 4 cores and 16 GB of memory or more.
A: Most mailboxes are supported, including qq, 163, 126, 139, outlook and aliyun. The TLS and SSL protocols are supported and can optionally be configured in alert.properties.
A: Please refer to 'System parameter' in the system-manual
A: kazoo is the Python client that the installer uses to connect to Zookeeper. It is used to delete the temporary master/worker node info in Zookeeper, so you can ignore the error if this is your first install. From version 1.3.0 on, kazoo is no longer needed; the program itself does what kazoo used to do.
A: In version 1.2 and before, use the administrator account to create a Worker group, then specify the Worker group when the process definition starts or on the task node. If none is specified, Default is used, which selects one of all the workers in the cluster for task submission and execution. In version 1.3, you can set the worker group on the worker itself.
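The selection rule described above can be sketched in a few lines of Python. This is only an illustration, not DolphinScheduler's code: the function name `pick_worker`, the `workers_by_group` mapping and the random choice among candidates are all assumptions made for the example.

```python
import random


def pick_worker(workers_by_group, group=None, rng=random):
    """Return one online worker host for a task (hypothetical helper).

    workers_by_group maps a group name to the list of online worker hosts.
    With no group given, any worker in the cluster may be chosen, which
    mirrors the "Default" behaviour described in the answer.
    """
    if group is None:
        # Default: every online worker in the cluster is a candidate.
        candidates = sorted({h for hosts in workers_by_group.values() for h in hosts})
    else:
        candidates = list(workers_by_group.get(group, []))
    if not candidates:
        # A group was specified but none of its machines is online.
        raise LookupError(f"no online worker for group {group!r}")
    return rng.choice(candidates)
```

The `LookupError` branch corresponds to the situation where a Worker group is specified but none of its machines is online, in which case a task cannot be placed.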
A: Priorities are supported for both processes and tasks. There are five levels: HIGHEST, HIGH, MEDIUM, LOW and LOWEST. You can set the priority between different process instances, and between different task instances in the same process instance. For details, please refer to the task priority design in the architecture-design.
A: Execute `mvn -U clean package assembly:assembly -Dmaven.test.skip=true` in the root directory, then refresh the entire project. Version 1.3 does not use grpc; it uses netty directly.
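As a rough illustration of how the two priority levels could combine, the sketch below orders tasks first by process priority and then by task priority. The numeric values assigned to the five levels and the `dispatch_order` helper are assumptions for this example; the architecture-design document is the authority on the real queue layout.

```python
from enum import IntEnum


class Priority(IntEnum):
    # Smaller number = served first (numbering is an assumption).
    HIGHEST = 0
    HIGH = 1
    MEDIUM = 2
    LOW = 3
    LOWEST = 4


def dispatch_order(tasks):
    """tasks: iterable of (name, process_priority, task_priority).

    Process priority is compared first; task priority only breaks ties
    between tasks whose process instances have the same priority.
    """
    return [name for name, _, _ in sorted(tasks, key=lambda t: (t[1], t[2]))]
```

For example, a HIGHEST task inside a LOW process is still dispatched after every task of a HIGH process under this ordering.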
A: In theory, only the Worker needs to run on Linux. Other services can run normally on Windows. But it is still recommended to deploy on Linux.
A: Run `npm install node-sass --unsafe-perm` separately first, then run `npm install`.
A: 1, If the UI was started with node, check whether API_BASE in the .env file under dolphinscheduler-ui is the Api Server service address.
2, If the UI is served by nginx and was installed via install-dolphinscheduler-ui.sh, check whether proxy_pass in /etc/nginx/conf.d/dolphinscheduler.conf is the Api Server service address.
3, If the above configuration is correct, check whether the Api Server service is normal:
run `curl http://localhost:12345/dolphinscheduler/users/get-user-info` and check the Api Server log.
If it shows `cn.dolphinscheduler.api.interceptor.LoginHandlerInterceptor:[76] - session info is null`, the Api Server service is normal.
4, If there is no problem above, check whether server.context-path and server.port in application.properties are correct.
A: 1, First check whether the MasterServer service exists through jps, or check directly in the service monitoring whether there is a master service in zk.
2, If there is a master service, check the command status statistics, or check whether new records are added in t_ds_error_command. If there are, check the message field.
A: 1, First check whether the WorkerServer service exists through jps, or check directly in the service monitoring whether there is a worker service in zk.
2, If the WorkerServer service is normal, check whether the MasterServer has put the task in the zk queue: look in the MasterServer log and the zk queue to see whether the task is blocked.
3, If there is no problem above, check whether a Worker group is specified whose machines are not online.
A: Provide Docker image and Dockerfile.
Docker image address: https://hub.docker.com/r/escheduler/escheduler_images
Dockerfile address: https://github.com/qiaozhanwei/escheduler_dockerfile/tree/master/docker_escheduler
A: 1, If a replacement variable contains special characters, escape them with the \ escape character.
2, installPath="/data1_1T/dolphinscheduler": this directory must not be the same as the directory from which install.sh is currently run for the one-click installation.
3, deployUser="dolphinscheduler": the deployment user must have sudo privileges, because the worker executes tasks with sudo -u tenant sh xxx.command.
4, monitorServerState="false": whether the service monitoring script is started. By default it is not. If it is started, the master and worker services are checked every 5 minutes and restarted automatically if they are down.
5, hdfsStartupSate="false": whether the HDFS resource upload function is enabled. By default it is not, and in that case the resource center cannot be used. If enabled, you need to configure fs.defaultFS and yarn in conf/common/hadoop/hadoop.properties. If you use namenode HA, you need to copy core-site.xml and hdfs-site.xml to the conf root directory.
Note: The 1.0.x version does not automatically create the hdfs root directory. You need to create it yourself, and the deployment user needs hdfs operation permission.
A: For versions prior to 1.0.4, modify the following method in the cn.escheduler.api.quartz package of escheduler-api:
```
public boolean deleteJob(String jobName, String jobGroupName) {
    lock.writeLock().lock();
    try {
        JobKey jobKey = new JobKey(jobName, jobGroupName);
        if (scheduler.checkExists(jobKey)) {
            logger.info("try to delete job, job name: {}, job group name: {},", jobName, jobGroupName);
            return scheduler.deleteJob(jobKey);
        } else {
            // Nothing to delete: a missing job counts as already deleted.
            return true;
        }
    } catch (SchedulerException e) {
        logger.error(String.format("delete job : %s failed", jobName), e);
    } finally {
        lock.writeLock().unlock();
    }
    return false;
}
```
A: No. If a tenant is created while HDFS is not started, the tenant directory is not registered in HDFS, so later resource operations will report an error.
A: Note: the Master monitors both the Master and Worker services.
1, If a Master service is lost, other Masters take over the processes of the hung Master and continue to monitor the Worker task status.
2, If a Worker service is lost, the Master detects that the Worker is gone. If there is a Yarn task, the Yarn task is killed and the task is retried.
Please see the fault-tolerant design in the architecture for details.
A: The 1.0.3 version only implements fault tolerance at Master startup, not Worker fault tolerance. That is to say, if a Worker hangs while no Master exists, the affected process will have problems. We will add Master and Worker startup fault tolerance in version 1.1.0 to fix this. To fix it manually, set the tasks that were running across the restart and have been lost to the failed state, set the process that was running across the restart to the failed state as well, then resume the process from the failed node.
A: Take care when setting the timing. If the first field of the expression (* * * * * ? *) is set to *, the task executes every second. We will add a list of recently scheduled times in version 1.1.0. Until then, you can see the latest 5 run times online at http://cron.qqe2.com/
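To make the field positions concrete, here is a small Python sketch that labels the fields of a Quartz-style expression and flags a `*` in the seconds position. The field names follow the usual Quartz order (seconds first, optional year last); the helper names `split_cron` and `seconds_field_is_wildcard` are invented for this example, and equivalent forms such as `0/1` are deliberately not detected.

```python
QUARTZ_FIELDS = (
    "second", "minute", "hour", "day_of_month", "month", "day_of_week", "year",
)


def split_cron(expression):
    """Map each whitespace-separated field to its Quartz position name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    return dict(zip(QUARTZ_FIELDS, parts))


def seconds_field_is_wildcard(expression):
    """True when the first field is '*', i.e. the trigger can fire every second."""
    return split_cron(expression)["second"] == "*"
```

A check like this could be run before saving a timing, so that `* * * * * ? *` is caught while `0 0 2 * * ? *` (every day at 02:00:00) passes.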
A: Yes. If the timing start time and end time are the same, the timing is invalid. If the end time is earlier than the current time, the timing will very likely be deleted automatically.
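The two conditions above can be expressed as a small pre-check. This is a sketch only: the function name and the returned labels are made up for illustration, and treating an end time earlier than the start time as invalid is an extra assumption that the answer does not state.

```python
from datetime import datetime


def check_schedule_window(start, end, now):
    """Classify a timing window as 'invalid', 'expired' or 'ok'."""
    if end <= start:
        # Same start and end time: the timing is invalid.
        # (end < start is also rejected here, which is an assumption.)
        return "invalid"
    if end < now:
        # End time already passed: the timing is likely to be deleted.
        return "expired"
    return "ok"
```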
A: 1, Task dependencies within a DAG are resolved by splitting the DAG starting from the nodes with zero in-degree.
2, With task dependent nodes you can implement dependencies on tasks or processes across processes. Please refer to the (DEPENDENT) node design in the system-manual.
Note: Cross-project process or task dependencies are not supported.
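One common way to walk a DAG starting from the nodes that have no upstream dependencies is Kahn's algorithm. The Python sketch below shows that general technique; it is not taken from the DolphinScheduler source, and the function name and the edge representation are assumptions for the example.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return nodes in an order where every task follows its upstream tasks.

    edges is a list of (upstream, downstream) pairs.
    """
    indegree = {n: 0 for n in nodes}
    downstream = {n: [] for n in nodes}
    for up, down in edges:
        downstream[up].append(down)
        indegree[down] += 1

    # Start from the nodes with zero in-degree (no upstream dependency).
    ready = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in downstream[node]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    if len(order) != len(nodes):
        raise ValueError("the graph contains a cycle, so it is not a DAG")
    return order
```

With nodes `a, b, c, d` and edges `a->c`, `b->c`, `c->d`, both `a` and `b` start immediately, `c` becomes runnable once both have finished, and `d` runs last.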
A: 1, In the process definition list, click the Start button.
2, In the process definition list, add a timer so that the process definition is started by scheduling.
3, On the DAG page where the process definition is viewed or edited, right-click any task node and start the process definition.
4, When editing the DAG of a process definition, you can set the running flag of some tasks to prohibit running. When the process definition is started, the connections of those nodes are removed from the DAG.
A: 1, For versions after 1.0.3, you only need to modify PYTHON_HOME in bin/env/dolphinscheduler_env.sh:
export PYTHON_HOME=/bin/python
Note: Here PYTHON_HOME is the absolute path of the python command itself, not the usual home directory. Also note that when exporting PATH, you need to add $PYTHON_HOME directly, without a /bin suffix:
export PATH=$HADOOP_HOME/bin:$SPARK_HOME1/bin:$SPARK_HOME2/bin:$PYTHON_HOME:$JAVA_HOME/bin:$HIVE_HOME/bin:$PATH
2, For versions prior to 1.0.3, the Python task only supports the system's Python version. Specifying a Python version is not supported.
A: We will add the kill task in 1.0.4 and kill all the various child processes generated by the task.
A: The queue in DolphinScheduler can be configured on the user or on the tenant. A queue specified on the user takes priority over the tenant queue. For example, to specify a queue for an MR task, set it through mapreduce.job.queuename.
Note: For a queue specified this way to take effect, the MR program must parse its arguments as follows:
```
Configuration conf = new Configuration();
GenericOptionsParser optionParser = new GenericOptionsParser(conf, args);
String[] remainingArgs = optionParser.getRemainingArgs();
```
For a Spark task, specify the queue with --queue.
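The precedence rule and the two ways of passing the queue can be summarised in a short Python sketch. The helper names are hypothetical; the only external facts relied on are that Hadoop's generic options accept `-D property=value` and that spark-submit accepts `--queue`.

```python
def effective_queue(user_queue, tenant_queue):
    """The user's queue wins; the tenant queue is the fallback."""
    return user_queue or tenant_queue


def mr_queue_args(queue):
    """Generic option picked up by GenericOptionsParser in an MR program."""
    return ["-D", f"mapreduce.job.queuename={queue}"]


def spark_queue_args(queue):
    """Option understood by spark-submit."""
    return ["--queue", queue]
```

For instance, `mr_queue_args(effective_queue(None, "tenant_q"))` yields `["-D", "mapreduce.job.queuename=tenant_q"]`, which only takes effect if the MR main class parses its arguments through `GenericOptionsParser`.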
A: Lower the value of master.reserved.memory in master.properties under conf, for example to 0.1, or lower the value of worker.reserved.memory in worker.properties, for example to 0.1.
A: Change the hive-jdbc dependency in the pom from
```
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>2.1.0</version>
</dependency>
```
to
```
<dependency>
    <groupId>org.apache.hive</groupId>
    <artifactId>hive-jdbc</artifactId>
    <version>1.1.0</version>
</dependency>
```
A: 1, Create the deployment user and the hosts mapping. Please refer to section 1.3 of the cluster deployment document.
2, Configure the hosts mapping and ssh access, and modify directory permissions. Please refer to section 1.4 of the cluster deployment document.
3, Copy the deployment directory from a worker server that has already been deployed.
4, Go to the bin directory, then start the worker server:
```
./dolphinscheduler-daemon.sh start worker-server
```
A: 1, The release process of an Apache project happens on the mailing list. You can subscribe to DolphinScheduler's mailing list, and when a release is in progress you will receive the release emails. Please follow this introduction to subscribe to DolphinScheduler's mailing list.
2, When a new version is published, there is a release note that describes the change log, and there is also an upgrade document from the previous version to the new one.
3, The version number is x.y.z. An increase of x represents a version with a new architecture. An increase of y means the version is incompatible with the previous y version and must be upgraded by script or other manual processing. An increase of z represents a bug fix; the upgrade is fully compatible and no additional processing is required. One remaining problem: the 1.0.2 upgrade is not compatible with 1.0.1 and requires an upgrade script.
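The x.y.z rules can be turned into a small classifier. This Python sketch is illustrative only: the function name and the returned labels are invented, it handles plain numeric versions only (no suffixes such as `-alpha`), and applying the 1.0.2 exception to any upgrade that crosses 1.0.2 is an assumption.

```python
def upgrade_kind(current, target):
    """Classify an upgrade between two plain x.y.z version strings."""
    cur = tuple(int(p) for p in current.split("."))
    tgt = tuple(int(p) for p in target.split("."))
    if len(cur) != 3 or len(tgt) != 3:
        raise ValueError("versions must look like x.y.z")
    if tgt <= cur:
        raise ValueError("target must be newer than current")
    if tgt[0] != cur[0]:
        return "new architecture"
    if tgt[1] != cur[1]:
        return "needs upgrade script"
    # Known exception from the answer: 1.0.2 is not compatible with 1.0.1.
    if cur < (1, 0, 2) <= tgt:
        return "needs upgrade script"
    return "compatible bug fix"
```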
A: When starting the workflow, you can set the task failure strategy: continue or end.
A: For version 1.2.1:
In master.properties:
Control the max number of workflows running in parallel on a master node
master.exec.threads=100
Control the max number of parallel tasks in each workflow
master.exec.task.number=20
In worker.properties:
Control the max number of tasks running in parallel on a worker node
worker.exec.threads=100
A: This bug describes the problem in detail, and it has been solved in version 1.2.1.
For versions below 1.2.1, some tips for this situation:
1. Clear the task queue in zk at the path /dolphinscheduler/task_queue.
2. Change the state of the task to failed (integer value: 6).
3. Run the workflow with recover from failed.
A: To fix this bug:
1, Confirm the hostname:
$ hostname
hadoop1
2, Check the output of hostname -i:
$ hostname -i
127.0.0.1 10.3.57.15
3, Edit /etc/hosts and delete hadoop1 from the 127.0.0.1 record:
$ cat /etc/hosts
127.0.0.1 localhost
10.3.57.15 ds1 hadoop1
4, Check the output of hostname -i again:
$ hostname -i
10.3.57.15
The hostname command returns the server hostname, and hostname -i returns all matching IPs configured in /etc/hosts. So it is enough to delete the hostname from the 127.0.0.1 record and keep only the internal IP resolution; you do not need to remove the whole 127.0.0.1 record. As long as hostname -i returns the correct internal IP configured in /etc/hosts, the bug is fixed. DolphinScheduler uses the first record returned by the hostname -i command. In my opinion, DS should not use hostname -i to get the IP, because in many companies the server name is configured by devops. We suggest using an IP configured in the configuration file or in a znode instead of /etc/hosts.
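The effect of taking the first record can be reproduced with a few lines of Python. This sketch only mimics the behaviour described above on a captured `hostname -i` output string; the helper names are made up, and it does not call DolphinScheduler or the `hostname` command itself.

```python
import ipaddress


def first_ip(hostname_i_output):
    """Return the first address from the output of `hostname -i`."""
    ips = hostname_i_output.split()
    if not ips:
        raise ValueError("hostname -i returned no address")
    return ips[0]


def resolves_to_loopback(hostname_i_output):
    """True when the first record is a loopback address such as 127.0.0.1."""
    return ipaddress.ip_address(first_ip(hostname_i_output)).is_loopback
```

With the output from step 2 (`127.0.0.1 10.3.57.15`) the first record is the loopback address, which is the faulty case; after the /etc/hosts edit, the output from step 4 yields `10.3.57.15`.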
A: The scheduling system does not support tasks scheduled at second-level frequency.
A: 1, cd into dolphinscheduler-ui and delete the node_modules directory:
sudo rm -rf node_modules
2, Install node-sass through npm.taobao.org:
sudo npm uninstall node-sass
sudo npm i node-sass --sass_binary_site=https://npm.taobao.org/mirrors/node-sass/
3, If step 2 fails, please refer to the url and run:
sudo npm rebuild node-sass
Once the problem is solved, if you don't want to download this node file every time, you can set the system environment variable SASS_BINARY_PATH=/xxx/xxx/xxx/xxx.node.
A: 1, Edit the maven config file in the project root directory and remove the `<scope>test</scope>` line so that the mysql driver can be loaded:
```
<dependency>
    <groupId>mysql</groupId>
    <artifactId>mysql-connector-java</artifactId>
    <version>${mysql.connector.version}</version>
    <scope>test</scope>
</dependency>
```
2, Edit the application-dao.properties and quartz.properties config files to use the mysql driver. The default is the PostgreSQL driver because of a license problem.
A: 1, Where is the task executed? To run the task on a specific worker, create a worker group in the Security Center; the task can then be sent to that worker. If a worker group has multiple servers, which server actually executes the task is determined by scheduling and is random.
2, If the shell file is at a path on the server, how do I point to it? Referencing a shell file on the server involves permission issues, so it is not recommended. Instead, use the storage function of the resource center and reference the resource in the shell editor; the system will download the script to the execution directory for you. If the task depends on resource center files, the worker uses "hdfs dfs -get" to fetch them from HDFS and then runs the task in /tmp/escheduler/exec/process. This path can be customized when installing dolphinscheduler.
3, Which user executes the task? The task is run by the tenant through "sudo -u ${tenant}". The tenant is a linux user.
A: 1, I suggest using 3 nodes for stability if you don't have too many tasks to run, and it is better to deploy the Master and Worker servers on different nodes. If you only have one node, you can of course only deploy them together. How many machines you need is determined by your business; the DolphinScheduler system itself does not use many resources. Test more, and you'll find the right way to use a few machines.
A: 1, A DEPENDENT task node does not actually have a script. It is used to configure the data cycle dependency logic; add a task node after it to implement the cycle-dependent task.
A: 1, The user changed the api server config file and item, which led to the problem. After the default value was restored, the problem was solved.
You can use the three strategies provided by dolphinscheduler to get the available ip.
Modify the configuration in common.properties:
```
# network IP gets priority, default: inner outer
# dolphin.scheduler.network.priority.strategy=default
```
After modifying the configuration, restart the service for it to take effect.
If the ip address is still wrong, please download dolphinscheduler-netutils.jar to the machine, execute the following command and send the output to the community developers:
java -jar target/dolphinscheduler-netutils.jar
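To give a feel for what a priority strategy such as `default`, `inner` or `outer` could mean, here is a hedged Python sketch. It is not the DolphinScheduler implementation: the function name is invented, "inner" is approximated by Python's notion of a private address, and the rule that `default` prefers inner addresses before outer ones is an assumption read from the comment in the properties file.

```python
import ipaddress


def pick_ip(candidates, strategy="default"):
    """Choose one address from candidates according to the strategy."""
    addrs = [ipaddress.ip_address(c) for c in candidates]
    usable = [a for a in addrs if not a.is_loopback]
    inner = [a for a in usable if a.is_private]      # e.g. 10.x, 192.168.x
    outer = [a for a in usable if not a.is_private]  # publicly routable
    if strategy == "inner":
        ordered = inner
    elif strategy == "outer":
        ordered = outer
    elif strategy == "default":
        ordered = inner + outer  # assumption: inner first, then outer
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return str(ordered[0]) if ordered else None
```

On a host that reports `127.0.0.1`, `8.8.8.8` and `10.3.57.15`, this sketch returns `10.3.57.15` for `default` and `inner`, and `8.8.8.8` for `outer`.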
Configure the sudo permission of the dolphinscheduler account so that it acts as an ordinary-user manager limited to a set of ordinary users, and restrict the specified users to running certain commands on the specified host. For detailed configuration, please see sudo rights management. For example, the following sudo configuration allows the dolphinscheduler OS account to act only as the users userA, userB and userC (these users are used by multiple tenants to submit jobs to the big data cluster):
echo 'dolphinscheduler ALL=(userA,userB,userC) NOPASSWD: ALL' >> /etc/sudoers
sed -i 's/Defaults requiretty/#Defaults requiretty/g' /etc/sudoers
A: By deploying a different worker in each yarn cluster. The steps are as follows (e.g. AWS EMR):
Deploy the worker server on the master node of the EMR cluster.
Change yarn.application.status.address to the current EMR cluster's yarn url in conf/common.properties.
Execute the command bin/dolphinscheduler-daemon.sh start worker-server to start the worker-server.
A: Before DS 2.0.4 (after 2.0.0-alpha), version switching may produce duplicate keys in TaskDefinition, which can cause workflow updates to fail. You can refer to the following SQL to delete the duplicate data, taking MySQL as an example. (Note: be sure to back up the original data before operating. The SQL is from pr#8408.)
```
DELETE FROM t_ds_process_task_relation_log WHERE id IN
(
    SELECT
        x.id
    FROM
    (
        SELECT
            aa.id
        FROM
            t_ds_process_task_relation_log aa
        JOIN
        (
            SELECT
                a.process_definition_code
                ,MAX(a.id) as min_id
                ,a.pre_task_code
                ,a.pre_task_version
                ,a.post_task_code
                ,a.post_task_version
                ,a.process_definition_version
                ,COUNT(*) cnt
            FROM
                t_ds_process_task_relation_log a
            JOIN (
                SELECT
                    code
                FROM
                    t_ds_process_definition
                GROUP BY code
            )b ON b.code = a.process_definition_code
            WHERE 1=1
            GROUP BY a.pre_task_code
                ,a.post_task_code
                ,a.pre_task_version
                ,a.post_task_version
                ,a.process_definition_code
                ,a.process_definition_version
            HAVING COUNT(*) > 1
        )bb ON bb.process_definition_code = aa.process_definition_code
            AND bb.pre_task_code = aa.pre_task_code
            AND bb.post_task_code = aa.post_task_code
            AND bb.process_definition_version = aa.process_definition_version
            AND bb.pre_task_version = aa.pre_task_version
            AND bb.post_task_version = aa.post_task_version
            AND bb.min_id != aa.id
    )x
)
;
DELETE FROM t_ds_task_definition_log WHERE id IN
(
    SELECT
        x.id
    FROM
    (
        SELECT
            a.id
        FROM
            t_ds_task_definition_log a
        JOIN
        (
            SELECT
                code
                ,name
                ,version
                ,MAX(id) AS min_id
            FROM
                t_ds_task_definition_log
            GROUP BY code
                ,name
                ,version
            HAVING COUNT(*) > 1
        )b ON b.code = a.code
            AND b.name = a.name
            AND b.version = a.version
            AND b.min_id != a.id
    )x
)
;
```
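The logic of the second statement (on t_ds_task_definition_log) is easier to see on a small in-memory example. The Python sketch below reproduces the same rule, keeping the row with the largest id in each (code, name, version) group and selecting the rest for deletion. The function name and the tuple layout are assumptions for the illustration, and it can serve as a dry run on exported rows before touching the database.

```python
def ids_to_delete(rows):
    """rows: list of (id, code, name, version) tuples.

    Returns the ids that the DELETE would remove: every row of a duplicate
    group except the one with the largest id (called min_id in the SQL).
    """
    keep = {}
    for row_id, code, name, version in rows:
        key = (code, name, version)
        keep[key] = max(keep.get(key, row_id), row_id)
    return sorted(
        row_id
        for row_id, code, name, version in rows
        if keep[(code, name, version)] != row_id
    )
```

Groups that contain a single row are never touched, which matches the `HAVING COUNT(*) > 1` filter in the SQL.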
A: The repair can be completed by executing the following SQL in the database:
update t_ds_version set version='2.0.1';
After version 3.0.0-alpha, the Python gateway server is integrated into the API server, and the Python gateway service starts when you start the API server. If you want to disable the Python gateway service, change the API server configuration in api-server/conf/application.yaml and set the attribute python-gateway.enabled to false.