
Flink on YARN: troubleshooting and fixing "The main method caused an error: Could not deploy Yarn job cluster"

Time: 2024-07-24 05:22:13



Reproducing the error:

flink run -m yarn-cluster -p 2 -yjm 700m -ytm 1024m -c WordCount target/bbb-1.0-SNAPSHOT.jar

The full error output:

The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: The main method caused an error: Could not deploy Yarn job cluster.
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:335)
	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:205)
	at org.apache.flink.client.ClientUtils.executeProgram(ClientUtils.java:138)
	at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:662)
	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:210)
	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:893)
	at org.apache.flink.client.cli.CliFrontend.lambda$main$10(CliFrontend.java:966)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:966)
Caused by: org.apache.flink.client.deployment.ClusterDeploymentException: Could not deploy Yarn job cluster.
	at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:398)
	at org.apache.flink.client.deployment.executors.AbstractJobClusterExecutor.execute(AbstractJobClusterExecutor.java:70)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.executeAsync(StreamExecutionEnvironment.java:1733)
	at org.apache.flink.streaming.api.environment.StreamContextEnvironment.executeAsync(StreamContextEnvironment.java:94)
	at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:63)
	at org.apache.flink.streaming.api.environment.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.java:1620)
	at WordCount.main(WordCount.java:47)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:321)
	... 11 more
Caused by: org.apache.flink.yarn.YarnClusterDescriptor$YarnDeploymentException: The YARN application unexpectedly switched to state FAILED during deployment.
Diagnostics from YARN: Application application_1591614969089_0002 failed 1 times (global limit =2; local limit is =1) due to AM Container for appattempt_1591614969089_0002_000001 exited with exitCode: 1
Failing this attempt.Diagnostics: [-06-08 19:18:12.457]Exception from container-launch.
Container id: container_1591614969089_0002_01_000001
Exit code: 1
[-06-08 19:18:12.466]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
[-06-08 19:18:12.467]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
For more detailed output, check the application tracking page: http://Desktop:8188/applicationhistory/app/application_1591614969089_0002 Then click on links to logs of each attempt.
. Failing the application.
If log aggregation is enabled on your cluster, use this command to further investigate the issue:
yarn logs -applicationId application_1591614969089_0002
	at org.apache.flink.yarn.YarnClusterDescriptor.startAppMaster(YarnClusterDescriptor.java:999)
	at org.apache.flink.yarn.YarnClusterDescriptor.deployInternal(YarnClusterDescriptor.java:488)
	at org.apache.flink.yarn.YarnClusterDescriptor.deployJobCluster(YarnClusterDescriptor.java:391)
	... 22 more
-06-08 19:18:12,659 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Cancelling deployment from Deployment Failure Hook
-06-08 19:18:12,660 INFO  org.apache.hadoop.yarn.client.RMProxy - Connecting to ResourceManager at Desktop/192.168.0.103:8032
-06-08 19:18:12,661 INFO  org.apache.hadoop.yarn.client.AHSProxy - Connecting to Application History server at Desktop/192.168.0.103:10020
-06-08 19:18:12,661 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Killing YARN application
-06-08 19:18:12,668 INFO  org.apache.hadoop.yarn.client.api.impl.YarnClientImpl - Killed application application_1591614969089_0002
-06-08 19:18:12,769 INFO  org.apache.flink.yarn.YarnClusterDescriptor - Deleting files in hdfs://Desktop:9000/user/appleyuchi/.flink/application_1591614969089_0002.
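The YARN diagnostics above point to `yarn logs -applicationId <id>` for the full container logs. When the client output is long, a small helper that pulls out the application ID saves some copy-pasting. This is a minimal sketch, not part of Flink or Hadoop: the function name `yarn_logs_command` is hypothetical, and it assumes application IDs follow the `application_<clusterTimestamp>_<sequence>` pattern seen in this log.

```python
from __future__ import annotations

import re

# Matches IDs like application_1591614969089_0002 (but not appattempt_... or container_...).
APP_ID = re.compile(r"application_\d+_\d+")


def yarn_logs_command(client_output: str) -> list[str] | None:
    """Return the `yarn logs` argv for the first application ID found, or None."""
    match = APP_ID.search(client_output)
    if match is None:
        return None
    return ["yarn", "logs", "-applicationId", match.group(0)]
```

The returned list can be handed to `subprocess.run` on a machine with the Hadoop client installed; log aggregation has to be enabled for `yarn logs` to find anything.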

This error is fairly hard to track down. First, make sure the Hadoop log server is running, i.e. that jps lists:

JobHistoryServer. Start it with:

$HADOOP_HOME/bin/mapred --daemon start historyserver

Then start the timeline server:

yarn timelineserver

After these steps, all the ports of the YARN web UI should be reachable.
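To confirm both daemons are up without eyeballing `jps`, the check can be scripted. A minimal sketch, assuming `jps` is on the PATH and that the timeline server shows up in `jps` under its main class name `ApplicationHistoryServer` (an assumption about this Hadoop setup); the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess

# Main-class names expected in `jps` output. ApplicationHistoryServer is
# assumed to be how the YARN timeline server appears.
REQUIRED_DAEMONS = {"JobHistoryServer", "ApplicationHistoryServer"}


def running_daemons(jps_output: str) -> set[str]:
    """Collect main-class names from `jps` lines of the form '<pid> <name>'."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split()
        if len(parts) >= 2:
            names.add(parts[1])
    return names


def missing_daemons(jps_output: str) -> set[str]:
    """Return the required daemons that do not appear in the jps output."""
    return REQUIRED_DAEMONS - running_daemons(jps_output)


def check_local_jps() -> set[str]:
    """Run `jps` on this machine and report which required daemons are missing."""
    result = subprocess.run(["jps"], capture_output=True, text=True, check=True)
    return missing_daemons(result.stdout)
```

Calling `check_local_jps()` on the node that hosts the history servers returns an empty set when both are running.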

#######################################################################################

The log in the YARN web UI then showed the following error:

-06-08 19:21:02,071 INFO  org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Shutting YarnJobClusterEntrypoint down with application status FAILED. Diagnostics org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
	at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8082
	at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:228)
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
	... 9 more
.
-06-08 19:21:02,076 INFO  org.apache.flink.runtime.blob.BlobServer - Stopped BLOB server at 0.0.0.0:37633
-06-08 19:21:02,077 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
-06-08 19:21:02,082 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopping Akka RPC service.
-06-08 19:21:02,087 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
-06-08 19:21:02,088 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
-06-08 19:21:02,095 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator - Shutting down remote daemon.
-06-08 19:21:02,095 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator - Remote daemon shut down; proceeding with flushing remote transports.
-06-08 19:21:02,110 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
-06-08 19:21:02,110 INFO  akka.remote.RemoteActorRefProvider$RemotingTerminator - Remoting shut down.
-06-08 19:21:02,130 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
-06-08 19:21:02,131 INFO  org.apache.flink.runtime.rpc.akka.AkkaRpcService - Stopped Akka RPC service.
-06-08 19:21:02,132 ERROR org.apache.flink.runtime.entrypoint.ClusterEntrypoint - Could not start cluster entrypoint
org.apache.flink.runtime.entrypoint.ClusterEntrypointException: Failed to initialize the cluster entrypoint YarnJobClusterEntrypoint.
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:187)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runClusterEntrypoint(ClusterEntrypoint.java:518)
	at org.apache.flink.yarn.entrypoint.YarnJobClusterEntrypoint.main(YarnJobClusterEntrypoint.java:119)
Caused by: org.apache.flink.util.FlinkException: Could not create the DispatcherResourceManagerComponent.
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:261)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.runCluster(ClusterEntrypoint.java:215)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.lambda$startCluster$0(ClusterEntrypoint.java:169)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
	at org.apache.flink.runtime.entrypoint.ClusterEntrypoint.startCluster(ClusterEntrypoint.java:168)
	... 2 more
Caused by: java.net.BindException: Could not start rest endpoint on any port in port range 8082
	at org.apache.flink.runtime.rest.RestServerEndpoint.start(RestServerEndpoint.java:228)
	at org.apache.flink.runtime.entrypoint.component.DefaultDispatcherResourceManagerComponentFactory.create(DefaultDispatcherResourceManagerComponentFactory.java:165)
	... 9 more

##############################################################

So it is a port problem, yet port 8082 was not actually in use, which left me baffled for a while.
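One caveat when checking whether a port is "in use": in YARN job-cluster mode the JobManager, and with it the REST endpoint, runs inside the ApplicationMaster container on whichever NodeManager YARN picks, so the check has to be made on that host, not only on the machine where `flink run` was typed. A minimal sketch of a local bind test using only the standard-library `socket` module; `port_is_free` is a hypothetical helper name.

```python
import socket


def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a new TCP socket can currently be bound to (host, port).

    Binding on 0.0.0.0 mimics a server listening on all interfaces. Any
    OSError (e.g. address already in use, or permission denied for a
    privileged port) is reported as "not free".
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True


if __name__ == "__main__":
    # 8081 was taken by Spark in this setup; 8082 is the port Flink was moved to.
    for port in (8081, 8082):
        print(port, "free" if port_is_free(port) else "in use")
```

A successful bind only says the port is free at that instant on that host; it does not prove the AM container will land on the same node.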

What went wrong:

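The port must be the same in both files, flink-conf.yaml and masters. I had forgotten to update the masters file, and that mismatch produced the convoluted error above.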

I had changed the default port 8081 to 8082 because Spark was already using 8081, and once I had edited flink-conf.yaml I got carried away and forgot about masters.

Final fix:

In flink-conf.yaml:

rest.port: 8082

In masters:

Desktop:8082
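Because the bug here was a port changed in one file but not the other, a consistency check is cheap to run before submitting a job. A minimal sketch, assuming the flat `key: value` layout of flink-conf.yaml and one `host:port` entry per line in masters, as in the fix above. The function names are hypothetical; 8081 is used as the fallback because it is Flink's default `rest.port`.

```python
from __future__ import annotations

from pathlib import Path

DEFAULT_REST_PORT = 8081  # Flink's default rest.port


def rest_port_from_conf(conf_text: str) -> int | None:
    """Return rest.port from a flat 'key: value' flink-conf.yaml, or None if unset."""
    for raw in conf_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        if key == "rest.port":
            return int(value.strip("'\""))
    return None


def ports_from_masters(masters_text: str) -> list[int]:
    """Parse 'host:port' lines such as 'Desktop:8082' from a masters file."""
    ports = []
    for raw in masters_text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#") and ":" in line:
            ports.append(int(line.rsplit(":", 1)[1]))
    return ports


def find_mismatches(conf_text: str, masters_text: str) -> list[int]:
    """Return every masters port that differs from the effective rest.port."""
    rest_port = rest_port_from_conf(conf_text)
    if rest_port is None:
        rest_port = DEFAULT_REST_PORT
    return [p for p in ports_from_masters(masters_text) if p != rest_port]


def check_conf_dir(conf_dir: str) -> list[int]:
    """Read both files from a Flink conf directory supplied by the caller."""
    base = Path(conf_dir)
    return find_mismatches(
        (base / "flink-conf.yaml").read_text(), (base / "masters").read_text()
    )
```

An empty list from `check_conf_dir(...)` means both files agree; any returned port is a masters entry still pointing at the old value.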

Don't forget to sync both files to the other nodes in the cluster as well.
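Any copy tool works for syncing. The sketch below only builds `rsync` commands so they can be reviewed before running. It assumes passwordless SSH to each node and the same Flink conf directory path on every machine; the helper names, the host names and the `/opt/flink/conf` path in the usage line are hypothetical.

```python
from __future__ import annotations

import shlex
import subprocess

CONF_FILES = ("flink-conf.yaml", "masters")


def rsync_commands(conf_dir: str, hosts: list[str]) -> list[list[str]]:
    """Build one rsync call per host that copies both files to the same directory."""
    conf_dir = conf_dir.rstrip("/")
    sources = [f"{conf_dir}/{name}" for name in CONF_FILES]
    return [["rsync", "-av", *sources, f"{host}:{conf_dir}/"] for host in hosts]


def sync(conf_dir: str, hosts: list[str], dry_run: bool = True) -> None:
    """Print each command; only execute it when dry_run is False."""
    for cmd in rsync_commands(conf_dir, hosts):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

For example, `sync("/opt/flink/conf", ["node2", "node3"])` prints the two commands, and `dry_run=False` runs them.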

Finally, close all open terminals and open a new one: in my setup the new configuration only took effect in a freshly opened terminal.
