Published: 2025-06-24 17:58:54  Author: 北方职教升学中心


Scenario and problem description:

After a Flink batch job has been running for some time, it fails with the following error:

java.nio.file.NoSuchFileException: /tmp/flink-netty-shuffle-c5222ebc-a7bb-4fa1-bfd2-c7b5c9bd9b67/3740ddaa0f56ec8bcce80927e4a05443.channel.shuffle.data

The full stack trace is:

2024-07-18 14:27:54
org.apache.flink.runtime.JobException: Recovery is suppressed by NoRestartBackoffTimeStrategy
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.handleFailure(ExecutionFailureHandler.java:139)
    at org.apache.flink.runtime.executiongraph.failover.flip1.ExecutionFailureHandler.getFailureHandlingResult(ExecutionFailureHandler.java:83)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.recordTaskFailure(DefaultScheduler.java:256)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.handleTaskFailure(DefaultScheduler.java:247)
    at org.apache.flink.runtime.scheduler.DefaultScheduler.onTaskFailed(DefaultScheduler.java:240)
    at org.apache.flink.runtime.scheduler.SchedulerBase.onTaskExecutionStateUpdate(SchedulerBase.java:738)
    at org.apache.flink.runtime.scheduler.SchedulerBase.updateTaskExecutionState(SchedulerBase.java:715)
    at org.apache.flink.runtime.scheduler.SchedulerNG.updateTaskExecutionState(SchedulerNG.java:78)
    at org.apache.flink.runtime.jobmaster.JobMaster.updateTaskExecutionState(JobMaster.java:477)
    at sun.reflect.GeneratedMethodAccessor91.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.lambda$handleRpcInvocation$1(AkkaRpcActor.java:309)
    at org.apache.flink.runtime.concurrent.akka.ClassLoadingUtils.runWithContextClassLoader(ClassLoadingUtils.java:83)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcInvocation(AkkaRpcActor.java:307)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleRpcMessage(AkkaRpcActor.java:222)
    at org.apache.flink.runtime.rpc.akka.FencedAkkaRpcActor.handleRpcMessage(FencedAkkaRpcActor.java:84)
    at org.apache.flink.runtime.rpc.akka.AkkaRpcActor.handleMessage(AkkaRpcActor.java:168)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24)
    at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20)
    at scala.PartialFunction.applyOrElse(PartialFunction.scala:123)
    at scala.PartialFunction.applyOrElse$(PartialFunction.scala:122)
    at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:171)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:172)
    at akka.actor.Actor.aroundReceive(Actor.scala:537)
    at akka.actor.Actor.aroundReceive$(Actor.scala:535)
    at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220)
    at akka.actor.ActorCell.receiveMessage(ActorCell.scala:580)
    at akka.actor.ActorCell.invoke(ActorCell.scala:548)
    at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
    at akka.dispatch.Mailbox.run(Mailbox.scala:231)
    at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
    at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289)
    at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056)
    at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692)
    at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157)
Caused by: java.io.IOException: Failed to create file writer.
    at org.apache.flink.runtime.io.network.partition.SortMergeResultPartition.setupInternal(SortMergeResultPartition.java:185)
    at org.apache.flink.runtime.io.network.partition.ResultPartition.setup(ResultPartition.java:161)
    at org.apache.flink.runtime.taskmanager.Task.setupPartitionsAndGates(Task.java:946)
    at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:639)
    at org.apache.flink.runtime.taskmanager.Task.run(Task.java:550)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.nio.file.NoSuchFileException: /tmp/flink-netty-shuffle-45d4cbd5-f22e-47d0-98ff-7f11dd98a0b7/07391f7b184c5f8b8e234abee6649daa.channel.shuffle.data
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)
    at java.nio.channels.FileChannel.open(FileChannel.java:287)
    at java.nio.channels.FileChannel.open(FileChannel.java:335)
    at org.apache.flink.runtime.io.network.partition.PartitionedFileWriter.openFileChannel(PartitionedFileWriter.java:133)
    at org.apache.flink.runtime.io.network.partition.PartitionedFileWriter.<init>(PartitionedFileWriter.java:121)
    at org.apache.flink.runtime.io.network.partition.SortMergeResultPartition.setupInternal(SortMergeResultPartition.java:182)
    ... 5 more

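Root cause analysis:

When a Flink job runs in batch mode, its intermediate result data is written to disk, by default under the /tmp/ directory. If several temporary directories are configured, Flink distributes the temporary files evenly among them.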
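To make "distributes evenly" concrete, here is a minimal sketch that hands out file paths across several directories in round-robin order. It is an illustration only, not Flink's code: the round-robin policy is an assumption about what "evenly" means here, and the class name `TempDirAllocator` and the two directory names are hypothetical.

```python
from itertools import count


class TempDirAllocator:
    """Hands out temp-file paths across several directories in round-robin order."""

    def __init__(self, dirs):
        if not dirs:
            raise ValueError("at least one temp directory is required")
        self._dirs = list(dirs)
        self._counter = count()

    def next_path(self, file_name):
        # Take the directories in turn so that files are spread evenly.
        index = next(self._counter) % len(self._dirs)
        return f"{self._dirs[index].rstrip('/')}/{file_name}"


# Hypothetical directories, used only for this illustration.
allocator = TempDirAllocator(["/data1/flink_shuffle", "/data2/flink_shuffle"])
paths = [allocator.next_path(f"{i}.channel.shuffle.data") for i in range(4)]
for p in paths:
    print(p)
```

With two directories, consecutive files alternate between them, so neither disk receives all of the shuffle data.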

To avoid the problem, edit Flink's flink-conf.yaml and set taskmanager.tmp.dirs to a directory outside /tmp, for example: taskmanager.tmp.dirs: /opt/flink_shuffle/
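Before restarting the cluster, it can help to confirm that the configured directories really are outside /tmp. The sketch below reads flink-conf.yaml style text and reports any temporary directory located under /tmp. The helper names `find_tmp_dirs` and `dirs_under_tmp` are hypothetical. The sketch assumes simple `key: value` lines, accepts both `taskmanager.tmp.dirs` and the newer key name `io.tmp.dirs`, and assumes multiple directories are separated by `,`, `|`, or the platform path separator.

```python
import os
import re

# Key used in this article plus the newer name of the same option.
TMP_DIR_KEYS = ("io.tmp.dirs", "taskmanager.tmp.dirs")


def find_tmp_dirs(conf_text):
    """Return the temp directories configured in flink-conf.yaml style text."""
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")
        if sep and key.strip() in TMP_DIR_KEYS:
            # Assumed separators: ',', '|', or the platform path separator.
            parts = re.split(r"[,|" + re.escape(os.pathsep) + r"]", value)
            return [p.strip() for p in parts if p.strip()]
    return []


def dirs_under_tmp(dirs):
    """Return the directories that are /tmp itself or live below it."""
    risky = []
    for d in dirs:
        normalized = os.path.normpath(d)
        if normalized == "/tmp" or normalized.startswith("/tmp/"):
            risky.append(d)
    return risky


sample = "taskmanager.tmp.dirs: /opt/flink_shuffle/\n"
configured = find_tmp_dirs(sample) or ["/tmp"]  # fall back to the default location
print("configured:", configured)
print("at risk of cleanup:", dirs_under_tmp(configured))
```

An empty "at risk" list means none of the configured directories is exposed to the /tmp cleanup. If the option is missing altogether, the fallback makes the check report /tmp.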


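Solution:

taskmanager.tmp.dirs: this option sets the list of one or more directories that the TaskManager uses for temporary files (newer Flink releases call it io.tmp.dirs and keep taskmanager.tmp.dirs as a deprecated key). Linux systems commonly clean up /tmp on a schedule, and a typical default removes files older than about 10 days. Once the shuffle directory has been deleted this way, any task that runs afterwards fails with java.nio.file.NoSuchFileException: /tmp/flink-netty-shuffle-c5222ebc-a7bb-4fa1-bfd2-c7b5c9bd9b67/3740ddaa0f56ec8bcce80927e4a05443.channel.shuffle.data. Pointing the option at a directory outside /tmp keeps the shuffle files away from this cleanup.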
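The failure mode can be reproduced outside Flink. The following sketch creates a scratch directory, deletes it to simulate the /tmp cleanup, and then tries to create a data file inside it. It is only an analogy: Python raises `FileNotFoundError` where Java raises `java.nio.file.NoSuchFileException`, and the directory prefix and file name are made up for the demonstration.

```python
import os
import shutil
import tempfile


def write_after_cleanup():
    """Create a shuffle-like directory, delete it, then try to create a file in it."""
    shuffle_dir = tempfile.mkdtemp(prefix="flink-netty-shuffle-demo-")
    data_file = os.path.join(shuffle_dir, "demo.channel.shuffle.data")

    # Simulate the periodic /tmp cleanup removing the whole directory.
    shutil.rmtree(shuffle_dir)

    try:
        # Creating a file fails because its parent directory no longer exists.
        with open(data_file, "wb") as f:
            f.write(b"partition data")
    except FileNotFoundError as err:
        return type(err).__name__
    return "ok"


result = write_after_cleanup()
print(result)
```

The write fails because the parent directory is gone, not because of the file itself. This matches the stack trace above, where the error is raised while opening the file channel for the shuffle data file.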