HPCC-27439 Ensure job done is sent to slaves after part failure

If a job fails during initialization (e.g. running out of disk space
while saving the query dll), we still need to make sure the job done
call is made to all slaves, to ensure they are cleaned up.
Without it, the CJobSlave instance was leaked.

Signed-off-by: Jake Smith <jake.smith@lexisnexisrisk.com>
Jake Smith 3 years ago
Parent
Commit
baac4ab5cf
2 changed files with 3 additions and 1 deletion
  1. thorlcr/graph/thgraphmaster.cpp (+1 -1)
  2. thorlcr/slave/slavmain.cpp (+2 -0)

+ 1 - 1
thorlcr/graph/thgraphmaster.cpp

@@ -1707,9 +1707,9 @@ void CJobMaster::sendQuery()
     compressToBuffer(msg, tmp.length(), tmp.toByteArray());
 
     CTimeMon queryToSlavesTimer;
+    querySent = true;
     broadcast(queryNodeComm(), msg, masterSlaveMpTag, LONGTIMEOUT, "sendQuery");
     PROGLOG("Serialization of query init info (%d bytes) to slaves took %d ms", msg.length(), queryToSlavesTimer.elapsed());
-    querySent = true;
 }
 
 void CJobMaster::jobDone()
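
The reordering above can be illustrated with a minimal, self-contained sketch. It is not the Thor code: SketchMaster, its log, and failAtSlave are hypothetical names invented for illustration, and the early return in jobDone() when querySent is false is an assumption about how the flag is used. The point it shows is that a broadcast can fail after some slaves have already received the query, so the flag has to be set before the broadcast for the later job done call to reach them.

```cpp
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for the master side of a job; not the real CJobMaster.
struct SketchMaster
{
    bool querySent = false;
    int numSlaves = 3;
    int failAtSlave = -1;          // index of the slave whose send fails (-1 = none)
    std::vector<std::string> log;  // records each message delivered, for illustration

    // Delivers 'what' to each slave in turn; throws part way if a send fails.
    void broadcast(const std::string &what)
    {
        for (int s = 0; s < numSlaves; s++)
        {
            if (s == failAtSlave)
                throw std::runtime_error("send failed to slave " + std::to_string(s));
            log.push_back(what + "->" + std::to_string(s));
        }
    }

    void sendQuery()
    {
        // Set the flag first: if broadcast throws, some slaves may already
        // hold the query and still need the job done call.
        querySent = true;
        broadcast("query");
    }

    void jobDone()
    {
        if (!querySent) // assumption: nothing was sent, so nothing to clean up
            return;
        failAtSlave = -1; // in this sketch the cleanup broadcast always succeeds
        broadcast("done");
    }
};
```

With failAtSlave set to 1, sendQuery() throws after slave 0 has received the query, yet jobDone() still notifies all three slaves. If the flag were set after the broadcast, as in the original ordering, jobDone() would return early and slave 0 would keep its job instance.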

+ 2 - 0
thorlcr/slave/slavmain.cpp

@@ -1916,6 +1916,8 @@ public:
                         StringAttr key;
                         msg.read(key);
                         CJobSlave *job = jobs.find(key.get());
+                        if (!job)
+                            throw makeStringException(0, "QueryDone: job not found"); // can happen if job failed during initialization on some slaves
                         StringAttr wuid = job->queryWuid();
                         StringAttr graphName = job->queryGraphName();