HPCC-27439 Ensure job done is sent to slaves after part failure

If a job fails during initialization (e.g. out of disk space while
saving the query dll), we still need to make sure the job done call
is made to all slaves, to ensure they are cleaned up.
Without it, the CJobSlave instance was leaked.

Signed-off-by: Jake Smith <jake.smith@lexisnexisrisk.com>
Jake Smith, 3 years ago
parent
commit
baac4ab5cf
2 changed files with 3 additions and 1 deletion
  1. thorlcr/graph/thgraphmaster.cpp (+1 -1)
  2. thorlcr/slave/slavmain.cpp (+2 -0)

+ 1 - 1
thorlcr/graph/thgraphmaster.cpp

@@ -1707,9 +1707,9 @@ void CJobMaster::sendQuery()
     compressToBuffer(msg, tmp.length(), tmp.toByteArray());
 
     CTimeMon queryToSlavesTimer;
+    querySent = true;
     broadcast(queryNodeComm(), msg, masterSlaveMpTag, LONGTIMEOUT, "sendQuery");
     PROGLOG("Serialization of query init info (%d bytes) to slaves took %d ms", msg.length(), queryToSlavesTimer.elapsed());
-    querySent = true;
 }
 
 void CJobMaster::jobDone()
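
Why the ordering of `querySent = true` matters can be illustrated with a small standalone sketch. It is not the real `CJobMaster`: the class `JobMasterSketch`, the helper `notifiedAfterPartialFailure`, the simulated broadcast loop, and in particular the assumption that `jobDone()` does nothing unless `querySent` is set are all hypothetical and exist only to model the scenario described in the commit message.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the master side of a job; not the real CJobMaster.
class JobMasterSketch
{
public:
    JobMasterSketch(int slaves, bool flagBeforeBroadcast)
        : numSlaves(slaves), flagFirst(flagBeforeBroadcast) {}

    // Models sendQuery(). failAtSlave is the index at which the simulated
    // broadcast throws (-1 means no failure).
    void sendQuery(int failAtSlave)
    {
        if (flagFirst)
            querySent = true; // ordering after the change: flag set before broadcasting
        for (int s = 0; s < numSlaves; ++s)
        {
            if (s == failAtSlave)
                throw std::runtime_error("simulated broadcast failure");
            ++slavesInitialised; // this slave now holds state for the job
        }
        if (!flagFirst)
            querySent = true; // ordering before the change: never reached on failure
    }

    // Models jobDone(). Assumption: it is skipped entirely when the query
    // was never marked as sent. Returns the number of slaves notified.
    int jobDone() const
    {
        if (!querySent)
            return 0;
        return numSlaves;
    }

    int initialised() const { return slavesInitialised; }

private:
    int numSlaves;
    bool flagFirst;
    bool querySent = false;
    int slavesInitialised = 0;
};

// Runs a sendQuery() that fails at the second of three slaves, then reports
// how many slaves the subsequent jobDone() would notify.
int notifiedAfterPartialFailure(bool flagBeforeBroadcast)
{
    JobMasterSketch master(3, flagBeforeBroadcast);
    try
    {
        master.sendQuery(1);
    }
    catch (const std::runtime_error &)
    {
        // The job has failed during initialization; fall through to jobDone().
    }
    return master.jobDone();
}
```

Under these assumptions, `notifiedAfterPartialFailure(false)` notifies no slaves even though one of them was already initialised, which corresponds to the leak described above, while `notifiedAfterPartialFailure(true)` notifies all three.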

+ 2 - 0
thorlcr/slave/slavmain.cpp

@@ -1916,6 +1916,8 @@ public:
                         StringAttr key;
                         msg.read(key);
                         CJobSlave *job = jobs.find(key.get());
+                        if (!job)
+                            throw makeStringException(0, "QueryDone: job not found"); // can happen if job failed during initialization on some slaves
                         StringAttr wuid = job->queryWuid();
                         StringAttr graphName = job->queryGraphName();
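
The slave-side guard can be sketched in isolation as follows. Once job done is sent to every slave, some slaves may receive it for a job they never created, so the lookup result has to be checked before it is dereferenced. The types `JobSketch` and `JobTableSketch`, the use of `std::map`, and the removal of the entry inside `queryDone` are assumptions made for illustration; the real code uses `CJobSlave`, `jobs.find()` and `makeStringException`.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for CJobSlave; only the workunit id is modelled.
struct JobSketch
{
    std::string wuid;
};

// Hypothetical job table keyed by job key, standing in for `jobs`.
class JobTableSketch
{
public:
    void add(const std::string &key, const std::string &wuid)
    {
        jobs[key] = std::make_unique<JobSketch>(JobSketch{wuid});
    }

    // Returns nullptr when no job is registered under the key.
    JobSketch *find(const std::string &key) const
    {
        auto it = jobs.find(key);
        return it == jobs.end() ? nullptr : it->second.get();
    }

    // Models the QueryDone handler: check the lookup before using it.
    // Erasing the entry here is an assumption about the cleanup step.
    std::string queryDone(const std::string &key)
    {
        JobSketch *job = find(key);
        if (!job)
            throw std::runtime_error("QueryDone: job not found");
        std::string wuid = job->wuid; // copy before the job is destroyed
        jobs.erase(key);
        return wuid;
    }

private:
    std::map<std::string, std::unique_ptr<JobSketch>> jobs;
};
```

In this sketch, a slave that never initialised the job reports an error instead of dereferencing a null pointer, and a slave that did initialise it releases the job when `queryDone` runs.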