Teach lib.parallel to handle misbehaving child processes.
Child processes of lib.parallel can sometimes get killed unexpectedly, for
example by kill -9. When this happens, cbuildbot hangs forever, and
buildbot eventually kills the whole tree of processes, taking all of
our debug information with it. That's bad.
To fix this, I've adjusted cbuildbot to handle the case where a process
exits unexpectedly: any stage that never reported results now has the
exception recorded against it.
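For illustration only, here is a minimal sketch of how a parent can tell that a child was killed by a signal instead of exiting normally. It uses the standard subprocess module, not lib.parallel, and the names describe_exit, run_child and SELF_KILL are hypothetical. It assumes a POSIX system, where a negative return code means the child died from a signal.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Turn a subprocess return code into a human-readable description."""
    if returncode < 0:
        # On POSIX, a negative return code means the child died from a signal.
        return 'killed by signal %d' % -returncode
    return 'exited with status %d' % returncode


def run_child(source):
    """Run `source` in a fresh Python interpreter and describe how it ended."""
    proc = subprocess.run([sys.executable, '-c', source])
    return describe_exit(proc.returncode)


# Hypothetical child that kills itself the way `kill -9` would.
SELF_KILL = 'import os, signal; os.kill(os.getpid(), signal.SIGKILL)'
```

Calling run_child(SELF_KILL) reports the signal instead of waiting for output that will never arrive, which is the kind of check a parent needs in order to avoid hanging.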
As part of this change, I refactored the parallel module a bit to make it
easier to test and mock out attributes (you can mock out class attributes,
but not module attributes).
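As a rough sketch of why class attributes are convenient to mock, the example below swaps one out for the duration of a test. It uses unittest.mock as a stand-in, which may not be the framework this change's tests use, and BackgroundRunner and SILENT_TIMEOUT are hypothetical names, not part of the parallel module.

```python
from unittest import mock


class BackgroundRunner(object):
    """Hypothetical stand-in for a class in the parallel module."""

    # Class attribute, so a test can replace it on the class itself.
    SILENT_TIMEOUT = 60

    @classmethod
    def timeout_message(cls):
        return 'no output after %d seconds' % cls.SILENT_TIMEOUT


def message_with_short_timeout():
    """Temporarily override the class attribute, as a test would."""
    with mock.patch.object(BackgroundRunner, 'SILENT_TIMEOUT', 1):
        return BackgroundRunner.timeout_message()
```

The original value is restored when the with block exits, so the override cannot leak into other tests.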
BUG=chromium:216309
TEST=Hundreds of trybot runs, including runs that hang.
Change-Id: I101845540956873729ff50071cf95f7f661eb25c
Reviewed-on: https://gerrit.chromium.org/gerrit/45855
Commit-Queue: David James <davidjames@chromium.org>
Reviewed-by: David James <davidjames@chromium.org>
Tested-by: David James <davidjames@chromium.org>
diff --git a/scripts/cbuildbot.py b/scripts/cbuildbot.py
index e7d24f8..309411f 100644
--- a/scripts/cbuildbot.py
+++ b/scripts/cbuildbot.py
@@ -349,6 +349,22 @@
     return sync_stage
+  @staticmethod
+  def _RunParallelStages(stage_objs):
+    """Run the specified stages in parallel."""
+    steps = [stage.Run for stage in stage_objs]
+    try:
+      parallel.RunParallelSteps(steps)
+    except BaseException as ex:
+      # If a stage threw an exception, it might not have correctly reported
+      # results (e.g. because it was killed before it could report the
+      # results.) In this case, attribute the exception to any stages that
+      # didn't report back correctly (if any).
+      for stage in stage_objs:
+        if not results_lib.Results.StageHasResults(stage.name):
+          results_lib.Results.Record(stage.name, ex, str(ex))
+      raise
+
   def _RunBackgroundStagesForBoard(self, board):
     """Run background board-specific stages for the specified board."""
     archive_stage = self.archive_stages[board]
@@ -372,8 +388,8 @@
       stage_list.append([stages.ASyncHWTestStage, board, archive_stage,
                          suite])
-    steps = [self._GetStageInstance(*x, config=config).Run for x in stage_list]
-    parallel.RunParallelSteps(steps + [archive_stage.Run])
+    stage_objs = [self._GetStageInstance(*x, config=config) for x in stage_list]
+    self._RunParallelStages(stage_objs + [archive_stage])
   def RunStages(self):
     """Runs through build process."""