Azure lets you send commands and scripts (PowerShell, in the case of Windows hosts) to your VMs through the Run Command feature. This is something we can take advantage of to reduce toil and comfortably manage large numbers of servers. One of the projects in which I participate is the development of an in-house Python tool that, among other things, parallelizes this feature by sending the same command or script to several VMs at once.
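For context, invoking Run Command against a single VM from Python boils down to shelling out to the Azure CLI, roughly like this (the resource group, VM name and script are placeholders for illustration only):

import subprocess

# Placeholder resource group, VM name and script
command = [
    "az", "vm", "run-command", "invoke",
    "--resource-group", "my-resource-group",
    "--name", "my-windows-vm",
    "--command-id", "RunPowerShellScript",
    "--scripts", "Get-Service | Where-Object {$_.Status -eq 'Running'}",
]

# Blocks until the Run Command round trip finishes, then prints the JSON result
result = subprocess.run(command, capture_output=True, text=True)
print(result.stdout)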

During development I have been in charge of this part of the tool since its initial version. But I wasn’t happy with it: the first version was too nice. It waited for all instances of the scripts launched in parallel to finish, without giving any context or intermediate result, even if one crashed or was incredibly slow. It was time to debug.

Let’s take a look at how the program is organized (the code has been truncated, and any referenced modules should be assumed to be correctly imported).

The polite Runner

from concurrent.futures import ThreadPoolExecutor, as_completed

class Runner:
    """
    Class to manage the launch of scripts in parallel.
    ASYNCHRONOUS.
    """
    def __init__(self, targetvmlist):
        [...]

    def runScripts(self):
        """
        Run the script against all the VMs specified during object instantiation.
        """
        with ThreadPoolExecutor(max_workers=30) as executor:
            futures = []
            # Instantiate a PowerShellScript object per VM to launch in.
            # Submit all to the executor
            for vm in self.targetvmlist:
                pwshS = PowerShellScript(vm)
                futures.append(executor.submit(pwshS.run))
            execution_outputs = []
            # When execution is complete, return output
            for future in as_completed(futures):
                execution_outputs.append(future.result())
            return execution_outputs
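Usage would look something like this (a hypothetical call, since the truncated __init__ presumably just stores the VM list):

runner = Runner(targetvmlist=["vm-app-01", "vm-app-02", "vm-db-01"])
for output in runner.runScripts():
    print(output)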

PowerShellScript

import subprocess

class PowerShellScript:
    """
    Represents a PowerShell script object
    """
    def __init__(self, [...]):
        [...]

    def run(self):
        """
        Runs a PowerShell script in a certain VM.
        """
        # Commands are prepared and readied here
        [...]
        ran_script = subprocess.run(command_with_parameters, capture_output=True)
        # Simplified, I actually prepare an output dict
        [...]
        return ran_script.stdout
  • The PowerShellScript class uses subprocess.run to launch the AzCLI command. Its run method is what gets parallelized.
  • The Runner class uses ThreadPoolExecutor as a context manager to parallelize execution, based on the ThreadPoolExecutor example in the official docs. Each script object’s .run method is submitted to the executor and represented as a Future: in parallel execution, an object that stands for a function that has been scheduled and whose result may not exist yet.
  • According to the ThreadPoolExecutor documentation, each submitted function runs in its own worker thread from the pool, so for n PowerShell scripts (up to max_workers=30) we will have n threads.
  • as_completed yields futures in the order they finish, not the order they were submitted. This async approach makes sense: if the first function in the queue takes longer, the others can still finish and return their output (see the sketch after this list).
  • The first version is incredibly naive, with no error handling or timeouts of any kind. A single slow script can hang the entire program even if all the other threads complete perfectly. And this happened often.
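To see the as_completed behaviour in isolation, here is a minimal, self-contained sketch (plain sleeps stand in for the real scripts) showing that a slow task does not block the collection of faster ones:

import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fake_script(name, seconds):
    # Stand-in for PowerShellScript.run: just sleeps, then returns
    time.sleep(seconds)
    return f"{name} finished after {seconds}s"

with ThreadPoolExecutor(max_workers=30) as executor:
    futures = [
        executor.submit(fake_script, "slow-vm", 3),
        executor.submit(fake_script, "fast-vm", 1),
    ]
    for future in as_completed(futures):
        # "fast-vm" is printed first even though it was submitted second
        print(future.result())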

So my first step, after reading more about concurrent.futures, was to add a timeout to as_completed, which raises a TimeoutError exception if, counting from the original call to as_completed, any futures are still unfinished after the specified number of seconds.

Less polite Runner

from concurrent.futures import ThreadPoolExecutor, as_completed, TimeoutError

class Runner:
    """
    Class to manage the launch of scripts in parallel.
    ASYNCHRONOUS.
    """
    def __init__(self, targetvmlist):
        [...]

    def runScripts(self):
        """
        Run the script against all the VMs specified during object instantiation.
        """
        with ThreadPoolExecutor(max_workers=30) as executor:
            futures = []
            # Instantiate a PowerShellScript object per VM to launch in.
            # Submit all to the executor
            for vm in self.targetvmlist:
                pwshS = PowerShellScript(vm)
                futures.append(executor.submit(pwshS.run))
            execution_outputs = []
            # When execution is complete, return output; give up after 240 seconds
            try:
                for future in as_completed(futures, timeout=240):
                    execution_outputs.append(future.result())
            except TimeoutError:
                execution_outputs.append("TIMEOUT ERROR")
            finally:
                return execution_outputs

This is a good start, right? Yes and no. Runner was still extremely nice: if a thread ran for more than 240 seconds, as_completed would raise the exception, but execution would not terminate until all threads had finished, because the executor’s context manager waits for every pending future on exit. Control was not being returned to the function calling Runner. A quick look through the docs made me think that I could use future.cancel() to kill the thread. But cancel() only works on futures that are queued and haven’t started running yet, and if one of my threads was triggering a TimeoutError then it was surely already running.
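A minimal sketch of that dead end, using a single worker so that the second task is still queued when we try to cancel:

import time
from concurrent.futures import ThreadPoolExecutor

with ThreadPoolExecutor(max_workers=1) as executor:
    running = executor.submit(time.sleep, 5)  # picked up immediately
    queued = executor.submit(time.sleep, 5)   # waits for the only worker

    time.sleep(0.1)  # give the first task time to start
    print(running.cancel())  # False: already running, cannot be cancelled
    print(queued.cancel())   # True: still queued, so it can be cancelled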

Killing a thread that has started executing is tricky, in my limited experience. That’s when it occurred to me that doing things well and simply is usually best. At that moment I had:

  • An instance of the Runner class that uses the ThreadPoolExecutor to execute in parallel the functions it receives, in this case representing executions of a remote PowerShell script using subprocess.
  • That, essentially, means that I have x threads running subprocess, managed by a ThreadPoolExecutor.
  • When one of those threads encounters a problem and stays running, the entire program hangs waiting for it. My Runner leaves no one behind.
  • This is because the ThreadPoolExecutor, and thus my Runner, is still too nice. Even after TimeoutError is raised and caught, the thread continues to run. When the rest have finished, the program keeps waiting for the thread that timed out and hasn’t finished executing yet. This is by ThreadPoolExecutor’s design.

At this point I could choose to catch the timeout and continue without waiting for all the results, but the thread would still run and the ThreadPoolExecutor would still wait for it. I didn’t like that idea too much.

I also considered trying to force the ThreadPoolExecutor to stop and kill any threads that hadn’t terminated yet, but this too seemed drastic and counter-intuitive to me. Surely there was a simpler, more correct solution. What was blocked, or could run slower than expected? It was not some abstract process over which I had no control. It was an instance of my PowerShellScript class, which internally used subprocess. The easiest way to approach this was not to force the ThreadPoolExecutor to monitor and control what the object executing the script did; it was to make those objects behave appropriately themselves. I had to add a timeout to subprocess so it wouldn’t force the rest of the program to wait indefinitely.
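As an aside, the closest thing ThreadPoolExecutor offers to that drastic option is shutdown(wait=False, cancel_futures=True), available since Python 3.9. A quick sketch of why I discarded it: it drops queued futures, but it still cannot stop a thread that has already started:

import time
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
running = executor.submit(time.sleep, 5)  # starts immediately
queued = executor.submit(time.sleep, 5)   # waits for the single worker

time.sleep(0.1)
# Drops "queued" from the queue, but "running" keeps going; with
# wait=False we simply stop waiting for it instead of killing it
executor.shutdown(wait=False, cancel_futures=True)
print(queued.cancelled())  # True
print(running.running())   # True: the thread is still alive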

Less narcissistic PowerShellScript

class PowerShellScript:
    """
    Represents a PowerShell script object
    """
    def __init__(self, [...]):
        [...]

    def run(self):
        """
        Runs a PowerShell script in a certain VM.
        """
        # Commands are prepared and readied here
        [...]
        # If the command exceeds 220 seconds, subprocess kills it and
        # raises subprocess.TimeoutExpired, freeing the worker thread
        ran_script = subprocess.run(command_with_parameters, capture_output=True, timeout=220)
        # Simplified, I actually prepare an output dict
        [...]
        return ran_script.stdout

Now, if a script takes too long, the problem is handled in the class that represents it, and the timeout occurs there: subprocess kills the child process after 220 seconds and raises subprocess.TimeoutExpired, so the worker thread finishes and the ThreadPoolExecutor has nothing left to wait for. I thought I was having a problem with parallel execution, but I think I was having a coupling problem: forcing a class to handle a timeout, and an execution, that should never have been its problem.
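One way to surface the timeout cleanly is to catch the exception inside the method itself, so the future always returns a result instead of raising. A sketch of the idea, not necessarily how the tool does it (run_with_timeout is a hypothetical standalone version of the error handling):

import subprocess

def run_with_timeout(command_with_parameters, timeout=220):
    """Run a prepared command, reporting a timeout as regular output."""
    try:
        ran_script = subprocess.run(
            command_with_parameters, capture_output=True, timeout=timeout
        )
        return ran_script.stdout
    except subprocess.TimeoutExpired:
        # subprocess has already killed the child process; report and move on
        return b"TIMEOUT ERROR"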

Like I said, I’m learning parallelism in Python and other intermediate concepts (I’m reluctant to call anything I’m learning “advanced” yet), but along the way I keep realizing that the basics are never sufficiently understood.