Triton inference automatic resubmit #1190

@yimuchen

Description

Right now, the triton_wrapper seems to work fine as long as the server hosting the model scales correctly with the number of active jobs. If the server doesn't keep up, a random chunk raises a tritonclient.utils.InferenceServerException and the whole evaluation is terminated, which is very difficult to track down.

We could handle this upstream in the triton_wrapper class itself, or, if we think this kind of server-configuration handling should be done on the analyst side, it can be patched into any subclass of triton_wrapper with something like:

from coffea.ml_tools.triton_wrapper import triton_wrapper
from tritonclient.utils import InferenceServerException

class my_model_wrapper(triton_wrapper):
    def prepare_awkward(self, *args, **kwargs):
        pass # Or whatever is needed
        
    def numpy_call(self, output_list, input_dict, ncalls=0):
        """
        Overload the upstream numpy_call to allow up to 3 retries on transient
        server failures. (The default value of ncalls keeps the signature
        compatible with the existing upstream call.)
        """
        try:
            return super().numpy_call(output_list, input_dict)
        except InferenceServerException as err:
            print("Caught inference server exception:", err)
            if ncalls >= 3:
                print("Not resolved after 3 retries... this is an actual error")
                raise err
            return self.numpy_call(output_list, input_dict, ncalls + 1)
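
A possible refinement of the same idea (just a sketch, not part of the current coffea API): since the failures come from a server that can't keep up, pausing briefly before each retry gives it a chance to recover instead of being hit again immediately. The class name my_backoff_wrapper and the backoff schedule below are assumptions for illustration; only numpy_call and prepare_awkward come from the existing wrapper interface.

import time

from coffea.ml_tools.triton_wrapper import triton_wrapper
from tritonclient.utils import InferenceServerException

class my_backoff_wrapper(triton_wrapper):
    def prepare_awkward(self, *args, **kwargs):
        pass  # Or whatever is needed

    def numpy_call(self, output_list, input_dict, ncalls=0):
        """Retry up to 3 times, sleeping 1s, 2s, 4s between attempts."""
        try:
            return super().numpy_call(output_list, input_dict)
        except InferenceServerException as err:
            if ncalls >= 3:
                raise err
            # Hypothetical backoff schedule: wait 2**ncalls seconds before retrying
            time.sleep(2 ** ncalls)
            return self.numpy_call(output_list, input_dict, ncalls + 1)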
