Description
Right now, the triton_wrapper seems to work fine as long as the server hosting the model scales correctly with the number of active jobs. If the server doesn't keep up, a random chunk will raise a tritonclient.utils.InferenceServerException, and the whole evaluation is terminated (which is very difficult to track down).
We could have this handled upstream in the triton_wrapper instance, or, if we think this server-configuration handling should be done on the analyst side, it can be patched into any subclass of triton_wrapper with something like:
from coffea.ml_tools.triton_wrapper import triton_wrapper
from tritonclient.utils import InferenceServerException
class my_model_wrapper(triton_wrapper):
    def prepare_awkward(self, *args, **kwargs):
        pass  # Or whatever is needed

    def numpy_call(self, output_list, input_dict, ncalls=0):
        """
        Overload the upstream numpy_call to tolerate up to 3 failures caused by
        transient issues with the server. (The default value for ncalls keeps the
        signature compatible with the existing upstream call.)
        """
        try:
            return super().numpy_call(output_list, input_dict)
        except InferenceServerException as err:
            print("Caught inference server exception:", err)
            if ncalls >= 3:
                print("Not resolved after 3 retries... this is an actual error")
                raise err
            else:
                return self.numpy_call(output_list, input_dict, ncalls + 1)
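
A minimal usage sketch of such a subclass, assuming the wrapper is constructed with a Triton model URL and called with the output names plus whatever arguments prepare_awkward expects (the server URL, model name, output name, and events argument below are placeholders, not part of the proposal):

# Hypothetical usage; URL, output name, and call arguments are placeholders
# and depend on the actual prepare_awkward implementation.
wrapper = my_model_wrapper("triton+grpc://my-server:8001/my_model/1")
# Each chunk-level inference now survives up to 3 transient
# InferenceServerException failures before the error is propagated.
scores = wrapper(["OUTPUT_NAME"], events)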