Description
Describe the bug
As part of my automated testing, I observed that some requests were returning very quickly.
Investigation showed that min_new_tokens: 0 was the reason for the quick return:
max_new_tokens: 25, min_new_tokens: 0
Request generated 5 tokens before EosToken
even though the query is sent with max_new_tokens == min_new_tokens.
Further investigation led to this reproducer:
# Two predictor endpoints running the same image
HOST_1=flan-t5-small-cpu-1-predictor-watsonx-e2e-0.apps.20231017-06h53-watsonx-ci-kpouget.psap.aws.rhperfscale.org:443
HOST_2=flan-t5-small-cpu-2-predictor-watsonx-e2e-1.apps.20231017-06h53-watsonx-ci-kpouget.psap.aws.rhperfscale.org:443
# Print the proto definition of the request message served by a host
get_proto() {
  grpcurl -insecure "$1" describe caikit.runtime.Nlp.TextGenerationTaskRequest
}
# Keep a copy of each description in the files 1 and 2
get_proto "$HOST_1" > 1
get_proto "$HOST_2" > 2
echo diff
diff --side-by-side <(get_proto "$HOST_1") <(get_proto "$HOST_2")
which highlights that the protos returned by two endpoints running the same image are different.
Image is quay.io/opendatahub/caikit-tgis-serving@sha256:794adc22d52cb3ac4b5aadfb286e8431cca829acdc4909719329cf8c4fabb4ec
Platform
Caikit packages in this image have this version:
caikit 0.19.3
caikit-nlp 0.0.1 /caikit/src/caikit-nlp
caikit-tgis-backend 0.1.18
Python 3.9
Sample Code
See above.
From what I observed, a launch ends up serving the invalid proto ~50% of the time.
Expected behavior
The prototypes are always the same.
Observed behavior
The prototypes do not always have the same field ordering and field numbering.
No error is printed anywhere.
Additional info
The location of this block (+ the field numbering) is the key difference between the "different versions" of the protos:
oneof _preserve_input_text {
bool preserve_input_text = 15;
}
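
To check only the field numbering, independently of where each block appears, the two dumps saved by the reproducer (the files 1 and 2) can be compared field by field. This is a minimal sketch, not part of caikit or grpcurl: compare_field_numbers is a hypothetical helper, and it assumes that each field is printed on its own line as "type name = N;" with spaces around the "=".

```shell
# Hypothetical helper: list the fields whose number differs between two
# proto descriptions, regardless of the order in which fields are declared.
# Assumes one field per line, written as "type name = N;".
compare_field_numbers() {
  awk '
    {
      for (i = 2; i < NF; i++) {
        # A field declaration has "=" followed by "<digits>;"
        if ($i == "=" && $(i + 1) ~ /^[0-9]+;$/) {
          name = $(i - 1)
          num = $(i + 1)
          sub(/;$/, "", num)
          if (FNR == NR) {
            # First file: remember the number of each field
            first[name] = num
          } else if ((name in first) && first[name] != num) {
            # Second file: report fields whose number changed
            printf "%s: %s vs %s\n", name, first[name], num
          }
        }
      }
    }
  ' "$1" "$2"
}
```

With the files written by the reproducer, running compare_field_numbers 1 2 should print one line per field whose number differs, and nothing when the two endpoints agree on the numbering.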
