generate_text_func currently does not correctly return finish_reason=TOKEN_LIMIT when reaching the model token limit:
TOKEN_LIMIT refers to the maximum number of tokens allowed by the model itself, whereas MAX_TOKENS refers to the maximum number of tokens set by the user. Generation can therefore reach TOKEN_LIMIT before MAX_TOKENS.
Originally posted by @gkumbhat in #210 (comment)
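
To make the distinction concrete, here is a minimal sketch of how the two limits could be told apart. It is not the current implementation of generate_text_func: the `FinishReason` enum, the `pick_finish_reason` helper, and all of its parameters are hypothetical names used for illustration. It also assumes that the model limit counts input plus generated tokens, and that TOKEN_LIMIT takes precedence when both limits are hit at the same step.

```python
from enum import Enum
from typing import Optional


class FinishReason(Enum):
    """Hypothetical stand-in for the library's finish reason values."""
    EOS_TOKEN = "EOS_TOKEN"
    MAX_TOKENS = "MAX_TOKENS"
    TOKEN_LIMIT = "TOKEN_LIMIT"


def pick_finish_reason(
    input_tokens: int,
    generated_tokens: int,
    max_new_tokens: int,
    model_token_limit: int,
    hit_eos: bool,
) -> Optional[FinishReason]:
    """Return why generation stopped, or None if no stop condition is met.

    Assumption: the model limit applies to input + generated tokens,
    while max_new_tokens (the user's setting) applies to generated tokens only.
    """
    if hit_eos:
        return FinishReason.EOS_TOKEN
    # Check the model-defined limit first so it is not masked by the user limit.
    if input_tokens + generated_tokens >= model_token_limit:
        return FinishReason.TOKEN_LIMIT
    if generated_tokens >= max_new_tokens:
        return FinishReason.MAX_TOKENS
    return None


# A long prompt exhausts the model limit well before the user's limit of 100.
reason = pick_finish_reason(
    input_tokens=1000,
    generated_tokens=24,
    max_new_tokens=100,
    model_token_limit=1024,
    hit_eos=False,
)
print(reason)
```

With a 1000-token prompt and a model limit of 1024, only 24 new tokens fit, so the sketch reports TOKEN_LIMIT even though the user asked for up to 100 new tokens.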