Description
Trino fails an entire query if a single input file is bad. For example, if the files are gzipped and one of them has an incomplete final chunk, the whole query fails.
If you have full control of the data being queried and can sanitize it to remove all such files, you can work around this issue, but that is not always possible.
Trino could support this use case with an option to ignore malformed files.
Even better would be an option specific to gzipped files that queries whatever portion of the file can be read.
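To illustrate the gzip-specific option, here is a minimal sketch of best-effort decompression using the JDK's `java.util.zip.GZIPInputStream`. `PartialGzipReader`, `readGzipPrefix`, and `Result` are hypothetical names, not Trino APIs. The sketch assumes a truncated file surfaces as an `EOFException`, which is how `GZIPInputStream` reports an unexpected end of input. Other corruption, such as a bad header or a CRC mismatch, is reported as a `ZipException` and is deliberately left to fail.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of Trino: best-effort gzip decompression.
final class PartialGzipReader {
    /** Decompressed bytes, plus whether the stream ended before it was complete. */
    record Result(byte[] data, boolean truncated) {}

    /**
     * Decompresses as much of a gzip stream as possible. Only truncation
     * (EOFException) is treated as recoverable; ZipException and other
     * IOExceptions propagate to the caller unchanged.
     */
    static Result readGzipPrefix(InputStream compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        try (GZIPInputStream in = new GZIPInputStream(compressed)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return new Result(out.toByteArray(), false);
        } catch (EOFException e) {
            // The input ended early (e.g. an incomplete final chunk).
            // Keep everything that was decompressed before that point.
            return new Result(out.toByteArray(), true);
        }
    }
}
```

A reader built on this could return the rows from the recovered prefix and flag the file as truncated instead of failing the query. Note that the last line of the recovered prefix may itself be cut off partway, so a line-oriented reader would probably need to discard it.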
A smaller feature, but one that would help significantly in these cases, would be to include the specific file causing the problem in the error message: https://github.com/trinodb/trino/blob/421/plugin/trino-hive/src/main/java/io/trino/plugin/hive/GenericHiveRecordCursor.java#L236.
Today it's difficult to locate the file that's causing the problem. And even once it is located, Trino still can't be used to query all the files until every invalid file has been removed or fixed in some way.
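The two requests (skipping malformed files, and naming the offending file in the error) can be sketched together as a loop over files. `MultiFileScan`, `FileReader`, `ScanResult`, and the `ignoreMalformed` flag are all hypothetical; a real change would live in Trino's Hive reader (such as the `GenericHiveRecordCursor` linked above) and would use Trino's own exception types rather than the plain `UncheckedIOException` assumed here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Trino code: scan many files, optionally skipping bad ones.
final class MultiFileScan {
    /** Hypothetical per-file reader; any IOException marks the file as malformed. */
    interface FileReader<T> {
        T read(Path path) throws IOException;
    }

    /** Results from readable files, plus the paths that were skipped. */
    record ScanResult<T>(List<T> results, List<Path> skipped) {}

    static <T> ScanResult<T> scan(List<Path> files, FileReader<T> reader, boolean ignoreMalformed) {
        List<T> results = new ArrayList<>();
        List<Path> skipped = new ArrayList<>();
        for (Path file : files) {
            try {
                results.add(reader.read(file));
            } catch (IOException e) {
                if (ignoreMalformed) {
                    // Skip the bad file but remember it so it can be reported.
                    skipped.add(file);
                } else {
                    // Fail as today, but name the offending file in the message.
                    throw new UncheckedIOException(
                            "Error reading file " + file + ": " + e.getMessage(), e);
                }
            }
        }
        return new ScanResult<>(results, skipped);
    }
}
```

With `ignoreMalformed` off, the query still fails, but the message identifies the file; with it on, the skipped paths are returned so they can be surfaced to the user rather than silently dropped.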