Description
Before Reporting
- I have pulled the latest code of the main branch and run it again, and the bug still exists.
- I have read the README carefully and no error occurred during the installation process. (Otherwise, we recommend that you ask a question using the Question template.)
Search before reporting
OS
Linux
Installation Method
pip
Data-Juicer Version
latest
Python Version
3.10
Describe the bug
The YAML file configures the validator's field_types (copied from the official example config):
`validators:  # validators are a list of validators to be applied when loading a dataset
  # it checks a sample of the dataset for each validator
  # check data_juicer/core/data/data_validator.py for more validator options
  - type: 'required_fields'  # required_fields is a validator to check the required fields in the dataset.
    required_fields:  # required_fields is a list of required fields.
      - "text"
    field_types:  # field_types is a dictionary of field types.
      text: 'str'`
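In data_juicer/core/data/data_validator.py, the value from field_types is read with expected_type = self.field_types.get(field).
As a result, expected_type is a string holding the type name ('str', 'list', ...) rather than an actual type object.
During validation, invalid_types = [type(v) for v in sample_values if v is not None and not isinstance(v, expected_type)] passes expected_type to isinstance() without first converting it to a type, which raises: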
TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union
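The failure can be shown outside Data-Juicer with a minimal sketch. It assumes only what is described above: the YAML value arrives as the string 'str', and the check is the list comprehension quoted from the validator. The variable names `field_types`, `sample_values`, and `error_message` are illustrative and this is not the library's actual code path.

```python
# Standalone illustration; it does not import Data-Juicer.
field_types = {"text": "str"}             # what the parsed YAML yields: the string 'str'
sample_values = ["hello", None, "world"]  # hypothetical sample of the 'text' column

expected_type = field_types.get("text")   # the string 'str', not the built-in type str

try:
    invalid_types = [
        type(v)
        for v in sample_values
        if v is not None and not isinstance(v, expected_type)
    ]
    error_message = None
except TypeError as exc:
    # isinstance() rejects a plain string as its second argument
    error_message = str(exc)

print(error_message)
```

The exact wording of the message depends on the Python version; on 3.10 it matches the error reported here.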
To Reproduce
Setting field_types for the validator in the YAML file is enough to reproduce the error.
Configs
No response
Logs
TypeError: isinstance() arg 2 must be a type, a tuple of types, or a union
Screenshots
No response
Additional
Converting expected_type from its string name to an actual type before the isinstance() check is enough to fix this.
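One possible shape for that conversion is sketched below. It is an assumption about how the fix could look, not the project's actual patch: `TYPE_NAMES`, `resolve_type`, and `find_invalid_types` are hypothetical names, and the set of supported type names is a guess.

```python
# Hypothetical mapping from YAML type names to Python types (assumed set).
TYPE_NAMES = {
    "str": str,
    "int": int,
    "float": float,
    "bool": bool,
    "list": list,
    "dict": dict,
}


def resolve_type(expected_type):
    """Return a real type for either a type object or a type name string."""
    if isinstance(expected_type, type):
        return expected_type  # already a type, e.g. passed in from Python code
    if isinstance(expected_type, str):
        try:
            return TYPE_NAMES[expected_type]
        except KeyError:
            raise ValueError(f"Unsupported field type name: {expected_type!r}")
    raise TypeError(f"field type must be a type or a string, got {type(expected_type)!r}")


def find_invalid_types(sample_values, expected_type):
    """Mirror the check quoted in the report, but with a resolved type."""
    resolved = resolve_type(expected_type)
    return [
        type(v)
        for v in sample_values
        if v is not None and not isinstance(v, resolved)
    ]


print(find_invalid_types(["hello", None, "world"], "str"))
print(find_invalid_types(["hello", 1], "str"))
```

An explicit lookup table is used here instead of `eval()` so that an arbitrary string from a config file is never executed, and an unknown type name fails with a clear message.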