There is no config.json in pre-trained BERT models by DeepPavlov
Created by: yurakuratov
The pre-trained BERT models from http://docs.deeppavlov.ai/en/master/features/pretrained_vectors.html#bert ship with bert_config.json instead of config.json. This leads to errors when these models are used with HuggingFace Transformers:
from transformers import AutoTokenizer
t = AutoTokenizer.from_pretrained("./conversational_cased_L-12_H-768_A-12_v1")
OSError Traceback (most recent call last)
<ipython-input-2-1a3f920b5ef3> in <module>
----> 1 t = AutoTokenizer.from_pretrained("/home/yurakuratov/.deeppavlov/downloads/bert_models/conversational_cased_L-12_H-768_A-12_v1")
~/anaconda3/envs/dp_tf1.15/lib/python3.7/site-packages/transformers/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
184 config = kwargs.pop("config", None)
185 if not isinstance(config, PretrainedConfig):
--> 186 config = AutoConfig.from_pretrained(pretrained_model_name_or_path, **kwargs)
187
188 if "bert-base-japanese" in pretrained_model_name_or_path:
~/anaconda3/envs/dp_tf1.15/lib/python3.7/site-packages/transformers/configuration_auto.py in from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
185 """
186 config_dict, _ = PretrainedConfig.get_config_dict(
--> 187 pretrained_model_name_or_path, pretrained_config_archive_map=ALL_PRETRAINED_CONFIG_ARCHIVE_MAP, **kwargs
188 )
189
~/anaconda3/envs/dp_tf1.15/lib/python3.7/site-packages/transformers/configuration_utils.py in get_config_dict(cls, pretrained_model_name_or_path, pretrained_config_archive_map, **kwargs)
268 )
269 )
--> 270 raise EnvironmentError(msg)
271
272 except json.JSONDecodeError:
OSError: Can't load '/home/yurakuratov/.deeppavlov/downloads/bert_models/conversational_cased_L-12_H-768_A-12_v1'. Make sure that:
- '/home/yurakuratov/.deeppavlov/downloads/bert_models/conversational_cased_L-12_H-768_A-12_v1' is a correct model identifier listed on 'https://huggingface.co/models'
- or '/home/yurakuratov/.deeppavlov/downloads/bert_models/conversational_cased_L-12_H-768_A-12_v1' is the correct path to a directory containing a 'config.json' file
Renaming bert_config.json to config.json should solve the problem.
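As a local workaround until the archives are updated, the rename can be sketched as below. This is a minimal sketch, not part of DeepPavlov or Transformers: the helper name `ensure_config_json` is hypothetical, and it copies the file rather than renaming it, on the assumption that other tooling may still look for bert_config.json.

```python
import shutil
from pathlib import Path


def ensure_config_json(model_dir):
    """Make sure model_dir contains a config.json.

    Hypothetical helper: if config.json is missing, copy bert_config.json
    to config.json and leave the original file in place.
    Returns the path to config.json.
    """
    model_dir = Path(model_dir)
    source = model_dir / "bert_config.json"
    target = model_dir / "config.json"

    # Nothing to do if a config.json is already present.
    if target.exists():
        return target

    if not source.exists():
        raise FileNotFoundError(f"No bert_config.json found in {model_dir}")

    # Copy instead of rename (an assumption, see the note above).
    shutil.copyfile(source, target)
    return target
```

Calling `ensure_config_json("./conversational_cased_L-12_H-768_A-12_v1")` before `AutoTokenizer.from_pretrained(...)` should remove the missing config.json condition reported in the traceback. Whether loading then fully succeeds may still depend on the Transformers version and on the contents of the config file.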