Converting model to pytorch #4712

@rlpatrao

Description

🐛 Bug

Folks, I am trying to convert the BioBERT model to PyTorch. Here is what I have done so far:

1. For the vocab: I am trying to convert the vocab using the solution from #69:
tokenizer = BartTokenizer.from_pretrained('/content/biobert_v1.1_pubmed/vocab.txt')

I get :
OSError: Model name '/content/biobert_v1.1_pubmed' was not found in tokenizers model name list (bart-large, bart-large-mnli, bart-large-cnn, bart-large-xsum). We assumed '/content/biobert_v1.1_pubmed' was a path, a model identifier, or url to a directory containing vocabulary files named ['vocab.json', 'merges.txt'] but couldn't find such vocabulary files at this path or url.

I don’t have the vocab.json, so how do I convert the vocab for the tokenizer?
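For what it's worth, since BioBERT is a BERT-style checkpoint shipping a WordPiece vocab.txt, one likely fix (an assumption on my part, not confirmed in this thread) is to use BertTokenizer, which reads vocab.txt directly, rather than BartTokenizer, which expects the vocab.json/merges.txt pair from the error message. A minimal self-contained sketch with a made-up toy vocab:

```python
from transformers import BertTokenizer

# Hypothetical toy vocab, standing in for biobert_v1.1_pubmed/vocab.txt;
# BertTokenizer consumes a plain vocab.txt with one token per line.
vocab_tokens = ["[PAD]", "[UNK]", "[CLS]", "[SEP]", "hello", "world"]
with open("vocab.txt", "w") as f:
    f.write("\n".join(vocab_tokens))

# For the real model this would presumably be:
# BertTokenizer.from_pretrained('/content/biobert_v1.1_pubmed')
tokenizer = BertTokenizer("vocab.txt")
print(tokenizer.tokenize("hello world"))
```
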

2. For the model: As the out-of-the-box pytorch_pretrained_bert.convert_tf_checkpoint_to_pytorch did not work, I customized it per #2 by adding:

# skip TF checkpoint variables that have no PyTorch counterpart
excluded = ['BERTAdam', '_power', 'global_step']
init_vars = [v for v in init_vars if not any(e in v[0] for e in excluded)]
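For reference, that filter simply drops every checkpoint variable whose name contains one of the excluded substrings (optimizer state like the BERTAdam accumulators and global_step, which have no counterpart in the PyTorch model). A self-contained sketch of the same logic, using made-up (name, shape) pairs in place of the real init_vars:

```python
# Made-up (name, shape) pairs standing in for the TF checkpoint's init_vars
init_vars = [
    ("bert/embeddings/word_embeddings", [28996, 768]),
    ("bert/encoder/layer_0/output/dense/kernel/BERTAdam", [768, 768]),  # optimizer state
    ("global_step", []),
    ("bert/pooler/dense/bias", [768]),
]

# Keep only variables whose name contains none of the excluded substrings
excluded = ['BERTAdam', '_power', 'global_step']
kept = [v for v in init_vars if not any(e in v[0] for e in excluded)]

print([name for name, _ in kept])
# → ['bert/embeddings/word_embeddings', 'bert/pooler/dense/bias']
```
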

With this, the model 'seems' to convert fine. But when I load it using:

model = BartForConditionalGeneration.from_pretrained('path/to/model/biobert_v1.1_pubmed_pytorch.model')

I still get:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte

Can you please help me understand what is going on here?
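One observation that may help diagnose this: 0x80 is the first byte of a binary pickle (which is what torch.save produces), so the UnicodeDecodeError usually means something tried to read the binary weights file as UTF-8 text, e.g. from_pretrained being pointed at the weights file itself instead of at a directory containing config.json plus pytorch_model.bin. That is my reading of the traceback, not something confirmed in this thread. The decode failure itself is easy to reproduce with the stdlib:

```python
import pickle

# torch.save uses pickle under the hood; protocol-2+ pickles
# begin with the PROTO opcode, byte 0x80
blob = pickle.dumps({"weight": [0.1, 0.2]}, protocol=2)
assert blob[0] == 0x80

# Reading that binary blob as UTF-8 text reproduces the reported error
try:
    blob.decode("utf-8")
except UnicodeDecodeError as e:
    print(e)  # 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte
```
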
