
Duplicate sequences in benchmark data #24

@iaposto

Hello. Congrats on the paper and on a very interesting tool!
I am working with the data provided in the documentation, specifically the New AutoPeptideML Benchmarks set that you used for model development.
I noticed that most bioactivity datasets contain duplicate sequences within the training and test sets, and that some peptides appear in both sets. Was this intended?
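For reference, a minimal sketch of the kind of check that shows this is below. It assumes each split's sequences are already loaded as a list of plain strings (the file layout of the benchmark set is not covered here), and the helper names `normalize`, `duplicated`, `split_overlap` and `audit_split` are hypothetical. It only catches exact matches after stripping whitespace and upper-casing; near-identical peptides would need a similarity-based comparison instead.

```python
from collections import Counter


def normalize(seq):
    # Hypothetical normalization: ignore surrounding whitespace and letter case
    return seq.strip().upper()


def duplicated(seqs):
    """Return the normalized sequences that occur more than once in one split."""
    counts = Counter(normalize(s) for s in seqs)
    return {s for s, n in counts.items() if n > 1}


def split_overlap(train, test):
    """Return the normalized sequences present in both splits."""
    return {normalize(s) for s in train} & {normalize(s) for s in test}


def audit_split(train, test):
    # Collect within-split duplicates and cross-split overlap in one report
    return {
        "train_duplicates": duplicated(train),
        "test_duplicates": duplicated(test),
        "train_test_overlap": split_overlap(train, test),
    }


if __name__ == "__main__":
    # Made-up toy sequences, not taken from the benchmark data
    train = ["GLFDIVKKVV", "KWKLFKKIGK", "glfdivkkvv "]
    test = ["KWKLFKKIGK", "FLPLIGRVLS"]
    for key, found in audit_split(train, test).items():
        print(key, sorted(found))
```

On the toy lists this reports `GLFDIVKKVV` as a training duplicate, no test duplicates, and `KWKLFKKIGK` as shared between the two splits.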

Labels: bug (Something isn't working)
