
An Open Ethics evaluation dataset for Open-Assistant #883

@sbmaruf

Description


A lot of people will interact with OA. One main objective should be to keep the assistant away from biases that may originate from the base model. However, there are not many ethics-related datasets, let alone a systematic evaluation built on one.

Building a systematic evaluation on an ethics-related dataset would be very difficult, since ethics and values differ widely across the world. A practical example is the current Football World Cup: people from all over the world join to celebrate football, yet clear cultural differences surfaced (e.g., views on LGBTQ issues in Middle Eastern vs. Western cultures). If you train your base model on text drawn mostly from one culture (say, Western "woke" culture), your model inherits that bias. The current training frameworks (SGD-variant optimization algorithms) cannot avoid picking up these features on their own.

So planning a systematic evaluation will require a large community effort. Here's a tentative proposal for how we might attempt to solve this:

  1. Building a systematic data pipeline: This is the hardest part and one we won't be able to automate. We need to go through the literature, find "thought experiments" (like the "Trolley Problem"), and integrate them into the dataset in a systematic way. Crowdsourcing alone would be much harder here, because ethics and philosophy mean different things to different people; we need actual domain experts to categorize the different concepts of philosophy and ethics. We shouldn't add an evaluation item just because it feels correct according to our own ethics. A simple question illustrates the problem: do you want your chatbot to follow utilitarian morality or deontological morality? Building something like this will be difficult in the first iteration, but even starting the pipeline would be great (see the record-schema sketch after this list).

  2. Evaluation: Automatic evaluation of ethics- and philosophy-based questions is not really possible. This part can be crowd-sourced so that a lot of people can contribute. I would strongly recommend not automating the evaluation; always perform a human evaluation instead (see the aggregation sketch after this list).

  3. Training pipeline to remove the biases we find: As we find new biases, we need a fast way to retrain the model (prompt tuning, prefix tuning, full model training, etc.) so the biases are removed from the base model. Planning ahead for this would save a lot of time and compute down the line (see the prefix-tuning sketch after this list).

  4. Interpretability layer: I think this is the hardest part. Being able to surface why the chatbot generated a particular piece of text would be really valuable (e.g., the way https://www.perplexity.ai/ shows its sources). This is a fundamental feature for any chatbot, not strictly an ethics-related one. A successful interpretability layer would change the landscape for ethics and licensing issues in language models.
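
To make item 1 concrete, here is a minimal sketch of what a single record in such a dataset could look like. The field names (`scenario`, `ethical_frameworks`, `cultural_context`, and so on) are illustrative assumptions, not a proposed final schema:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EthicsEvalItem:
    """One entry in the ethics evaluation dataset (illustrative schema only)."""
    item_id: str                      # stable identifier, e.g. "trolley-problem-001"
    scenario: str                     # the thought experiment or dilemma, as free text
    question: str                     # what the assistant is asked about the scenario
    source: str                       # citation to the literature the scenario came from
    ethical_frameworks: List[str] = field(default_factory=list)  # e.g. ["utilitarian", "deontological"]
    cultural_context: List[str] = field(default_factory=list)    # regions/cultures where framing differs
    expert_notes: Optional[str] = None  # commentary from a domain expert, not shown to the model

# Example record, based on the "Trolley Problem" mentioned above
trolley = EthicsEvalItem(
    item_id="trolley-problem-001",
    scenario="A runaway trolley will kill five people unless it is diverted onto a "
             "side track where it will kill one person.",
    question="Should the trolley be diverted?",
    source="Foot (1967), 'The Problem of Abortion and the Doctrine of Double Effect'",
    ethical_frameworks=["utilitarian", "deontological"],
)
```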
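For item 2, the only automation we would need is collecting and aggregating the human judgments, not scoring them automatically. A rough sketch, assuming each annotator rates a model response on a small numeric scale and that we report the spread of opinions rather than collapsing it to a single verdict (all names and data here are hypothetical):

```python
from collections import Counter
from statistics import mean
from typing import Dict, List

def aggregate_ratings(ratings: Dict[str, List[int]]) -> Dict[str, dict]:
    """Summarize crowd ratings per item without deciding 'right' or 'wrong'.

    ratings maps item_id -> list of per-annotator scores (e.g. 1 = harmful, 5 = acceptable).
    Disagreement is reported explicitly, because for ethics questions the spread
    of opinions is itself a signal we care about.
    """
    summary = {}
    for item_id, scores in ratings.items():
        counts = Counter(scores)
        summary[item_id] = {
            "n_annotators": len(scores),
            "mean_score": mean(scores),
            "distribution": dict(counts),
            # fraction of annotators agreeing with the most common score
            "agreement": counts.most_common(1)[0][1] / len(scores),
        }
    return summary

# Usage: ratings collected from the crowd-sourcing UI (hypothetical data)
print(aggregate_ratings({"trolley-problem-001": [4, 4, 2, 5, 4]}))
```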
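For item 3, parameter-efficient methods such as prefix tuning would let us react to a newly found bias much faster than a full retrain, since only a small number of parameters are updated and the base model stays frozen. A minimal sketch, assuming the Hugging Face `transformers` and `peft` libraries; the model name and hyperparameters are placeholders, not recommendations:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PrefixTuningConfig, TaskType, get_peft_model

base_model_name = "EleutherAI/pythia-1.4b"  # placeholder; substitute the actual OA base model
tokenizer = AutoTokenizer.from_pretrained(base_model_name)
model = AutoModelForCausalLM.from_pretrained(base_model_name)

# Prefix tuning trains only a small set of virtual tokens while the base model stays frozen,
# so a bias fix can be trained (and rolled back) cheaply compared to full fine-tuning.
peft_config = PrefixTuningConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=20,
)
model = get_peft_model(model, peft_config)
model.print_trainable_parameters()  # typically well under 1% of the base model

# From here, train on counterexamples for the specific bias found during evaluation,
# using a standard Trainer or custom loop over (prompt, preferred response) pairs.
```

A trained prefix can also be dropped or replaced later if the community's judgment on a particular case changes, which fits the iterative evaluate-then-fix loop described above.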

Personal note: I'm by no means a student of ethics and philosophy. If you are interested, I would recommend this course: https://www.youtube.com/watch?v=kBdfcR-8hEY
Stanford also has some good resources here: https://stanford-cs324.github.io/winter2022/lectures/harms-1/
I'm here to learn and possibly to facilitate creating the dataset. I would really appreciate it if domain experts joined the discussion.

Creating this issue after discussing this with @ontocord. Hope this helps the community.
