|
| 1 | + |
| 2 | +# Human study about playing text-based games |
| 3 | + |
| 4 | +## What we want to show |
| 5 | + |
| 6 | +- Which priors do we need to remove to make human performance comparable to that of an RL agent? |
| 7 | +- Which priors are more important? |
| 8 | + |
| 9 | +## What priors do humans bring to a new task? |
| 10 | + |
| 11 | +- object permanence |
| 12 | + |
| 13 | +## Standard games |
| 14 | + |
| 15 | +We are going to use games from the First TextWorld Competition. The goal in those games is to find the recipe, read it, gather the ingredients, and process them according to the recipe. |
| 16 | + |
| 17 | +- Nb. rooms: 12 |
| 18 | +- Recipe: 3 ingredients |
| 19 | +- Doors/containers are initially closed |
| 20 | +- Player will have to slice/dice/chop and cook ingredients. |
| 21 | + |
| 22 | +Questions: |
| 23 | + |
| 24 | +- Do we limit the number of moves or simply have a time limit? |
| 25 | + |
| 26 | +## Study methodology |
| 27 | + |
| 28 | +Definitions: |
| 29 | + |
| 30 | +- *entity*: any interactable object, a room or a direction. |
| 31 | +- *fake word*: a sequence of characters that looks like English |
| 32 | + |
| 33 | +Fake words should be consistent across the games played by one annotator. |
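
One way to keep fake words consistent for a given annotator is to derive them deterministically from the annotator and the entity name. The sketch below is only an illustration under stated assumptions: the names `fake_word` and `build_vocabulary`, the consonant-vowel syllable scheme, and the string annotator ID are all hypothetical and not part of the study design.

```python
import hashlib
import random

# Hypothetical alphabet for pronounceable, English-looking syllables.
CONSONANTS = "bdfgklmnprstvz"
VOWELS = "aeiou"


def fake_word(annotator_id: str, entity: str, syllables: int = 3) -> str:
    """Return the same made-up word every time for a given (annotator, entity)."""
    # Hash the pair so the seed does not depend on Python's per-process hash salt.
    digest = hashlib.sha256(f"{annotator_id}:{entity}".encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return "".join(
        rng.choice(CONSONANTS) + rng.choice(VOWELS) for _ in range(syllables)
    )


def build_vocabulary(annotator_id: str, entities) -> dict:
    """Map every entity to a distinct fake word for one annotator."""
    mapping = {}
    used = set()
    # Sort so the result does not depend on the order entities are discovered.
    for entity in sorted(set(entities)):
        attempt = 0
        word = fake_word(annotator_id, entity)
        # Two entities may hash to the same word; retry with a suffixed key.
        while word in used:
            attempt += 1
            word = fake_word(annotator_id, f"{entity}#{attempt}")
        used.add(word)
        mapping[entity] = word
    return mapping
```

Because the mapping depends only on the annotator ID and the entity names, it can be rebuilt for each new game without storing any state between sessions.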
| 34 | + |
| 35 | +Questions: |
| 36 | + |
| 37 | +- how much time per game? |
| 38 | +- how much time overall? |
| 39 | +- how many games per annotator? |
| 40 | + |
| 41 | +## Variations |
| 42 | + |
| 43 | +Here we describe the perturbations/treatments we are going to apply to the standard games. |
| 44 | + |
| 45 | +### Highlighting entities |
| 46 | + |
| 47 | +- *Motivation:* speed up experiments by reducing the cognitive load. |
| 48 | +- *Expectation:* the player will spend less time per screen since parsing the text for relevant information should be easier. |
| 49 | +- *RL setting:* this corresponds to providing the list of entities relevant to the game. |
| 50 | + |
| 51 | +### Replacing all entity names with made-up words |
| 52 | + |
| 53 | +- *Motivation:* remove prior information Humans have about the affordances of an entity based on its name. |
| 54 | +- *Expectation:* the player will have to use more commands to figure out which entity corresponds to a knife when cutting is needed, and which entity corresponds to a heat source when cooking is needed. |
| 55 | +- *RL setting:* this corresponds to not using pre-trained word embeddings for the entities. |
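
As a rough illustration of this treatment, the sketch below substitutes entity names in a piece of game text using a precomputed mapping. It is a minimal sketch, not the study's tooling: the function name `replace_entities`, the example mapping, and the choice of case-insensitive whole-word matching are assumptions made for the example.

```python
import re


def replace_entities(text: str, mapping: dict) -> str:
    """Replace every entity name found in `text` with its made-up word."""
    if not mapping:
        return text
    lowered = {name.lower(): word for name, word in mapping.items()}
    # Try longer names first so "red apple" wins over "apple".
    names = sorted(lowered, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(name) for name in names) + r")\b",
        re.IGNORECASE,
    )
    return pattern.sub(lambda match: lowered[match.group(0).lower()], text)


# Hypothetical mapping for illustration only.
example_mapping = {"knife": "bodaki", "apple": "lumero", "red apple": "tisovu"}
example = replace_entities("Take the Knife and the red apple.", example_mapping)
```

The same substitution would presumably also have to be applied in reverse to the commands the player types, so that the game engine still receives the original entity names.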
| 56 | + |
| 57 | +### Removing context around entities |
| 58 | + |
| 59 | +- *Motivation:* check whether the words surrounding the entities are essential to understanding the game when the entities themselves are provided. |
| 60 | +- *Expectation:* I believe humans can infer a lot from the entities alone. |
| 61 | +- *RL setting:* |
| 62 | + |
| 63 | +### Replacing the names of the commands |
| 64 | + |
| 65 | +*The list of commands will still be visible in the help but without any explanation. |
| 66 | + |
| 67 | +- *Motivation:* remove the priors humans have about the effects of commands. |
| 68 | +- *Expectation:* players will have to spend more time understanding the feedback in order to figure out what happened and what a particular command does. |
| 69 | +- *RL setting:* this corresponds to not using pre-trained word embeddings for the words found in text commands. |
| 70 | + |
| 71 | +### Random order of words in sentence |
| 72 | + |
| 73 | +*I'm not sure about this one. Unless we have a smart way of randomizing the words so it creates ambiguity, I don't see the point in doing that. |
| 74 | + |
| 75 | +- *Motivation:* |
| 76 | +- *Expectation:* |
| 77 | +- *RL setting:* |
| 78 | + |
| 79 | +### Player forced to make a map on paper vs. a mental one |
| 80 | + |
| 81 | +- *Motivation:* |
| 82 | +- *Expectation:* humans with access to paper to draw a map will be more efficient at exploring the world and returning to previously visited rooms. |
| 83 | +- *RL setting:* use explicit memory (with the hope that it will be used to keep track of the room layout). |
| 84 | + |
| 85 | +### Replacing characters or inverting characters within words |
| 86 | + |
| 87 | +*Since we are focusing on word-level language models, I don't think it makes sense to change characters. |
| 88 | + |
| 89 | +## Information to provide/ask the annotators |
| 90 | + |
| 91 | +- Introduction/Tutorial to text-based games? |
| 92 | + |
| 93 | +- The nature of the experiment |
| 94 | + |
| 95 | + - Quest oriented |
| 96 | + |
| 97 | +- What are the limitations: |
| 98 | + |
| 99 | + - Moves (fewer moves give a higher payoff, with a cliff: the first X moves are "free") |
| 100 | + - Time (should fix) |
| 101 | + - Deaths |
| 102 | + |
| 103 | +- What information we are going to capture during playthrough: |
| 104 | + |
| 105 | + - Command |
| 106 | + - Time on each "screen" |
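
The two quantities listed above (the command and the time spent on each "screen") could be captured with a small logger wrapped around the game loop. This is a sketch under assumptions: the class and method names are hypothetical, and a "screen" is taken to mean the text shown between two consecutive commands.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Step:
    command: str              # text the player typed
    seconds_on_screen: float  # time between the screen appearing and the command


@dataclass
class PlaythroughLog:
    # The clock is injectable so the logger can be tested without real delays.
    clock: Callable[[], float] = time.monotonic
    steps: List[Step] = field(default_factory=list)
    _shown_at: float = field(default=0.0, init=False)

    def screen_shown(self) -> None:
        """Call once when the first screen of a game is displayed."""
        self._shown_at = self.clock()

    def command_entered(self, command: str) -> None:
        """Record a command; the next screen is assumed to appear immediately."""
        now = self.clock()
        self.steps.append(Step(command, now - self._shown_at))
        self._shown_at = now

    def total_moves(self) -> int:
        return len(self.steps)
```

A monotonic clock is used rather than wall-clock time so that system clock adjustments during a session do not distort the per-screen durations.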
| 107 | + |
| 108 | +Outro: |
| 109 | + |
| 110 | +- Word association |
| 111 | + |
| 112 | +### From a discussion with Akshay Krishnamurthy |
| 113 | + |
| 114 | +Relation to the ICML paper [Investigating Human Priors for Playing Video Games][1]. We should replace all words with made-up ones. |
| 115 | +Our experiments might be able to show that human priors are more "important" in text-based games than in video games. Conversely, RL agents have to learn more to be able to perform decently on text-based games. |
| 116 | + |
| 117 | +For instance, even if we change the textures, humans can still easily locate their agent since only a small part of the screen might change in response to an action. On top of that, since there are only a few actions a human can try, the mapping phase (action-reaction) is easier and faster to build. |
| 118 | + |
| 119 | +Adam: maybe translating the words to a new language would be better as it would keep some structure. |
| 120 | + |
| 121 | +## TODO |
| 122 | + |
| 123 | +- Ask Emery to test the games to get some sense of the duration. |
| 124 | + |
| 125 | +[1]: https://arxiv.org/pdf/1802.10217.pdf |