diff --git a/utilities/Convert_sqlite2csv-Clean.ipynb b/utilities/Convert_sqlite2csv-Clean.ipynb
new file mode 100644
index 000000000..e7badbafa
--- /dev/null
+++ b/utilities/Convert_sqlite2csv-Clean.ipynb
@@ -0,0 +1,617 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Use metadata to filter out full datasets\n",
+    "\n",
+    "From Felix:\n",
+    "\n",
+    "The id is the value that later ends up in observation, and denotes a unique dataset. The payload designates a given data transmission as either incremental, meaning that it’s a slice of data that’s transmitted during the experiment, or full, meaning that it contains the entire dataset to that point (we transmit the full dataset upon completion of the experiment). For incremental datasets, the slice indicates the first row from the dataset contained in that transmission.\n",
+    "\n",
+    "\n",
+    "So the idea here is to convert the `metadata` which is a series of STRING dictonaries <sup>[1](#myfootnote1)</sup> (those are dictionaries but saved as strings) into a DF. Following that to get the corresponding id or row slice number and select this row from the data list of JSONs.\n",
+    "\n",
+    "\n",
+    " \n",
+    " <a name=\"myfootnote1\">1</a>: For example see the following output:\n",
+    "\n",
+    "```\n",
+    "surveys_df['metadata'][0]\n",
+    "\n",
+    "'{\"slice\":47,\"id\":\"17734ede-5c5d-4c50-a48e-2acfaac1aa68\",\"payload\":\"incremental\"}'\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Follow Felix's R code"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Import data from SQLite file"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import pandas as pd\n",
+    "import sqlite3\n",
+    "import numpy as np\n",
+    "\n",
+    "con = sqlite3.connect(\"./data.sqlite\")\n",
+    "\n",
+    "# Load all the data into a DataFrame\n",
+    "database_d = pd.read_sql_query(\"SELECT * from labjs\", con)\n",
+    "\n",
+    "# Be sure to close the connection\n",
+    "con.close()"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Convert JSON data to table"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Extract metadata"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create a dataframe from the metadata column\n",
+    "database_d_meta = database_d['metadata'].apply(eval).apply(pd.Series).rename(columns={\"id\":\"observation\"})"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>slice</th>\n",
+       "      <th>observation</th>\n",
+       "      <th>payload</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>0</td>\n",
+       "      <td>0a460f84-7833-4d43-b505-96d04bdf76d5</td>\n",
+       "      <td>full</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>0</td>\n",
+       "      <td>0a460f84-7833-4d43-b505-96d04bdf76d5</td>\n",
+       "      <td>incremental</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   slice                           observation      payload\n",
+       "0      0  0a460f84-7833-4d43-b505-96d04bdf76d5         full\n",
+       "1      0  0a460f84-7833-4d43-b505-96d04bdf76d5  incremental"
+      ]
+     },
+     "execution_count": 3,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "database_d_meta"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Merge metadata back"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# merge the two and remove metadata column\n",
+    "database_d = database_d.join(database_d_meta).drop(\"metadata\", axis=1)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>id</th>\n",
+       "      <th>session</th>\n",
+       "      <th>timestamp</th>\n",
+       "      <th>url</th>\n",
+       "      <th>data</th>\n",
+       "      <th>slice</th>\n",
+       "      <th>observation</th>\n",
+       "      <th>payload</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>1</td>\n",
+       "      <td>o5pbac2mcj749kscpssso4f56d</td>\n",
+       "      <td>2021-06-08T09:32:43+00:00</td>\n",
+       "      <td>http://face-experiment.weizmann.ac.il/3/</td>\n",
+       "      <td>[{\"url\":[],\"meta\":{\"labjs_version\":\"20.2.2\",\"l...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0a460f84-7833-4d43-b505-96d04bdf76d5</td>\n",
+       "      <td>full</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2</td>\n",
+       "      <td>o5pbac2mcj749kscpssso4f56d</td>\n",
+       "      <td>2021-06-08T09:32:43+00:00</td>\n",
+       "      <td>http://face-experiment.weizmann.ac.il/3/</td>\n",
+       "      <td>[{\"url\":[],\"meta\":{\"labjs_version\":\"20.2.2\",\"l...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0a460f84-7833-4d43-b505-96d04bdf76d5</td>\n",
+       "      <td>incremental</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   id                     session                  timestamp  \\\n",
+       "0   1  o5pbac2mcj749kscpssso4f56d  2021-06-08T09:32:43+00:00   \n",
+       "1   2  o5pbac2mcj749kscpssso4f56d  2021-06-08T09:32:43+00:00   \n",
+       "\n",
+       "                                        url  \\\n",
+       "0  http://face-experiment.weizmann.ac.il/3/   \n",
+       "1  http://face-experiment.weizmann.ac.il/3/   \n",
+       "\n",
+       "                                                data  slice  \\\n",
+       "0  [{\"url\":[],\"meta\":{\"labjs_version\":\"20.2.2\",\"l...      0   \n",
+       "1  [{\"url\":[],\"meta\":{\"labjs_version\":\"20.2.2\",\"l...      0   \n",
+       "\n",
+       "                            observation      payload  \n",
+       "0  0a460f84-7833-4d43-b505-96d04bdf76d5         full  \n",
+       "1  0a460f84-7833-4d43-b505-96d04bdf76d5  incremental  "
+      ]
+     },
+     "execution_count": 5,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "database_d"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Shorten the random ids"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To check how many letters in the ID we need to have as short as possible but unique identifiers, it goes over the ID and observation column and checks if number of unique parts of the string equals to number of the unique whole tags.\n",
+    "\n",
+    "We are looking for a shortest sequence -- what is the minimal number of characters which will preserve how many unique inputs there is."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "5\n"
+     ]
+    }
+   ],
+   "source": [
+    "for i in range(5,37):\n",
+    "    if (database_d.session.str.slice(stop=i).unique().size == database_d.session.unique().size ) \\\n",
+    "    and (database_d.observation.str.slice(stop=i).unique().size==database_d.observation.unique().size):\n",
+    "        print(i)        \n",
+    "        # if found the minimal code, replace the original session and observation with this shorted code\n",
+    "        database_d.session = database_d.session.str.slice(stop=i)\n",
+    "        database_d.observation = database_d.observation.str.slice(stop=i)\n",
+    "        break\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 7,
+   "metadata": {
+    "scrolled": false
+   },
+   "outputs": [
+    {
+     "data": {
+      "text/html": [
+       "<div>\n",
+       "<style scoped>\n",
+       "    .dataframe tbody tr th:only-of-type {\n",
+       "        vertical-align: middle;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe tbody tr th {\n",
+       "        vertical-align: top;\n",
+       "    }\n",
+       "\n",
+       "    .dataframe thead th {\n",
+       "        text-align: right;\n",
+       "    }\n",
+       "</style>\n",
+       "<table border=\"1\" class=\"dataframe\">\n",
+       "  <thead>\n",
+       "    <tr style=\"text-align: right;\">\n",
+       "      <th></th>\n",
+       "      <th>id</th>\n",
+       "      <th>session</th>\n",
+       "      <th>timestamp</th>\n",
+       "      <th>url</th>\n",
+       "      <th>data</th>\n",
+       "      <th>slice</th>\n",
+       "      <th>observation</th>\n",
+       "      <th>payload</th>\n",
+       "    </tr>\n",
+       "  </thead>\n",
+       "  <tbody>\n",
+       "    <tr>\n",
+       "      <th>0</th>\n",
+       "      <td>1</td>\n",
+       "      <td>o5pba</td>\n",
+       "      <td>2021-06-08T09:32:43+00:00</td>\n",
+       "      <td>http://face-experiment.weizmann.ac.il/3/</td>\n",
+       "      <td>[{\"url\":[],\"meta\":{\"labjs_version\":\"20.2.2\",\"l...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0a460</td>\n",
+       "      <td>full</td>\n",
+       "    </tr>\n",
+       "    <tr>\n",
+       "      <th>1</th>\n",
+       "      <td>2</td>\n",
+       "      <td>o5pba</td>\n",
+       "      <td>2021-06-08T09:32:43+00:00</td>\n",
+       "      <td>http://face-experiment.weizmann.ac.il/3/</td>\n",
+       "      <td>[{\"url\":[],\"meta\":{\"labjs_version\":\"20.2.2\",\"l...</td>\n",
+       "      <td>0</td>\n",
+       "      <td>0a460</td>\n",
+       "      <td>incremental</td>\n",
+       "    </tr>\n",
+       "  </tbody>\n",
+       "</table>\n",
+       "</div>"
+      ],
+      "text/plain": [
+       "   id session                  timestamp  \\\n",
+       "0   1   o5pba  2021-06-08T09:32:43+00:00   \n",
+       "1   2   o5pba  2021-06-08T09:32:43+00:00   \n",
+       "\n",
+       "                                        url  \\\n",
+       "0  http://face-experiment.weizmann.ac.il/3/   \n",
+       "1  http://face-experiment.weizmann.ac.il/3/   \n",
+       "\n",
+       "                                                data  slice observation  \\\n",
+       "0  [{\"url\":[],\"meta\":{\"labjs_version\":\"20.2.2\",\"l...      0       0a460   \n",
+       "1  [{\"url\":[],\"meta\":{\"labjs_version\":\"20.2.2\",\"l...      0       0a460   \n",
+       "\n",
+       "       payload  \n",
+       "0         full  \n",
+       "1  incremental  "
+      ]
+     },
+     "execution_count": 7,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "database_d"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Prepare to extract JSON data"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Extract complete datasets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "Int64Index([0], dtype='int64')\n"
+     ]
+    },
+    {
+     "data": {
+      "text/plain": [
+       "1"
+      ]
+     },
+     "execution_count": 8,
+     "metadata": {},
+     "output_type": "execute_result"
+    }
+   ],
+   "source": [
+    "completed_idx = database_d.loc[database_d.payload == 'full'].index\n",
+    "print(completed_idx)\n",
+    "completed_idx.size"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# if no experiment ended on full, skip this part\n",
+    "if not completed_idx.empty:\n",
+    "    # get all the metadata which end up on full AND add the observation column to each register\n",
+    "    database_d_full = pd.concat([pd.read_json(database_d['data'].iloc[i], dtype={'sender_id': object}).join(\n",
+    "        pd.DataFrame({'observation': [database_d.loc[i].observation]})) for i in completed_idx])\n",
+    "\n",
+    "    # unpack the first JSON meta layer\n",
+    "    temp = pd.DataFrame.from_dict(database_d_full.meta.apply(pd.Series)).drop(0, axis=1)\n",
+    "\n",
+    "    # Unpack the inner meta layer, add prefix\n",
+    "    temp = pd.concat([temp, temp['labjs_build'].apply(pd.Series).drop(0, axis=1)],\n",
+    "                     axis=1).drop('labjs_build', axis=1).add_prefix(\"labjs_build_\")\n",
+    "\n",
+    "    # concat with the original df and drop id and data columns and rename the empty col to be called x as in the R version\n",
+    "    # also add a prefix meta_ to all the meta columns\n",
+    "    database_d_full = pd.concat([database_d_full, temp.add_prefix(\"meta_\")], axis=1).rename(columns={\"\": \"x\"}).drop(\n",
+    "        \"meta\", axis=1)\n",
+    "\n",
+    "    # add observation to all columns\n",
+    "    database_d_full.observation.fillna(method=\"ffill\", inplace=True)\n",
+    "    \n",
+    "else:\n",
+    "    database_d_full=pd.DataFrame({\"observation\":[]})\n",
+    "    print(\"No experiment ended on full!\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Extract incremental datasets\n",
+    "\n",
+    "do the exact same thing but with the refisters which did not end up on full and are only partial"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 10,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "temp = None\n",
+    "# get all the metadata which does not end up on full AND add the observation column to each register\n",
+    "# dtype={'sender_id': object} is necessary for https://github.com/pandas-dev/pandas/issues/41408, https://github.com/pandas-dev/pandas/issues/17869\n",
+    "database_d_incremental = pd.concat([pd.read_json(database_d['data'].iloc[i], dtype={'sender_id': object}).join(\n",
+    "    pd.DataFrame({'observation': database_d.loc[i].observation}, index=[0])) for i in \\\n",
+    "                                    database_d.loc[database_d.payload.isin(['latest', 'incremental'])].index])\n",
+    "\n",
+    "# unpack the first meta layer\n",
+    "temp = pd.DataFrame.from_dict(database_d_incremental.meta.apply(pd.Series)).drop(0, axis=1)\n",
+    "\n",
+    "# Unpack the inner layer, add prefix\n",
+    "temp = pd.concat([temp, temp['labjs_build'].apply(pd.Series).drop(0, axis=1)],\n",
+    "                 axis=1).drop('labjs_build', axis=1).add_prefix(\"labjs_build_\")\n",
+    "\n",
+    "# concat with the original df and drop id and data columns and rename the empty col to be called x as in the R version\n",
+    "# also add a prefix meta_ to all the meta columns\n",
+    "database_d_incremental = pd.concat([database_d_incremental, temp.add_prefix(\"meta_\")], axis=1).rename(\n",
+    "    columns={\"\": \"x\"}).drop(\"meta\", axis=1)\n",
+    "\n",
+    "\n",
+    "# add observations to all rows\n",
+    "database_d_incremental.observation.fillna(method=\"ffill\", inplace=True)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Merge datasets"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 11,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "database_d_output = pd.concat([database_d_full, \\\n",
+    "               database_d_incremental[~database_d_incremental.observation.isin(database_d_full.observation)]])"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# If there is a userID column defined, show how many and which users we have\n",
+    "if \"userID\" in database_d_output.columns:\n",
+    "    print(f\"There are {database_d_output.userID.unique().shape} participants with the following names:\")\n",
+    "    print(database_d_output.userID.unique())"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Postprocessing"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Fill nans"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if \"code\" in database_d_full.columns:\n",
+    "    database_d_full['code'].fillna(method='ffill')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Remove sensitive data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 14,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if \"Email\" in database_d_full.columns:\n",
+    "    database_d_full.columns.drop(\"Email\", axis=1)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Save dataset as csv"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 15,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "name = 'Felix'\n",
+    "database_d_full.to_csv(name+'_database_full.csv')\n",
+    "database_d_incremental.to_csv(name+'_database_incremental.csv')"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### End of Felix's"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.8.5"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}

	slice	observation	payload
0	0	0a460f84-7833-4d43-b505-96d04bdf76d5	full
1	0	0a460f84-7833-4d43-b505-96d04bdf76d5	incremental
	id	session	timestamp	url	data	slice	observation	payload
0	1	o5pbac2mcj749kscpssso4f56d	2021-06-08T09:32:43+00:00	http://face-experiment.weizmann.ac.il/3/	[{\"url\":[],\"meta\":{\"labjs_version\":\"20.2.2\",\"l...	0	0a460f84-7833-4d43-b505-96d04bdf76d5	full
1	2	o5pbac2mcj749kscpssso4f56d	2021-06-08T09:32:43+00:00	http://face-experiment.weizmann.ac.il/3/	[{\"url\":[],\"meta\":{\"labjs_version\":\"20.2.2\",\"l...	0	0a460f84-7833-4d43-b505-96d04bdf76d5	incremental
	id	session	timestamp	url	data	slice	observation	payload
0	1	o5pba	2021-06-08T09:32:43+00:00	http://face-experiment.weizmann.ac.il/3/	[{\"url\":[],\"meta\":{\"labjs_version\":\"20.2.2\",\"l...	0	0a460	full
1	2	o5pba	2021-06-08T09:32:43+00:00	http://face-experiment.weizmann.ac.il/3/	[{\"url\":[],\"meta\":{\"labjs_version\":\"20.2.2\",\"l...	0	0a460	incremental