From 3e3b1850572bc9383a8f68bd4fa1f88f9590f616 Mon Sep 17 00:00:00 2001 From: Chelsea Lin Date: Mon, 5 Aug 2024 23:25:32 +0000 Subject: [PATCH 1/3] docs: create sample notebook to manipulate struct and array data --- .../dataframes/struct_and_array_dtypes.ipynb | 658 ++++++++++++++++++ 1 file changed, 658 insertions(+) create mode 100644 notebooks/dataframes/struct_and_array_dtypes.ipynb diff --git a/notebooks/dataframes/struct_and_array_dtypes.ipynb b/notebooks/dataframes/struct_and_array_dtypes.ipynb new file mode 100644 index 0000000000..b056e78bd3 --- /dev/null +++ b/notebooks/dataframes/struct_and_array_dtypes.ipynb @@ -0,0 +1,658 @@ +{ + "cells": [ + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "# Copyright 2023 Google LLC\n", + "#\n", + "# Licensed under the Apache License, Version 2.0 (the \"License\");\n", + "# you may not use this file except in compliance with the License.\n", + "# You may obtain a copy of the License at\n", + "#\n", + "# https://www.apache.org/licenses/LICENSE-2.0\n", + "#\n", + "# Unless required by applicable law or agreed to in writing, software\n", + "# distributed under the License is distributed on an \"AS IS\" BASIS,\n", + "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n", + "# See the License for the specific language governing permissions and\n", + "# limitations under the License." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# A Guide to Array and Struct Data Types in BigQuery DataFrames" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Set up your environment\n", + "\n", + "Please refer to the notebooks in the `getting_started` folder for instructions on setting up your environment. 
Once your environment is ready, run the following code to import the necessary packages for working with BigFrames arrays:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "import bigframes.pandas as bpd\n", + "import bigframes.bigquery as bbq\n", + "import pyarrow as pa" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [], + "source": [ + "REGION = \"US\" # @param {type: \"string\"}\n", + "bpd.options.display.progress_bar = None\n", + "bpd.options.bigquery.location = REGION\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Array Data Types\n", + "\n", + "In BigQuery, an [array](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_type), also referred to as a `repeated` column, is an ordered list of zero or more non-array elements. These elements must be of the same data type, and arrays cannot contain other arrays. Furthermore, query results cannot include arrays with `NULL` elements.\n", + "\n", + "BigFrames DataFrames, inheriting these properties, map BigQuery array types to `pandas.ArrowDtype(pa.list_())`. This section provides code examples demonstrating how to effectively work with array columns within BigFrames DataFrames." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create DataFrames with array columns \n", + "\n", + "Let's create a sample BigFrames DataFrame where the `Scores` column holds array data of type `list[pyarrow]`:" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameScores
0Alice[95 88 92]
1Bob[78 81]
2Charlie[ 82 89 94 100]
\n", + "

3 rows × 2 columns

\n", + "
[3 rows x 2 columns in total]" + ], + "text/plain": [ + " Name Scores\n", + "0 Alice [95 88 92]\n", + "1 Bob [78 81]\n", + "2 Charlie [ 82 89 94 100]\n", + "\n", + "[3 rows x 2 columns]" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df = bpd.DataFrame({\n", + " 'Name': ['Alice', 'Bob', 'Charlie'],\n", + " 'Scores': [[95, 88, 92], [78, 81], [82, 89, 94, 100]],\n", + "})\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Name string[pyarrow]\n", + "Scores list[pyarrow]\n", + "dtype: object" + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## CRUD operations for array data\n", + "\n", + "While Pandas offers vectorized operations and lambda expressions to manipulate array data, BigFrames leverages BigQuery's computational power. BigFrames introduces the [`bigframes.bigquery`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery) package to provide access to a variety of native BigQuery array operations, such as [array_agg](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery#bigframes_bigquery_array_agg), [array_length](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery#bigframes_bigquery_array_length), and others. This module allows you to seamlessly perform create, read, update, and delete (CRUD) operations on array data within your BigFrames DataFrames.\n", + "\n", + "Let's delve into how you can utilize these functions to effectively manipulate array data in BigFrames." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 3\n", + "1 2\n", + "2 4\n", + "Name: Scores, dtype: Int64" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Find the length in each array\n", + "bbq.array_length(df['Scores'])" + ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 95\n", + "0 88\n", + "0 92\n", + "1 78\n", + "1 81\n", + "2 82\n", + "2 89\n", + "2 94\n", + "2 100\n", + "Name: Scores, dtype: Int64" + ] + }, + "execution_count": 7, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Explode array elements into rows\n", + "scores = df['Scores'].explode()\n", + "scores" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 95.238095\n", + "0 88.571429\n", + "0 92.380952\n", + "1 79.047619\n", + "1 81.904762\n", + "2 82.857143\n", + "2 89.52381\n", + "2 94.285714\n", + "2 100.0\n", + "Name: Scores, dtype: Float64" + ] + }, + "execution_count": 8, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Adjuste the scores\n", + "adj_scores = (scores + 5) / 105.0 * 100.0\n", + "adj_scores" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 [95.23809524 88.57142857 92.38095238]\n", + "1 [79.04761905 81.9047619 ]\n", + "2 [ 82.85714286 89.52380952 94.28571429 100. 
...\n", + "Name: Scores, dtype: list[pyarrow]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Aggregate adjusted scores back into arrays\n", + "adj_scores_arr = bbq.array_agg(adj_scores.groupby(level=0))\n", + "adj_scores_arr" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameScoresNewScores
0Alice[95 88 92][95.23809524 88.57142857 92.38095238]
1Bob[78 81][79.04761905 81.9047619 ]
2Charlie[ 82 89 94 100][ 82.85714286 89.52380952 94.28571429 100. ...
\n", + "

3 rows × 3 columns

\n", + "
[3 rows x 3 columns in total]" + ], + "text/plain": [ + " Name Scores \\\n", + "0 Alice [95 88 92] \n", + "1 Bob [78 81] \n", + "2 Charlie [ 82 89 94 100] \n", + "\n", + " NewScores \n", + "0 [95.23809524 88.57142857 92.38095238] \n", + "1 [79.04761905 81.9047619 ] \n", + "2 [ 82.85714286 89.52380952 94.28571429 100. ... \n", + "\n", + "[3 rows x 3 columns]" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Incorporate adjusted scores into the DataFrame\n", + "df['NewScores'] = adj_scores_arr\n", + "df" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Struct Data Types\n", + "\n", + "In BigQuery, an [struct](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct_type) (also known as a `record`) is a collection of ordered fields, each with a defined data type (required) and an optional field name. BigFrames maps BigQuery struct types to the Pandas equivalent, `pandas.ArrowDtype(pa.struct())`. In this section, we'll explore practical code examples illustrating how to work with struct columns within your BigFrames DataFrames." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Create DataFrames with struct columns \n", + "\n", + "Let's create a sample BigFrames DataFrame where the `Address` column holds struct data of type `struct[pyarrow]`:" + ] + }, + { + "cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "name": "stderr", + "output_type": "stream", + "text": [ + "/usr/local/google/home/chelsealin/src/bigframes2/venv/lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py:537: UserWarning: Pyarrow could not determine the type of columns: bigframes_unnamed_index.\n", + " warnings.warn(\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
NameAddress
0Alice{'City': 'New York', 'State': 'NY'}
1Bob{'City': 'San Francisco', 'State': 'CA'}
2Charlie{'City': 'Seattle', 'State': 'WA'}
\n", + "

3 rows × 2 columns

\n", + "
[3 rows x 2 columns in total]" + ], + "text/plain": [ + " Name Address\n", + "0 Alice {'City': 'New York', 'State': 'NY'}\n", + "1 Bob {'City': 'San Francisco', 'State': 'CA'}\n", + "2 Charlie {'City': 'Seattle', 'State': 'WA'}\n", + "\n", + "[3 rows x 2 columns]" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "names = bpd.Series(['Alice', 'Bob', 'Charlie'])\n", + "address = bpd.Series(\n", + " [\n", + " {'City': 'New York', 'State': 'NY'},\n", + " {'City': 'San Francisco', 'State': 'CA'},\n", + " {'City': 'Seattle', 'State': 'WA'}\n", + " ],\n", + " dtype=bpd.ArrowDtype(pa.struct(\n", + " [('City', pa.string()), ('State', pa.string())]\n", + " )))\n", + "\n", + "df = bpd.DataFrame({'Name': names, 'Address': address})\n", + "df" + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Name string[pyarrow]\n", + "Address struct[pyarrow]\n", + "dtype: object" + ] + }, + "execution_count": 12, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## CRUD operations for struct data\n", + "\n", + "Similar to Pandas, BigFrames provides a [`StructAccessor`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.operations.structs.StructAccessor) to streamline the manipulation of struct data. Let's explore how you can utilize this feature for efficient CRUD operations on your nested struct columns." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "City string[pyarrow]\n", + "State string[pyarrow]\n", + "dtype: object" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Return the dtype object of each child field of the struct.\n", + "df['Address'].struct.dtypes()" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 New York\n", + "1 San Francisco\n", + "2 Seattle\n", + "Name: City, dtype: string" + ] + }, + "execution_count": 14, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Extract a child field as a Series\n", + "city = df['Address'].struct.field(\"City\")\n", + "city" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
CityState
0New YorkNY
1San FranciscoCA
2SeattleWA
\n", + "

3 rows × 2 columns

\n", + "
[3 rows x 2 columns in total]" + ], + "text/plain": [ + " City State\n", + "0 New York NY\n", + "1 San Francisco CA\n", + "2 Seattle WA\n", + "\n", + "[3 rows x 2 columns]" + ] + }, + "execution_count": 15, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Extract all child fields of a struct as a DataFrame.\n", + "address_df = df['Address'].struct.explode()\n", + "address_df" + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "venv", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.12.1" + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} From f3ce5310a47a235deff8016aeb75d1c190a1202d Mon Sep 17 00:00:00 2001 From: Chelsea Lin Date: Wed, 7 Aug 2024 20:27:38 +0000 Subject: [PATCH 2/3] typo --- notebooks/dataframes/struct_and_array_dtypes.ipynb | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/notebooks/dataframes/struct_and_array_dtypes.ipynb b/notebooks/dataframes/struct_and_array_dtypes.ipynb index b056e78bd3..3ba07d1b88 100644 --- a/notebooks/dataframes/struct_and_array_dtypes.ipynb +++ b/notebooks/dataframes/struct_and_array_dtypes.ipynb @@ -266,7 +266,7 @@ } ], "source": [ - "# Adjuste the scores\n", + "# Adjust the scores\n", "adj_scores = (scores + 5) / 105.0 * 100.0\n", "adj_scores" ] @@ -382,7 +382,7 @@ "source": [ "# Struct Data Types\n", "\n", - "In BigQuery, an [struct](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct_type) (also known as a `record`) is a collection of ordered fields, each with a defined data type (required) and an optional field name. BigFrames maps BigQuery struct types to the Pandas equivalent, `pandas.ArrowDtype(pa.struct())`. 
In this section, we'll explore practical code examples illustrating how to work with struct columns within your BigFrames DataFrames." + "In BigQuery, a [struct](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct_type) (also known as a `record`) is a collection of ordered fields, each with a defined data type (required) and an optional field name. BigFrames maps BigQuery struct types to the Pandas equivalent, `pandas.ArrowDtype(pa.struct())`. In this section, we'll explore practical code examples illustrating how to work with struct columns within your BigFrames DataFrames." ] }, { From e172cf8cb0a8d236da5dee90c805548760a74ec0 Mon Sep 17 00:00:00 2001 From: Chelsea Lin Date: Tue, 13 Aug 2024 17:55:01 +0000 Subject: [PATCH 3/3] address comments --- .../dataframes/struct_and_array_dtypes.ipynb | 92 +++++++++---------- 1 file changed, 45 insertions(+), 47 deletions(-) diff --git a/notebooks/dataframes/struct_and_array_dtypes.ipynb b/notebooks/dataframes/struct_and_array_dtypes.ipynb index 3ba07d1b88..3bcdaf40f7 100644 --- a/notebooks/dataframes/struct_and_array_dtypes.ipynb +++ b/notebooks/dataframes/struct_and_array_dtypes.ipynb @@ -34,12 +34,12 @@ "source": [ "# Set up your environment\n", "\n", - "Please refer to the notebooks in the `getting_started` folder for instructions on setting up your environment. Once your environment is ready, run the following code to import the necessary packages for working with BigFrames arrays:" + "To get started, follow the instructions in the notebooks within the `getting_started` folder to set up your environment. 
Once your environment is ready, you can import the necessary packages by running the following code:" ] }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 2, "metadata": {}, "outputs": [], "source": [ @@ -50,13 +50,14 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "REGION = \"US\" # @param {type: \"string\"}\n", + "\n", "bpd.options.display.progress_bar = None\n", - "bpd.options.bigquery.location = REGION\n" + "bpd.options.bigquery.location = REGION" ] }, { @@ -65,18 +66,18 @@ "source": [ "# Array Data Types\n", "\n", - "In BigQuery, an [array](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_type), also referred to as a `repeated` column, is an ordered list of zero or more non-array elements. These elements must be of the same data type, and arrays cannot contain other arrays. Furthermore, query results cannot include arrays with `NULL` elements.\n", + "In BigQuery, an [array](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#array_type) (also called a repeated column) is an ordered list of zero or more elements of the same data type. Arrays cannot contain other arrays or `NULL` elements.\n", "\n", - "BigFrames DataFrames, inheriting these properties, map BigQuery array types to `pandas.ArrowDtype(pa.list_())`. This section provides code examples demonstrating how to effectively work with array columns within BigFrames DataFrames." + "BigQuery DataFrames map BigQuery array types to `pandas.ArrowDtype(pa.list_())`. The following code examples illustrate how to work with array columns in BigQuery DataFrames." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Create DataFrames with array columns \n", + "## Create DataFrames with array columns\n", "\n", - "Let's create a sample BigFrames DataFrame where the `Scores` column holds array data of type `list[pyarrow]`:" + "Create a DataFrame in BigQuery DataFrames from local sample data. Use a list of lists to create a column with the `list[pyarrow]` dtype, which corresponds to the `ARRAY` type in BigQuery." ] }, { @@ -178,11 +179,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## CRUD operations for array data\n", - "\n", - "While Pandas offers vectorized operations and lambda expressions to manipulate array data, BigFrames leverages BigQuery's computational power. BigFrames introduces the [`bigframes.bigquery`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery) package to provide access to a variety of native BigQuery array operations, such as [array_agg](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery#bigframes_bigquery_array_agg), [array_length](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery#bigframes_bigquery_array_length), and others. This module allows you to seamlessly perform create, read, update, and delete (CRUD) operations on array data within your BigFrames DataFrames.\n", + "## Operate on array data\n", "\n", - "Let's delve into how you can utilize these functions to effectively manipulate array data in BigFrames." + "While pandas offers vectorized operations and lambda expressions for array manipulation, BigQuery DataFrames leverages the computational power of BigQuery itself. 
You can access a variety of native BigQuery array operations, such as [`array_agg`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery#bigframes_bigquery_array_agg) and [`array_length`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery#bigframes_bigquery_array_length), through the [`bigframes.bigquery`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.bigquery) package (abbreviated as `bbq` in the following code samples)." ] }, { @@ -205,7 +204,7 @@ } ], "source": [ - "# Find the length in each array\n", + "# Find the length in each array.\n", "bbq.array_length(df['Scores'])" ] }, @@ -235,7 +234,9 @@ } ], "source": [ - "# Explode array elements into rows\n", + "# Transforms array elements into individual rows, preserving original order when in ordering\n", + "# mode. If an array has multiple elements, exploded rows are ordered by the element's index\n", + "# within its original array.\n", "scores = df['Scores'].explode()\n", "scores" ] @@ -248,15 +249,15 @@ { "data": { "text/plain": [ - "0 95.238095\n", - "0 88.571429\n", - "0 92.380952\n", - "1 79.047619\n", - "1 81.904762\n", - "2 82.857143\n", - "2 89.52381\n", - "2 94.285714\n", - "2 100.0\n", + "0 100.0\n", + "0 93.0\n", + "0 97.0\n", + "1 83.0\n", + "1 86.0\n", + "2 87.0\n", + "2 94.0\n", + "2 99.0\n", + "2 105.0\n", "Name: Scores, dtype: Float64" ] }, @@ -266,8 +267,8 @@ } ], "source": [ - "# Adjust the scores\n", - "adj_scores = (scores + 5) / 105.0 * 100.0\n", + "# Adjust the scores.\n", + "adj_scores = scores + 5.0\n", "adj_scores" ] }, @@ -279,9 +280,9 @@ { "data": { "text/plain": [ - "0 [95.23809524 88.57142857 92.38095238]\n", - "1 [79.04761905 81.9047619 ]\n", - "2 [ 82.85714286 89.52380952 94.28571429 100. ...\n", + "0 [100. 93. 97.]\n", + "1 [83. 86.]\n", + "2 [ 87. 94. 99. 
105.]\n", "Name: Scores, dtype: list[pyarrow]" ] }, @@ -291,7 +292,7 @@ } ], "source": [ - "# Aggregate adjusted scores back into arrays\n", + "# Aggregate adjusted scores back into arrays.\n", "adj_scores_arr = bbq.array_agg(adj_scores.groupby(level=0))\n", "adj_scores_arr" ] @@ -332,19 +333,19 @@ " 0\n", " Alice\n", " [95 88 92]\n", - " [95.23809524 88.57142857 92.38095238]\n", + " [100. 93. 97.]\n", " \n", " \n", " 1\n", " Bob\n", " [78 81]\n", - " [79.04761905 81.9047619 ]\n", + " [83. 86.]\n", " \n", " \n", " 2\n", " Charlie\n", " [ 82 89 94 100]\n", - " [ 82.85714286 89.52380952 94.28571429 100. ...\n", + " [ 87. 94. 99. 105.]\n", " \n", " \n", "\n", @@ -352,15 +353,10 @@ "[3 rows x 3 columns in total]" ], "text/plain": [ - " Name Scores \\\n", - "0 Alice [95 88 92] \n", - "1 Bob [78 81] \n", - "2 Charlie [ 82 89 94 100] \n", - "\n", - " NewScores \n", - "0 [95.23809524 88.57142857 92.38095238] \n", - "1 [79.04761905 81.9047619 ] \n", - "2 [ 82.85714286 89.52380952 94.28571429 100. ... \n", + " Name Scores NewScores\n", + "0 Alice [95 88 92] [100. 93. 97.]\n", + "1 Bob [78 81] [83. 86.]\n", + "2 Charlie [ 82 89 94 100] [ 87. 94. 99. 105.]\n", "\n", "[3 rows x 3 columns]" ] @@ -371,7 +367,9 @@ } ], "source": [ - "# Incorporate adjusted scores into the DataFrame\n", + "# Add adjusted scores into the DataFrame. This operation requires an implicit join \n", + "# between the two tables, necessitating a unique index in the DataFrame (guaranteed \n", + "# in the default ordering and index mode).\n", "df['NewScores'] = adj_scores_arr\n", "df" ] @@ -382,7 +380,7 @@ "source": [ "# Struct Data Types\n", "\n", - "In BigQuery, a [struct](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct_type) (also known as a `record`) is a collection of ordered fields, each with a defined data type (required) and an optional field name. BigFrames maps BigQuery struct types to the Pandas equivalent, `pandas.ArrowDtype(pa.struct())`. 
In this section, we'll explore practical code examples illustrating how to work with struct columns within your BigFrames DataFrames." + "In BigQuery, a [struct](https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#struct_type) (also known as a `record`) is a collection of ordered fields, each with a defined data type (required) and an optional field name. BigQuery DataFrames maps BigQuery struct types to the pandas equivalent, `pandas.ArrowDtype(pa.struct())`. This section provides practical code examples illustrating how to use struct columns with BigQuery DataFrames." ] }, { @@ -391,7 +389,7 @@ "source": [ "## Create DataFrames with struct columns \n", "\n", - "Let's create a sample BigFrames DataFrame where the `Address` column holds struct data of type `struct[pyarrow]`:" + "Create a DataFrame with an `Address` struct column by using dictionaries for the data and setting the dtype to `struct[pyarrow]`." ] }, { @@ -403,7 +401,7 @@ "name": "stderr", "output_type": "stream", "text": [ - "/usr/local/google/home/chelsealin/src/bigframes2/venv/lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py:537: UserWarning: Pyarrow could not determine the type of columns: bigframes_unnamed_index.\n", + "/usr/local/google/home/chelsealin/src/bigframes/venv/lib/python3.12/site-packages/google/cloud/bigquery/_pandas_helpers.py:570: UserWarning: Pyarrow could not determine the type of columns: bigframes_unnamed_index.\n", " warnings.warn(\n" ] }, @@ -509,9 +507,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## CRUD operations for struct data\n", + "## Operate on struct data\n", "\n", - "Similar to Pandas, BigFrames provides a [`StructAccessor`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.operations.structs.StructAccessor) to streamline the manipulation of struct data. Let's explore how you can utilize this feature for efficient CRUD operations on your nested struct columns." 
+ "Similar to pandas, BigQuery DataFrames provides a [`StructAccessor`](https://cloud.google.com/python/docs/reference/bigframes/latest/bigframes.operations.structs.StructAccessor). Use the methods provided in this accessor to manipulate struct data." ] }, {
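
The array and struct operations this patch series adds (`bbq.array_length`, `explode`, `bbq.array_agg`, and the `StructAccessor`) can be sketched locally with plain pandas, which is handy for sanity-checking the notebook's expected outputs without a BigQuery session. This is a hedged approximation, not the BigQuery DataFrames implementation: object-dtype Python lists and dicts stand in for the Arrow-backed `list[pyarrow]` / `struct[pyarrow]` dtypes, and `groupby(...).size()`, `groupby(...).agg(list)`, and `apply` stand in for `bbq.array_length`, `bbq.array_agg`, and `Series.struct.field`.

```python
# Local pandas sketch of the notebook's array and struct operations.
# Assumption: plain lists/dicts approximate the Arrow-backed dtypes that
# BigQuery DataFrames uses; bigframes.bigquery functions are replaced with
# pandas equivalents for offline verification.
import pandas as pd

df = pd.DataFrame({
    "Name": ["Alice", "Bob", "Charlie"],
    "Scores": [[95, 88, 92], [78, 81], [82, 89, 94, 100]],
})

# bbq.array_length equivalent: element count per array.
scores = df["Scores"].explode()            # one row per element; index repeats
lengths = scores.groupby(level=0).size()   # -> [3, 2, 4]

# Adjust the exploded scores, mirroring the patch-3 cell (scores + 5.0).
adj = scores.astype("float64") + 5.0

# bbq.array_agg equivalent: collect rows back into per-group lists.
adj_arr = adj.groupby(level=0).agg(list)

# Assigning aligns on the shared index, analogous to the implicit join the
# notebook mentions when adding NewScores.
df["NewScores"] = adj_arr

# Struct sketch: dicts stand in for pa.struct fields.
address = pd.Series([
    {"City": "New York", "State": "NY"},
    {"City": "San Francisco", "State": "CA"},
    {"City": "Seattle", "State": "WA"},
])
city = address.apply(lambda d: d["City"])  # struct.field("City") equivalent
address_df = pd.DataFrame(list(address))   # struct.explode() equivalent

print(df)
print(address_df)
```

In the actual notebook the columns carry `pandas.ArrowDtype(pa.list_())` / `pandas.ArrowDtype(pa.struct())` dtypes and the work is pushed down to BigQuery, so the same operations scale past local memory; this local sketch only checks that the values shown in the patch-3 outputs (e.g. `[100. 93. 97.]` for Alice) are arithmetically consistent.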