switched to yaml file for widgets #150

harshraj172 · 2023-07-01T10:07:59Z

kahkeng

thanks for this! had a few comments

kahkeng · 2023-07-01T16:58:20Z

env/local.example.yaml

-  host: localhost:3000
-  origins:
-  - '*'
-  secret_key: US3e74TCvm0fB6XR0DrA5f7RofyIhR4xQtousxSwcV8=


this file shouldn't be deleted

sorry, I missed it.. added

kahkeng · 2023-07-01T17:01:20Z

knowledge_base/widgets.yaml

@@ -0,0 +1,643 @@
+widget 1:


i think there should be a way to add a list to yaml using hyphens as bullets. this way you don't need to find a unused number and there won't be number conflicts, etc.

maybe it would also be clearer if the name/command is the first field, since it seems to be alphabetically sorted, perhaps could use an underscore like _id: ... to force this to appear first?

Yea, would like to echo that we don't need widget numbers, we could make it a native YAML list of widget objects with the first item of the object being the name for easy identification.
We can enforce uniqueness by parsing the list and checking on server start.

kahkeng · 2023-07-01T17:04:12Z

knowledge_base/widgets.yaml

+    - address
+    type: object
+  return_value_description: ''
+  widget_command: <|display-transfer({token},{amount},{address})|>


i don't think we need this here as it has information that is redundantly encoded. either we keep this and drop the name and required fields (have them be generated from parsing the widget command, or vice versa). i think my preference might be for keeping the widget command since it is a concise representation. @sgzsh269 any thoughts as well?

Yea, one suggestion: I think we should keep the name and required fields.. widget_command might change so it would be easier to make a new widget. Also, constructing the widget_command manually is prone to errors.

sure that also seems reasonable. i've seen spacing inconsistency also.

I think it might be better to remove the widget_command and have it be generated (as well as OpenAI functions) on server start using the schema so we should keep name and required fields.

kahkeng · 2023-07-01T17:09:00Z

utils/common.py

+        if len(values["return_value_description"].strip()) > 0: 
+            doc += f"Return value description:\n-{values['return_value_description']}\n"
+        doc += "---\n"
+    return doc.strip().strip('---').strip()


one way to avoid this is to put things into a list initially, then do '---\n'.join(docs).

sgzsh269

Good stuff, added my comments.

sgzsh269 · 2023-07-01T22:51:28Z

knowledge_base/widgets.yaml

@@ -0,0 +1,643 @@
+widget 1:


Yea, would like to echo that we don't need widget numbers, we could make it a native YAML list of widget objects with the first item of the object being the name for easy identification.
We can enforce uniqueness by parsing the list and checking on server start.

sgzsh269 · 2023-07-01T22:56:10Z

knowledge_base/widgets.yaml

+    - address
+    type: object
+  return_value_description: ''
+  widget_command: <|display-transfer({token},{amount},{address})|>


I think it might be better to remove the widget_command and have it be generated (as well as OpenAI functions) on server start using the schema so we should keep name and required fields.

sgzsh269 · 2023-07-01T23:10:57Z

utils/common.py

-    FUNCTIONS = json.load(f)
+def yaml2functions(file_path):
+    functions = []
+    with open(file_path, 'r') as file:


We should add a validation check for uniqueness on widget names defined in the YAML file

sgzsh269 · 2023-07-01T23:20:33Z

utils/common.py

+    return doc.strip().strip('---').strip()
+
+yaml_file_path = f"{os.getcwd()}/knowledge_base/widgets.yaml"
+WIDGETS = yaml2widgetsdoc(yaml_file_path)


@kahkeng @harshraj172 One point to consider is whether we want to index the widgets in text or YAML schema, if the YAML schema is the source of truth and has extensive metadata, should that be in the index instead of text? Not sure how this would affect the search quality/recall metric
If we do index the YAML file, then the widget names from top-k results could be used to lookup the corresponding OpenAI function schema and compact text schema.

i think the yaml and current widgets.txt format is pretty similar, so embedding perf should be similar. but in general, i think a change to indexing format could be a separate experiment/PR, we can use finetune/validate.py to quantify if there's any perf hit as well. so this PR keeping the status quo should be ok.

i'm not sure i understand what your point is about "If we do index the YAML file, then the widget names from top-k results could be used to lookup the corresponding OpenAI function schema and compact text schema." Could you explain that further?

Got it, makes sense to keep the existing embedding behavior as-is.

My point was regarding how the widget search results in YAML format could be mapped to GPT functions and text schema at LLM query time, so the _name_ field in the search results could be used to lookup the corresponding GPT function and text schema that presumably would have all been constructed and cached on server start

In the chat module (https://github.com/yieldprotocol/cacti-backend/blob/dev/chat/chatgpt_function_call.py#L147-L159), when we get widgets back from weaviate, are we already doing something similar to what you are suggesting?

sgzsh269 · 2023-07-02T02:29:01Z

README.md

 docker-compose up
 # ensure schema is created/up-to-date
 ENV_TAG=local ./db_schema_sync.sh
 ```


I observed that the general setup instructions above is missing the update widget index command, could you please add that, so:-

Remove the duplicate local config copy line out here https://github.com/yieldprotocol/cacti-backend/pull/150/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R14

Replace that with python -m scripts.check_update_widget_index

…rotocol/cacti-backend into harsh/switch_widgets2yaml

kahkeng · 2023-07-02T16:20:25Z

Looks good, except I ran validate.py on your branch, and got these results:

0/chat.chatgpt_function_call.ChatGPTFunctionCallChat/accuracy/widget:  0.92
0/chat.chatgpt_function_call.ChatGPTFunctionCallChat/accuracy/widget_param:  0.30

The param accuracy is much lower than before. Digging into some examples, looks like ordering of input variables is incorrect. Looking deeper, I think your functions json is missing the 'required' param which describes the order?

<|display-transfer(0x1234,,ETH)|> :: <|display-transfer(ETH,,0x1234)|>
<|display-transfer(0x1234,123,ETH)|> :: <|display-transfer(ETH,123,0x1234)|>
<|display-transfer(0x4321,456,USDC)|> :: <|display-transfer(USDC,456,0x4321)|>

Perhaps after fixing you can verify you can reach the same/similar results to the spreadsheet for at least ChatGPTFunctions and BasicAgent? https://docs.google.com/spreadsheets/d/1QnUL7bEHbrGUFrL3GKa5rjpPtwNG2WLy4t4NzQ4GHWE/edit#gid=0

kahkeng

Besides the accuracy issue, things look good for me. Will defer to Sagar if there's anything else from his review.

harshraj172 · 2023-07-02T16:41:01Z

Looks good, except I ran validate.py on your branch, and got these results:
0/chat.chatgpt_function_call.ChatGPTFunctionCallChat/accuracy/widget:  0.92
0/chat.chatgpt_function_call.ChatGPTFunctionCallChat/accuracy/widget_param:  0.30
The param accuracy is much lower than before. Digging into some examples, looks like ordering of input variables is incorrect. Looking deeper, I think your functions json is missing the 'required' param which describes the order?
<|display-transfer(0x1234,,ETH)|> :: <|display-transfer(ETH,,0x1234)|>
<|display-transfer(0x1234,123,ETH)|> :: <|display-transfer(ETH,123,0x1234)|>
<|display-transfer(0x4321,456,USDC)|> :: <|display-transfer(USDC,456,0x4321)|>
Perhaps after fixing you can verify you can reach the same/similar results to the spreadsheet for at least ChatGPTFunctions and BasicAgent? https://docs.google.com/spreadsheets/d/1QnUL7bEHbrGUFrL3GKa5rjpPtwNG2WLy4t4NzQ4GHWE/edit#gid=0

harshraj172 · 2023-07-02T16:41:19Z

Looks good, except I ran validate.py on your branch, and got these results:
0/chat.chatgpt_function_call.ChatGPTFunctionCallChat/accuracy/widget:  0.92
0/chat.chatgpt_function_call.ChatGPTFunctionCallChat/accuracy/widget_param:  0.30
The param accuracy is much lower than before. Digging into some examples, looks like ordering of input variables is incorrect. Looking deeper, I think your functions json is missing the 'required' param which describes the order?
<|display-transfer(0x1234,,ETH)|> :: <|display-transfer(ETH,,0x1234)|>
<|display-transfer(0x1234,123,ETH)|> :: <|display-transfer(ETH,123,0x1234)|>
<|display-transfer(0x4321,456,USDC)|> :: <|display-transfer(USDC,456,0x4321)|>
Perhaps after fixing you can verify you can reach the same/similar results to the spreadsheet for at least ChatGPTFunctions and BasicAgent? https://docs.google.com/spreadsheets/d/1QnUL7bEHbrGUFrL3GKa5rjpPtwNG2WLy4t4NzQ4GHWE/edit#gid=0

I see this is interesting.. actually my FUNCTIONS variable has 'required' param.. Let me dig into it

harshraj172 · 2023-07-02T17:20:41Z

Looks good, except I ran validate.py on your branch, and got these results:
0/chat.chatgpt_function_call.ChatGPTFunctionCallChat/accuracy/widget:  0.92
0/chat.chatgpt_function_call.ChatGPTFunctionCallChat/accuracy/widget_param:  0.30
The param accuracy is much lower than before. Digging into some examples, looks like ordering of input variables is incorrect. Looking deeper, I think your functions json is missing the 'required' param which describes the order?
<|display-transfer(0x1234,,ETH)|> :: <|display-transfer(ETH,,0x1234)|>
<|display-transfer(0x1234,123,ETH)|> :: <|display-transfer(ETH,123,0x1234)|>
<|display-transfer(0x4321,456,USDC)|> :: <|display-transfer(USDC,456,0x4321)|>
Perhaps after fixing you can verify you can reach the same/similar results to the spreadsheet for at least ChatGPTFunctions and BasicAgent? https://docs.google.com/spreadsheets/d/1QnUL7bEHbrGUFrL3GKa5rjpPtwNG2WLy4t4NzQ4GHWE/edit#gid=0
I see this is interesting.. actually my FUNCTIONS variable has 'required' param.. Let me dig into it

So the issue is that the FUNCTIONS variable should have the function parameters arranged in ['parameters']['properties'] in the same order as given in ['parameters']['required'], but due to the alphabetical arrangement of keys in YAML it got rearranged.

kahkeng · 2023-07-02T20:53:25Z

Hmm I don't think this is the main issue. I looked at the original GPT functions blog post, their API takes the "required" param. But I think your code that reads yaml into functions, we are not setting this param?

kahkeng · 2023-07-02T20:55:14Z

utils/common.py


-with open(f"{os.getcwd()}/knowledge_base/functions.json", 'r') as f:
-    FUNCTIONS = json.load(f)
+def widgets_yaml2functions(widgets_lst):


This function should probably be reading the "required" param as well?

Yes this has the 'required' param under 'parameter' key. I printed out the functions while running finetune/validate.py

sgzsh269

Left few more comments to think about, will approve. Thanks for putting this together!

Just make sure to do some quick end-to-end smoke tests locally by prompting to trigger basic widgets such as swap, NFT, Aave etc. via the FE

sgzsh269 · 2023-07-03T01:43:46Z

knowledge_base/widgets.yaml

+    - tokenToBuy
+    - transactionKeyword
+    - amount
+    type: object


@harshraj172 Why is the type: object declaration necessary? Isn't it redundant as the YAML type of the parameters is inherently an object cc @kahkeng

sgzsh269 · 2023-07-03T01:58:34Z

utils/common.py

+    docs = []
+    for values in widgets_lst:
+        doc = ""
+        widget_command = f"<|{values['_name_'].replace('_', '-')}(" + "{" + "},{".join(values['parameters']['required']) + "})|>"


nit: try to optimize for readability as at times being succinct can make it hard to maintain code in the long run
So for this case, it would help to break it up into 2 parts for widget command construction - name and parameters

widget_command_name = values['_name_'].replace('_', '-') widget_command_params = ','.join(['{' + p + '}' for p in values['parameters']['required']]) widget_command = f"<|{widget_command_name}({widget_command_params})|>"

Also, why are we only selecting required parameters, shouldn't optional parameters also be included here?

Currently, we have put all the parameters under the 'required' key. In future, we might have to declare some optional parameters and the way to do that would be just don't put the optional parameters in the list under the 'required' key. So ideally, we should be reading the paramter names from 'parameters.properties', but the problem will be that due to being saved in YAML format the keys in 'parameters.properties' gets rearranged alphabetically and we will lose the correct ordering of functions params.

I think we should allow for all params and not just the required ones. As more widgets get added, there would be cases of optional params and people would follow our YAML schema and accordingly populate required

How about we define properties as a list to preserve order and have a name field for the param?
@kahkeng any thoughts?

yeah we could do that in the interim as well. it would probably just be a hacky stop-gap, until we properly support optional args/kwargs on both backend and frontend (for display- widgets).

right now all the arguments are required, so any of the (hacky) solutions proposed are probably all equivalent. however, once we want to support optional arguments, then certain solutions become less necessary, like having properties be a list, which seems like a less natural way to express a dict. hence i think my slight preference would be for a hacky solution that is in code that can be removed, rather than in data which requires a data format adjustment, should we not need the hacky fix any more.

but given we've had a lot of discussion so far, i'm in favor of just picking one reasonable solution and moving to get this change out, and we can debate further offline with new PR as needed, if optional arguments do arise (i'm not sure how often this is needed in practice, we could probably use system prompts to force all arguments to be specified).

In terms of use-cases that may have optional params, need to think more about the range, it would be similar to queries like 'price of BTC' for which a user doesn't specify quote currency so it should default to USD

We may be able to use system prompts to handle such defaults in the interim as we figure out a robust approach.

sgzsh269 · 2023-07-03T02:06:54Z

utils/common.py

+    return doc.strip().strip('---').strip()
+
+yaml_file_path = f"{os.getcwd()}/knowledge_base/widgets.yaml"
+WIDGETS = yaml2widgetsdoc(yaml_file_path)


Got it, makes sense to keep the existing embedding behavior as-is.

My point was regarding how the widget search results in YAML format could be mapped to GPT functions and text schema at LLM query time, so the _name_ field in the search results could be used to lookup the corresponding GPT function and text schema that presumably would have all been constructed and cached on server start

harshraj172 · 2023-07-03T03:05:46Z

Left few more comments to think about, will approve. Thanks for putting this together!

Just make sure to do some quick end-to-end smoke tests locally by prompting to trigger basic widgets such as swap, NFT, Aave etc. via the FE

Yea, I should do some good tests before raising a PR. This got too many errors.

kahkeng · 2023-07-03T03:17:38Z

Oh I see, so "required" is nested one more level and doesn't seem to make a difference. So I checked the blog post again and it seems the response from GPT is a json dict for input keys and values, so we could use that and the ordering in "required" that we have in the yaml to produce the correct order ourselves. That could be the way we encode the ordering ourselves more reliably anyway, than relying on key order for a dictionary. Thoughts?

kahkeng · 2023-07-03T03:22:35Z

There's also this if we are creating the yaml from a dump somewhere: https://stackoverflow.com/questions/16782112/can-pyyaml-dump-dict-items-in-non-alphabetical-order/55171433#55171433 This could let us keep the key order and the required order in sync to reduce mistakes a developer might have. Finally, one more thought: we can also, at read time of the yaml, substitute the properties dict with an OrderedDict using the ordering in the "required" list. So GPT gets the dictionary with keys in the expected order.

Ok another thought: we could also embrace the kwarg method for specifying parameters and values, no need to make widget commands be unnamed arg style. Probably there's some places to change to support this but it might be the ideal if we want to support optional params as well in future.

What do people think?

harshraj172 · 2023-07-03T04:53:33Z

There's also this if we are creating the yaml from a dump somewhere: https://stackoverflow.com/questions/16782112/can-pyyaml-dump-dict-items-in-non-alphabetical-order/55171433#55171433 This could let us keep the key order and the required order in sync to reduce mistakes a developer might have. Finally, one more thought: we can also, at read time of the yaml, substitute the properties dict with an OrderedDict using the ordering in the "required" list. So GPT gets the dictionary with keys in the expected order.

Ok another thought: we could also embrace the kwarg method for specifying parameters and values, no need to make widget commands be unnamed arg style. Probably there's some places to change to support this but it might be the ideal if we want to support optional params as well in future.

What do people think?

The 3rd case is the ideal one, but we need to change the widget_command syntax. We should probably do that. But would that be reasonable to use the 1st method now and do the 3rd one in the next PR? Also using the 1st one we can solve the problem of optional params too as @sgzsh269 mentioned here.

kahkeng · 2023-07-03T07:15:54Z

Right now we call widgets with something like fetch_nft_search(*params), perhaps it is easy to change this to fetch_nft_search(**kwargs) for the GPT functions case? But agreed it can be separate PR.

I think fo rnow, either first or second option is fine with me, assuming either of them would help. I think having more constraints/invariants is better, so I would be in favor of either an assert that the order specified in "required" matches the properties ordering, or keeping them alphabetical order / undefined order and then just creating that dictionary using the "required" ordering (second method). That's my preference.

Is the yaml right now getting constructed in a notebook somewhere? With the first method, how do we ensure anyone who modifies it will remember to use sort_keys=False? What if they copy-paste or just insert properties in some order that doesn't match our actual widget or the "required" field's order? Should we use inspect module to inspect the function's signature and figure out the right order? (if we're going that route, it might be easier to just support **kwargs)

harshraj172 · 2023-07-03T08:20:56Z

Right now we call widgets with something like fetch_nft_search(*params), perhaps it is easy to change this to fetch_nft_search(**kwargs) for the GPT functions case? But agreed it can be separate PR.

I think fo rnow, either first or second option is fine with me, assuming either of them would help. I think having more constraints/invariants is better, so I would be in favor of either an assert that the order specified in "required" matches the properties ordering, or keeping them alphabetical order / undefined order and then just creating that dictionary using the "required" ordering (second method). That's my preference.

Is the yaml right now getting constructed in a notebook somewhere? With the first method, how do we ensure anyone who modifies it will remember to use sort_keys=False? What if they copy-paste or just insert properties in some order that doesn't match our actual widget or the "required" field's order? Should we use inspect module to inspect the function's signature and figure out the right order? (if we're going that route, it might be easier to just support **kwargs)

Yea, this will happen with the sort_keys method but the problem with the 1st is that all the params might not be there in 'required'. As 'required' only takes in required params not the optional one so the reordering of optional params might not happen.

harshraj172 · 2023-07-03T08:25:32Z

Right now we call widgets with something like fetch_nft_search(*params), perhaps it is easy to change this to fetch_nft_search(**kwargs) for the GPT functions case? But agreed it can be separate PR.
I think fo rnow, either first or second option is fine with me, assuming either of them would help. I think having more constraints/invariants is better, so I would be in favor of either an assert that the order specified in "required" matches the properties ordering, or keeping them alphabetical order / undefined order and then just creating that dictionary using the "required" ordering (second method). That's my preference.
Is the yaml right now getting constructed in a notebook somewhere? With the first method, how do we ensure anyone who modifies it will remember to use sort_keys=False? What if they copy-paste or just insert properties in some order that doesn't match our actual widget or the "required" field's order? Should we use inspect module to inspect the function's signature and figure out the right order? (if we're going that route, it might be easier to just support **kwargs)

Yea, this will happen with the sort_keys method but the problem with the 1st is that all the params might not be there in 'required'. As 'required' only takes in required params not the optional one so the reordering of optional params might not happen. I think we have to resort to the **kwargs method only..

switched to yaml file for widgets

fa00646

harshraj172 requested review from kahkeng and sgzsh269 July 1, 2023 10:07

kahkeng reviewed Jul 1, 2023

View reviewed changes

replaced widgets.txt with YAML

b243ced

harshraj172 requested a review from kahkeng July 1, 2023 18:50

Create local.example.yaml

ca6a9d8

sgzsh269 reviewed Jul 1, 2023

View reviewed changes

sgzsh269 reviewed Jul 2, 2023

View reviewed changes

harshraj172 added 2 commits July 2, 2023 18:27

resolved comment

b6fca3d

Merge branch 'harsh/switch_widgets2yaml' of https://github.com/yieldp…

9b0e826

…rotocol/cacti-backend into harsh/switch_widgets2yaml

kahkeng approved these changes Jul 2, 2023

View reviewed changes

harshraj172 closed this Jul 2, 2023

harshraj172 reopened this Jul 2, 2023

harshraj172 requested a review from sgzsh269 July 2, 2023 16:58

kahkeng reviewed Jul 2, 2023

View reviewed changes

sgzsh269 approved these changes Jul 3, 2023

View reviewed changes

reordering keys in the read YAML file

17a6fec

harshraj172 and others added 2 commits July 4, 2023 01:23

reordering the params correctly

8a539b3

beautify

eb546db

harshraj172 merged commit 6cb24ad into dev Jul 3, 2023

This was referenced Jul 6, 2023

integration: liquity #134

Open

[WIP] sfrxETH integration #149

Closed

integration: yield protocol #147

Closed

SavingsDAI #122

Open

Lido integration #101

Closed

switched to yaml file for widgets #150

switched to yaml file for widgets #150

Uh oh!

Conversation

harshraj172 commented Jul 1, 2023

Uh oh!

kahkeng left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sgzsh269 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

kahkeng commented Jul 2, 2023

Uh oh!

kahkeng left a comment

Choose a reason for hiding this comment

Uh oh!

harshraj172 commented Jul 2, 2023

Uh oh!

harshraj172 commented Jul 2, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

harshraj172 commented Jul 2, 2023

Uh oh!

kahkeng commented Jul 2, 2023

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

sgzsh269 left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

harshraj172 commented Jul 2, 2023 •

edited

Loading

harshraj172 commented Jul 3, 2023 •

edited

Loading