-
Notifications
You must be signed in to change notification settings - Fork 17
switched to yaml file for widgets #150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for this! had a few comments
host: localhost:3000 | ||
origins: | ||
- '*' | ||
secret_key: US3e74TCvm0fB6XR0DrA5f7RofyIhR4xQtousxSwcV8= |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file shouldn't be deleted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry, I missed it.. added
knowledge_base/widgets.yaml
Outdated
@@ -0,0 +1,643 @@ | |||
widget 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think there should be a way to add a list to yaml using hyphens as bullets. this way you don't need to find a unused number and there won't be number conflicts, etc.
maybe it would also be clearer if the name/command is the first field, since it seems to be alphabetically sorted, perhaps could use an underscore like _id: ...
to force this to appear first?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, would like to echo that we don't need widget numbers, we could make it a native YAML list of widget objects with the first item of the object being the name
for easy identification.
We can enforce uniqueness by parsing the list and checking on server start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
knowledge_base/widgets.yaml
Outdated
- address | ||
type: object | ||
return_value_description: '' | ||
widget_command: <|display-transfer({token},{amount},{address})|> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think we need this here as it has information that is redundantly encoded. either we keep this and drop the name
and required
fields (have them be generated from parsing the widget command, or vice versa). i think my preference might be for keeping the widget command since it is a concise representation. @sgzsh269 any thoughts as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, one suggestion: I think we should keep the name
and required
fields.. widget_command
might change so it would be easier to make a new widget. Also, constructing the widget_command
manually is prone to errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure that also seems reasonable. i've seen spacing inconsistency also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better to remove the widget_command
and have it be generated (as well as OpenAI functions) on server start using the schema so we should keep name
and required
fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
utils/common.py
Outdated
if len(values["return_value_description"].strip()) > 0: | ||
doc += f"Return value description:\n-{values['return_value_description']}\n" | ||
doc += "---\n" | ||
return doc.strip().strip('---').strip() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one way to avoid this is to put things into a list initially, then do '---\n'.join(docs)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good stuff, added my comments.
knowledge_base/widgets.yaml
Outdated
@@ -0,0 +1,643 @@ | |||
widget 1: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, would like to echo that we don't need widget numbers, we could make it a native YAML list of widget objects with the first item of the object being the name
for easy identification.
We can enforce uniqueness by parsing the list and checking on server start.
knowledge_base/widgets.yaml
Outdated
- address | ||
type: object | ||
return_value_description: '' | ||
widget_command: <|display-transfer({token},{amount},{address})|> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it might be better to remove the widget_command
and have it be generated (as well as OpenAI functions) on server start using the schema so we should keep name
and required
fields.
utils/common.py
Outdated
FUNCTIONS = json.load(f) | ||
def yaml2functions(file_path): | ||
functions = [] | ||
with open(file_path, 'r') as file: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should add a validation check for uniqueness on widget names defined in the YAML file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
utils/common.py
Outdated
return doc.strip().strip('---').strip() | ||
|
||
yaml_file_path = f"{os.getcwd()}/knowledge_base/widgets.yaml" | ||
WIDGETS = yaml2widgetsdoc(yaml_file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kahkeng @harshraj172 One point to consider is whether we want to index the widgets in text or YAML schema, if the YAML schema is the source of truth and has extensive metadata, should that be in the index instead of text? Not sure how this would affect the search quality/recall metric
If we do index the YAML file, then the widget names from top-k results could be used to lookup the corresponding OpenAI function schema and compact text schema.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think the yaml and current widgets.txt format is pretty similar, so embedding perf should be similar. but in general, i think a change to indexing format could be a separate experiment/PR, we can use finetune/validate.py to quantify if there's any perf hit as well. so this PR keeping the status quo should be ok.
i'm not sure i understand what your point is about "If we do index the YAML file, then the widget names from top-k results could be used to lookup the corresponding OpenAI function schema and compact text schema." Could you explain that further?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, makes sense to keep the existing embedding behavior as-is.
My point was regarding how the widget search results in YAML format could be mapped to GPT functions and text schema at LLM query time, so the _name_
field in the search results could be used to lookup the corresponding GPT function and text schema that presumably would have all been constructed and cached on server start
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the chat module (https://github.com/yieldprotocol/cacti-backend/blob/dev/chat/chatgpt_function_call.py#L147-L159), when we get widgets back from weaviate, are we already doing something similar to what you are suggesting?
docker-compose up | ||
# ensure schema is created/up-to-date | ||
ENV_TAG=local ./db_schema_sync.sh | ||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I observed that the general setup instructions above is missing the update widget index command, could you please add that, so:-
- Remove the duplicate local config copy line out here https://github.com/yieldprotocol/cacti-backend/pull/150/files#diff-b335630551682c19a781afebcf4d07bf978fb1f8ac04c6bf87428ed5106870f5R14
- Replace that with
python -m scripts.check_update_widget_index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
…rotocol/cacti-backend into harsh/switch_widgets2yaml
Looks good, except I ran validate.py on your branch, and got these results:
The param accuracy is much lower than before. Digging into some examples, looks like ordering of input variables is incorrect. Looking deeper, I think your functions json is missing the 'required' param which describes the order?
Perhaps after fixing you can verify you can reach the same/similar results to the spreadsheet for at least ChatGPTFunctions and BasicAgent? https://docs.google.com/spreadsheets/d/1QnUL7bEHbrGUFrL3GKa5rjpPtwNG2WLy4t4NzQ4GHWE/edit#gid=0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides the accuracy issue, things look good for me. Will defer to Sagar if there's anything else from his review.
|
I see this is interesting.. actually my FUNCTIONS variable has 'required' param.. Let me dig into it |
So the issue is that the FUNCTIONS variable should have the function parameters arranged in ['parameters']['properties'] in the same order as given in ['parameters']['required'], but due to the alphabetical arrangement of keys in YAML it got rearranged. |
Hmm I don't think this is the main issue. I looked at the original GPT functions blog post, their API takes the "required" param. But I think your code that reads yaml into functions, we are not setting this param? |
|
||
with open(f"{os.getcwd()}/knowledge_base/functions.json", 'r') as f: | ||
FUNCTIONS = json.load(f) | ||
def widgets_yaml2functions(widgets_lst): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should probably be reading the "required" param as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left few more comments to think about, will approve. Thanks for putting this together!
Just make sure to do some quick end-to-end smoke tests locally by prompting to trigger basic widgets such as swap, NFT, Aave etc. via the FE
- tokenToBuy | ||
- transactionKeyword | ||
- amount | ||
type: object |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@harshraj172 Why is the type: object
declaration necessary? Isn't it redundant as the YAML type of the parameters
is inherently an object cc @kahkeng
utils/common.py
Outdated
docs = [] | ||
for values in widgets_lst: | ||
doc = "" | ||
widget_command = f"<|{values['_name_'].replace('_', '-')}(" + "{" + "},{".join(values['parameters']['required']) + "})|>" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: try to optimize for readability as at times being succinct can make it hard to maintain code in the long run
So for this case, it would help to break it up into 2 parts for widget command construction - name and parameters
widget_command_name = values['_name_'].replace('_', '-')
widget_command_params = ','.join(['{' + p + '}' for p in values['parameters']['required']])
widget_command = f"<|{widget_command_name}({widget_command_params})|>"
Also, why are we only selecting required parameters, shouldn't optional parameters also be included here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, we have put all the parameters under the 'required' key. In future, we might have to declare some optional parameters and the way to do that would be just don't put the optional parameters in the list under the 'required' key. So ideally, we should be reading the paramter names from 'parameters.properties', but the problem will be that due to being saved in YAML format the keys in 'parameters.properties' gets rearranged alphabetically and we will lose the correct ordering of functions params.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should allow for all params and not just the required ones. As more widgets get added, there would be cases of optional params and people would follow our YAML schema and accordingly populate required
How about we define properties
as a list to preserve order and have a name
field for the param?
@kahkeng any thoughts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah we could do that in the interim as well. it would probably just be a hacky stop-gap, until we properly support optional args/kwargs on both backend and frontend (for display- widgets).
right now all the arguments are required, so any of the (hacky) solutions proposed are probably all equivalent. however, once we want to support optional arguments, then certain solutions become less necessary, like having properties be a list, which seems like a less natural way to express a dict. hence i think my slight preference would be for a hacky solution that is in code that can be removed, rather than in data which requires a data format adjustment, should we not need the hacky fix any more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
but given we've had a lot of discussion so far, i'm in favor of just picking one reasonable solution and moving to get this change out, and we can debate further offline with new PR as needed, if optional arguments do arise (i'm not sure how often this is needed in practice, we could probably use system prompts to force all arguments to be specified).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of use-cases that may have optional params, need to think more about the range, it would be similar to queries like 'price of BTC' for which a user doesn't specify quote currency so it should default to USD
We may be able to use system prompts to handle such defaults in the interim as we figure out a robust approach.
utils/common.py
Outdated
return doc.strip().strip('---').strip() | ||
|
||
yaml_file_path = f"{os.getcwd()}/knowledge_base/widgets.yaml" | ||
WIDGETS = yaml2widgetsdoc(yaml_file_path) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, makes sense to keep the existing embedding behavior as-is.
My point was regarding how the widget search results in YAML format could be mapped to GPT functions and text schema at LLM query time, so the _name_
field in the search results could be used to lookup the corresponding GPT function and text schema that presumably would have all been constructed and cached on server start
Yea, I should do some good tests before raising a PR. This got too many errors. |
Oh I see, so "required" is nested one more level and doesn't seem to make a difference. So I checked the blog post again and it seems the response from GPT is a json dict for input keys and values, so we could use that and the ordering in "required" that we have in the yaml to produce the correct order ourselves. That could be the way we encode the ordering ourselves more reliably anyway, than relying on key order for a dictionary. Thoughts? |
There's also this if we are creating the yaml from a dump somewhere: https://stackoverflow.com/questions/16782112/can-pyyaml-dump-dict-items-in-non-alphabetical-order/55171433#55171433 This could let us keep the key order and the required order in sync to reduce mistakes a developer might have. Finally, one more thought: we can also, at read time of the yaml, substitute the properties dict with an OrderedDict using the ordering in the "required" list. So GPT gets the dictionary with keys in the expected order. Ok another thought: we could also embrace the kwarg method for specifying parameters and values, no need to make widget commands be unnamed arg style. Probably there's some places to change to support this but it might be the ideal if we want to support optional params as well in future. What do people think? |
The 3rd case is the ideal one, but we need to change the widget_command syntax. We should probably do that. But would that be reasonable to use the 1st method now and do the 3rd one in the next PR? Also using the 1st one we can solve the problem of optional params too as @sgzsh269 mentioned here. |
Right now we call widgets with something like I think fo rnow, either first or second option is fine with me, assuming either of them would help. I think having more constraints/invariants is better, so I would be in favor of either an assert that the order specified in "required" matches the properties ordering, or keeping them alphabetical order / undefined order and then just creating that dictionary using the "required" ordering (second method). That's my preference. Is the yaml right now getting constructed in a notebook somewhere? With the first method, how do we ensure anyone who modifies it will remember to use sort_keys=False? What if they copy-paste or just insert properties in some order that doesn't match our actual widget or the "required" field's order? Should we use |
Yea, this will happen with the sort_keys method but the problem with the 1st is that all the params might not be there in 'required'. As 'required' only takes in required params not the optional one so the reordering of optional params might not happen. |
|
Resolution to issue #128