enhancement: add support for Playwright's storage_state parameter
#832
enhancement: add support for Playwright `storage_state`
VinciGit00
left a comment
Hi @aflansburg, please add an example of this in the examples/extra folder and, if possible, add the auth to the other graphs as well.
Hi @VinciGit00, I'll add the example. I tried to add it to all graphs that might leverage the …
@VinciGit00, I added the example and the …
VinciGit00
left a comment
thanks
Does it work with every login page?
🎉 This PR is included in version 1.33.0-beta.1 🎉 The release is available on:
Your semantic-release bot 📦🚀
🎉 This PR is included in version 1.33.0 🎉 The release is available on:
Your semantic-release bot 📦🚀
TL;DR
The Playwright browser context `storage_state` parameter can be used to provide a path to a JSON state file, which in turn provides session authentication to a scraper that uses the `ChromiumLoader` class. One of the medium-term goals I noted was handling authentication when using Selenium or Playwright; in the short term, this pull request allows passing the `storage_state` parameter to the Playwright loader, allowing for more flexible and secure scraping operations.
Example usage:
Imagine a workflow where, in some module, you are using Playwright directly to authenticate a session. You can then reuse the state your browser had right after login in future Playwright calls.
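A minimal sketch of that workflow, assuming Playwright's sync API is installed (`pip install playwright`, then `playwright install chromium`). The `LoginSession` class, its `_login` method, the form selectors, and the `cookie_names` helper are hypothetical names for illustration, not code from this PR; `context.storage_state(path=...)` is the Playwright call that writes the context's cookies and `localStorage` to a JSON file.

```python
import json
from pathlib import Path


class LoginSession:
    """Hypothetical helper: log in once with Playwright and save the session state."""

    def __init__(self, login_url: str, state_path: str = "state.json"):
        self.login_url = login_url
        self.state_path = Path(state_path)

    def _login(self, username: str, password: str) -> Path:
        # Imported lazily so this module can be imported without Playwright installed.
        from playwright.sync_api import sync_playwright

        with sync_playwright() as p:
            browser = p.chromium.launch(headless=True)
            context = browser.new_context()
            page = context.new_page()
            page.goto(self.login_url)
            # Hypothetical selectors; adjust them to the target site's login form.
            page.fill("input[name='username']", username)
            page.fill("input[name='password']", password)
            page.click("button[type='submit']")
            page.wait_for_load_state("networkidle")
            # Persist cookies and localStorage so later browser contexts reuse the session.
            context.storage_state(path=str(self.state_path))
            browser.close()
        return self.state_path


def cookie_names(state_path: str) -> list[str]:
    """Return the cookie names recorded in a Playwright storage-state JSON file."""
    state = json.loads(Path(state_path).read_text())
    return [cookie["name"] for cookie in state.get("cookies", [])]
```

Checking `cookie_names("state.json")` after logging in is a quick way to confirm the session cookie was captured before handing the file to a scraper.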
You can see at the end of the `_login` method that we store the `storage_state` of the browser context to a file.
Then, when using scrapegraph-ai, we can pass the path into the graph by including the `storage_state` parameter in the `graph_config` so that Playwright can use it.
Summary of Changes
This pull request includes several changes to improve the functionality and maintainability of the `scrapegraphai` package, specifically focusing on the `chromium.py`, `abstract_graph.py`, and `code_generator_graph.py` files. The most important changes include adding support for `storage_state`, improving error handling, and reformatting code for better readability.
Note that some changes were introduced by running the ruff formatter; they do not affect functionality in any way. Please let me know if this is an issue (ruff is really good).
Enhancements to chromium.py:
• Added storage_state parameter to the ChromiumLoader class to support session storage.
• Updated ascrape_playwright and ascrape_with_js_support methods to use storage_state when creating a new browser context.
• Improved error message formatting for better readability.
• Reformatted conditional logic in lazy_load and alazy_load methods for clarity.
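As a rough illustration of the loader change, the sketch below shows how a Chromium-based loader could forward `storage_state` to `browser.new_context()` using Playwright's async API. `StatefulChromiumLoader` and `context_kwargs` are hypothetical names; this is not the actual `ChromiumLoader` implementation.

```python
from typing import Any, Optional


class StatefulChromiumLoader:
    """Hypothetical, stripped-down loader that can reuse a saved Playwright session."""

    def __init__(self, urls: list[str], headless: bool = True,
                 storage_state: Optional[str] = None):
        self.urls = urls
        self.headless = headless
        self.storage_state = storage_state

    def context_kwargs(self) -> dict[str, Any]:
        # Only forward storage_state when a path was actually supplied.
        return {"storage_state": self.storage_state} if self.storage_state else {}

    async def ascrape_playwright(self, url: str) -> str:
        # Imported lazily so the class can be constructed without Playwright installed.
        from playwright.async_api import async_playwright

        async with async_playwright() as p:
            browser = await p.chromium.launch(headless=self.headless)
            try:
                # Cookies and localStorage from the state file are loaded into this context.
                context = await browser.new_context(**self.context_kwargs())
                page = await context.new_page()
                await page.goto(url)
                return await page.content()
            finally:
                await browser.close()
```

Calling `asyncio.run(loader.ascrape_playwright(url))` would fetch the page with the saved session applied; it requires Playwright and its Chromium build to be installed.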
Enhancements to abstract_graph.py:
• Reformatted import statements and function definitions for better readability.
• Added storage_state configuration to the AbstractGraph class initialization.
• Improved readability of the _create_llm method by reformatting and simplifying code.
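On the graph side, here is a sketch of how a `storage_state` entry in `graph_config` might be read and forwarded to the loader. `MiniGraph` and `loader_kwargs` are hypothetical names, and the `"llm"` block is a placeholder whose exact keys depend on the scrapegraph-ai version and model provider.

```python
from typing import Any, Optional


class MiniGraph:
    """Hypothetical stand-in for a graph that reads storage_state from its config."""

    def __init__(self, prompt: str, source: str, config: dict[str, Any]):
        self.prompt = prompt
        self.source = source
        self.config = config
        # None when the key is absent, so unauthenticated scraping keeps working.
        self.storage_state: Optional[str] = config.get("storage_state")
        self.headless: bool = config.get("headless", True)

    def loader_kwargs(self) -> dict[str, Any]:
        # Keyword arguments that would be handed to a Chromium-based loader.
        kwargs: dict[str, Any] = {"headless": self.headless}
        if self.storage_state is not None:
            kwargs["storage_state"] = self.storage_state
        return kwargs


graph_config = {
    "llm": {"model": "openai/gpt-4o-mini", "api_key": "YOUR_API_KEY"},  # placeholder
    "storage_state": "./state.json",  # file written after a Playwright login
    "headless": True,
}
```

In scrapegraph-ai itself, the same dict would be passed as the `config` argument of a graph, for example `SmartScraperGraph(prompt=..., source=..., config=graph_config)`.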
Enhancements to code_generator_graph.py:
• Reformatted import statements and function definitions for better readability.
• Added storage_state configuration to the CodeGeneratorGraph class initialization and graph creation.
• Improved readability of the _create_graph method by reformatting and simplifying code.