Commit 891c95b

Merge remote-tracking branch 'origin/main' into close-inactive-contexts
2 parents b1079d7 + 848955e commit 891c95b

19 files changed, +465 -126 lines changed

.bumpversion.cfg

Lines changed: 1 addition & 1 deletion
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 0.0.37
+current_version = 0.0.39
 commit = True
 tag = True

.github/workflows/tests.yml

Lines changed: 5 additions & 0 deletions
@@ -25,6 +25,11 @@ jobs:
       with:
         python-version: ${{ matrix.python-version }}

+    - name: Set up node
+      uses: actions/setup-node@v4
+      with:
+        node-version: 18
+
     - name: Install tox
       run: pip install tox

.gitignore

Lines changed: 5 additions & 0 deletions
@@ -17,3 +17,8 @@ coverage.xml
 coverage-*.xml
 coverage-asyncio/
 coverage-twisted/
+
+# nodejs stuff
+node_modules/
+package-lock.json
+package.json

README.md

Lines changed: 73 additions & 7 deletions
@@ -168,14 +168,17 @@ Type `Optional[str]`, default `None`
 The endpoint of a remote Chromium browser to connect using the
 [Chrome DevTools Protocol](https://chromedevtools.github.io/devtools-protocol/),
 via [`BrowserType.connect_over_cdp`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-connect-over-cdp).
+
+```python
+PLAYWRIGHT_CDP_URL = "http://localhost:9222"
+```
+
 If this setting is used:
 * all non-persistent contexts will be created on the connected remote browser
 * the `PLAYWRIGHT_LAUNCH_OPTIONS` setting is ignored
 * the `PLAYWRIGHT_BROWSER_TYPE` setting must not be set to a value different than "chromium"

-```python
-PLAYWRIGHT_CDP_URL = "http://localhost:9222"
-```
+**This setting CANNOT be used at the same time as `PLAYWRIGHT_CONNECT_URL`**
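
The constraints listed for `PLAYWRIGHT_CDP_URL` could be checked up front in a project's own code. The following is a minimal sketch, not the plugin's implementation: `check_remote_browser_settings` is a hypothetical helper, and the `"chromium"` default for `PLAYWRIGHT_BROWSER_TYPE` is an assumption not stated in this diff.

```python
from typing import Any, Mapping


def check_remote_browser_settings(settings: Mapping[str, Any]) -> None:
    """Hypothetical helper: reject combinations the README rules out."""
    cdp_url = settings.get("PLAYWRIGHT_CDP_URL")
    connect_url = settings.get("PLAYWRIGHT_CONNECT_URL")

    # The two remote-connection settings are documented as mutually exclusive.
    if cdp_url and connect_url:
        raise ValueError(
            "PLAYWRIGHT_CDP_URL and PLAYWRIGHT_CONNECT_URL cannot be used together"
        )

    # Assumed default: "chromium" (not confirmed by this diff).
    browser_type = settings.get("PLAYWRIGHT_BROWSER_TYPE", "chromium")
    if cdp_url and browser_type != "chromium":
        raise ValueError(
            "PLAYWRIGHT_CDP_URL requires PLAYWRIGHT_BROWSER_TYPE to be 'chromium'"
        )
```

Calling the helper with only one of the two URLs set returns silently; how the plugin itself reacts to an invalid combination is not shown in this diff.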

 ### `PLAYWRIGHT_CDP_KWARGS`
 Type `dict[str, Any]`, default `{}`
@@ -192,6 +195,41 @@ PLAYWRIGHT_CDP_KWARGS = {
 }
 ```

+### `PLAYWRIGHT_CONNECT_URL`
+Type `Optional[str]`, default `None`
+
+URL of a remote Playwright browser instance to connect to using
+[`BrowserType.connect`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-connect).
+
+From the upstream Playwright docs:
+> When connecting to another browser launched via
+> [`BrowserType.launchServer`](https://playwright.dev/docs/api/class-browsertype#browser-type-launch-server)
+> in Node.js, the major and minor version needs to match the client version (1.2.3 → is compatible with 1.2.x).
+
+```python
+PLAYWRIGHT_CONNECT_URL = "ws://localhost:35477/ae1fa0bc325adcfd9600d9f712e9c733"
+```
+
+If this setting is used:
+* all non-persistent contexts will be created on the connected remote browser
+* the `PLAYWRIGHT_LAUNCH_OPTIONS` setting is ignored
+
+**This setting CANNOT be used at the same time as `PLAYWRIGHT_CDP_URL`**
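
The version rule quoted from the Playwright docs (major and minor must match, patch may differ) can be illustrated with a small comparison. This is only a sketch of the rule as quoted; `versions_compatible` is a hypothetical helper and is not part of Playwright or scrapy-playwright.

```python
def versions_compatible(client_version: str, server_version: str) -> bool:
    """Return True when the major and minor components match.

    The patch component is ignored, following the quoted rule that
    1.2.3 is compatible with 1.2.x.
    """
    client_major_minor = client_version.split(".")[:2]
    server_major_minor = server_version.split(".")[:2]
    return client_major_minor == server_major_minor
```

For example, `versions_compatible("1.2.3", "1.2.9")` is true, while `versions_compatible("1.2.3", "1.3.0")` is false.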
+
+### `PLAYWRIGHT_CONNECT_KWARGS`
+Type `dict[str, Any]`, default `{}`
+
+Additional keyword arguments to be passed to
+[`BrowserType.connect`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-connect)
+when using `PLAYWRIGHT_CONNECT_URL`. The `ws_endpoint` key is always ignored;
+`PLAYWRIGHT_CONNECT_URL` is used instead.
+
+```python
+PLAYWRIGHT_CONNECT_KWARGS = {
+    "slow_mo": 1000,
+    "timeout": 10 * 1000
+}
+```
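
The documented precedence (any `ws_endpoint` key in `PLAYWRIGHT_CONNECT_KWARGS` is ignored in favor of `PLAYWRIGHT_CONNECT_URL`) could be expressed roughly as follows. This is a sketch of the described behavior under that assumption, not the plugin's actual code; `build_connect_kwargs` is a hypothetical name.

```python
from typing import Any, Dict


def build_connect_kwargs(connect_url: str, connect_kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: merge the URL and extra kwargs for BrowserType.connect."""
    # Drop any user-supplied ws_endpoint, as the README says it is always ignored.
    kwargs = {key: value for key, value in connect_kwargs.items() if key != "ws_endpoint"}
    # The URL from PLAYWRIGHT_CONNECT_URL always wins.
    kwargs["ws_endpoint"] = connect_url
    return kwargs
```

The input dictionary is copied rather than mutated, so the original settings value stays unchanged.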

 ### `PLAYWRIGHT_CONTEXTS`
 Type `dict[str, dict]`, default `{}`
@@ -286,6 +324,17 @@ def custom_headers(
 PLAYWRIGHT_PROCESS_REQUEST_HEADERS = custom_headers
 ```

+### `PLAYWRIGHT_RESTART_DISCONNECTED_BROWSER`
+Type `bool`, default `True`
+
+Whether the browser will be restarted if it gets disconnected, for instance if the local
+browser crashes or a remote connection times out.
+Implemented by listening to the
+[`disconnected` Browser event](https://playwright.dev/python/docs/api/class-browser#browser-event-disconnected).
+For this reason it does not apply to persistent contexts, since
+[`BrowserType.launch_persistent_context`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch-persistent-context)
+returns the context directly.
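
Unlike the neighboring settings, this section ships without a settings example. A minimal one, assuming a project that prefers the crawl to stop using the browser after a disconnect rather than relaunch it, might look like this in `settings.py`:

```python
# settings.py (illustrative): opt out of the automatic restart,
# which is enabled by default according to the section above.
PLAYWRIGHT_RESTART_DISCONNECTED_BROWSER = False
```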
+
 ### `PLAYWRIGHT_MAX_PAGES_PER_CONTEXT`
 Type `int`, defaults to the value of Scrapy's `CONCURRENT_REQUESTS` setting

@@ -459,14 +508,16 @@ This key could be used in conjunction with `playwright_include_page` to make a c
 requests using the same page. For instance:

 ```python
+from playwright.async_api import Page
+
 def start_requests(self):
     yield scrapy.Request(
         url="https://httpbin.org/get",
         meta={"playwright": True, "playwright_include_page": True},
     )

 def parse(self, response, **kwargs):
-    page = response.meta["playwright_page"]
+    page: Page = response.meta["playwright_page"]
     yield scrapy.Request(
         url="https://httpbin.org/headers",
         callback=self.parse_headers,
@@ -507,6 +558,20 @@ def parse(self, response, **kwargs):
     # {'issuer': 'DigiCert TLS RSA SHA256 2020 CA1', 'protocol': 'TLS 1.3', 'subjectName': 'www.example.org', 'validFrom': 1647216000, 'validTo': 1678838399}
 ```

+### `playwright_suggested_filename`
+Type `Optional[str]`, read only
+
+The value of the [`Download.suggested_filename`](https://playwright.dev/python/docs/api/class-download#download-suggested-filename)
+attribute when the response is the binary contents of a
+[download](https://playwright.dev/python/docs/downloads) (e.g. a PDF file).
+Only available for responses that only caused a download. Can be accessed
+in the callback via `response.meta['playwright_suggested_filename']`.
+
+```python
+def parse(self, response, **kwargs):
+    print(response.meta["playwright_suggested_filename"])
+    # 'sample_file.pdf'
+```
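
Because the suggested filename comes from the remote side, a callback that writes the download to disk may want to sanitize it first. The sketch below is one possible approach; `download_path` and the `"download.bin"` fallback are hypothetical and not part of scrapy-playwright.

```python
from pathlib import Path
from typing import Optional


def download_path(
    directory: Path,
    suggested_filename: Optional[str],
    fallback: str = "download.bin",
) -> Path:
    """Hypothetical helper: build a local path for a downloaded response body."""
    # Keep only the final path component so the name cannot point outside `directory`.
    name = Path(suggested_filename).name if suggested_filename else ""
    # Empty names and the special entries "." and ".." fall back to a fixed name.
    if name in ("", ".", ".."):
        name = fallback
    return directory / name
```

In a callback, the result could be used along the lines of `download_path(Path("downloads"), response.meta.get("playwright_suggested_filename")).write_bytes(response.body)`, assuming the target directory already exists.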

 ## Receiving Page objects in callbacks

@@ -525,6 +590,7 @@ necessary the spider job could get stuck because of the limit set by the
 `PLAYWRIGHT_MAX_PAGES_PER_CONTEXT` setting.

 ```python
+from playwright.async_api import Page
 import scrapy

 class AwesomeSpiderWithPage(scrapy.Spider):
@@ -539,7 +605,7 @@ class AwesomeSpiderWithPage(scrapy.Spider):
         )

     def parse_first(self, response):
-        page = response.meta["playwright_page"]
+        page: Page = response.meta["playwright_page"]
         return scrapy.Request(
             url="https://example.com",
             callback=self.parse_second,
@@ -548,13 +614,13 @@ class AwesomeSpiderWithPage(scrapy.Spider):
         )

     async def parse_second(self, response):
-        page = response.meta["playwright_page"]
+        page: Page = response.meta["playwright_page"]
         title = await page.title()  # "Example Domain"
         await page.close()
         return {"title": title}

     async def errback_close_page(self, failure):
-        page = failure.request.meta["playwright_page"]
+        page: Page = failure.request.meta["playwright_page"]
         await page.close()
 ```

docs/changelog.md

Lines changed: 13 additions & 0 deletions
@@ -1,5 +1,18 @@
 # scrapy-playwright changelog

+### [v0.0.39](https://github.com/scrapy-plugins/scrapy-playwright/releases/tag/v0.0.39) (2024-07-11)
+
+* Return proper status and headers for downloads (#293)
+* Restart on browser crash (#295)
+* Override method and/or body only for the first matching request (#297)
+
+
+### [v0.0.38](https://github.com/scrapy-plugins/scrapy-playwright/releases/tag/v0.0.38) (2024-07-06)
+
+* Fix freezing on responses with status 204 (#292)
+* Connect to remote browser using BrowserType.connect (#283)
+
+
 ### [v0.0.37](https://github.com/scrapy-plugins/scrapy-playwright/releases/tag/v0.0.37) (2024-07-03)

 * Improve Windows concurrency (#286)

examples/books.py

Lines changed: 2 additions & 1 deletion
@@ -3,6 +3,7 @@
 from pathlib import Path
 from typing import Generator, Optional

+from playwright.async_api import Page
 from scrapy import Spider
 from scrapy.http.response import Response

@@ -51,7 +52,7 @@ def parse(self, response: Response, current_page: Optional[int] = None) -> Gener

     async def parse_book(self, response: Response) -> dict:
         url_sha256 = hashlib.sha256(response.url.encode("utf-8")).hexdigest()
-        page = response.meta["playwright_page"]
+        page: Page = response.meta["playwright_page"]
         await page.screenshot(
             path=Path(__file__).parent / "books" / f"{url_sha256}.png", full_page=True
         )

examples/contexts.py

Lines changed: 2 additions & 1 deletion
@@ -1,5 +1,6 @@
 from pathlib import Path

+from playwright.async_api import Page
 from scrapy import Spider, Request


@@ -96,7 +97,7 @@ def start_requests(self):
         )

     async def parse(self, response, **kwargs):
-        page = response.meta["playwright_page"]
+        page: Page = response.meta["playwright_page"]
         context_name = response.meta["playwright_context"]
         storage_state = await page.context.storage_state()
         await page.close()

examples/max_pages.py

Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,4 @@
+from playwright.async_api import Page
 from scrapy import Spider, Request


@@ -45,5 +46,5 @@ def parse(self, response, **kwargs):
         return {"url": response.url}

     async def errback(self, failure):
-        page = failure.request.meta["playwright_page"]
+        page: Page = failure.request.meta["playwright_page"]
         await page.close()

examples/storage.py

Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,4 @@
+from playwright.async_api import Page
 from scrapy import Spider, Request
 from scrapy_playwright.page import PageMethod

@@ -27,7 +28,7 @@ def start_requests(self):
         )

     async def parse(self, response, **kwargs):
-        page = response.meta["playwright_page"]
+        page: Page = response.meta["playwright_page"]
         storage_state = await page.context.storage_state()
         await page.close()
         return {

scrapy_playwright/__init__.py

Lines changed: 1 addition & 1 deletion
@@ -1 +1 @@
-__version__ = "0.0.37"
+__version__ = "0.0.39"
