[pull] master from scrapy-plugins:master #26

Open
wants to merge 73 commits into
base: master
73 commits
e87a056
Max pages per context (#16)
elacuesta Mar 12, 2022
a46e357
Response.ip_address attribute (#67)
elacuesta Mar 12, 2022
ba87c60
Response security details (#68)
elacuesta Mar 12, 2022
a6d99af
Bump version: 0.0.10 → 0.0.11
elacuesta Mar 12, 2022
023b35d
Avoid confusing cleanup exception
elacuesta Mar 14, 2022
cc27195
Add warning for non-PageCoroutine objs in meta.playwright_page_corout…
elacuesta Mar 15, 2022
7e678ce
Bump version: 0.0.11 → 0.0.12
elacuesta Mar 15, 2022
dc0666b
PageCoroutine checks (#69)
elacuesta Mar 16, 2022
0539cb6
Update Codecov uploader
elacuesta Mar 16, 2022
dde11c1
Move handler method
elacuesta Mar 17, 2022
fe9244a
Add python_requires to setup, update readme
elacuesta Mar 20, 2022
4a8911c
Pylint check
elacuesta Mar 23, 2022
d79aa5b
Small cleanup
elacuesta Mar 23, 2022
d08a944
Avoid UnicodeEncodeError: try multiple body encodings (#72)
elacuesta Mar 24, 2022
0bde3e6
Ability to abort requests via setting (#63)
elacuesta Mar 24, 2022
999dc48
Bump version: 0.0.12 → 0.0.13
elacuesta Mar 24, 2022
9ee07aa
Update changelog
elacuesta Mar 26, 2022
a06b3b1
Rename PageCoroutine -> PageMethod (#70)
elacuesta Mar 26, 2022
b7a7f41
Bump version: 0.0.13 → 0.0.14
elacuesta Mar 26, 2022
0e998e2
Close page in example spider
elacuesta Mar 26, 2022
94eced6
Restore exception handling in test helper
elacuesta Mar 27, 2022
418e3fa
Readme updates
elacuesta Mar 27, 2022
af1bfe0
Remove deprecated PLAYWRIGHT_CONTEXT_ARGS setting (#75)
elacuesta Mar 29, 2022
8837603
CI: bump black version
elacuesta Mar 30, 2022
7e34a4c
Update README.md (#77)
wRAR Apr 8, 2022
f75d48d
Warn on failed requests (#74)
elacuesta Apr 15, 2022
22e1800
PLAYWRIGHT_ABORT_REQUEST: accept coroutine functions (#86)
elacuesta May 6, 2022
afef146
Add note about header processing
elacuesta May 6, 2022
fa7d5f1
Accept sync functions to process headers (#87)
elacuesta May 6, 2022
251bdc7
Set playwright_page request meta key early (#91)
elacuesta May 9, 2022
256bc63
Update changelog for 0.0.15
elacuesta May 9, 2022
e32c250
Bump version: 0.0.14 → 0.0.15
elacuesta May 9, 2022
3376a13
Update proxy section on readme
elacuesta May 9, 2022
a632118
Fix playwright header override (#92)
clippered May 11, 2022
b36a79f
Use new headers API from Playwright 1.15 (#93)
elacuesta May 14, 2022
f679106
Update changelog for 0.0.16
elacuesta May 14, 2022
1f01ccb
Bump version: 0.0.15 → 0.0.16
elacuesta May 14, 2022
7785789
Readme: add note about graphic setup under WSL
elacuesta May 15, 2022
b5f6f56
Remove requirements-dev.txt
elacuesta May 15, 2022
9d612df
Rename aux function
elacuesta May 16, 2022
c53786c
Update changelog.md
elacuesta May 16, 2022
c32d720
Persistent contexts (#94)
elacuesta May 18, 2022
2e66b9e
Limit concurrent context count (PLAYWRIGHT_MAX_CONTEXTS setting) (#95)
elacuesta May 22, 2022
3af912d
Update changelog for v0.0.17
elacuesta May 22, 2022
268e533
Bump version: 0.0.16 → 0.0.17
elacuesta May 22, 2022
1c5f96e
Update readme: known issues
elacuesta May 22, 2022
414c3d2
Add example for the PLAYWRIGHT_PROCESS_REQUEST_HEADERS setting
elacuesta Jun 18, 2022
4f29612
Always override request headers (#98)
elacuesta Jun 18, 2022
820dd4c
Bump version: 0.0.17 → 0.0.18
elacuesta Jun 18, 2022
593ab31
Update changelog for v0.0.18
elacuesta Jun 18, 2022
1942ebb
Fix camel case arguments in readme
elacuesta Jun 24, 2022
0fb9d01
Update link in readme
elacuesta Jun 24, 2022
55102c5
Add caution note about closing pages
elacuesta Jun 24, 2022
86d8e88
Readme: add link to zyte-smartproxy-playwright (#101)
elacuesta Jun 28, 2022
683ee5f
Add supported meta keys to readme
elacuesta Jul 16, 2022
bd9bd95
add playwright page.goto kwargs (#54)
Pandaaaa906 Jul 17, 2022
a7ab8ed
Update changelog for v0.0.19
elacuesta Jul 17, 2022
dda8c9b
Bump version: 0.0.18 → 0.0.19
elacuesta Jul 17, 2022
bfd1a40
Update exception handling examples
elacuesta Jul 27, 2022
2b41b67
CI: Update macos version
elacuesta Jul 27, 2022
f3f7b2d
Update docs about playwright_include_page
elacuesta Jul 27, 2022
bf5c56e
Doc updates about Receiving Page objects in callbacks
elacuesta Jul 27, 2022
253d7bd
More doc updates about Receiving Page objects in callbacks
elacuesta Jul 27, 2022
0e1ac0e
Adjust docstring
elacuesta Jul 31, 2022
5f54ec4
Don't break if Page.goto returns None (#113)
elacuesta Aug 1, 2022
609cbed
Add scrapy link to readme
elacuesta Aug 1, 2022
ee1b01a
Update changelog for 0.0.20
elacuesta Aug 3, 2022
2b911a0
Bump version: 0.0.19 → 0.0.20
elacuesta Aug 3, 2022
6ddd4bd
Catch TypeError when getting server IP address
elacuesta Aug 8, 2022
73097d9
Update changelog for v0.0.21
elacuesta Aug 8, 2022
ae44b61
Bump version: 0.0.20 → 0.0.21
elacuesta Aug 8, 2022
60cb8cc
Add link to section in readme
elacuesta Aug 8, 2022
6bc8e82
Document the command to install browsers
elacuesta Aug 24, 2022
Readme updates
elacuesta committed Mar 27, 2022
commit 418e3face623db1a18db49331637f64a475b6049
27 changes: 14 additions & 13 deletions README.md
@@ -65,12 +65,12 @@ TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
`scrapy-playwright` accepts the following settings:

* `PLAYWRIGHT_BROWSER_TYPE` (type `str`, default `chromium`)
-The browser type to be launched. Valid values are (`chromium`, `firefox`, `webkit`).
+The browser type to be launched, e.g. `chromium`, `firefox`, `webkit`.

* `PLAYWRIGHT_LAUNCH_OPTIONS` (type `dict`, default `{}`)

A dictionary with options to be passed when launching the Browser.
-See the docs for [`BrowserType.launch`](https://playwright.dev/python/docs/api/class-browsertype#browser_typelaunchkwargs).
+See the docs for [`BrowserType.launch`](https://playwright.dev/python/docs/api/class-browsertype#browser-type-launch).

* `PLAYWRIGHT_CONTEXTS` (type `dict[str, dict]`, default `{}`)

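As a rough sketch of how the `PLAYWRIGHT_BROWSER_TYPE` and `PLAYWRIGHT_LAUNCH_OPTIONS` settings above might look in a project's `settings.py`: the values are illustrative assumptions, not recommendations, and `headless` and `timeout` are keyword arguments of Playwright's `BrowserType.launch`.

```python
# settings.py (illustrative sketch; all values below are assumptions)

# Reactor line taken from the installation section of the README.
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# One of the browser types named in the README.
PLAYWRIGHT_BROWSER_TYPE = "firefox"

# Passed as keyword arguments to BrowserType.launch.
PLAYWRIGHT_LAUNCH_OPTIONS = {
    "headless": False,     # show the browser window
    "timeout": 20 * 1000,  # launch timeout in milliseconds
}
```
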
@@ -89,13 +89,13 @@ TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"
```
If no contexts are defined, a default context (called `default`) is created.
The arguments passed here take precedence over the ones defined in `PLAYWRIGHT_CONTEXT_ARGS`.
-See the docs for [`Browser.new_context`](https://playwright.dev/python/docs/api/class-browser#browsernew_contextkwargs).
+See the docs for [`Browser.new_context`](https://playwright.dev/python/docs/api/class-browser#browser-new-context).

* `PLAYWRIGHT_DEFAULT_NAVIGATION_TIMEOUT` (type `Optional[float]`, default `None`)

The timeout used when requesting pages by Playwright. If `None` or unset,
the default value will be used (30000 ms at the time of writing this).
-See the docs for [BrowserContext.set_default_navigation_timeout](https://playwright.dev/python/docs/api/class-browsercontext#browser_contextset_default_navigation_timeouttimeout).
+See the docs for [BrowserContext.set_default_navigation_timeout](https://playwright.dev/python/docs/api/class-browsercontext#browser-context-set-default-navigation-timeout).

* `PLAYWRIGHT_PROCESS_REQUEST_HEADERS` (type `Union[Callable, str]`, default `scrapy_playwright.headers.use_scrapy_headers`)

@@ -175,7 +175,7 @@ does not match the running Browser. If you prefer the `User-Agent` sent by
default by the specific browser you're using, set the Scrapy user agent to `None`.


-## Receiving the Page object in the callback
+## Receiving Page objects in callbacks

Specifying a non-False value for the `playwright_include_page` `meta` key for a
request will result in the corresponding `playwright.async_api.Page` object
@@ -260,7 +260,7 @@ on HTTP Proxies.

## Multiple browser contexts

-Multiple [browser contexts](https://playwright.dev/python/docs/core-concepts/#browser-contexts)
+Multiple [browser contexts](https://playwright.dev/python/docs/browser-contexts)
to be launched at startup can be defined via the `PLAYWRIGHT_CONTEXTS` [setting](#settings).

### Choosing a specific context for a request
@@ -278,7 +278,7 @@ yield scrapy.Request(

If the context specified in the `playwright_context` meta key does not exist, it will be created.
You can specify keyword arguments to be passed to
-[`Browser.new_context`](https://playwright.dev/python/docs/api/class-browser#browsernew_contextkwargs)
+[`Browser.new_context`](https://playwright.dev/python/docs/api/class-browser#browser-new-context)
in the `playwright_context_kwargs` meta key:

```python
@@ -347,7 +347,7 @@ Represents a method to be called (and awaited if necessary) on a
are passed when calling such method. The return value
will be stored in the `PageMethod.result` attribute.

-For instance,
+For instance:
```python
def start_requests(self):
    yield Request(
@@ -375,7 +375,8 @@ def start_requests(self):

async def parse(self, response):
    page = response.meta["playwright_page"]
-    await page.screenshot(path="example.png", full_page=True)
+    screenshot = await page.screenshot(path="example.png", full_page=True)
+    # screenshot contains the image's bytes
    await page.close()
```

@@ -393,7 +394,7 @@ a navigation (e.g. a `click` on a link), the `Response.url` attribute will point
new URL, which might be different from the request's URL.


-## Page events
+## Handling page events

A dictionary of Page event handlers can be specified in the `playwright_page_event_handlers`
[Request.meta](https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.meta) key.
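
As a minimal sketch of such a dictionary, assuming plain callables as handlers: `log_response` and `build_meta` are hypothetical names introduced for illustration, the `playwright` meta key is the one the plugin uses to route a request through Playwright, and a stand-in object replaces the Playwright `Response` so the snippet runs without a browser.

```python
import logging
from types import SimpleNamespace

logger = logging.getLogger(__name__)


def log_response(response) -> str:
    """Handler for the Page "response" event (hypothetical helper)."""
    message = f"Received response with URL {response.url}"
    logger.info(message)
    # The return value is only used by the demonstration below.
    return message


def build_meta() -> dict:
    """Build Request.meta with one Page event handler attached."""
    return {
        "playwright": True,
        "playwright_page_event_handlers": {"response": log_response},
    }


# Demonstration with a stand-in for a Playwright Response object.
meta = build_meta()
fake_response = SimpleNamespace(url="https://example.org")
result = meta["playwright_page_event_handlers"]["response"](fake_response)
```

In a spider, the dictionary returned by `build_meta()` would be passed as the `meta` argument of `scrapy.Request`.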
@@ -428,11 +429,11 @@ class EventSpider(scrapy.Spider):
logging.info(f"Received response with URL {response.url}")
```

-See the [upstream `Page` docs](https://playwright.dev/python/docs/api/class-page/) for a list of
+See the [upstream `Page` docs](https://playwright.dev/python/docs/api/class-page) for a list of
the accepted events and the arguments passed to their handlers.

**Note**: keep in mind that, unless they are
-[removed later](https://playwright.dev/python/docs/events/#addingremoving-event-listener),
+[removed later](https://playwright.dev/python/docs/events#addingremoving-event-listener),
these handlers will remain attached to the page and will be called for subsequent
downloads using the same page. This is usually not a problem, since by default
requests are performed in single-use pages.
@@ -508,7 +509,7 @@ For more examples, please see the scripts in the [examples](examples) directory.
Refer to the [Proxy support](#proxy-support) section for more information.


-##  Deprecation policy
+## Deprecation policy

Deprecated features will be supported for at least six months
following the release that deprecated them. After that, they