Commit b5cf12e

further lint fixes
Fix lint issues; reword blockquote to a Docusaurus admonition.
1 parent 48df761 commit b5cf12e

4 files changed: 7 additions & 5 deletions

sources/academy/tutorials/node_js/caching_responses_in_puppeteer.md

Lines changed: 1 addition & 1 deletion
@@ -97,7 +97,7 @@ After implementing this code, we can run the scraper again.
![Good run results](./images/good-run-results.png)

- Looking at the statistics, caching responses in Puppeteer brought the traffic down from 177MB to 13.4MB, which is a reduction of data transfer by 92%. The related screenshots can be found [here](https://my.apify.com/storage/key-value/iWQ3mQE2XsLA2eErL).
+ Looking at the statistics, caching responses in Puppeteer brought the traffic down from 177MB to 13.4MB, which is a reduction of data transfer by 92%. The related screenshots can be found [in the Apify storage](https://my.apify.com/storage/key-value/iWQ3mQE2XsLA2eErL).

It did not speed up the crawler, but that is only because the crawler is set to wait until the network is nearly idle, and CNN has a lot of tracking and analytics scripts that keep the network busy.
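For context, the caching approach the paragraph refers to works roughly like this; a minimal sketch assuming plain `puppeteer`, with the target URL, the `networkidle2` wait, and the `max-age` parsing as illustrative choices rather than the tutorial's exact code:

```js
const puppeteer = require('puppeteer');

(async () => {
    const cache = {}; // url -> { status, headers, body, expires }

    const browser = await puppeteer.launch();
    const page = await browser.newPage();
    await page.setRequestInterception(true);

    page.on('request', async (request) => {
        const entry = cache[request.url()];
        if (entry && entry.expires > Date.now()) {
            // Serve the cached body instead of going to the network again.
            await request.respond({ status: entry.status, headers: entry.headers, body: entry.body });
            return;
        }
        await request.continue();
    });

    page.on('response', async (response) => {
        const url = response.url();
        const maxAgeMatch = (response.headers()['cache-control'] || '').match(/max-age=(\d+)/);
        const maxAge = maxAgeMatch ? Number(maxAgeMatch[1]) : 0;
        if (!maxAge || cache[url]) return;
        try {
            cache[url] = {
                status: response.status(),
                headers: response.headers(),
                body: await response.buffer(),
                expires: Date.now() + maxAge * 1000,
            };
        } catch (err) {
            // Redirects and some other responses have no body to cache.
        }
    });

    // Waiting until the network is nearly idle is why the cache saves
    // bandwidth but not much time on tracker-heavy pages.
    await page.goto('https://edition.cnn.com/', { waitUntil: 'networkidle2' });
    await browser.close();
})();
```

Which matches the numbers above: request interception trims the bytes transferred, while the near-idle network wait keeps the wall-clock time roughly the same.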

sources/academy/tutorials/node_js/filter_blocked_requests_using_sessions.md

Lines changed: 1 addition & 1 deletion
@@ -180,7 +180,7 @@ const gotoFunction = async ({ request, page }) => {
};
```

- Now we have access to the session in the `handlePageFunction` and the rest of the logic is the same as in the first example. We extract the session from the userData, try/catch the whole code and on success we add the session and on error we delete it. Also it is useful to retire the browser completely (check [here](https://docs.apify.com/academy/node-js/handle-blocked-requests-puppeteer) for reference) since the other requests will probably have similar problem.
+ Now we have access to the session in the `handlePageFunction` and the rest of the logic is the same as in the first example. We extract the session from the userData, try/catch the whole code, and on success we add the session and on error we delete it. It is also useful to retire the browser completely (check the [handling blocked requests guide](/academy/node-js/handle-blocked-requests-puppeteer) for reference), since the other requests will probably have a similar problem.

```js
const handlePageFunction = async ({ request, page, puppeteerPool }) => {
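    // A sketch of the body described above, not the tutorial's exact code:
    // addGoodSession/removeSession stand in for whatever session bookkeeping
    // the first example uses.
    const { session } = request.userData;
    try {
        // ... the actual scraping logic for this page goes here ...
        addGoodSession(session);
    } catch (error) {
        // Probably blocked: drop the session and retire the whole browser,
        // since its other requests are likely to hit the same problem.
        removeSession(session);
        await puppeteerPool.retire(page.browser());
        // Rethrow so the request gets retried, ideally with a different session.
        throw error;
    }
};
```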

sources/academy/tutorials/node_js/processing_multiple_pages_web_scraper.md

Lines changed: 4 additions & 2 deletions
@@ -9,9 +9,11 @@ Sometimes you need to process the same URL several times, but each time with a d
Let's illustrate a solution to this problem by creating a scraper which starts with an array of keywords and inputs each of them to Google, one by one. Then it retrieves the results.

- > This isn't an efficient solution to searching keywords on Google. You could directly enqueue search URLs like `https://www.google.cz/search?q=KEYWORD`.
+ :::note Tutorial focus

- > Solving a common problem with scraper automatically deduplicating the same URLs.
+ This tutorial demonstrates how to handle a common scenario where scrapers automatically deduplicate URLs. For the most efficient Google searches in production, directly enqueue search URLs like `https://www.google.cz/search?q=KEYWORD` instead of the form-submission approach shown here.
+
+ :::

First, we need to start the scraper on the page from which we're going to do our enqueuing. To do that, we create one start URL with the label "enqueue" and URL "https://example.com/". Now we can proceed to enqueue all the pages. The first part of our `pageFunction` will look like this:
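The tutorial's own snippet falls outside this diff's context lines; as a rough sketch of the step it describes (assuming Apify Web Scraper's `context.enqueueRequest`, with an illustrative keyword list and labels rather than the tutorial's actual values), it could look like:

```js
async function pageFunction(context) {
    const { request } = context;

    if (request.userData.label === 'enqueue') {
        // These could just as well come from the actor input or customData.
        const keywords = ['quick brown fox', 'lazy dog'];

        for (const keyword of keywords) {
            // Same URL every time; a distinct uniqueKey keeps the scraper
            // from deduplicating these otherwise identical requests.
            await context.enqueueRequest({
                url: 'https://www.google.com/',
                uniqueKey: `google-${keyword}`,
                userData: { label: 'search', keyword },
            });
        }
        return;
    }

    // Requests labelled 'search' would type request.userData.keyword into
    // the search form here and extract the results.
}
```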

sources/academy/webscraping/anti_scraping/index.md

Lines changed: 1 addition & 1 deletion
@@ -111,7 +111,7 @@ Because we here at Apify scrape for a living, we have discovered many popular an
This is the most straightforward and standard protection, which is mainly implemented to prevent DDoS attacks, but it also works for blocking scrapers. Websites using rate limiting don't allow more than some defined number of requests from one IP address in a certain time span. If the max-request number is low, then there is a high potential for false positives due to IP address uniqueness, such as in large companies where hundreds of employees can share the same IP address.

- > Learn more about rate limiting [here](./techniques/rate_limiting.md)
+ > Learn more about rate limiting in our [rate limiting guide](./techniques/rate_limiting.md)
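For intuition, a minimal sketch of the kind of fixed-window counter such a protection might run server-side; the window size and limit are made-up values, not anything a particular site uses:

```js
// Fixed-window rate limiter keyed by client IP: at most LIMIT requests per
// WINDOW_MS. Everyone behind a shared office IP counts together, which is
// where the false positives mentioned above come from.
const WINDOW_MS = 60 * 1000;
const LIMIT = 100;
const counters = new Map(); // ip -> { windowStart, count }

function isAllowed(ip, now = Date.now()) {
    const entry = counters.get(ip);
    if (!entry || now - entry.windowStart >= WINDOW_MS) {
        counters.set(ip, { windowStart: now, count: 1 });
        return true;
    }
    entry.count += 1;
    return entry.count <= LIMIT; // over the limit -> typically an HTTP 429
}
```
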
### Header checking
