const downloadUrl = 'http://example.com'; // The URL to download is a little bit private. Can I send it to you via email?
const destination = './save/';
const scrape = require('website-scraper');
const options = {
  urls: [downloadUrl],
  directory: destination,
  sources: [{ selector: 'a', attr: 'href' }],
  urlFilter: (url) => new URL(url).host === new URL(downloadUrl).host,
  filenameGenerator: 'bySiteStructure',
  requestConcurrency: 10,
  ignoreErrors: true,
  plugins: [{
    apply: (registerAction) => {
      registerAction('onResourceError', ({ resource, error }) => console.log(`Resource ${resource.url} has error ${error}`));
    }
  }]
};
scrape(options).then((result) => console.log(result));
Description
An error thrown by the request module stops the whole process and bypasses the handler registered for onResourceError (via the plugin).
Expected behavior:
The process continues running (because the ignoreErrors option is set to true) and prints an error message to the console (because of the registered plugin).
Actual behavior:
The whole process stops and the onResourceError handler is never called.
$ node --max-old-space-size=10240 .
C:\Users\Max\Documents\Download\node_modules\request\request.js:1147
response.body = strings.join('')
^
RangeError: Invalid string length
at Array.join (<anonymous>)
at Request.<anonymous> (C:\Users\Max\Documents\Download\node_modules\request\request.js:1147:31)
at Request.emit (events.js:315:20)
at IncomingMessage.<anonymous> (C:\Users\Max\Documents\Download\node_modules\request\request.js:1083:12)
at Object.onceWrapper (events.js:421:28)
at IncomingMessage.emit (events.js:327:22)
at endReadableNT (_stream_readable.js:1225:12)
at processTicksAndRejections (internal/process/task_queues.js:84:21)
Sorry for the late response, and thank you for reporting this issue.
To be honest, I'm not sure that downloading such a large file will be possible, because the module currently stores everything in memory.
The only thing I can suggest is to exclude such files from the download with urlFilter.
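A minimal sketch of that urlFilter suggestion, assuming the oversized files can be recognized by their extension. The extension list and the `makeUrlFilter` helper are hypothetical and not part of website-scraper; only the `urlFilter` option itself (a function that receives a URL and returns a boolean) comes from the options above.

```javascript
// Hypothetical list of extensions assumed to mark very large files.
const SKIPPED_EXTENSIONS = ['.iso', '.zip', '.mp4'];

// Builds a filter that keeps same-host URLs and rejects the listed extensions.
function makeUrlFilter(baseUrl, skippedExtensions = SKIPPED_EXTENSIONS) {
  const baseHost = new URL(baseUrl).host;
  return (url) => {
    const parsed = new URL(url);
    // Same-host check, as in the original options.
    if (parsed.host !== baseHost) return false;
    // Compare the path only, so query strings do not hide the extension.
    const path = parsed.pathname.toLowerCase();
    return !skippedExtensions.some((ext) => path.endsWith(ext));
  };
}

const urlFilter = makeUrlFilter('http://example.com');
```

The resulting function could be passed as `urlFilter` in the options object in place of the host-only check. It only helps when the large files are identifiable from the URL; it does not inspect the actual response size.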
As for the error from the request module not being handled by the onResourceError action: I'll need to take a closer look at it. It looks like a bug.
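One plausible reading of the stack trace, offered as an assumption rather than a confirmed diagnosis: the RangeError is thrown inside an event listener (`Request.emit` in the trace), not passed to a callback or a promise rejection, so no promise-based error handling ever sees it. The sketch below uses a hypothetical stand-in emitter, not the request module, to show that a throw inside a listener surfaces at the `emit()` call site instead of rejecting the surrounding promise.

```javascript
const { EventEmitter } = require('events');

// Hypothetical stand-in for the failing response handler in the trace.
function emitWithThrowingListener() {
  const emitter = new EventEmitter();

  // The promise only learns about failures that are passed to reject().
  const pending = new Promise((resolve, reject) => {
    emitter.on('error', reject);
    emitter.on('end', () => {
      // Thrown inside a listener, after the promise executor has returned.
      throw new RangeError('Invalid string length');
    });
  });
  pending.catch(() => {}); // never runs here: nothing calls reject()

  try {
    emitter.emit('end'); // listeners run synchronously inside emit()
    return null;
  } catch (err) {
    return err; // the throw surfaces at the emit() call site
  }
}
```

In the trace above, the `emit()` call comes from Node's stream internals (`endReadableNT`), where nothing catches the error, which would explain why the process exits instead of reaching onResourceError.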
Configuration
version: [email protected]
options: