
Error thrown by the request module stops the process, even if ignoreErrors was set to true #415


Closed
max-hk opened this issue Jul 17, 2020 · 3 comments

max-hk commented Jul 17, 2020

Configuration

version: [email protected]

options:

const downloadUrl = 'http://example.com'; // The url to download is a little bit private. Can I send it to you via email?
const destination = './save/';

const scrape = require('website-scraper');
const options = {
  urls: [downloadUrl],
  directory: destination,
  sources: [{ selector: 'a', attr: 'href' }],
  urlFilter: (url) => new URL(url).host === new URL(downloadUrl).host,
  filenameGenerator: 'bySiteStructure',
  requestConcurrency: 10,
  ignoreErrors: true,
  plugins: [{
    apply: (registerAction) => {
      registerAction('onResourceError', ({ resource, error }) => console.log(`Resource ${resource.url} has error ${error}`));
    }
  }]
};

scrape(options).then((result) => console.log(result));

Description

An error thrown by the request module stops the whole process and bypasses the handler registered for onResourceError (via the plugin).

Expected behavior:

The process continues running (because the ignoreErrors option was set to true) and prints an error message to the console (because of the registered plugin).

Actual behavior:

The whole process is stopped and the onResourceError handler is never called.

$ node --max-old-space-size=10240 .
C:\Users\Max\Documents\Download\node_modules\request\request.js:1147
      response.body = strings.join('')
                              ^

RangeError: Invalid string length
    at Array.join (<anonymous>)
    at Request.<anonymous> (C:\Users\Max\Documents\Download\node_modules\request\request.js:1147:31)
    at Request.emit (events.js:315:20)
    at IncomingMessage.<anonymous> (C:\Users\Max\Documents\Download\node_modules\request\request.js:1083:12)
    at Object.onceWrapper (events.js:421:28)
    at IncomingMessage.emit (events.js:327:22)
    at endReadableNT (_stream_readable.js:1225:12)
    at processTicksAndRejections (internal/process/task_queues.js:84:21)

max-hk commented Jul 18, 2020

Maybe related: jsforce/jsforce#615, nodejs/node#33960

Edit: the error occurred when website-scraper/request attempted to download a 700 MB zip file.


s0ph1e commented Sep 2, 2020

Hi @MaxLOh 👋

Sorry for the late response, and thank you for sharing the issue.

To be honest, I'm not sure that downloading such a large file will be possible, because the module currently stores everything in memory.
The only thing I can suggest is to exclude such files from the download with urlFilter.
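
A minimal sketch of that workaround, assuming urlFilter is called with absolute URLs and that the oversized files can be recognized by their extension. The extension list and the makeUrlFilter helper are hypothetical names used for illustration; they are not part of website-scraper:

```javascript
// Hypothetical list of extensions to skip; adjust it for the site being scraped.
const SKIPPED_EXTENSIONS = ['.zip', '.iso', '.tar', '.gz', '.7z'];

// Builds a filter that accepts a URL only when it is on the same host as
// baseUrl and its path does not end with one of the skipped extensions.
function makeUrlFilter(baseUrl) {
  const baseHost = new URL(baseUrl).host;
  return (url) => {
    const parsed = new URL(url);
    if (parsed.host !== baseHost) {
      return false;
    }
    const pathname = parsed.pathname.toLowerCase();
    return !SKIPPED_EXTENSIONS.some((ext) => pathname.endsWith(ext));
  };
}

// Same host as in the original report's placeholder URL.
const urlFilter = makeUrlFilter('http://example.com');
```

The resulting function can be passed as the urlFilter option in place of the one in the original report. It only inspects the path, so a download served from a URL such as /download?name=a.zip would still get through; in that case the filter would need a different rule.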

As for the error from the request module not being handled by the onResourceError action: I'll need to take a closer look at it. It looks like a bug.

s0ph1e added the bug label Sep 2, 2020

s0ph1e commented Dec 24, 2021

Hopefully this issue is fixed by replacing request with got in #445. The changes will be published soon in the next release, v5.

Otherwise I need a way to reproduce it; without that, it's hard to fix anything.

s0ph1e closed this as completed Dec 24, 2021