Skip to content

Update libcurl-impersonate usage #4535

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 17, 2025
Merged

Update libcurl-impersonate usage #4535

merged 1 commit into from
May 17, 2025

Conversation

wrobelda
Copy link
Contributor

@wrobelda wrobelda commented May 2, 2025

Overhaul the usage of libcurl-impersonate, which was not, in fact, being used properly, as the code was overriding the headers set by it to prevent detection.

  • update the libcurl-impersonate to an actively managed lexiforest fork
  • impersonate Chrome 131
  • move the defaultHttpHeaders to http.php, where it belongs
  • only set defaultHttpHeaders if curl-impersonate is not detected
  • make useragent ini setting optional and disabled by default
  • add necessary documentation updates

@wrobelda wrobelda force-pushed the master branch 5 times, most recently from 1f447cd to 2f5de87 Compare May 2, 2025 00:34
@wrobelda wrobelda marked this pull request as draft May 2, 2025 08:16
@wrobelda wrobelda marked this pull request as ready for review May 2, 2025 12:18
@dvikan
Copy link
Contributor

dvikan commented May 9, 2025

i was hoping we could keep CurlHttpClient cleanly separated by treating it as a thirdparty http client

this change removes the http user agent for non-curl-impersonator usage which might break some fetching. (default ua is Curl v1.2.3 or simply the empty string i dont remember)

this change breaks the usage of custom http headers for non-curl-impersonate uses (no it doesnt, i read code wrong)

@wrobelda wrobelda force-pushed the master branch 2 times, most recently from 6273e0f to 462dbf8 Compare May 17, 2025 16:06
@wrobelda
Copy link
Contributor Author

wrobelda commented May 17, 2025

i was hoping we could keep CurlHttpClient cleanly separated by treating it as a thirdparty http client

I can see that, but you could also make a point that getContents() in contents.php could be kept clean of the actual underlying anti-bot measures, which is the purpose of the default headers you had there already.

Also, I would imagine any alternative client would need to incorporate some of their own anti-bot measures, possibly with another set of headers, so if we consider the current set of default headers as curl-specific, I think it makes sense to keep them all sealed off in CurlHttpClient.

It would be hard to fix the libcurl-impersonate usage without the refactoring I did.

this change removes the http user agent for non-curl-impersonator usage which might break some fetching. (default ua is Curl v1.2.3 or simply the empty string i dont remember)

Indeed, fixed it.

libcurl-impersonate was not being used properly, as the code was
overriding the headers set by it to prevent detection.

- update the libcurl-impersonate to an actively managed lexiforest
  fork
- impersonate Chrome 131
- move the defaultHttpHeaders to http.php, where it belongs
- only set defaultHttpHeaders if curl-impersonate is not detected
- make useragent ini setting optional and disabled by default
- add necessary documentation updates
@dvikan dvikan merged commit b7c04f8 into RSS-Bridge:master May 17, 2025
3 of 9 checks passed
@dvikan
Copy link
Contributor

dvikan commented May 17, 2025

thanks for helping to keep this up to date.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants