Hi community,
I'm working on a personal project where I need to retrieve the title and HTML content of a webpage (a simple task).
Sometimes the page I visit shows a cookie consent prompt, but the HTML content is already fully loaded, so I don’t actually care about the cookies. All the information I need is in the HTML.
Here’s my problem:
- When I try to evaluate the title of the page, I often get a timeout error.
- To handle this, I retry the process once after the first failure, but I can’t do more than that.
- What’s confusing is that if I manually check the title in the browser’s console, it works perfectly. However, when I try to retrieve the title programmatically in PHP, it doesn’t work.
- Does anyone know why this might happen or how I can fix it?
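A bounded retry along the lines described in the list above could be sketched as follows. This is only an illustration: `retry()` is a hypothetical helper, not part of any library, and it catches every `Throwable` for simplicity, whereas real code would probably catch only the timeout exception it expects.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper (not a library function): call $operation up to
 * $maxAttempts times, pausing $delayMs milliseconds after each failure.
 * The last exception is rethrown once the attempts are exhausted.
 */
function retry(callable $operation, int $maxAttempts = 3, int $delayMs = 500)
{
    if ($maxAttempts < 1) {
        throw new InvalidArgumentException('maxAttempts must be at least 1');
    }

    for ($attempt = 1; ; $attempt++) {
        try {
            // The attempt number is passed in so the callback can log it.
            return $operation($attempt);
        } catch (Throwable $e) {
            if ($attempt >= $maxAttempts) {
                throw $e; // Out of attempts: surface the last error.
            }
            usleep($delayMs * 1000); // usleep() takes microseconds.
        }
    }
}
```

With the page object from my code, the title lookup could then be wrapped as `retry(fn () => $page->evaluate('document.title')->getReturnValue(10000))`, assuming `getReturnValue()` accepts a timeout in milliseconds. Retrying does not explain why the evaluation times out in the first place; it only makes the failure less abrupt.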
Thanks for your help!
Here is my code:
` $urls = $urlScrapedByKeyWordRepository->findBy(['isUsedForGeneration' => false]);
shuffle($urls);
$urls = array_slice($urls, 0, 2);
if ($urls) {
/** @var UrlScrapedByKeyword[] $urls */
foreach ($urls as $key => $url) {
$urlScrapped = ltrim($url->getUrl(), './');
// $urlScrapped = $urlScrapped;
$browser = $this->createBrowser();
$page = $browser->createPage();
$html = false;
try {
$page->navigate($urlScrapped, ['strict'])->waitForNavigation(Page::INTERACTIVE_TIME, 6000);
// Log the actual title (the quoted version only logged the literal string 'document.title').
$page->evaluate("console.log(document.title)");
// This is where my code crashes, so I catch the error below.
$pageTitle = $page->evaluate('document.title')->waitForResponse()->getReturnValue();
if ($pageTitle == 'Before you continue')
{
$this->AcceptGoogleCookies($page);
$pageTitle = $page->evaluate('document.title')->waitForResponse()->getReturnValue();
}
echo($pageTitle.' from normal way');
$pageContent = $page->getHtml(2500);
sleep(1);
if ($pageContent) echo('content OK');
if ($pageTitle == 'Before you continue') $this->AcceptGoogleCookies($page);
} catch (OperationTimedOut $e) {
// In the browser's console I can see that this call works correctly.
$page->evaluate("console.log(document.title)");
// !!---- I catch the error and retry evaluating the title, but it crashes again ----!!
$pageTitle = $page->evaluate('document.title')->getReturnValue();
if ($pageTitle == 'Before you continue') $this->AcceptGoogleCookies($page);
echo $pageTitle.' from error';
$pageContent = $page->getHtml(2500);
sleep(1);
if ($pageContent) echo('content OK from error');
} catch (NavigationExpired $e) {
echo "NavigationExpired error while evaluating the title: $pageTitle<br>";
}
}`
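Since everything I need is in the HTML, one possible fallback is to read the title from the HTML string itself instead of evaluating `document.title` a second time. The sketch below uses PHP's bundled DOM extension; `extractTitleFromHtml()` is a hypothetical name, and the input is assumed to be a string such as the one returned by `getHtml()` in the code above. It does not help if fetching the HTML times out as well, and without a charset declaration in the markup `DOMDocument::loadHTML()` may mis-decode non-ASCII titles.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the trimmed text of the first <title>
 * element in an HTML string, or null when there is none.
 */
function extractTitleFromHtml(string $html): ?string
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return null;
    }

    $dom = new DOMDocument();

    // Real-world markup is rarely valid; collect parser warnings
    // internally instead of emitting them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $titles = $dom->getElementsByTagName('title');
    if ($titles->length === 0) {
        return null;
    }

    $title = trim($titles->item(0)->textContent);

    return $title === '' ? null : $title;
}
```

In my loop this would be called as `extractTitleFromHtml($pageContent)` once `$pageContent` has been retrieved, and the result compared against 'Before you continue' in the same way as the evaluated title.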