Response from _get_results(query) contains NoneType, which leads to a parsing failure #35

Open
@stRudolph

Hi Matt,

While trying to scrape Google, I followed your blog post on scraping Google in 3 lines and got the following error:

AttributeError                            Traceback (most recent call last)
Cell In[2], line 1
----> 1 results = seo.get_serps("stupid")
      2 print(results)
File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:144, in get_serps(query, output)
    133 """Return the first 10 Google search results for a given query.
    134 
    135 Args:
   (...)
    140     results (dict): Results of query.
    141 """
    143 response = _get_results(query)
--> 144 results = _parse_search_results(response)
    146 if results:
    147     if output == "dataframe":

File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:124, in _parse_search_results(response)
    118 output = []
    120 for result in results:
    121     item = {
    122         'title': result.find(css_identifier_title, first=True).text,
    123         'link': result.find(css_identifier_link, first=True).attrs['href'],
--> 124         'text': result.find(css_identifier_text, first=True).text
...
    125     }
    127     output.append(item)
    129 return output

AttributeError: 'NoneType' object has no attribute 'text'

Then I tried your other blog post on scraping with Python, which does not rely on the ecommercetools package, and followed it to the letter.
Here is the interesting part:

results = google_search("stupid")
results

yields normal output. Rerunning the same Jupyter cell with a different keyword,

results = google_search("allergy")
results

yields the following error:

AttributeError                            Traceback (most recent call last)
Cell In[9], line 1
----> 1 results = google_search("allergy")
      2 results

Cell In[8], line 3, in google_search(query)
      1 def google_search(query):
      2     response = get_results(query)
----> 3     return parse_results(response)

Cell In[7], line 17, in parse_results(response)
     10 output = []
     12 for result in results:
     14     item = {
     15         'title': result.find(css_identifier_title, first=True).text,
     16         'link': result.find(css_identifier_link, first=True).attrs['href'],
---> 17         'text': result.find(css_identifier_text, first=True).text
     18     }
     20     output.append(item)
     22 return output

AttributeError: 'NoneType' object has no attribute 'text'

So sometimes result.find(css_identifier_text, first=True) apparently returns None instead of an element?
I have no idea under which circumstances this None arises, but the behavior is as follows:
seo.get_serps() from ecommercetools consistently throws the error, while the "hand-written" equivalent is keyword-sensitive, e.g. "allergy" throws the error, "keyword sensitive" does not.
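
A possible workaround, as a minimal sketch only: it assumes that find(..., first=True) returns None whenever a result block has no element matching the selector, which would explain the traceback above. The names parse_results_safely, _text_or_none and _href_or_none are hypothetical and not part of ecommercetools or the blog post code. The CSS selectors are passed in as arguments because their actual values are not shown here.

```python
def _text_or_none(element):
    """Return element.text, or None when find() matched nothing."""
    return element.text if element is not None else None


def _href_or_none(element):
    """Return the element's href attribute, or None when it is missing."""
    return element.attrs.get("href") if element is not None else None


def parse_results_safely(results, css_title, css_link, css_text):
    """Hypothetical None-tolerant variant of the parsing loop in the traceback.

    `results` is assumed to be an iterable of objects exposing
    find(selector, first=True), as in the original code.
    """
    output = []
    for result in results:
        # Look each element up first, then guard against None before
        # touching .text or .attrs.
        title = result.find(css_title, first=True)
        link = result.find(css_link, first=True)
        text = result.find(css_text, first=True)
        output.append({
            "title": _text_or_none(title),
            "link": _href_or_none(link),
            "text": _text_or_none(text),
        })
    return output
```

With this, a result block that lacks a text snippet would produce an item with 'text' set to None instead of raising AttributeError, which might also help to see which results are affected for a given keyword.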
