Hi Matt,
I'm trying to scrape Google. I followed your blog post on scraping Google in 3 lines and got the following error:
AttributeError Traceback (most recent call last)
Cell In[2], line 1
----> 1 results = seo.get_serps("stupid")
2 print(results)
File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:144, in get_serps(query, output)
133 """Return the first 10 Google search results for a given query.
134
135 Args:
(...)
140 results (dict): Results of query.
141 """
143 response = _get_results(query)
--> 144 results = _parse_search_results(response)
146 if results:
147 if output == "dataframe":
File c:\Users\stephan.rudolph\Coding\testenv\Lib\site-packages\ecommercetools\seo\google_search.py:124, in _parse_search_results(response)
118 output = []
120 for result in results:
121 item = {
122 'title': result.find(css_identifier_title, first=True).text,
123 'link': result.find(css_identifier_link, first=True).attrs['href'],
--> 124 'text': result.find(css_identifier_text, first=True).text
...
125 }
127 output.append(item)
129 return output
AttributeError: 'NoneType' object has no attribute 'text'
Then I tried your other blog post on scraping with Python, which does not rely on the ecommercetools package, and followed it to the T.
Here is the interesting part:
results = google_search("stupid")
results
yields normal output. Rerunning this Jupyter cell with the keyword "allergy":
results = google_search("allergy")
results
yields
AttributeError Traceback (most recent call last)
Cell In[9], line 1
----> 1 results = google_search("allergy")
2 results
Cell In[8], line 3, in google_search(query)
1 def google_search(query):
2 response = get_results(query)
----> 3 return parse_results(response)
Cell In[7], line 17, in parse_results(response)
10 output = []
12 for result in results:
14 item = {
15 'title': result.find(css_identifier_title, first=True).text,
16 'link': result.find(css_identifier_link, first=True).attrs['href'],
---> 17 'text': result.find(css_identifier_text, first=True).text
18 }
20 output.append(item)
22 return output
AttributeError: 'NoneType' object has no attribute 'text'
So sometimes result.find(css_identifier_text, first=True) does not return an element but None (NoneType)?
I have no idea under which circumstances this None arises, but the behavior is as follows:
seo.get_serps() from ecommercetools consistently throws the error, while the hand-written equivalent is keyword-sensitive: e.g. "allergy" throws the error, "keyword sensitive" does not.
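For reference, a guard along the following lines would probably avoid the crash, although it does not explain why the text selector fails to match for some results. This is only a minimal sketch, assuming that find(..., first=True) returns None when the selector matches nothing inside a result. The helper names text_or_none and parse_result are hypothetical and are not part of ecommercetools or the blog post code.

```python
def text_or_none(element):
    """Return element.text, or None when the selector matched nothing."""
    return element.text if element is not None else None


def parse_result(result, css_identifier_title, css_identifier_link,
                 css_identifier_text):
    """Build one result item without assuming every selector matches.

    Hypothetical helper: `result` is expected to expose
    find(selector, first=True), returning an element or None.
    """
    title = result.find(css_identifier_title, first=True)
    link = result.find(css_identifier_link, first=True)
    text = result.find(css_identifier_text, first=True)

    return {
        'title': text_or_none(title),
        # attrs is treated as a dict-like mapping of HTML attributes
        'link': link.attrs.get('href') if link is not None else None,
        'text': text_or_none(text),
    }
```

With this, a result that has no matching text snippet ends up with 'text': None instead of raising AttributeError, so the affected results can be inspected afterwards.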