Running @Language.factory a second time in Databricks fails #13491
Unanswered
larrymccutchan asked this question in Help: Other Questions
Replies: 0 comments
I am developing some new pipelines in my NLP project for work. While creating the pipelines, I re-run a Jupyter-like notebook over and over again as I develop the logic. (I am very new to spaCy.) I am getting a "code not found"/E004 error for the following code when I run it a second time.
I believe that this is related to another discussion here: #7316
However, re-running the same cell while building a pipeline seems like a very realistic scenario. I believe you mentioned that re-running it in the same process is unlikely (I read that as undesired). How would you suggest I do it?
I have been using dbutils.library.restartPython() (see here: https://docs.databricks.com/en/libraries/restart-python-process.html) to get it to work for now.
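For readers hitting the same error, one possible alternative to restarting Python is to move the component class out of the notebook cell into an importable `.py` file. This is an assumption, not something confirmed in this thread: the idea is that the comparison shown in the traceback's `is_same_func` frame (qualified name, defining file, source lines) can only succeed when Python can locate the class's source. The sketch below uses only the standard library and makes no spaCy call; `is_same_definition` and `my_components.py` are hypothetical names that mirror that comparison.

```python
import importlib
import inspect
import sys
import tempfile
from pathlib import Path


def is_same_definition(a, b):
    """Hypothetical stand-in for the name/file/source comparison in the traceback."""
    return (
        a.__qualname__ == b.__qualname__
        and inspect.getfile(a) == inspect.getfile(b)
        and inspect.getsourcelines(a) == inspect.getsourcelines(b)
    )


SOURCE = "class RESTCountriesComponent:\n    pass\n"

with tempfile.TemporaryDirectory() as tmp:
    # Write the class to a real file so inspect can find its source.
    Path(tmp, "my_components.py").write_text(SOURCE)
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module("my_components")
        first = module.RESTCountriesComponent
        # Reloading re-executes the file and creates a new class object.
        second = importlib.reload(module).RESTCountriesComponent
        reloaded_is_new_object = first is not second
        same_definition = is_same_definition(first, second)
    finally:
        sys.path.remove(tmp)
        sys.modules.pop("my_components", None)

print(reloaded_is_new_object, same_definition)
```

In a notebook, the equivalent would be `import my_components` on the first run and `importlib.reload(my_components)` on later runs. Whether that avoids E004 in Databricks is untested here.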
I would appreciate any help,
Larry
Code below (a stripped-down version of the example provided here: https://spacy.io/usage/processing-pipelines#custom-components-attributes)
import requests
from spacy.lang.en import English
from spacy.language import Language
from spacy.matcher import PhraseMatcher
from spacy.tokens import Doc, Span, Token

@Language.factory("rest_countries")
class RESTCountriesComponent:
    def __init__(self, nlp, name, label="GPE"):
        Doc.set_extension("has_country", getter=self.has_country)
        print("init")

nlp = English()
#nlp.add_pipe("rest_countries", config={"label": "GPE"})
doc = nlp("Some text about Colombia and the Czech Republic")
print("Pipeline", nlp.pipe_names)  # pipeline contains component name
-- Error details
File /databricks/python/lib/python3.10/site-packages/spacy/language.py:514, in Language.factory.<locals>.add_factory(factory_func)
    508 if internal_name in registry.factories:
    509     # We only check for the internal name here – it's okay if it's a
    510     # subclass and the base class has a factory of the same name. We
    511     # also only raise if the function is different to prevent raising
    512     # if module is reloaded.
    513     existing_func = registry.factories.get(internal_name)
--> 514     if not util.is_same_func(factory_func, existing_func):
    515         err = Errors.E004.format(
    516             name=name, func=existing_func, new_func=factory_func
    517         )
    518         raise ValueError(err)
File /databricks/python/lib/python3.10/site-packages/spacy/util.py:1125, in is_same_func(func1, func2)
   1123     return False
   1124 same_name = func1.__qualname__ == func2.__qualname__
-> 1125 same_file = inspect.getfile(func1) == inspect.getfile(func2)
   1126 same_code = inspect.getsourcelines(func1) == inspect.getsourcelines(func2)
   1127 return same_name and same_file and same_code
File /databricks/python/lib/python3.10/site-packages/torch/package/package_importer.py:691, in _patched_getfile(object)
    689     if object.__module__ in _package_imported_modules:
    690         return _package_imported_modules[object.__module__].__file__
--> 691 return _orig_getfile(object)
File /usr/lib/python3.10/inspect.py:785, in getfile(object)
    783         return module.__file__
    784     if object.__module__ == '__main__':
--> 785         raise OSError('source code not available')
    786 raise TypeError('{!r} is a built-in class'.format(object))
    787 if ismethod(object):
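The last frame shows that the error is raised inside `inspect.getfile`, which cannot report a file for a class whose defining module has no `__file__`, as appears to be the case for a class typed into a notebook cell. The following standard-library sketch reproduces that condition outside Databricks. `fake_notebook_cell` and `source_file_or_none` are hypothetical names used only for illustration. Because the stand-in module is not named `__main__`, Python raises `TypeError` here rather than the `OSError` seen in the traceback, so the helper catches both.

```python
import inspect
import json
import sys
import types


def source_file_or_none(obj):
    """Return the file that defines obj, or None if Python cannot locate one."""
    try:
        return inspect.getfile(obj)
    except (OSError, TypeError):
        return None


# Stand-in for a notebook cell: a module object that has no __file__.
fake_cell = types.ModuleType("fake_notebook_cell")
sys.modules["fake_notebook_cell"] = fake_cell
exec("class Component:\n    pass\n", fake_cell.__dict__)

in_cell = source_file_or_none(fake_cell.Component)  # no file to report
on_disk = source_file_or_none(json.JSONDecoder)     # class imported from a file

print(in_cell, on_disk)
```

A class imported from a file on disk yields a path, while the class created in the file-less module yields `None`, which is the situation the file comparison in `is_same_func` runs into.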
The e