Formatting training data for spancat training #13880
sam8beard asked this question in Help: Coding & Implementations
Hi there,
I'm taking a stab at building my own claim extraction pipeline (first-time spaCy user).
Upstream in my pipeline, I run n docs through the NER component of the pretrained en_core_web_sm model and then use my own dependency-parsing logic to identify target spans. I then construct a list of training data formatted for spancat:
Each start and end is the span's starting and ending token index within the sentence. This list is then passed to a function that contains my training loop and creates examples from all of the tuples in `training_data`.
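A minimal sketch of one way to build the examples and run the loop, assuming each item in `training_data` is a `(text, [(start_token, end_token, label), ...])` tuple; the sample sentences, the `CLAIM` label and `make_example` are hypothetical. spancat reads its gold spans from `doc.spans[spans_key]` (default key `"sc"`), so each `Example` pairs an unannotated doc with a reference doc whose span group holds `Span` objects built from the token indices. `Span(doc, start, end)` treats `end` as exclusive, so if your end index points at the span's last token, add 1. The sketch also widens the n-gram suggester: spancat only scores candidate spans that the suggester proposes, and the default proposes only 1- to 3-token n-grams, which is shorter than most claims.

```python
import random

import spacy
from spacy.tokens import Span
from spacy.training import Example
from spacy.util import fix_random_seed, minibatch

# Hypothetical data: (sentence, [(start_token, end_token, label), ...]).
# end_token is treated as EXCLUSIVE, matching spaCy's Span(doc, start, end).
training_data = [
    ("The company said revenue grew 20 percent last year.", [(3, 9, "CLAIM")]),
    ("Officials claim the bridge is safe.", [(2, 6, "CLAIM")]),
]
SPANS_KEY = "sc"  # spancat's default spans_key

fix_random_seed(0)
# Assumption: same English tokenization rules that produced your token indices.
nlp = spacy.blank("en")

# The default suggester only proposes 1- to 3-token n-grams; widen it
# (here up to 10 tokens, adjust to your longest claim) so longer spans
# can be candidates at all.
spancat = nlp.add_pipe(
    "spancat",
    config={
        "spans_key": SPANS_KEY,
        "suggester": {"@misc": "spacy.ngram_suggester.v1", "sizes": list(range(1, 11))},
    },
)


def make_example(text, spans):
    """Pair an unannotated doc with a gold doc whose span group holds the labels."""
    predicted = nlp.make_doc(text)
    reference = nlp.make_doc(text)
    reference.spans[SPANS_KEY] = [
        Span(reference, start, end, label=label) for start, end, label in spans
    ]
    return Example(predicted, reference)


examples = [make_example(text, spans) for text, spans in training_data]

# initialize() infers the label set from the reference span groups.
optimizer = nlp.initialize(get_examples=lambda: examples)
for epoch in range(10):
    random.shuffle(examples)
    losses = {}
    for batch in minibatch(examples, size=2):
        nlp.update(batch, sgd=optimizer, losses=losses)
    print(epoch, losses)
```

Token indices only line up under the tokenizer that produced them, so it is safest to create both docs with the same pipeline's tokenizer; `spacy.blank("en")` stands in here for en_core_web_sm, which should apply the same English tokenization rules.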
I'm a bit confused about how I should be creating examples for my training loop. How should my training data be formatted to train my spancat component?
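For anything beyond a quick experiment, the usual spaCy v3 route is not a hand-written loop but storing the annotated docs (the same `doc.spans["sc"]` layout) in a `DocBin`, saving it as a `.spacy` file, and training with a config via `python -m spacy train`. A hedged sketch, again with hypothetical sample data and a hypothetical file name:

```python
import spacy
from spacy.tokens import DocBin, Span

# Hypothetical (sentence, [(start_token, end_token_exclusive, label), ...]) data.
training_data = [
    ("The company said revenue grew 20 percent last year.", [(3, 9, "CLAIM")]),
    ("Officials claim the bridge is safe.", [(2, 6, "CLAIM")]),
]

nlp = spacy.blank("en")
db = DocBin()  # span groups in doc.spans are serialized along with each doc
for text, spans in training_data:
    doc = nlp.make_doc(text)
    # spancat's default spans_key is "sc"
    doc.spans["sc"] = [Span(doc, start, end, label=label) for start, end, label in spans]
    db.add(doc)

db.to_disk("train.spacy")  # hypothetical path; build a dev.spacy the same way
```

A starting config can be generated with `python -m spacy init config config.cfg --lang en --pipeline spancat` and trained with `python -m spacy train config.cfg --paths.train train.spacy --paths.dev dev.spacy`. The n-gram sizes live in the generated config's `[components.spancat.suggester]` block.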