Conversation

@tpeng tpeng commented Aug 27, 2014

Add MdrExtractor to parse listing data. The output is a separate field whose name is the group name set in the annotation (using listingDataGroupName) and whose value is a list of dicts, one extracted from each matched record.
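
As a rough illustration of the output shape described above, the sketch below builds such a field from records that have already been extracted. The helper `build_listing_field` and the sample data are hypothetical and not part of this pull request; only the `listingDataGroupName` annotation key comes from the description.

```python
def build_listing_field(annotation, records):
    """Return {group_name: [record_dict, ...]} for one listing.

    Hypothetical helper: `annotation` is assumed to be a dict carrying the
    `listingDataGroupName` key, and `records` an iterable of per-record dicts.
    """
    group_name = annotation["listingDataGroupName"]
    # One dict per matched record, kept in page order.
    return {group_name: [dict(record) for record in records]}


annotation = {"listingDataGroupName": "products"}  # made-up group name
records = [
    {"name": "Widget", "price": "9.99"},
    {"name": "Gadget", "price": "19.99"},
]
item = build_listing_field(annotation, records)
```

Here `item` holds a single `products` key whose value is the list of the two record dicts.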

tpeng added 4 commits August 20, 2014 15:32
The MDR extractor is based on https://pypi.python.org/pypi/mdr/, which can
detect listing data automatically and extract it under the supervision of
scrapely annotations.
Sometimes the extracted data is empty, which makes validated false.
We still want to add it to the extracted listing data to indicate
that some data is missing on the page.

Also fix a problem that occurred when the annotation was added to a
record other than the seed record. Fix it by propagating the annotations
to the aligned elements.
Also fix a typo in the group name saved in the annotation.
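
The propagation fix mentioned in the commit messages above could look roughly like the following sketch. The data model is an assumption made purely for illustration (records as lists of elements aligned by index, `None` for alignment gaps) and is not the representation used by mdr or by this pull request; `propagate_annotations` is a hypothetical name.

```python
def propagate_annotations(aligned_records, annotations):
    """Copy each annotation to the same aligned position in every record.

    aligned_records: list of records; element i of one record is assumed to
        be aligned with element i of every other record (None marks a gap).
    annotations: dict mapping (record_index, element_index) -> field name.
        The annotated record does not have to be the seed (first) record.
    Returns one {element_index: field_name} dict per record.
    """
    # Only the aligned position matters, not the record that was annotated.
    column_fields = {col: field for (_, col), field in annotations.items()}
    propagated = []
    for record in aligned_records:
        propagated.append({
            col: field
            for col, field in column_fields.items()
            if col < len(record) and record[col] is not None
        })
    return propagated


aligned = [
    ["<h3>", "<span>"],  # seed record
    ["<h3>", "<span>"],  # the annotations were made on this record
    ["<h3>", None],      # gap: no aligned element for the second position
]
annotations = {(1, 0): "name", (1, 1): "price"}
result = propagate_annotations(aligned, annotations)
```

A record with a gap simply gets no annotation at that position, so its extracted dict lacks that field.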