techstone/CONetStat

 
 

CONetStat

Python module for web scraping, ontology indexing and semantic classification of Eurostat's online glossaries.

About

This module lets you automatically scrape Eurostat's "Statistics Explained" pages and index their contents. It builds a graph of the relationships between the pages while extracting semantic content ("concepts") from them. The interconnected concepts are then used to automatically train a text classifier.
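As a rough illustration of the page-graph step described above, the sketch below extracts links between pages and stores them as an adjacency mapping. It is not the module's actual implementation: the `LinkCollector` class, the `build_link_graph` function, the `/glossary/` URL prefix and the sample pages are all hypothetical, and the HTML is hand-written rather than scraped. Only the Python standard library is used.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the targets of <a> tags whose href starts with a given prefix."""

    def __init__(self, prefix):
        super().__init__()
        self.prefix = prefix
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.startswith(self.prefix):
            # Strip the prefix so that only the page title remains
            self.links.append(href[len(self.prefix):])


def build_link_graph(pages, prefix="/glossary/"):
    """Map each page title to the sorted list of known pages it links to."""
    graph = {}
    for title, html in pages.items():
        collector = LinkCollector(prefix)
        collector.feed(html)
        # Keep only links to known pages and drop self-links
        targets = {t for t in collector.links if t in pages and t != title}
        graph[title] = sorted(targets)
    return graph


# Hypothetical pages standing in for scraped content (assumption, not real data)
PAGES = {
    "GDP": '<p>See <a href="/glossary/Inflation">inflation</a> '
           'and <a href="/glossary/GNI">GNI</a>.</p>',
    "GNI": '<p>Related to <a href="/glossary/GDP">GDP</a>.</p>',
    "Inflation": '<p>An <a href="https://example.org/">external</a> link only.</p>',
}

graph = build_link_graph(PAGES)
for page, neighbours in graph.items():
    print(page, "->", neighbours)
```

A plain dictionary keeps the sketch free of dependencies; the same edges could be loaded into a NetworkX graph or a Neo4j database, both of which are listed under Resources.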

documentation available at: https://gjacopo.github.io/esscrape/
since 2018
license EUPL

Description

Notes

Resources

  • Keras, the Python Deep Learning library.
  • Various algorithms for short text categorization: PyShortTextCategorization.
  • Source code for large-scale hierarchical text classification with recursively regularized Deep Graph-CNN: Deepgraphcnn.
  • Convolutional Neural Networks for sentence classification: CNN_sentence.
  • Tool word2vec for computing continuous distributed representations of words, with pre-trained word and phrase vectors; see also mirror repository.
  • Implementation of Graph Convolutional Networks in TensorFlow.
  • Text matching toolkit MatchZoo for designing, comparing, and sharing of deep text matching models.
  • Britz D. blog on implementing a Convolutional Neural Network for text classification in TensorFlow and source code cnn-text-classification-tf.
  • Britz D. blog for understanding Convolutional Neural Networks for NLP.
  • Kipf T.N. blog on Graph Convolutional Network.
  • Framework Scrapy for extracting data from online websites.
  • Natural language toolkit nltk to work with human language data.
  • Package NetworkX for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
  • Module py2neo for the Neo4j graph database, though the Bolt driver neo4j-python-driver also does the job.

References
