Article processing
We have built a customized web scraping infrastructure to retrieve articles and extract content into a SQL database.
Features include: article retrieval, report extraction, filtering, classification, information extraction, and visualization.
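As a rough illustration of the retrieve-and-store step, the sketch below parses an already-downloaded HTML page and writes its title and paragraph text to a SQL table. It is a minimal sketch, not the project's actual scraper: the standard-library `sqlite3` module stands in for PostgreSQL so the example runs without a server, and the `ParagraphExtractor` class, the `store_article` function, and the `articles` table with its columns are all hypothetical names chosen for illustration.

```python
import sqlite3
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the <title> text and the text inside <p> tags."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current_tag = None  # "title" or "p" while inside one of them
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in ("title", "p"):
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current_tag:
            # Collapse runs of whitespace left over from the markup.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif text:
                self.paragraphs.append(text)
            self._current_tag = None


def store_article(conn, url, html):
    """Extract title and body from html and upsert them keyed by url."""
    parser = ParagraphExtractor()
    parser.feed(html)
    # Hypothetical schema; the real database layout is not described here.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS articles ("
        "url TEXT PRIMARY KEY, title TEXT, body TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO articles (url, title, body) VALUES (?, ?, ?)",
        (url, parser.title, "\n".join(parser.paragraphs)),
    )
    conn.commit()
    return parser.title, parser.paragraphs
```

Keying the table on the URL means that re-scraping the same article overwrites the earlier row rather than creating a duplicate.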
A combination of natural language processing and keyword analysis is used to retrieve useful information from articles.
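The keyword-analysis half of this step might look something like the sketch below, which scans each sentence for displacement-related terms and, where present, a reported quantity. This is an assumption-laden illustration rather than the project's method: the term list, the regular expression, and the `extract_reports` function are hypothetical, and the natural language processing component is not modeled at all.

```python
import re

# Hypothetical keyword list; the project's actual vocabulary is not given here.
DISPLACEMENT_TERMS = ("displaced", "evacuated", "fled", "homeless")

# A number (with optional thousands separators) followed by a unit word.
QUANTITY_PATTERN = re.compile(
    r"\b(\d{1,3}(?:,\d{3})+|\d+)\s+(people|persons|families|households)\b",
    re.IGNORECASE,
)


def extract_reports(text):
    """Return one record per sentence that mentions a displacement term."""
    reports = []
    # Naive sentence split on whitespace that follows ., ! or ?
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        terms = [t for t in DISPLACEMENT_TERMS if t in lowered]
        if not terms:
            continue
        match = QUANTITY_PATTERN.search(sentence)
        reports.append({
            "sentence": sentence,
            "terms": terms,
            "quantity": int(match.group(1).replace(",", "")) if match else None,
            "unit": match.group(2).lower() if match else None,
        })
    return reports
```

Plain substring matching like this is easy to audit but will miss paraphrases and negations, which is presumably where the natural language processing component earns its place.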
A visual front-end queries the database to produce a map of displacement events and to highlight the extracted information.
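One plausible way to hand extracted events to a map front-end is to serialize them as GeoJSON. The sketch below assumes each event is a dictionary with `lat` and `lon` keys plus arbitrary extra fields; that record shape and the `events_to_geojson` function are hypothetical, and the actual exchange format used between the database and the front-end is not stated here.

```python
def events_to_geojson(events):
    """Convert event dicts with 'lat'/'lon' keys to a GeoJSON FeatureCollection."""
    features = []
    for event in events:
        if event.get("lat") is None or event.get("lon") is None:
            continue  # skip events that could not be geolocated
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {
                "type": "Point",
                "coordinates": [event["lon"], event["lat"]],
            },
            # Everything other than the coordinates becomes a property.
            "properties": {
                k: v for k, v in event.items() if k not in ("lat", "lon")
            },
        })
    return {"type": "FeatureCollection", "features": features}
```

The resulting dictionary can be passed to `json.dumps` and loaded by most web mapping libraries.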
The tool is open source, built on Python, PostgreSQL, and Node.js, and fully Dockerized for reliable deployment.
Challenge results, codebase and visualizations are all available to view.
The repository contains the open source code and notebooks for our solution.
A Dropbox folder contains the test set classification and information extraction results.
Our map visualization prototype can be viewed here.
See here for how to deploy and use the code for our article processing pipeline.