IMTorgOpenDataTools/cbdc-scraper

CBDC Scraper

This application scrapes sites with informational data on Central Bank Digital Currencies and outputs a summary report file.

Install and Configure

Install the dependencies with pipenv, or alternatively from requirements.txt:

pipenv install

Update the following:

  • config/emails.csv with the users and admins who should be notified when a report completes. Admins also receive logged errors.
  • config/_constants.py with the appropriate report output directory: email_network_drive

See "Selenium-wire setup" under References for installation and configuration of selenium-wire. It is only needed when a proxy is used to obtain the docs.google.com URL of the Atlantic Council backend data; otherwise, selenium-wire is not required.

Usage

Main entrypoint to the program:

pipenv run python cbdc_scraper/scraper.py

The output in downloads/ includes:

  • csv file (NN - Julian date): monthly_report-YYYY_NN.csv
  • excel file: monthly_report.xlsx
  • log file: process.log
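
To pick up the most recent CSV report from the output directory, the file name pattern above can be parsed. The following is a minimal sketch, not part of the project: the helper name latest_report is hypothetical, and it assumes only that YYYY and NN are numeric and that a larger (YYYY, NN) pair means a newer report.

```python
import re
from pathlib import Path
from typing import Optional

# Matches the documented name pattern monthly_report-YYYY_NN.csv.
REPORT_PATTERN = re.compile(r"^monthly_report-(\d{4})_(\d+)\.csv$")


def latest_report(directory: Path) -> Optional[Path]:
    """Return the newest monthly_report-YYYY_NN.csv in directory, or None.

    Hypothetical helper: "newest" is assumed to mean the largest
    (YYYY, NN) pair when both parts are compared as integers.
    """
    candidates = []
    for path in directory.iterdir():
        match = REPORT_PATTERN.match(path.name)
        if match:
            key = (int(match.group(1)), int(match.group(2)))
            candidates.append((key, path))
    if not candidates:
        return None
    # Compare on the numeric key only, never on the Path itself.
    return max(candidates, key=lambda item: item[0])[1]
```

For example, latest_report(Path("downloads")) returns the path of the newest CSV report, or None if no file in the directory matches the pattern.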

Testing

Basic testing can be performed within the pipenv using pytest. Note that test_send_notification_success() will fail if email is not configured.

To debug a specific test method from the command line, run: pytest --trace tests/test_scraper.py -k test_get_data_atlantic

The --trace option opens the Python debugger (pdb) at the start of the test. Use the following commands:

  • n(next) – step to the next line within the same function
  • s(step) – step to the next line in this function or called function
  • b(break) – set up new breakpoints without changing the code
  • p(print) – evaluate and print the value of an expression
  • c(continue) – continue execution and only stop when a breakpoint is encountered
  • unt(until) – continue execution until the line with a number greater than the current one is reached
  • q(quit) – quit the debugger/execution

Sites

The sites currently scraped, and the exact URLs of the documents extracted, are defined in cbdc_scraper/utils.py.

References

Selenium-wire setup

Steps to correctly install selenium-wire with a ChromeDriver headless browser:

  • select and download a chromedriver to .libs/ from site: https://sites.google.com/chromium.org/driver/
  • unzip and make it executable: chmod +x chromedriver
  • in code, set the driver location, using the absolute path rather than the relative form shown here: chrome_driver = "./.libs/chromedriver"
  • install the Google Chrome binary, for example:
cd .libs/
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb
google-chrome
  • ensure the driver and binary are of the same version
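
The last two requirements (an absolute driver path and matching versions) can be checked from Python before starting a browser. The following is a minimal sketch under stated assumptions: the helper names major_version and versions_match are hypothetical, the --version output of both programs is assumed to contain a dotted version number, and comparing only the major version is an assumption about what "the same version" requires.

```python
import re
import subprocess
from pathlib import Path


def major_version(version_output: str) -> int:
    """Extract the major version from `--version` output.

    Hypothetical helper; expects text containing a dotted version
    such as "ChromeDriver 114.0.5735.90 (...)".
    """
    match = re.search(r"(\d+)\.\d+\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in: {version_output!r}")
    return int(match.group(1))


def versions_match(driver_path: Path, chrome_cmd: str = "google-chrome") -> bool:
    """Run both binaries with --version and compare major versions."""
    outputs = []
    for command in (str(driver_path), chrome_cmd):
        result = subprocess.run(
            [command, "--version"], capture_output=True, text=True, check=True
        )
        outputs.append(result.stdout)
    return major_version(outputs[0]) == major_version(outputs[1])


# Resolve the relative location from the steps above to an absolute path.
chrome_driver = Path(".libs/chromedriver").resolve()
```

Calling versions_match(chrome_driver) requires both binaries to be installed; it raises an error if either command is missing or exits with a failure status.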
