This application scrapes sites with informational data on Central Bank Digital Currencies and outputs a summary report file.
Install the usual way, or use the requirements.txt:
pipenv install
Update the following:
- config/emails.csv with the users and admins who need notification of report completion. Admins also receive logging errors.
- config/_constants.py with the appropriate report output directory: email_network_drive
See the References to selenium-wire for installation and configuration. selenium-wire is only necessary when a proxy is used to obtain the docs.google.com URL of the atlanticcouncil backend data.
Main entrypoint to the program:
pipenv run python cbdc_scraper/scraper.py
The output in downloads/ includes:
- csv file (NN is the Julian date): monthly_report-YYYY_NN.csv
- excel file: monthly_report.xlsx
- log file: process.log
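To confirm that a run produced the files listed above, a small check along these lines may help. This is only a sketch: the function name find_report_outputs is hypothetical and not part of the project, and it assumes the default downloads/ directory relative to the current working directory.

```python
from pathlib import Path


def find_report_outputs(output_dir="downloads"):
    """Return the report artifacts found in output_dir, keyed by kind.

    Hypothetical helper, not part of cbdc_scraper.
    """
    base = Path(output_dir)
    xlsx = base / "monthly_report.xlsx"
    log = base / "process.log"
    return {
        # CSV reports are named monthly_report-YYYY_NN.csv
        "csv": sorted(base.glob("monthly_report-*.csv")),
        "xlsx": xlsx if xlsx.is_file() else None,
        "log": log if log.is_file() else None,
    }


if __name__ == "__main__":
    for kind, found in find_report_outputs().items():
        print(f"{kind}: {found or 'missing'}")
```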
Basic testing can be performed within the pipenv using pytest. Note that test_send_notification_success() will fail if email is not configured.
Command line testing of a specific test method can be performed using:
pytest --trace tests/test_scraper.py -k test_get_data_atlantic
The --trace option drops into the Python debugger (pdb) at the start of the selected test. Use the following commands:
- n(next) – step to the next line within the same function
- s(step) – step to the next line in this function or called function
- b(break) – set up new breakpoints without changing the code
- p(print) – evaluate and print the value of an expression
- c(continue) – continue execution and only stop when a breakpoint is encountered
- unt(until) – continue execution until the line with a number greater than the current one is reached
- q(quit) – quit the debugger/execution
Current sites scraped include:
- https://cbdctracker.org/
- https://www.atlanticcouncil.org/cbdctracker/
- google docs: backend update
- google docs: github url which is obtained from github commit
See cbdc_scraper/utils.py for the exact URLs of the documents extracted.
Steps to get selenium-wire with a headless Chrome browser and chromedriver installed correctly:
- select and download a chromedriver to .libs/ from https://sites.google.com/chromium.org/driver/
- unzip it and make it executable: chmod +x chromedriver
- in code, set the driver location, using an absolute path rather than the relative one shown here: chrome_driver = "./.libs/chromedriver"
- install the google chrome binary, reference:
cd .libs/
wget https://dl.google.com/linux/direct/google-chrome-stable_current_amd64.deb
sudo apt install ./google-chrome-stable_current_amd64.deb
google-chrome
- ensure the driver and binary are of the same version
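The last step can be checked from Python. The sketch below resolves the driver to an absolute path and compares the versions reported by chromedriver --version and google-chrome --version. The helper names are hypothetical, and comparing only the major version number is an assumption made here; the project may require a stricter match.

```python
import re
import subprocess
from pathlib import Path


def major_version(version_output):
    """Extract the major version from `--version` output.

    For example 'ChromeDriver 114.0.5735.90 (...)' gives 114.
    """
    match = re.search(r"(\d+)\.\d+", version_output)
    if match is None:
        raise ValueError(f"no version number found in {version_output!r}")
    return int(match.group(1))


def reported_major_version(executable):
    """Run `<executable> --version` and return the major version it reports."""
    result = subprocess.run(
        [str(executable), "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return major_version(result.stdout)


if __name__ == "__main__":
    # Resolve the relative location to an absolute path, as the steps recommend.
    chrome_driver = Path(".libs/chromedriver").resolve()
    driver_major = reported_major_version(chrome_driver)
    browser_major = reported_major_version("google-chrome")
    if driver_major == browser_major:
        print(f"OK: driver and browser are both major version {driver_major}")
    else:
        print(f"Mismatch: chromedriver {driver_major}, google-chrome {browser_major}")
```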