This project is a solution to the Site Reliability Engineering take-home exercise provided by Fetch. It monitors the availability of HTTP endpoints defined in a YAML file and logs their cumulative availability every 15 seconds. The program ensures that endpoints are only considered available if they respond with a 2xx status code and within 500ms.
- Accepts a YAML configuration file as input
- Makes HTTP requests using optional method, headers, and body
- Normalizes domains (ignores port numbers) to group availability stats
- Checks all endpoints concurrently every 15 seconds
- Tracks cumulative availability per domain
- Drops decimal places from availability percentages as per requirements
- Logs availability in a readable format to stdout
- Python 3.8+
requests
modulePyYAML
Install dependencies using:
pip install -r requirements.txt
- The script loads endpoint definitions from a YAML file passed as a command-line argument.
- Every 15 seconds, it sends HTTP requests to all endpoints concurrently.
- It tracks whether each response meets the criteria (2xx status code and ≤ 500ms).
- It groups results by domain (ignoring ports) and prints cumulative availability for each.
- Availability is printed as an integer percentage (decimals are dropped).
- Added a 0.5-second timeout to each HTTP request to ensure quick responsiveness.
- Measured response duration using timestamps to validate that the endpoint responded within the required 500ms.
- Implemented a default HTTP method of "GET" in case it is not specified in the YAML configuration.
- Removed port numbers when determining the domain by using Python's
urllib.parse
to normalize the hostname. - Replaced the original sequential request logic with multithreaded execution so that all endpoints are checked concurrently and efficiently.
- Used a thread lock (
threading.Lock
) to prevent race conditions when updating the shareddomain_stats
dictionary. - Updated the availability percentage calculation to use
int()
instead ofround()
, so decimal values are dropped, as per the assignment requirements. - Changed the request body handling from
json=
todata=
to properly send raw JSON string bodies. - Improved the log output format by adding clear headers and separators to distinguish between check cycles.
To meet the requirement of dropping decimal places in the availability percentage:
# Drop decimals in availability percentage as required by the Fetch instructions
availability = int((stats["up"] / stats["total"]) * 100)