It collects data about workflow runs and workflow run jobs from the GitHub API
- Python 3
- requests
SYNOPSIS
./multivac/fetch.py [OPTIONS] [owner/repo]
DESCRIPTION
multivac/fetch.py — download workflow runs, workflow run jobs, and logs from
the GitHub API. Results are stored in the `<owner>/<repo>/workflow_runs` and
`<owner>/<repo>/workflow_run_jobs` directories in the root of the project.
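For example, after fetching `tarantool/multivac` the results land in these
directories (the per-directory contents described in the comments are an
assumption for illustration):

    tarantool/multivac/
        workflow_runs/        # one JSON document per workflow run
        workflow_run_jobs/    # job metadata, plus job logs unless --nologs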
OPTIONS
--branch __branch__
Branch (all if omitted). To collect data about several branches, use
this option several times.
--nologs
Don't download logs
--nostop
Continue until the end of the run list or until the rate limit is hit
--since __N__
A workflow run list page to start from. Default: 1
EXAMPLE
Collect all data about workflow runs and workflow run jobs, until the end of
the list or the rate limit, from the `master` and `sample-branch` branches of
the `tarantool/multivac` repository, but don't collect logs:
$ ./multivac/fetch.py --branch master --branch sample-branch --nologs --nostop tarantool/multivac
SYNOPSIS
./multivac/last_seen.py --branch __branch__ [OPTIONS]
DESCRIPTION
multivac/last_seen.py - generate a summary report from the data collected by
`multivac/fetch.py` and store it in the `output` directory. By default, it
generates a report in CSV format. To run locally, set the
`LOG_STORAGE_BUCKET_URL` environment variable to some value; it is used as a
value in the result file.
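For example (the value is an arbitrary placeholder, not a real bucket URL):
$ export LOG_STORAGE_BUCKET_URL=https://example.com/logs
$ ./multivac/last_seen.py --branch master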
OPTIONS
--branch __branch__
Generate a report for a certain branch. To report on several branches, use
this option several times.
--format __[csv|html]__
Generate a report in CSV format or in HTML format.
--short
Coalesce 'runs-on' labels by their first word
--repo-path __owner/repository__
`fetch.py` uses it in the result files path, so `last_seen.py` needs it to
find data for a certain repo. Default: 'tarantool/tarantool'. You can
set only one repo per script run.
EXAMPLE
Generate a report in HTML format for branches `master` and `sample-branch`
of the repo 'tarantool/multivac':
$ ./multivac/last_seen.py --branch master --branch sample-branch --format html --repo-path tarantool/multivac
SYNOPSIS
./multivac/gather_data.py [OPTIONS]
DESCRIPTION
Analyze logs and gather data about all finished jobs and tests which failed.
See the parameters it collects on the
[website](https://www.tarantool.io/en/dev/multivac/gather_data/).
OPTIONS
--tests, -t
Collect data about test failures from test logs. Without this option,
the script collects data only about workflows.
--format __[csv|json|influxdb]__
Store the gathered data as a `workflows.csv` or `workflows.json` file in the
`output` directory, or store it in InfluxDB.
--failure-stats
Show overall failure statistics.
--watch-failure __failure-type__
Show detailed statistics about a certain type of workflow failure. See the
list of known failure types with the `--failure-stats` option.
--latest __N__
Only take logs from the latest N workflow runs.
--since __N[d|h]__
Only take logs for jobs started within the last N days or hours: Nd for
N days and Nh for N hours.
--repo-path __owner/repository__
`fetch.py` uses it in the result files path, so `gather_data.py` needs it
to find data for a certain repo. Default: 'tarantool/tarantool'.
You can set only one repo per script run.
EXAMPLE
Collect data about jobs and tests started within the last week in the repo
'tarantool/multivac' (so the fetched JSON and log files are stored under
./tarantool/multivac), and put this data into InfluxDB:
$ ./multivac/gather_data.py -t --since 7d --format influxdb --repo-path tarantool/multivac
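To explore workflow failures, first list the known failure types, then drill
into one; `__failure-type__` below is a placeholder for a name taken from the
`--failure-stats` output:
$ ./multivac/gather_data.py --failure-stats
$ ./multivac/gather_data.py --watch-failure __failure-type__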
SYNOPSIS
./multivac/gather_test_data.py [OPTIONS]
DESCRIPTION
Get data about every failed test in the completed jobs and group them by
job ID: test name, test configuration, test number (the number makes data
points unique in InfluxDB, so statistics are calculated correctly).
OPTIONS
--format __[json|influxdb]__
Store the gathered data as a `tests.json` file in the `output` directory, or
store it in InfluxDB. For the JSON format, the file is rewritten if it
exists.
JSON example:

    "8301691934": [
        {
            "name": "box/tx_man.test.lua",
            "configuration": null,
            "test_number": 0
        },
        {
            "name": "replication/qsync_advanced.test.lua",
            "configuration": "vinyl",
            "test_number": 1
        }
    ]
For InfluxDB, the script also collects the job name for each test so that
graphs can be built; each data point has the following shape (a usage sketch
follows the option list below):

    {
        "measurement": test_name,
        "tags": {
            "job_id": job_id,
            "configuration": test["configuration"],
            "job_name": metadata["job_name"],
            "commit_sha": metadata["commit_sha"],
            "test_number": test["test_number"]
        },
        "fields": {
            "value": 1
        },
        "time": metadata["started_at"]
    }
--since __N[d|h]__
Only take logs for jobs started within the last N days or hours: Nd for
N days and Nh for N hours.
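A minimal sketch of writing such a point, assuming the legacy `influxdb`
Python client and a local InfluxDB 1.x server; the client choice, connection
parameters, and all values are illustrative, not the project's actual code:

    # Sketch only: every value here is a placeholder taken from the JSON
    # example above; the project may use a different client or server setup.
    from influxdb import InfluxDBClient

    client = InfluxDBClient(host="localhost", port=8086, database="multivac")
    point = {
        "measurement": "replication/qsync_advanced.test.lua",  # test name
        "tags": {
            "job_id": "8301691934",
            "configuration": "vinyl",
            "job_name": "debug",          # hypothetical job name
            "commit_sha": "abc1234",      # hypothetical commit
            "test_number": 1,
        },
        "fields": {"value": 1},
        "time": "2022-08-01T00:00:00Z",   # the job's started_at timestamp
    }
    client.write_points([point])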
EXAMPLE
Collect data about jobs started within the last week and put the data into
InfluxDB:
$ ./multivac/gather_test_data.py --since 7d --format influxdb
Create a token on the GitHub "Personal access tokens" page, give it
`repo:public_repo` access, and copy the token to `token.txt`. All the scripts
should be started from the root of the project.
You can set all the necessary environment variables in a `.env` file, as in
`.env-example`, and then run the command:
$ source .env && export $(cut -d= -f1 .env)
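For example, a minimal `.env` sketch (the value is a placeholder; see
`.env-example` for the actual variable list):

    LOG_STORAGE_BUCKET_URL=https://example.com/logs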
$ ./multivac/fetch.py --branch master tarantool/tarantool
$ ./multivac/fetch.py --branch 2.8 tarantool/tarantool
$ ./multivac/fetch.py --branch 2.7 tarantool/tarantool
$ ./multivac/fetch.py --branch 1.10 tarantool/tarantool
If something went wrong during the initial script run, you may re-run it with
the --nostop option: it disables the stop heuristic. The heuristic is the
following: stop on a two-weeks-old workflow run that was stored on a previous
script call.
$ ./multivac/last_seen.py --branch master --branch 2.8 --branch 2.7 --branch 1.10
Add --format html to get the 'last seen' report in HTML format instead of
CSV. Reports are stored in the `output` directory.
Caution: don't mix usual fetch.py calls with --nologs calls (see below),
otherwise some logs may be missed. The script is designed to either collect
meta + logs or just meta. If the meta is up-to-date, there is no cheap way
to ensure that all relevant jobs are collected with logs.
$ ./multivac/gather_data.py --format json
See more information on the [website](https://www.tarantool.io/en/dev/multivac/gather_data/).
Collect job metainformation:
$ ./multivac/fetch.py --nologs --nostop tarantool/tarantool
You may need to re-run it several times to collect enough information: GitHub
rate-limits requests to 5000 per hour.
If hit by a 403 error, look at the last X-RateLimit-Reset value in debug.log:
it is the time when you may start the script again. Call
date --date=@<..unix time..> '+%a %b %_d %H:%M:%S %Z %Y'
to translate this value into a human-readable format.
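For instance, a one-liner that pulls the last reset timestamp out of
debug.log (assuming the header name appears verbatim in the log):
$ date --date=@"$(grep -o 'X-RateLimit-Reset[^0-9]*[0-9]*' debug.log | tr -cd '0-9\n' | tail -n 1)" '+%a %b %_d %H:%M:%S %Z %Y'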
You may continue from a particular page using the --since N option (beware of
holes; always leave some overlap).
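For example, if the previous run stopped around page 42 (a number taken
purely for illustration), restart a couple of pages earlier to keep an
overlap:
$ ./multivac/fetch.py --nologs --nostop --since 40 tarantool/tarantool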
Next, generate the report itself:
$ ./multivac/minutes.py [--short]
It prints minutes split in two ways:
- per day / per week / per month
- by 'runs-on' ('ubuntu-20.04' and so on)
Use --short to merge 'ubuntu-18.04' and 'ubuntu-20.04' into just 'ubuntu'.