Commit 01625f9

Limit queries to the last 30 days.
Also, add link to query that groups by platform.
1 parent e64fb1e commit 01625f9

1 file changed (+27, -8 lines changed)

source/guides/analyzing-pypi-package-downloads.rst

Lines changed: 27 additions & 8 deletions
@@ -93,10 +93,10 @@ Useful queries
 Run queries in the `BigQuery web UI`_ by clicking the "Compose query" button.
 
 Note that the rows are stored in separate tables for each day, which helps
-limit costs if you are only interested in recent downloads. To analyze the
-full history, use `wildcard tables
+limit the cost of queries. These example queries analyze downloads from
+recent history by using `wildcard tables
 <https://cloud.google.com/bigquery/docs/querying-wildcard-tables>`__ to
-select all tables.
+select all tables and then filter by ``_TABLE_SUFFIX``.
 
 Counting package downloads
 --------------------------
@@ -110,11 +110,16 @@ The following query counts the total number of downloads for the project
 SELECT COUNT(*) AS num_downloads
 FROM `the-psf.pypi.downloads*`
 WHERE file.project = 'pytest'
+-- Only query the last 30 days of history
+AND _TABLE_SUFFIX
+BETWEEN FORMAT_DATE(
+'%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
+AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
 
 +---------------+
 | num_downloads |
 +===============+
-| 35534338 |
+| 2117807 |
 +---------------+
 
 To only count downloads from pip, filter on the ``details.installer.name``
@@ -125,14 +130,18 @@ column.
 #standardSQL
 SELECT COUNT(*) AS num_downloads
 FROM `the-psf.pypi.downloads*`
-WHERE
-file.project = 'pytest'
+WHERE file.project = 'pytest'
 AND details.installer.name = 'pip'
+-- Only query the last 30 days of history
+AND _TABLE_SUFFIX
+BETWEEN FORMAT_DATE(
+'%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY))
+AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
 
 +---------------+
 | num_downloads |
 +===============+
-| 31768554 |
+| 1829322 |
 +---------------+
 
 Package downloads over time
@@ -151,7 +160,11 @@ costs.
 FROM `the-psf.pypi.downloads*`
 WHERE
 file.project = 'pytest'
-AND _TABLE_SUFFIX BETWEEN '20171001' AND '20180131'
+-- Only query the last 6 months of history
+AND _TABLE_SUFFIX
+BETWEEN FORMAT_DATE(
+'%Y%m01', DATE_SUB(CURRENT_DATE(), INTERVAL 6 MONTH))
+AND FORMAT_DATE('%Y%m%d', CURRENT_DATE())
 GROUP BY `month`
 ORDER BY `month` DESC
 
@@ -166,6 +179,10 @@ costs.
 +---------------+--------+
 | 2047310 | 201710 |
 +---------------+--------+
+| 1744443 | 201709 |
++---------------+--------+
+| 1916952 | 201708 |
++---------------+--------+
 
 More queries
 ------------
@@ -175,6 +192,8 @@ More queries
 - `PyPI queries gist <https://gist.github.com/alex/4f100a9592b05e9b4d63>`__
 - `Python versions over time
 <https://github.com/tswast/code-snippets/blob/master/2018/python-community-insights/Python%20Community%20Insights.ipynb>`__
+- `Non-Windows downloads, grouped by platform
+<https://bigquery.cloud.google.com/savedquery/51422494423:ff1976af63614ad4a1258d8821dd7785>`__
 
 Additional tools
 ================
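
Note on the date bounds: the ``_TABLE_SUFFIX`` filters added in this commit compare against ``YYYYMMDD`` strings built with ``FORMAT_DATE``. The Python sketch below mirrors those two ranges so the resulting bounds can be checked offline. It is an illustration only and not part of the change. The function names are hypothetical, and the sketch assumes the query's current date is passed in explicitly rather than read from ``CURRENT_DATE()``.

```python
from datetime import date, timedelta
from typing import Tuple


def last_n_days_bounds(today: date, days: int = 30) -> Tuple[str, str]:
    """Mirror FORMAT_DATE('%Y%m%d', DATE_SUB(today, INTERVAL n DAY)) .. today.

    Hypothetical helper: returns the (lower, upper) suffix strings that the
    30-day queries compare _TABLE_SUFFIX against.
    """
    start = today - timedelta(days=days)
    return start.strftime("%Y%m%d"), today.strftime("%Y%m%d")


def last_n_months_bounds(today: date, months: int = 6) -> Tuple[str, str]:
    """Mirror FORMAT_DATE('%Y%m01', DATE_SUB(today, INTERVAL n MONTH)) .. today.

    The '%Y%m01' format pins the lower bound to the first day of the month,
    so only the year and month of the subtracted date matter.
    """
    # Count months since year 0, subtract, then split back into year/month.
    total_months = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total_months, 12)
    start = date(year, month_index + 1, 1)
    return start.strftime("%Y%m%d"), today.strftime("%Y%m%d")


if __name__ == "__main__":
    example_day = date(2018, 3, 15)  # arbitrary example date
    print(last_n_days_bounds(example_day))
    print(last_n_months_bounds(example_day))
```

Because ``BETWEEN`` is inclusive at both ends, the 30-day filter matches today's table plus the 30 preceding daily tables, 31 suffixes in total. The six-month filter starts on the first day of the month six months back, so it covers six full calendar months plus the current partial month.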
