Tech report: Dedupe technology records #60

Merged: 34 commits, Feb 14, 2025
Commits
cf33544
versions
max-ostapenko Jan 12, 2025
547f63e
tech filter
max-ostapenko Jan 12, 2025
e1f0e60
Merge branch 'main' into central-flyingfish
max-ostapenko Jan 14, 2025
3ec247c
Merge branch 'main' into central-flyingfish
max-ostapenko Jan 21, 2025
65acf68
Merge branch 'main' into central-flyingfish
max-ostapenko Jan 26, 2025
e8580d0
new table with versions
max-ostapenko Jan 26, 2025
bfac4f6
typo
max-ostapenko Jan 26, 2025
ac2f597
versions table
max-ostapenko Jan 26, 2025
bdd46b8
fix
max-ostapenko Jan 26, 2025
23864b9
no retries
max-ostapenko Jan 26, 2025
4d42453
tech_report_* tables
max-ostapenko Jan 26, 2025
4dd3f9b
clusters renamed
max-ostapenko Jan 26, 2025
8032aab
lint
max-ostapenko Jan 26, 2025
a41ed32
adjust export config
max-ostapenko Jan 26, 2025
2aec142
fix clustering
max-ostapenko Jan 26, 2025
396d664
origin renamed
max-ostapenko Jan 27, 2025
e9b666e
deduplicated good_cwv
max-ostapenko Jan 27, 2025
ff2f5a4
Merge branch 'main' into central-flyingfish
max-ostapenko Jan 27, 2025
58eea31
include minor
max-ostapenko Jan 30, 2025
747a18f
Merge branch 'main' into main
max-ostapenko Jan 30, 2025
8c0455c
Merge branch 'central-flyingfish' into central-flyingfish
max-ostapenko Jan 30, 2025
c88ef18
fix
max-ostapenko Jan 30, 2025
3268e28
Merge branch 'central-flyingfish' into central-flyingfish
max-ostapenko Jan 30, 2025
bd07f78
cleanup
max-ostapenko Jan 30, 2025
5967524
pattern fix
max-ostapenko Jan 30, 2025
146978d
Merge branch 'central-flyingfish' into central-flyingfish
max-ostapenko Jan 30, 2025
7ff9151
tech detections only
max-ostapenko Jan 31, 2025
718e3c4
fix
max-ostapenko Jan 31, 2025
34a4bb7
relaxed pattern
max-ostapenko Jan 31, 2025
330e918
remove hashing (#59)
max-ostapenko Feb 1, 2025
7ae88fa
dedupe technologies
max-ostapenko Feb 1, 2025
511983d
Merge branch 'main' into premier-chicken
max-ostapenko Feb 1, 2025
c6f8460
cleanup
max-ostapenko Feb 2, 2025
bede83e
Merge branch 'main' into premier-chicken
max-ostapenko Feb 14, 2025
16 changes: 9 additions & 7 deletions definitions/output/reports/cwv_tech_categories.js
@@ -55,6 +55,14 @@ technology_stats AS (
 GROUP BY
 technology,
 categories
+),
+
+total_pages AS (
+SELECT
+client,
+COUNT(DISTINCT root_page) AS origins
+FROM pages
+GROUP BY client
 )

 SELECT
@@ -82,11 +90,5 @@ SELECT
 COALESCE(MAX(IF(client = 'mobile', origins, 0))) AS mobile
 ) AS origins,
 NULL AS technologies
-FROM (
-SELECT
-client,
-COUNT(DISTINCT root_page) AS origins
-FROM pages
-GROUP BY client
-)
+FROM total_pages
 `)
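Note on the change above: the inline subquery is lifted into a named total_pages CTE without changing what it computes. As a rough illustration only, the aggregation that CTE performs (COUNT(DISTINCT root_page) grouped by client) can be sketched in plain JavaScript. The function name totalPages and the sample rows are hypothetical and are not part of this PR.

```javascript
// Hypothetical in-memory equivalent of the total_pages CTE:
//   SELECT client, COUNT(DISTINCT root_page) AS origins
//   FROM pages GROUP BY client
function totalPages(pages) {
  const rootsByClient = new Map(); // client -> Set of distinct root_page values
  for (const { client, root_page } of pages) {
    if (!rootsByClient.has(client)) rootsByClient.set(client, new Set());
    rootsByClient.get(client).add(root_page);
  }
  // One output row per client, in first-seen order.
  return [...rootsByClient].map(([client, roots]) => ({
    client,
    origins: roots.size,
  }));
}

// Made-up sample rows: the duplicate desktop row is counted once.
const samplePages = [
  { client: 'desktop', root_page: 'https://a.example/' },
  { client: 'desktop', root_page: 'https://a.example/' },
  { client: 'desktop', root_page: 'https://b.example/' },
  { client: 'mobile', root_page: 'https://a.example/' },
];

console.log(totalPages(samplePages));
// [ { client: 'desktop', origins: 2 }, { client: 'mobile', origins: 1 } ]
```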
29 changes: 20 additions & 9 deletions definitions/output/reports/tech_report_technologies.js
@@ -21,13 +21,22 @@ WITH pages AS (

 tech_origins AS (
 SELECT
-client,
 technology,
-COUNT(DISTINCT root_page) AS origins
-FROM pages
-GROUP BY
-client,
-technology
+STRUCT(
+MAX(IF(client = 'desktop', origins, 0)) AS desktop,
+MAX(IF(client = 'mobile', origins, 0)) AS mobile
+) AS origins
+FROM (
+SELECT
+client,
+technology,
+COUNT(DISTINCT root_page) AS origins
+FROM pages
+GROUP BY
+client,
+technology
+)
+GROUP BY technology
 ),

 technologies AS (
@@ -53,7 +62,6 @@ total_pages AS (
 )

 SELECT
-client,
 technology,
 description,
 category,
@@ -66,11 +74,14 @@ USING(technology)
 UNION ALL

 SELECT
-client,
 'ALL' AS technology,
 NULL AS description,
 NULL AS category,
 NULL AS category_obj,
-origins
+NULL AS similar_technologies,
+STRUCT(
+MAX(IF(client = 'desktop', origins, 0)) AS desktop,
+MAX(IF(client = 'mobile', origins, 0)) AS mobile
+) AS origins
 FROM total_pages
 `)
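Note on the change above: the dedupe comes from dropping client as an output column and folding the per-client counts into a single origins struct per technology, using MAX(IF(client = '...', origins, 0)) with GROUP BY technology. A minimal JavaScript sketch of that pivot follows, assuming non-negative counts and only the two clients named in the diff. The function name pivotOrigins and the sample rows are hypothetical and are not part of this PR.

```javascript
// Hypothetical in-memory equivalent of:
//   SELECT technology,
//     STRUCT(
//       MAX(IF(client = 'desktop', origins, 0)) AS desktop,
//       MAX(IF(client = 'mobile', origins, 0)) AS mobile
//     ) AS origins
//   FROM per_client_counts GROUP BY technology
function pivotOrigins(rows) {
  const byTechnology = new Map(); // technology -> { desktop, mobile }
  for (const { client, technology, origins } of rows) {
    if (!byTechnology.has(technology)) {
      // A client with no rows stays at 0, like the IF(..., 0) fallback.
      byTechnology.set(technology, { desktop: 0, mobile: 0 });
    }
    const entry = byTechnology.get(technology);
    if (client === 'desktop' || client === 'mobile') {
      entry[client] = Math.max(entry[client], origins);
    }
  }
  // One record per technology instead of one per (client, technology).
  return [...byTechnology].map(([technology, origins]) => ({
    technology,
    origins,
  }));
}

// Made-up per-client counts: three input rows collapse into two records.
const perClientCounts = [
  { client: 'desktop', technology: 'WordPress', origins: 120 },
  { client: 'mobile', technology: 'WordPress', origins: 150 },
  { client: 'mobile', technology: 'Hugo', origins: 7 },
];

console.log(JSON.stringify(pivotOrigins(perClientCounts)));
// [{"technology":"WordPress","origins":{"desktop":120,"mobile":150}},
//  {"technology":"Hugo","origins":{"desktop":0,"mobile":7}}]
```

The 'ALL' row in the diff uses the same expression over total_pages without a GROUP BY, so it yields a single struct covering every row.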