
Commit 5610ab5

beader authored and gitbook-bot committed
GitBook: [master] 22 pages and 5 assets modified
1 parent 2101221 commit 5610ab5


6 files changed

+330
-0
lines changed

usecases/invite-graph-econnoisseur-detection/deep-analysis.md

@@ -222,6 +222,7 @@ END;

{% tabs %}
{% tab title="Code" %}
{% code title="comps\_nonself\_order\_ratio\_rank.gsql" %}
```sql
CREATE QUERY comps_nonself_order_ratio_rank(INT min_num_orders, INT k) FOR GRAPH MyGraph {
  TYPEDEF TUPLE<VERTEX ancestor,
@@ -302,6 +303,7 @@ CREATE QUERY comps_nonself_order_ratio_rank(INT min_num_orders, INT k) FOR GRAPH
  END;
}
```
{% endcode %}
{% endtab %}

{% tab title="Results" %}
@@ -382,5 +384,333 @@ CREATE QUERY comps_nonself_order_ratio_rank(INT min_num_orders, INT k) FOR GRAPH

### Heavy device sharing within a CC

![Shared devices](../../.gitbook/assets/screen-shot-2020-03-18-at-3.00.09-pm.png)

{% tabs %}
{% tab title="Code" %}
{% code title="comps\_share\_device\_rate\_rank.gsql" %}
```sql
CREATE QUERY comps_share_device_rate_rank(INT min_comp_size, INT k) FOR GRAPH MyGraph {
  TYPEDEF TUPLE<VERTEX ancestor, FLOAT share_device_rate, INT num_accounts> tp_comp_sdr;
  MaxAccum<VERTEX> @ancestor;
  /* (ancestor, imei) -> set of accounts in that CC seen on that device */
  GroupByAccum<VERTEX ancestor, VERTEX imei, SetAccum<VERTEX> accounts> @@accounts_grby_anc_imei;
  /* ancestor -> average accounts per IMEI, plus the set of accounts with a device */
  GroupByAccum<VERTEX ancestor, AvgAccum share_device_rate, SetAccum<VERTEX> accounts> @@comps_counter;
  HeapAccum<tp_comp_sdr>(k, share_device_rate DESC) @@comps_stats;
  all_accounts = {Account.*};

  /*
    Find all ancestors: accounts that invited someone but were never invited.
  */
  ancestors =
    SELECT t
    FROM all_accounts:t
    WHERE t.outdegree("invite") > 0 AND t.outdegree("reverse_invite") == 0
    ACCUM t.@ancestor = t
  ;

  /*
    Propagate the ancestor information to all nodes.
  */
  children = ancestors;
  WHILE (children.size() > 0) DO
    /* Record which accounts of the current level use which IMEI. */
    _t0 =
      SELECT t
      FROM children:s -(use_imei:e)-> IMEI:t
      ACCUM @@accounts_grby_anc_imei += (s.@ancestor, t -> s)
    ;

    /* Move one invite level down, carrying the ancestor along. */
    children =
      SELECT t
      FROM children:s -(invite:e)-> Account:t
      ACCUM t.@ancestor = s.@ancestor
    ;
  END;

  /* Feed the per-IMEI account counts into the per-CC average. */
  FOREACH (ancestor, imei, accounts) IN @@accounts_grby_anc_imei DO
    @@comps_counter += (ancestor -> accounts.size(), accounts);
  END;

  /* Keep only CCs that are large enough and rank them by sharing rate. */
  FOREACH (ancestor, share_device_rate, accounts) IN @@comps_counter DO
    IF accounts.size() >= min_comp_size THEN
      @@comps_stats += tp_comp_sdr(ancestor, share_device_rate, accounts.size());
    END;
  END;

  FOREACH c IN @@comps_stats DO
    PRINT c.ancestor AS ancestor,
          c.share_device_rate AS share_device_rate,
          c.num_accounts AS num_accounts
    ;
  END;
}
```
{% endcode %}
{% endtab %}

{% tab title="Results" %}
Run the script with `min_comp_size=30` and `k=10` as parameters.

```javascript
[
  {
    "ancestor": "1879",
    "share_device_rate": 21,
    "num_accounts": 42
  },
  {
    "ancestor": "5283",
    "share_device_rate": 19.6,
    "num_accounts": 98
  },
  {
    "ancestor": "8017",
    "share_device_rate": 18,
    "num_accounts": 36
  },
  {
    "ancestor": "6606",
    "share_device_rate": 4.625,
    "num_accounts": 37
  },
  {
    "ancestor": "7753",
    "share_device_rate": 4.04762,
    "num_accounts": 85
  },
  {
    "ancestor": "361",
    "share_device_rate": 3.57143,
    "num_accounts": 75
  },
  {
    "ancestor": "3236",
    "share_device_rate": 3.33333,
    "num_accounts": 60
  },
  {
    "ancestor": "2090",
    "share_device_rate": 3.09091,
    "num_accounts": 34
  },
  {
    "ancestor": "6597",
    "share_device_rate": 2.90909,
    "num_accounts": 32
  },
  {
    "ancestor": "8660",
    "share_device_rate": 2.05556,
    "num_accounts": 37
  }
]
```
{% endtab %}
{% endtabs %}
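
This query introduces no new syntax. The general idea is to use two GroupByAccums. `@@accounts_grby_anc_imei` is keyed by \(ancestor, imei\) and collects, for each CC, the accounts seen under each IMEI. `@@comps_counter` then computes, for each CC, the average number of accounts per IMEI.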
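
The two-step grouping can be sketched outside the graph database. The Python below is only an illustration under assumptions: the function name `share_device_rates` and the toy triples are hypothetical, and each triple stands for one `use_imei` edge already tagged with its CC's ancestor.

```python
from collections import defaultdict

def share_device_rates(usage):
    """usage: iterable of (ancestor, imei, account) triples (hypothetical input)."""
    # Step 1: (ancestor, imei) -> set of accounts seen on that device.
    accounts_by_anc_imei = defaultdict(set)
    for ancestor, imei, account in usage:
        accounts_by_anc_imei[(ancestor, imei)].add(account)

    # Step 2: per ancestor, collect the per-IMEI account counts
    # and the set of all accounts that used some device.
    counts = defaultdict(list)
    members = defaultdict(set)
    for (ancestor, _imei), accounts in accounts_by_anc_imei.items():
        counts[ancestor].append(len(accounts))
        members[ancestor] |= accounts

    # ancestor -> (average accounts per IMEI, number of accounts)
    return {anc: (sum(c) / len(c), len(members[anc])) for anc, c in counts.items()}

toy = [
    ("a", "imei1", "a"), ("a", "imei1", "b"), ("a", "imei1", "c"),
    ("a", "imei2", "d"),
    ("x", "imei3", "x"), ("x", "imei4", "y"),
]
rates = share_device_rates(toy)
print(rates)  # {'a': (2.0, 4), 'x': (1.0, 2)}
```

In the toy data, component `a` has three accounts on one phone and one account on another, so its rate is 2.0, while component `x` has one account per phone and a rate of 1.0.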

The figure below shows a CC with a **high device-sharing rate**:

![High device-sharing rate](../../.gitbook/assets/screen-shot-2020-03-18-at-3.47.29-pm.png)

As the figure shows, most accounts in this CC share a single phone.

### CC behavior that appears machine-driven

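By "machine-like behavior" we mean that the CC's activity looks premeditated, strategic, and script-controlled.

As we have said repeatedly, the contest between fraud rings and anti-fraud teams is essentially a game of cost versus benefit. Every resource a fraud ring uses has a cost, and the operation is profitable only when that cost is lower than the gain. Using resources as efficiently as possible is therefore a fraud ring's core skill.

Suppose a promotion grants one reward for every 3 people invited, and a fraud ring owns 10 phone numbers in total. For this campaign, the ring can choose among the following 3 bonus-abuse strategies.

![Three different bonus-abuse strategies](../../.gitbook/assets/screen-shot-2020-03-18-at-4.12.23-pm.png)

Strategy 1 wastes resources, while strategies 2 and 3 are equally efficient. Strategy 2, however, is easy to detect, so most fraud rings use strategy 3.

As a result, a fraud-ring CC shows the following characteristics in the invite graph:

1. The graph is unusually deep, as mentioned earlier.
2. Within the CC, **every inviter invites almost exactly the same number of people**.

Within a CC, we call any account that has invited someone an **inviter**. If the campaign rule is that inviting 10 people earns one prize, then each inviter in a fraud-ring CC will invite exactly 10 people, no more and no fewer, because anything else wastes resources.

So how do we measure how evenly the **number of invitees per inviter** is distributed within a CC? In statistics, the **Gini coefficient** is commonly used to measure this kind of evenness.

The Gini coefficient is the area enclosed between the Lorenz curve and the 45-degree line, divided by the area of the triangle under the 45-degree line.

![Lorenz curve](../../.gitbook/assets/screen-shot-2020-03-18-at-4.26.58-pm.png)

Historically, the Gini coefficient has often been used to measure wealth inequality within a country. Sort everyone in the country by income from lowest to highest; a point on the Lorenz curve then says that the poorest k% of the population holds n% of the total wealth. The further the curve sags below the 45-degree line (flat for most of its length and steep only at the end), the more it indicates that **most people hold only a small share of the wealth**: inequality is severe and the Gini coefficient is large.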
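
To make the area definition concrete, here is a minimal Python sketch that builds the Lorenz curve points for a list of values and derives the Gini coefficient from the area under the piecewise-linear curve. The function names `lorenz_points` and `gini_from_lorenz` and the sample inputs are assumptions made for illustration; they do not appear in the original scripts.

```python
def lorenz_points(values):
    """Return Lorenz curve points (population share, cumulative value share)."""
    xs = sorted(values)           # lowest first, as in the income example
    total = sum(xs)
    n = len(xs)
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points

def gini_from_lorenz(values):
    """Area between the 45-degree line and the curve, over the triangle area."""
    pts = lorenz_points(values)
    area_under = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area_under += (x1 - x0) * (y0 + y1) / 2   # trapezoid rule
    return (0.5 - area_under) / 0.5               # triangle area is 0.5

g_mild = gini_from_lorenz([1, 3])            # 0.25
g_extreme = gini_from_lorenz([0, 0, 0, 12])  # 0.75
print(g_mild, g_extreme)
```

When every value is equal, the curve coincides with the 45-degree line and the result is 0 up to floating-point rounding.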
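
To identify fraud-ring CCs we apply the same idea: take the sequence formed by the number of people each inviter in a CC invited, and compute its Gini coefficient. For fraud rings this value tends to be very low, close to 0.

One way to compute the Gini coefficient:

$$
G = \frac{\sum_{i=1}^{n}\sum_{j=1}^{n}{|x_{i}-x_{j}|}}{2n\sum_{i=1}^{n}{x_i}}
$$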
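
As a quick check of the formula, the following Python sketch evaluates it directly on a list of invite counts. The function name `gini_pairwise` and the sample counts are hypothetical, chosen only to contrast a perfectly even CC with an uneven one.

```python
def gini_pairwise(xs):
    """Gini coefficient as the sum of |x_i - x_j| over 2 * n * sum(x)."""
    n = len(xs)
    total = sum(xs)
    sum_diffs = sum(abs(a - b) for a in xs for b in xs)
    return sum_diffs / (2 * n * total)

even = gini_pairwise([3, 3, 3])       # every inviter invited exactly 3 people
uneven = gini_pairwise([1, 1, 1, 9])  # one inviter dominates
print(even)    # 0.0
print(uneven)  # 0.5
```

The double loop is quadratic in the number of inviters, which is acceptable here because it runs once per CC over a short list.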

{% tabs %}
{% tab title="Code" %}
{% code title="comps\_gini\_rank.gsql" %}
```sql
CREATE QUERY comps_gini_rank(INT k=100, INT min_comp_size=30) FOR GRAPH MyGraph {
  TYPEDEF TUPLE<VERTEX ancestor,
                INT comp_depth,
                INT comp_size,
                DOUBLE gini> tp_comp_stat;

  MaxAccum<VERTEX> @ancestor;

  /* (ancestor, inviter) -> number of accounts that inviter invited */
  GroupByAccum<VERTEX ancestor, VERTEX sendr,
               SumAccum<INT> num_recvrs> @@num_recvrs_grby_ancestor_sendr;
  MapAccum<VERTEX, MaxAccum<INT>> @@comp_depth;
  MapAccum<VERTEX, SumAccum<INT>> @@comp_size;
  /* ancestor -> bag of invite counts, one entry per inviter */
  MapAccum<VERTEX, BagAccum<INT>> @@num_recvrs_arr;

  HeapAccum<tp_comp_stat>(k, gini) @@comp_stats;

  INT depth = 1;
  INT comp_depth = 0;
  INT sum_diffs;
  INT sum_arr;
  DOUBLE gini;

  all_accounts = {Account.*};
  /* all_orders is not used by the rest of this query */
  all_orders = {BonusOrder.*};

  /* Ancestors: accounts that invited someone but were never invited. */
  ancestors =
    SELECT t
    FROM all_accounts:t
    WHERE t.outdegree("invite") > 0 AND t.outdegree("reverse_invite") == 0
    ACCUM t.@ancestor = t,
          @@comp_size += (t -> 1)
  ;

  /* Walk down one invite level per iteration, updating depth, size and invite counts. */
  children = ancestors;
  WHILE (children.size() > 0) DO
    children =
      SELECT t
      FROM children:s -(invite:e)-> Account:t
      ACCUM t.@ancestor += s.@ancestor,
            @@comp_depth += (s.@ancestor -> depth),
            @@comp_size += (s.@ancestor -> 1),
            @@num_recvrs_grby_ancestor_sendr += (s.@ancestor, s -> 1)
    ;
    depth = depth + 1;
  END;

  /* Flatten the per-inviter counts into one bag per CC. */
  FOREACH (ancestor, sendr, num_recvrs) IN @@num_recvrs_grby_ancestor_sendr DO
    @@num_recvrs_arr += (ancestor -> num_recvrs);
  END;

  FOREACH (ancestor, comp_size) IN @@comp_size DO
    sum_diffs = 0;
    sum_arr = 0;
    gini = 0;
    /* Sum of |x1 - x2| over all ordered pairs, and the sum of all counts. */
    FOREACH x1 IN @@num_recvrs_arr.get(ancestor) DO
      sum_arr = sum_arr + x1;
      FOREACH x2 IN @@num_recvrs_arr.get(ancestor) DO
        sum_diffs = sum_diffs + abs(x1 - x2);
      END;
    END;
    gini = 0.5 * sum_diffs / (@@num_recvrs_arr.get(ancestor).size() * sum_arr);

    IF comp_size >= min_comp_size THEN
      @@comp_stats += tp_comp_stat(
        ancestor,
        @@comp_depth.get(ancestor),
        comp_size,
        gini
      );
    END;
  END;

  FOREACH comp_stat IN @@comp_stats DO
    PRINT comp_stat.ancestor AS ancestor,
          comp_stat.comp_depth AS comp_depth,
          comp_stat.comp_size AS comp_size,
          comp_stat.gini AS gini
    ;
  END;
}
```
{% endcode %}
{% endtab %}

{% tab title="Results" %}
Run the script with `min_comp_size=30` and `k=10`.

```javascript
[
  {
    "ancestor": "8262",
    "comp_depth": 5,
    "comp_size": 71,
    "gini": 0
  },
  {
    "ancestor": "8433",
    "comp_depth": 2,
    "comp_size": 31,
    "gini": 0
  },
  {
    "ancestor": "5628",
    "comp_depth": 3,
    "comp_size": 31,
    "gini": 0
  },
  {
    "ancestor": "8440",
    "comp_depth": 9,
    "comp_size": 91,
    "gini": 0
  },
  {
    "ancestor": "11584",
    "comp_depth": 2,
    "comp_size": 31,
    "gini": 0
  },
  {
    "ancestor": "4231",
    "comp_depth": 5,
    "comp_size": 51,
    "gini": 0
  },
  {
    "ancestor": "8478",
    "comp_depth": 2,
    "comp_size": 31,
    "gini": 0
  },
  {
    "ancestor": "6678",
    "comp_depth": 3,
    "comp_size": 51,
    "gini": 0
  },
  {
    "ancestor": "15928",
    "comp_depth": 4,
    "comp_size": 41,
    "gini": 0
  },
  {
    "ancestor": "11686",
    "comp_depth": 4,
    "comp_size": 41,
    "gini": 0
  }
]
```
{% endtab %}
{% endtabs %}
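
This query is a little more tedious than the earlier ones, but not much more complex. It first counts, for each CC, how many people each inviter invited, and then computes the Gini coefficient per CC. Along the way it also records each CC's depth and size.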
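
The same pipeline can be traced on a toy invite graph. The Python below is a sketch under stated assumptions, not a port of the query: `component_stats`, `gini`, and the edge list are hypothetical, and the invite graph is assumed to be a forest in which each account is invited at most once.

```python
def gini(xs):
    """Pairwise-difference form of the Gini coefficient."""
    n, total = len(xs), sum(xs)
    return sum(abs(a - b) for a in xs for b in xs) / (2 * n * total)

def component_stats(invites):
    """invites: list of (inviter, invitee) edges forming a forest (assumption)."""
    children = {}
    invited = set()
    for inviter, invitee in invites:
        children.setdefault(inviter, []).append(invitee)
        invited.add(invitee)

    # Ancestors: accounts that invited someone but were never invited.
    roots = [a for a in children if a not in invited]
    stats = {}
    for root in roots:
        depth, size, counts = 0, 1, []
        frontier = [root]
        while frontier:                      # one invite level per iteration
            next_frontier = []
            for s in frontier:
                kids = children.get(s, [])
                if kids:                     # s is an inviter
                    counts.append(len(kids))
                    next_frontier.extend(kids)
            if next_frontier:
                depth += 1
                size += len(next_frontier)
            frontier = next_frontier
        stats[root] = {"comp_depth": depth, "comp_size": size, "gini": gini(counts)}
    return stats

edges = [
    # A chain-like CC: every inviter invites exactly 3 accounts.
    ("r", "a1"), ("r", "a2"), ("r", "a3"),
    ("a3", "b1"), ("a3", "b2"), ("a3", "b3"),
    ("b3", "c1"), ("b3", "c2"), ("b3", "c3"),
    # An uneven CC: one inviter with 4 invitees, one with 1.
    ("u", "v1"), ("u", "v2"), ("u", "v3"), ("u", "v4"),
    ("v1", "w1"),
]
stats = component_stats(edges)
print(stats["r"])  # {'comp_depth': 3, 'comp_size': 10, 'gini': 0.0}
print(stats["u"])  # {'comp_depth': 2, 'comp_size': 6, 'gini': 0.3}
```

The first component uses 10 accounts with perfectly even invite counts and gets a Gini coefficient of 0, while the uneven component scores 0.3.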

![CCs with different Gini coefficients](../../.gitbook/assets/screen-shot-2020-03-18-at-5.18.07-pm.png)

The figure above shows the invite graphs of 2 CCs with different Gini coefficients. The CC with the smaller Gini coefficient is more likely to be fraud-ring activity.

