wr339988/TencentAlgo19

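1. Task Introduction

See guide.pdf for a description of the competition task. This project is the model that ranked first in the preliminary round.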

2. Model Overview

[Three figures, not reproduced in this text version]

3. Environment Setup

  • scikit-learn
  • tqdm
  • pandas
  • numpy
  • scipy
  • tensorflow>=1.12.0 (preferably no later than 1.14)
  • Linux Ubuntu 16.04, 128 GB RAM (64 GB should be enough), one GPU

You can install the dependencies directly by running:

pip install -r requirements.txt
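
The version constraint above can be checked before training. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `tf_version_in_range` are hypothetical, and treating 1.14.x as the inclusive upper bound is an assumption based on the "preferably no later than 1.14" note.

```python
import re


def parse_version(version):
    """Return the leading (major, minor, patch) integers of a version string."""
    parts = re.findall(r"\d+", version)[:3]
    parts += ["0"] * (3 - len(parts))  # pad short strings such as "1.14"
    return tuple(int(p) for p in parts)


def tf_version_in_range(version, low="1.12.0", high_minor=(1, 14)):
    """True if low <= version and (major, minor) <= high_minor (assumed inclusive)."""
    v = parse_version(version)
    return parse_version(low) <= v and v[:2] <= high_minor
```

With TensorFlow installed, the check could be called as `tf_version_in_range(tf.__version__)` after `import tensorflow as tf`.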

4. Data Download

mkdir data 
cd data
#Download data from https://pan.baidu.com/s/1ASQMms_u70psRgW_KEyT2Q 
#Password: burw
# unzip treats extra arguments as member names, so extract each archive separately
unzip algo.qq.com_641013010_testa.zip
unzip imps_log.zip
unzip user.zip
cd ..
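
As an alternative to the shell commands, the extraction step can be sketched in Python with the standard `zipfile` module. This is an illustrative sketch, not a script from the repository: the function name `extract_archives` is hypothetical, and it assumes the three archives have already been downloaded into the data directory.

```python
import zipfile
from pathlib import Path

# Archive names taken from the download step above.
ARCHIVES = ["algo.qq.com_641013010_testa.zip", "imps_log.zip", "user.zip"]


def extract_archives(data_dir, names=ARCHIVES):
    """Extract each named archive found in data_dir; return the names that were missing."""
    data_dir = Path(data_dir)
    missing = []
    for name in names:
        archive = data_dir / name
        if not archive.is_file():
            missing.append(name)  # report instead of failing on the first gap
            continue
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(data_dir)
    return missing
```

Calling `extract_archives("data")` from the repository root returns an empty list when all three archives were present and extracted.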

mkdir testdata 
cd testdata
#Download data from https://microsoft-my.sharepoint.com/personal/xiuniu_microsoft_com/_layouts/15/onedrive.aspx?id=%2Fpersonal%2Fxiuniu%5Fmicrosoft%5Fcom%2FDocuments%2FAISchoolAdsProject%2FTestDataV2&originalPath=aHR0cHM6Ly9taWNyb3NvZnQtbXkuc2hhcmVwb2ludC5jb20vOmY6L3AveGl1bml1L0VpQ0U2cElFNXVoR28zY2h3cm5xMF9NQk5FakViRHVsaXBDWDJ0eGJFanMxZ3c%5FcnRpbWU9RFNKNVFxVDYyRWc 
cd ..

5. Data Preprocessing

python src/preprocess.py

6. Feature Extraction

python src/extract_feature.py
python src/added_feature.py

7. Data Format Conversion

python src/convert_format.py

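1) Fill missing values (NA) with 0.

2) Concatenate the embeddings obtained from Word2Vec and DeepWalk, and mask 5% of the ads.

3) Normalize the dense features that are used as key-values to the range [0, 1].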
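
The three conversion steps above can be illustrated with a small NumPy sketch. It is not the code in src/convert_format.py: all function names are hypothetical, masking is assumed to mean zeroing the embedding rows of a random 5% of ads, and min-max scaling is assumed as the way to map dense features into [0, 1].

```python
import numpy as np


def fill_missing(x):
    """Step 1: replace NaN entries with 0 (returns a float copy)."""
    x = np.array(x, dtype=float)
    x[np.isnan(x)] = 0.0
    return x


def concat_and_mask(w2v, deepwalk, mask_rate=0.05, seed=0):
    """Step 2: concatenate per-ad embeddings, then zero a random share of ads.

    Zeroing the rows is an assumption; the source only says 5% of ads are masked.
    """
    emb = np.concatenate([w2v, deepwalk], axis=1)
    rng = np.random.RandomState(seed)
    n_masked = int(round(mask_rate * emb.shape[0]))
    masked_rows = rng.choice(emb.shape[0], size=n_masked, replace=False)
    emb[masked_rows] = 0.0
    return emb, masked_rows


def min_max_scale(x):
    """Step 3: rescale each column to [0, 1]; constant columns map to 0."""
    x = np.asarray(x, dtype=float)
    lo, hi = x.min(axis=0), x.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero
    return (x - lo) / span
```

If masking is meant to simulate ads that were never seen during training, zeroed rows give the model examples without a learned embedding; that reading is an interpretation, not something the source states.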

8. Train the CIN Model

mkdir -p results
mkdir -p submission
# -p creates model_tmp and model_tmp/model in one step
mkdir -p model_tmp/model
python train.py
python postprocess.py CIN
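
The output directories for this step can also be prepared from Python, which makes the setup safe to repeat. This is a minimal sketch under the assumption that the training script only needs these directories to exist; the helper name `ensure_dirs` is hypothetical.

```python
import os

# Directories created by the shell commands in this step.
OUTPUT_DIRS = ["results", "submission", os.path.join("model_tmp", "model")]


def ensure_dirs(root=".", dirs=OUTPUT_DIRS):
    """Create each directory under root if needed and return the resulting paths."""
    paths = []
    for d in dirs:
        path = os.path.join(root, d)
        os.makedirs(path, exist_ok=True)  # no error if the directory already exists
        paths.append(path)
    return paths
```

Because of `exist_ok=True`, calling `ensure_dirs()` a second time leaves existing directories and their contents untouched.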

9. Train the LGB Model

# -p avoids an error if results already exists from the CIN step
mkdir -p results
mkdir -p stacking
python LGB.py
python postprocess.py LGB

10. Evaluation Results

All results are displayed once training finishes. They are also saved to the text files results/CINScore.txt and LGBScore.txt in the following format:

Time
Username's score is score_number.
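
A score file in this format can be read back with a short parser. The sketch below is an assumption-laden illustration: it assumes records are two-line pairs (a time line followed by a score line), that the score is numeric, and that the trailing period is optional. The name `parse_scores` is hypothetical, and the time line is kept as a raw string because the source does not specify its format.

```python
import re

# Matches lines such as "<user>'s score is <number>." (trailing period optional).
SCORE_LINE = re.compile(r"^(?P<user>.+)'s score is (?P<score>.+?)\.?$")


def parse_scores(text):
    """Return a list of (time, user, score) tuples from two-line records."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    records = []
    for time_line, score_line in zip(lines[0::2], lines[1::2]):
        match = SCORE_LINE.match(score_line)
        if match is None:
            raise ValueError("unexpected score line: %r" % score_line)
        records.append((time_line, match.group("user"), float(match.group("score"))))
    return records
```

For example, passing the contents of results/CINScore.txt to `parse_scores` yields one tuple per recorded run, which makes it easy to compare the CIN and LGB results.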

About

Tencent Algorithm test repro
