Spark Movie Recommendation

Uploaded by

dapeng zhou

Spark standalone cluster

[Diagram: two spark-submit clients (1 and 2) submit applications to the Master at 192.168.100.101. Scheduling mode labels: FIFO, FAIR. Drivers: driver1, driver2 (label: sparkcontext). Three Workers at 192.168.100.1012 (as printed; likely 192.168.100.102), 192.168.100.103 and 192.168.100.104, each running two executors numbered 1 and 2.]

[Diagram labels: DF1, DF2; data warehouse on HDFS; databases DB1, DB2, DB3; tables T1, T2, T23.]
[Diagram: val count = 1 is defined on the master node (NODE1); each of the three slave nodes also shows its own val count = 1.]
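Training data: contains 8000 users' ratings, used to build the Model.
Test data: another 2000 user ratings.
The Model predicts all users' ratings of our products; the predictions are compared with the ratings in the test data.
MSE = sum((r1 - r2)^2) / 2000, where r1 and r2 are the predicted and actual ratings of a test record
RMSE = sqrt(MSE)
Other metrics: AUC, MAP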
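The MSE and RMSE formulas above can be sketched in plain Scala without a Spark dependency. The object name, the function names and the sample rating pairs are made up for illustration; on a cluster the same arithmetic would run over an RDD of (predicted, actual) pairs.

```scala
object RmseSketch {
  // Mean squared error over (predicted, actual) rating pairs.
  def mse(pairs: Seq[(Double, Double)]): Double = {
    require(pairs.nonEmpty, "need at least one rating pair")
    pairs.map { case (predicted, actual) =>
      val diff = predicted - actual
      diff * diff
    }.sum / pairs.size
  }

  // Root mean squared error: the square root of the MSE.
  def rmse(pairs: Seq[(Double, Double)]): Double = math.sqrt(mse(pairs))

  def main(args: Array[String]): Unit = {
    // Made-up pairs standing in for the 2000 test ratings.
    val pairs = Seq((4.0, 5.0), (3.0, 3.0), (2.0, 4.0), (5.0, 4.0))
    println(f"MSE = ${mse(pairs)}%.3f, RMSE = ${rmse(pairs)}%.3f")
  }
}
```

Dividing by the number of pairs rather than a fixed 2000 keeps the function correct for a test set of any size.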
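Categories of data mining algorithms

Clustering (unsupervised), Classification (supervised), Collaborative filtering

Neural networks
Clustering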
1. Fill in the blanks for all users
2. Sort each user's row
3. Take the first 5 elements of each user's row

[Figure: rating matrix with movies (m) as columns and users (u) as rows. Example row for one user: 1.5, 3, 4, 2, 5, 4.5.]
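The three steps can be sketched on an in-memory rating matrix. This is a minimal illustration in plain Scala, not the deck's implementation: `recommend`, `known` and `predict` are hypothetical names, and `predict` stands in for whatever trained model supplies the missing ratings.

```scala
object TopNSketch {
  // known:   ratings already given, keyed by (userId, movieId).
  // predict: hypothetical stand-in for the trained model's prediction.
  def recommend(
      users: Seq[Int],
      movies: Seq[Int],
      known: Map[(Int, Int), Double],
      predict: (Int, Int) => Double,
      n: Int = 5): Map[Int, Seq[(Int, Double)]] =
    users.map { u =>
      // Step 1: fill in the blanks of the user's row with predicted ratings.
      val row = movies.map(m => m -> known.getOrElse((u, m), predict(u, m)))
      // Step 2: sort the row by rating, highest first.
      val sorted = row.sortBy { case (_, rating) => -rating }
      // Step 3: keep the first n elements of the row.
      u -> sorted.take(n)
    }.toMap

  def main(args: Array[String]): Unit = {
    val known = Map((1, 10) -> 5.0)
    val top = recommend(Seq(1), Seq(10, 20, 30), known, (_: Int, m: Int) => m / 10.0, n = 2)
    println(top)
  }
}
```

This follows the steps literally, so movies the user has already rated stay in the row; dropping them before sorting is a common variation when only unseen movies should be recommended.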
[Diagram labels: Test (3 characters), MODEL, Result (4 characters), filter.]

Recommend popular movies

userid

Produce recommendation results from the model
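rdd.foreachPartition { partition =>
  // Borrow one connection per partition, not one per record.
  // connectionPool is a placeholder for an application-provided pool.
  val conn = connectionPool.get()
  try {
    partition.foreach { record =>
      conn.send(record)
    }
  } finally {
    // Hand the connection back even if sending fails.
    connectionPool.returnConnection(conn)
  }
}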
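The point of the per-partition pattern is that a connection is borrowed once per partition rather than once per record. The following plain-Scala sketch simulates this without Spark: `Connection`, `ConnectionPool` and `sendAll` are hypothetical stand-ins, and each inner `Seq` plays the role of one partition.

```scala
import scala.collection.mutable

object PartitionConnectionSketch {
  // Hypothetical connection that only records what it was asked to send.
  final class Connection(val id: Int) {
    val sent = mutable.ArrayBuffer.empty[String]
    def send(record: String): Unit = { sent += record; () }
  }

  // Hypothetical pool that counts borrows and created connections.
  final class ConnectionPool {
    private val idle = mutable.Queue.empty[Connection]
    private var created = 0
    var borrows = 0
    def get(): Connection = {
      borrows += 1
      if (idle.nonEmpty) idle.dequeue()
      else { created += 1; new Connection(created) }
    }
    def returnConnection(conn: Connection): Unit = idle.enqueue(conn)
    def createdCount: Int = created
  }

  // Stand-in for rdd.foreachPartition: one borrow per partition.
  def sendAll(partitions: Seq[Seq[String]], pool: ConnectionPool): Unit =
    partitions.foreach { partition =>
      val conn = pool.get()
      try partition.foreach(r => conn.send(r))
      finally pool.returnConnection(conn)
    }

  def main(args: Array[String]): Unit = {
    val pool = new ConnectionPool
    val partitions = Seq(Seq("a", "b", "c"), Seq("d", "e"), Seq("f"))
    sendAll(partitions, pool)
    println(s"records=${partitions.map(_.size).sum} borrows=${pool.borrows} created=${pool.createdCount}")
  }
}
```

With six records in three partitions, the pool is asked for a connection three times. Because this simulation processes partitions one after another, a single connection is reused throughout; on a real cluster each executor would hold its own pool.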
Web/Zeppelin

Spark/Hadoop/JDK

Linux

Hardware

CPU (Intel), GPU, L3 cache, L2 cache


Executor: 10 GB

Execution: 7.5 GB

Storage: 2.5 GB
1.25 GB
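The figures add up as 7.5 + 2.5 = 10, and 1.25 is half of 2.5. The sketch below reproduces that arithmetic; the 75% execution share is inferred from the numbers, and treating 1.25 as half of the storage share is an assumption, since the slide does not say what the 1.25 label refers to. All names are hypothetical.

```scala
object ExecutorMemorySketch {
  // Shares of one executor's memory, in GB.
  final case class MemorySplit(executionGb: Double, storageGb: Double, halfStorageGb: Double)

  // executionFraction = 0.75 is an assumption chosen to match the figures.
  def split(executorGb: Double, executionFraction: Double = 0.75): MemorySplit = {
    require(executionFraction >= 0.0 && executionFraction <= 1.0, "fraction must be in [0, 1]")
    val execution = executorGb * executionFraction
    val storage = executorGb - execution
    // Assumption: the remaining figure is half of the storage share.
    MemorySplit(execution, storage, storage / 2)
  }

  def main(args: Array[String]): Unit = {
    println(split(10.0))
  }
}
```

Actual executor memory layout depends on the Spark version and its memory settings, so these fractions should be checked against the configuration in use.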
