[Big Data] [Spark] Statistical Analysis of Taobao Display Ad Click-Through Rates
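Contents
- Dataset overview
  - File descriptions
    - raw_sample.csv
    - ad_feature.csv
    - user_profile.csv
    - behavior_log.csv
- Business requirements
  - (1) Compute the overall ad click-through rate
  - (2) Analyze ad clicks over a period of time
  - (3) Analyze ad clicks within a day
  - (4) Find the Top 1 ad by clicks and analyze the features of the users who clicked it
  - (5) Find the Top 1 ad by clicks and analyze its primary audience
  - (6) Analyze the effect of ad display on user behavior
- Implementation
  - Data preprocessing
  - Statistical analysis
  - Result visualization
    - (1) Compute the overall ad click-through rate
    - (2) Analyze ad clicks over a period of time
    - (3) Analyze ad clicks within a day
    - (4) Find the Top 1 ad by clicks and analyze the features of the users who clicked it
    - (5) Find the Top 1 ad by clicks and analyze its primary audience
    - (6) Analyze the effect of ad display on user behavior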
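Dataset overview
`Ali_Display_Ad_Click` is a dataset provided by Alibaba for predicting the click-through rate of Taobao display ads.
File | Description |
---|---|
raw_sample.csv | Raw sample skeleton |
ad_feature.csv | Basic information about the ads |
user_profile.csv | Basic information about the users |
behavior_log.csv | User behavior log |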
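File descriptions
raw_sample.csv
- The ad display/click logs of 1.14 million users, randomly sampled from the Taobao website over 8 days (26 million records), form the raw sample skeleton.
- The first 7 days (`20170506-20170512`) serve as training samples and the 8th day (`20170513`) as test samples.
- `pid` is the ad slot (resource position).

```csv
user,time_stamp,adgroup_id,pid,nonclk,clk
581738,1494137644,1,430548_1007,1,0
449818,1494638778,3,430548_1007,1,0
914836,1494650879,4,430548_1007,1,0
...
```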
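The train/test split described above can be sketched in plain Python. This is a minimal, dependency-free illustration on the three sample rows, not the author's code. It assumes the dataset's calendar days are Beijing time (UTC+8). `BEIJING`, `TEST_DAY` and `split_by_day` are names invented for this sketch.

```python
import csv
import io
from datetime import datetime, timedelta, timezone

# Assumption: the dataset's calendar days are Beijing time (UTC+8).
BEIJING = timezone(timedelta(hours=8))

# The three sample rows of raw_sample.csv shown in the article.
SAMPLE = """user,time_stamp,adgroup_id,pid,nonclk,clk
581738,1494137644,1,430548_1007,1,0
449818,1494638778,3,430548_1007,1,0
914836,1494650879,4,430548_1007,1,0
"""

TEST_DAY = "20170513"  # the 8th day is the test day, per the file description


def split_by_day(rows):
    """Return (train, test); rows whose local day equals TEST_DAY go to test."""
    train, test = [], []
    for row in rows:
        ts = datetime.fromtimestamp(int(row["time_stamp"]), tz=BEIJING)
        enriched = dict(row, day=ts.strftime("%Y%m%d"), hour=ts.hour)
        (test if enriched["day"] == TEST_DAY else train).append(enriched)
    return train, test


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
train, test = split_by_day(rows)
for r in train + test:
    print(r["user"], r["day"], r["hour"], "clk=" + r["clk"])
```

Which time zone is used matters for any per-day or per-hour statistic: the same epoch second falls 8 hours earlier on a UTC clock than on a Beijing clock, so records near midnight can move to a different day.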
ad_feature.csv
- `campaign_id` is the ad campaign ID.

```csv
adgroup_id,cate_id,campaign_id,customer,brand,price
63133,6406,83237,1,95471,170.0
313401,6406,83237,1,87331,199.0
248909,392,83237,1,32233,38.0
...
```
user_profile.csv
- `cms_segid` is the micro-group ID.
- `final_gender_code`: 1 means male, 2 means female.
- `pvalue_level` is the consumption level: 1 means low, 2 means mid, 3 means high.
- `shopping_level` is the shopping depth: 1 means shallow user, 2 means moderate user, 3 means deep user.
- `occupation` indicates whether the user is a college student: 1 means yes, 0 means no.
- `new_user_class_level` is the city tier.

```csv
userid,cms_segid,cms_group_id,final_gender_code,age_level,pvalue_level,shopping_level,occupation,new_user_class_level
234,0,5,2,5,,3,0,3
523,5,2,2,2,1,3,1,2
612,0,8,1,2,2,3,0,
```
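To make the coded fields easier to read, they can be mapped back to labels. The sketch below is only an illustration in plain Python on the three sample rows. The code tables come from the field descriptions above, while `CODE_LABELS` and `decode` are names made up here. Empty cells are kept as `None`, because `pvalue_level` and `new_user_class_level` contain missing values.

```python
import csv
import io

# The three sample rows of user_profile.csv shown in the article.
SAMPLE = """userid,cms_segid,cms_group_id,final_gender_code,age_level,pvalue_level,shopping_level,occupation,new_user_class_level
234,0,5,2,5,,3,0,3
523,5,2,2,2,1,3,1,2
612,0,8,1,2,2,3,0,
"""

# Code tables taken from the field descriptions (hypothetical English labels).
CODE_LABELS = {
    "final_gender_code": {"1": "male", "2": "female"},
    "pvalue_level": {"1": "low", "2": "mid", "3": "high"},
    "shopping_level": {"1": "shallow", "2": "moderate", "3": "deep"},
    "occupation": {"1": "college student", "0": "not a college student"},
}


def decode(row):
    """Return a copy of row with coded fields replaced by labels.

    Empty cells (and codes missing from the table) become None.
    """
    out = dict(row)
    for field, labels in CODE_LABELS.items():
        raw = row[field].strip()
        out[field] = labels.get(raw) if raw else None
    return out


users = [decode(r) for r in csv.DictReader(io.StringIO(SAMPLE))]
missing_pvalue = sum(u["pvalue_level"] is None for u in users)
missing_city = sum(not u["new_user_class_level"] for u in users)
print(users[0]["final_gender_code"], missing_pvalue, missing_city)
```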
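behavior_log.csv
- This file covers the shopping behavior of all users in `raw_sample` over 22 days (700 million records in total).
- With `user + time_stamp` as the key there are many duplicate records. The different behavior types are logged by different departments, and small offsets arise when the logs are packed together, so two identical `time_stamp` values may in fact be two slightly different times.
- `btag` is the behavior type: `ipv` means browse (it appears as `pv` in the sample rows), `cart` means add to cart, `fav` means favorite, and `buy` means purchase.

```csv
user,time_stamp,btag,cate,brand
558157,1493741625,pv,6250,91286
558157,1493741626,pv,6250,91286
558157,1493741627,pv,6250,91286
...
```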
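The article does not deduplicate the behavior log, but the note about near-identical timestamps suggests one possible cleanup: collapse runs of identical events that follow each other within a small gap. The sketch below illustrates that idea in plain Python. The tolerance `MAX_GAP_SECONDS` and the function `collapse_near_duplicates` are hypothetical. Whether two page views one second apart are duplicates or genuine repeat views cannot be decided from the sample alone.

```python
import csv
import io

# The three sample rows of behavior_log.csv shown in the article.
SAMPLE = """user,time_stamp,btag,cate,brand
558157,1493741625,pv,6250,91286
558157,1493741626,pv,6250,91286
558157,1493741627,pv,6250,91286
"""

MAX_GAP_SECONDS = 1  # hypothetical tolerance; tune it against the real data


def collapse_near_duplicates(rows, max_gap=MAX_GAP_SECONDS):
    """Keep the first event of each run of identical events spaced <= max_gap apart."""
    kept = []
    last_seen = {}  # (user, btag, cate, brand) -> timestamp of the previous event
    for row in sorted(rows, key=lambda r: int(r["time_stamp"])):
        key = (row["user"], row["btag"], row["cate"], row["brand"])
        ts = int(row["time_stamp"])
        prev = last_seen.get(key)
        if prev is None or ts - prev > max_gap:
            kept.append(row)
        last_seen[key] = ts  # compare the next event with this one, not with the run start
    return kept


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
deduped = collapse_near_duplicates(rows)
print(len(rows), "->", len(deduped))
```

Setting `max_gap=0` keeps every row of the sample, since no two rows share the same second.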
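Business requirements
(1) Compute the overall ad click-through rate
(2) Analyze ad clicks over a period of time
(3) Analyze ad clicks within a day
(4) Find the Top 1 ad by clicks and analyze the features of the users who clicked it
(5) Find the Top 1 ad by clicks and analyze its primary audience
(6) Analyze the effect of ad display on user behavior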
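Requirements (1) to (4) all reduce to the same aggregation: count impressions, sum clicks, and divide, either overall or per group (date, hour, or ad). A minimal sketch of that aggregation in plain Python follows. The rows in `IMPRESSIONS` are invented toy data and `ctr_by` is a hypothetical helper, not part of the article's scripts.

```python
from collections import defaultdict

# Hypothetical impression log: (date, hour, adgroup_id, clk)
IMPRESSIONS = [
    ("2017-05-06", 9, 1, 0),
    ("2017-05-06", 9, 1, 1),
    ("2017-05-06", 21, 2, 0),
    ("2017-05-07", 9, 2, 1),
    ("2017-05-07", 21, 2, 1),
    ("2017-05-07", 21, 1, 0),
]
FIELDS = {"date": 0, "hour": 1, "adgroup_id": 2}


def ctr_by(rows, field=None):
    """Return {group: (impressions, clicks, ctr)}; field=None yields one group 'all'."""
    totals = defaultdict(lambda: [0, 0])
    for row in rows:
        group = "all" if field is None else row[FIELDS[field]]
        totals[group][0] += 1        # every row is one impression
        totals[group][1] += row[3]   # clk is 0 or 1
    return {g: (n, c, c / n) for g, (n, c) in totals.items()}


overall = ctr_by(IMPRESSIONS)
by_hour = ctr_by(IMPRESSIONS, "hour")
by_ad = ctr_by(IMPRESSIONS, "adgroup_id")
# Top 1 ad by number of clicks (requirement 4)
top_ad = max(by_ad.items(), key=lambda kv: kv[1][1])[0]
print(overall, by_hour, top_ad)
```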
Implementation
Data preprocessing
```python
# -*- coding: utf-8 -*-
# @Time : 2024/12/14 19:12
# @Author : 从心
# @File : spark_taobao_display_ad_click_rate_prediction_preprocess.py
# @Software : PyCharm

import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.neighbors import KNeighborsClassifier

"""Preprocess ad_feature.csv"""
df_ad_feature = pd.read_csv('../data/ad_feature.csv')
df_ad_feature.columns = df_ad_feature.columns.str.strip()
print(df_ad_feature.head(10))
df_ad_feature.info()

# Report the percentage of missing values for every column that has any
for i in df_ad_feature.columns:
    null_rate = df_ad_feature[i].isna().sum() / len(df_ad_feature) * 100
    if null_rate > 0:
        print(f'{i} null rate: {null_rate:.2f}%')

df_ad_feature.dropna(inplace=True)
df_ad_feature.drop_duplicates(inplace=True)
df_ad_feature['brand'] = df_ad_feature['brand'].astype(int)
print(df_ad_feature.head())
df_ad_feature.info()
df_ad_feature.to_csv('../data/ad_feature_cleaned.csv', index=False)

"""Preprocess user_profile.csv"""
df_user_profile = pd.read_csv('../data/user_profile.csv')
df_user_profile.columns = df_user_profile.columns.str.strip()
print(df_user_profile.head(10))
df_user_profile.info()

for i in df_user_profile.columns:
    null_rate = df_user_profile[i].isna().sum() / len(df_user_profile) * 100
    if null_rate > 0:
        print(f'{i} null rate: {null_rate:.2f}%')

# Fill missing city tiers with the most frequent value
new_user_class_level = df_user_profile.loc[:, 'new_user_class_level'].values.reshape(-1, 1)
si = SimpleImputer(strategy='most_frequent')
# ravel() flattens the (n, 1) result back into a single column
df_user_profile.loc[:, 'new_user_class_level'] = si.fit_transform(new_user_class_level).ravel()

# Move pvalue_level to the last column so that it can serve as the KNN target
columns = ['userid', 'cms_segid', 'cms_group_id', 'final_gender_code', 'age_level',
           'shopping_level', 'occupation', 'new_user_class_level', 'pvalue_level']
df_user_profile = df_user_profile[columns]

# Predict missing consumption levels with KNN trained on the rows where the level is known
# (all other columns, including userid, are used as features)
pvalue_level_null = df_user_profile[df_user_profile['pvalue_level'].isnull()]
pvalue_level_no_null = df_user_profile[df_user_profile['pvalue_level'].notnull()]
X_train, y_train = pvalue_level_no_null.iloc[:, :-1], pvalue_level_no_null.iloc[:, -1]
knn = KNeighborsClassifier(n_neighbors=3, weights='distance')
knn.fit(X_train, y_train)
X_test = pvalue_level_null.iloc[:, :-1].reset_index(drop=True)
y_test = pd.DataFrame(knn.predict(X_test), columns=['pvalue_level'])
pvalue_level_null = pd.concat([X_test, y_test], axis=1)
df_user_profile = pd.concat([pvalue_level_no_null, pvalue_level_null], ignore_index=False)
print(df_user_profile.head(10))
df_user_profile.info()
df_user_profile.to_csv('../data/user_profile_cleaned.csv', index=False)

"""Preprocess raw_sample.csv"""
# Only the first 500,000 rows are read
df_raw_sample = pd.read_csv('../data/raw_sample.csv', nrows=500000)
df_raw_sample.columns = df_raw_sample.columns.str.strip()
print(df_raw_sample.head(10))
df_raw_sample.info()

for i in df_raw_sample.columns:
    null_rate = df_raw_sample[i].isna().sum() / len(df_raw_sample) * 100
    if null_rate > 0:
        print(f'{i} null rate: {null_rate:.2f}%')

df_raw_sample.rename(columns={'user': 'userid'}, inplace=True)
print(df_raw_sample.head(10))
df_raw_sample.info()
df_raw_sample.to_csv('../data/raw_sample_cleaned.csv', index=False)

"""Preprocess behavior_log.csv"""
df_behavior_log = pd.read_csv('../data/behavior_log.csv', nrows=500000)
df_behavior_log.columns = df_behavior_log.columns.str.strip()
print(df_behavior_log.head(10))
df_behavior_log.info()

for i in df_behavior_log.columns:
    null_rate = df_behavior_log[i].isna().sum() / len(df_behavior_log) * 100
    if null_rate > 0:
        print(f'{i} null rate: {null_rate:.2f}%')

df_behavior_log.rename(columns={'user': 'userid', 'cate': 'cate_id'}, inplace=True)
# unit='s' parses epoch seconds as UTC, so date, time and hour are UTC-based
df_behavior_log['time_stamp'] = pd.to_datetime(df_behavior_log['time_stamp'], unit='s')
df_behavior_log['date'] = df_behavior_log['time_stamp'].dt.date
df_behavior_log['time'] = df_behavior_log['time_stamp'].dt.time
df_behavior_log['hour'] = df_behavior_log['time_stamp'].dt.hour
df_behavior_log = df_behavior_log.drop(['time_stamp'], axis=1)
print(df_behavior_log.head(10))
df_behavior_log.info()
df_behavior_log.to_csv('../data/behavior_log_cleaned.csv', index=False)

"""Join raw_sample_cleaned.csv, user_profile_cleaned.csv, ad_feature_cleaned.csv"""
df_raw_sample_user_profile = pd.merge(df_raw_sample, df_user_profile, on='userid', how='left')
df_raw_sample_user_profile_ad_feature = pd.merge(df_raw_sample_user_profile, df_ad_feature,
                                                 on='adgroup_id', how='left')
print(df_raw_sample_user_profile_ad_feature.head(10))
df_raw_sample_user_profile_ad_feature.info()

df = df_raw_sample_user_profile_ad_feature
# Missing-value percentage per column after the joins, largest first
df_null = df.isnull().sum() / len(df_raw_sample_user_profile_ad_feature) * 100
df_null = df_null.drop(df_null[df_null == 0].index).sort_values(ascending=False)
df_null_rate = pd.DataFrame({'null rate(%)': df_null})
print(df_null_rate)

df.dropna(axis=0, how='any', inplace=True)
df['time_stamp'] = pd.to_datetime(df['time_stamp'], unit='s')
df['date'] = df['time_stamp'].dt.date
df['time'] = df['time_stamp'].dt.time
df['hour'] = df['time_stamp'].dt.hour
columns = ['adgroup_id', 'cate_id', 'campaign_id', 'customer', 'brand', 'price',
           'userid', 'cms_segid', 'cms_group_id', 'final_gender_code', 'age_level',
           'shopping_level', 'occupation', 'new_user_class_level', 'pvalue_level',
           'time_stamp', 'date', 'time', 'hour', 'pid', 'nonclk', 'clk']
df = df[columns]
df = df.drop(['time_stamp', 'time', 'nonclk'], axis=1)
print(df.head(10))
df.info()
df.to_csv('../data/raw_sample_user_profile_ad_feature.csv', index=False)
```
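The preprocessing script fills missing `pvalue_level` values with a 3-nearest-neighbor classifier that weights votes by distance. To show what such a vote does, here is a small plain-Python sketch of distance-weighted k-NN. It is an illustration of the technique, not a reimplementation of scikit-learn. The two feature columns and all rows in `profiles` are hypothetical, and `knn_predict` is a name invented here.

```python
import math
from collections import defaultdict


def knn_predict(train_x, train_y, query, k=3):
    """Distance-weighted k-NN vote: each of the k nearest labels counts 1/distance."""
    nearest = sorted(
        (math.dist(x, query), y) for x, y in zip(train_x, train_y)
    )[:k]
    # A neighbor at distance 0 is an exact match and decides the label outright.
    exact = [y for d, y in nearest if d == 0]
    if exact:
        return max(set(exact), key=exact.count)
    votes = defaultdict(float)
    for d, y in nearest:
        votes[y] += 1.0 / d
    return max(votes, key=votes.get)


# Hypothetical rows: (age_level, shopping_level, pvalue_level or None if missing)
profiles = [(1, 1, 1), (4, 4, 2), (4, 5, 2), (6, 6, 3), (2, 1, None)]

known = [p for p in profiles if p[2] is not None]
train_x = [p[:2] for p in known]
train_y = [p[2] for p in known]

# Replace each missing label with the k-NN prediction; keep known labels as they are.
imputed = [
    p if p[2] is not None else (*p[:2], knn_predict(train_x, train_y, p[:2]))
    for p in profiles
]
print(imputed[-1])
```

For the query `(2, 1)`, two of the three nearest neighbors carry label 2, yet the single close neighbor with label 1 outweighs them. This is the difference between distance weighting and a plain majority vote.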
Statistical analysis
```python
# -*- coding: utf-8 -*-
# @Time : 2024/12/14 22:18
# @Author : 从心
# @File : spark_taobao_display_ad_click_rate_analysis.py
# @Software : PyCharm

from pyspark.sql import SparkSession
# Note: round and sum here are the Spark column functions, which shadow the Python built-ins
from pyspark.sql.functions import round, col, count, sum, mean

spark = SparkSession.builder.getOrCreate()

"""(1) Compute the overall ad click-through rate"""
df_raw_sample_user_profile_ad_feature = spark.read.format('csv') \
    .option('header', 'true') \
    .option('inferSchema', 'true') \
    .option('escape', '\"') \
    .load('../data/raw_sample_user_profile_ad_feature.csv')
df_raw_sample_user_profile_ad_feature.show(10)

# Share of clicked (clk=1) and non-clicked (clk=0) impressions, in percent
df_overall_click_count = df_raw_sample_user_profile_ad_feature.groupBy('clk').count()
df_overall_click_rate = df_overall_click_count \
    .withColumn('overall_click_rate',
                round((col('count') / df_raw_sample_user_profile_ad_feature.count() * 100), 2)) \
    .orderBy(col('overall_click_rate'), ascending=False)
df_overall_click_rate.show()
df_overall_click_rate.toPandas().to_csv('../result/overall_click_rate.csv', index=False, header=None)

"""(2) Analyze ad clicks over a period of time"""
# count(clk) = impressions, sum(clk) = clicks, mean(clk) = click-through rate
df_period_click_rate = df_raw_sample_user_profile_ad_feature.groupBy('date') \
    .agg(count('clk').alias('impressions'), sum('clk').alias('clicks'), mean('clk').alias('ctr')) \
    .orderBy('date')
df_period_click_rate.show(10)
df_period_click_rate.toPandas().to_csv('../result/period_click_rate.csv', index=False, header=None)

"""(3) Analyze ad clicks within a day"""
df_hour_click_rate = df_raw_sample_user_profile_ad_feature.groupBy('hour') \
    .agg(count('clk').alias('impressions'), sum('clk').alias('clicks'), mean('clk').alias('ctr')) \
    .orderBy('hour')
df_hour_click_rate.show(10)
df_hour_click_rate.toPandas().to_csv('../result/hour_click_rate.csv', index=False, header=None)

"""(4) Find the Top 1 ad by clicks and analyze the features of the users who clicked it"""
df_click_sum = df_raw_sample_user_profile_ad_feature.groupBy('adgroup_id') \
    .agg(count('clk').alias('impressions'), sum('clk').alias('clicks'), mean('clk').alias('ctr')) \
    .orderBy('clicks', ascending=False)
df_click_sum.show(1)
df_click_sum.toPandas().to_csv('../result/click_sum.csv', index=False, header=None)

"""(6) Analyze the effect of ad display on user behavior"""
df_behavior_log = spark.read.format('csv') \
    .option('header', 'true') \
    .option('inferSchema', 'true') \
    .option('escape', '\"') \
    .load('../data/behavior_log_cleaned.csv')
df_behavior_log.show(10)

# Attach the ads shown to each user to that user's behavior records
df_joined = df_behavior_log.join(df_raw_sample_user_profile_ad_feature, on='userid', how='left')
df_joined = df_joined.dropna(subset=['btag'])
# One row per ad, one column per behavior type (buy, cart, fav, pv)
df_btag = df_joined.groupBy('adgroup_id').pivot('btag').count()
df_btag = df_btag.fillna(0)
df_btag = df_btag.withColumn('behavior_sum', col('pv') + col('cart') + col('fav') + col('buy'))
df_btag = df_btag.withColumn('cart_rate', round(col('cart') / col('behavior_sum'), 4))
df_btag = df_btag.withColumn('fav_rate', round(col('fav') / col('behavior_sum'), 4))
df_btag = df_btag.withColumn('buy_rate', round(col('buy') / col('behavior_sum'), 4))
df_btag.show(10)
df_btag.toPandas().to_csv('../result/btag.csv', index=False, header=None)
```
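The per-ad behavior table in step (6) can be checked on a handful of rows without a Spark session. The plain-Python sketch below mirrors the pivot-and-rate logic under stated assumptions: `JOINED` is invented toy data and `behavior_rates` is a hypothetical helper. It iterates over a fixed behavior list, so a type that never occurs still gets a zero column. In PySpark, a comparable effect can be obtained by passing an explicit value list to `pivot`, for example `pivot('btag', ['pv', 'cart', 'fav', 'buy'])`.

```python
from collections import Counter, defaultdict

BEHAVIORS = ("pv", "cart", "fav", "buy")  # fixed list, so an absent type counts as 0

# Hypothetical joined rows: (adgroup_id, btag)
JOINED = [
    (1, "pv"), (1, "pv"), (1, "pv"), (1, "buy"),
    (2, "pv"), (2, "cart"),
    (3, "fav"),
]


def behavior_rates(rows):
    """Return {adgroup_id: counts per behavior, behavior_sum, and cart/fav/buy rates}."""
    counts = defaultdict(Counter)
    for ad, btag in rows:
        counts[ad][btag] += 1
    table = {}
    for ad, c in counts.items():
        total = sum(c[b] for b in BEHAVIORS)
        row = {b: c[b] for b in BEHAVIORS}
        row["behavior_sum"] = total
        for b in ("cart", "fav", "buy"):
            # Guard against division by zero when an ad has no recorded behavior
            row[f"{b}_rate"] = round(c[b] / total, 4) if total else 0.0
        table[ad] = row
    return table


table = behavior_rates(JOINED)
for ad, row in sorted(table.items()):
    print(ad, row)
```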
Result visualization
```python
# -*- coding: utf-8 -*-
# @Time : 2024/12/14 22:37
# @Author : 从心
# @File : spark_taobao_display_ad_click_rate_prediction_visualization.py
# @Software : PyCharm

import pandas as pd
from pyecharts import options as opts
from pyecharts.charts import Pie, Bar, Line, Radar
from pyspark.sql import SparkSession
# Note: these Spark column functions shadow the Python built-ins sum, max and min
from pyspark.sql.functions import sum, max, min


def chart_1():
    """(1) Compute the overall ad click-through rate"""
    csv_path = '../result/overall_click_rate.csv'
    names = ['clk', 'count', 'overall_click_rate']
    df = pd.read_csv(csv_path, header=None, names=names)
    labels = ['Not clicked', 'Clicked']
    rates = sorted(df['overall_click_rate'].to_list(), reverse=True)
    data = [list(pair) for pair in zip(labels, rates)]
    pie = Pie(init_opts=opts.InitOpts(width='600px', height='450px'))
    pie.add('', data)
    pie.set_global_opts(
        title_opts=opts.TitleOpts(title='Overall ad click-through rate', pos_left='center'),
        legend_opts=opts.LegendOpts(orient='vertical', pos_top='10%', pos_right='10%'),
    )
    pie.set_series_opts(label_opts=opts.LabelOpts(formatter='{b}: {d}%'))
    pie.render('../visualization/[1]overall_click_rate.html')
    pie.render_notebook()


def chart_2():
    """(2) Analyze ad clicks over a period of time"""
    csv_path = '../result/period_click_rate.csv'
    names = ['date', 'impressions', 'clicks', 'ctr']
    df = pd.read_csv(csv_path, header=None, names=names)
    x = df.date.tolist()
    y1 = df.impressions.tolist()
    y2 = df.clicks.tolist()
    y3 = [round(i, 2) for i in (df.ctr.values * 100).tolist()]
    bar = (
        Bar(init_opts=opts.InitOpts(width='600px', height='450px'))
        .add_xaxis(x)
        .add_yaxis('Impressions', y1)
        .add_yaxis('Clicks', y2)
        # Second y-axis for the click-through rate in percent
        .extend_axis(yaxis=opts.AxisOpts(axislabel_opts=opts.LabelOpts(formatter='{value} %'),
                                         interval=1, max_=6))
        .set_series_opts(label_opts=opts.LabelOpts(is_show=True))
        .set_global_opts(
            title_opts=opts.TitleOpts(title='Ad clicks from 2017-05-05 to 2017-05-13',
                                      pos_left='center'),
            xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-30)),
            yaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(formatter='{value}'),
                                     max_=100000),
            legend_opts=opts.LegendOpts(pos_top='10%', pos_right='10%'),
        )
    )
    line = Line().add_xaxis(x).add_yaxis('CTR', y3, yaxis_index=1)
    bar.overlap(line)
    bar.render('../visualization/[2]period_click_rate.html')
    bar.render_notebook()


def chart_3():
    """(3) Analyze ad clicks within a day"""
    csv_path = '../result/hour_click_rate.csv'
    names = ['hour', 'impressions', 'clicks', 'ctr']
    df = pd.read_csv(csv_path, header=None, names=names)
    x = df.hour.tolist()
    y1 = (df.impressions - df.clicks).tolist()
    y2 = df.clicks.tolist()
    y3 = [round(i, 2) for i in (df.ctr.values * 100).tolist()]
    bar = (
        Bar(init_opts=opts.InitOpts(width='800px', height='600px'))
        .add_xaxis(x)
        .add_yaxis('Not clicked', y1, stack='stack_1', category_gap='80%')
        .add_yaxis('Clicked', y2, stack='stack_1', category_gap='80%')
        .extend_axis(yaxis=opts.AxisOpts(axislabel_opts=opts.LabelOpts(formatter='{value}%'),
                                         interval=1, min_=4, max_=5.5))
        .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
        .set_global_opts(
            title_opts=opts.TitleOpts(title='Ad clicks within a day', pos_left='center'),
            xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-30)),
            yaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(formatter='{value}'),
                                     max_=50000),
            legend_opts=opts.LegendOpts(pos_top='10%', pos_right='10%'),
        )
    )
    line = Line().add_xaxis(x).add_yaxis(
        'CTR', y3, yaxis_index=1,
        markline_opts=opts.MarkLineOpts(data=[opts.MarkLineItem(type_='average')]))
    bar.overlap(line)
    bar.render('../visualization/[3]hour_click_rate.html')
    bar.render_notebook()


def chart_4():
    """(4) Find the Top 1 ad by clicks and analyze the features of the users who clicked it"""
    spark = SparkSession.builder.getOrCreate()
    df = spark.read.format('csv') \
        .option('header', 'true') \
        .option('inferSchema', 'true') \
        .option('escape', '\"') \
        .load('../data/raw_sample_user_profile_ad_feature.csv')
    adgroup_id_44952 = df[df['adgroup_id'] == 44952]
    adgroup_id_44952.show(10)
    adgroup_id_44952_click = adgroup_id_44952[adgroup_id_44952['clk'] == 1]
    adgroup_id_44952_click.show(10)

    def clicks_by(column):
        """Clicks on the ad grouped by one user attribute, as a pandas DataFrame."""
        return adgroup_id_44952_click.groupby(column) \
            .agg(sum('clk').alias('clicks')) \
            .orderBy(column) \
            .toPandas()

    def labelled(column, prefix):
        """(label, clicks) pairs such as ('<prefix><code>', n) for one attribute."""
        pdf = clicks_by(column)
        return [(f'{prefix}{int(k)}', v) for k, v in zip(pdf[column], pdf['clicks'])]

    adgroup_id_44952_cms_segid = labelled('cms_segid', 'Micro-group ')
    print(adgroup_id_44952_cms_segid)
    adgroup_id_44952_cms_group_id = labelled('cms_group_id', 'cms_group ')
    print(adgroup_id_44952_cms_group_id)
    # The four hard-coded lists below are copied by hand from the printed results
    print(clicks_by('final_gender_code'))
    adgroup_id_44952_final_gender_code = [('Male', 46), ('Female', 54)]
    adgroup_id_44952_age_level = labelled('age_level', 'Age level ')
    print(adgroup_id_44952_age_level)
    print(clicks_by('shopping_level'))
    adgroup_id_44952_shopping_level = [('Shallow user', 2), ('Moderate user', 4), ('Deep user', 94)]
    print(clicks_by('occupation'))
    adgroup_id_44952_occupation = [('Not a college student', 98), ('College student', 2)]
    adgroup_id_44952_new_user_class_level = labelled('new_user_class_level', 'City tier ')
    print(adgroup_id_44952_new_user_class_level)
    print(clicks_by('pvalue_level'))
    adgroup_id_44952_pvalue_level = [('Low', 28), ('Mid', 66), ('High', 6)]

    # Eight ring charts laid out in two rows of four
    series = [
        ('Micro-group', adgroup_id_44952_cms_segid, ['20%', '30%']),
        ('cms_group', adgroup_id_44952_cms_group_id, ['40%', '30%']),
        ('Gender', adgroup_id_44952_final_gender_code, ['60%', '30%']),
        ('Age level', adgroup_id_44952_age_level, ['80%', '30%']),
        ('Shopping depth', adgroup_id_44952_shopping_level, ['20%', '70%']),
        ('College student', adgroup_id_44952_occupation, ['40%', '70%']),
        ('City tier', adgroup_id_44952_new_user_class_level, ['60%', '70%']),
        ('Consumption level', adgroup_id_44952_pvalue_level, ['80%', '70%']),
    ]
    pie = Pie(init_opts=opts.InitOpts(width='1200px', height='600px'))
    for name, data, center in series:
        pie.add(name, data, center=center, radius=[50, 100],
                label_opts=opts.LabelOpts(formatter='{b}: {d}%', position='inside'))
    pie.set_global_opts(
        title_opts=opts.TitleOpts(title='Features of users who clicked ad ID 44952',
                                  pos_left='center'),
        legend_opts=opts.LegendOpts(is_show=False),
    )
    pie.set_series_opts(
        tooltip_opts=opts.TooltipOpts(trigger='item', formatter='{a}<br/>{b}: {d}%'))
    pie.render('../visualization/[4]adgroup_id_44952_user_feature.html')
    pie.render_notebook()


def chart_5():
    """(5) Find the Top 1 ad by clicks and analyze its primary audience"""
    spark = SparkSession.builder.getOrCreate()
    df = spark.read.format('csv') \
        .option('header', 'true') \
        .option('inferSchema', 'true') \
        .option('escape', '\"') \
        .load('../data/raw_sample_user_profile_ad_feature.csv')
    # Value ranges used for the radar axes
    df.select(max('cms_segid'), min('cms_segid')).show()
    df.select(max('cms_group_id'), min('cms_group_id')).show()
    df.select(max('final_gender_code'), min('final_gender_code')).show()
    df.select(max('age_level'), min('age_level')).show()
    df.select(max('new_user_class_level'), min('new_user_class_level')).show()
    data = [{'value': [0, 4, 2, 4, 3, 0, 2, 2], 'name': 'Primary audience of ad ID 44952'}]
    radar = (
        Radar(init_opts=opts.InitOpts(width='800px', height='600px', bg_color='#FFFFFF'))
        .set_colors(['#4587E7'])
        .add_schema(
            schema=[
                opts.RadarIndicatorItem(name='Micro-group', max_=96, min_=0),
                opts.RadarIndicatorItem(name='cms_group_id', max_=12, min_=0),
                opts.RadarIndicatorItem(name='Gender (1: male, 2: female)', max_=2, min_=1),
                opts.RadarIndicatorItem(name='Age level', max_=6, min_=0),
                opts.RadarIndicatorItem(
                    name='Shopping depth (1: shallow, 2: moderate, 3: deep)', max_=3, min_=1),
                opts.RadarIndicatorItem(name='College student (1: yes, 0: no)', max_=1, min_=0),
                opts.RadarIndicatorItem(name='City tier', max_=4, min_=1),
                opts.RadarIndicatorItem(
                    name='Consumption level (1: low, 2: mid, 3: high)', max_=3, min_=1),
            ],
            shape='circle',
            splitarea_opt=opts.SplitAreaOpts(is_show=True,
                                             areastyle_opts=opts.AreaStyleOpts(opacity=1)),
            textstyle_opts=opts.TextStyleOpts(color='#000000'),
        )
        .add(series_name='Primary audience of ad ID 44952', data=data,
             linestyle_opts=opts.LineStyleOpts(color='#CD0000'))
        .set_series_opts(label_opts=opts.LabelOpts(is_show=True))
        .set_global_opts(title_opts=opts.TitleOpts(title='Primary audience of the ad'),
                         legend_opts=opts.LegendOpts())
    )
    radar.render('../visualization/[5]adgroup_id_44952_primary_audience.html')
    radar.render_notebook()


def chart_6():
    """(6) Analyze the effect of ad display on user behavior"""
    csv_path = '../result/btag.csv'
    names = ['adgroup_id', 'buy', 'cart', 'fav', 'pv', 'behavior_sum',
             'cart_rate', 'fav_rate', 'buy_rate']
    df = pd.read_csv(csv_path, header=None, names=names, nrows=10)
    x = df.adgroup_id.astype('str').tolist()
    y1 = df.pv.tolist()
    y2 = df.cart.tolist()
    y3 = df.fav.tolist()
    y4 = df.buy.tolist()
    y5 = [round(i, 2) for i in (df.buy_rate.values * 100).tolist()]
    bar = (
        Bar(init_opts=opts.InitOpts(width='800px', height='600px'))
        .add_xaxis(x)
        .add_yaxis('Browse', y1)
        .add_yaxis('Add to cart', y2)
        .add_yaxis('Favorite', y3)
        .add_yaxis('Purchase', y4)
        .extend_axis(yaxis=opts.AxisOpts(axislabel_opts=opts.LabelOpts(formatter='{value} %'),
                                         interval=1, max_=10))
        .set_series_opts(label_opts=opts.LabelOpts(is_show=False))
        .set_global_opts(
            title_opts=opts.TitleOpts(title='User behavior after clicking an ad',
                                      pos_left='center'),
            xaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(rotate=-30)),
            yaxis_opts=opts.AxisOpts(axislabel_opts=opts.LabelOpts(formatter='{value}'),
                                     max_=50),
            legend_opts=opts.LegendOpts(pos_top='10%', pos_right='10%'),
        )
    )
    line = Line().add_xaxis(x).add_yaxis(
        'Purchase rate', y5, yaxis_index=1,
        markline_opts=opts.MarkLineOpts(data=[opts.MarkLineItem(type_='average')]))
    bar.overlap(line)
    bar.render('../visualization/[6]btag.html')
    bar.render_notebook()


if __name__ == '__main__':
    # Render all six charts in order
    for chart in (chart_1, chart_2, chart_3, chart_4, chart_5, chart_6):
        chart()
```
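In `chart_4`, the gender, shopping-depth, occupation and consumption-level shares are typed in by hand after the grouped result is printed. One way to avoid the manual step is to derive the percentage pairs from the counts. The sketch below is a plain-Python illustration: `to_percent_pairs` and `GENDER_LABELS` are hypothetical names, and the click counts are invented rather than taken from the article's data.

```python
def to_percent_pairs(labels, counts, digits=1):
    """Pair each label with its share of the total, in percent."""
    total = sum(counts)
    if total == 0:
        return [(label, 0.0) for label in labels]
    return [(label, round(100 * n / total, digits)) for label, n in zip(labels, counts)]


GENDER_LABELS = {1: "Male", 2: "Female"}  # from the final_gender_code description

# Hypothetical grouped result: final_gender_code -> clicks
clicks_by_gender = {1: 23, 2: 27}

codes = sorted(clicks_by_gender)
pairs = to_percent_pairs(
    [GENDER_LABELS[c] for c in codes],
    [clicks_by_gender[c] for c in codes],
)
print(pairs)
```

The resulting list has the `(label, value)` shape that the pie series in `chart_4` already consume, so it could replace a hard-coded list without further changes.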
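(1) Compute the overall ad click-through rate
(2) Analyze ad clicks over a period of time
(3) Analyze ad clicks within a day
(4) Find the Top 1 ad by clicks and analyze the features of the users who clicked it
(5) Find the Top 1 ad by clicks and analyze its primary audience
(6) Analyze the effect of ad display on user behavior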