2015年11月30日星期一

使用Hive统计应用商店数据

Download log:
{
    "device_id":"5d9e295b8705895f8c6de5420eb1973b",
    "download_log":
    {
        "apk_url":"http://res.appstore.ticwear.com/com.broadlink.rmt_3.5.9-1.1.9.apk",
        "appid":1000
    },
    "message_type":3,
    "remote_ip":"220.200.5.247",
    "time":1441571407,
    "uid":"29716727"
}

Count the number of devices:
hive> select count(distinct device_id) from download_log;

Count the download times of different users:
hive> select device_id, count(device_id) rank from download_log group by device_id order by rank desc;
or: hive> select uid, device_id, count(device_id) rank from download_log group by uid, device_id order by rank desc;

Count the download times of different apps:
hive> select apk_url, appid, count(appid) rank from download_log group by apk_url, appid order by rank desc;

Detail log:
{
    "detail_log":
    {
        "from":"",
        "package_name":"com.stmp.minimalface",
        "query":""
    },
    "device_id":"a5147789062ba29b06562bca6cbb3cc1",
    "message_type":0,
    "remote_ip":"61.239.223.125",
    "time":1441331302,
    "uid":"29728861"
}

Count the detail times of different users:
hive> select device_id, count(device_id) rank from detail_log group by device_id order by rank desc;

Count the detail times of different apps:
hive> select package_name, count(package_name) rank from detail_log group by package_name order by rank desc;

Save the result into local file system:
hive> insert overwrite local directory '$your_path'
select device_id, count(device_id) rank from download_log group by device_id order by rank desc;

Save the result into HDFS:
hive> insert overwrite directory '$your_path'
select device_id, count(device_id) rank from download_log group by device_id order by rank desc;

Save the result into other table:
hive> insert into table test
select device_id, count(device_id) rank from download_log group by device_id order by rank desc;

没有评论:

发表评论