Tuesday, December 22, 2015

Generating user data files for the app store

Select the fields joined into one comma-separated string with concat:
hive> select concat(device_id, ',', appid, ',', apk_url) from download_log where device_id is not NULL;
hive> select concat(device_id, ',', package_name) from detail_log where device_id is not NULL;
The output of these queries is too large to save.

Because the detail log is huge, I group its entries first and count how often each (device_id, package_name) pair occurs:
hive> select device_id, package_name, count(package_name) rank from detail_log where length(device_id) > 0 and length(package_name) > 0 group by device_id, package_name order by rank desc;

If you want the fields of each entry separated by commas instead of tabs, you can use awk (assuming your file is named test):
$ awk '{print $1","$2","$3}' test > test_comma
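By default awk splits on any whitespace, so a field that contains a space would be broken apart. A minimal sketch of a stricter variant, assuming the input really is tab-separated with three columns; the sample rows below are made up for illustration:

```shell
# Build a small tab-separated sample file (hypothetical data).
printf 'dev1\tcom.example.app\t3\n' > test
printf 'dev2\tcom.example.game\t1\n' >> test

# -F'\t' splits the input on tabs only; OFS=',' joins the printed fields with commas.
awk -F'\t' -v OFS=',' '{print $1, $2, $3}' test > test_comma

cat test_comma
```

Setting OFS once keeps the print statement short and makes it easy to switch to another delimiter later.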

I group the download log in the same way:
hive> select device_id, appid, count(appid) rank from download_log where length(device_id) > 0 group by device_id, appid order by rank desc;

Save the results to a local directory:
insert overwrite local directory 'your_path'
row format delimited
fields terminated by '\t'
stored as textfile
select * from table where ...;
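Hive writes the result as one or more files inside the target directory rather than as a single file. A minimal sketch of collecting them into one file, assuming part files with names like 000000_0; the directory, file names, and rows here are simulated for illustration and are not real Hive output:

```shell
# Simulate a Hive output directory (hypothetical path, file names, and data).
out_dir=$(mktemp -d)
printf 'dev1\t1001\t5\n' > "$out_dir/000000_0"
printf 'dev2\t1002\t2\n' > "$out_dir/000001_0"

# Concatenate all part files, in lexical name order, into one tab-separated file.
cat "$out_dir"/* > download_rank.tsv

# Show how many rows were collected.
wc -l < download_rank.tsv
```

With a real export, replace the simulated directory with the path given to insert overwrite local directory.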
