Monday, December 28, 2015

HDFS user permissions and proxy users

The proxy user mechanism: link.
HDFS permission management: link.

Because setting the environment variables below solved my HDFS access-permission problem, I did not dig any further into the proxy user mechanism or permission management.

Setting the HADOOP_USER_NAME environment variable is the important part: set it to the Hadoop superuser (the user that started the NameNode process) and you can access HDFS from outside the cluster (when no security is configured).
export HADOOP_USER_NAME=hadoop
export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
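For example, from a machine outside the cluster (a minimal check; it assumes a Hadoop client installation plus a copy of the cluster's config files, and that hadoop is the superuser as above):
$ export HADOOP_USER_NAME=hadoop
$ hdfs dfs -ls /
$ hdfs dfs -mkdir /tmp/perm_test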

Changing the owner and permissions of HDFS files:
hdfs dfs -chown -R username:group /filepath
hdfs dfs -chmod ...
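A typical invocation (755 and the path are just placeholders):
hdfs dfs -chmod -R 755 /filepath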

Thursday, December 24, 2015

Read Python code in Source Insight

First, download the Python.CLF file from this website: link.

In Source Insight, click Options -> Preferences... and import Python.CLF.

Then in Options -> Document Options..., click 'Add Type...' and add a Python Source File type (settings screenshot not reproduced here).

Finish.

Tuesday, December 22, 2015

Generating user data files for the app store

Select with concat:
hive> select concat(device_id, ',', appid, ',', apk_url) from download_log where device_id is not NULL;
hive> select concat(device_id, ',', package_name) from detail_log where device_id is not NULL;
The output above is too large to save directly.

Because the detail log has a huge number of entries, I group them first:
hive> select device_id, package_name, count(package_name) rank from detail_log where length(device_id) > 0 and length(package_name) > 0 group by device_id, package_name order by rank desc;

If you want to separate fields with commas instead of tabs, you can use awk (assume your file is named test):
$ awk -F'\t' '{print $1","$2","$3}' test > test_comma

I group the download log in the same way:
hive> select device_id, appid, count(appid) rank from download_log where length(device_id) > 0 group by device_id, appid order by rank desc;

Save results into local file:
insert overwrite local directory 'your_path'
row format delimited
fields terminated by '\t'
stored as textfile
select * from table where ...;

Data types in Spark MLlib

Please see this doc.
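For a quick taste, here is a minimal sketch of the most common local types in the org.apache.spark.mllib API (the type names are real; the values are arbitrary):

import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

// Dense vector: every entry stored explicitly.
val dense = Vectors.dense(1.0, 0.0, 3.0)
// Sparse vector: size, indices of the non-zeros, and their values.
val sparse = Vectors.sparse(3, Array(0, 2), Array(1.0, 3.0))
// LabeledPoint: a feature vector plus a label, used by the supervised learners.
val point = LabeledPoint(1.0, sparse)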

Spark

Please see my doc.

The official document is a good reference for you.

Maybe these links are useful: link1, link2, link3, link4.

Here is a blog about big data.

Sublime Text tutorial

Please see this doc.

Thursday, December 17, 2015

Setting up Scala projects in IntelliJ IDEA with Maven and with sbt

About the settings and keyboard shortcuts in IntelliJ, please see this doc.

About how to build Spark apps in IntelliJ, please see this doc.

Wednesday, December 16, 2015

Using this.type in Scala

A question had puzzled me for a long time, so it deserves a post here. See the question I asked and the answer: link

The question:
I defined two classes:
class A { def method1 = this }
class B extends A { def method2 = this }

val b = new B
Then I checked the type of b.method1:
scala> b.method1.getClass
res29: Class[_ <: A] = class B
In this case, I cannot use b.method1.method2:
scala> b.method1.method2
<console>:11: error: value method2 is not a member of A
              b.method1.method2
So I have to define A and B like this:
class A { def method1: this.type = this } 
class B extends A { def method2: this.type = this } 

val b = new B
Now I check the type of b.method1:
scala> b.method1.getClass
res31: Class[_ <: B] = class B
Here b.method1.method2 works:
scala> b.method1.method2
res32: b.type = B@4ff3ac
My question here is: what does it mean to say Class[_ <: A] = class B and Class[_ <: B] = class B? And why doesn't the first version work, given that Class[_ <: A] = class B seems to say that it is also class B?

Reactormonk answered my question nicely:
Let's split the expression Class[_ <: A] = class B. The first part, Class[_ <: A] tells you what the compiler knows at compile time, that b.method1 returns something of type Class[_ <: A]. The second part, class B mentions that you have a class B as value, but the compiler doesn't know that. That's runtime information.
this.type specializes the method for subclasses, where the type inferred from a plain this doesn't.

Monday, December 14, 2015

Configuring SSH keys for git

Windows:
First install Git and TortoiseGit; on Windows we use the TortoiseGit GUI for git operations.
When installing Git, choose 'Use Git from the Windows Command Prompt' at the 'Adjusting your PATH environment' step, and 'Checkout Windows-style, commit Unix-style line endings' at the 'Configuring the line ending conversions' step. (Line endings are CRLF on Windows and LF on Linux.)

Open PuTTYgen (bundled with TortoiseGit) and click the "Generate" button, moving the mouse around the blank area of the window until the progress bar completes; a random key is generated. Enter a passphrase in the "Key passphrase" and "Confirm passphrase" boxes. Copy everything beginning with ssh-rsa from the text box under "Public key for pasting into OpenSSH authorized_keys file" and add it as an SSH public key on github. Click Save private key to save the private key with the .ppk extension.

Pick a local folder, right-click it and choose Git Clone..., enter the url, select the .ppk file you just saved under 'Load Putty Key', and click OK. You can now clone projects with TortoiseGit.

If you also need the command line, you still have to configure git's SSH key:
open git-bash.exe in the Git installation directory and run: ssh-keygen -t rsa -C "xxx@email.com"
When asked for a key name, press Enter to accept the default. id_rsa and id_rsa.pub are generated in C:\Users\$USER_NAME\.ssh; add the contents of id_rsa.pub to github.

Linux:
Install Git.
cd ~/.ssh
ssh-keygen -t rsa -C "your-email-address"
Copy the contents of id_rsa.pub and add it as a public key on github.
git config --global user.name "your-name"
git config --global user.email your-name@example.com
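Either way, you can verify the key against github afterwards:
$ ssh -T git@github.com
A reply like "Hi username! You've successfully authenticated, but GitHub does not provide shell access." means the key is working.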

Sunday, December 13, 2015

extern "C" {} in C and C++ projects

See this post.

An example of extern in C:
1.c:
#include <stdio.h>

int fun1(int i)
{
    return i + 1;
}

int main()
{
    extern int j;               /* defined in 2.c */
    extern int fun2(int i);     /* declared here, defined in 2.c */

    printf("%d\n", fun2(j));
    return 0;
}

2.c:
extern int fun1(int i);    /* defined in 1.c */

int j = 3;

int fun2(int i)
{
    return fun1(i) + 2;
}

Compile and run:
$ gcc -c 1.c 2.c
$ gcc 1.o 2.o -o 1
$ ./1
The output is 6.

For C calling C++ and C++ calling C, see the examples in the post linked above.

For mixed compilation of C and C++, see this post.
An example:
$ g++ -c cpp2c.cpp -o cpp2c.o
$ gcc -c csayhello.c -o csayhello.o
$ gcc cpp2c.o csayhello.o -lstdc++ -o cpp2c
Note that specifying the C++ standard library in the final link step is required, because the linker is invoked through gcc rather than g++. If g++ were used for the final step, the C++ standard library would be linked in by default.
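For reference, the two source files might look like the following; this is my own minimal reconstruction, not the code from the linked post:

/* csayhello.c -- plain C, compiled with gcc */
#include <stdio.h>

void csayhello(const char *msg)
{
    printf("%s\n", msg);
}

// cpp2c.cpp -- C++ calling into C, compiled with g++.
// extern "C" gives the declaration C linkage (no C++ name mangling),
// so the linker can match the symbol produced by gcc.
extern "C" void csayhello(const char *msg);

int main()
{
    csayhello("Hello from C++");
    return 0;
}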


Monday, December 7, 2015

Scala

Here are my own slides of learning Scala: link1 (Chinese), link2 (English).

This link is a Chinese tutorial of Scala.

See "A Tour of Scala" for more information.

Here are two video tutorials of Scala: link1, link2.

Setting up a LaTeX environment with Sublime Text on macOS

Software to prepare:
1. Sublime Text: link
2. BasicTeX: link
3. Skim: link

Installation steps:
1. Install Sublime Text; once it is in place, install the Package Control and LaTeXTools plugins (see the earlier post).
2. Install BasicTeX and Skim. After installing Skim, configure synchronization in its preferences (screenshot not reproduced here).

3. Open a terminal and run:
sudo tlmgr update --self
sudo tlmgr install latexmk
4. Configure LaTeXTools
In Sublime Text -> Preferences -> Package Settings -> LaTeXTools, choose Reconfigure LaTeXTools and migrate settings, then open Sublime Text -> Preferences -> Package Settings -> LaTeXTools -> Settings - User and add the following to builder_settings:
"program" : "xelatex",
"command" : ["latexmk", "-cd", "-e", "$pdflatex = 'xelatex -interaction=nonstopmode -synctex=1 %S %O'", "-f", "-pdf"],
Otherwise you will hit this error: /usr/local/texlive/2015/texmf-dist/tex/latex/fontspec/fontspec.sty:43: !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!! [ }]
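For reference, the two entries above live inside the builder_settings block of the LaTeXTools user settings, roughly like this (a sketch assembled from the lines above, not a complete settings file):

"builder_settings": {
    "program": "xelatex",
    "command": ["latexmk", "-cd", "-e",
                "$pdflatex = 'xelatex -interaction=nonstopmode -synctex=1 %S %O'",
                "-f", "-pdf"]
},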
5. Edit your tex file and compile with command + b. Done.
6. Inverse search
In Skim, Command + Shift + Click jumps back to the corresponding position in the tex source.

Saturday, December 5, 2015

Fix for the Mac error about not finding the login keychain for storage

Open Finder, hold the option key while choosing Go in the menu bar, enter the Library folder, delete the contents of the Keychains folder there, and reboot.
I fiddled for a long time before finding this fix; even a Mac is unreliable sometimes.

Friday, December 4, 2015

On shift operations

Computers store integers in two's complement; for a non-negative number this coincides with its plain binary form. The first bit is the sign bit: 0 means positive, 1 means negative.
For example, -1 is represented as 1111 1111 1111 1111 1111 1111 1111 1111, obtained by inverting the bits of 1 and adding 1.

The shift operations: left shift <<, arithmetic right shift >>, unsigned right shift >>>.
A left shift multiplies by a power of 2, for positive and negative numbers alike (ignoring overflow).

The interesting case is the right shift:
Right-shifting a positive number fills with 0 on the left, so it is simply division by a power of 2.
Right-shifting a negative number fills with 1 on the left. Examples:
-1: the bits stay all 1 no matter how far you shift, so -1 shifted right by any amount is still -1.
-5 >> 1: 1111 1111 1111 1111 1111 1111 1111 1011 shifted right by 1 becomes 1111 1111 1111 1111 1111 1111 1111 1101, so -5 >> 1 is -3.

Unsigned right shift fills with 0 regardless of sign, so the result of >>> is never negative.
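These rules are easy to check in the Scala REPL (Int here is 32-bit):

scala> -5 >> 1
res0: Int = -3

scala> -1 >> 16
res1: Int = -1

scala> -1 >>> 28
res2: Int = 15

scala> 3 << 2
res3: Int = 12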

Monday, November 30, 2015

Using Hive to analyze app store data

Download log:
{
    "device_id":"5d9e295b8705895f8c6de5420eb1973b",
    "download_log":
    {
        "apk_url":"http://res.appstore.ticwear.com/com.broadlink.rmt_3.5.9-1.1.9.apk",
        "appid":1000
    },
    "message_type":3,
    "remote_ip":"220.200.5.247",
    "time":1441571407,
    "uid":"29716727"
}

Count the number of devices:
hive> select count(distinct device_id) from download_log;

Count the download times of different users:
hive> select device_id, count(device_id) rank from download_log group by device_id order by rank desc;
or: hive> select uid, device_id, count(device_id) rank from download_log group by uid, device_id order by rank desc;

Count the download times of different apps:
hive> select apk_url, appid, count(appid) rank from download_log group by apk_url, appid order by rank desc;

Detail log:
{
    "detail_log":
    {
        "from":"",
        "package_name":"com.stmp.minimalface",
        "query":""
    },
    "device_id":"a5147789062ba29b06562bca6cbb3cc1",
    "message_type":0,
    "remote_ip":"61.239.223.125",
    "time":1441331302,
    "uid":"29728861"
}

Count the detail times of different users:
hive> select device_id, count(device_id) rank from detail_log group by device_id order by rank desc;

Count the detail times of different apps:
hive> select package_name, count(package_name) rank from detail_log group by package_name order by rank desc;

Save the result into local file system:
hive> insert overwrite local directory '$your_path'
select device_id, count(device_id) rank from download_log group by device_id order by rank desc;

Save the result into HDFS:
hive> insert overwrite directory '$your_path'
select device_id, count(device_id) rank from download_log group by device_id order by rank desc;

Save the result into other table:
hive> insert into table test
select device_id, count(device_id) rank from download_log group by device_id order by rank desc;

Wednesday, November 25, 2015

Creating user data tables in Hive from JSON-formatted logs

PART 1
If your JSON file does not use nested structures, please see this link. Or you can use the method in Part 2.

PART 2
As that method does not support nested structures, there is another way to parse JSON files: link.

We need to parse the json file in the following format:
{
 "detail_log":
 {
 "from":"",
 "package_name":"com.aggro.wearappmanager",
 "query":""
 },
 "device_id":"2fc35dde39093c78302ed45b287ae81e",
 "message_type":0,
 "remote_ip":"39.169.9.107",
 "time":1446268416,
 "uid":"29737982"
}

As listed in the official document, we can use get_json_object or json_tuple to parse json string.

At first I wanted to create an app_log table with a struct column, but I could not work out how to extract the JSON objects into it, so this is left unfinished:
hive> create table app_log (detail_log struct<`from`:string, package_name:string, query:string>, device_id string, message_type int, remote_ip string, time bigint, uid string);

Attention: never load data from HDFS (Hive moves the source files rather than copying them)! See link1 and link2.

Below are my steps:
1. hive> create table src_json(json string);
hive> load data local inpath '/home/chenxiaoyu/data/app.log.2015-11-01.1446328474758' into table src_json;
2. hive> create table app_log as select get_json_object(src_json.json, '$.detail_log.from'), get_json_object(src_json.json, '$.detail_log.package_name'), get_json_object(src_json.json, '$.detail_log.query'), get_json_object(src_json.json, '$.device_id'), get_json_object(src_json.json, '$.message_type'), get_json_object(src_json.json, '$.remote_ip'), get_json_object(src_json.json, '$.time'), get_json_object(src_json.json, '$.uid') from src_json;
3. hive> alter table app_log change `_c0` `from` string;
hive> alter table app_log change `_c1` package_name string;
hive> alter table app_log change `_c2` query string;
hive> alter table app_log change `_c3` device_id string;
hive> alter table app_log change `_c4` message_type int;
hive> alter table app_log change `_c5` remote_ip string;
hive> alter table app_log change `_c6` time bigint;
hive> alter table app_log change `_c7` uid string;
done.

The work is finished at this point, but I have something to say about step 3.
I wanted to use json_tuple, but it seems that "select json_tuple(src_json.json, 'detail_log.package_name') from src_json;" doesn't work; only "select json_tuple(src_json.json, 'detail_log') from src_json" does.
So I tried to create another table to save detail_log:
hive> create table src_json2 as select json_tuple(src_json.json, 'detail_log') from src_json;
hive> alter table src_json2 change c0 detail string;
But then "select json_tuple(src_json2.detail, 'from', 'package_name', 'query'), json_tuple(src_json.json, 'device_id', 'message_type', 'remmote_ip', 'time', 'uid') from src_json, src_json2" failed, this error occurs:
Only a single expression in the SELECT clause is supported with UDTF's.
I did not know how to get around that, so it stays unsolved.
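In hindsight, the usual way around this UDTF restriction is LATERAL VIEW, which joins the json_tuple output back to each source row; a sketch along these lines (I have not run it against this data):
hive> select d.f, d.pkg, d.q, j.dev, j.mt, j.ip, j.t, j.uid
    > from src_json
    > lateral view json_tuple(src_json.json, 'detail_log', 'device_id', 'message_type', 'remote_ip', 'time', 'uid') j as detail, dev, mt, ip, t, uid
    > lateral view json_tuple(j.detail, 'from', 'package_name', 'query') d as f, pkg, q;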

PART 3
We encounter this problem: In the same file, we have two different types of JSON as shown below:
{
    "detail_log":
    {
        "from":"",
        "package_name":"com.mapelf",
        "query":""
    },
    "device_id":"897ee392eaec9eeac0e9de069ab3aac5",
    "message_type":0,
    "remote_ip":"120.207.151.129",
    "time":1446268421,
    "uid":"29752894"
}

{
    "device_id":"42f89c8deb5decd55380205cd6decaf6",
    "download_log":
    {
        "apk_url":"http://res.appstore.ticwear.com/net.ddroid.aw.watchface.mv08_5.4.3.apk",
        "appid":911
    },
    "message_type":3,
    "remote_ip":"106.39.192.55",
    "time":1446268426,
    "uid":"29754909"
}

We can use a trick here.
The first type of record has a non-NULL 'package_name' field, while the second has a non-NULL 'apk_url' field, so the two can be told apart.
We save the two JSON logs into two tables: detail_log and download_log.

My steps:
If you want to use data in HDFS, try this step 1:
1. hive> create external table src_json(json string) stored as textfile location '<hdfs_location>';
Otherwise you can follow this step 1:
1. hive> create table src_json(json string);
hive> load data local inpath '/home/chenxiaoyu/data/app.log.2015-11-01.1446328474758' into table src_json;
2. hive> create table detail_log as select get_json_object(src_json.json, '$.detail_log.from'), get_json_object(src_json.json, '$.detail_log.package_name'), get_json_object(src_json.json, '$.detail_log.query'), get_json_object(src_json.json, '$.device_id'), get_json_object(src_json.json, '$.message_type'), get_json_object(src_json.json, '$.remote_ip'), get_json_object(src_json.json, '$.time'), get_json_object(src_json.json, '$.uid') from src_json where get_json_object(src_json.json, '$.detail_log.package_name') is not NULL;
3. hive> alter table detail_log change `_c0` `from` string;
hive> alter table detail_log change `_c1` package_name string;
hive> alter table detail_log change `_c2` query string;
hive> alter table detail_log change `_c3` device_id string;
hive> alter table detail_log change `_c4` message_type int;
hive> alter table detail_log change `_c5` remote_ip string;
hive> alter table detail_log change `_c6` time bigint;
hive> alter table detail_log change `_c7` uid string;
4. hive> create table download_log as select get_json_object(src_json.json, '$.device_id'), get_json_object(src_json.json, '$.download_log.apk_url'), get_json_object(src_json.json, '$.download_log.appid'), get_json_object(src_json.json, '$.message_type'), get_json_object(src_json.json, '$.remote_ip'), get_json_object(src_json.json, '$.time'), get_json_object(src_json.json, '$.uid') from src_json where get_json_object(src_json.json, '$.download_log.apk_url') is not NULL;
5. hive> alter table download_log change `_c0` device_id string;
hive> alter table download_log change `_c1` apk_url string;
hive> alter table download_log change `_c2` appid int;
hive> alter table download_log change `_c3` message_type int;
hive> alter table download_log change `_c4` remote_ip string;
hive> alter table download_log change `_c5` time bigint;
hive> alter table download_log change `_c6` uid string;
done.

Matlab 2015a and Visual Studio 2015

When entering this command: mex -setup, this error occurs:
Error using mex. No supported compiler or SDK was found.

Solution: see this link. It may help you.

Saturday, November 21, 2015

Hive

Install and configure Hive:
Please see this doc.

Hive operations:
Please see this doc.

A video tutorial of Hive: link.

Setting up a LaTeX environment with Sublime Text on Windows

Note:
never put Chinese characters in the path of your tex file! Otherwise ctrl+b gives a blank compile.

Software to prepare:
1. Sublime Text: link.
2. TeX Live: link.
3. SumatraPDF: link.

Installation steps:
1. Install Sublime Text. Check Add to explorer context menu during installation, so files can be opened with Sublime Text directly from the right-click menu.
2. Install TeX Live. Do not use paths containing Chinese characters or spaces at any step, to avoid triggering latent bugs. Reference: link
3. Install SumatraPDF; again, no spaces in the installation path.

Configuration steps:
1. Sublime Text
Install Package Control; there are plenty of tutorials online, for example this link.
With Package Control in place, install the LaTeXTools plugin: press ctrl+shift+p in Sublime Text, type install package, press Enter, then type latextools and press Enter. Done.

After installing LaTeXTools, its configuration needs changing:
first click Preferences -> Package Settings -> LaTeXTools -> Reconfigure LaTeXTools and migrate settings,
then click Preferences -> Package Settings -> LaTeXTools -> Settings - User,
and change the configuration as in the screenshot (not reproduced here).

Save.

2. Edit your tex file and compile with ctrl+b; the pdf opens automatically.

3. Set up inverse search from SumatraPDF
Open SumatraPDF, go to Settings -> Options..., and set the inverse search command line to:
"C:\Program Files\Sublime Text 2\sublime_text.exe" "%f:%l"
Click OK.

Once this is set, double-clicking a position in the pdf jumps to the corresponding place in the source. Done.

If ctrl+b compiles to a blank result, try this:
uninstall sublime text, delete the sublime text folder in C:\Users\your-username\AppData\Roaming, then reinstall and reconfigure.

P.S. The BracketHighlighter plugin is recommended; it highlights matching brackets.

Tuesday, November 17, 2015

Building a Hadoop project with Maven

See 'Building a Hadoop project with Maven'.

Problems encountered and solutions:
1. Error installing the Maven plugin for Eclipse
Error message: Cannot complete the install because one or more required items could not be found.
requires 'bundle org.slf4j.api 1.6.2' but it could not be found
Solution: first install … (see this Stack Overflow post)

Wednesday, November 11, 2015

Configuring an Ubuntu VM in VirtualBox to talk to the host and still reach the Internet

Set network adapter 1 to NAT and adapter 2 to host-only.
For the detailed settings, see 'Configuring dual network adapters for Ubuntu in VirtualBox'.
My /etc/network/interfaces file is as follows:
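The screenshot is not reproduced here; a sketch of a typical dual-NIC configuration (the host-only address 192.168.56.101 is an assumption):

auto lo
iface lo inet loopback

# Adapter 1: NAT, provides Internet access
auto eth0
iface eth0 inet dhcp

# Adapter 2: host-only, for host <-> guest traffic
auto eth1
iface eth1 inet static
    address 192.168.56.101
    netmask 255.255.255.0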

Sometimes the network refuses to come up and the interface has to be brought down with ifdown and back up with ifup; I am not sure why. To avoid unnecessary trouble, it is best to keep the host on a wired connection.

Hadoop installation

Ubuntu 14.04: environment setup
CentOS 6.4: environment setup
Hadoop 2.7.1: installation

See also: install-hadoop or this doc

Two problems (and solutions) from installing ubuntu server

I installed ubuntu-14.04-server in VirtualBox and ran into the following two problems:

1. I had installed the samba service and got a failure at boot:
Starting SMB/CIFS File and Active Directory Server. [FAIL]
After some searching I found a fix:
echo manual | tee /etc/init/samba-ad-dc.override
References: link1, link2

2. The ubuntu screen resolution is too low
Solution: http://blog.csdn.net/weilanxing/article/details/7664324

Tuesday, June 16, 2015

Reading List 2015.6.16

Semidefinite Relaxation of Quadratic Optimization Problems:
The main contribution of this work is to transform the QCQP into an SDP by dropping the rank-1 constraint, as an approximation.

Subsampling Algorithms for Semidefinite Programming:
An algorithm to solve a spectral-norm minimization problem (the formula is not reproduced here) in which all matrix parameters are semidefinite.
Building on stochastic descent, this paper proposes a stochastic approximation algorithm in which the gradient is approximated using randomized matrix multiplication and randomized low-rank approximation.

Uniform Sampling for Matrix Approximation:
A uniform sampling method to approximate leverage scores.

Simple and Deterministic Matrix Sketching:
A streaming algorithm to maintain a low-dimensional sketch B such that $A^T A \approx B^T B$; the algorithm is perfectly parallelizable.

Code Generation for Embedded Second-Order Cone Programming:
This article proposes a parser/generator that only needs to map the parameters onto the structure of the standard form, after which the problem can be handed to a custom solver. The paper only gives results for second-order cone programming; semidefinite programming is missing.
QCML is a Python toolbox for this kind of matrix stuffing.

Performance Analysis of Cooperative Wireless Networks with Unreliable Backhaul Links:
This paper derives a closed form for a sum of independent Bernoulli-weighted exponential random variables, which characterizes the outage performance.

Thursday, June 4, 2015

Reading List 2015.6.3

A Practical Guide to Randomized Matrix Computations with MATLAB Implementations

This is a very useful paper for anyone who wants to implement randomized algorithms in MATLAB, and it is easy to follow and understand.
The paper first introduces several matrix sketching methods, then the applications to least-squares regression and rank-k SVD.
What confuses me for now is that for SPSD matrix sketching, the methods given in the paper for matrix inversion and eigenvalue decomposition seem to require the matrix to be SPSD. What if the matrix is not SPSD and I want a randomized algorithm to compute the inversion and eigenvalue decomposition?

Tuesday, June 2, 2015

Reading List 2015.6.2

"Sketching as a Tool for Numerical Linear Algrbra"

-Randomized algorithms:
random sampling, see reference:
1. Randomized algorithms for matrices and data.
2. Iterative row sampling. In FOCS, pages 127-136.
3. Uniform sampling for matrix approximation.

random projection, see reference:
1. Randomized algorithms for matrices and data.
2. Sketching as a Tool for Numerical Linear Algebra.

Some applications: least squares regression, least absolute deviation regression, low-rank approximation, graph sparsification. The first three applications are introduced in the video tutorial here.

This paper is very technical and not easy to understand. In any case, the sketching technique is a very useful tool in many areas.

ADMM

“Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers”

Solves problems of the form

    minimize    f(x) + g(z)
    subject to  Ax + Bz = c,

where f and g are convex.

Precursors:
- Dual Ascent

- Dual Decomposition
  Related ideas appear in well known work by Dantzig and Wolfe [44] and Benders [13] on large-scale linear programming, as well as in Dantzig's seminal book [43]
  The general idea of dual decomposition appears to be originally due to Everett [69] and is explored in many early references [107, 84, 117, 14].
  Subgradient method to solve the dual problem by Shor [152].
  Good references on dual methods and decomposition include the book by Bertsekas [16, chapter 6] and the survey by Nedic and Ozdaglar [131].
  More generally, decentralized optimization.

- Augmented Lagrangians and the Method of Multipliers
  Augmented Lagrangian methods yield convergence without assumptions like strict convexity or finiteness of f.
  Augmented Lagrangians and the method of multipliers for constrained optimization were first proposed by Hestenes [97, 98] and Powell [138].

Alternating Direction Method of Multipliers:
unscaled form & scaled form.

Algorithms:
Scaled form:
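The iteration images are not reproduced here; in scaled form the standard updates are:

x^{k+1} = \arg\min_x \left( f(x) + (\rho/2) \, \| Ax + Bz^k - c + u^k \|_2^2 \right)
z^{k+1} = \arg\min_z \left( g(z) + (\rho/2) \, \| Ax^{k+1} + Bz - c + u^k \|_2^2 \right)
u^{k+1} = u^k + Ax^{k+1} + Bz^{k+1} - c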


A very useful algorithm; I need to spend more time on it to understand it better.

Monday, June 1, 2015

Reading List - 2015.5.31

1. Conic Optimization via Operator Splitting and Homogeneous Self-Dual Embedding
A theoretical paper behind the scs solver in cvx.
The main contribution and final algorithm are summarized in figures in the paper (not reproduced here).

2. Graph Implementations for Nonsmooth Convex Programs
The general idea is to transform problems into graph-implementation form and automate the transformation of a problem into the standard form for a particular solver.
Main contribution and idea: summarized in a figure in the paper (not reproduced here).

3. Randomized algorithms for matrices and data
A very technical paper, not easy to understand. The main idea is to reduce the dimension of the matrix using either random sampling or random projection algorithms.
For the simple case of LS approximation, the idea is illustrated in a figure in the paper (not reproduced here).

Friday, May 29, 2015

How to install the Julia language on Linux/CentOS

I installed Julia in my minimal CentOS 7.

First, make sure you have already installed these packages:
yum install -y gcc
yum install -y gcc-c++
yum install -y gcc-gfortran
yum install -y git
yum install -y patch
yum install -y bzip2
yum install -y m4

Then,
git clone -b release-0.3 git://github.com/JuliaLang/julia.git
cd julia
make
ln -s "$PWD"/usr/bin/julia /usr/local/bin/julia   # link by absolute path, or the symlink will dangle
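A quick sanity check of the build:
julia --version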

Enjoy!

More information: Julia in Github

Thursday, May 21, 2015

Things to do after a minimal CentOS 7 installation


Before doing this, you need to know something about vi and vim first.

1. Connect to the network.
vi /etc/sysconfig/network-scripts/ifcfg-enoxxxxxxxx
Change the item ONBOOT=yes, then save and reboot.
Note: we use a NAT network connection here. If you choose Bridged, you have to configure the IP address and other settings manually; this is covered below.

2. Install ifconfig command
yum search ifconfig shows that the package providing ifconfig is net-tools.
yum install net-tools

3. Install vim
yum search vim shows that the packages we need are vim-X11 and vim-enhanced.
We can just use this command to install all the vim packages:
yum install -y vim*

4. Disable the alarm bell
4.1 The alarm in command line
vim /etc/inputrc
Delete the comment "#" before "set bell-style none", save, and reboot.
4.2 The alarm in vi and vim
vim /etc/vimrc
Add "set vb t_vb = " at the end of file.

5. Connect to the Internet using Bridge
vim /etc/sysconfig/network-scripts/ifcfg-enoxxxxxxxx
Add or change the configuration below:
BOOTPROTO=static
IPADDR=xxx.xxx.xxx.xxx
NETMASK=xxx.xxx.xxx.xxx
GATEWAY=xxx.xxx.xxx.xxx
DNS=xxx.xxx.xxx.xxx
If you have multiple DNS addresses, you can write:
DNS1=xxx.xxx.xxx.xxx
DNS2=xxx.xxx.xxx.xxx
Save and then restart the network service:

systemctl restart network

6. Change the system time zone
The default CentOS time zone is a US one; to switch it to China:
cp /usr/share/zoneinfo/Asia/Shanghai /etc/localtime

7. On my own machine, to save trouble, I simply turned the firewall off:
systemctl stop firewalld
and disabled it at boot: systemctl disable firewalld

P.S. vim is a very powerful editor: you can add plenty of custom features and key bindings, and there are many strong plugins. If you are interested, look into configuring vim.

Wednesday, May 20, 2015

How to install scs in MATLAB

My scs version: 1.1.0

First, add your scs/matlab folder to the MATLAB search path, i.e. >> addpath xxx/scs-master/matlab;
Second, run the make_scs.m file;
Last, rerun cvx_setup.m to add the scs solver to cvx.

One more thing: Xcode must be installed first if you are using a MacBook.

Monday, April 13, 2015

Fronthaul constrained problem

Fronthaul is defined as the link between BBUs and RRHs.

[1] “Fronthaul-Constrained Cloud Radio Access Networks: Insights and Challenges”
The heterogeneous cloud radio access network (H-CRAN) as a 5G paradigm toward green and soft themes is briefly presented in [3] to enhance C-RAN.

To alleviate the capacity constraint on the fronthaul links, a multi-service small-cell wireless access architecture based on combining radio-over-fiber with optical wavelength division multiplexing (WDM) techniques is proposed in [4].

- C-RAN System Architectures:
  A. C-RAN components: RRH, BBU pool, Fronthaul.
    Non-ideal fronthaul: bandwidth, time latency and jitter constraints.
  B. C-RAN System Structures: Full centralization, Partial centralization, Hybrid centralization.
- Signal Compression and Quantization
  An interesting result shows that by simply setting the quantization noise power proportional to the background noise level at each RRH, the quantize-and-forward scheme can achieve a capacity within a constant gap to a throughput performance upper bound. [6]
  A. Compression and Quantization in the Uplink
    Distributed Wyner-Ziv lossy compression; independent compression.
  B. Compression and Quantization in the Downlink
    Hybrid compression and message-sharing strategy for DL transmission is presented in [10].
- Coordinated Signal Processing and Clustering
  A. Precoding Techniques
    Two types of IQ-data transfer methods: after-precoding & before-precoding.
    Sparsity: individual sparsity, group sparsity.
  B. Clustering Techniques ?
    Two types of RRH clustering schemes: disjoint clustering and user-centric clustering.
    An explicit expression for the successful access probability (SAP) for clustered RRHs is derived by applying stochastic geometry in [12].
- Radio Resource Allocation and Optimization
  There are mainly three approaches to deal with the delay-aware RRAO problem: equivalent rate constraint, Lyapunov optimization, and Markov decision processes (MDPs).
  In [13], a hybrid coordinated multi-point transmission (H-CoMP) scheme is presented for downlink transmission in fronthaul-constrained C-RANs, which achieves a flexible tradeoff between large-scale cooperative processing gain and fronthaul consumption.
- Challenging Work and Open Issues
  A. C-RANs with SDN
  B. C-RANs with NFV
  C. C-RANs with Inter-Connected RRHs

[2] “Joint Power Control and Fronthaul Rate Allocation for Throughput Maximization in OFDMA-based Cloud Radio Access Network”

[3] “Joint Precoding and Multivariate Backhaul Compression for the Downlink of Cloud Radio Access Networks”

[4] “Robust and Efficient Distributed Compression for Cloud Radio Access Networks”

[5] “Joint Decompression and Decoding for Cloud Radio Access Networks”

[6] “Performance Evaluation of Multiterminal Backhaul Compression for Cloud Radio”

[7] “Inter-Cluster Design of Precoding and Fronthaul Compression for Cloud Radio Access Networks”
Compared inter-cluster with intra-cluster.

[8] “Hybrid Compression and Message-Sharing Strategy for the Downlink Cloud Radio-Access Network”

Thursday, March 19, 2015

Reading List - 2015.3.18


“Fronthaul Compression for Cloud Radio Access Networks”
A survey of the work in the area of fronthaul compression with emphasis on advanced signal processing solutions based on network information theoretic concepts.

- Multiterminal compression
  Uplink: The key technique is distributed compression or Wyner-Ziv coding[10].
  Downlink: multivariate compression [9 Ch.9].
- Structured coding
  compute-and-forward[11]

  • Uplink:
- Point-to-Point Fronthaul Compression

- Distributed Fronthaul Compression
  First proposed in [17].
  sequential decompression [9, Ch. 10] [18] [19].
  Wyner-Ziv compression.
  channel decoding algorithms: message passing or trellis search[10].
  optimization problem of this scheme: block-coordinate optimization approach and leverages a key result in [20].

- Compute-and-forword [11]
- Multihop Fronthaul Topology [22]

  • Downlink
- Point-to-Point Fronthaul Compression
- Multivariate Fronthaul Compression
- Compute-and-forward [24]
- "dirty paper" nonlinear precoding [25]
- Performance Evaluation
  cell-edge throughput versus the average per-UE spectral efficiency [8, Fig.5].

Saturday, March 14, 2015

Reading List 2015.3.14


“Gradient Descent for Unconstrained Smooth Minimization” - Quanming Yao [1]

1. Rudiments
- Taylor's Theorem
- Lipschitz Constant
- Convex and Strong Convex
- Hessian. Condition Number & Bound on Hessian
- Class of Differential Functions

2. Gradient Descent
- Gradient as General Descent Method:
  choose the step size to ensure convergence to a stationary point of f(x).
- Gradient Descent under Strong Convex
  convergent rate: 1-(1/k)
- Gradient Descent under Weak Convex
  convergent rate: A0/(cA0k + 1)

3. Message from Quadratic Programming
  first-order gradient descent: O(1/k) rate for weak convex; O(1-1/k)^k for strong convex.
- Heavy ball. Need to know function's parameter.
- Conjugate Gradient. Avoid the knowledge of function parameters.

4. Accelerated Gradient Descent
- Weak Convex
- Strong Convex

5. Newton Type Method

For the proof of Lemma 1.2 in [1], refer to Lecture 2.

Very difficult for me to understand everything in the paper now.

Thursday, March 12, 2015

Reading List 2015.3.12


“Alternative Distributed Algorithms for Network Utility Maximization”

Decomposition techniques: primal decomposition & dual decomposition methods

subproblems (separable) & master problem (update coupling variable)

Solve coupling variable: primal method
Solve coupling constraint: dual method

- Direct Primal and Direct Dual Decompositions
- Indirect Primal and Indirect Dual Decompositions (transform coupling constraint into coupling variable)
- Multilevel Primal and Dual Decompositions
  In problem (17): two sets of constraints (similar to my problem). dual-primal / dual-dual decomposition
- Gradient/Subgradient Methods
  choices of stepsize[33][34][36].
- Standard Dual-Based Algorithm for Basic NUM (Network Utility Maximization)

Application:
- Power-Constrained Rate Allocation
- QoS Rate Allocation
- Hybrid Rate-Based and Price-Based Rate Allocation
- Multipath-Routing Rate Allocation

Reading List 2015.3.11

“Distributed Methods for Constrained Nonconvex Multi-Agent Optimization - Part I: Theory”
Comparison of some methods: 
1) Feasible Sequential Quadratic Programming (FSQP) methods [2];  --- maintain feasibility but centralized.
2) Parallel Variable Distribution (PVD) methods [3]-[5]; --- parallel, but needs a fair amount of information exchange/knowledge, and convergence holds only for convex problems, or nonconvex problems with block-separable constraints.
3) SCA algorithms [6]-[11].
    --- [6][7][11]: centralized; [8]-[10]: distributed methods but convex and separable constraints.

Wednesday, March 11, 2015

Reading List 2015.3.11 - Parallel variable distribution

  • “Parallel variable distribution”
- Unconstrained parallel variable distribution;
- PVD with block separable constraints;
- PVD with general constraints: min f(x) such that g(x) <= 0;
  Handling inseparable constraints: exterior penalty [8], augmented Lagrangian methods [17], [3]. To avoid both of the above difficulties: the differentiable dual exact penalty function [10].

  • “Parallel variable distribution for constrained optimization”
In parallel algorithms, an iteration consists of two steps: parallelization & synchronization[2].

Some methods: Block-Jacobi[2], updated conjugate subspaces[10], coordinate descent[21], parallel gradient distribution[14], PVD.

- Nonconvex separable constraints
- Convex inseparable constraints


  • “On the Convergence of Constrained Parallel Variable Distribution Algorithms”
Also mentioned some methods: block Jacobi[2], coordinate descent[26], parallel gradient distribution algorithms[16].
Mainly proves convergence for optimization problems with general convex constraints.