CHEN Xiaoyu's blog: 2017

Thursday, December 28, 2017

Flask cannot get the real client IP behind an nginx reverse proxy

flask.request.remote_addr is always 127.0.0.1.

Solution: https://www.jianshu.com/p/98bc849ef01a

proxy_set_header Host $host:8080;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header REMOTE-HOST $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;

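I already had the nginx configuration above in place, so the fix was simply to read the peer address from flask.request.headers['X-Real-Ip'] instead.

So the code becomes:
try:
    # nginx sets X-Real-IP from $remote_addr (header lookup is case-insensitive)
    url_param['ip'] = flask.request.headers['X-Real-Ip']
except KeyError:
    # the request did not come through the proxy
    url_param['ip'] = flask.request.remote_addr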
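If X-Real-IP is missing, the X-Forwarded-For header set by the same nginx config can serve as a fallback. Below is a minimal sketch in plain Python, assuming exactly one nginx hop configured as above; client_ip is a made-up helper name, not a Flask API. (Werkzeug also ships a ProxyFix middleware, werkzeug.middleware.proxy_fix.ProxyFix in Werkzeug 0.15 and later, which rewrites remote_addr from these headers.)

```python
def client_ip(headers, remote_addr):
    """Return the client address as seen through a single nginx reverse proxy.

    headers: any mapping of HTTP header names to values (matched case-insensitively).
    remote_addr: the socket peer address, i.e. what flask.request.remote_addr reports.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    real_ip = lowered.get('x-real-ip', '').strip()
    if real_ip:
        # nginx sets X-Real-IP from $remote_addr, overwriting anything the client sent.
        return real_ip
    hops = [h.strip() for h in lowered.get('x-forwarded-for', '').split(',') if h.strip()]
    if hops:
        # $proxy_add_x_forwarded_for appends the address nginx saw, so with one proxy
        # the rightmost entry is trustworthy; entries to its left may be forged.
        return hops[-1]
    return remote_addr
```

In the view, url_param['ip'] = client_ip(flask.request.headers, flask.request.remote_addr) would cover both headers. Taking the rightmost X-Forwarded-For entry instead of the leftmost is deliberate, because the leftmost entries come from whatever the client sent.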

Tuesday, December 19, 2017

Bazel build error on macOS

TypeError: add() got an unexpected keyword argument 'replace'

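Cause: the installed setuptools is too old.

pip install --upgrade --user setuptools

However, pip list still showed the old setuptools version. After I happened to upgrade macOS, setuptools showed the correct version, so the problem was probably the older macOS release.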

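Updating Homebrew

brew update --verbose
If you run into permission problems: sudo chown -R $(whoami) /usr/local
If it hangs, GitHub is downloading slowly or cannot be reached.
When the update finishes, restore the default ownership of /usr/local as instructed: sudo chown root:wheel /usr/local

brew upgrade xxx

For other commands, see: https://segmentfault.com/a/1190000004353419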

Tuesday, December 12, 2017

Installing Python 3.6 on Ubuntu

Installing Python 3.6 on Ubuntu 14.04/16.04:
http://ubuntuhandbook.org/index.php/2017/07/install-python-3-6-1-in-ubuntu-16-04-lts/

sudo add-apt-repository ppa:jonathonf/python-3.6
sudo apt-get update
sudo apt-get install python3.6

sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.5 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 2
sudo update-alternatives --config python3

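Friday, December 8, 2017

Availability vs. reliability

Availability is measured as system uptime divided by total time.

Reliability is the ability of a system to run continuously without failure, and is measured as a time interval (for example, the mean time between failures). An example: if a system crashes for 1 ms every hour, its availability is above 99.9999%, yet it is still highly unreliable.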
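The arithmetic behind that example, as a small Python check (availability is just an illustrative helper):

```python
def availability(uptime_s, downtime_s):
    """Fraction of the total time during which the system was up."""
    return uptime_s / (uptime_s + downtime_s)

# The example above: one 1 ms outage in every hour.
downtime_per_hour = 0.001                 # seconds
uptime_per_hour = 3600 - downtime_per_hour
a = availability(uptime_per_hour, downtime_per_hour)
mtbf_hours = 1.0                          # one failure per hour, however short it is
print(f"availability = {a:.7%}, MTBF = {mtbf_hours} h")
# availability = 99.9999722%, MTBF = 1.0 h
```

Availability only looks at the total time lost, while the reliability figure (here the mean time between failures) stays at one hour no matter how short each crash is.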

Wednesday, November 29, 2017

CentOS firewall

Adding firewall rules on CentOS:
getenforce
firewall-cmd --state
firewall-cmd --add-service mongodb
firewall-cmd --add-port 27017/tcp
firewall-cmd --add-port 27017/tcp --permanent

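Friday, November 17, 2017

Problems with TensorFlow Serving

A model trained and exported with TensorFlow 1.2.0 or later does not work with a tensorflow_model_server built from Serving 0.6.0: if the code uses beam search decoding, loading the model in tensorflow_model_server fails with: Not found: Op type not registered 'GatherTree'.

On the other hand, the 1.3.0 and 1.4.0 tensorflow_model_server binaries installed with apt-get hang on my desktop after loading a few dozen models, which looks like a bug. The model_server I had built earlier from older code does not have this problem.

Originally I built from the master branch at commit 636e05b2d90feb7d868e29e23861e0a530e51682, and that build has neither of the two problems above. Be sure to pass these flags when building: bazel build -c opt --copt=-msse4.1 --copt=-msse4.2 --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-O3 tensorflow_serving/model_servers:tensorflow_model_server
Otherwise the resulting tensorflow_model_server computes very slowly and uses a lot of CPU.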

Thursday, November 9, 2017

Removing a user from a group on Linux

gpasswd -d username groupname

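Wednesday, November 8, 2017

Getting the directory of the running script in a Linux shell

script_path=$(cd "$(dirname "$0")" && pwd)
cd "${script_path}"
script_path then holds the absolute path of the directory that contains the running script.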

Thursday, November 2, 2017

PySpark SQL job hangs in YARN mode after using a UDF

https://stackoverflow.com/questions/35157322/spark-dataframe-in-python-execution-stuck-when-using-udfs

We use PySpark 2.1.0, where UDFs still have all sorts of problems and perform poorly. Is converting to an RDD and working on that the only way?

pyspark sql user defined function

See https://docs.databricks.com/spark/latest/spark-sql/udf-in-python.html

Example:
>>> a = [{'a': 'a', 'b': 1}, {'a': 'aa', 'b': 2}]
>>> df = spark.createDataFrame(a)
>>> df.collect()
[Row(a=u'a', b=1), Row(a=u'aa', b=2)]
>>> def func(s):
...     return len(s) > 1
...
>>> from pyspark.sql.functions import udf
>>> from pyspark.sql.types import BooleanType
>>> func_udf = udf(func, BooleanType())
>>> df2 = df.filter(func_udf(df['a']))
>>> df2.collect()
[Row(a=u'aa', b=2)]

Wednesday, November 1, 2017

Fixing very slow GitHub clones

See http://www.jianshu.com/p/5e74b1042b70

Edit ~/.gitconfig (vim ~/.gitconfig) and add:
[http]
proxy = socks5://127.0.0.1:8080
[https]
proxy = socks5://127.0.0.1:8080

Start a SOCKS5 port forward with ssh -D 127.0.0.1:8080 username@server_name.

Tuesday, October 31, 2017

Merging small files in Spark

Small output files from Spark can be merged with Hadoop's FileUtil.copyMerge: https://hadoop.apache.org/docs/r2.7.1/api/org/apache/hadoop/fs/FileUtil.html

A pitfall with DataFrame union in PySpark

>>> a = [{'a': 1, 'b': 2}]
>>> x = spark.createDataFrame(a)
>>> b = sc.parallelize([(3, 4)])
>>> y = spark.createDataFrame(b, ['b', 'a'])
>>> x.collect()
[Row(a=1, b=2)]
>>> y.collect()
[Row(b=3, a=4)]
>>> z = x.union(y)
>>> z.collect()
[Row(a=1, b=2), Row(a=3, b=4)]

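The expected result is [Row(a=1, b=2), Row(a=4, b=3)], but in the actual output the values of a and b in the second Row are swapped. My guess was that DataFrame union matches columns by position, not by column name.

The documentation confirms it: "Also as standard in SQL, this function resolves columns by position (not by name)." Spark 2.3 adds unionByName, which solves the problem. Until then, the workaround is to give x and y the same column order, for example x.union(y.select(x.columns)).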
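To see the position-vs-name behavior without a Spark cluster, here is a plain-Python simulation. Rows are tuples, and the two helpers are hypothetical stand-ins for DataFrame.union and unionByName, not Spark code:

```python
def union_by_position(cols_x, rows_x, cols_y, rows_y):
    """Like DataFrame.union: keep x's column names and take y's values by position.

    cols_y is accepted only to mirror the call shape; it is deliberately ignored.
    """
    return [dict(zip(cols_x, row)) for row in rows_x + rows_y]

def union_by_name(cols_x, rows_x, cols_y, rows_y):
    """Reorder y's values into x's column order first (what y.select(x.columns) does)."""
    order = [cols_y.index(c) for c in cols_x]
    aligned_y = [tuple(row[i] for i in order) for row in rows_y]
    return union_by_position(cols_x, rows_x, cols_x, aligned_y)

x_cols, x_rows = ['a', 'b'], [(1, 2)]
y_cols, y_rows = ['b', 'a'], [(3, 4)]
print(union_by_position(x_cols, x_rows, y_cols, y_rows))  # [{'a': 1, 'b': 2}, {'a': 3, 'b': 4}]
print(union_by_name(x_cols, x_rows, y_cols, y_rows))      # [{'a': 1, 'b': 2}, {'a': 4, 'b': 3}]
```

The first result reproduces the swapped Row from the PySpark session above; the second is the expected one.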

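Monday, October 30, 2017

A problem reading and writing MongoDB from Spark

I read a DataFrame from a MongoDB collection, transformed it, and wrote it back to the same collection in overwrite mode, which went wrong. Since nothing is computed until the write action runs, my guess is that reading and writing the same collection at the same time across the cluster caused the problem.

Confirming the problem:
Cache the df and run an action before writing it back, so that the result is held in memory; writing to MongoDB then works. (Caching is not a hard guarantee, though: evicted partitions would be recomputed from the collection.)

Solution:
Read from one collection and write to a different one.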

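Checking whether a DataFrame is empty in PySpark

if df.head() is not None:
    xxx

MongoDB startup problems

After changing dbPath in the configuration, mongod failed to start. The fix is to set the owner and group of the new dbPath directory to mongodb:mongodb, e.g. sudo chown -R mongodb:mongodb /path/to/dbPath.

The log may also show a socket permission error; in that case delete /tmp/mongodb-27017.sock and run sudo service mongod start again.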

Sunday, October 29, 2017

Slow npm install

Switch to a mirror in China:
npm install --registry=http://registry.npm.taobao.org

To make it permanent:
npm config set registry http://registry.npm.taobao.org

E: Unable to locate package on Ubuntu

On a freshly installed Ubuntu 16.04, sudo apt install tmux failed with E: Unable to locate package tmux. Fix:
sudo apt-get update

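Friday, October 27, 2017

TensorFlow Serving computes very slowly

I had installed TensorFlow Serving by building it from source, but once deployed to production it used a lot of CPU and was very slow. Based on https://github.com/tensorflow/serving/issues/456, I suspected the build options.

I then switched to the binary installed via apt-get, which solved the problem.

Problem using from_json in Spark 2.1.0

The following code runs correctly on Spark 2.2.0: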
import json
from pyspark.sql import functions as f
from pyspark.sql.types import ArrayType, DoubleType, StringType, StructField, StructType
from pyspark.sql.functions import from_json

def func(value, score):
    values = {}
    for i in range(len(value)):
        if value[i] in values:
            values[value[i]] = values[value[i]] + score[i]
        else:
            values[value[i]] = score[i]
    res = []
    for k, v in values.items():
        res.append({'value': k, 'score': v})
    return json.dumps(res, ensure_ascii=False)

x = [{'user' : '86209203000295', 'domain' : 'music', 'subdomain' : 'artist', 'value' : 'xxx', 'score' : 0.8, 'ts' : '1508737410941'}, {'user' : '86209203000295', 'domain' : 'music', 'subdomain' : 'artist', 'value' : 'yyy', 'score' : 0.9, 'ts' : '1508737410941'}, {'user' : '86209203000685', 'domain' : 'music', 'subdomain' : 'artist', 'value' : 'zzz', 'score' : 0.8, 'ts' : '1508717416320'}]
df = spark.createDataFrame(x)
df = df.groupBy(df['user'], df['domain'], df['subdomain']).agg(f.collect_list(df['value']).alias('value'), f.collect_list(df['score']).alias('score'))
df = df.select(df['user'], df['domain'], df['subdomain'], f.UserDefinedFunction(func, StringType())(df['value'], df['score']).alias('values'))
df.collect()
schema = ArrayType(StructType([StructField('value', StringType()), StructField('score', DoubleType())]))
df = df.select(df['user'], df['domain'], df['subdomain'], from_json(df['values'], schema).alias('values'))
df.collect()

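On Spark 2.1.0, however, it fails with: java.lang.ClassCastException: org.apache.spark.sql.types.ArrayType cannot be cast to org.apache.spark.sql.types.StructType

This one is a nasty trap: in 2.1.0, from_json only accepts a StructType schema, not an ArrayType (array schemas are supported from 2.2.0).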

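Friday, October 13, 2017

Adding Python packages in IntelliJ

In File -> Project Structure..., first check whether Modules on the left contains a Python module; if not, add one.

Then click Global Libraries on the left and add the directories of the Python source packages you need.

For example, for PySpark development I add spark-2.2.0-bin-hadoop2.7/python as a source, so the project can jump into the PySpark code and gets code completion.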

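Monday, October 9, 2017

Running Linux commands in the background

http://www.cnblogs.com/lwm-1988/archive/2011/08/20/2147299.html

1. On Linux, the usual way to run a process in the background is to append & to the command. This actually puts the command into the shell's job list: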
$ ./test.sh &
[1] 17208
$ jobs -l
[1]+ 17208 Running ./test.sh &

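2. A command already running in the foreground can also be moved to the background: first press Ctrl+Z to suspend it, then use bg to let the stopped job continue in the background: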
$ ./test.sh
[1]+ Stopped ./test.sh
$ bg %1
[1]+ ./test.sh &
$ jobs -l
[1]+ 22794 Running ./test.sh &

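3. However, a process put in the background this way is still a child of the current terminal's shell. When the shell exits it sends a hangup (SIGHUP) signal to its child jobs, and they exit when they receive it. To keep a process running after the shell exits, either use nohup so that it ignores SIGHUP, or use setsid to run it in a new session, detached from the terminal, with init (PID 1) as its parent: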
$ echo $$
21734
$ nohup ./test.sh &
[1] 29016
$ ps -ef | grep test
515 29710 21734 0 11:47 pts/12 00:00:00 /bin/sh ./test.sh
515 29713 21734 0 11:47 pts/12 00:00:00 grep test

$ setsid ./test.sh &
[1] 409
$ ps -ef | grep test
515 410 1 0 11:49 ? 00:00:00 /bin/sh ./test.sh
515 413 21734 0 11:49 pts/12 00:00:00 grep test

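4. The experiments above used nohup/setsid together with & to run a process in the background so that it survives the shell exiting. What about a process that is already running in the background? Use the disown command, which achieves a similar result to setsid. Plain disown removes the job from the shell's job table, so jobs no longer lists it; disown -h, as used below, keeps the job in the table but marks it so that the shell does not send it SIGHUP: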
$ ./test.sh &
[1] 2539
$ jobs -l
[1]+ 2539 Running ./test.sh &
$ disown -h %1
$ ps -ef | grep test
515 410 1 0 11:49 ? 00:00:00 /bin/sh ./test.sh
515 2542 21734 0 11:52 pts/12 00:00:00 grep test

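5. Another way is to run the process in a subshell, which works much like setsid: the subshell exits right away, so the process is reparented to init and never enters the current shell's job list. Just wrap the command in parentheses ():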
$ (./test.sh &)
$ ps -ef | grep test
515 410 1 0 11:49 ? 00:00:00 /bin/sh ./test.sh
515 12483 21734 0 11:59 pts/12 00:00:00 grep test
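The same idea from Python: subprocess.Popen with start_new_session=True makes the child call setsid() before running the command, so the child is not part of the terminal's session. A sketch, assuming a POSIX system; spawn_detached is a hypothetical helper name:

```python
import subprocess

def spawn_detached(cmd, log_path):
    """Start cmd in a new session, roughly like `setsid cmd > log 2>&1 < /dev/null &`.

    The child has no controlling terminal, so it does not get the terminal's SIGHUP
    when the shell exits; stdin comes from /dev/null and output goes to log_path.
    """
    log = open(log_path, 'ab')
    try:
        return subprocess.Popen(cmd, stdin=subprocess.DEVNULL, stdout=log,
                                stderr=subprocess.STDOUT, start_new_session=True)
    finally:
        log.close()  # the child keeps its own copy of the file descriptor
```

For example, spawn_detached(['./test.sh'], 'test.log') keeps test.sh running after the launching script exits; at that point it is reparented to init, just like the setsid example. Redirecting all three standard streams also keeps it off the terminal, which is what nohup does with nohup.out.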

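Using &, jobs, fg, bg and related commands on Linux

http://blog.sina.com.cn/s/blog_673ee2b50100iywr.html

1. & (the most commonly used)
Put at the end of a command, it runs that command in the background.
2. Ctrl+Z
Moves the command running in the foreground to the background and suspends it.
3. jobs
Lists the commands currently in the background.
4. fg
Brings a background command to the foreground and continues running it.
If there are several background commands, use fg %jobnumber to pick one; jobnumber is the job number shown by jobs (not the PID).
5. bg
Resumes a suspended background command so that it keeps running in the background.
If there are several background commands, use bg %jobnumber to pick one; jobnumber is the job number shown by jobs (not the PID).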

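Saturday, September 30, 2017

bazel build ... ignores the experimental folder

bazel build ... did not build anything under experimental. My guess is that Bazel skips folders named experimental when expanding the pattern recursively.

cd experimental
bazel build ...
Building from inside the experimental directory works.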

Monday, September 25, 2017

The U+200E Unicode character in Vim

See https://unix.stackexchange.com/questions/59447/replace-unicode-chars-in-vim

In insert mode, press Ctrl-V and then type u200e to insert the <200e> character.

:help i_CTRL-V_digit shows the related help.

Thursday, September 21, 2017

npm fails after installing Node on macOS

module.js:341
throw err;
^

Error: Cannot find module 'npmlog'
at Function.Module._resolveFilename (module.js:339:15)
at Function.Module._load (module.js:290:25)
at Module.require (module.js:367:17)
at require (internal/module.js:16:19)
at /usr/local/lib/node_modules/npm/bin/npm-cli.js:20:13
at Object.<anonymous> (/usr/local/lib/node_modules/npm/bin/npm-cli.js:76:3)
at Module._compile (module.js:413:34)
at Object.Module._extensions..js (module.js:422:10)
at Module.load (module.js:357:32)
at Function.Module._load (module.js:314:12)

Cause: an earlier uninstall of Node had left files behind. Fix:
brew uninstall --force node
rm -rf /usr/local/lib/node_modules
rm /usr/local/bin/npm
brew install node

https://stackoverflow.com/a/39504056/5685754

pip install pyfst fails

Command "/usr/bin/python -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-etZdYk/pyfst/setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record /tmp/pip-9_TCLO-record/install-record.txt --single-version-externally-managed --compile" failed with error code 1 in /tmp/pip-build-etZdYk/pyfst/

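Solution:
Some people suggest export CFLAGS="-std=c++11"; I tried it and it did not help.

It turned out to be a version problem: I had installed OpenFst 1.6.3, but pyfst only supports up to 1.3.3 and is incompatible with later versions.

Appendix: installing OpenFst:
./configure
make
sudo make install

Runtime error after installing OpenFst

fstinfo: error while loading shared libraries: libfstscript.so.8: cannot open shared object file: No such file or directory

Solution:
Add the following to ~/.bashrc:
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/usr/local/lib
(Running sudo ldconfig after make install may also work, if /usr/local/lib is already listed in the loader configuration under /etc/ld.so.conf.)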
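One caveat with the ~/.bashrc line above: when LD_LIBRARY_PATH was unset it expands to :/usr/local/lib, and as far as I know glibc's loader treats that empty first entry as the current directory. When launching such a program from Python, the path list can be built without empty entries. A sketch; append_search_path is a hypothetical helper:

```python
def append_search_path(current, new_dir, sep=':'):
    """Append new_dir to a colon-separated search path, dropping empty entries.

    current may be None or '' (variable unset or empty); new_dir is not added twice.
    """
    parts = [p for p in (current or '').split(sep) if p]
    if new_dir not in parts:
        parts.append(new_dir)
    return sep.join(parts)
```

For example, pass env=dict(os.environ, LD_LIBRARY_PATH=append_search_path(os.environ.get('LD_LIBRARY_PATH'), '/usr/local/lib')) to subprocess.run when starting fstinfo. The shell counterpart is export LD_LIBRARY_PATH=/usr/local/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}, which only adds the colon when the variable is non-empty.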

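Wednesday, September 20, 2017

Recursively changing the fileencoding and fileformat of all files in a folder

A script that changes the encoding and the line-ending format of every file: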
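#!/usr/bin/env bash

# Re-save every regular file under $1 with vi as UTF-8 with Unix line endings.
function walk()
{
  local path
  # A quoted glob instead of `ls` keeps names with spaces intact.
  for path in "$1"/*
  do
    if [ -d "$path" ]
    then
      echo "DIR $path"
      walk "$path"
    elif [ -f "$path" ]
    then
      echo "FILE $path"
      vi +":set fileencoding=utf-8" +":set fileformat=unix" +":wq" "$path"
    fi
  done
}

if [ $# -ne 1 ]
then
  echo "USAGE: $0 TOP_DIR"
else
  walk "$1"
fi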
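The script above opens every file in vi, which is slow on large trees. A pure-Python alternative is sketched below. Unlike vim, it does not detect encodings by itself: src_encodings (UTF-8, then GBK) is my assumption, so adjust it to whatever your files actually use. Files that none of the encodings can decode are left untouched and returned. Run it only on trees of text files, since a binary file that happens to decode would be rewritten.

```python
import os

def convert_tree(top, src_encodings=('utf-8', 'gbk')):
    """Rewrite every regular file under top as UTF-8 with Unix (LF) line endings.

    Returns the paths that could not be decoded with any of src_encodings.
    """
    skipped = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # do not rewrite link targets
            with open(path, 'rb') as fh:
                raw = fh.read()
            for enc in src_encodings:
                try:
                    text = raw.decode(enc)
                except UnicodeDecodeError:
                    continue
                break
            else:
                skipped.append(path)
                continue
            # CRLF (DOS) first, then any lone CR (old Mac), both become LF.
            text = text.replace('\r\n', '\n').replace('\r', '\n')
            with open(path, 'w', encoding='utf-8', newline='\n') as fh:
                fh.write(text)
    return skipped
```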

Some Vim settings

Add these to ~/.vimrc (vim ~/.vimrc).
Delete trailing whitespace automatically on save:
autocmd BufWritePre * :%s/\s\+$//e

Highlight trailing whitespace:
highlight WhitespaceEOL ctermbg=red guibg=red
match WhitespaceEOL /\s\+$/

Show tab characters (and trailing spaces):
set list
set listchars=tab:>-,trail:-

Indentation:
set autoindent
set shiftwidth=2
set cindent

Expand tabs to spaces:
set expandtab
set tabstop=2

Syntax highlighting:
syntax on

A few YARN commands

Kill an application:
yarn application -list
yarn application -kill <Application ID>

View the logs:
yarn logs -applicationId application_1451022530184_0001

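Tuesday, September 19, 2017

Saving CPU and memory usage to a file at a fixed interval with a Linux shell script

#!/usr/bin/env bash

while true
do
  top -n 1 -b | grep -E "PID|java" > cpu_mem$(date -d "today" +"%Y%m%d_%H%M%S").txt
  sleep 300
done

The script saves the CPU and memory usage of all Java processes to a new timestamped file every 300 seconds.

Monday, September 18, 2017

Archiving files on Linux

tar
tar chf bazel-genfiles/data/uploader/uploader.tar -C bazel-out/host/bin/data/uploader uploader uploader.runfiles
Here -C changes to the given directory before adding the files (c creates an archive, h follows symlinks, f names the archive file).

zip has no equivalent of tar's -C, but the following trick achieves the same thing:
(cd bazel-out/host/bin/data/uploader && zip -qr - uploader uploader.runfiles) > bazel-out/local-fastbuild/genfiles/data/uploader/uploader.zip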

Thursday, September 14, 2017

Changing crontab's default editor

Run select-editor and choose 3 to use vim (the numbering depends on which editors are installed).

Wednesday, September 13, 2017

Some problems using Spring Boot with Hibernate

org.hibernate.MappingException: composite-id class must implement Serializable
If the entity declares more than one @Id, the class must implement Serializable.

org.xml.sax.SAXException: null:11: Element <defaultCache> does not allow attribute "maxEntriesLocalHeap".
The ehcache version is too old: add ehcache as a separate Maven dependency instead of relying on the version Hibernate pulls in by default.

org.hibernate.HibernateException: Could not obtain transaction-synchronized Session for current thread
See https://stackoverflow.com/questions/26203446/spring-hibernate-could-not-obtain-transaction-synchronized-session-for-current
@EnableTransactionManagement must be specified.

Accessing files passed with spark-submit --files

See https://community.hortonworks.com/questions/9265/how-can-i-add-configuration-files-to-a-spark-job-r.html

If you add your external files using "spark-submit --files" your files will be uploaded to this HDFS folder: hdfs://your-cluster/user/your-user/.sparkStaging/application_1449220589084_0508

application_1449220589084_0508 is an example of yarn application ID!

1. find the spark staging directory by below code: (but you need to have the hdfs uri and your username)

System.getenv("SPARK_YARN_STAGING_DIR"); --> .sparkStaging/application_1449220589084_0508

2. find the complete comma separated file paths by using:

System.getenv("SPARK_YARN_CACHE_FILES"); --> hdfs://yourcluster/user/hdfs/.sparkStaging/application_1449220589084_0508/spark-assembly-1.4.1.2.3.2.0-2950-hadoop2.7.1.2.3.2.0-2950.jar#__spark__.jar,hdfs://yourcluster/user/hdfs/.sparkStaging/application_1449220589084_0508/your-spark-job.jar#__app__.jar,hdfs://yourcluster/user/hdfs/.sparkStaging/application_1449220589084_0508/test_file.txt#test_file.txt

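My summary (using --files README.md as the example):
Method 1: As described above, --files uploads the file to the .sparkStaging/applicationId directory on HDFS. Use the method above to get that HDFS directory, then read the file from HDFS.
spark.read().textFile(System.getenv("SPARK_YARN_STAGING_DIR") + "/README.md") works. Without an hdfs://, file:// or other scheme prefix, textFile treats the path as relative to hdfs://yourcluster/user/your_username. I am not sure whether that is specific to our cluster's setup; it is probably because relative paths resolve against the user's home directory on the default file system (fs.defaultFS).

Method 2:
SparkFiles.get(filePath) returned /hadoop/yarn/local/usercache/research/appcache/application_1504461219213_9796/spark-c39002ee-01a4-435f-8682-2ba5950de230/userFiles-e82a7f84-51b1-441a-a5e3-78bf3f4a8828/README.md for me, and for some reason the file could not be found either locally or on HDFS. On closer inspection, README.md does exist on the local disk under /hadoop/yarn/local/usercache/research/..., but the local path of README.md differs between the workers and the driver.
Cause:
https://stackoverflow.com/questions/35865320/apache-spark-filenotfoundexception
https://stackoverflow.com/questions/41677897/how-to-get-path-to-the-uploaded-file
SparkFiles.get() returns a local directory on the driver node, so sc.textFile cannot read that file on the worker nodes. It cannot be used this way.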
"""I think that the main issue is that you are trying to read the file via the textFile method. What is inside the brackets of the textFile method is executed in the driver program. In the worker node only the code tobe run against an RDD is performed. When you type textFile what happens is that in your driver program it is created a RDD object with a trivial associated DAG.But nothing happens in the worker node."""

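For --files vs. addFile, see this question: https://stackoverflow.com/questions/38879478/sparkcontext-addfile-vs-spark-submit-files

In cluster mode, a local file added with addFile cannot be found, because it exists only on the local machine; it has to be uploaded with --files.

Conclusion: do not use textFile to read files shipped with --files or addFile.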

NullPointerException from SparkFiles.get

Failing code:
val serFile = SparkFiles.get("myobject.ser")

Cause: SparkFiles.get can only be used inside Spark operators:
sc.parallelize(1 to 100).map { i => SparkFiles.get("my.file") }.collect()

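Tuesday, September 12, 2017

Spring Boot built with Bazel cannot inject beans at runtime

I use beans from azure-storage-spring-boot-starter, so I added @ComponentScan({"com.xxx", "com.microsoft.azure"}), but the beans still could not be injected.

In the end, adding @PropertySource("classpath:application.properties") to specify the application.properties file explicitly (it contains the Azure Storage connection settings) fixed the problem. I don't know the root cause; I found the fix by trial and error.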

Tuesday, September 5, 2017

sampled softmax

Introduced in the paper On Using Very Large Target Vocabulary for Neural Machine Translation, it addresses the long training time caused by a very large target vocabulary.

The following posts explain it clearly:
On word embeddings - Part 2: Approximating the Softmax: http://ruder.io/word-embeddings-softmax/index.html#whichapproachtochoose
Chinese translation: http://geek.csdn.net/news/detail/135736

Reading notes on the importance-sampling paper behind sampled softmax: http://blog.csdn.net/wangpeng138375/article/details/75151064
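A toy version of one common form of the estimator, in plain Python: the cross-entropy is computed only over the true class plus classes sampled from a proposal distribution Q, with every candidate logit corrected by subtracting log Q(w). This is my own sketch, not the paper's exact formulation; the subtract_log_q argument of TensorFlow's tf.nn.sampled_softmax_loss controls a correction of this kind. The function names are hypothetical.

```python
import math

def sampled_softmax_loss(logits, target, sampled, q):
    """Approximate -log softmax(logits)[target] using only a sampled subset of classes.

    logits:  unnormalised scores o_w, indexed by class id
    target:  id of the true class
    sampled: ids of negative classes drawn from the proposal distribution q
    q:       q[w] = proposal probability Q(w)
    """
    # Candidate set: the true class plus the samples (an accidental hit on the
    # true class is dropped so that it is not counted twice).
    classes = [target] + [w for w in sampled if w != target]
    # Importance-sampling correction: subtract log Q(w) from each candidate logit.
    corrected = [logits[w] - math.log(q[w]) for w in classes]
    m = max(corrected)  # log-sum-exp with the max factored out, for stability
    log_z = m + math.log(sum(math.exp(c - m) for c in corrected))
    return log_z - corrected[0]

def full_softmax_loss(logits, target):
    """Exact cross-entropy over the whole vocabulary, for comparison."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(o - m) for o in logits))
    return log_z - logits[target]
```

With a uniform Q and every other class sampled, the correction cancels and the result equals full_softmax_loss. With only a few samples, the sum runs over a handful of classes instead of the whole vocabulary, which is where the speed-up comes from.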

Wednesday, August 9, 2017

Configuring a pip mirror

See http://www.jianshu.com/p/785bb1f4700d

One-off use:
pip install pythonModuleName -i https://pypi.douban.com/simple

To make it permanent, edit ~/.pip/pip.conf:
[global]
index-url = https://pypi.douban.com/simple

With pip 10 or later, the setting can also be written from the command line:
pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple

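Monday, August 7, 2017

Removing packages on Ubuntu

See: https://www.linuxdashen.com/debianubuntu%E6%B8%85%E7%90%86%E7%A1%AC%E7%9B%98%E7%A9%BA%E9%97%B4%E7%9A%848%E4%B8%AA%E6%8A%80%E5%B7%A7

Remove a package:
sudo apt-get remove <package-name>
sudo apt-get purge <package-name>
remove deletes the package but keeps its configuration files; purge deletes the configuration files as well.

List the packages that left residual configuration files behind:
dpkg -l | grep "^rc"
Purge those packages: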
dpkg --list | grep "^rc" | cut -d " " -f 3 | xargs sudo dpkg --purge
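The cut -d " " -f 3 step relies on dpkg printing exactly two spaces after rc. Below is a sketch of a more tolerant parser in Python (residual_config_packages is a hypothetical helper); feed it the output of dpkg -l, e.g. subprocess.run(['dpkg', '-l'], capture_output=True, text=True).stdout:

```python
def residual_config_packages(dpkg_list_output):
    """Return the names of packages whose dpkg status is 'rc'.

    'rc' means the package was removed but its configuration files remain.
    Splitting on any run of whitespace avoids depending on the exact column spacing.
    """
    names = []
    for line in dpkg_list_output.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == 'rc':
            names.append(fields[1])
    return names
```

The resulting names can then be passed to sudo dpkg --purge, just as the xargs pipeline does.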

Remove orphaned packages (automatically installed dependencies that are no longer needed):
sudo apt-get autoremove

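Thursday, August 3, 2017

LD_LIBRARY_PATH

LD_LIBRARY_PATH adds directories to the search path for shared (.so) libraries.

Today I needed to run luarocks install lzmq on a server where I have no root access. I first installed lzmq on another machine with the same gcc version as the server, copied the zmq include and lib files into my own zmq directory, and added that directory to LD_LIBRARY_PATH. The installation then succeeded:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/aifs/users/xyc43/tools/zmq
luarocks install lzmq ZMQ_DIR=/aifs/users/xyc43/tools/zmq

Tuesday, August 1, 2017

A problem with import tensorflow in tensorflow-gpu

ImportError: libcusolver.so.8.0: cannot open shared object file: No such file or directory

The LD_LIBRARY_PATH environment variable was not set. Adding export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64 to ~/.bashrc fixed it.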

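Friday, July 28, 2017

source ~/.bashrc has no effect in a shell script

See https://askubuntu.com/questions/64387/can-not-successfully-source-bashrc-from-a-shell-script

Running source ~/.bashrc inside a script has no effect (Ubuntu's default ~/.bashrc returns early when the shell is not interactive). Using exec bash instead works: it replaces the current process with a new bash, which loads ~/.bashrc once.

Granting sudo privileges to a user on Linux

Add write permission to the sudoers file:
sudo chmod u+w /etc/sudoers

Run sudo vim /etc/sudoers, find the line root ALL=(ALL) ALL, and add user_name ALL=(ALL) ALL below it. (sudo visudo edits the same file with a syntax check and does not need the chmod steps.)

Remove the write permission again:
sudo chmod u-w /etc/sudoers

Passing environment variables to a qsub job

My code needs the PYTHONPATH environment variable, so I submit with:

qsub -v PYTHONPATH=... script.sh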

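Thursday, July 20, 2017

Installing TensorFlow Serving on CentOS

https://gist.github.com/jarutis/6c2934705298720ff92a1c10f6a009d4

To avoid the trouble of installing all the libraries by hand, I ran parts of the script above directly: apart from Bazel and TensorFlow Serving, which I installed myself, I installed the other dependencies by running the script's steps in order.

Installing Bazel on CentOS

sudo yum install java-1.8.0-openjdk-devel

I tried building from the source in the git repository, but it did not work; only installing from the packages on the release page worked.

Download dist.zip or the .sh installer for your platform and install it.

Thursday, July 13, 2017

extundelete, a file recovery tool for Linux

I accidentally deleted a folder, and in a panic installed extundelete and recovered the files: a close call. Delete with care, and keep backups.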

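Thursday, June 8, 2017

Passwordless SSH login to cluster machines through a jump server

1. Generate an RSA key pair on your local machine

ssh-keygen

Press Enter at every prompt; this creates id_rsa and id_rsa.pub under ~/.ssh/.

2. Copy the contents of id_rsa.pub into ~/.ssh/authorized_keys on both the jump server and the machine you want to log in to.

3. Add the following to ~/.ssh/config on your local machine. ssh_config only treats whole lines starting with # as comments, so the comments go on their own lines:

# Alias of the jump server
Host jumpserver
# Jump server address
HostName jumpserver_ip
# User name
User your_username
# SSH port of the jump server
Port xxx

# Alias of the server you want to log in to
Host hostname
# Server address
HostName x.x.x.x
# User name on the server
User your_username
# Reach the server through the jump server
ProxyCommand ssh -i ~/.ssh/id_rsa -q -W %h:%p jumpserver

Then running ssh hostname on the local machine logs you in.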

luarocks install cunn fails

Error: Failed installing dependency: https://raw.githubusercontent.com/torch/rocks/master/cutorch-scm-1.rockspec - Build error: Failed building.
Same problem as this issue: https://github.com/torch/cunn/issues/113

The cause was that CUDA was not installed; see https://github.com/facebook/fbcunn/blob/master/INSTALL.md#install-cuda

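Friday, June 2, 2017

Showing the full command in docker ps

See https://stackoverflow.com/questions/27380641/see-full-command-of-running-stopped-container-in-docker

docker ps -a --no-trunc

Joining lines in Vim

The J command joins the current line with the next one, separating them with a space. A count joins more lines: for example, 3J joins the current line and the next two lines into one line (the count includes the current line).

The joinspaces option controls the separator when a line ends with a punctuation mark: with :set nojoinspaces, J inserts one space; with :set joinspaces, J inserts two spaces after a '.', '?' or '!'. To join lines without inserting any spaces, use gJ.

See http://yyq123.blogspot.hk/2010/07/vim-line-feed.html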

Wednesday, May 31, 2017

Recursively setting separate permissions for files and directories with chmod

find your_path -type f -exec chmod 644 {} \;
find your_path -type d -exec chmod 755 {} \;
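For reference, the same recursive split of permissions in Python, as a sketch (chmod_tree is a made-up helper name). It skips symbolic links, because os.chmod follows them while find -type f and -type d do not match links at all:

```python
import os

def chmod_tree(top, file_mode=0o644, dir_mode=0o755):
    """Give regular files file_mode and directories dir_mode, like the two find commands."""
    os.chmod(top, dir_mode)  # find also matches the starting directory itself
    # Top-down walk: each subdirectory is chmod-ed before os.walk descends into it.
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                os.chmod(path, dir_mode)
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                os.chmod(path, file_mode)
```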

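Thursday, May 25, 2017

Ctrl+Alt+B shortcut conflict when using IntelliJ on Ubuntu

The conflict comes from the Sogou input method. Open Settings -> Advanced -> Open Fcitx Settings, enable Show Advance Option under Global Config, find Switching Virtual Keyboard, click ctrl+alt+b and press Esc to clear the shortcut. Done.

brew link error on macOS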

Error: Cowardly refusing to 'sudo brew link'
You can use brew with sudo, but only if the brew executable is owned by root.
However, this is both not recommended and completely unsupported so do so at
your own risk.

ll /usr/local/bin | grep brew
-rwxr-xr-x 1 chenxiaoyu admin 656B Mar 11 2016 brew

Change the owner of brew to root:
sudo chown root:admin /usr/local/bin/brew
Then sudo brew link xxx works.

The Lua standard library

See http://www.cnblogs.com/apexaddon/articles/1486622.html

It briefly introduces string.gmatch, string.match and string.gsub.

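Meaning of the shell variables $#, $*, $@, $? and $$ on Linux

See http://c.biancheng.net/cpp/view/2739.html

$0 The file name of the current script.
$n An argument passed to the script or function; n is a number giving its position. For example, the first argument is $1 and the second is $2.
$# The number of arguments passed to the script or function.
$* All arguments passed to the script or function.
$@ All arguments passed to the script or function; it behaves slightly differently from $* inside double quotes (" ").
$? The exit status of the last command, or the return value of a function.
$$ The process ID of the current shell. For a shell script, this is the ID of the process running the script.

The difference between $* and $@:
Both $* and $@ stand for all arguments passed to a function or script. Without double quotes, both expand to all the arguments, which then undergo word splitting, so an argument that contains spaces becomes several words.
Inside double quotes, "$*" joins all the arguments into a single word, "$1 $2 … $n" (separated by the first character of IFS, a space by default), while "$@" keeps each argument as a separate word: "$1" "$2" … "$n".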
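The difference is easiest to see by counting words. This sketch runs a small bash snippet from Python (assuming bash is on the PATH); the shell function count is made up for the demo and only prints how many arguments it received:

```python
import subprocess

# count prints its own $#, i.e. how many words each expansion produced.
SCRIPT = r'''
count() { echo $#; }
count "$*"
count "$@"
count $*
'''

def expansion_counts(*args):
    """Run SCRIPT with args as $1, $2, ... and return the three counts."""
    out = subprocess.run(['bash', '-c', SCRIPT, 'demo', *args],
                         capture_output=True, text=True, check=True).stdout
    return [int(n) for n in out.split()]

print(expansion_counts('a b', 'c'))  # [1, 2, 3]
```

With the two arguments 'a b' and 'c', "$*" yields one word, "$@" yields the original two, and the unquoted $* is split into three.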

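Wednesday, May 24, 2017

Managing startup applications on Ubuntu

gnome-session-properties

Configuring git so that it does not ask for the password every time

git config --global credential.helper store
(This stores the credentials in plain text in ~/.git-credentials.)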

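Wednesday, May 17, 2017

Difference between pairs and ipairs in Lua

pairs iterates over all key-value pairs in a table, while ipairs only iterates over the integer keys 1, 2, 3, ... and stops at the first missing index (the first nil).

If performance matters, prefer ipairs where it applies.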
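To illustrate the stopping rule, here is a Python emulation (not Lua): a dict stands in for a Lua table and None stands in for nil. Both helper names are made up:

```python
def lua_ipairs(table):
    """Mimic ipairs: yield (1, t[1]), (2, t[2]), ... up to the first missing/nil index."""
    i = 1
    while table.get(i) is not None:
        yield i, table[i]
        i += 1

def lua_pairs(table):
    """Mimic pairs: every key whose value is not nil (Lua guarantees no particular order)."""
    for key, value in table.items():
        if value is not None:
            yield key, value

t = {1: 'a', 2: 'b', 4: 'd', 'x': 'y'}
print(list(lua_ipairs(t)))      # [(1, 'a'), (2, 'b')]
print(len(list(lua_pairs(t))))  # 4
```

Key 4 is never reached by ipairs because index 3 is missing, and the string key 'x' is only visible to pairs.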

References:
http://www.cppblog.com/wc250en007/archive/2011/12/16/162203.html
https://moonbingbing.gitbooks.io/openresty-best-practices/lua/for.html

Tuesday, May 16, 2017

Fixing an error when building Lua from source

See http://www.th7.cn/system/lin/201702/200995.shtml

lua.c:82:31: fatal error: readline/readline.h: No such file or directory

This means the readline development package is missing. Install it:
CentOS: yum install readline-devel
Ubuntu, Debian: apt-get install libreadline-dev

Finding where a .deb package was installed on Ubuntu

Find the package name with dpkg -l | grep xxx, then run dpkg -L xxx.

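Common Ubuntu desktop shortcuts

System:
Lock screen: ctrl+alt+l (it conflicts with an IntelliJ shortcut, so I changed it to super+l under System Settings -> Keyboard -> Shortcuts -> System -> Lock Screen)
Minimize: ctrl+alt+0 (numeric keypad), or alt+space, n
Maximize: ctrl+super+up, or alt+space, x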
Restore: ctrl+super+down, or alt+space, x
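Show desktop: ctrl+super+d
Close application: alt+f4 / ctrl+q
Open a launcher application: super+number
Copy: ctrl+c / ctrl+insert
Paste: ctrl+v / shift+insert
Switch between windows of the same application: alt + ~

Terminal:
Open a terminal: ctrl+alt+t
New tab: ctrl+shift+t
Close tab: ctrl+shift+w
Copy in the terminal: ctrl+shift+c
Paste in the terminal: ctrl+shift+v
Toggle full screen: f11
Switch to tab N: alt+number
Previous tab: ctrl+pageup
Next tab: ctrl+pagedown
Move the cursor to the start of the line: ctrl+a
Move the cursor to the end of the line: ctrl+e
Move the cursor one word left: ctrl+left
Move the cursor one word right: ctrl+right
Delete everything before the cursor: ctrl+u
Delete everything after the cursor: ctrl+k
Delete the word before the cursor: ctrl+w
Undo the last edit: ctrl+7 (sends ctrl+_, readline's undo)
Clear the screen: ctrl+l, or clear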

Monday, May 15, 2017

Remapping keys on Ubuntu

To swap Caps Lock and the left Control key, set the contents of ~/.Xmodmap to:
remove control = Control_L
remove control = Caps_Lock
remove lock = Caps_Lock
remove lock = Control_L
keycode 66 = Control_L NoSymbol Control_L
keycode 37 = Caps_Lock NoSymbol Caps_Lock
add control = Control_L
add lock = Caps_Lock

It takes effect after a restart.
Drawbacks:
The mapping is lost after switching to a TTY, and after locking the screen and logging in again.
When that happens, restore the previous settings with:
xmodmap ~/.Xmodmap
See http://forum.ubuntu.org.cn/viewtopic.php?t=463634

To turn Caps Lock into a Control key, with Shift+Caps Lock acting as Caps Lock, set the contents of ~/.Xmodmap to:
clear lock
clear control
add control = Caps_Lock Control_L Control_R
keycode 66 = Control_L Caps_Lock NoSymbol NoSymbol
See https://traceflight.github.io/tech/modify-caps-lock-to-ctrl.html

Cannot install software on Ubuntu 16.04 desktop

On Ubuntu 16.04 desktop, installing a package shows "This software comes from a 3rd party and may contain non-free components" and the installation fails, or the built-in Ubuntu Software installer cannot install it.

sudo apt install gdebi
Right-click the package and choose Open With GDebi Package Installer.

See https://askubuntu.com/questions/761210/16-04-cannot-install-anything-from-ubuntu-software-center

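Friday, April 21, 2017

set_allocated_ and mutable_ in protobuf

Both can be used to set nested messages. mutable_ returns a pointer to the corresponding sub-message, creating it if needed. The pointer passed to set_allocated_ must point to heap memory allocated with new, not to a stack object, because the parent message takes ownership of it and will delete it.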

Wednesday, March 29, 2017

Configuring code style in CLion

Settings:
Editor -> Code Style:
Under Default Options, set the right margin to 80.

Editor -> Code Style -> C/C++:
Click Set from... on the right and import the Google code style.
Then fine-tune it:
On the Wrapping and Braces tab, under 'switch' statement, uncheck Keep simple cases in one line.
On the Spaces tab, under Others, check After '*' in declarations and After '&' in declarations, and uncheck Before '*' in declarations and Before '&' in declarations.

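Friday, March 24, 2017

Converting between Java and Scala collections

http://docs.scala-lang.org/zh-cn/overviews/collections/conversions-between-java-and-scala-collections

import scala.collection.JavaConversions._
(JavaConversions is deprecated since Scala 2.12; the recommended alternative is import scala.collection.JavaConverters._ with explicit .asScala / .asJava calls.)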

Sunday, March 19, 2017

5. Longest Palindromic Substring

Manacher's algorithm, O(n) time. Here is my code:
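public class Solution {
    public String longestPalindrome(String s) {
        // Transform "abc" into "$#a#b#c#@": every palindrome in t has odd length, and
        // the distinct sentinels '$' and '@' stop the expansion at both ends.
        // (Assumes s itself contains no '$', '#' or '@'.)
        int size = s.length();
        char[] t = new char[size * 2 + 3];
        t[0] = '$';
        t[1] = '#';
        for (int i = 0; i < size; ++i) {
            t[2 * i + 2] = s.charAt(i);
            t[2 * i + 3] = '#';
        }
        t[t.length - 1] = '@';
        // p[i] = palindrome radius around t[i], counting t[i] itself.
        int[] p = new int[t.length];
        int index = 0;  // center of the palindrome that reaches furthest right
        int max = 0;    // right end (exclusive) of that palindrome
        for (int i = 1; i < t.length - 1; ++i) {
            // Reuse the radius at the mirror position, capped by the known right end.
            p[i] = max > i ? Math.min(max - i, p[2 * index - i]) : 1;
            // Primitive char comparison (== on boxed Character objects compares references).
            while (t[i - p[i]] == t[i + p[i]]) {
                p[i]++;
            }
            if (i + p[i] > max) {
                max = i + p[i];
                index = i;
            }
        }
        int res = 0;
        int resIndex = 0;
        for (int i = 1; i < t.length - 1; ++i) {
            if (res < p[i]) {
                res = p[i];
                resIndex = i;
            }
        }
        StringBuilder sb = new StringBuilder();
        for (int i = resIndex - res + 1; i < resIndex + res; ++i) {
            if (t[i] != '#') {
                sb.append(t[i]);
            }
        }
        return sb.toString();
    }
}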

Thursday, March 2, 2017

Passing Java -D options to spark-submit

See http://stackoverflow.com/questions/28166667/how-to-pass-d-parameter-or-environment-variable-to-spark-job

I use com.typesafe.config and need to select a different config file for each environment through a Java option, so I add the following options to spark-submit:
--files your/config/file
--conf "spark.driver.extraJavaOptions=-Dconfig.resource=your_config_file.conf"
--conf "spark.executor.extraJavaOptions=-Dconfig.resource=your_config_file.conf"

This works in yarn-cluster mode.
I tried replacing --conf "spark.driver.extraJavaOptions" with --driver-java-options, but yarn-client mode still fails. I'll look into why when I have time.

Garbled Chinese text from repository.save in Spring Data JPA

The MySQL character set settings were fine, and Chinese text inserted with my own INSERT statements displayed correctly, so the problem was on the Spring side.

The database URL needs an encoding parameter, for example:
jdbc:mysql://localhost:3306/springexample?characterEncoding=utf-8

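Tuesday, February 28, 2017

scala convert int to long

See http://stackoverflow.com/questions/19647525/how-to-convert-any-a-number-to-a-long

val number:AnyVal = 10

val l:Long = number.asInstanceOf[Number].longValue

Calling asInstanceOf[Long] directly throws a ClassCastException, because the boxed value is a java.lang.Integer.

Monday, February 27, 2017

spring boot data jpa: "No operations allowed after connection closed" error

See http://www.jianshu.com/p/1626d41572f2

The fix is to let the connection pool validate idle connections (test-while-idle with a validation-query), so that connections the MySQL server has already closed are evicted instead of being reused:

# Connection pool settings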
spring.datasource.primary.max-idle=10
spring.datasource.primary.max-wait=10000
spring.datasource.primary.min-idle=5
spring.datasource.primary.initial-size=5
spring.datasource.primary.validation-query=SELECT 1
spring.datasource.primary.test-on-borrow=false
spring.datasource.primary.test-while-idle=true
spring.datasource.primary.time-between-eviction-runs-millis=18800

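Sunday, February 26, 2017

Spring Boot packaging error: Unable to find a single main class from the following candidates

Spring found more than one entry class (for example, several classes annotated with @SpringBootApplication). Fix:
Add the following inside <properties></properties> in pom.xml, using the fully qualified name of your main class:
<start-class>your.package.MainClass</start-class>

See: http://www.jianshu.com/p/b521f819b06a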

Saturday, February 25, 2017

postgresql timeout on gitlab-ctl restart or stop

ps -ef | grep postgresql
Kill the runsv postgresql process and the other related processes

gitlab-ctl reconfigure

Checking GitLab with gitlab:check

Based on http://g23988.blogspot.hk/2015/07/gitlabtimeout-down-postgresql-0s.html

gitlab-rake gitlab:check

Follow the instructions in its output to fix the problems it reports.

Thursday, February 23, 2017

Viewing the dependency tree in sbt

Create project/plugins.sbt in the project directory with:
addSbtPlugin("net.virtual-void" % "sbt-dependency-graph" % "0.8.2")

Then run sbt dependency-graph

git fatal: Cannot update paths and switch to branch 'xxx' at the same time.

See http://www.liaoxuefeng.com/discuss/001409195939432748a2c9fae3846bc98b3c2a547fa321b000/001443433776805e57345a4967740a1a6adada3cbf6af2c000

Run git fetch origin first (the remote branch had not been fetched yet); that fixes it.

Thursday, February 16, 2017

sbt test ERROR SparkContext: Error initializing SparkContext.

See http://stackoverflow.com/questions/41887273/sparkcontext-error-running-in-sbt-there-is-already-an-rpcendpoint-called-localb

By default sbt runs the test suites in parallel, but only one SparkContext can be active per JVM.

Add to build.sbt:
parallelExecution in Test := false

That fixes the problem.

Skipping tests in sbt assembly

参考http://stackoverflow.com/questions/20131854/sbt-assembly-jar-exclusion

sbt 'set test in assembly := {}' clean assembly

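Configuring sbt mirrors

sbt's default repositories often fail to download, so I briefly looked up how to use other mirrors:

Set the environment variable: export SBT_OPTS="-Dsbt.override.build.repos=true"

Edit ~/.sbt/repositories (vim ~/.sbt/repositories) and add: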
[repositories]
local
repo2:http://repo2.maven.org/maven2/
ivy-typesafe:http://dl.bintray.com/typesafe/ivy-releases, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]
ivy-sbt-plugin:http://dl.bintray.com/sbt/sbt-plugin-releases/, [organization]/[module]/(scala_[scalaVersion]/)(sbt_[sbtVersion]/)[revision]/[type]s/[artifact](-[classifier]).[ext]

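I haven't studied this in depth; I'll look into it more systematically when I have time.

The links below list some mirrors worth trying: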
http://www.jianshu.com/p/c8c48b0b3866
https://gist.github.com/ysrotciv/267f05270a2cfab084c123316c0a82ee

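Monday, February 13, 2017

Activating Beyond Compare on macOS

Edit /Applications/Beyond\ Compare.app/Contents/Resources/trial.key (vim /Applications/Beyond\ Compare.app/Contents/Resources/trial.key) and replace its contents with the key for the Windows version: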
--- BEGIN LICENSE KEY ---
H1bJTd2SauPv5Garuaq0Ig43uqq5NJOEw94wxdZTpU-pFB9GmyPk677gJ
vC1Ro6sbAvKR4pVwtxdCfuoZDb6hJ5bVQKqlfihJfSYZt-xVrVU27+0Ja
hFbqTmYskatMTgPyjvv99CF2Te8ec+Ys2SPxyZAF0YwOCNOWmsyqN5y9t
q2Kw2pjoiDs5gIH-uw5U49JzOB6otS7kThBJE-H9A76u4uUvR8DKb+VcB
rWu5qSJGEnbsXNfJdq5L2D8QgRdV-sXHp2A-7j1X2n4WIISvU1V9koIyS
NisHFBTcWJS0sC5BTFwrtfLEE9lEwz2bxHQpWJiu12ZeKpi+7oUSqebX+
--- END LICENSE KEY -----

Restart Beyond Compare.

This method no longer works; see the newer method in https://gist.github.com/huqi/35f2a0792aef830898ca.

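Thursday, February 9, 2017

java.io.IOException: No space left on device when running Spark

See http://stackoverflow.com/questions/25707784/why-does-a-job-fail-with-no-space-left-on-device-but-df-says-otherwise

Edit $SPARK_HOME/conf/spark-defaults.conf and add spark.local.dir SOME/DIR/WHERE/YOU/HAVE/SPACE. (On YARN this setting is overridden by the LOCAL_DIRS environment variable, which the NodeManager sets from yarn.nodemanager.local-dirs.)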

Exclude dependency from maven assembly plugin

Add <scope>provided</scope> inside <dependency>...</dependency>.

New in TensorFlow Serving 0.5.0

TensorFlow Serving 0.5.0 uses TensorFlow 0.12.0.

In 0.5.0, the tensorflow_model_server default changed from use_saved_model=false to use_saved_model=true; keep this in mind.
SessionBundle is deprecated and will no longer be officially supported starting with TensorFlow Serving 1.0.

TensorFlow Serving server fails at runtime because SSE4.1 is unavailable

With TensorFlow Serving 0.5.0, the build succeeds, but running tensorflow_model_server fails with:
2017-02-08 22:06:49: F external/org_tensorflow/tensorflow/core/platform/cpu_feature_guard.cc:35] The TensorFlow library was compiled to use SSE4.1 instructions, but these aren't available on your machine.
Aborted (core dumped)

Related discussions: https://groups.google.com/a/tensorflow.org/forum/#!msg/discuss/qCbVWKa4GU0/6PC3x8TtEQAJ
http://stackoverflow.com/questions/41474136/disable-sse4-1-when-compiling-tensorflow/41477681#41477681

Solution:
Delete the line if_x86(["-msse4.1"]) + from serving/tensorflow/tensorflow/tensorflow.bzl.

Tuesday, February 7, 2017

Problem running tensorflow/configure for TensorFlow Serving

With TensorFlow Serving 0.5.0, running configure fails with:
java.lang.RuntimeException: Unrecoverable error while evaluating node 'REPOSITORY_DIRECTORY:@jpeg' (requested by nodes 'REPOSITORY:@jpeg')

See https://github.com/tensorflow/serving/issues/301

Solution:
apt-get install ca-certificates-java
update-ca-certificates -f

If that doesn't fix it, upgrade Bazel to 0.4.4.