Han's Notebook

A Brief Introduction to Some Semi-Supervised Learning Techniques

2020-03-21T12:27:42.000Z

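I recently read several papers on Semi-Supervised Learning. My impression is that reaching state-of-the-art results mostly comes down to combining a handful of effective techniques well, so this post introduces those techniques one by one. The overall structure follows MixMatch: A Holistic Approach to Semi-Supervised Learning (NeurIPS 2019), with some extensions where appropriate.

We mainly discuss the Transductive Learning setting, where training uses $(X_{label}, Y_{label}, X_{unlabel})$ in order to predict $Y_{unlabel}$. The Inductive Learning setting is closer to supervised learning: training uses only $(X_{label}, Y_{label})$, and $X_{unlabel}$ only appears at prediction time, when $Y_{unlabel}$ is predicted. The main idea of semi-supervised learning is that characteristics of the data to be predicted (for example, its distribution) can help learn a better classifier.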

Entropy Minimization

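In Semi-Supervised Learning, a common assumption is that the classifier's decision boundary should not pass through high-density regions of the data distribution. One direct way to enforce this is to add a loss term that explicitly lowers the entropy of the model's predictions on unlabeled data.

Sharpening

For unlabeled data, the Sharpening trick can be used to lower the entropy implicitly.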

$$
\text {Sharpen}\left(p_{i}, T\right):=p_{i}^{\frac{1}{T}} / \sum_{j=1}^{C} p_{j}^{\frac{1}{T}}
$$

Here $p$ is a categorical distribution and $T$ is a hyperparameter. As $T \rightarrow 0$, $\text {Sharpen}\left(p_{i}, T\right)$ approaches a Dirac distribution (one-hot).

Because $\text {Sharpen}\left(p_{i}, T\right)$ is used as the target for the model's predictions on unlabeled data, a lower $T$ pushes the model toward low-entropy outputs.
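
As a minimal sketch of the Sharpen formula, the function below works on a categorical distribution stored as a plain Python list. The name `sharpen` and the list representation are illustrative assumptions, not code from the MixMatch paper.

```python
def sharpen(p, T):
    """Raise each probability to the power 1/T, then renormalize."""
    powered = [p_i ** (1.0 / T) for p_i in p]
    total = sum(powered)
    return [v / total for v in powered]


p = [0.6, 0.3, 0.1]
print(sharpen(p, 1.0))  # T = 1 leaves the distribution unchanged
print(sharpen(p, 0.5))  # a lower T moves mass toward the largest entry
```

With a very small $T$ the output is close to one-hot, which is the behavior the text describes.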

Consistency Regularization

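In short, this brings data augmentation into Semi-Supervised Learning. The underlying idea is that the class distribution output by the classifier should stay unchanged even after a sample has been augmented.

Regularization with stochastic perturbations

This is the simplest and most direct approach: add a loss that penalizes how much different stochastic transformations change the output: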

$$
\left\| p_{\text {model }}(y | \text { Augment }(x) ; \theta)-p_{\text {model }}(y | \text { Augment }(x) ; \theta) \right\|_{2}^{2}
$$

Label Guessing

For an unlabeled sample, the model's own output can first be used to guess a label. The guessed label can then be used in the unsupervised loss, i.e. $\mathcal{L}_{\mathcal{U}}=\frac{1}{L\left|\mathcal{U}^\prime \right|} \sum_{u, q \in \mathcal{U}^\prime}\left\|q - p_{model}(y | u ; \theta)\right\|_2^2$, where $u$ is an unlabeled item, $q$ is the guessed label (a distribution), and $L$ is the number of classes.

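In practice, the average of the model's predictions over $K$ augmented copies of $u_b$ can be used as the guessed label to make it more stable:

$$
\bar{q}_{b}=\frac{1}{K} \sum_{k=1}^{K} p_{model}\left(y | \hat{u}_{b, k} ; \theta\right),
$$

where $\hat{u}_{b,k}$ is the $k$-th augmentation of $u_b$. The guess then enters the same unsupervised loss:

$$
\mathcal{L}_{\mathcal{U}}=\frac{1}{L\left|\mathcal{U}^{\prime}\right|} \sum_{u, q \in \mathcal{U}^{\prime}}\left\|q-p_{model}(y | u ; \theta)\right\|_{2}^{2}.
$$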

Some papers also discuss mean-teacher (averaging model weights instead of predictions) and argue that it works better.

Exponential Moving Average (EMA)

Here an EMA of the model's output predictions is computed and used as the new target.

$$
Z = \alpha Z + (1-\alpha)z
$$

$$
\tilde{z} = \frac{Z}{1-\alpha^t}
$$

$Z$ is initialized to $\mathbf{0}_{N\times C}$, $z$ is the model's output for every sample in the current epoch, $t$ is the epoch index, and $\tilde{z}$ is the bias-corrected target vector.
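
The two update rules can be sketched for a single sample as follows. The function name `ema_target` and the list-of-lists input are assumptions made for illustration. The point of the bias correction is visible when the per-epoch output is constant: the corrected target then equals that output from the first epoch on.

```python
def ema_target(outputs_per_epoch, alpha):
    """Return the bias-corrected EMA target after the last epoch.

    outputs_per_epoch: list of per-epoch prediction vectors z for one sample.
    """
    Z = [0.0] * len(outputs_per_epoch[0])  # Z starts at zero
    z_tilde = list(Z)
    for t, z in enumerate(outputs_per_epoch, start=1):
        Z = [alpha * Z_c + (1 - alpha) * z_c for Z_c, z_c in zip(Z, z)]
        z_tilde = [Z_c / (1 - alpha ** t) for Z_c in Z]  # bias correction
    return z_tilde


print(ema_target([[0.2, 0.8]] * 5, alpha=0.6))
```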

Virtual Adversarial Training (VAT)

Virtual adversarial loss is defined as the robustness of the conditional label distribution around each input data point against local perturbation.

First, adversarial training adds a perturbation $r_{adv}$ to the input so that the model's output changes as much as possible, i.e.:

$$
L_{\mathrm{adv}}\left(x_{l}, \theta\right):=D\left[q\left(y | x_{l}\right), p\left(y | x_{l}+r_{\mathrm{adv}}, \theta\right)\right]
$$

其中

$$
r_{\mathrm{adv}}:=\underset{r ;\|r\| \leq \epsilon}{\arg \max } D\left[q\left(y | x_{l}\right), p\left(y | x_{l}+r, \theta\right)\right]
$$

$D[q, p]$ is a non-negative function that measures the divergence between two distributions, for example cross entropy.

Virtual Adversarial Training then uses the current model's output $p(y|x, \hat\theta)$ to approximate the true label distribution $q(y|x)$. This defines a virtual adversarial perturbation, and the corresponding loss term follows directly:

$$
\operatorname{LDS}\left(x_*, \theta\right):=D\left[p\left(y | x_*, \hat{\theta}\right), p\left(y | x_*+r_{\text {vadv }}, \theta\right)\right]
$$

where

$$
r_{\mathrm{vadv}}:=\underset{r ;\|r\|_{2} \leq \epsilon}{\arg \max } D\left[p\left(y | x_{*}, \hat{\theta}\right), p\left(y | x_{*}+r\right)\right]
$$

$x_*$ covers both $x_{label}$ and $x_{unlabel}$.

Generic Regularization

Some regularization methods add constraints to the model so that it avoids "memorizing" the training data and generalizes better to unseen data. The most common choice is $L_2$ weight decay on the model parameters.

Mixup

In short, mixup constructs virtual training samples like this:

$$
\tilde{x} = \lambda x_i + (1-\lambda)x_j, \quad \tilde{y} = \lambda y_i + (1-\lambda)y_j.
$$

Here $x_i, x_j$ are the original input vectors, $y_i, y_j$ are one-hot label encodings, and $\lambda \in [0, 1]$.

To apply mixup to Semi-Supervised Learning, labeled and unlabeled data can be mixed together, with the $y$ of an unlabeled sample replaced by its guessed label $q$. A further small trick is to restrict $\lambda \in [0.5, 1]$ so that the virtual sample stays closer to the real sample $x_i$. Used this way, mixup arguably belongs under Consistency Regularization.
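
A minimal sketch of mixup with the $\lambda \in [0.5, 1]$ trick, on plain Python lists. Drawing $\lambda$ from a Beta($\alpha$, $\alpha$) distribution and taking $\max(\lambda, 1-\lambda)$ is how MixMatch does it; the function name `mixup` and the `rng` argument are illustrative assumptions.

```python
import random


def mixup(x_i, y_i, x_j, y_j, alpha, rng=random):
    """Mix two (input, label-distribution) pairs; returns (x, y, lam)."""
    lam = rng.betavariate(alpha, alpha)
    lam = max(lam, 1.0 - lam)  # keep the result closer to (x_i, y_i)
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x, y, lam


rng = random.Random(0)
# y_j may be a guessed label q for an unlabeled sample.
x, y, lam = mixup([1.0, 2.0], [1.0, 0.0], [3.0, 4.0], [0.3, 0.7], 0.75, rng)
print(lam, x, y)
```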

Why $L_2$ loss

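With cross entropy, the probabilities first have to be computed with a softmax, and the softmax output does not change when the same constant is added to every input value. So when the goal is to make two vectors as equal as possible, $L_2$ is the stricter constraint.

Warmup of $\lambda$

The overall loss function combines the supervised loss and the loss on unlabeled data, $L = L_X + \lambda L_U$, so a coefficient $\lambda$ controls the balance between the two. Compared with fixing $\lambda$ to a constant, some experiments found that linearly warming it up from 0 to its final value improves the final classification accuracy.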
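
A linear warmup schedule can be sketched in a few lines. The names `lambda_u`, `warmup_steps` and `final_value` are hypothetical, and the numbers in the usage line are only examples.

```python
def lambda_u(step, warmup_steps, final_value):
    """Linearly ramp the unlabeled-loss weight from 0 to final_value."""
    return final_value * min(1.0, step / warmup_steps)


for step in (0, 500, 1000, 2000):
    print(step, lambda_u(step, warmup_steps=1000, final_value=75.0))
```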


Notes about GCN Sampling

2019-09-08T12:47:54.000Z

Last week I read several papers on sampling algorithms for training Graph Convolutional Networks, and I wrote this note as a brief record of them.

| Method | Sampling scheme | Conference |
| --- | --- | --- |
| GraphSAGE | node-wise | NIPS 17 |
| FastGCN | layer-wise | ICLR 18 |
| StochasticGCN | node-wise | ICML 18 |
| AdaptiveSampling | layer-wise | NIPS 18 |
| ClusterGCN | node-wise (?) | KDD 19 |

GraphSAGE


This sampling method is uniform sampling and very easy to understand. It uses a Top-down approach, which means that when it is calculating a node’s output, the algorithm finds the node’s neighbors layer by layer, until the node’s representation vector can be calculated.

In my opinion, this paper is not really about sampling. It just uses simple, naive uniform sampling to avoid overly heavy computation. Its main purpose may simply be to make GCN training possible in mini-batches.

FastGCN


This paper rewrites the message-passing formula in an integral form, then uses Monte Carlo sampling to approximate the value of the integral.

$$
\tilde{h}^{(l+1)}(v)=\int \hat{A}(v, u) h^{(l)}(u) W^{(l)} d P(u), \quad h^{(l+1)}(v)=\sigma\left(\tilde{h}^{(l+1)}(v)\right), \quad l=0, \ldots, M-1
$$
$$
L=\mathrm{E}_{v \sim P}\left[g\left(h^{(M)}(v)\right)\right]=\int g\left(h^{(M)}(v)\right) d P(v)
$$
$$
\tilde{h}_{t_{l+1}}^{(l+1)}(v) :=\frac{1}{t_{l}} \sum_{j=1}^{t_{l}} \hat{A}\left(v, u_{j}^{(l)}\right) h_{t_{l}}^{(l)}\left(u_{j}^{(l)}\right) W^{(l)}, \quad h_{t_{l+1}}^{(l+1)}(v) :=\sigma\left(\tilde{h}_{t_{l+1}}^{(l+1)}(v)\right), \quad l=0, \ldots, M-1
$$

It uses a factor proportional to the degree of the node as the importance sampling factor.
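
To illustrate the Monte Carlo idea, here is a sketch that estimates one aggregation $\sum_u \hat{A}(v,u)\,h(u)$ by sampling nodes from a proposal distribution `q` and reweighting by $1/q$. Scalar features, the function names, and the toy numbers are all assumptions for illustration; this is not FastGCN's actual code, and `q` can be any normalized distribution with non-zero entries.

```python
import random


def exact_aggregate(a_row, h):
    """Full aggregation over every node u."""
    return sum(a * x for a, x in zip(a_row, h))


def sampled_aggregate(a_row, h, q, t, rng):
    """Importance-sampling estimate using t nodes drawn from q."""
    nodes = rng.choices(range(len(h)), weights=q, k=t)
    return sum(a_row[u] * h[u] / q[u] for u in nodes) / t


a_row = [0.5, 0.2, 0.3]  # one row of the normalized adjacency matrix
h = [1.0, 2.0, 4.0]      # scalar node features
rng = random.Random(0)
uniform = [1.0 / 3] * 3
print(exact_aggregate(a_row, h))
print(sampled_aggregate(a_row, h, uniform, 1000, rng))
```

A proposal that puts more probability on nodes with large contributions lowers the variance of the estimate, which is the motivation for importance sampling here.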

StochasticGCN


The algorithm is not complicated in this paper, but this paper provides many theoretical results and proofs.

The algorithm works as follows: when aggregating neighbors’ features/activations, only a few neighbors actually compute their activations, while the others use their historical activations as an approximation.

It can be easily derived that in the end when the message passing reaches a stationary point, the sampling variance will be eliminated to zero.
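
The historical-activation trick can be sketched as a control-variate estimator: aggregate the stored history over all neighbors, then correct it with the difference between fresh and stored activations on the sampled neighbors only. Scalar activations and all names below are illustrative assumptions. When the history equals the current activations (the stationary point mentioned above), the estimate is exact no matter which neighbors are sampled.

```python
def cv_aggregate(a_row, h, h_hist, sampled):
    """Estimate sum_u a_row[u] * h[u] using history plus a sampled correction.

    sampled: indices of the neighbors whose fresh activation h[u] is computed.
    """
    n = len(a_row)
    history_term = sum(a * hb for a, hb in zip(a_row, h_hist))
    scale = n / len(sampled)  # rescale the sampled correction to all n neighbors
    correction = scale * sum(a_row[u] * (h[u] - h_hist[u]) for u in sampled)
    return history_term + correction


a_row = [0.25, 0.25, 0.25, 0.25]
h = [1.0, 2.0, 3.0, 4.0]
print(cv_aggregate(a_row, h, h_hist=[0.9, 2.1, 3.0, 3.8], sampled=[0, 3]))
print(cv_aggregate(a_row, h, h_hist=h, sampled=[1]))  # exact: history is fresh
```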

Along with the insight, the paper also provides the convergence guarantee and variance analysis. It has high theoretical value.

Adaptive Sampling


This paper follows the same layer-wise style as FastGCN. It rewrites the message-passing formula as

$$
\begin{array}{c}
{h^{(l+1)}\left(v_{i}\right)=\sigma_{W^{(l)}}\left(N\left(v_{i}\right) \mathbb{E}_{q\left(u_{j} | v_{1}, \cdots, v_{n}\right)}\left[\frac{p\left(u_{j} | v_{i}\right)}{q\left(u_{j} | v_{1}, \cdots, v_{n}\right)} h^{(l)}\left(u_{j}\right)\right]\right)} \\
{h^{(l+1)}\left(v_{i}\right)=\sigma_{W^{(l)}}\left(N\left(v_{i}\right) \hat{\mu}_{q}\left(v_{i}\right)\right)} \\
{\hat{\mu}_{q}\left(v_{i}\right)=\frac{1}{n} \sum_{j=1}^{n} \frac{p\left(\hat{u}_{j} | v_{i}\right)}{q\left(\hat{u}_{j} | v_{1}, \cdots, v_{n}\right)} h^{(l)}\left(\hat{u}_{j}\right), \quad \hat{u}_{j} \sim q\left(\hat{u}_{j} | v_{1}, \cdots, v_{n}\right)}
\end{array}
$$

So the important thing is to model the $q\left(u_{j} | v_{1}, \cdots, v_{n}\right)$.

In order to minimize the sampling variance, the optimal $q(u_j)$ can be modeled as

$$
\operatorname{Var}_{q}\left(\hat{\mu}_{q}\left(v_{i}\right)\right)=\frac{1}{n} \mathbb{E}_{q\left(u_{j}\right)}\left[\frac{\left(p\left(u_{j} | v_{i}\right)\left|h^{(l)}\left(u_{j}\right)\right|-\mu_{q}\left(v_{i}\right) q\left(u_{j}\right)\right)^{2}}{q^{2}\left(u_{j}\right)}\right]
$$

$$
q^{*}\left(u_{j}\right)=\frac{p\left(u_{j} | v_{i}\right)\left|h^{(l)}\left(u_{j}\right)\right|}{\sum_{j=1}^{N} p\left(u_{j} | v_{i}\right)\left|h^{(l)}\left(u_{j}\right)\right|}
$$

But this $q^{*}(u_j)$ cannot be computed before the layer has been constructed, because it depends on the hidden features $h^{(l)}(u_j)$. So the paper proposes to approximate it using a function $g$ of the node's input features $x(u_j)$:

$$
q^{*}\left(u_{j}\right)=\frac{p\left(u_{j} | v_{i}\right)\left|g\left(x\left(u_{j}\right)\right)\right|}{\sum_{j=1}^{N} p\left(u_{j} | v_{i}\right)\left|g\left(x\left(u_{j}\right)\right)\right|}
$$

This still depends on a single $v_i$. Summing over all $v_i$ in the layer above gives a sampler shared by the whole layer:

$$
q^{*}\left(u_{j}\right)=\frac{\sum_{i=1}^{n} p\left(u_{j} | v_{i}\right)\left|g\left(x\left(u_{j}\right)\right)\right|}{\sum_{j=1}^{N} \sum_{i=1}^{n} p\left(u_{j} | v_{i}\right)\left|g\left(x\left(u_{j}\right)\right)\right|}
$$

This paper doesn’t provide as many theoretical proofs as Stochastic GCN.

Cluster GCN


This paper doesn’t contain any theoretical proof, but it reports better results than all previous work, which suggests that its algorithm, while purely intuitive, is empirically effective.

The algorithm is easy to follow according to the pseudocode below. No need for explanation.

Two Interesting Discrete Mathematics Problems

2019-01-15T04:25:00.000Z

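Here are two fairly interesting problems, written down so that I don't forget them.

1 The real numbers are uncountable

Here, countable means that the elements of the set can be put in one-to-one correspondence with the natural numbers. For example, the even numbers and the natural numbers correspond via $2n \leftrightarrow n$, so the two sets have the same size, namely $\aleph_0$.

This was a homework exercise from my sophomore discrete mathematics course. The topic came up in a recent conversation, so I tried to recall how the proof goes.

A few simple results/exercises are recorded here.

1.1 The rationals are countable

Method 1 (intuitive):

List the rationals m/n in the order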

1/1
1/2 2/1
1/3 2/2 3/1
1/4 2/3 3/2 4/1
...

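and count along this arrangement, skipping values that have already appeared (such as 2/2). This puts them in one-to-one correspondence with the natural numbers.

Method 2:

For any rational m/n in lowest terms, construct the map $y=2^n3^m$, where y is a natural number. Different m/n always give different y, so the map is injective and the set of rationals is no larger than the set of natural numbers.

Conversely, the natural numbers are a subset of the rationals, so the set of natural numbers is no larger than the set of rationals.

Together, the two sets have the same cardinality, so the rationals are countable.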
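
Both methods can be checked on a finite prefix with a short sketch. The generator walks the triangular arrangement from Method 1 and skips fractions that are not in lowest terms; `encode` is the map from Method 2. The function names are illustrative.

```python
from itertools import islice
from math import gcd


def rationals():
    """Yield positive rationals (m, n) row by row: 1/1, 1/2, 2/1, 1/3, ..."""
    total = 2
    while True:
        for m in range(1, total):
            n = total - m
            if gcd(m, n) == 1:  # skip duplicates such as 2/2
                yield (m, n)
        total += 1


def encode(m, n):
    """Method 2: map the reduced fraction m/n to the natural number 2^n * 3^m."""
    return 2 ** n * 3 ** m


first = list(islice(rationals(), 200))
codes = {encode(m, n) for m, n in first}
print(first[:5], len(codes) == len(first))
```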

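1.2 If sets A and B are both countable, then $A \cup B$ is countable

Case 1: if $A \subseteq B$ or $A \supseteq B$, this is immediate.

Case 2: if $A \backslash B \neq \varnothing$ and $B \backslash A \neq \varnothing$, write

$$
A \cup B = A \cup (B \backslash A)
$$

$A$ is countable, with an injection $f:A \rightarrow N$.

$B \backslash A$ is countable, with an injection $g:(B \backslash A) \rightarrow N$.

Define $h:A \cup B \rightarrow N$:

$$
h(x)=
\begin{cases}
2f(x)& x \in A\\
2g(x)+1& x \in B \backslash A
\end{cases}
$$

$h$ sends $A$ to even numbers and $B \backslash A$ to odd numbers, so it is injective, which proves that $A \cup B$ is countable.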

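1.3 The irrationals in (0, 1) are uncountable

Suppose the reals in (0,1) were countable.

Then the set of reals in (0,1) could be listed as $X = \{x_1, x_2, x_3, \ldots\}$.

We can always find a real number H=0.abcdefg… such that

a != the first digit after the decimal point of x1

b != the second digit after the decimal point of x2

c != the third digit after the decimal point of x3

…

(choosing each digit from 1 to 8 avoids the ambiguity between expansions such as 0.4999… and 0.5000…).

Hence $H \notin X$,

which is a contradiction, so the set of reals in (0,1) is uncountable.

reals = rationals + irrationals

The rationals are countable, so by 1.2 the irrationals must be uncountable: otherwise their union, the reals, would be countable.

Since the reals in (0,1) are uncountable, the set of all reals is uncountable as well.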
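
The diagonal construction can be tried on a finite list of decimal expansions. This is only an illustration of the digit-picking step, not a proof; the function name and the choice of replacement digits 5 and 6 are assumptions.

```python
def diagonal(expansions):
    """Build digits that differ from the i-th expansion at position i.

    expansions: digit strings (the part after "0."), each at least as long
    as the list itself.
    """
    digits = []
    for i, x in enumerate(expansions):
        digits.append("6" if x[i] == "5" else "5")  # never 0 or 9
    return "".join(digits)


listed = ["1415", "7182", "5555", "0000"]
H = diagonal(listed)
print("0." + H)  # differs from every listed number in at least one digit
```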

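2 Stolen Necklace Problem

This problem comes from 3Blue1Brown's Sneaky Topology video. Here is a short summary of the key points.

Problem: a necklace contains n kinds of gems, with an even number of each kind. It can be divided evenly between two people, so that each gets half of every kind, using at most n cuts.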

2.1 Borsuk-Ulam Theorem


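Take the sphere in three-dimensional space as a simple example: for any continuous function that maps it to the two-dimensional plane, $f: S^2 \rightarrow R^2$, there must be a pair of antipodal points (antipodes) that are mapped to the same point of the plane, $f(x) = f(-x)$.

A brief proof sketch:

Construct $g(x)$,

$$ g(x) = f(x) - f(-x) $$

$$ g(x) = -g(-x) $$

So for points on the equator, the image of $g(x)$ is a closed loop around the origin (it is symmetric about the origin because $g$ is odd). Now move this circle of latitude continuously toward the north pole; at the pole, the image of $g(x)$ is a single point. During this continuous deformation the image of $g(x)$ must pass through the origin, which shows that $g(x)$ has a zero and proves the claim.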

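2.2 Back to the original problem

Suppose the necklace has total length 1 and two cuts split it into three segments of lengths $x^2, y^2, z^2$. Then
$$ x^2+y^2+z^2=1 $$
means that every way of cutting corresponds to a point $(x, y, z)$ on the sphere, with the sign of each coordinate indicating which person receives that segment.
Antipodes correspond to the same cuts but with the allocation swapped (e.g. "x and z to A, y to B" versus "y to A, x and z to B").
Taking $f$ to be the amounts of the two kinds of gems that A receives, $f(x)=f(-x)$ means that A and B receive the same amounts, so nothing changes when they swap.

Thus the Borsuk-Ulam Theorem proves that the Stolen Necklace Problem with 2 kinds of gems can be solved with 2 cuts.

Both the Borsuk-Ulam Theorem and the Stolen Necklace Problem generalize to n.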
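
For small discrete necklaces the claim can be checked by brute force. The sketch below is not the topological argument; it simply tries every set of cut positions and every assignment of segments to the two people. The string representation (one character per gem) and the function name are assumptions.

```python
from itertools import combinations, product


def fair_split(necklace, max_cuts):
    """Return (cut_positions, owners) for a fair division, or None."""
    kinds = set(necklace)
    target = {k: necklace.count(k) // 2 for k in kinds}
    n = len(necklace)
    for cuts in range(max_cuts + 1):
        for positions in combinations(range(1, n), cuts):
            bounds = (0, *positions, n)
            segments = [necklace[a:b] for a, b in zip(bounds, bounds[1:])]
            for owners in product("AB", repeat=len(segments)):
                share_a = "".join(s for s, o in zip(segments, owners) if o == "A")
                if all(share_a.count(k) == target[k] for k in kinds):
                    return positions, owners
    return None


print(fair_split("aabb", 2))  # two kinds of gems, so two cuts suffice
```

For "aabb" a single cut is not enough, so the bound of n cuts is actually needed in this case.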

Cylindrical Panorama Stitching with OpenCV 3

2018-06-15T14:54:18.000Z

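Panorama stitching generates an image of a whole scene from multiple images of that scene by finding correspondences in their overlapping regions. There are many stitching methods; by scene and motion type they can be divided into single-viewpoint and multi-viewpoint panorama stitching. For planar scenes, and for scenes shot by rotating the camera only, one can estimate a homography between each pair of images and map them onto a single image, or recover the camera rotations to obtain the final panorama. When the camera is fixed and only rotates horizontally, the panorama can also be obtained by mapping to cylindrical or spherical coordinates.

Goal of the experiment

Implement a Panorama class that, given a sequence of images and a focal length, outputs the stitched panorama.

Algorithm

Cylindrical projection

Goal

Project the planar image onto a cylinder.

Principle

$$
x' = f \arctan\left(\frac{x - 0.5\,width}{f}\right) + f \arctan\left(\frac{0.5\,width}{f}\right)
$$

$$
y' = \frac{f\,(y - 0.5\,height)}{\sqrt{(x - 0.5\,width)^2 + f^2}} + 0.5\,height
$$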
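
As a quick numerical check of the two projection formulas, here is a sketch in plain Python (the function name and the sample image size and focal length are assumptions). The left image edge maps to $x' = 0$, and the image center keeps its height.

```python
import math


def to_cylinder(x, y, width, height, f):
    """Map planar pixel (x, y) to cylindrical coordinates (x', y')."""
    x_c = f * math.atan((x - 0.5 * width) / f) + f * math.atan(0.5 * width / f)
    y_c = f * (y - 0.5 * height) / math.hypot(x - 0.5 * width, f) + 0.5 * height
    return x_c, y_c


width, height, f = 640, 480, 500.0
print(to_cylinder(0, 240, width, height, f))      # left edge, middle row
print(to_cylinder(width, 240, width, height, f))  # right edge: x' is the cylinder width
```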

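Feature extraction and matching

Goal

Extract and match features for each pair of adjacent cylindrical images to find the correspondence between them.

Principle

SIFT features are based on interest points of local appearance on an object and are independent of image scale and rotation. They are also quite tolerant of changes in lighting, noise, and small changes in viewpoint.
After extracting SIFT features, BruteForceMatch or KnnMatch can be used to compute matches between them.
The matched feature points can then be used to estimate a homography.

Computing the transform and stitching

Goal

Use the matches to compute the translation between each pair of cylindrical images, and use these transforms to stitch all images together into the panorama.

Principle

From the matched feature points that survive RANSAC, a homography can be computed. This homography gives the transform between the images, which is used to stitch the two images together.

Implementation

Interface

The core of the code implements the following interface: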

class CylindricalPanorama
{
public:
    // img_vec: input image sequence; img_out: stitched panorama; f: focal length.
    virtual bool makePanorama(
        std::vector<cv::Mat>& img_vec, cv::Mat& img_out, double f
    ) = 0;
};

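Pipeline

Apply the cylindrical projection to every image in the list and store the results
Compute SIFT feature points for the previous stitching result and the next image
Match the SIFT feature points
Compute the homography
Warp with the homography and stitch
Repeat the steps above until all images are used and the panorama is complete

One detail: when stitching from left to right, it is best to transform the left image into the coordinate system of the right image,
which makes the subsequent feature matching and homography computation easier.

Cylindrical projection

Here nearest-neighbor interpolation is used: each point of the cylindrical image is mapped back to its position in the original image, and the pixel value at that position is used as the value of the point.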

Mat cylinder(Mat& img, double f) {
    Mat output;
    // Width of the unrolled cylinder: 2 * f * atan(0.5 * width / f).
    int cols = (int)(2 * f * atan(0.5 * img.cols / f));
    int rows = img.rows;
    output.create(rows, cols, CV_8UC3);
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < cols; j++) {
            // Inverse mapping: cylinder pixel (j, i) -> source pixel (x, y).
            int x = (int)(f * tan((float)(j - cols * 0.5) / f) + img.cols*0.5);
            int y = (int)((i - 0.5*rows)*sqrt(pow(x - img.cols*0.5, 2) + f*f) / f + 0.5*img.rows);
            if (0 <= x && x < img.cols && 0 <= y && y < img.rows) {
                output.at<Vec3b>(i, j) = img.at<Vec3b>(y, x);
            }
            else {
                // Outside the source image: fill with black.
                output.at<Vec3b>(i, j) = Vec3b(0, 0, 0);
            }
        }
    }
    return output;
}

Feature point extraction

The SIFT feature points here use the OpenCV 3 style of API.

Ptr<Feature2D> f2d = xfeatures2d::SIFT::create();
// Detect keypoints in both images.
vector<KeyPoint> kps_0, kps_1;
f2d->detect(img_1, kps_0);
f2d->detect(img_2, kps_1);

// Compute a SIFT descriptor for every keypoint.
Mat descriptors_0, descriptors_1;
f2d->compute(img_1, kps_0, descriptors_0);
f2d->compute(img_2, kps_1, descriptors_1);

Matching and filtering feature points

Matches whose distance is too large are filtered out, keeping only the better ones.


FlannBasedMatcher matcher;
//BFMatcher matcher;
vector<DMatch> matches;
matcher.match(descriptors_0, descriptors_1, matches);
sort(matches.begin(), matches.end());
// Track the smallest and largest descriptor distance among all matches.
float min_v = numeric_limits<float>::max();
float max_v = 0;
for (size_t i = 0; i < matches.size(); ++i) {
    min_v = min(min_v, matches[i].distance);
    max_v = max(max_v, matches[i].distance);
}
vector<Point2f> ps_0, ps_1;
//assert(matches.size() > 500);
cout << "min_v " << min_v << endl;
cout << "max_v " << max_v << endl;
// Keep only matches whose distance is at most half of the largest one.
for (size_t i = 0; i < matches.size(); ++i) {
    DMatch m = matches[i];
    if (m.distance > max_v / 2) continue;
    ps_0.push_back(kps_0[m.queryIdx].pt);
    ps_1.push_back(kps_1[m.trainIdx].pt);
}

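Computing the homography and the extra rows and columns

The matched points are used to compute the homography.
The corner points are then used to compute the size of the stitched image.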


// H maps img_1 coordinates to img_2 coordinates; rev_H is the opposite direction.
Mat rev_H = findHomography(ps_1, ps_0, RANSAC);
Mat H = findHomography(ps_0, ps_1, RANSAC);

cout << "begin stitcher....  " << i << endl;
vector<Point2f> corners_1(4);
vector<Point2f> corners_2(4);
corners_1[0] = Point2f(0, 0);
corners_1[1] = Point2f((float)img_1.cols, 0);
corners_1[2] = Point2f((float)img_1.cols, (float)img_1.rows);
corners_1[3] = Point2f(0, (float)img_1.rows);

// Project the corners of img_1 and see how far they fall above / left of img_2.
perspectiveTransform(corners_1, corners_2, H);
int down_rows = (int)min(corners_2[0].y, corners_2[1].y);
down_rows = min(0, down_rows) * -1;
int right_cols = (int)min(corners_2[0].x, corners_2[3].x);
right_cols = min(0, right_cols) * -1;

Computing the warped coordinates, warping, and stitching

// Canvas large enough for img_2 plus the extra rows/columns needed by img_1.
Mat stitch_img = Mat::zeros(img_2.rows+down_rows, img_2.cols+right_cols, CV_8UC3);
img_2.copyTo(Mat(stitch_img, Rect(right_cols, down_rows, img_2.cols, img_2.rows)));
for (int i = 0; i < stitch_img.rows; ++i) {
    for (int j = 0; j < stitch_img.cols; ++j) {
        // Pixels already covered by img_2 are left untouched.
        if (stitch_img.at<Vec3b>(i, j) != Vec3b(0, 0, 0)) {
            continue;
        }
        int x0 = j - right_cols;
        int y0 = i - down_rows;
        vector<Point2f> pix, dst;
        pix.emplace_back((float)x0, (float)y0);
        // Map the canvas pixel back into img_1 and sample it there.
        perspectiveTransform(pix, dst, rev_H);
        Point2f pos = dst[0];
        //cout << pos << endl;
        int x = (int)floor(pos.x);
        int y = (int)floor(pos.y);
        if (0 <= y && y < img_1.rows && 0 <= x && x < img_1.cols && img_1.at<Vec3b>(y, x) != Vec3b(0, 0, 0)) {
            Vec3b c = img_1.at<Vec3b>(y, x);
            //if (stitch_img.at<Vec3b>(i,j) != Vec3b(0, 0, 0)) { c += (stitch_img.at<Vec3b>(i,j)-c)/2; }
            stitch_img.at<Vec3b>(i, j) = c;
        }
    }
}
last_result = stitch_img;

Results

For two sets of images, the stitched results are shown below.

Appendix

Full code and data:

panorama

Basic Filters and the Image Fourier Transform

2018-03-31T08:10:41.000Z

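Last year I wrote a version that hand-rolled the BMP file header and manipulated the image file directly. This time, for a course assignment, I reimplemented everything with OpenCV.

Contents

Implement a box (mean) filter
Implement a Gaussian filter
Implement a median filter
Implement a simple bilateral filter
Use the Fourier transform to convert an image to the frequency domain

Theory, implementation details, and results

Mean filter

Principle

The mean filter is a linear filter. For a 3x3 kernel, its convolution kernel is:
$$
\begin{bmatrix} 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \\ 1/9 & 1/9 & 1/9 \end{bmatrix}
$$

In short, the filtered value of each pixel is the average of the surrounding pixel values, i.e. every kernel entry is $1/(w \times h)$.

Implementation

First comes the command-line argument parsing; every program below has the same basic structure: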

int main(int argc, char** argv) {
    // Usage: <program> <input> <output> <w> <h>
    if (argc != 5) {
        cout << "Input Illegal!" << endl;
        return 1;
    }
    char* input = argv[1];
    char* output = argv[2];
    int w = atoi(argv[3]);
    int h = atoi(argv[4]);
    Mat mat = imread(input);
    Mat dst;
    myBoxFilter(mat, dst, w, h);
    imwrite(output, dst);
    return 0;
}

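BoxFilter is simple to implement: build the convolution kernel, then call filter2D to do the computation.

#include <iostream>
#include <cstdlib>
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;

void myBoxFilter(Mat& src, Mat& dst, int w, int h) {
    // Size takes (width, height); every weight is 1 / (w * h).
    Mat mask(Size(w, h), CV_32FC1, Scalar(1.0 / w / h));
    filter2D(src, dst, src.depth(), mask);
}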

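Result

Original image:

After 3x3 mean filtering:

Gaussian filter

Principle

We usually assume that the correlation between image pixels weakens as the distance between them grows. The mean filter does not reflect this. When an image contains some very bright spots, mean filtering leaves a halo around them, precisely because the mean filter ignores distance and gives far-away pixels the same weight.

A Gaussian kernel instead uses distance to control how much surrounding pixels influence the center.

In theory the Gaussian kernel is infinitely large, but distant points have little influence on the center, so for ease of computation only the central part of the kernel is used in the convolution.

The two-dimensional Gaussian distribution is:
$$
G(x,y) = \frac{1}{2\pi\sigma^2}e^{-\frac{x^2+y^2}{2\sigma^2}}
$$
The Gaussian kernel can be built from this formula.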
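
As a language-neutral sketch of how the kernel is built from the formula, the function below fills a `size` x `size` grid and normalizes it so the weights sum to 1. The constant factor $1/(2\pi\sigma^2)$ is dropped here because the normalization cancels it. The function name is an assumption.

```python
import math


def gaussian_kernel(size, sigma):
    """Return a normalized size x size Gaussian kernel as nested lists."""
    mid = size // 2
    raw = [
        [math.exp(-((i - mid) ** 2 + (j - mid) ** 2) / (2.0 * sigma ** 2))
         for j in range(size)]
        for i in range(size)
    ]
    total = sum(sum(row) for row in raw)
    return [[v / total for v in row] for row in raw]


for row in gaussian_kernel(5, 1.0):
    print(" ".join(f"{v:.4f}" for v in row))
```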

Implementation

First implement a function that evaluates the Gaussian; here x2 stands for $x^2+y^2$.

double G(double x2, double sigma) {
    // 1 / (2 * pi * sigma^2) * exp(-x2 / (2 * sigma^2)), matching the formula above.
    return (1.0 / (2 * CV_PI * sigma * sigma)) * exp(-x2 / (2.0 * sigma * sigma));
}

Then compute every entry of the Gaussian kernel and normalize.

The kernel size defaults to 5 here.

void myGaussianFilter(Mat& src, Mat& dst, double sigma) {
    int size = 5;
    int mid = size / 2;
    Mat mask(Size(size, size), CV_32FC1);
    // Evaluate the Gaussian at each offset from the kernel center.
    for (int i = 0; i < size; ++i) {
        for (int j = 0; j < size; ++j) {
            mask.at<float>(i, j) = (float)G(pow(i - mid, 2) + pow(j - mid, 2), sigma);
        }
    }
    // Normalize so the weights sum to 1.
    double s = sum(mask)[0];
    mask = mask / s;
    filter2D(src, dst, src.depth(), mask);
}

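Result

Original image:

Result with sigma=10:

Median filter

Principle

The median filter is a statistics-based nonlinear filter. It takes the median of the surrounding pixels as the value of the current pixel, which makes it effective against salt-and-pepper noise.

Implementation

Because it is a nonlinear filter, it cannot be computed with a single linear convolution kernel.

For every point, the surrounding points have to be sorted to take the median.

For points near the border, I only use the part of the kernel that overlaps the image.

The per-point operation is the following code: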

int medianProcess(Mat& src, int y, int x, int w, int h, int ch) {
    vector<int> pixels;
    int mid_w = w / 2;
    int mid_h = h / 2;
    Size size = src.size();
    // Collect the window values that fall inside the image.
    for (int i = y - mid_h; i <= y + mid_h; ++i) {
        for (int j = x - mid_w; j <= x + mid_w; ++j) {
            if (i >= 0 && i < size.height && j >= 0 && j < size.width) {
                pixels.push_back(src.at<Vec3b>(i, j)[ch]);
            }
        }
    }
    sort(pixels.begin(), pixels.end());
    return pixels.at(pixels.size() / 2);
}

Then simply compute the pixel value of each point independently.

void myMedianFilter(Mat& src, Mat& dst, int w, int h) {
    Size size = src.size();
    dst.create(size, src.type());
    for (int i = 0; i < size.height; ++i) {
        for (int j = 0; j < size.width; ++j) {
            // Process the three color channels separately.
            dst.at<Vec3b>(i, j)[0] = medianProcess(src, i, j, w, h, 0);
            dst.at<Vec3b>(i, j)[1] = medianProcess(src, i, j, w, h, 1);
            dst.at<Vec3b>(i, j)[2] = medianProcess(src, i, j, w, h, 2);
        }
    }
}

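Result

Original image:

Median filter result with w=3, h=3:

Bilateral filter

Principle

Gaussian filtering smooths the image but blurs edges. To preserve edges, bilateral filtering can be used.

In Gaussian filtering, the Euclidean distance between pixels determines their influence on the center pixel.

Bilateral filtering considers not only the Euclidean distance but also the distance between pixel values. A pixel whose value differs greatly from the center is assumed to lie across an edge, so its influence on the center pixel is small.

The product of the two factors is that pixel's influence on the center pixel. A final normalization completes the filter.

Implementation

The per-point operation is as follows:

For each point in the mask window around the current point, compute the Euclidean distance and pass it to G, compute the pixel-value distance and pass it to G, and use the product as the mask weight for that point in the convolution.

After the computation, normalize so that the influence factors sum to 1 and the image becomes neither too bright nor too dark.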

double bilateralProcess(Mat& src, int y, int x, int kernel_size, int ch, double sigma_s, double sigma_r) {
    double result = 0;
    double factor = 0;
    double total_factor = 0;
    Vec3b color = src.at<Vec3b>(y, x);
    for (int i = y - kernel_size / 2; i <= y + kernel_size / 2; ++i) {
        for (int j = x - kernel_size / 2; j <= x + kernel_size / 2; ++j) {
            if (i >= 0 && i < src.rows && j >= 0 && j < src.cols) {
                // Spatial weight times range (pixel-value) weight.
                factor = G(pow(i - y, 2) + pow(j - x, 2), sigma_s) * G(pow(src.at<Vec3b>(i, j)[ch] - color[ch], 2), sigma_r);
                result += factor * src.at<Vec3b>(i, j)[ch];
                total_factor += factor;
            }
        }
    }
    // Normalize by the sum of the weights.
    return result / total_factor;
}

Once the per-point function is written, just apply it to every point:

void myBilateralFilter(Mat& src, Mat& dst, int kernel_size, double sigma_s, double sigma_r) {
    dst.create(src.size(), src.type());
    for (int i = 0; i < src.rows; i++)
    {
        for (int j = 0; j < src.cols; j++)
        {
            // saturate_cast clamps the result to the 0..255 range of uchar.
            dst.at<Vec3b>(i, j)[0] = saturate_cast<uchar>(bilateralProcess(src, i, j, kernel_size, 0, sigma_s, sigma_r));
            dst.at<Vec3b>(i, j)[1] = saturate_cast<uchar>(bilateralProcess(src, i, j, kernel_size, 1, sigma_s, sigma_r));
            dst.at<Vec3b>(i, j)[2] = saturate_cast<uchar>(bilateralProcess(src, i, j, kernel_size, 2, sigma_s, sigma_r));
        }
    }
}

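Result

Original image:

Result with sigma_s = 10, sigma_r = 10:

Converting an image to the frequency domain with the Fourier transform

Principle

Images are stored in the spatial domain; the Fourier transform converts them to the frequency domain, where they can be visualized.

Implementation

First pad the image to the size at which the DFT is most efficient,

then use a two-channel image matrix to hold the DFT result.

Then rearrange the quadrants of the DFT result.

Finally take the logarithm, normalize, and visualize.

The main thing to watch is the conversion of the image type on input and output.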

void myDFT(Mat& src, Mat& dst) {
    // Pad to a size for which the DFT is fast. src is expected to be single-channel.
    int m = getOptimalDFTSize(src.rows);
    int n = getOptimalDFTSize(src.cols);
    Mat padded;
    copyMakeBorder(src, padded, 0, m - src.rows, 0, n - src.cols, BORDER_CONSTANT, Scalar::all(0));

    // Real and imaginary planes merged into one two-channel matrix.
    Mat planes[] = { Mat_<float>(padded), Mat::zeros(padded.size(),CV_32F) };
    Mat complexMat;
    merge(planes, 2, complexMat);

    dft(complexMat, complexMat, DFT_COMPLEX_OUTPUT);

    // Move the zero-frequency component to the center.
    fftshift(complexMat, complexMat);

    split(complexMat, planes);
    magnitude(planes[0], planes[1], planes[0]);
    Mat mag = planes[0];
    mag += Scalar::all(1);

    // log(1 + magnitude) compresses the dynamic range for display.
    log(mag, mag);

    normalize(mag, mag, 0, 1, NORM_MINMAX);
    mag.convertTo(dst, CV_8U, 255);
}

The code that rearranges the DFT result:

void fftshift(const Mat &src, Mat &dst) {
    // Used in place above (src and dst are the same Mat).
    dst.create(src.size(), src.type());
    int rows = src.rows, cols = src.cols;
    Rect roiTopBand, roiBottomBand, roiLeftBand, roiRightBand;
    if (rows % 2 == 0) {
        roiTopBand = Rect(0, 0, cols, rows / 2);
        roiBottomBand = Rect(0, rows / 2, cols, rows / 2);
    }
    else {
        roiTopBand = Rect(0, 0, cols, rows / 2 + 1);
        roiBottomBand = Rect(0, rows / 2 + 1, cols, rows / 2);
    }
    if (cols % 2 == 0) {
        roiLeftBand = Rect(0, 0, cols / 2, rows);
        roiRightBand = Rect(cols / 2, 0, cols / 2, rows);
    }
    else {
        roiLeftBand = Rect(0, 0, cols / 2 + 1, rows);
        roiRightBand = Rect(cols / 2 + 1, 0, cols / 2, rows);
    }
    Mat srcTopBand = src(roiTopBand);
    Mat dstTopBand = dst(roiTopBand);
    Mat srcBottomBand = src(roiBottomBand);
    Mat dstBottomBand = dst(roiBottomBand);
    Mat srcLeftBand = src(roiLeftBand);
    Mat dstLeftBand = dst(roiLeftBand);
    Mat srcRightBand = src(roiRightBand);
    Mat dstRightBand = dst(roiRightBand);
    // Flipping each band and then the whole image swaps the two bands.
    flip(srcTopBand, dstTopBand, 0);
    flip(srcBottomBand, dstBottomBand, 0);
    flip(dst, dst, 0);
    flip(srcLeftBand, dstLeftBand, 1);
    flip(srcRightBand, dstRightBand, 1);
    flip(dst, dst, 1);
}
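
The effect of the band swapping is easier to see on small lists. This sketch (function names are assumptions) swaps the first `ceil(n/2)` entries with the rest, first within each row and then across rows, which is the same split used by the band sizes above for even and odd sizes.

```python
def fftshift_1d(values):
    """Swap the first ceil(n/2) entries with the remaining ones."""
    split = (len(values) + 1) // 2
    return values[split:] + values[:split]


def fftshift_2d(rows):
    """Shift every row, then shift the order of the rows."""
    return fftshift_1d([fftshift_1d(row) for row in rows])


print(fftshift_1d([0, 1, 2, 3, 4]))
print(fftshift_2d([[1, 2], [3, 4]]))
```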

Result

Original image:

Result:

Principles of Programming Languages: Review Notes

2018-01-22T12:11:36.000Z

PPL knowledge map:

Intro

Imperative

Functional

Logic

History

Lexical analysis: converts characters in the source program into lexical units
Syntax analysis: transforms lexical units into parse trees which represent the syntactic structure of the program
Semantic analysis: generates intermediate code
Code generation: generates machine code
Link and load

Most important criteria for evaluating programming languages include:
Readability, writability, reliability, cost
Major influences on language design have been application domains, machine architecture and software development methodologies
The major methods of implementing programming languages are: compilation, pure interpretation, and hybrid implementation

Syntax

Syntax: the form or structure of the expressions, statements, and program units

Semantics: the meaning of the expressions, statements, and program units

What programs do, their behavior and meaning

Subprogram

In-Out Mode

JVM

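The memory of the Java JVM can be divided into 3 areas: the heap, the stack, and the method area.
Heap:

It stores only objects, and every object carries the information of its corresponding class (the class is what provides the operation instructions).
The JVM has a single heap shared by all threads. The heap does not hold primitive-type variables or object references, only the objects themselves.

Stack:

Each thread has its own stack. The stack holds only primitive-type values and references to user-defined objects (not the objects themselves), which all live in the heap.
The data in each stack (primitive values and object references) is private and cannot be accessed by other stacks.
The stack has 3 parts: the primitive-type variable area, the execution context, and the operation instruction area (which holds operation instructions).

Method area:

Also called the static area. Like the heap, it is shared by all threads. It contains all classes and static variables.
Everything in the method area is unique for the whole program, such as classes and static variables.

Parallelism

Concurrency: multiple tasks of one program execute at the same time
Parallelism: one task is split into multiple subtasks that execute at the same time and cooperate to solve one problem
Distributed: the parallel computation runs on different computers

Speedup: the speedup of a program on p cores is $S = T_{serial} / T_{parallel}$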

Amdahl’s Law

speedup <= work/span
q = fraction of sequential work
speedup <= 1/q
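
The bound can be made concrete with the usual closed form of Amdahl's Law for p cores, $S(p) = 1 / (q + (1-q)/p)$, which approaches $1/q$ as p grows. The function name is an assumption.

```python
def amdahl_speedup(q, p):
    """Upper bound on speedup with sequential fraction q on p cores."""
    return 1.0 / (q + (1.0 - q) / p)


for p in (1, 2, 8, 64, 1024):
    print(p, round(amdahl_speedup(0.1, p), 3))  # never exceeds 1 / 0.1 = 10
```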

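Structured parallelism

Fork/Join framework

if (the task is small enough) {
  execute the task directly;
}
else {
  split the task in two;
  execute both halves and wait for the results;
}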
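
A runnable sketch of the same divide-and-wait pattern, using plain threads from Python's standard library rather than a real fork/join framework. The task (summing a list), the threshold, and the function name are illustrative assumptions.

```python
import threading


def fork_join_sum(xs, threshold=8):
    """Sum a list by recursively forking the left half into a new thread."""
    if len(xs) <= threshold:      # the task is small enough
        return sum(xs)            # execute it directly
    mid = len(xs) // 2
    left = {}

    def run_left():
        left["value"] = fork_join_sum(xs[:mid], threshold)

    worker = threading.Thread(target=run_left)
    worker.start()                                 # fork the left half
    right = fork_join_sum(xs[mid:], threshold)     # run the right half here
    worker.join()                                  # wait for the forked half
    return left["value"] + right


print(fork_join_sum(list(range(100))))
```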

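Functional parallelism

Future
Memoization

Loop parallelism

Forall framework
Barrier problem

Recursion

Linear recursion and iteration

Newton's method for approximating square roots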

def sqrt(x):
  threshold = 0.0001
  return sqrt_iter(1.0, x, threshold)

def sqrt_iter(guess, x, threshold):
  if abs(guess*guess -x) < threshold:
    return guess
  else:
    return sqrt_iter((guess + x/guess)/2, x, threshold)

Factorial
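
As an illustrative sketch of the two process shapes for factorial (the function names are mine, not from the course material): the recursive version builds up a chain of deferred multiplications, while the iterative version carries its whole state in two variables.

```python
def factorial_recursive(n):
    """Linear recursion: n * (n-1)! with multiplications deferred."""
    if n == 0:
        return 1
    return n * factorial_recursive(n - 1)


def factorial_iterative(n):
    """Iteration: keep a running product and a counter."""
    product = 1
    for counter in range(1, n + 1):
        product *= counter
    return product


print(factorial_recursive(5), factorial_iterative(5))
```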

Operating Systems Review Notes

2018-01-21T07:29:54.000Z

Intro

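The system provides two kinds of interfaces:
Command-level interface: commands issued via the keyboard, mouse, and so on.
Program-level interface: a set of system calls, i.e. operating-system services, that user programs and other programs can invoke.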

OS is a resource allocator

Manages all resources
Decides between conflicting requests for efficient and fair resource use

OS is a control program

Controls execution of programs to prevent errors and improper use of the computer

bootstrap program is loaded at power-up or reboot

Typically stored in ROM or EEPROM, generally known as firmware
Initializes all aspects of system
Loads operating system kernel and starts execution

Operating System Structures

System Calls ：Programming interface to the services provided by the OS
System calls are the programming interface between processes and the OS kernel.

Why use APIs rather than system calls? (From my System Programming exam)

System calls differ from platform to platform. By using a stable API, it is easier to migrate your software to different platforms.
The operating system may provide newer versions of a system call with enhanced features. The API implementation will typically also be upgraded to provide this support, so if you call the API, you’ll get it. If you make the system call directly, you won’t. (For example, code that called the Linux pthreads API for mutexes got the benefit of futexes without adding a single line of code. Had you called the system directly, that would not have happened.)
The API usually provides more useful functionality than the system call directly. If you make the system call directly, you’ll typically have to replicate the pre-call and post-call code that’s already implemented by the API. (For example the ‘fork’ API includes tons of code beyond just making the ‘fork’ system call. So does ‘select’.)
The API can support multiple versions of the operating system and detect which version it needs to use at run time. If you call the system directly, you either need to replicate this code or you can only support limited versions.

Windows boot sequence

POST (Power-On Self-Test) code in ROM
BIOS/EFI (Extensible Firmware Interface)
MBR (Master Boot Record)
Boot sector
NTLDR/WinLoad
NTOSKRNL/HAL/BOOTVID/KDCOM
SMSS.EXE
WinLogon.EXE

Process

What is a process?

One dynamic execution of a program with some independent function over a set of data.
A program in execution.

A process includes:

Program counter (PC)
Registers
Data section (global data)
Stack (temporary data)
Heap (dynamically allocated memory)

As a process executes, it changes state

New: The process is being created.
Running: Instructions are being executed.
Ready: The process is waiting to be assigned to a processor (CPU).
Waiting (blocked): The process is waiting for some event to occur.
Terminated: The process has finished execution.

Process state
Program counter
CPU registers
CPU scheduling information
Memory-management information
Accounting information
File management
I/O status information

3.2.2 Scheduling Queues

Job queue – set of all processes in the system.
Ready queue – set of all processes residing in main memory, ready and waiting to execute.
Device queues – set of processes waiting for an I/O device.
Process migration between the various queues.

3.2.3 Context Switch

When CPU switches to another process, the system must save the state of the old process and load the saved state for the new process via a context switch
Context of a process represented in the PCB
Context-switch time is overhead; the system does no useful work while switching
Time dependent on hardware support

3.3.1 Process Creation

A parent process creates child processes, which in turn create other processes, forming a tree of processes.
Generally, process identified and managed via a process identifier (pid)

Resource sharing：

Parent and children share all resources.
Children share subset of parent’s resources.
Parent and child share no resources.
fork system call creates new process
- int pid = fork();
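- On return from the fork system call, the two processes have identical user-level contexts except for the return value pid. In the child, pid is 0; in the parent, pid is the process ID of the child.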
exec system call used after a fork to replace the process’ memory space with a new program

Producer-Consumer Problem

unbounded-buffer places no practical limit on the size of the buffer.
bounded-buffer assumes that there is a fixed buffer size.

Threads

The concept of a process embodies two characteristics:

Unit of resource ownership - process is allocated a virtual address space to hold the process image
Unit of dispatching - process is an execution path through one or more programs
- execution may be interleaved with other processes

A thread (or lightweight process) is a basic unit of CPU utilization; it consists of:

a thread ID
program counter
register set
stack space

Has an execution state (running, ready, etc.)
Saves thread context when not running
Has an execution stack
Has some per-thread static storage for local variables
Has access to the memory and resources of its process, which all threads of the process share

A thread shares with threads belonging to the same process its:

code section
data section
operating-system resources

(A process has a virtual address space that holds the process image, and protected access to processors, other processes, files, and I/O resources.)

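User Threads

Thread management done by user-level threads library

User threads are maintained by the application process.
The kernel is unaware that user threads exist.
Switching between user threads needs no kernel privilege.
The user-thread scheduling algorithm can be tuned for the application.
If one thread blocks in a system call, the whole process waits.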

Three primary thread libraries:

POSIX Pthreads
Win32 threads
Java threads

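Kernel Threads
Supported by the Kernel

The kernel maintains the context information of both processes and threads.
Thread switching is done by the kernel.
Time slices are allocated to threads, so a multithreaded process gets more CPU time.
If one thread blocks in a system call, the other threads keep running.

Examples

Windows XP/2000 and later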
Solaris
Linux
POSIX Pthreads
Mac OS X

CPU scheduling

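Turnaround time = completion time - submission time
Average turnaround time = Σ turnaround time / number of processes
Response time: the interval from when a process submits a request until the first response (not the final output) is produced, in a time-sharing environment
Waiting time: the total time a process spends waiting in the ready queue
Throughput: number of processes that complete their execution per time unit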

first-come, first served (FCFS)
shortest job first (SJF)

provably optimal, but difficult to know CPU burst

Highest Response Ratio Next (HRRN)

Response ratio R = (waiting time + requested service time) / requested service time
A compromise between FCFS and SJF

general priority scheduling

starvation, and aging

round-robin (RR)

for time-sharing, interactive system
problem: how to select the time quantum?

Multilevel queue

different algorithms for different classes of processes

Multilevel feedback queue

allow process to move from one (ready) queue to another
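
To compare the non-preemptive policies above on a concrete workload, here is a small simulator sketch. The job tuples `(name, arrival, burst)`, the three-job workload, and all function names are made up for illustration; only FCFS, SJF and HRRN are covered.

```python
def simulate(jobs, pick):
    """Run jobs non-preemptively; return {name: turnaround time}."""
    pending = sorted(jobs, key=lambda job: job[1])
    now, turnaround = 0, {}
    while pending:
        ready = [job for job in pending if job[1] <= now]
        if not ready:                 # CPU idle until the next arrival
            now = pending[0][1]
            continue
        job = pick(ready, now)
        pending.remove(job)
        now += job[2]                 # run the job to completion
        turnaround[job[0]] = now - job[1]
    return turnaround


def fcfs(ready, now):
    return min(ready, key=lambda job: job[1])      # earliest arrival


def sjf(ready, now):
    return min(ready, key=lambda job: job[2])      # shortest burst


def hrrn(ready, now):
    # response ratio = (waiting time + service time) / service time
    return max(ready, key=lambda job: (now - job[1] + job[2]) / job[2])


jobs = [("A", 0, 8), ("B", 1, 4), ("C", 2, 1)]
for policy in (fcfs, sjf, hrrn):
    result = simulate(jobs, policy)
    print(policy.__name__, result, sum(result.values()) / len(result))
```

On this workload SJF and HRRN both run the short job C before B, which lowers the average turnaround time compared with FCFS.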

Process Synchronization

Critical-Section Problem

Each process has a code segment, called the critical section, in which the shared data is accessed.
Problem – ensure that when one process is executing in its critical section, no other process is allowed to execute in its critical section.

// TODO

Deadlock

Four necessary conditions

Mutual exclusion: only one process at a time can use a resource.
Hold and wait: a process holding at least one resource is waiting to acquire additional resources held by other processes. The requesting process blocks but keeps the resources it has already acquired.
No preemption: a resource can be released only voluntarily by the process holding it, after that process has completed its task.
Circular wait: there exists a set {P0, P1, …, Pn} of waiting processes such that P0 is waiting for a resource that is held by P1, P1 is waiting for a resource that is held by P2, …, Pn–1 is waiting for a resource that is held by Pn, and Pn is waiting for a resource that is held by P0.

Resource-Allocation Graph

Request edges and assignment edges

Safe State

Avoidance algorithms
Single instance of a resource type

Use a resource-allocation graph

Multiple instances of a resource type

Use the banker’s algorithm

Resource-Allocation Graph Algorithm

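Algorithm: suppose process Pi requests resource Rj. The request is granted only if converting the claim edge Pi → Rj into the assignment edge Rj → Pi does not create a cycle in the resource-allocation graph.
A cycle-detection algorithm is used for the check. If no cycle exists, granting the resource leaves the system in a safe state. If a cycle exists, granting it would put the system in an unsafe state, and process Pi must wait.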

Detection:

wait-for graph

Banker Algorithm
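
A sketch of the safety check at the core of the banker's algorithm: repeatedly find a process whose remaining need fits in the currently available resources, let it finish, and reclaim its allocation. The function name and matrix layout are assumptions; the example numbers are the standard five-process, three-resource textbook instance.

```python
def safe_sequence(available, allocation, maximum):
    """Return a safe order of process indices, or None if the state is unsafe."""
    n = len(allocation)
    need = [[m - a for m, a in zip(maximum[i], allocation[i])] for i in range(n)]
    work = list(available)
    finished = [False] * n
    order = []
    progressed = True
    while progressed:
        progressed = False
        for i in range(n):
            if not finished[i] and all(x <= w for x, w in zip(need[i], work)):
                # Process i can run to completion and release what it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                order.append(i)
                progressed = True
    return order if all(finished) else None


allocation = [[0, 1, 0], [2, 0, 0], [3, 0, 2], [2, 1, 1], [0, 0, 2]]
maximum = [[7, 5, 3], [3, 2, 2], [9, 0, 2], [2, 2, 2], [4, 3, 3]]
print(safe_sequence([3, 3, 2], allocation, maximum))
```

A resource request is granted only if the state that would result from granting it still passes this check.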

Main Memory

Contiguous Allocation

fixed partitioning
dynamic partition

Paging

Page table is kept in main memory

Page-table base register (PTBR) points to the page table (on x86: cr3)
Page-table length register (PTLR) indicates the size of the page table
In this scheme every data/instruction access requires two memory accesses: one for the page table and one for the data/instruction.
The two-memory-access problem can be solved with a special fast-lookup hardware cache called associative memory or a translation look-aside buffer (TLB)
Some TLBs store address-space identifiers (ASIDs) in each TLB entry. An ASID uniquely identifies each process to provide address-space protection for that process

Effective Access Time (EAT)
EAT = (t + e) a + (2t + e)(1 – a), where t is the memory access time, e is the TLB lookup time, and a is the TLB hit ratio
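A quick numeric check of the EAT formula, as a sketch. The symbols follow the usual textbook reading (t = memory access time, e = TLB lookup time, a = TLB hit ratio); the concrete numbers are made up for illustration.

```python
def effective_access_time(t, e, a):
    """EAT = (t + e) * a + (2t + e) * (1 - a).

    A TLB hit costs one memory access plus the lookup; a miss costs an
    extra memory access to read the page table.
    """
    hit_cost = t + e
    miss_cost = 2 * t + e
    return a * hit_cost + (1 - a) * miss_cost


# 100 ns memory, 20 ns TLB lookup, 80% hit ratio: roughly 140 ns.
print(effective_access_time(t=100, e=20, a=0.8))
```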

Virtual Memory

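Principle of locality: over any short period of execution, the addresses of the instructions a program executes and of the operands it accesses are each confined to a limited region. It shows up as:

Temporal locality: successive executions of the same instruction, and successive accesses to the same data, cluster within a short period of time.
Spatial locality: the current instruction and its neighboring instructions, and the data currently accessed and its neighboring data, cluster within a small region.

Virtual memory is a storage-management system with demand loading and replacement: a process can run with only part of it loaded into memory, so memory capacity is extended logically.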

The effective memory-access time is
(1 – p) x physical-memory-access + p x ( page-fault-overhead + swap-page-out + swap-page-in + restart-overhead )

Page Replacement Algorithms

First-In-First-Out (FIFO) Algorithm
Optimal Algorithm (OPT)
Least Recently Used (LRU) Algorithm
LRU Approximation Algorithms:
Additional-Reference-Bits Algorithm
Second-Chance (clock) Algorithm
Enhanced Second-Chance Algorithm
Counting-Based Page Replacement:
Least Frequently Used (LFU) Algorithm
Most Frequently Used (MFU) Algorithm
Page Buffering Algorithm
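To make the difference between FIFO and LRU concrete, here is a small page-fault counter for both. It is a sketch: the function names are made up, and the reference string is the common textbook example with 3 frames.

```python
from collections import OrderedDict, deque


def fifo_faults(refs, frames):
    """Count page faults when the oldest loaded page is always evicted."""
    memory, order, faults = set(), deque(), 0
    for page in refs:
        if page in memory:
            continue
        faults += 1
        if len(memory) == frames:
            memory.discard(order.popleft())
        memory.add(page)
        order.append(page)
    return faults


def lru_faults(refs, frames):
    """Count page faults when the least recently used page is evicted."""
    memory, faults = OrderedDict(), 0
    for page in refs:
        if page in memory:
            memory.move_to_end(page)      # mark as most recently used
            continue
        faults += 1
        if len(memory) == frames:
            memory.popitem(last=False)    # drop the least recently used page
        memory[page] = True
    return faults


refs = [7, 0, 1, 2, 0, 3, 0, 4, 2, 3, 0, 3, 2, 1, 2, 0, 1, 7, 0, 1]
print(fifo_faults(refs, 3), lru_faults(refs, 3))  # 15 12
```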

Buddy

Slab

File System

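Structure of a FAT32 disk
The master boot record (MBR) is the first sector of the master boot area. It consists of two parts:

The first part is the master boot code, which occupies the first 446 bytes of the sector. The disk identifier (FD 4E F2 14) sits at the end of this code.
The second part is the partition table. Each entry is 16 bytes long, and the table holds at most 4 entries. The first partition entry starts at offset 0x01BE in the sector.

The extended boot record is similar to the master boot record. If no operating system is installed on the extended partition, the boot-code part is all zeros. The signature word likewise marks the end of an extended-partition boot sector and of a partition boot sector.
When a PC starts, the BIOS boot program runs first: it performs the self-test and loads the master boot record and the partition table. The master boot record then runs and hands control to the boot record of the active partition, which loads the operating system. Finally the operating system runs and configures the system.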

IO System

Polling
interruption
Direct memory access

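Buffering - store data in memory while transferring between devices. A buffer is a memory area that holds data being transferred between two devices or between a device and an application.

Purposes of buffering:

To cope with a speed mismatch between devices
To cope with a mismatch in transfer-block sizes between devices
To support copy semantics

Caching - fast memory holding a copy of data

The difference between a buffer and a cache: a buffer may hold the only existing copy of a data item, whereas a cache only holds a fast copy of data that resides elsewhere.
Caching and buffering are distinct functions, but sometimes one region of memory serves both purposes.
When the kernel receives an I/O request, it first checks the cache to see whether the relevant file contents are already in memory. If so, the physical disk I/O can be avoided or deferred.

SPOOLing (Simultaneous Peripheral Operation On Line): a spool is a buffer that holds output for a device, such as a printer, that cannot accept interleaved data streams.
The operating system solves this problem by intercepting all output to the printer. Each application's output is first spooled to a separate disk file. When the application finishes printing, the spooling system queues the corresponding spool file for output to the printer.
Printing: although a printer is an exclusive device, SPOOLing turns it into a device that multiple users can share.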

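RAID 0: with n disks, instead of writing to one disk at a time and moving to the next when it fills up, RAID 0 writes to all n disks at once. Throughput improves greatly, but with no redundancy, reliability is poor. n is at least 2.
RAID 1: RAID 1 arose because RAID 0 is so unreliable. With n disks, n/2 of them serve as mirrors: whenever data is written to a disk, it is also written to its mirror. When a disk fails, its mirror takes over automatically. Reliability is the best, but space utilization is low. n is at least 2.
RAID 3: with n disks, one serves as the parity disk and the remaining n-1 are read and written together as in RAID 0. When one disk fails, its data can be reconstructed from the parity. The check is a simple parity: 1 XOR 0 XOR 1 = 0, 0 XOR 1 XOR 0 = 1, where the last value is the parity. When one value is missing, it can be computed from the other disks' data and the parity. The drawback is that every read or write involves all the disks, and every write touches the single parity disk; if the parity disk fails, the data survives but the redundancy is gone. At most one failed disk is tolerated. n is at least 3.
RAID 5: similar to RAID 3 in that one disk's worth of capacity holds parity and n-1 disks' worth holds data, but the parity is distributed across all the disks instead of living on a dedicated disk. Distributed parity removes the single-parity-disk bottleneck. At most one failed disk is tolerated. n is at least 3.

RAID 6: adds a second, independent parity to RAID 5. It works like solving equations: each parity gives one equation, so at most two unknowns can be solved for, i.e. at most two disks may fail.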
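The XOR parity used by RAID 3 and RAID 5 can be demonstrated on a few bytes. This is only a toy sketch with made-up block contents and hypothetical helper names (`xor_parity`, `rebuild`), not how a real RAID controller lays out stripes.

```python
from functools import reduce


def xor_parity(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild(surviving_blocks, parity):
    """Recover the single missing block: XOR of everything that is left."""
    return xor_parity(list(surviving_blocks) + [parity])


data = [b"\x01\x02", b"\x00\xff", b"\x01\x0f"]   # three data disks
parity = xor_parity(data)                        # the parity disk

# Disk 1 fails; its contents come back from the other disks plus the parity.
recovered = rebuild([data[0], data[2]], parity)
print(recovered == data[1])
```

Because XOR is its own inverse, the same function computes the parity and reconstructs a lost block, which is also why only one failure can be tolerated with a single parity.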

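Last year's exam questions, as recalled by a senior student

70 multiple-choice questions, 1 point each. About 15 points cover lab material.

Three short-answer questions:

A file system uses indexed allocation with 16 direct indexes and one each of single, double, and triple indirect indexes. Block size = 1024 B, and a block number fits into 4 bytes. Compute the maximum supported file size.
buddy memory allocation.
hashed page table, inverted page table.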
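For the first question, the arithmetic can be laid out as below. This is a worked sketch under the stated assumptions (1024-byte blocks, 4-byte block numbers, 16 direct pointers, one pointer at each indirection level); the variable names are made up.

```python
BLOCK_SIZE = 1024      # bytes per block
POINTER_SIZE = 4       # bytes per block number
DIRECT = 16            # direct index entries

# One index block holds 1024 / 4 = 256 block numbers.
pointers_per_block = BLOCK_SIZE // POINTER_SIZE

max_file_blocks = (
    DIRECT
    + pointers_per_block          # single indirect
    + pointers_per_block ** 2     # double indirect
    + pointers_per_block ** 3     # triple indirect
)
max_file_bytes = max_file_blocks * BLOCK_SIZE

print(max_file_blocks, max_file_bytes)   # 16843024 blocks, 17247256576 bytes
print(max_file_bytes / 2 ** 30)          # a little over 16 GiB
```

The triple-indirect term dominates: 256^3 blocks of 1 KiB is already 16 GiB.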

Computer Vision Review Notes

2018-01-16T09:26:44.000Z

Intro

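Computer vision

The science and technology of using computers to simulate the overt, macroscopic visual functions of living organisms. The main goal of a computer vision system is to create or recover a model of the real world from images, and then to understand the real world.

The central task of computer vision is image understanding

Understanding a single image
Understanding multiple images
Understanding video

Understanding what?

Shape, position, motion, category

Core problems

The five major research areas of computer vision

1) Input devices: imaging devices and digitization devices. Imaging devices sense the surrounding scene or objects with optical cameras or with infrared, laser, ultrasound, or X-ray sensors, producing 2D or 3D digital images of the scene or objects.

2) Low-level (early) vision: processes the raw input image. This stage borrows heavily from image processing, e.g. filtering, enhancement, edge detection, texture detection, and motion detection, in order to extract basic scene features such as corners, edges, lines, boundaries, color, texture, and motion.

3) Middle-level vision: recovers 2.5D information about the scene, such as depth, surface normals, and contours. Approaches include stereo vision, range imaging (rangefinders), and shape from X (X = shading, texture, motion). System calibration and imaging models are usually studied at this level as well, as are segmentation and fitting.

4) High-level vision: working in an object-centered coordinate system and building on the raw image, the basic image features, and the 2.5D sketch, recovers a complete 3D representation of each object, builds its 3D description, recognizes it, and determines its position and orientation.

5) System architecture: studies the structure of the system at a high level of abstraction, based on the system model rather than on specific implementation examples. An analogy is the difference between the architectural style of a period (e.g. the Qing dynasty) and a specific building designed in that style. Architecture research covers parallel, hierarchical, information-flow, and topological structures, and the path from design to implementation.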

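Marr's computational theory of vision

Marr's computational theory of vision is grounded in computer science and systematically summarizes the major results of psychophysiology, neurophysiology, and related fields. It is the most complete theory of vision to date.
The theory gave computer vision research a fairly clear framework and greatly advanced the field. It is widely held that the formation of computer vision as a discipline is closely tied to Marr's theory.

Three levels of information-processing analysis

Computational level	Representation and algorithm level	Implementation level
What is the goal of the computation? Why is this computation appropriate? What is the strategy for carrying it out?	How can the computation be carried out? What are the input and output representations? What transformations map one representation to another?	How are these representations and algorithms realized physically?

Three stages of the visual representation framework

Stage 1 (Primal Sketch): process the raw input image and extract basic features such as corners, edges, textures, lines, and boundaries. The set of these features is called the primal sketch;

Stage 2 (2.5D Sketch): in a viewer-centered coordinate system, recover the depth, surface normals, contours, etc. of the visible part of the scene from the input image and the primal sketch. This carries depth information but is not a true 3D representation of the objects, hence the name 2.5D sketch;

Stage 3 (3D Model): in an object-centered coordinate system, recover, represent, and recognize 3D objects from the input image, the primal sketch, and the 2.5D sketch.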

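Gestalt theory

Gestalt theory captures certain aspects of the nature of human vision, but it gives only an axiomatic description of the basic principles of perceptual organization (descriptive), not a mechanistic explanation (explanatory).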

Law of Proximity
- Elements that are closer together will be perceived as a coherent object
Law of Similarity
- Elements that look similar will be perceived as part of the same form
Law of Good Continuation
- Humans tend to continue contours whenever the elements of the pattern establish an implied direction
Law of Closure
- Humans tend to enclose a space by completing a contour and ignoring gaps in the figure.
Law of Prägnanz (good form)
- A stimulus will be organized into as good a figure as possible. Here, good means symmetrical, simple, and regular.
Law of Figure/Ground
- A stimulus will be perceived as separate from its ground.

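Binary images

Binarization

Otsu's method:

First compute the histogram of the image
For a candidate threshold, put pixels with intensity above the threshold in one group and those below it in another.
Compute the variance of each group and add the two within-class variances (weighted by group size in the standard formulation).
Try every value from 0 to 255 as the threshold; the one that minimizes the within-class variance sum is the result.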
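The procedure above can be sketched in pure Python as an exhaustive search over thresholds. This is an illustration rather than an optimized implementation: the function name is made up, the input is assumed to be a flat list of 8-bit gray values, and the size-weighted form of the within-class variance is used.

```python
def otsu_threshold(pixels):
    """Return the threshold t that minimizes the weighted within-class
    variance of the groups {p <= t} and {p > t}.

    pixels: iterable of integers in 0..255.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = sum(hist)

    def stats(lo, hi):
        """Count and variance of gray levels lo..hi (inclusive)."""
        n = sum(hist[lo:hi + 1])
        if n == 0:
            return 0, 0.0
        mean = sum(v * hist[v] for v in range(lo, hi + 1)) / n
        var = sum(hist[v] * (v - mean) ** 2 for v in range(lo, hi + 1)) / n
        return n, var

    best_t, best_score = 0, float("inf")
    for t in range(256):
        n_low, var_low = stats(0, t)
        n_high, var_high = stats(t + 1, 255)
        if n_low == 0 or n_high == 0:
            continue  # a threshold that leaves one group empty is useless
        score = (n_low * var_low + n_high * var_high) / total
        if score < best_score:
            best_t, best_score = t, score
    return best_t


# Two well separated populations: dark pixels near 10, bright pixels near 200.
pixels = [10] * 50 + [12] * 50 + [200] * 50 + [205] * 50
t = otsu_threshold(pixels)
print(12 <= t < 200)
```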

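Geometric properties

Area (zeroth-order moment)

$$
A = \sum_{i=0}^{n-1}\sum_{j=0}^{m-1}B[i,j]
$$

Region center (first-order moments)

$$
\overline x = \frac{ \sum_{i=0}^{n-1}\sum_{j=0}^{m-1}j B[i,j]}{A} \\
\overline y = \frac{ \sum_{i=0}^{n-1}\sum_{j=0}^{m-1}i B[i,j]}{A}
$$

Orientation

– Some shapes (e.g. circles) have no orientation;
– Assume the object is elongated and take the direction of its long axis as its orientation;

Finding the orientation: a minimization problem

Least squares, where $r_{ij}$ is the distance from point $(i, j)$ to the line
$$
\chi^2 = \sum_{i=0}^{n-1}\sum_{j=0}^{m-1}r_{ij}^2B[i,j]
$$

Elongation

$$
E = \frac{\chi_{max}}{\chi_{min}}
$$

Compactness

$\rho$ is the perimeter
$$
C = \frac{A}{\rho^2}
$$
circle > square > rectangle

Aspect ratio

Ratio of the length to the width of the region's minimum bounding rectangle

Euler number

Genus

Number of connected components minus number of holes

$$ E = C - H $$

Invariant to translation, rotation, and scaling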

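Distance measures

metric:
- d(p,q) >= 0, and d(p,q) = 0 if and only if p = q
- d(p,q) = d(q,p)
- d(p,r) <= d(p,q) + d(q,r)

Common distances
- Euclidean distance
- City-block distance
- Chessboard distance (max)
- Minkowski distance (p-norm distance) $L_p$

Projections

Horizontal projection: count the pixels equal to 1 in each column.

Vertical projection: count the pixels equal to 1 in each row.

Diagonal projection: from bottom-left to top-right, count the pixels equal to 1 on each diagonal.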

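Connected component labeling

4-connected neighbors / paths

8-connected neighbors / paths

Connectivity is an equivalence relation: reflexive, symmetric, transitive

Connected component: a set of connected pixels

Recursive algorithm

(1) Scan the image to find an unlabeled foreground pixel (value 1) and assign it a label L
(2) Recursively assign label L to its neighbors
(3) If no unlabeled foreground pixels remain, stop
(4) Go back to step (1)

Sequential algorithm

Sequential algorithm (for 4-connectivity)

(1) Scan the image left to right, top to bottom
(2) If the pixel value is 1, then (four cases)

If exactly one of the upper and left neighbors has a label, copy that label
If both have the same label, copy that label
If they have different labels, copy the upper neighbor's label and enter the two labels in the equivalence table as equivalent
Otherwise assign a new label to this pixel and enter it in the equivalence table

(3) If there are more pixels to consider, go back to (2)
(4) Find the lowest label in each equivalence set of the equivalence table
(5) Scan the image again, replacing each label with the lowest label in its equivalence set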
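The sequential (two-pass) algorithm for 4-connectivity can be sketched as follows. The image is assumed to be a list of rows of 0/1 values, and the equivalence table is kept as a small union-find structure; both choices and all names are assumptions made for this example.

```python
def label_components(image):
    """Two-pass 4-connected labeling. Returns a label image (0 = background)."""
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = {}  # equivalence table: label -> representative label

    def find(x):
        while parent[x] != x:
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[max(ra, rb)] = min(ra, rb)  # keep the lowest label as root

    next_label = 1
    # First pass: provisional labels plus equivalences.
    for r in range(rows):
        for c in range(cols):
            if image[r][c] != 1:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                labels[r][c] = up
                union(up, left)
            elif up or left:
                labels[r][c] = up or left
            else:
                parent[next_label] = next_label
                labels[r][c] = next_label
                next_label += 1
    # Second pass: replace each label by the lowest label of its equivalence set.
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                labels[r][c] = find(labels[r][c])
    return labels


u_shape = [[1, 0, 1],
           [1, 0, 1],
           [1, 1, 1]]
print(label_components(u_shape))  # both arms end up with label 1
```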

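Boundary following algorithm

(1) Scan the image left to right, top to bottom to find the starting point s(0) of region S
(2) Let c denote the boundary pixel currently being followed; set c = s(k), and let b be the left neighbor of c
(3) Going counterclockwise from b, denote the eight 8-neighbors of c as n1, n2, …, n8
(4) Starting from b and going counterclockwise, find the first ni in S
(5) Set c = s(k) = ni, b = ni-1
(6) Repeat steps (3), (4), (5) until s(k) = s(0)

Morphological operators

Dilation (Dilate)
Erosion (Erode)
Opening (Open)
Closing (Close)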

Edge Detection

Basic idea:

The derivative of the image function reflects how strongly the gray level changes.
Edges are local maxima of the first derivative, or zero crossings of the second derivative.

Templates & convolution

Template (kernel): a matrix that represents an operator. A convolution template is centered on each pixel in an image and generates new output pixels.
Convolution: using the template, the new pixel value is computed by multiplying each pixel value in the neighborhood by the corresponding weight in the convolution mask and summing these products.

Edge detection based on the first derivative

Gradient

$$
G(x, y) = \begin{bmatrix}
G_x \\
G_y
\end{bmatrix}
= \begin{bmatrix}
\frac{\partial f}{\partial x} \\
\frac{\partial f}{\partial y}
\end{bmatrix}
\\
\lvert G(x,y) \rvert = \sqrt{G_x^2 + G_y^2} \\
\lvert G(x,y) \rvert \approx \lvert G_x \rvert + \lvert G_y \rvert \\
\lvert G(x,y) \rvert \approx \max(\lvert G_x \rvert, \lvert G_y \rvert)
$$

Gradient direction
$$
\alpha(x, y) = \arctan(G_y / G_x)
$$
The gradient direction is the direction of the greatest rate of change of the function.

Approximating the partial derivatives with finite differences
$$
G_x = f \lbrack x+1, y \rbrack - f \lbrack x,y \rbrack
\\
G_y = f \lbrack x,y \rbrack - f \lbrack x, y+1\rbrack
\\
G_x = \begin{bmatrix}
-1 & 1
\end{bmatrix}
\\
G_y = \begin{bmatrix}
1 \\
-1
\end{bmatrix}
$$

Roberts cross operator

Sobel operator

Prewitt operator

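Edge detection based on the second derivative

Zero crossings of the second derivative of image intensity correspond to edge points.

Laplacian operator

The 2D equivalent of the second derivative
$$
\nabla ^2 f = \frac{\partial ^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} \\
\nabla^2 \approx \begin{bmatrix}
0 & 1 & 0 \\
1 & -4 & 1 \\
0 & 1 & 0
\end{bmatrix} \\
\nabla^2 \approx \begin{bmatrix}
1 & 4 & 1 \\
4 & -20 & 4 \\
1 & 4 & 1
\end{bmatrix}
$$

LoG edge detection

Laplacian of Gaussian

The smoothing filter is a Gaussian
The second derivative is computed with the Laplacian operator
An edge is declared at a zero crossing of the second derivative that coincides with a large peak of the first derivative

Linear interpolation is used to estimate the edge position at sub-pixel resolution

Two equivalent ways to compute it

Convolve the image with a Gaussian, then take the Laplacian of the result
Take the Laplacian of the Gaussian, then convolve it with the image

Mexican hat operator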

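Canny edge detector

Algorithm steps:

Smooth the image with a Gaussian filter
Compute the gradient magnitude and direction using finite differences of the first partial derivatives
Apply non-maximum suppression (NMS) to the gradient magnitude
Detect and link edges with double thresholding

Why a Gaussian filter?
Noise smoothing and edge detection pull in opposite directions; using the first derivative of a Gaussian gives the best balance between the two

Non-maximum suppression

Remove points whose magnitude is not a local maximum.

Quantize the gradient angle into one of four sectors of the circle so that suppression can be done with a 3×3 window //TODO
Quantized direction:
ζ[i, j] = Sector(θ[i, j])
Suppress to obtain a new magnitude image:
N[i, j] = NMS(M[i, j], ζ[i, j])

How is suppression done? If M[i,j] is not larger than the magnitudes of both neighbors along the gradient direction, set N[i,j] = 0

Double thresholding

Double thresholding and edge linking
(a) Apply a high and a low threshold (T2, T1) to the new magnitude image N[i,j] to get two edge maps: a high-threshold map and a low-threshold map.
High-threshold map: N[i,j] > T2;
Low-threshold map: N[i,j] > T1
(b) Link edges in the high-threshold map; where a contour breaks, search the 8-neighborhood in the low-threshold map for edge points.

Threshold too low: false edges;
Threshold too high: parts of the contour are lost.
Using two thresholds is a more effective thresholding scheme.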

Local Feature

Harris Corner Detector

$$
E(u,v) = \sum_{x,y}w(x,y)[I(x+u, y+v) - I(x, y)]^2 \\
E(u,v) \approx \sum_{x,y}w(x,y)[uI_x + vI_y]^2 \\
E \cong \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix}u \\ v\end{bmatrix} \\
M = \sum_{x,y}w(x,y)\begin{bmatrix} I_x^2 & I_xI_y \\ I_xI_y & I_y^2 \end{bmatrix} \\
\det M = \lambda_1 \lambda_2 \\
\operatorname{trace} M = \lambda_1 + \lambda_2 \\
R = \det M - k (\operatorname{trace} M)^2
$$
If R > 0 (above some threshold), the point is a corner; if R < 0, it is an edge; if |R| is very small, the region is flat.

Take the local maxima of R among the qualifying points as the result.
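A tiny numeric sketch of the response R for the three cases. The entries a, b, d stand for the windowed sums of I_x^2, I_xI_y and I_y^2 in M; the function name and the sample numbers are made up for illustration, and k = 0.04 is a commonly used value.

```python
def harris_response(a, b, d, k=0.04):
    """R = det(M) - k * trace(M)^2 for M = [[a, b], [b, d]]."""
    det = a * d - b * b
    trace = a + d
    return det - k * trace * trace


print(harris_response(10.0, 0.0, 10.0))   # strong gradients in both directions: R > 0 (corner)
print(harris_response(10.0, 0.0, 0.0))    # gradient in one direction only: R < 0 (edge)
print(harris_response(0.01, 0.0, 0.01))   # almost no gradient: |R| is tiny (flat)
```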

Scale Invariant Detection: Summary

Given: two images of the same scene with a large scale difference between them
Goal: find the same interest points independently in each image
Solution: search for maxima of suitable functions in scale and in space (over the image)

Methods:

Harris-Laplacian [Mikolajczyk, Schmid]: maximize Laplacian over scale, Harris’ measure of corner response over the image
SIFT [Lowe]: maximize Difference of Gaussians over scale and space

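Laplacian Harris Corner

Design a scale-invariant function so that the same regions are found in images at different scales

Detect keypoints at multiple scales
Then find local maxima across adjacent scales
Discard points below the threshold

Laplacian

Difference of Gaussians

$$
L = \sigma^2(G_{xx}(x, y, \sigma) +G_{yy}(x, y, \sigma)) \\
DoG = G(x,y,k\sigma) - G(x,y,\sigma)
$$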

SIFT Descriptor

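Basic steps

Build the scale space and the image pyramid.
Find extrema (the maximum / minimum among the 26 neighboring points)
Remove poor feature points: use an approximate Harris corner measure to refine keypoint location and scale, and discard edge responses.
Place a 16x16 window around the feature point
Divide the 16x16 window into 16 cells of 4x4
Compute the edge orientation at each pixel in the window (gradient angle minus 90°)
Discard weak edges (by thresholding) and describe the result with a histogram
Quantize the orientations in each cell into 8 directions, giving 16x8 = 128 values in total

Why use only gradient information?
Gradients capture edge information and are robust to illumination changes
How is rotation invariance achieved?
Orientations are measured relative to a reference orientation at the keypoint rather than as absolute directions
Why is it scale invariant?
The image is resampled at different scales of Gaussian blur (as in an image pyramid), and a gradient is stored as a descriptor only where a maximum is observed between two scales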

Curves

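Hough transform

The Hough transform is a parameter-estimation method based on voting and an important shape-detection technique
Basic idea: every point in the image votes for parameter combinations, and the combination that wins the most votes is the result

Represent lines in polar form, mapping from $(x,y)$ space to $(\rho , \theta)$ space.

Steps:

Quantize the parameter space appropriately (adequate precision is enough)
Treat each cell of the parameter space as an accumulator and initialize all accumulators to 0
For each point in image space, add 1 to every accumulator whose parameters satisfy the line equation through that point
The maximum of the accumulator array gives the parameters of the model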
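The voting steps can be sketched for straight lines with the parameterization rho = x cos(theta) + y sin(theta). This is a minimal illustration: the function name, the bin sizes, and the use of a `Counter` as the accumulator array are all assumptions made for the example.

```python
import math
from collections import Counter


def hough_line(points, theta_steps=180, rho_step=1.0):
    """Return (rho, theta, votes) of the most voted line."""
    accumulator = Counter()            # quantized (rho, theta) -> votes, starts at 0
    for x, y in points:
        for i in range(theta_steps):   # each point votes once per theta bin
            theta = math.pi * i / theta_steps
            rho = x * math.cos(theta) + y * math.sin(theta)
            accumulator[(round(rho / rho_step), i)] += 1
    (rho_bin, theta_bin), votes = accumulator.most_common(1)[0]
    return rho_bin * rho_step, math.pi * theta_bin / theta_steps, votes


# Ten points on the horizontal line y = 5.
points = [(x, 5) for x in range(0, 100, 10)]
rho, theta, votes = hough_line(points)
print(rho, math.degrees(theta), votes)
```

For the horizontal line y = 5 the winning cell is rho = 5 at theta = 90 degrees, with one vote from each of the ten points.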

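Fourier Transform

Transform: represent the signal with sinusoids. A 2D image is represented by the following basis images:

Low and high frequencies: places where brightness changes sharply are high frequency (a measure of image edges and contours) and correspond to edges;

places with little change are low frequency (an overall measure of image intensity) and correspond to large uniform regions.

Up close we see the high-frequency components; from a distance we see the low-frequency components.

Why is each level of the Laplacian pyramid a band-pass filter?

Each level of the Laplacian pyramid is the difference between an image and its downsampled-then-upsampled version.

The subtraction keeps the detail: high-pass

Downsampling suppresses noise: low-pass

Image Pyramids

“Gaussian” Pyramid
“Laplacian” Pyramid

$$
L_i = G_i - expand(G_{i+1})
$$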

PCA

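Principal component analysis (PCA)

Used to reduce the dimensionality of a data set.

Choose a new coordinate system for linear dimensionality reduction such that the first axis is the direction of largest projected variance, the second axis is the direction of second-largest variance, and so on.

Eigenface

Preprocessing: crop according to the eye positions and equalize the gray levels.
Concatenate the rows of each 2D face image into a single column vector; stack all the column vectors together and compute the mean face.
Compute the covariance matrix of the images.
Compute the eigenvalues and the normalized eigenvectors of the covariance matrix; the eigenvectors are the eigenfaces.

Recognition

Project both images into face space and compare the Euclidean distance between the projection vectors.

Reconstruction

Project the image into face space, then recover it by left-multiplying with the eigenface matrix.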

Camera Model

Thin Lenses

$$
\frac{1}{d_o} +\frac{1}{d_i} = \frac{1}{f}
$$

DOF: Depth of Field

FOV: Field of View focal length
$$
\varphi = tan^{-1}(\frac{d}{2f})
$$

Pinhole Camera Model

Basic projection

$$
\frac{-x}{f} = \frac{X}{Z}
\\
-x = f \frac{X}{Z}
\\
(x, y,z) \rightarrow (-d \frac{x}{z}, -d\frac{y}{z}, -d)
$$

perspective projection

projection matrix
$$
\begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & -1/d & 0
\end{bmatrix} \begin{bmatrix}
x \\ y \\ z \\ 1
\end{bmatrix} = \begin{bmatrix}
x \\ y \\ z \\ -z/d
\end{bmatrix} \rightarrow \begin{bmatrix}
-d\frac{x}{z} \\ -d\frac{y}{z} \\ -d \\ 1
\end{bmatrix} \rightarrow (-d\frac{x}{z} ,-d\frac{y}{z})
$$

Intrinsic Parameters

$$
(f_x, f_y, c_x, c_y)
$$

$$
\begin{bmatrix}
f_x & 0 & c_x \\
0 & f_y & c_y \\
0 & 0 & 1
\end{bmatrix}
$$

f: focal length expressed in pixels

c: offset of the image center from the coordinate origin

Lens Distortion

Radial Distortion

Caused by imperfect lenses: Geometry of Lens, Aperture Position(几何性质，光圈位置)

Deviations are most noticeable for rays that pass through the edge of the lens

Common forms are pincushion distortion and barrel distortion:

Correcting

k1, k2, k3
$$
x_{corrected} = x(1+k_1r^2+k_2r^4+k_3r^6) \\
y_{corrected} = y(1+k_1r^2+k_2r^4+k_3r^6)
$$

Tangential Distortion

The decentering of the optical component (assembly process)

Caused by the sensor (e.g. a CMOS chip) being mounted at a tilt, so that it is not parallel to the image plane
The closer to the center, the smaller the distortion

Correcting

p1, p2
$$
x_{corrected} = x + [2p_1xy+p_2(r^2+2x^2)] \\
y_{corrected} = y + [p_1(r^2+2y^2)+2p_2xy]
$$

Distortion Parameters

$$
(k_1, k_2, p_1, p_2, k_3)
$$

Extrinsic Parameters

$$
(\theta, \phi, \varphi,t_x, t_y, t_z)
$$

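Image formation pipeline

Roles of the intrinsic, extrinsic, and distortion parameters at each stage of imaging (from a 3D object to the actual image)

Step 1 transforms world coordinates into camera coordinates. This is a 3D-to-3D mapping that involves R, t, etc. (the camera extrinsics)
Step 2 transforms camera coordinates into image-plane (pixel) coordinates. This is a 3D-to-2D mapping that involves K, etc. (the camera intrinsics)
Finally the distortion parameters are applied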

Motion Estimation

Optical Flow

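What problem does optical flow solve?

It estimates pixel motion from image H to image I: given a pixel in H, find a nearby pixel of the same color in I. In other words, it solves a pixel correspondence problem.

What are the three basic assumptions of optical flow?

Brightness constancy: $I(x+u, y+v, t+1) = I(x, y, t)$

Spatial coherence

Small motion

Where is optical flow most reliable, and why?

In richly textured regions: the gradients are large and point in different directions, so the eigenvalues obtained are large

Derivation

$$
0 = I(x+u, y+v) - H(x, y) \\
\approx [I(x,y)-H(x,y)] + uI_x + vI_y \\
\approx I_t + uI_x + vI_y \\
\approx I_t + \nabla I \cdot [u, v]
$$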

Camera Calibration

Given:

N correspondences b/w scene and images

Recover the camera parameters:

Distortion coefficients, intrinsic para., extrinsic para

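Camera calibration based on a pattern / reference object

Given: N corner points of the calibration object seen from K views (on a chessboard, two points yield four equations)
Solve for: all the parameters. N points in K views give 2NK equations and introduce 6K+4 unknowns, so we need 2NK > 6K+4.

Outline of the procedure

Obtain the positions of the calibration grid's corners in the world coordinate system
Find the corners in the images
Write down the equations relating image coordinates to world coordinates
Solve for the camera parameters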

Two View Vision

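Triangulation

$$
\frac{T-(x^l-x^r)}{Z-f} = \frac{T}{Z} \rightarrow Z = \frac{fT}{x^l-x^r}
$$

Disparity: $d=x^l-x^r$, the offset between the positions at which the left and right cameras image the same point

The error in Z comes mainly from the denominator (the disparity). When the disparity is small, a small error in it has a large effect on Z.

The larger T is, the larger the disparity and the smaller the depth error

The larger T is, the smaller the visible range (because only the overlap of the two views is used)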
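The triangulation formula and its sensitivity to disparity error can be checked numerically. This is a sketch with made-up numbers (focal length in pixels, baseline in meters); the function name is an assumption for the example.

```python
def depth_from_disparity(f, T, x_left, x_right):
    """Z = f * T / (x_left - x_right) for a rectified stereo pair."""
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the cameras")
    return f * T / disparity


# f = 700 px, baseline T = 0.1 m, disparity 35 px -> depth of about 2 m.
print(depth_from_disparity(700, 0.1, 350, 315))

# The same 1 px disparity error matters far more when the disparity is small:
near = depth_from_disparity(700, 0.1, 35, 0) - depth_from_disparity(700, 0.1, 36, 0)
far = depth_from_disparity(700, 0.1, 2, 0) - depth_from_disparity(700, 0.1, 3, 0)
print(near, far)
```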

How To Do Stereo

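Undistortion: remove the effect of lens distortion
Rectification: align the two cameras so that the images are row-aligned
Correspondence: find matching points and compute the disparity
Reprojection: triangulate to obtain a depth map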

3D data acquisition

Structured lighting: system components

Structured-light projector + CCD camera + depth reconstruction system

projector (one or more), CCD camera (one or more), and depth recovery system

How structured light acquires 3D data

encoding

Bit codes

3：4567

2：2367

1: 1457

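ICP - Iterative Closest Point

ICP: the iterative closest point method (used to register scans from multiple cameras, i.e. to stitch several scans together into a complete description of the scanned object)

Basic steps:

Given two 3D point sets X and Y, register Y to X:

Establish correspondences between the two scans
Iteratively find an affine transformation that describes the relation between the corresponding points from step 1
Apply the transformation found in the previous step to Y, updating Y
Take the closest points in the two sets as corresponding points and compute the distance between them; if it exceeds a threshold, repeat steps 2 and 3, otherwise stop

Finding the transformation F thus becomes a search for the closest points that minimize the cost, which is why the algorithm is called ICP.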

Image Segmentation

K-means

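Step 1: pick any k SIFT feature points as the initial cluster centroids.

Step 2: for each SIFT feature point, compute its Euclidean distance to the k centroids and assign it to the cluster of the nearest centroid.

Step 3: for each cluster, update the centroid to the mean of all its members.

Step 4: compare the clusters before and after the update. If no points are re-assigned, stop; otherwise return to step 2. If the number of iterations grows too large, clustering has failed.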
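The four steps map directly onto a short pure-Python loop. This is a sketch under simple assumptions: points are tuples of numbers (plain 2D points here rather than SIFT descriptors), the initial centroids are sampled with a fixed seed, and the function name and parameters are made up.

```python
import random


def kmeans(points, k, max_iter=100, seed=0):
    """Return (centroids, assignment) once no point changes cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)              # step 1: arbitrary initial centroids
    assignment = None
    for _ in range(max_iter):
        # Step 2: assign every point to its nearest centroid (squared Euclidean distance).
        new_assignment = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Step 4: stop when no point is re-assigned.
        if new_assignment == assignment:
            return centroids, assignment
        assignment = new_assignment
        # Step 3: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    raise RuntimeError("k-means did not converge within max_iter iterations")


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
```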

Pros
- Simple and fast
- easy to implement
Cons
- Need to choose K
- Sensitive to outliers
Usage
- Rarely used for pixel segmentation

Graph Cut

Input: User provides rough indication of foreground region.
Goal: Automatically provide a pixel-level segmentation.

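Visual Recognition

Basic tasks and challenges

What are the main categories of basic tasks?

Image classification
Object detection and localization / image segmentation
Estimating semantic and geometric attributes
Classifying human activities and events

What are the challenges?

Viewpoint variation
Illumination changes
Scale variation
Object deformation
Occlusion
Background clutter
Intra-class variation

Object classification based on bag of words (BoW)

What does the BoW (bag-of-words) of an image mean?

A word in an image is defined as the feature vector of an image patch; the BoW model of an image is the histogram built from the feature vectors of all its patches

Basic steps

Feature extraction and representation
Build the dictionary (codewords dictionary) by clustering the training samples
Represent an image as a histogram over the dictionary
Classify unknown images based on their bag of words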

Stitching

Detect key points
Build the SIFT descriptors
Match SIFT descriptors
Fitting the transformation
RANSAC
Image Blending

RANSAC

RANdom SAmple Consensus

Approach: we want to avoid the impact of outliers, so let’s look for “inliers”, and use only those.

Intuition: if an outlier is chosen to compute the current fit, then the resulting line won’t have much support from rest of the points.

RANSAC loop:

Randomly select a seed group of points on which to base transformation estimate (e.g., a group of matches)
Compute transformation from seed group
Find inliers to this transformation
If the number of inliers is sufficiently large, recompute least-squares estimate of transformation on all of the inliers
Keep the transformation with the largest number of inliers

n, k, t, d

Pros: a general-purpose method for a wide range of model-fitting problems; simple to apply and fast to compute.
Cons: only works when there are not too many outliers (a voting scheme can handle a high outlier ratio)
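The RANSAC loop can be sketched for the simplest model, a 2D line y = m x + c. The reading of the four parameters below follows the usual convention and is an assumption here: n = points per seed group, k = number of iterations, t = inlier distance threshold, d = minimum number of inliers for a model to count. All function names are made up for the example.

```python
import random


def fit_line(points):
    """Least-squares fit of y = m * x + c."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    m = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    return m, (sy - m * sx) / n


def ransac_line(points, n=2, k=100, t=0.5, d=5, seed=0):
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(k):
        sample = rng.sample(points, n)               # random seed group
        if len({x for x, _ in sample}) < 2:
            continue                                 # vertical seed: cannot fit y = m x + c
        m, c = fit_line(sample)                      # model from the seed group
        inliers = [(x, y) for x, y in points if abs(y - (m * x + c)) <= t]
        if len(inliers) >= d and len(inliers) > len(best_inliers):
            # Enough support: refit on all inliers and keep the best model so far.
            best_model, best_inliers = fit_line(inliers), inliers
    return best_model, best_inliers


line_points = [(x, 2 * x + 1) for x in range(10)]
outliers = [(2, 30), (5, -20), (7, 50)]
model, inliers = ransac_line(line_points + outliers)
print(model, len(inliers))
```

A plain least-squares fit over all 13 points would be pulled away by the three outliers; the consensus step is what keeps them out of the final estimate.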

Ellipse Detection and Fitting

2018-01-07T08:54:27.000Z

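Goal

Use CvBox2D cvFitEllipse2( const CvArr* points ) to fit ellipses

Environment

Windows 10 1709
OpenCV 3.3

Procedure

I implemented a fitEllipse() function with the following prototype:

void fitEllipse(char* filename, int threshold);

It takes an image path and displays the image with the fitted ellipses drawn on it.

The image path can be passed on the command line.

FitEllipse.exe test.png

If no path is given, it defaults to test.png in the current directory.

First read the image twice: once as grayscale and once as the original.

Mat gray_img = imread(filename, IMREAD_GRAYSCALE);
Mat result = imread(filename);

Binarize the grayscale image:

Mat binary_img = gray_img >= thresh;

Then use findContours() to detect the contour points of the binary image.

findContours(binary_img, contours, RETR_LIST, CHAIN_APPROX_NONE);

The third argument can be:

RETR_EXTERNAL: retrieve only the outermost contours; inner contours enclosed by them are ignored
RETR_LIST: retrieve all contours, inner and outer, without establishing any hierarchy; the contours are independent of each other, so there are no parent or nested contours in this mode
RETR_CCOMP: retrieve all contours and organize them into a two-level hierarchy, with outer boundaries at the top level and hole boundaries at the second level; a contour nested inside a hole is placed at the top level again
RETR_TREE: retrieve all contours and build a full hierarchy tree. Outer contours contain inner contours, which can in turn contain nested contours.

RETR_LIST is enough for ellipse fitting here.

The fourth argument can be:

CHAIN_APPROX_NONE: store every contour point along the object boundary in the contours vector
CHAIN_APPROX_SIMPLE: store only the corner points of the contour; points on the straight segments between corners are dropped

CHAIN_APPROX_TC89_L1, CHAIN_APPROX_TC89_KCOS: use the Teh-Chin chain approximation algorithm

Here we simply choose CHAIN_APPROX_NONE.

Then fit an ellipse to each detected contour:

for (const auto& contour : contours)
{
  if (contour.size() < 6) continue;
  RotatedRect box = fitEllipse(contour);
  ellipse(result, box, Scalar(0, 255, 255), 1, LINE_AA);
}

Contours with fewer than 6 points are discarded (cv::fitEllipse() itself needs at least 5 points). The remaining contours are fitted with cv::fitEllipse(), and the ellipses are drawn on the original image.

After that, the result is saved.

Results

Original image:

Result:

The ellipses are essentially all detected and fitted.

Takeaways

This experiment first detects the contour points of the image and then fits ellipses with fitEllipse(); overall it is not too hard. One thing worth noting is that the second argument of imread() selects the read mode: IMREAD_GRAYSCALE makes it load the image as a single-channel matrix.

Appendix: source code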

// main.cpp
// The original header names did not survive; these two cover everything used below.
#include <iostream>
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;

void fitEllipse(char* filename, int threshold);

int main(int argc, char** argv) {
// Default to test.png in the current directory when no path is given.
char default_name[] = "test.png";
char* filename = (argc == 2) ? argv[1] : default_name;

fitEllipse(filename, 150);
waitKey(0);
destroyAllWindows();

return 0;
}


void fitEllipse(char* filename, int thresh) {

Mat gray_img = imread(filename, IMREAD_GRAYSCALE);
Mat result = imread(filename);
vector<vector<Point>> contours;
// Pixels at or above the threshold become 255, all others 0.
Mat binary_img = gray_img >= thresh;
findContours(binary_img, contours, RETR_LIST, CHAIN_APPROX_NONE);
for (const auto& contour : contours)
{
// Skip contours that are too short to fit an ellipse reliably.
if (contour.size() < 6) continue;
RotatedRect box = fitEllipse(contour);
ellipse(result, box, Scalar(0, 255, 255), 1, LINE_AA);
}

imwrite("result.png", result);
imshow(filename, gray_img);
imshow("result", result);
}

Harris Corner Detector

2017-12-20T15:22:32.000Z

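Implement the Harris Corner Detector and output the results together with the intermediate steps.

Procedure

I implemented a HarrisCornerDetector function with the following prototype:

void HarrisCornerDetector(Mat& src, Mat& R, int aperture_size, double k);

The implementation follows the computation steps of the Harris corner detector.

First convert the color image to a single-channel grayscale image to simplify the computation.

cvtColor(src, src_gray, cv::COLOR_BGR2GRAY);

Then compute the derivatives in the X and Y directions. This is essentially a convolution with an operator; here I use the Sobel operator, which builds a kernel of size aperture_size to approximate the derivatives.

Sobel(src_gray, Ix, CV_32FC1, 1, 0, aperture_size);
Sobel(src_gray, Iy, CV_32FC1, 0, 1, aperture_size);

Then compute IxIx, IxIy, IyIy over the whole derivative images: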

for (int i = 0; i < size.height; ++i) {
    for (int j = 0; j < size.width; ++j) {
        IxIx.at<float>(i,j) = Ix.at<float>(i, j) * Ix.at<float>(i, j);
        IxIy.at<float>(i,j) = Ix.at<float>(i, j) * Iy.at<float>(i, j);
        IyIy.at<float>(i,j) = Iy.at<float>(i, j) * Iy.at<float>(i, j);
    }
}

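This gives the entries of the matrix:

$$
\begin{bmatrix}
I_xI_x & I_xI_y \\
I_xI_y & I_yI_y
\end{bmatrix}
$$

Next, sum with the window W(x,y)

$$
\sum W(x,y) \begin{bmatrix}
I_xI_x & I_xI_y \\
I_xI_y & I_yI_y
\end{bmatrix}
$$
W(x,y) can be a Gaussian filter.

Size block(3,3);
GaussianBlur(IxIx, IxIx, block, 0);
GaussianBlur(IxIy, IxIy, block, 0);
GaussianBlur(IyIy, IyIy, block, 0);

Now the eigenvalues of the resulting matrix can be computed.

For a 2x2 matrix the characteristic equation is λ^2-(a+d)λ+ad-bc=0, so the eigenvalues follow directly from the quadratic formula, and by Vieta's formulas R can be computed from the determinant and the trace.
$$
R = det(H) - k \times trace(H)^2
$$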

for (int i = 0; i < size.height; ++i) {
for (int j = 0; j < size.width; ++j) {
float a = IxIx.at<float>(i, j);
float b = IxIy.at<float>(i, j);
float c = b;
float d = IyIy.at<float>(i, j);
// 2-D mat a b c d
// λ^2-(a+d)λ+ad-bc=0
// λ1 + λ2 = a+d
// λ1 * λ2 = ad-bc
R.at<float>(i,j) = (a*d - b*c) - k*(a + d)*(a + d);
largeEigen.at<float>(i, j) = ((a + d) + sqrt((a + d)*(a + d) - 4 * (a*d - b*c))) / 2;
smallEigen.at<float>(i, j) = ((a + d) - sqrt((a + d)*(a + d) - 4 * (a*d - b*c))) / 2;
}
}

The corners can then be drawn from the values in R.

To avoid clusters of detections, non-maximum suppression is applied when drawing: only the point with the largest R within a neighborhood is kept as a corner.

for (int i = 0; i < size.height; ++i) {
    for (int j = 0; j < size.width; ++j) {
        if ((int)R.at<uchar>(i, j) > threshold) {
            // Non Maximum Suppression
            if (R.at<uchar>(i, j) == maxValue(R, NMS_size, i, j)) {
              circle(src, Point(j, i), 5, Scalar(0, 0, 255.0), 2, 8, 0);
            }
        }
    }
}

Finally, display and save the results

imshow("LargeEigen", largeEigen);
imshow("SmallEigen", smallEigen);
imshow("R", R);
imshow("result", src);

waitKey(0);

imwrite("LargeEigen.png", largeEigen);
imwrite("SmallEigen.png", smallEigen);
imwrite("R.png", R);
imwrite("result.png", src);

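Results

Omitted.

Takeaways

Implementing the Harris Corner Detector in this experiment gave me a concrete feel for how this kind of corner detection is computed. Working through the implementation and deriving the formulas deepened my understanding of the principle behind it. The results also match expectations.

Writing the code also made me more familiar with OpenCV: I am now more fluent at using it for image processing and convolution.

Appendix: source code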

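// The original header names did not survive; these cover what the code below uses.
#include <cmath>
#include <cstdio>
#include <iostream>
#include <opencv2/opencv.hpp>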
using namespace std;
using namespace cv;

void HarrisCornerDetector(Mat& src, Mat& R, int aperture_size, double k);

int main(int argc, const char** argv) {
// Fixed-size buffer that starts out holding the default image path.
char filename[100] = "Sydney_Opera_House_Sails_edit02_adj.jpg";
//system("dir");
int apertureSize = 3;
double k = 0.04;

if (argc != 4) {
cout << "Illegal Input." << endl;
cout << "HarrisDetector.exe $path $k $aperture_size." << endl;
cout << "Using default settings." << endl;
}
else {
 // parse commandline args
sscanf_s(argv[1], "%s", filename, (unsigned)sizeof(filename));
sscanf_s(argv[2], "%lf", &k);
sscanf_s(argv[3], "%d", &apertureSize);
}

Mat src = imread(filename);
if (!src.data) {
cout << "imread failed" << endl;
return 0;
}
Mat dst;

cout << "path: " << filename << endl;
cout << "k: " << k << endl;
cout << "aperture_size: " << apertureSize << endl;

HarrisCornerDetector(src, dst, apertureSize, k);

return 0;
}

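// Maximum value of an 8-bit single-channel image inside the window
// [y - size, y + size) x [x - size, x + size), clipped to the image.
int maxValue(Mat& img, int size, int y, int x)
{
uchar maxval = 0;
for (int i = y - size; i < y + size; ++i)
{
if (i < 0 || i >= img.rows) continue;
for (int j = x - size; j < x + size; ++j)
{
if (j < 0 || j >= img.cols) continue;
if (img.at<uchar>(i, j) > maxval)
{
maxval = img.at<uchar>(i, j);
}
}
}
//cout << "maxval" << (int)maxval << endl;
return maxval;
}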

void HarrisCornerDetector(Mat& src, Mat& R, int aperture_size, double k)
{
Mat src_gray;
// convert to gray
cvtColor(src, src_gray, cv::COLOR_BGR2GRAY);
// normalize src
normalize(src_gray, src_gray, 0, 255, NORM_MINMAX);
convertScaleAbs(src_gray, src_gray);

R.create(src_gray.size(), CV_32FC1);
Mat Ix, Iy;

//sobel operation get Ix, Iy 
Sobel(src_gray, Ix, CV_32FC1, 1, 0, aperture_size);
Sobel(src_gray, Iy, CV_32FC1, 0, 1, aperture_size);
//cout << Ix.size() << " " << Ix.channels() << " " << Ix.depth() << endl;

// prepare Mat to store info
Mat IxIx(src_gray.size(), CV_32FC1);
Mat IxIy(src_gray.size(), CV_32FC1);
Mat IyIy(src_gray.size(), CV_32FC1);
Mat largeEigen(src_gray.size(), CV_32FC1);
Mat smallEigen(src_gray.size(), CV_32FC1);
//Mat heat(src_gray.size(), CV_8SC3);

Size size = src_gray.size();
for (int i = 0; i < size.height; ++i) {
for (int j = 0; j < size.width; ++j) {
IxIx.at<float>(i,j) = Ix.at<float>(i, j) * Ix.at<float>(i, j);
IxIy.at<float>(i,j) = Ix.at<float>(i, j) * Iy.at<float>(i, j);
IyIy.at<float>(i,j) = Iy.at<float>(i, j) * Iy.at<float>(i, j);
}
}

// W[x,y] * I
// W[x,y] is Gaussian Filter
Size block(3,3);
GaussianBlur(IxIx, IxIx, block, 0);
GaussianBlur(IxIy, IxIy, block, 0);
GaussianBlur(IyIy, IyIy, block, 0);

for (int i = 0; i < size.height; ++i) {
for (int j = 0; j < size.width; ++j) {
float a = IxIx.at<float>(i, j);
float b = IxIy.at<float>(i, j);
float c = b;
float d = IyIy.at<float>(i, j);
// 2-D mat a b c d
// λ^2-(a+d)λ+ad-bc=0
// λ1 + λ2 = a+d
// λ1 * λ2 = ad-bc
R.at<float>(i,j) = (a*d - b*c) - k*(a + d)*(a + d);
largeEigen.at<float>(i, j) = ((a + d) + sqrt((a + d)*(a + d) - 4 * (a*d - b*c))) / 2;
smallEigen.at<float>(i, j) = ((a + d) - sqrt((a + d)*(a + d) - 4 * (a*d - b*c))) / 2;
}
}
cout << endl;
normalize(R, R, 0, 255, NORM_MINMAX, CV_32FC1, Mat());
convertScaleAbs(R, R);
int threshold = 50;
int NMS_size = 15;
// Draw circles with NMS
for (int i = 0; i < size.height; ++i) {
for (int j = 0; j < size.width; ++j) {
if ((int)R.at<uchar>(i, j) > threshold) {
// Non Maximum Suppression
if (R.at<uchar>(i, j) == maxValue(R, NMS_size, i, j)) {
circle(src, Point(j, i), 5, Scalar(0, 0, 255.0), 2, 8, 0);
}
}
}
}

// Show the result
imshow("LargeEigen", largeEigen);
imshow("SmallEigen", smallEigen);
//imshow("heat", heat);
imshow("R", R);
imshow("result", src);

waitKey(0);

imwrite("LargeEigen.png", largeEigen);
imwrite("SmallEigen.png", smallEigen);
imwrite("R.png", R);
imwrite("result.png", src);
destroyAllWindows();
}

Change CUDA / Cudnn version without root privileges

2017-12-10T12:31:13.000Z

On a shared machine, you can't update the CUDA or cuDNN version arbitrarily. Here is a way to use the version you need.

Without root privileges, you can do the following to make your code run:

The example below uses cuDNN (CUDA can be handled the same way).

CUDA and cuDNN are essentially dynamic link libraries, so all you need to do is tell the machine where to find the right ones.

Download and extract the version of CUDA or cuDNN you need into your ~ directory (or anywhere you like).

Then append these lines to your ~/.bashrc (or .zshrc, etc.):

export PATH=/usr/local/cuda/bin:$PATH
export CUDA_HOME=/usr/local/cuda
export LD_LIBRARY_PATH=~/cuda/lib64:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/cuda/lib64

Then run source ~/.bashrc so that the change takes effect in the current shell.

Rebuilding the Windows Boot Files

2017-11-29T16:38:35.000Z

bcdboot C:\Windows

Some Notes on the University Course Timetabling Problem

2017-08-28T02:01:51.000Z

University Course Timetabling Problem

H1: Only one event is assigned to each room at any timeslot.
H2: The room is big enough for hosting all attending students and satisfies all the features required by the event.
H3: No student attends more than one event at the same time.

In addition, a candidate timetable receives a penalty cost for violating any of the following three soft constraints:

S1: A student should not have a class in the last slot of a day.
S2: A student should not have more than two classes in a row.
S3: A student should not have a single class on a day.

A Hybrid Algorithm for UCTP

hybrid algorithm mainly based on construction heuristics and meta-heuristics

The algorithm deals separately with hard and soft constraints.

The hard constraints:
- Local search and tabu search procedures

The soft constraints:
- Variable neighborhood descent and simulated annealing

In particular, simulated annealing plays a significant role.

The algorithm was developed, configured and tuned through the race-based experimental methodology.

heuristic algorithm

The use of experience and practical efforts to find answers to questions or to improve performance

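The Wikipedia entry for heuristic defines it as an experience-based technique for problem solving, learning, and discovery. It explains the term in more detail and lists several related fields:

A heuristic method is used to rapidly come to a solution that is hoped to be close to the best possible answer, or ‘optimal solution’. A heuristic is a “rule of thumb”, an educated guess, an intuitive judgment or simply common sense.

A heuristic is a general way of solving a problem. Heuristics as a noun is another name for heuristic methods.

A heuristic can be equated with a rule of thumb, an educated guess (a guess based on a certain amount of information, and therefore likely to be right), and common sense (judgment gained from experience).

Driving to someone's home, written as an algorithm: take Highway 167 south to Puyallup; take the South Hill Mall exit and drive 4.5 miles up the hill; turn right at the traffic light by the grocery store, then take the first left; turn into the driveway of the large tan house on the left, which is 714 North Cedar.

Described as a heuristic, it might read: find the last letter we mailed you and drive to the town in the return address; when you get there, ask someone where our house is. Everyone here knows us, and someone will be glad to help you; if you can't find anyone, call us from a public phone and we'll come get you.

A heuristic is the general name for any strategy that, when solving a specific problem, produces a feasible solution within acceptable time and space but guarantees neither optimality nor a bound on how far the feasible solution is from the optimum. Many heuristic algorithms are quite specialized and depend on a particular problem. While searching for an optimum, a heuristic can change its search path based on individual or global experience. When finding the optimal solution is impossible or very hard (e.g. NP-complete problems), a heuristic is an efficient way to obtain a feasible solution.

A metaheuristic is different: it is usually a general-purpose heuristic that does not rely on conditions specific to one problem, so it applies more broadly. A metaheuristic typically imposes some requirements on the search process, and a heuristic algorithm implemented according to those requirements is called a metaheuristic algorithm. Many metaheuristic algorithms draw inspiration from random phenomena in nature (e.g. simulated annealing, genetic algorithms). A major research direction today is preventing the search from getting trapped in a local optimum too early; much work has been done on this, for example tabu search and non-improving moves (simulated annealing).

Author: 王斌. Link: https://www.zhihu.com/question/36635796/answer/70528089

Source: Zhihu. Copyright belongs to the author. For commercial reuse, contact the author for permission; for non-commercial reuse, please credit the source.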

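Simulated annealing

http://www.cnblogs.com/heaad/archive/2010/12/20/1911614.html

Hill climbing

Accepts a move only when the next solution is better than the current one

Simulated annealing

Accepts a solution worse than the current one with some probability, and this probability decreases over time

/*
* J(y): value of the evaluation function in state y (larger is better)
* Y(i): the current state
* Y(i+1): the new (candidate) state
* r: controls how fast the temperature drops
* T: system temperature; the system should start at a high temperature
* T_min: lower bound on the temperature; the search stops when T reaches T_min
*/
while( T > T_min )
{
    dE = J( Y(i+1) ) - J( Y(i) ) ;

    if ( dE >= 0 ) // the move yields a solution at least as good, so always accept it
        accept the move from Y(i) to Y(i+1) ;
    else
    {
        // for dE < 0, exp( dE/T ) lies in (0,1); the larger dE/T is, the larger exp( dE/T ) is
        if ( exp( dE/T ) > random( 0 , 1 ) )
            accept the move from Y(i) to Y(i+1) ;
    }
    T = r * T ; // cooling, 0 < r < 1
    /*
    * If r is too large, the chance of finding the global optimum is higher, but the search takes longer. If r is too small, the search is fast, but it may end in a local optimum
    */
    i ++ ;
}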

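Tabu search

http://www.wangxianfeng.name/2012/08/intelligent-optimization-algorithms-tabu-search/

The basic idea of tabu search: how a group of rabbits finds the world's highest peak

A group of rabbits sets out to find the highest mountain in the world. When they find Mount Tai, one of them stays behind to guard it while the others search elsewhere. After a full tour, they compare the peaks they found, and Mount Everest comes out on top.

When the rabbits search again, they generally avoid Mount Tai on purpose, because they know it has already been explored and one rabbit is watching it. This is the "tabu list" in tabu search;
The rabbit left on Mount Tai does not settle there for good. After a certain time it rejoins the search party, because by then there is plenty of new information, and Mount Tai, being fairly high after all, deserves to be reconsidered. The time before it rejoins is the "tabu length" in tabu search
If the Mount Tai rabbit has not yet rejoined the group but the search turns up only low places such as the North China Plain, the rabbits have to reconsider Mount Tai. In other words, when a guarded place is outstanding enough to beat the "best so far" state, it is considered regardless of whether a rabbit is guarding it. This is the "aspiration criterion" in tabu search.

Basic flow of tabu search

Given an initial solution and a neighborhood, select a number of candidate solutions in the neighborhood of the current solution. If the objective value of the best candidate is better than the "best so far" state, ignore its tabu status, let it replace both the current solution and the "best so far" state, add the corresponding object to the tabu list, and update the tenure of every object in the list. If no such candidate exists, choose the best non-tabu candidate as the new current solution regardless of whether it is better or worse than the current one, add the corresponding object to the tabu list, and update the tenures. Repeat this iteration until the stopping criterion is met. More systematically, the steps of a simple tabu search are:

1. Generate a random initial solution as the current solution X, and empty the tabu list.
2. Check whether the termination criterion is met. If so, stop and output the result; otherwise continue with the following steps.
3. Use the neighborhood function to select the candidate set C-N(X) from the neighborhood N(X) of the current solution X;
4. From the candidate set C-N(X), pick the solution with the best evaluation value, Y = C-N(X)-best.
5. Check whether the aspiration criterion holds for this candidate. If so, let the best state Y that satisfies it replace X as the new current solution (X = Y), replace the oldest entry in the tabu list with the tabu object corresponding to Y, replace the "best so far" state with Y, and go to step 7; otherwise continue with the following steps.
6. Examine the tabu status of the objects corresponding to the candidates, choose the best state among the non-tabu candidates as the new current solution, and replace the oldest entry in the tabu list with its tabu object.
7. Go to step 2.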
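The flow above can be sketched on a toy one-dimensional landscape. This is a simplified illustration, not the exact procedure from the referenced article: the tabu objects are simply recently visited positions, the objective is maximized, and every name (`tabu_search`, `neighbours`, `landscape`) is made up for the example.

```python
from collections import deque


def tabu_search(f, start, neighbours, tabu_length=5, max_iter=100):
    """Maximize f, remembering the last `tabu_length` visited solutions."""
    current = best = start
    tabu = deque([start], maxlen=tabu_length)   # oldest entry drops out automatically
    for _ in range(max_iter):
        candidates = [
            cand for cand in neighbours(current)
            # Aspiration criterion: a tabu candidate is allowed if it beats the best so far.
            if cand not in tabu or f(cand) > f(best)
        ]
        if not candidates:
            break
        # Take the best admissible candidate even if it is worse than the current solution.
        current = max(candidates, key=f)
        tabu.append(current)
        if f(current) > f(best):
            best = current
    return best


landscape = [0, 1, 2, 3, 2, 1, 2, 3, 4, 5, 6, 5, 4]   # local peak at 3, global peak at 10


def neighbours(x):
    return [n for n in (x - 1, x + 1) if 0 <= n < len(landscape)]


print(tabu_search(lambda x: landscape[x], start=0, neighbours=neighbours))
```

Plain hill climbing from position 0 would stop at the local peak at index 3. Because moving back is tabu, this search walks down the far side and reaches the global peak at index 10.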

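Clearly, the neighborhood function, tabu objects, tabu list, and aspiration criterion are the key components of tabu search. The neighborhood function follows the idea of local neighborhood search; the tabu list and tabu objects are what let the algorithm avoid cycling back; the aspiration criterion rewards good states and relaxes the tabu strategy.

Key parameters of tabu search

Tabu list: a container for tabu objects; an object in the list cannot be visited again until its tabu status expires. The tabu list mimics human memory; its main purpose is to prevent cycling and avoid getting trapped in a local optimum, so that more of the search space is explored;
Tabu length: may be a constant or may be set according to the problem size;
Evaluation function: may be direct, computed from the objective function, or indirect, i.e. a surrogate function (which should reflect the behavior of the objective) that reduces the computational cost.
Aspiration criterion: when all candidates are tabu, or when a tabu candidate is better than the best solution so far, it releases specific solutions so that the search can continue toward the global optimum. When a tabu move reappears within the following T iterations, it should be accepted, overriding the tabu list, if it leads the search into a region never explored before.
Termination rule: chosen so that the algorithm has good optimization and time performance. Options:
- (1) Stop after a fixed number of steps; solution quality is not guaranteed, so the best solution so far should be recorded;
- (2) Frequency control: stop when the frequency of some solution, objective value, or element sequence exceeds a given value;
- (3) Objective control: stop if the best value so far has not changed within a given number of steps.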

Neighborhood Structure

$N_1(a)$ move a single event to a different room and time slot. (an event involved in at least one hard constraint violated)
$N_2(a)$ swap timeslots and rooms of two events.(at least one causes a hard constraint violated.)

Do not introduce new violations of the hard constraints.

$N_1' \subseteq N_1$ , $N_2' \subseteq N_2$

$N_3(a)$ swap all events assigned to two timeslots
$N_4(a)$ (Kempe chain? To be continued.)

side walk move: move that doesn’t change the evaluation function value.

Solving the TSP with Simulated Annealing

2017-07-25T08:19:08.000Z

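Simulated annealing

Reference: http://www.cnblogs.com/heaad/archive/2010/12/20/1911614.html

Hill climbing

Only accepts a next solution that is better than the current one.

Simulated annealing

Accepts a solution worse than the current one with a certain probability, and this probability decreases over time.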

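/*
* J(y): value of the evaluation function in state y
* Y(i): the current state
* Y(i+1): the new state
* r: controls how fast the temperature is lowered
* T: temperature of the system; it should start out high
* T_min: lower bound on the temperature; the search stops once T reaches T_min
*/
while( T > T_min )
{
    dE = J( Y(i+1) ) - J( Y(i) ) ;

    if ( dE >= 0 ) // the move leads to a solution at least as good, so always accept it
        accept( Y(i+1) ) ; // accept the move from Y(i) to Y(i+1)
    else
    {
        // dE < 0 here, so exp( dE/T ) lies in (0,1); the larger dE/T is, the larger exp( dE/T ) is
        if ( exp( dE/T ) > random( 0 , 1 ) )
            accept( Y(i+1) ) ; // accept the move from Y(i) to Y(i+1)
        else
            Y(i+1) = Y(i) ; // reject the move and stay at Y(i)
    }
    T = r * T ; // cool down, 0 < r < 1
    /*
    * If r is too large, the chance of finding the global optimum is higher, but the search takes longer. If r is too small, the search is fast, but it may end in a local optimum.
    */
    i ++ ;
}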
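For comparison, here is a minimal runnable Python version of the same loop. It is a sketch, not the referenced author's code: it maximizes `J` as the pseudocode does, and the `neighbor` callback, the default parameter values and the tracking of the best state seen are assumptions added for illustration.

```python
import math
import random


def simulated_annealing(J, y0, neighbor, T=10.0, T_min=1e-3, r=0.99, rng=None):
    """Maximise J starting from y0; neighbor(y, rng) proposes the next state."""
    rng = rng or random.Random()
    y, best = y0, y0
    while T > T_min:
        y_new = neighbor(y, rng)
        dE = J(y_new) - J(y)
        # Always accept an improvement; accept a worse state with probability exp(dE / T).
        if dE >= 0 or math.exp(dE / T) > rng.random():
            y = y_new
            if J(y) > J(best):
                best = y
        T = r * T  # geometric cooling, 0 < r < 1
    return best
```

For example, `simulated_annealing(lambda x: -(x - 3) ** 2, 0.0, lambda x, rng: x + rng.uniform(-1, 1))` should return a value close to 3.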

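Reference: https://zh.wikipedia.org/wiki/%E6%A8%A1%E6%8B%9F%E9%80%80%E7%81%AB

Algorithm steps

Initialization

Generate a feasible solution as the current solution to feed into the iteration, and define a sufficiently large value as the initial temperature.

Iteration

The iteration is the core of simulated annealing and consists of two parts: generating a new solution and deciding whether to accept it: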

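A generating function produces a new solution in the solution space from the current solution. To simplify later computation and acceptance and to reduce running time, the new solution is usually obtained from the current one by a simple transformation, such as replacing or swapping all or some of its elements. Note that the transformation used to generate new solutions determines the neighborhood structure of the current solution, and therefore has some influence on the choice of cooling schedule.
Compute the difference in the objective function for the new solution. Since the difference comes only from the transformed part, it is best computed incrementally. In practice this is the fastest way to compute the difference for most applications.
Decide whether the new solution is accepted. The decision follows an acceptance criterion, most commonly the Metropolis criterion: if Δt′ < 0, accept S′ as the new current solution S; otherwise accept S′ as the new current solution S with probability exp(-Δt′/T).
When the new solution is accepted, it replaces the current solution. This only requires applying to the current solution the part of the transformation that produced the new one, and updating the objective value accordingly. The current solution has then completed one iteration, and the next trial starts from it. When the new solution is rejected, the next trial continues from the unchanged current solution.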
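One iteration of this kind can be sketched for the TSP as follows. The sketch assumes a symmetric distance matrix and uses segment reversal as the transformation; that is a choice made here for illustration (the example code in this post swaps two cities instead). All function names are hypothetical. The objective difference is computed from the four affected edges only, and acceptance follows the Metropolis criterion.

```python
import math
import random


def tour_length(dist, tour):
    """Length of the closed tour (the last city connects back to the first)."""
    return sum(dist[tour[i - 1]][tour[i]] for i in range(len(tour)))


def reversal_delta(dist, tour, i, j):
    """Change in length if tour[i..j] is reversed, for 0 < i <= j < len(tour) - 1."""
    a, b = tour[i - 1], tour[i]
    c, d = tour[j], tour[j + 1]
    # Only edges (a, b) and (c, d) are replaced by (a, c) and (b, d).
    return dist[a][c] + dist[b][d] - dist[a][b] - dist[c][d]


def metropolis_step(dist, tour, length, T, rng):
    """Propose one interior segment reversal; return the (possibly updated) tour length."""
    n = len(tour)
    i = rng.randrange(1, n - 1)
    j = rng.randrange(i, n - 1)
    delta = reversal_delta(dist, tour, i, j)
    if delta < 0 or rng.random() < math.exp(-delta / T):
        tour[i:j + 1] = reversed(tour[i:j + 1])  # apply the move in place
        length += delta                          # incremental update of the objective
    return length
```

Only interior segments are reversed so that the index arithmetic stays simple; a fuller implementation would also handle segments that wrap around the end of the tour.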

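Simulated annealing is independent of the initial value: the solution it finds does not depend on the initial state S (the starting point of the iteration). It is asymptotically convergent and has been proven in theory to be a global optimization algorithm that converges to the global optimum with probability 1 (under a sufficiently slow cooling schedule). It is also parallelizable.

Stopping criterion

Stopping criterion for the iteration: once the temperature T has dropped to some minimum value and no new solution is accepted within a given number of iterations, stop iterating and take the best solution found so far as the final solution.

Annealing schedule

At a given temperature T, after a certain number of iterations have been completed, lower T and run the next batch of iterations at the new temperature.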
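Such a schedule can be written as a small generator. The sketch below assumes geometric cooling; the name `geometric_schedule` and its parameters are hypothetical.

```python
def geometric_schedule(T0, alpha, T_min, steps_per_temperature):
    """Yield (temperature, inner_step) pairs: a batch of steps at each T, then T *= alpha."""
    T = T0
    while T > T_min:
        for step in range(steps_per_temperature):
            yield T, step
        T *= alpha  # lower the temperature before the next batch


# Two steps at each of the temperatures 8, 4 and 2.
for T, step in geometric_schedule(8.0, 0.5, 1.0, 2):
    print(T, step)
```

The outer loop corresponds to the temperature levels and the inner loop to the batch of iterations run at each level.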

Example code for the TSP

import numpy as np
import math
import random
from copy import deepcopy
import matplotlib.pyplot as plt


class Tsp(object):
    def __init__(self, city_num, times, steps, init_temperature, simulated_k):
        self.city_num = city_num
        # Pairwise distance matrix; a plain ndarray is used because np.mat was removed in NumPy 2.0.
        self.distance = np.zeros((city_num, city_num), dtype=int)
        self.init_temperature = init_temperature
        self.times = times
        self.steps = steps
        self.simulated_k = simulated_k

        self.x = list(range(self.city_num))
        self.y = list(range(self.city_num))
        self.now_path = list(range(self.city_num))
        self.new_path = list(range(self.city_num))
        self.best_path = list(range(self.city_num))
        self.now_value = 0
        self.new_value = 0
        self.best_value = -1
        self.path = [int(item)-1 for item in "1 8 38 31 44 18 7 28 6 37 19 27 17 43 30 36 46 33 20 47 21 32 39 48 5  42 24 10 45 35 4 26 2 29 34 41 16 22 3 23 14 25 13 11 12 15 40 9".split()]

        self.best_time = -1

    def read(self, path):
        print("Reading data...")
        # Each line is expected to have the form "index x y";
        # any TSPLIB header or EOF lines must be removed from the file first.
        with open(path, "r") as file:
            lines = file.readlines()
            self.x = [int(line.split()[1]) for line in lines]
            self.y = [int(line.split()[2]) for line in lines]
        x = self.x
        y = self.y
        for i in range(0, self.city_num):
            for j in range(0, self.city_num):
                self.distance[i, j] = round(math.sqrt((x[i]-x[j])*(x[i]-x[j]) + (y[i]-y[j])*(y[i]-y[j])))
        print(self.distance)
        print("Read data done.")

    def init_path(self):
        random.shuffle(self.now_path)

    def value_of_path(self, path):
        value = 0
        for (index, i) in enumerate(path):
            value += self.distance[path[index-1], i]
        return value

    def get_neighbor(self):
        self.new_path = deepcopy(self.now_path)
        i_1, i_2 = random.randint(0, self.city_num-1), random.randint(0, self.city_num-1)
        self.new_path[i_1], self.new_path[i_2] = self.new_path[i_2], self.new_path[i_1]

    def solve(self):
        self.init_path()
        self.now_value = self.value_of_path(self.now_path)
        print(self.now_value)
        temperature = self.init_temperature
        # Outer loop: one temperature level per k; inner loop: self.steps moves at that temperature.
        for k in range(self.times):
            print("k: ", k, "  now best: ", self.best_value)
            for n in range(self.steps):
                self.get_neighbor()
                self.new_value = self.value_of_path(self.new_path)
                if self.new_value < self.best_value or self.best_value == -1:
                    self.best_value = self.new_value
                    self.best_path = deepcopy(self.new_path)
                    self.best_time = k
                    print(self.best_value)
                # random_value = random.random()
                if self.new_value < self.now_value or math.exp((self.now_value - self.new_value)/temperature)>random.random():
                    self.now_path = deepcopy(self.new_path)
                    self.now_value = self.new_value
            temperature *= self.simulated_k

        print("best: ", self.best_value)
        # self.draw_best()
        # self.draw()
        # self.show()

    def test(self):
        # best = 33551(ceil)  33522(round)
        print(self.value_of_path(self.path))

    def draw_path(self, path):
        x = [self.x[i] for i in path]
        y = [self.y[i] for i in path]
        x.append(self.x[path[0]])
        y.append(self.y[path[0]])
        plt.plot(x, y, "-o")
        # plt.show()

    def draw_best(self):
        self.draw_path(self.path)

    def draw(self):
        self.draw_path(self.best_path)

    def show(self):
        plt.show()


if __name__ == "__main__":
    tsp = Tsp(48, 1000, 1000, 10000, 0.992)
    tsp.read("att48.tsp")
    tsp.solve()
    tsp.draw_best()
    # tsp.init_path()
    # tsp.draw_path(tsp.now_path)
    # tsp.test()
    tsp.draw()
    tsp.show()

Results

With distances rounded, the known optimal tour length is 33522.

Running 1000 iterations with the parameters in the code, I got 34384.
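To put the two numbers in proportion, the relative gap can be computed directly (a trivial sketch; the variable names are made up here):

```python
best_known = 33522  # optimum under rounding, as stated above
sa_result = 34384   # result of the run reported above
gap = (sa_result - best_known) / best_known
print(f"relative gap: {gap:.2%}")  # prints "relative gap: 2.57%"
```

So this run ended roughly 2.6% above the known optimum.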

Visualization:

In the figure, the blue route is the optimal solution and the orange route is the solution found by SA.

Implementing MVVM in C++

2017-07-12T15:07:31.000Z

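Preface

MVVM (Model-View-ViewModel) is a currently popular architecture for GUI programs.

A sample of the complete code can be found in Graphics Editor.

The GUI library is Qt 5.9, and the functional code mainly uses OpenCV.

Some of the later features were not written by me, so the code style may be a little inconsistent. This post concentrates on the implementation of the overall framework and ignores how the individual features are implemented.

If anything here is misunderstood, corrections are welcome.

MVVM

Ruan Yifeng's "MVC，MVP 和 MVVM 的图示" (an illustrated comparison of MVC, MVP and MVVM) describes the differences among the three architectures.

In summary, among the three modules Model, View and ViewModel, data between the View and the ViewModel is connected through two-way binding, the View and the Model have no contact with each other, and the ViewModel operates on the Model to process data.

(When actually writing the code, things seemed to differ a little from Ruan's description. According to him, the ViewModel is functionally equivalent to the Presenter in MVP, with all logic placed there. In practice I ended up putting most of the logic in the Model layer, which operates on the data and then notifies the ViewModel and the View to update. I am not sure whether this is a flaw in my understanding.)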

Project directory

.
├── app.cpp
├── app.h
├── command.cpp
├── command.h
├── Commands
│   ├── alter_bright_command.cpp
│   ├── alter_bright_command.h
│   ├── crop_command.cpp
│   ├── crop_command.h
│   ├── detect_face_command.cpp
│   ├── detect_face_command.h
│   ├── filter_command.cpp
│   ├── filter_command.h
│   ├── open_file_command.cpp
│   ├── open_file_command.h
│   ├── reset_command.cpp
│   ├── reset_command.h
│   ├── rotate_command.cpp
│   ├── rotate_command.h
│   ├── save_bmp_command.cpp
│   ├── save_bmp_command.h
│   ├── save_file_command.cpp
│   └── save_file_command.h
├── common.cpp
├── common.h
├── GraphicsEditor.pro
├── GraphicsEditor.pro.user
├── LICENSE
├── main.cpp
├── model.cpp
├── model.h
├── MyView.cpp
├── MyView.h
├── notification.cpp
├── notification.h
├── parameters.cpp
├── parameters.h
├── README.md
├── test.pro
├── test.pro.user
├── view.cpp
├── view.h
├── viewmodel.cpp
├── viewmodel.h
└── view.ui

Project architecture

The classes and the relationships between them are as follows:

App

class App
{
private:
    std::shared_ptr<View> view;
    std::shared_ptr<Model> model;
    std::shared_ptr<ViewModel> viewmodel;

public:
    App();
    void run();
};

In the constructor, everything that needs to be initialized and bound is wired together:


App::App():view(new View),model(new Model), viewmodel(new ViewModel)
{

    viewmodel->bind(model);

    view->set_img(viewmodel->get());

    view->set_open_file_command(viewmodel->get_open_file_command());
    view->set_alter_bright_command(viewmodel->get_alter_bright_command());
    view->set_filter_rem_command(viewmodel->get_filter_rem_command());
    view->set_reset_command(viewmodel->get_reset_command());
    view->set_detect_face_command(viewmodel->get_detect_face_command());
    view->set_save_file_command(viewmodel->get_save_file_command());
    view->set_save_bmp_file_command(viewmodel->get_save_bmp_file_command());
    view->set_rotate_command(viewmodel->get_rotate_command());
    view->set_crop_command(viewmodel->get_crop_command());

    viewmodel->set_update_view_notification(view->get_update_view_notification());
    model->set_update_display_data_notification(viewmodel->get_update_display_data_notification());

}

View

class View : public QMainWindow
{
    Q_OBJECT

public:
    explicit View(QWidget *parent = 0);
    ~View();

    void update();
    void set_img(std::shared_ptr<QImage> image);
    void set_open_file_command(std::shared_ptr<Command>);
    void set_alter_bright_command(std::shared_ptr<Command>);
    void set_filter_rem_command(std::shared_ptr<Command>);
    void set_reset_command(std::shared_ptr<Command>);
    void set_detect_face_command(std::shared_ptr<Command>);
    void set_save_file_command(std::shared_ptr<Command>);
    void set_save_bmp_file_command(std::shared_ptr<Command>);
    void set_rotate_command(std::shared_ptr<Command>);
    void set_crop_command(std::shared_ptr<Command>);
    std::shared_ptr<Notification> get_update_view_notification();

private slots:
    void on_button_open_clicked();
    void on_brightSlider_valueChanged(int value);
    void on_contrastSlider_valueChanged(int value);
    void on_filter_1_clicked();
    void on_reset_clicked();
    void on_actionOpen_File_triggered();
    void on_button_detect_face_clicked();
    void on_actionSave_triggered();
    void on_action_bmp_triggered();
    void on_action_png_triggered();
    void on_action_jpeg_triggered();
    void on_rotateSlider_valueChanged(int value);

private:
    Ui::View *ui;
    MyView* canvas;
    std::shared_ptr<QImage> q_image;
    std::shared_ptr<Command> open_file_command;
    std::shared_ptr<Command> alter_bright_command;
    std::shared_ptr<Command> filter_rem_command;
    std::shared_ptr<Command> reset_command;
    std::shared_ptr<Command> detect_face_command;
    std::shared_ptr<Command> save_file_command;
    std::shared_ptr<Command> save_bmp_file_command;
    std::shared_ptr<Command> rotate_command;
    std::shared_ptr<Command> crop_command;

    std::shared_ptr<Notification> update_view_notification;
};

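The View itself provides a notification used for updating, and exposes a get() method so that the ViewModel layer can bind it. This is how the ViewModel notifies the View to refresh.

It also holds many Command member variables. These do not really belong to the View layer but to the ViewModel layer, which provides getters so that the View can bind them through its setters. This is how the View sends commands to the ViewModel, and it lets the View be written without knowing the concrete Command subclasses.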

ViewModel

class ViewModel
{
private:
    std::shared_ptr<QImage> q_image;
    std::shared_ptr<Model> model;


    std::shared_ptr<Command> open_file_command;
    std::shared_ptr<Command> alter_bright_command;
    std::shared_ptr<Command> filter_rem_command;
    std::shared_ptr<Command> reset_command;
    std::shared_ptr<Command> detect_face_command;
    std::shared_ptr<Command> save_file_command;
    std::shared_ptr<Command> save_bmp_file_command;
    std::shared_ptr<Command> rotate_command;
    std::shared_ptr<Command> crop_command;

    std::shared_ptr<Notification> update_display_data_notification;

    std::shared_ptr<Notification> update_view_notification;

public:
    ViewModel();
    void bind(std::shared_ptr<Model> model);
    void exec_open_file_command(std::string path);
    void exec_alter_bright_command(int nBright, int nContrast);
    void exec_filter_rem_command();
    void exec_reset_command();
    void exec_detect_face_command();
    void exec_save_file_command(std::string path);
    void exec_save_bmp_file_command(std::string path);
    void exec_rotate_command(int angle);
    void exec_crop_command(double x_s, double y_s, double x_e, double y_e);

    void set_update_view_notification(std::shared_ptr<Notification> notification);

    std::shared_ptr<Command> get_open_file_command();
    std::shared_ptr<Command> get_alter_bright_command();
    std::shared_ptr<Command> get_filter_rem_command();
    std::shared_ptr<Command> get_reset_command();
    std::shared_ptr<Command> get_detect_face_command();
    std::shared_ptr<Command> get_save_file_command();
    std::shared_ptr<Command> get_save_bmp_file_command();
    std::shared_ptr<Command> get_rotate_command();
    std::shared_ptr<Command> get_crop_command();

    std::shared_ptr<Notification> get_update_display_data_notification();
    std::shared_ptr<QImage> get();

    void notified();
};

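Communication with the View layer was covered above: the concrete commands are initialized in the constructor, and the getters hand them to the View's setters for binding. This involves a conversion to a base-class pointer, which I wrote as follows:

open_file_command = std::static_pointer_cast<Command>(std::shared_ptr<OpenFileCommand>(new OpenFileCommand(std::shared_ptr<ViewModel>(this))));

One caveat: wrapping `this` in a fresh std::shared_ptr creates a second, independent owner of the ViewModel, which can lead to a double delete; std::enable_shared_from_this or a non-owning pointer avoids that.

Communication with the Model does not go through a Command. The ViewModel simply holds a pointer to the Model and calls its functions directly.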

Model


class Model
{
private:
    cv::Mat image;
    std::shared_ptr<Notification> update_display_data_notification;
public:
    Model(){}
    void set_update_display_data_notification(std::shared_ptr<Notification> notification);
    void open_file(std::string path);
    cv::Mat& get();
    cv::Mat& getOrigin();
    void notify();
    void save_file(std::string path);
    void save_bmp_file(std::string path);

    void alterBrightAndContrast(int nbright, int nContrast);
    void detect_face();
    void filterReminiscence(); //Filter No.1
    void reset();
    void rotate(double angle);
    void crop(int x1, int y1, int x2, int y2);
};

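The Model layer has an interface for setting a notification, which is used to tell the ViewModel to update its data.

The rest is functional code that operates on the data.

Command

This could be written as a pure abstract class. I gave it a member variable that is a pointer to the base parameter class; all concrete commands derive from it and provide an exec() method.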


class Command
{
protected:
    std::shared_ptr<Parameters> params;
public:
    Command();
    void set_parameters(std::shared_ptr<Parameters> parameters){
        params = parameters;
    }
    virtual void exec() = 0;
};

Notification


class Notification
{
public:
    Notification();
    virtual void exec() = 0;
};



class UpdateDisplayDataNotification: public Notification{
private:
    std::shared_ptr<ViewModel> viewmodel;
public:
    UpdateDisplayDataNotification(std::shared_ptr<ViewModel> vm):viewmodel(vm){}
    void exec(){
        viewmodel->notified();
    }
};


class UpdateViewNotification: public Notification{
private:
    std::shared_ptr<View> view;
public:
    UpdateViewNotification(std::shared_ptr<View> v):view(v){}
    void exec(){
        view->update();
    }
};

Parameters


class Parameters
{
public:
    Parameters();
};


class PathParameters: public Parameters{
private:
    std::string path;
public:
    PathParameters(std::string _path):path(_path){
    }
    std::string get_path(){
        return path;
    }
};

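PathParameters serves as an example of how new parameter types are generally derived.

common

Implements the conversion code between cv::Mat and QImage.

Overall flow

After an operation in the View layer, the corresponding slot function is triggered. The slot prepares the Parameters and hands them to the corresponding Command, then calls exec() on it. exec() unpacks the parameters and passes them to the ViewModel layer, which calls the corresponding method in the Model to operate on the data. When the Model is done it notifies the ViewModel to update the display data, and the ViewModel in turn notifies the View to refresh.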
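The round trip described above can be traced end to end in a compact Python sketch. This is not the project's C++ code: the classes are stripped down to a single command, strings stand in for images, plain callables stand in for the Notification classes, and `build_app` and `read_display_data` are hypothetical names.

```python
class Model:
    def __init__(self):
        self.data = ""
        self.notification = None  # bound by the app; tells the ViewModel to refresh

    def open_file(self, path):
        self.data = f"contents of {path}"  # stand-in for loading an image
        self.notification()


class ViewModel:
    def __init__(self, model):
        self.model = model
        self.display_data = ""
        self.update_view_notification = None  # bound by the app; tells the View to refresh

    def exec_open_file_command(self, path):
        self.model.open_file(path)

    def notified(self):
        self.display_data = self.model.data.upper()  # stand-in for the Mat-to-QImage conversion
        self.update_view_notification()


class OpenFileCommand:
    def __init__(self, viewmodel):
        self.viewmodel = viewmodel
        self.params = None

    def set_parameters(self, params):
        self.params = params

    def exec(self):
        self.viewmodel.exec_open_file_command(self.params["path"])


class View:
    def __init__(self):
        self.open_file_command = None  # the View only relies on set_parameters() and exec()
        self.read_display_data = None
        self.shown = None

    def on_button_open_clicked(self, path):
        self.open_file_command.set_parameters({"path": path})
        self.open_file_command.exec()

    def update(self):
        self.shown = self.read_display_data()


def build_app():
    model, view = Model(), View()
    viewmodel = ViewModel(model)
    view.open_file_command = OpenFileCommand(viewmodel)
    view.read_display_data = lambda: viewmodel.display_data
    viewmodel.update_view_notification = view.update
    model.notification = viewmodel.notified
    return view
```

Calling `build_app().on_button_open_clicked("a.png")` walks the whole chain: View, Command, ViewModel, Model, and back through the two notifications to `View.update`.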

Displaying formulas in Hexo

2017-07-01T03:25:10.000Z

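Ah, yesterday I happily got Hexo set up, configured and tweaked a bunch of odd things, and finally got it running properly, thinking I could now blog in peace = =

Only later did I realize I had been far too naive = =

For formula display I tried switching to the pandoc renderer: I installed pandoc and hexo-renderer-pandoc and uninstalled the stock hexo-renderer-marked. Running hexo s locally displayed everything correctly, but after deploy the site only showed $$ converted into \[ and \] = =

Then I went back to hexo-math, which told me it was already deprecated = =

But as long as it renders I am more than grateful = =

Next came the conflict between Markdown and MathJax = =

So I patched marked.js = =

Which feels like it makes the setup a lot less portable = =

But for now, at least, it looks acceptable = =

The approach I ended up with:

Modify Hexo's rendering source, node_modules/marked/lib/marked.js:

Remove the extra escaping of \\
Among the symbols that map to the em tag, remove _; Markdown can still use * for italics, so _ can go.

The idea follows the article "使Marked.js与MathJax共存" (making Marked.js and MathJax coexist).
Open node_modules/marked/lib/marked.js. Step 1: find the following code: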

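escape: /^\\([\\`*{}\[\]()# +\-.!_>])/,

Change it to:

escape: /^\\([`*{}\[\]()# +\-.!_>])/,

This removes the escaping of \\. Step 2: find the em rule:

em: /^\b_((?:[^_]|__)+?)_\b|^\*((?:\*\*|[\s\S])+?)\*(?!\*)/,

Change it to:

em: /^\*((?:\*\*|[\s\S])+?)\*(?!\*)/,

This removes the italic meaning of _, and the problem is solved. This approach treats the symptom rather than the cause: there is no guarantee that other characters will not conflict as well, in which case you have to go back and keep patching.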
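The effect of the two changes can be checked outside of Hexo. The sketch below uses Python's `re` module rather than JavaScript, which is fine here only because the patterns use constructs that behave the same in both engines. It assumes that the `em` rule in marked.js reads as written in `EM_ORIGINAL`; the constant names are made up for the example.

```python
import re

ESCAPE_ORIGINAL = re.compile(r"^\\([\\`*{}\[\]()# +\-.!_>])")
ESCAPE_PATCHED = re.compile(r"^\\([`*{}\[\]()# +\-.!_>])")
EM_ORIGINAL = re.compile(r"^\b_((?:[^_]|__)+?)_\b|^\*((?:\*\*|[\s\S])+?)\*(?!\*)")
EM_PATCHED = re.compile(r"^\*((?:\*\*|[\s\S])+?)\*(?!\*)")

# A LaTeX line break "\\" is consumed as an escape by the original rule only.
print(bool(ESCAPE_ORIGINAL.match(r"\\ next row")), bool(ESCAPE_PATCHED.match(r"\\ next row")))

# Two subscripts in one line are read as emphasis by the original rule only.
tail = "_{i}$ and $y_{j}$"
print(bool(EM_ORIGINAL.match(tail)), bool(EM_PATCHED.match(tail)))

# Asterisk emphasis keeps working after the patch.
print(bool(EM_PATCHED.match("*italic* text")))
```

Each of the first two lines prints `True False`, showing that the patched rules leave the backslashes and underscores for MathJax, and the last line prints `True`.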
