tao of yue

Sneller, a SQL engine for analyzing unstructured logs

niyue — Wed, 25 Jun 2025 14:01:26 +0000

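Over the past few years I have read and collected quite a lot of material on databases and observability data. I shared some of it internally (in English), but I never had the motivation or the time to organize and publish it. Looking back, even though the material is not very systematic and contains few opinions of my own, I think it is still worth publishing bit by bit, as a record of what I have thought about and learned over these years.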

Introduction

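Sneller is a rather distinctive data system for processing log data:

Schemaless storage and compute, a very good fit for semi-structured JSON data
The compute engine is written in Go, but the most performance-critical parts of its operators are implemented in AVX-512 assembly, which makes that code very hard to read
Storage uses a binary JSON format together with an unusual “compression tiling” technique: the fields of each JSON record are hashed into buckets that are stored separately, so reads can fetch less data from disk, much like columnar access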

Sneller

World’s fastest log analysis: λ + SQL + JSON + S3
SQL for JSON at scale: fast, simple, schemaless

Intro

Sneller is a high-performance SQL engine built to analyze petabyte-scale unstructured logs and other event data.

Sneller vs. other SQL solutions:

Sneller is designed to use cloud object storage as its only backing store.
Sneller’s SQL VM is implemented in AVX-512 assembly. Medium-sized compute clusters provide throughput in excess of terabytes per second.
Sneller is completely schemaless. No more ETL-ing your data! Heterogeneous JSON data can be ingested directly.
Sneller uses a hybrid approach between columnar and row-oriented data layouts to provide lightweight ingest, low storage footprint, and super fast scanning speeds.

Why We Built A Schemaless SQL Engine

JSON has become one of the most common data interchange formats used in software engineering
Sneller bridges the gap between distributed columnar data stores and document stores
- it provides a SQL-like interface for unstructured data (like billions of JSON records)
  - with the flexibility of a document store database and the performance of a modern distributed columnar database
Unlike typical columnar databases, the Sneller SQL engine is agnostic to the layout of the data ingested into records
- every record in our on-disk format is fully self-describing
  - Consequently, we can ingest and query JSON documents without any configuration dictating the expected layout of the input data

Other benefits of schemaless

On top of the improved flexibility, there are other operational benefits to avoiding a schema in your observability pipeline.
- Your logs or metrics won’t suddenly stop ingesting because someone accidentally deployed code that produces de-normalized data
- and you won’t have to worry about scheduling downtime because you need to run an expensive ALTER TABLE operation as part of a migration.
- Migrations become particularly troublesome when the tables in question have grown to petabytes in size; it often isn’t practical to re-write that much data

Compute

SQL VM

a bytecode-based virtual machine written almost entirely in AVX-512 assembly
our interpreter operates on flexibly-typed rows rather than on strictly-typed columnar data

design goal

One of the UX goals we have for Sneller is to provide consistent, predictable performance across a wide range of possible queries
Sneller Cloud’s pricing is based on the number of bytes scanned and not the total number of CPU cycles consumed by the query
we’d like the ratio between the number of CPU cycles consumed and the number of bytes scanned to remain roughly constant
- if a user adds a regular expression search to a WHERE clause in a query, we’d like that to consume only marginally more CPU time

example

SELECT SUM(y)
FROM my_table
WHERE x < 3

ITERATE my_table -> FILTER (x < 3) -> AGGREGATE SUM(y) -> output

Amazon Ion binary format [3]
- Each row of data is an ion structure composed of zero or more fields which themselves may be structures or lists (much like JSON records)

Evaluating x < 3 means locating the x field in each structure, unboxing it as a number, and then comparing it against the constant 3
We don’t typically know in advance that x will be a number, or even that x will be present at all, but we’ll see that the interpreter deals with data polymorphism just fine
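
To make the pipeline concrete, here is a minimal plain-Go sketch of ITERATE -> FILTER (x < 3) -> AGGREGATE SUM(y) over self-describing rows. It assumes rows are decoded into `map[string]any` (a hypothetical stand-in for Ion structures, not Sneller's representation) and handles one row at a time instead of sixteen lanes. Rows where `x` is absent or not a number simply fail the predicate, which is one simple way to tolerate the data polymorphism described above.

```go
package main

import "fmt"

// Row stands in for one self-describing record (a decoded Ion struct).
// This representation is an assumption for illustration only.
type Row map[string]any

// asNumber unboxes v as a float64; ok is false for missing fields
// (nil) and for non-numeric types such as strings.
func asNumber(v any) (float64, bool) {
	switch n := v.(type) {
	case int:
		return float64(n), true
	case int64:
		return float64(n), true
	case float64:
		return n, true
	default:
		return 0, false
	}
}

// sumYWhereXLessThan3 evaluates SELECT SUM(y) FROM rows WHERE x < 3.
func sumYWhereXLessThan3(rows []Row) float64 {
	var sum float64
	for _, r := range rows { // ITERATE
		x, ok := asNumber(r["x"])
		if !ok || !(x < 3) { // FILTER (x < 3): absent or non-numeric x fails
			continue
		}
		if y, ok := asNumber(r["y"]); ok { // AGGREGATE SUM(y)
			sum += y
		}
	}
	return sum
}

func main() {
	rows := []Row{
		{"x": 1, "y": 10},
		{"x": 5, "y": 100},      // x >= 3
		{"x": "two", "y": 1000}, // x is a string
		{"y": 7},                // x is missing
		{"x": 2.5, "y": 0.5},
	}
	fmt.Println(sumYWhereXLessThan3(rows)) // 10.5
}
```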

expression evaluation

Expression AST

func NewFilter(e expr.Node, rest QuerySink) (*Filter, error)
    // NewFilter constructs a Filter from a boolean expression. The returned Filter
    // will write rows for which e evaluates to TRUE to rest.

SSA IR (static single assignment intermediate representation)
Steps
- convert the input AST into a Static Single Assignment-based intermediate representation
- convert SSA IR into a representation that is actually executable by our bytecode VM
  - SSA instructions generally map 1-to-1 to bytecode VM instructions
The bytecode VM just executes a linear sequence of instructions
- so our first order of business is computing a valid execution ordering of the SSA instructions
- A post-order traversal of the SSA program beginning at the return instruction will always produce a valid instruction ordering
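
A small sketch of the scheduling step, using a hypothetical `Value` type rather than Sneller's actual SSA package: a depth-first walk from the return instruction that emits each value only after all of its operands, so the output is a valid linear instruction order. It assumes the expression graph is acyclic, which holds for filter expressions like `x < 3`; shared subexpressions are emitted once.

```go
package main

import "fmt"

// Value is a hypothetical SSA instruction: an op name plus operands.
type Value struct {
	Op   string
	Args []*Value
}

// schedule returns the values reachable from ret in post-order, so every
// value appears after all of its operands and ret comes last.
func schedule(ret *Value) []*Value {
	var order []*Value
	seen := map[*Value]bool{}
	var visit func(v *Value)
	visit = func(v *Value) {
		if seen[v] {
			return // shared subexpression already emitted
		}
		seen[v] = true
		for _, a := range v.Args {
			visit(a)
		}
		order = append(order, v) // post-order: after its operands
	}
	visit(ret)
	return order
}

func main() {
	// (x < 3) AND (y > 1)
	x := &Value{Op: "load x"}
	three := &Value{Op: "const 3"}
	lt := &Value{Op: "cmplt", Args: []*Value{x, three}}
	y := &Value{Op: "load y"}
	one := &Value{Op: "const 1"}
	gt := &Value{Op: "cmpgt", Args: []*Value{y, one}}
	both := &Value{Op: "and", Args: []*Value{lt, gt}}
	ret := &Value{Op: "ret", Args: []*Value{both}}
	for _, v := range schedule(ret) {
		fmt.Println(v.Op)
	}
}
```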

AVX-512

the most important feature of AVX-512 as compared to AVX2 is the presence of “mask” (or “predicate”) registers.
Most AVX-512 instructions accept a mask register (k0 through k7) operand that causes the lanes corresponding to the zero bits of the mask to remain untouched by the instruction in question

vpaddd %zmm12,%zmm7,%zmm7{%k5}
The instruction above adds the sixteen 32-bit integers in zmm12 to the corresponding 32-bit integers in zmm7 and writes the results into zmm7, but only for the lanes where k5 is set. In other words, this instruction does not modify any lanes in zmm7 where k5 is unset.

if (x < 3) {
    x += 2;
} else {
    x -= 1;
}

;; assume x is in %zmm0, and we have broadcast
;;  1 into %zmm1,
;;  2 into %zmm2,
;;  3 into %zmm3
vpcmpltd %zmm3, %zmm0, %k1        ;; k1 = zmm0 < zmm3, per lane
knotw    %k1, %k2                 ;; k2 = ~k1
vpaddd   %zmm0, %zmm2, %zmm0{%k1} ;; zmm0 += 2 iff k1
vpsubd   %zmm1, %zmm0, %zmm0{%k2} ;; zmm0 -= 1 iff k2
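
To check the semantics of the four instructions, this scalar Go model applies them lane by lane to sixteen 32-bit values, with a `uint16` standing in for the `k1`/`k2` mask registers. It only emulates the behavior; the real code runs the masked instructions directly. Note that `k2` comes from the original comparison, so a lane that becomes 3 after the masked add is not also decremented.

```go
package main

import "fmt"

const lanes = 16

// maskedBranch computes `if x < 3 { x += 2 } else { x -= 1 }` for all
// lanes without branching on data, mirroring the masked AVX-512 sequence.
func maskedBranch(x [lanes]int32) [lanes]int32 {
	var k1 uint16
	for i := 0; i < lanes; i++ { // vpcmpltd: k1 = x < 3, per lane
		if x[i] < 3 {
			k1 |= 1 << i
		}
	}
	k2 := ^k1                    // knotw: k2 = ~k1
	for i := 0; i < lanes; i++ { // vpaddd {k1}: touch only lanes set in k1
		if k1&(1<<i) != 0 {
			x[i] += 2
		}
	}
	for i := 0; i < lanes; i++ { // vpsubd {k2}: touch only lanes set in k2
		if k2&(1<<i) != 0 {
			x[i] -= 1
		}
	}
	return x
}

func main() {
	var x [lanes]int32
	for i := range x {
		x[i] = int32(i) // lanes hold 0, 1, ..., 15
	}
	fmt.Println(maskedBranch(x)) // [2 3 4 2 3 4 5 6 7 8 9 10 11 12 13 14]
}
```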

Implementation targets
- Intel AVX-512
- ARM SVE/SVE2 and the RISC-V Vector Extension

Trampolines

Once we have an executable sequence of bytecode operations, we need a way of entering the VM from portable Go code.

Each of the physical operators implements a “trampoline” function (written in assembly) that
populates the right VM registers with the initial state for a group of up to sixteen rows
invokes the bytecode by jumping into the first virtual instruction
and then takes the return value from the VM and does something sensible with it
Trampoline routines can typically accept an arbitrarily large number of input rows.
We usually aim to process several hundred rows of data per call.
Importantly, this means that we spend basically zero time in actual portable Go code once we have compiled the bytecode;
the “inner loop” of the VM is implemented entirely in assembly.
This is a critical piece of the design, because it means the Go language does not meaningfully constrain the performance of the VM as compared to “faster” alternatives like C/C++ or Rust.
unpivot_accelerators_amd64.s
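
A rough Go model of the batching a trampoline performs, under stated assumptions: rows are handed over in groups of up to sixteen, and the final partial group gets an active-lane mask with only its valid lanes set. `runGroup` and its even-number predicate are hypothetical stand-ins for jumping into the bytecode; in Sneller this whole loop is written in assembly, which is the point of the design.

```go
package main

import "fmt"

const groupSize = 16

// runGroup is a hypothetical stand-in for one pass through the bytecode:
// it evaluates a toy predicate (value is even) on the active lanes and
// returns the mask of lanes that passed.
func runGroup(group []int, active uint16) uint16 {
	var out uint16
	for i, v := range group {
		if active&(1<<i) != 0 && v%2 == 0 {
			out |= 1 << i
		}
	}
	return out
}

// trampoline feeds an arbitrarily long input to runGroup sixteen rows at
// a time; only the valid lanes of a final partial group are active.
func trampoline(rows []int) []int {
	var kept []int
	for start := 0; start < len(rows); start += groupSize {
		end := start + groupSize
		if end > len(rows) {
			end = len(rows)
		}
		group := rows[start:end]
		active := uint16(uint32(1)<<len(group) - 1) // low len(group) bits set
		out := runGroup(group, active)
		for i, v := range group {
			if out&(1<<i) != 0 {
				kept = append(kept, v)
			}
		}
	}
	return kept
}

func main() {
	rows := make([]int, 40) // two full groups plus a group of 8
	for i := range rows {
		rows[i] = i
	}
	fmt.Println(len(trampoline(rows))) // 20
}
```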

we have more than 250 bytecode operations, spanning everything from hash trie lookup to string matching to great-circle distance calculation

Storage

Columnar Compression Without Columns

A system that uses object storage as its primary storage back-end is frequently going to have to move data across a network
- and network bandwidth is often a limiting factor for overall system performance

renting a 200Gbps-capable c6in.32xlarge instance
- download data from S3 at a maximum rate of 200 Gbps (25 × 10^9 bytes/s, i.e. 23.28 GiB/s)
- a c6in.32xlarge machine has 128 CPU cores, and we can decompress zstd data at over 1 GB/s/core (in decompressed bytes), about 128 GB/s in total
- so unless the data compresses by a ratio of at least about 5.5 (128 / 23.28), the network rather than decompression is the bottleneck
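
The arithmetic behind these bullets, as a small Go function. It assumes the 23.28 and 128 figures are both in binary units (200 Gbps = 25 × 10^9 bytes/s ≈ 23.28 GiB/s) and that every core sustains about 1 GiB/s of decompressed output; with those inputs the break-even compression ratio comes out at about 5.5.

```go
package main

import "fmt"

// breakEven returns the network rate in GiB/s for a link of linkGbps
// (decimal gigabits per second) and the compression ratio at which the
// combined decompression output of all cores matches what the network
// delivers. Below that ratio the network is the bottleneck.
func breakEven(linkGbps float64, cores int, perCoreGiBps float64) (netGiBps, ratio float64) {
	bytesPerSec := linkGbps * 1e9 / 8 // 200 Gbps -> 25e9 bytes/s
	netGiBps = bytesPerSec / (1 << 30)
	decompGiBps := float64(cores) * perCoreGiBps
	return netGiBps, decompGiBps / netGiBps
}

func main() {
	netRate, ratio := breakEven(200, 128, 1.0)
	fmt.Printf("network: %.2f GiB/s, break-even ratio: %.2f\n", netRate, ratio)
	// network: 23.28 GiB/s, break-even ratio: 5.50
}
```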

Since Sneller’s query engine is fundamentally row-oriented, and since Sneller supports arbitrary heterogeneous rows, we cannot employ exactly the same tricks as a “pure” columnar storage format
- Sneller allows users to provide entirely disjoint sets of fields in adjacent rows.
  - One thousand rows with ten fields each that are unique to the row would imply that there are ten thousand “columns” just for those one thousand rows!
our SQL virtual machine can process more than 4GB/s/core of raw ion data
- whereas zstd decompression typically runs at only about 1GB/s/core, so decompression rather than query evaluation tends to dominate CPU time

Columnar Compression Without Columns – “Compression tiling”

In order to provide some of the performance benefits of compressed columnar storage, we use a technique we call “compression tiling”

Compression tiling

Amazon Ion [3]
Amazon Ion is a richly-typed, self-describing, hierarchical data serialization format offering interchangeable binary and text representations.
- The text format (a superset of JSON) is easy to read and author, supporting rapid prototyping.
- The binary representation is efficient to store, transmit, and skip-scan parse.

zion format (zipped ion)
- 16 buckets
- Each block of data (a group of rows) has its top-level fields hashed into one of sixteen buckets
- each of the sixteen buckets of data is compressed separately

zion bucket
- Each zion “bucket” encodes both the field label (as an ion symbol) and the field value for each assigned field in each record
- Prepended to the sixteen compressed buckets of data is a compressed “shape” bitstream that describes how to traverse the buckets to reconstruct the original ion records.
- use the zstd general-purpose compression algorithm for compressing
  - all the buckets
  - the “shape” bitstream

Compression tiling – query against zion format

we can elide reconstruction of all the fields of each record that are not semantically important for the query
- which means that we can achieve up to a 16x reduction in the amount of time we spend decompressing data

zion example

text: { my_string: "hello", my_number: 3, my_bool: false }
binary: dc8a8568656c6c6f8b21038c10

ion symbols
0x0a ==> my_string
0x0b ==> my_number
0x0c ==> my_bool
assume those symbols end up being hashed to buckets 5, 3, and 1

shape bitstream
- we’d write 033501
  - The first byte 0x03 indicates that we encoded a row with three fields
  - The next two bytes encode the buckets as individual nibbles, lsb-first
  - the encoded size of each individual “shape” is always rounded up to a whole number of bytes (an even number of nibbles), so the unused high nibble of the last byte here is 0
repeat the process above for every new row of ion data for the block
also include the ion symbol table in the shape bitstream
compress the shape bitstream and the buckets and concatenate them to form a complete compressed zion block

zion format

In practice, zion-compressed ion data with zstd-compressed buckets tends to be about 10% smaller than simply wrapping the ion data with zstd compression naïvely

Decoding zion

Decoding a zion block is just a matter of running all the encoding steps above in reverse
- After decompressing the shape bitstream and symbol table, we can map any requested fields (e.g. my_string) to symbol IDs, and we can hash those to determine which bucket(s) need to be decompressed
- we iterate the shape bitstream one item at a time and emit an ion structure composed of the field/value pairs encoded in each of the bucket(s) that we decompressed, taking care to omit any fields that we aren’t interested in reconstructing
- If we only need to produce one field, then we only have to decompress one bucket, and consequently we do (approximately) 1/16th of the decompression work
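
A sketch of the decode-side bucket selection. The symbol table is modeled as a plain map and the field-to-bucket assignment is injected as a function; in `main` it simply replays the 5/3/1 assignment from the example instead of hashing. Requesting only `my_string` selects a single bucket, which is where the roughly 1/16th figure comes from.

```go
package main

import (
	"fmt"
	"math/bits"
)

// bucketsToDecompress returns the set of buckets (bit i = bucket i) that
// must be decompressed to reconstruct the requested fields. Fields absent
// from the block's symbol table cannot occur in it and need no bucket.
func bucketsToDecompress(fields []string, symtab map[string]uint32, bucketOf func(uint32) int) uint16 {
	var need uint16
	for _, f := range fields {
		if sym, ok := symtab[f]; ok {
			need |= 1 << bucketOf(sym)
		}
	}
	return need
}

func main() {
	symtab := map[string]uint32{"my_string": 0x0a, "my_number": 0x0b, "my_bool": 0x0c}
	// Assignment copied from the example (5, 3, 1), not computed by a hash.
	assigned := map[uint32]int{0x0a: 5, 0x0b: 3, 0x0c: 1}
	bucketOf := func(sym uint32) int { return assigned[sym] }

	need := bucketsToDecompress([]string{"my_string"}, symtab, bucketOf)
	fmt.Printf("buckets %016b: decompress %d of 16\n", need, bits.OnesCount16(need))
	// buckets 0000000000100000: decompress 1 of 16
}
```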

Decoding Performance

Our SQL engine is quite sensitive to the performance of the “reconstruction” process for ion data from zion blocks

The implementation of zion.Decoder.Decode uses one of a handful of assembly routines
We’ve managed to make this reassembly process quite fast (many GB/s/core) in practice.

Partitioning

Partitioning can be configured for a table to improve data locality, provide data isolation, and reduce the number of bytes that need to be scanned to satisfy a query

{
  "input": [
    {
      "pattern": "s3://example-bucket/logs/{region}/*.json.zst"
    }
  ],
  "partitions": [
    {
      "field": "region"
    }
  ]
}

logs/eu-west-1/access-log.json.zst
logs/eu-west-1/error-log.json.zst
logs/us-east-1/access-log.json.zst
logs/us-east-1/error-log.json.zst
logs/us-west-2/access-log.json.zst
logs/us-west-2/error-log.json.zst

SELECT COUNT(*) FROM logs WHERE region = 'us-west-2'
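
How a query like this can skip objects entirely: the `{region}` placeholder is matched against each object key, and only keys whose captured value satisfies the predicate are scanned. This hand-rolled matcher is hypothetical (it assumes a placeholder occupies a whole path segment and drops the `s3://example-bucket/` prefix); the glob segments use the standard `path.Match`.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// extractPartitions matches key against pattern segment by segment. A
// segment like {region} captures the key's segment; any other segment is
// a glob matched with path.Match.
func extractPartitions(pattern, key string) (map[string]string, bool) {
	ps := strings.Split(pattern, "/")
	ks := strings.Split(key, "/")
	if len(ps) != len(ks) {
		return nil, false
	}
	vals := map[string]string{}
	for i, p := range ps {
		if strings.HasPrefix(p, "{") && strings.HasSuffix(p, "}") {
			vals[p[1:len(p)-1]] = ks[i]
			continue
		}
		if ok, err := path.Match(p, ks[i]); err != nil || !ok {
			return nil, false
		}
	}
	return vals, true
}

// prune keeps only the keys whose partition field equals want.
func prune(pattern string, keys []string, field, want string) []string {
	var out []string
	for _, k := range keys {
		if vals, ok := extractPartitions(pattern, k); ok && vals[field] == want {
			out = append(out, k)
		}
	}
	return out
}

func main() {
	keys := []string{
		"logs/eu-west-1/access-log.json.zst",
		"logs/eu-west-1/error-log.json.zst",
		"logs/us-east-1/access-log.json.zst",
		"logs/us-east-1/error-log.json.zst",
		"logs/us-west-2/access-log.json.zst",
		"logs/us-west-2/error-log.json.zst",
	}
	// WHERE region = 'us-west-2': only two of the six objects are scanned.
	for _, k := range prune("logs/{region}/*.json.zst", keys, "region", "us-west-2") {
		fmt.Println(k)
	}
}
```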

Partitioning on dates

{
  "input": [
    { "pattern": "s3://example-bucket/logs/{yyyy}/{mm}/{dd}/*.json.zst" }
  ],
  "partitions": [
    { "field": "date", "type": "date", "value": "$yyyy-$mm-$dd" }
  ]
}

Sneller SQL

Sneller SQL only supports part of the “DQL” portion of standard SQL (i.e. SELECT-FROM-WHERE statements, etc.).
- Sneller does not currently use SQL to perform database insert/update operations

Sneller SQL extends the concept of SQL “rows” and “columns” to “records” and “values”
- In other words, each “row” of data is a record of values, and records themselves are also values.
- A “table” is an un-ordered collection of records.

Instead of projecting “columns,” a Sneller SQL query projects record fields.

SELECT 1 AS x, 2 AS y, (SELECT 'z' AS z, NULL AS bar) AS sub

evaluates to:

{"x": 1, "y": 2, "sub": {"z": "z", "bar": null}}

Sneller SQL can handle tables that have records with wildly different schemas, as it does not assume that the result of a particular field selection must produce a particular datatype

Execution model

Since Sneller is designed to run as a “hosted” multi-tenant product, the query engine and query planner are designed so that queries will execute

within a (generous) fixed memory limit
and a linear amount of time with respect to the size of the input

Core types

Null
- Sneller SQL departs from ordinary SQL in that the value NULL compares equal to itself
- We chose to depart from the SQL standard here so that it would be possible to compare lists and structures with NULL fields using the = operator.
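
A small model of why NULL = NULL matters for composite values. It assumes NULL is represented as Go `nil` and structures and lists as `map[string]any` and `[]any`. With NULL equal to itself, two structures that both have a NULL field compare equal under a recursive comparison, which is the behavior described above.

```go
package main

import "fmt"

// equal compares values recursively. Unlike standard SQL, nil (NULL)
// equals nil, so structures and lists with NULL fields can be equal.
// Scalars (strings, numbers, bools) are compared with ==.
func equal(a, b any) bool {
	switch av := a.(type) {
	case nil:
		return b == nil
	case map[string]any:
		bv, ok := b.(map[string]any)
		if !ok || len(av) != len(bv) {
			return false
		}
		for k, x := range av {
			y, present := bv[k]
			if !present || !equal(x, y) {
				return false
			}
		}
		return true
	case []any:
		bv, ok := b.([]any)
		if !ok || len(av) != len(bv) {
			return false
		}
		for i := range av {
			if !equal(av[i], bv[i]) {
				return false
			}
		}
		return true
	default:
		return a == b
	}
}

func main() {
	x := map[string]any{"name": "a", "tag": nil}
	y := map[string]any{"name": "a", "tag": nil}
	fmt.Println(equal(x, y)) // true
}
```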

Missing
- MISSING is the notation for the absence of a value.
- Since Sneller SQL has functions and operators that only operate on certain data types, some operations may not return a meaningful result.
- For example, the result of the expression 3 + 'foo' is MISSING, since we cannot add the integer 3 to the string 'foo'
- Similarly, the result of a path expression foo.bar where the value foo is not a structure (or foo is a structure without any field called bar) is also MISSING
- When projecting columns for output, the Sneller SQL engine will omit labels for columns that produce MISSING. In other words, an expression that evaluates to {'x': 'foo', 'y': MISSING} is output as {'x': 'foo'}
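
A sketch of MISSING as a state distinct from NULL: a type error or an absent path yields MISSING, and projection drops MISSING fields instead of emitting them. The `Missing` sentinel and the helper names are hypothetical, and NULL operands to `add` are not handled specially here (a simplification).

```go
package main

import (
	"encoding/json"
	"fmt"
)

// missingT is the type of the MISSING sentinel; Go nil plays NULL.
type missingT struct{}

var Missing = missingT{}

// add returns MISSING unless both operands are integers.
func add(a, b any) any {
	x, ok1 := a.(int)
	y, ok2 := b.(int)
	if !ok1 || !ok2 {
		return Missing
	}
	return x + y
}

// field evaluates the path v.name: MISSING if v is not a structure or
// has no such field; a field holding NULL yields NULL.
func field(v any, name string) any {
	s, ok := v.(map[string]any)
	if !ok {
		return Missing
	}
	f, ok := s[name]
	if !ok {
		return Missing
	}
	return f
}

// project builds an output record, omitting labels whose value is MISSING.
func project(cols map[string]any) map[string]any {
	out := map[string]any{}
	for k, v := range cols {
		if v != Missing {
			out[k] = v
		}
	}
	return out
}

func main() {
	row := project(map[string]any{
		"x": "foo",
		"y": add(3, "foo"),                            // MISSING: dropped
		"z": field(map[string]any{"bar": nil}, "bar"), // NULL: kept
	})
	b, _ := json.Marshal(row)
	fmt.Println(string(b)) // {"x":"foo","z":null}
}
```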

Subquery restrictions

Since the query engine implements sub-queries by buffering the intermediate query results, sub-queries are not allowed to have arbitrarily large result-sets.

The query planner will reject sub-queries that do not meet ONE of the following conditions:

The query has a LIMIT clause with a value of less than 10,000.
The query has a SELECT DISTINCT or GROUP BY clause.
The query is an aggregation with no corresponding GROUP BY clause (and thus has a result-set size of 1).
Additionally, the query execution engine will fail queries that produce too many intermediate results. (Currently this limit is 10,000 items.)
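
The admission rule can be written as a small predicate. The `Subquery` summary type is hypothetical (the real planner works on its own AST); the 10,000 thresholds come from the text.

```go
package main

import (
	"errors"
	"fmt"
)

// Subquery is a hypothetical summary of the clauses the planner inspects.
type Subquery struct {
	Limit       int  // 0 means there is no LIMIT clause
	Distinct    bool // SELECT DISTINCT
	GroupBy     bool // GROUP BY present
	IsAggregate bool // e.g. SELECT COUNT(*) ... with no GROUP BY
}

const maxSubqueryLimit = 10000

// checkSubquery accepts a subquery only if at least one condition bounds
// the size of the result set the engine has to buffer.
func checkSubquery(q Subquery) error {
	switch {
	case q.Limit > 0 && q.Limit < maxSubqueryLimit:
		return nil // LIMIT below 10,000
	case q.Distinct || q.GroupBy:
		return nil // still capped at run time by the 10,000-item limit
	case q.IsAggregate:
		return nil // aggregation without GROUP BY: exactly one row
	}
	return errors.New("subquery result set is not bounded")
}

func main() {
	fmt.Println(checkSubquery(Subquery{Limit: 100})) // <nil>
	fmt.Println(checkSubquery(Subquery{}))           // subquery result set is not bounded
}
```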

Ordering restriction

The ORDER BY clause may not operate on an unlimited number of rows, as it would require that the query engine buffer an unlimited number of rows in order to sort them.

The query engine will reject an ORDER BY clause that occurs without at least one of the following:

A LIMIT clause of 10,000 elements or fewer
A GROUP BY clause
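
The ORDER BY rule as a similarly hypothetical check: sorting is allowed only when a LIMIT of at most 10,000 or a GROUP BY bounds the number of rows the engine must buffer.

```go
package main

import (
	"errors"
	"fmt"
)

// OrderedQuery is a hypothetical summary of a query with ORDER BY.
type OrderedQuery struct {
	Limit   int  // 0 means there is no LIMIT clause
	GroupBy bool // a GROUP BY clause is present
}

// checkOrderBy accepts ORDER BY only when the rows to sort are bounded.
func checkOrderBy(q OrderedQuery) error {
	if (q.Limit > 0 && q.Limit <= 10000) || q.GroupBy {
		return nil
	}
	return errors.New("ORDER BY requires LIMIT <= 10000 or GROUP BY")
}

func main() {
	fmt.Println(checkOrderBy(OrderedQuery{Limit: 100}))   // <nil>
	fmt.Println(checkOrderBy(OrderedQuery{Limit: 50000})) // ORDER BY requires LIMIT <= 10000 or GROUP BY
}
```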

Querying multiple tables at once (‘++’ operator)

The operator ++ (double plus) concatenates multiple sources into one. It is shorthand for UNION ALL, and it saves you from repeating the common filter expression, selected columns, etc. for each source.

For example the following query will return results from three tables:

SELECT COUNT(*) FROM t1 ++ t2 ++ t3 WHERE location = 'Helsinki'

the company

Frank Wessels, Founder & CEO
Prior to founding Sneller, Frank was CTO at open source object storage startup MinIO
pricing
- simple pricing: $50 per PB scanned, 100x less expensive
sneller.io ==> sneller.ai

References

[1] https://github.com/SnellerInc/sneller
[2] zion format, https://sneller.ai/blog/zion-format/
[3] Amazon Ion binary format, https://amazon-ion.github.io/ion-docs/docs/binary.html
[4] Binary encoding, https://amazon-ion.github.io/ion-docs/docs/binary.html

The Debugging Manifesto

niyue — Fri, 03 Feb 2023 00:20:44 +0000

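Julia Evans is one of my favorite comic artists. Many of her comics are about programming, so I can really relate to them. Her drawing technique is not sophisticated at all: they are the kind of stick figures I could draw myself. I also guess her programming skills are not especially remarkable, since she is not a professional programmer. But she is one of the few creators who can express the problems and experiences of programming really well through comics; in today's terms, she is a multi-talented person.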

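You can find many of her works at https://wizardzines.com, such as “How Containers Work”. Of all her works, the one I relate to most, and have read more than once, is a comic called the debugging manifesto. Even when I get stuck at work, looking at it can give me a chance to come up with new ideas, so I drew my own Chinese “adapted” (read: copied) version of it and put it here.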

Paper prototyping with ChatGPT

niyue — Sun, 15 Jan 2023 01:25:50 +0000

ChatGPT is a really interesting technology. I recently saw some people suggest combining ChatGPT with the software development process, so I gave it a try.

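Yesterday I gave ChatGPT the entire Kafka KIP-848 design document to read and then started asking it questions. It immediately had answers about many of the terms and possible implementation directions, and combined with my own judgment I could quickly get down to the implementation level. ChatGPT feels very well suited to fast, exploratory prototyping and validation of design directions without writing any code.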

A conversation

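For example, when I discussed the “Reconciliation Loops” in the document with it, it told me that ZooKeeper, Cassandra and etcd all use them.

When I had further ideas at the implementation level, it could also quickly tell me about possible approaches and problems.

Even when I had more complex ideas for improvements, it could respond and point out some of the problems I might run into.

Because it seems to know everything, you can very quickly propose alternative designs and implementations. For example, when I wondered whether a relational database could be used instead of etcd to implement this functionality, it could answer that too.

It even suggested using SQLite's WAL mode instead of the triggers it had proposed earlier to solve the problem; it really is full of ideas, which is very helpful for finding possible directions during prototyping. You can even ask it why it made a particular design choice.

When necessary, you can also check it against known facts to see whether it is confidently making things up.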

Summary

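With the huge amount of information ChatGPT has, we can quickly get and validate many of the ideas that come up during development. As long as we keep our own judgment and verify these ideas through other channels, ChatGPT can serve as a very good tool for rapid prototyping.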

Simulating slow I/O storage with disk mount conditioner

niyue — Sat, 26 Nov 2022 09:13:30 +0000

1. Background

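At work I occasionally run into environments whose storage devices have very slow I/O, especially some cloud storage, where read throughput is typically around 200MB/s [1]. But I have always used a MacBook as my local development machine, and its disk is really fast: even a 2019 machine can read at 2000MB/s, so it is not easy to run performance tests against slow I/O locally.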

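About a year ago I used Docker's --device-read-bps feature to build a container-based way of simulating a slow I/O storage device on macOS. --device-read-bps is implemented with Linux cgroups, but going through Docker gives a friendlier user interface. I put the code at https://github.com/niyue/slowio, and it works roughly like this: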

macOS application <==POSIX API==> samba shared volume <==samba==> samba server container in Docker <==cgroup limit==> disk

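Docker starts a Samba container whose read/write IOPS and read/write bandwidth are limited
The macOS client mounts the volume shared by Samba
Applications on macOS accessing the Samba shared volume are subject to the IOPS and bandwidth limits specified through Docker
The page cache in Docker for Mac’s Linux host needs to be cleared if you want to measure this limit accurately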

This works, but it is still too complicated for use on macOS: not only do you need Docker and a Samba deployment inside it, there are also some obvious problems:

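You have to remember to clear the page cache in Docker for Mac to make sure the limit is applied correctly
Samba has its own bottleneck (although simulated slow I/O usually stays below it)
Files have to be copied into the container for testing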

2. dmc

I only recently discovered that, starting with High Sierra, macOS ships with a tool called dmc (disk mount conditioner) [2] that makes it very easy to simulate slow I/O storage. Here is how to use it.

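So what exactly is dmc? According to its man page [2], it is a service built into the kernel that can degrade disk I/O on a particular mount point, creating the illusion that the I/O is happening on a slower storage device. It can also make a mount point claim to be a different type of device; for example, an SSD can be made to claim to be an HDD. Storage access parameters usually differ depending on the underlying device type, so this setting also changes various parameters accordingly, such as read-ahead settings and disk I/O throttling.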

It is actually very simple to use:

1. dmc list

dmc comes with 8 preset profiles, which can be listed with dmc list. They represent some typical disk devices; for my purposes, picking one of them is enough.

> dmc list
  0: Faulty 5400 HDD
  1: 5400 HDD
  2: 7200 HDD
  3: Slow SSD
  4: SATA II SSD
  5: SATA III SSD
  6: PCIe 2 SSD
  7: PCIe 3 SSD

2. dmc show

For each preset device, you can view its parameters with dmc show.

❯ dmc show "7200 HDD"
Profile: 7200 HDD
 Type: HDD
 Access time: 17433 us
 Read throughput: 140 MB/s
 Write throughput: 140 MB/s
 I/O Queue Depth: 32
 Max Read Bytes: 33554432
 Max Write Bytes: 33554432
 Max Read Segments: 256
 Max Write Segments: 256

3. dmc start/stop

After creating a directory to use as the mount point, you can use dmc start and dmc stop to control the behavior of that mount point.

sudo dmc start /tmp/data "7200 HDD"
sudo dmc stop /tmp/data

4. dmc status

Once it is set up, you can check the status of a mount point with dmc status.

❯ dmc status /tmp/data
Disk Mount Conditioner: OFF
Profile: Custom
 Type: HDD
 Access time: 17433 us
 Read throughput: 140 MB/s
 Write throughput: 140 MB/s
 I/O Queue Depth: 32
 Max Read Bytes: 1048576
 Max Write Bytes: 1048576
 Max Read Segments: 256
 Max Write Segments: 256

That's all there is to it. If you then want to verify that the simulation actually works, you can use a storage benchmarking tool such as fio [3].

3. References

[1] Huawei Cloud disk types and performance, https://support.huaweicloud.com/productdesc-evs/zh-cn_topic_0044524691.html

[2] dmc man page, https://manp.gs/mac/1/dmc

[3] fio, https://fio.readthedocs.io

Trying to write more

niyue — Sun, 12 Sep 2021 03:30:32 +0000

Something recently moved me to try writing more. The posts won't necessarily be long, but I hope to accumulate more ideas [1].

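For quite a few people [3][4][5], no single piece of writing is very special on its own, but kept up over a long time it forms a very distinctive collection, from which you can see and learn a lot about the person. Come to think of it, it is rather odd that someone like me, who basically doesn't trade stocks, subscribes to a stock-trading WeChat account [5].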

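So I swiped my credit card and restored the blog domain, although WordPress still feels rather expensive for an infrequent user like me. I also bought the book Storyworthy [2] to try the “homework for life” [6] it describes: recording one story worth remembering each day.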

[1] Write 5x more but write 5x less, https://critter.blog/2020/10/02/write-5x-more-but-write-5x-less/

[2] Storyworthy: Engage, Teach, Persuade, and Change Your Life through the Power of Storytelling, https://www.amazon.com/Storyworthy-Engage-Persuade-through-Storytelling/dp/1608685489
[3] http://chenlinux.com
[4] http://www.ruanyifeng.com/blog/
[5] 招财大牛猫, https://mp.weixin.qq.com/s/zPi2-hHP6X-1H_YFhyZVKg
[6] homework for life, https://youtu.be/WQWiLZ1M6xw?t=766

止正 (Zhizheng)

niyue — Sun, 10 Feb 2013 13:10:07 +0000

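I named my son “止正”. The main purpose of this post is that, in the future, when someone (my son or anyone else) asks why he has this name, I can simply send them a link to this post as an explanation.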

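I have always thought naming is an important matter: the name is used for a lifetime and is a hassle to change, so I started thinking about names about a month before my son was born. His dad is a programmer who inevitably treats everything he sees as a nail for his hammer, so I planned to write a program to find a suitable name. I had done this before: when naming my niece, I wrote a Python program (and learned Python along the way) that generated names based on the stroke counts and the level/oblique (ping/ze) tones of the characters. My niece was born on June 14, and after program generation and manual screening I came up with the name “周睿竹”. I have always been quite pleased with that name, even though it was not adopted in the end. This time I planned to use the same trick again; since my son had not been born yet, I only needed to consider the tones to get a name that reads smoothly.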

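Thanks to Dr. Wang's help, we already knew it was a boy, so there was no need to prepare two sets of names. But considering the possibility (or impossibility) of a second child, I still wanted some extensibility, so that the second child's name could be covered too. And since we wouldn't know whether a second child would be a boy or a girl, that name had to work for either, or have a suitable version for each. Well, given that a second child isn't all that likely, what I was doing could probably be classified as the legendary over-engineering…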

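Last time, naming my niece was an excuse to learn Python. This time, although it was the Year of the Snake, I wanted to learn something new, so I started learning Clojure properly, hoping to write the program in Clojure. I downloaded a pile of books and skimmed them, and in the end found the Clojure Koans approach the most fun. I rushed through all 20 exercises, and a week went by just like that. When I actually tried to write the program in Clojure, I found even reading a file painful and figured my son would be a month old by the time I finished, so I gave up on Clojure and used the more familiar Ruby instead. It took about one evening to write a very simple version that could generate a large number of name combinations.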

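At a high level, I planned a three-character name whose middle character is shared by all the children, with the tones of the three characters following level-oblique-level, level-level-oblique, or level-oblique-oblique, so that the name has some rise and fall when read aloud. I didn't expect the program to produce the name I wanted directly, but I hoped the generated combinations would give me some inspiration.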

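First, I found a table of 3,500 commonly used Chinese characters with their pinyin, made up of 2,500 common characters and 1,000 less common ones. If you look at the national Table of General Standard Chinese Characters (《通用规范汉字表》), this is its Level 1 list. Once you get to the Level 2 list, the characters are no longer really suitable for ordinary Chinese names (I can't even pronounce many of them), let alone the Level 3 list.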

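Next, the program groups all the characters by pinyin. For example, yi with the first tone (yī), such as “一”, “医”, “壹”, “伊” and “衣”, forms one group, while yi with the fourth tone (yì), such as “易”, “意”, “义” and “益”, forms another. This merges all characters with the same pronunciation, and combinations are then generated according to their tones; if a combination reads smoothly, I can pick any character I like from each group.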
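
The original program was written in Ruby and is not shown in the post, so here is a Go sketch of the grouping idea under stated assumptions: each character carries toned pinyin such as `yi1`, tones 1 and 2 count as level (ping) and tones 3 and 4 as oblique (ze), and combinations are generated per syllable rather than per character. The six-character table is made up for illustration; the real input was the 3,500-character list.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Char is one entry of a (hypothetical) character table with toned
// pinyin, e.g. {"一", "yi1"}.
type Char struct {
	Hanzi  string
	Pinyin string
}

// isLevel reports whether a toned syllable is level (ping: tones 1, 2);
// tones 3 and 4 are oblique (ze).
func isLevel(syllable string) bool {
	tone := syllable[len(syllable)-1]
	return tone == '1' || tone == '2'
}

// groupBySyllable merges homophones under their toned syllable.
func groupBySyllable(chars []Char) map[string][]string {
	groups := map[string][]string{}
	for _, c := range chars {
		groups[c.Pinyin] = append(groups[c.Pinyin], c.Hanzi)
	}
	return groups
}

// combos returns every syllable sequence matching a pattern of 'P'
// (level) and 'Z' (oblique); "ZP" is the given name of level-oblique-level.
func combos(groups map[string][]string, pattern string) []string {
	var syllables []string
	for s := range groups {
		syllables = append(syllables, s)
	}
	sort.Strings(syllables) // deterministic order
	results := []string{""}
	for _, want := range pattern {
		var next []string
		for _, prefix := range results {
			for _, s := range syllables {
				if isLevel(s) == (want == 'P') {
					next = append(next, strings.TrimSpace(prefix+" "+s))
				}
			}
		}
		results = next
	}
	return results
}

func main() {
	groups := groupBySyllable([]Char{
		{"一", "yi1"}, {"医", "yi1"}, {"易", "yi4"}, {"意", "yi4"},
		{"扬", "yang2"}, {"羽", "yu3"},
	})
	for _, c := range combos(groups, "ZP") {
		fmt.Println(c) // yi4 yang2, yi4 yi1, yu3 yang2, yu3 yi1
	}
}
```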

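Perhaps because of last time's Python program, which only combined characters with specific stroke counts (one of them a 14-stroke character) so that looking through all the combinations went quickly, and because I hadn't thought much about how many homophones there are, I never considered how many combinations this approach would produce. When the results came out, there were far too many: the 3,500 characters fell into about 400-odd level-tone syllables and 500-odd oblique-tone syllables, i.e. on average a little over 3 homophones per syllable. The level-oblique-level pattern alone (my preferred tone pattern) left more than 200,000 combinations to filter, far beyond what a person can look through. So I first went through the 3,500 characters by hand and deleted about half that were unsuitable (such as “搬”, “绑” and “磅”), then added some constraints on combinations; for example, since my surname starts with n, I wanted to avoid a first given-name character that also starts with n. Even then, more than 100,000 combinations remained. I looked through a few thousand by hand and it still felt like far too many. Then I remembered the naming saying “girls from the Book of Songs (Shijing), boys from the Songs of Chu (Chu Ci)”, so I checked whether each combination appeared in the Chu Ci. To make lookups fast enough for so many combinations, I preprocessed the Chu Ci by extracting every pair of consecutive characters. Later I switched to using the Chu Ci only to filter the individual characters, and then dropped the Chu Ci restriction altogether. After fiddling with all this for a while, the results were still unsatisfying.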
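
The Chu Ci preprocessing step (collecting every pair of adjacent characters so that a two-character candidate can be checked with one set lookup) might look like this in Go. Skipping pairs that straddle punctuation is an assumption; the sample text is the opening line of 离骚 (Li Sao).

```go
package main

import (
	"fmt"
	"unicode"
)

// bigrams returns the set of all pairs of adjacent Han characters in
// text; punctuation and whitespace break the sequence.
func bigrams(text string) map[string]bool {
	set := map[string]bool{}
	var prev rune
	havePrev := false
	for _, r := range text {
		if !unicode.Is(unicode.Han, r) {
			havePrev = false // do not pair across punctuation
			continue
		}
		if havePrev {
			set[string([]rune{prev, r})] = true
		}
		prev, havePrev = r, true
	}
	return set
}

func main() {
	index := bigrams("帝高阳之苗裔兮，朕皇考曰伯庸。")
	for _, name := range []string{"高阳", "伯庸", "兮朕", "止正"} {
		fmt.Println(name, index[name]) // true, true, false, false
	}
}
```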

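Then I switched to another way of looking for inspiration. I took all the oblique-tone syllables, about 500 of them, and went through them one by one; whenever one looked suitable, I combined it with all the possible level-tone syllables, also about 500. Or the other way around, level first and then oblique. That is how I found the first combination, “谦允”, and I originally planned to use the three names “谦允”, “谦许” (for a boy or a girl) and “谦诺”. Later I felt that 谦允 lacks open vowels like a, e and o, so it doesn't sound upright enough, and I added an open-vowel check to the program: a combination had to contain at least one open vowel to be considered. That is how I found combinations such as “羽扬”, “印波” and “可元”.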

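After that, I wanted the two given-name characters to carry some meaning together, so I looked up word compounds for characters in an online Xinhua dictionary (entirely by hand) and checked whether any of the words would work as a name. That is how I found “允正”; at that point the plan was to use “允正”, “允平” and “允圆” (a girl's name).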

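I found a few more after that but never settled on one, and then my son was born two weeks early. He was supposed to be born in the guisi (癸巳) year but arrived in the renchen (壬辰) year instead, so the birth-chart (bazi) and five-elements calculations my wife had worked out no longer applied. After he was born I tried even harder to come up with a name, and it now also had to satisfy my wife's wish for a character with the “fire” element to match his birth chart. I went through a pile of messy websites about the five elements of Chinese characters. Only one seemed really good, far less far-fetched than the others: it had lots of data on the popularity of Taiwanese names and related statistics, which made it easy to find naming patterns and possible name combinations used in Taiwan.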

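“止正” came out of a search I did specifically after my wife asked for a “fire” character. I went through the common characters said to belong to “fire” by hand, picked the ones I liked, and then paired them with syllables of the other tone. This produced a bunch of names (not necessarily good ones), such as “品正”/“品圆”/“品则” (a name I also seriously considered), “显扬”, “方远”, “中泽” and “登延”. After several rounds of screening I still thought “止正” was the best, although my wife and my mother actually felt “方远” reads more smoothly.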

As for the meaning of the name, it can be interpreted in many ways:

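Upright in conduct (举止端正), an obvious meaning
Stopping being correct, i.e. not upright, another obvious meaning. I can accept this one too; being a little irreverent is fine. When I was considering the name “品正” (upright character), what I feared most was being named for upright character while not actually being upright :(
Resting in what is right (止于正), a bit like the Buddhist idea of abiding in the right path. Searching online for “止正” turned up basically no meaning, but I did discover a hermitage called “止正庵” on Mount Wuyi in Fujian
My own interpretation is this: 止 and 正 look very alike, and the name almost became 正正. 正正 stands for ten (正 has five strokes and is used to tally counts), which is perfectly complete; 止正 is nine (止 has four strokes), just one short of ten, which is also quite good, since there is no need to chase extremes. Even I think this is a very science-student kind of explanation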

I have also worked out names derived from “止正”: “止匀” for a boy and “止圆” for a girl, assuming, of course, that there is another child.

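The name was only finally settled on the day the medical birth certificate was issued, which finally settled a big matter. I hope my son will like this name when he grows up, hehe.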

l2tpvpn puppet module for Ubuntu

niyue — Sat, 10 Nov 2012 15:31:10 +0000

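I had used the Windows Azure credit that comes with a Microsoft MSDN subscription to set up an Ubuntu server, mainly to run an L2TP VPN service for getting around the firewall. An SSH tunnel is an easy way around the firewall, but devices like the iPad and iPhone can't use a SOCKS proxy directly, so I set up this VPN instead; it is also very convenient on a Mac.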

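In the early morning of November 1, Microsoft suddenly emailed me to say that the Windows Azure credit from my MSDN subscription had run out and that my Windows Azure services had been shut down automatically. I had indeed configured it to stop automatically when the credit was exceeded, so I expected it to resume the next month. Around noon on November 1 I received an email saying the credit had been restored, and at first I paid little attention, since this was expected. But later, when I tried to use the VPN, I couldn't connect. Looking in the Windows Azure Portal, I found that both virtual machines I had created no longer existed. This was completely unexpected: stopping the service had deleted all the data at once. With the 18th Party Congress approaching and lots of websites blocked, I had no choice but to set up the VPN again.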

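Last time I mainly followed “Ubuntu下为Android/iOS搭建L2TP/IPSec VPN代理服务器” (setting up an L2TP/IPSec VPN proxy server on Ubuntu for Android/iOS). Doing it again wouldn't be hard, just tedious, so this time I decided to automate the process with Puppet. I had written some small web apps before, all of which needed an environment set up for deployment, and I had run into problems caused by the environment several times. When I later came across tools like Puppet, they seemed well suited to this kind of problem, so I used this VPN setup as a chance to learn it.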

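Puppet turned out to be a pretty good product with fairly good documentation. I mainly read the Learning Puppet guide, then consulted Puppet's Type Reference while building this l2tpvpn Puppet module. It took about 20 hours in total. In the end I didn't quite reach my original goal of deploying everything with a single command, but I got a general understanding of Puppet, and setting up the VPN is now roughly a ten-minute process, about five minutes of which is spent waiting for operations in the Windows Azure Portal.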

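The remaining automation work was somewhat tedious and I lacked the drive to finish it, so I just wrote a document describing it and put everything on GitHub at https://github.com/niyue/l2tpvpn. Setting up such an environment on Amazon EC2 or Windows Azure should now be very easy. The Puppet module is quite simple and was mainly for learning. I didn't think very carefully about the ordering of dependencies between some services and configuration files, but apart from the client/server setup, I basically went through the whole Puppet workflow. The parameters needed at deployment time are obtained automatically, so you basically only need to fill in a VPN username and password.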

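While recreating the virtual machines, I also used iperf to test the bandwidth from home to the Windows Azure East Asia and Southeast Asia data centers. From a quick search, the East Asia data center should be in Hong Kong and the Southeast Asia one in Singapore. A few very unscientific manual runs of iperf gave similar results: about 600~800kbps to the virtual machines in both data centers, with Singapore feeling slightly faster.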

In summary:

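In the future, use Puppet to manage configuration as much as possible; the more complex and the longer-lived the configuration, the bigger the payoff
TODO: use Puppet to automate management of more complex configuration files, Puppet + Augeas, http://projects.puppetlabs.com/projects/1/wiki/Puppet_Augeas. To stay idempotent, the current l2tpvpn may overwrite other existing changes when it modifies /etc/ipsec.conf. That is no problem for a personal setup like mine with a single VPN, but it could be in a more complex system
Vagrant, http://vagrantup.com/, looks interesting; worth trying some time
Compared with the Amazon EC2 console, the Windows Azure Portal's performance is really terrible: creating a new virtual machine or adding a new endpoint feels an order of magnitude slower than in Amazon EC2
Finally I can keep using a VPN during the 18th Party Congress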

ZTE F420

niyue — Wed, 08 Aug 2012 08:00:36 +0000

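A typhoon hit Shanghai today, so I hid at home and called it working from home, then accidentally got sidetracked into spending a whole morning tinkering with the router.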

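A while ago we finally switched to China Telecom fiber at home. They gave us a ZTE A10 F420 optical modem/router and a TP-LINK TL-WR700N wireless mini router. The TP-LINK router is quite nicely designed, looking like Apple's MagSafe power adapter. But after using it for a while, I found that Telecom had put a limit on the device: at most 4 devices could connect at a time. Counting casually, we have 8 Wi-Fi-capable devices at home (two desktops, a MacBook, two iPhones, two iPads and a Wii), and sometimes I bring my work computer home too. Although the computers usually aren't all in use at once, the iPads and iPhones are basically always on (failed connections became especially noticeable after we bought iPads), so these days a 4-device limit feels like Telecom making trouble for itself (I spent ages on the phone with customer service and even had a technician come over). After some fiddling I found that the limit was actually in the TP-LINK router, not the optical modem/router, so I connected my own router behind the modem and that solved the problem. But ever since I learned that the modem also has a special administrator password for certain special settings, I have felt annoyed, always suspecting that Telecom was doing everything it could to stop me from freely using my broadband service. And so began a long process of tinkering with the modem.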

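1) I saw people online saying the F420 has telnet enabled. When I tried to connect, the username and password mentioned online didn't work, but a random guess of root/root actually got me in.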

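2) Once connected, I saw a familiar command-line interface and thought I would be done soon. But it turned out the F420 runs a customized BusyBox, version 1.01, with very few commands available; there isn't even vi or mv. The BusyBox build on the modem Telecom gave us doesn't have a single usable editor, so I could only cat files over and over on the command line. Simple files could be changed with echo, but anything a bit more complex was hard to read and basically impossible to edit. There is no find or xargs either, which makes grep of little use, so in the end I could only wander around with cd + ls + cat.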

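3) Later I discovered that the system even includes vsftpd, which can start an FTP service. But its vsftpd configuration differs from other Linux systems, and after starting it I could never log in. I read a pile of vsftpd configuration docs for various Linux distributions and other people's configurations, but never figured out how to make this vsftpd usable.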

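4) The three steps above dragged on intermittently for more than two months (of course, I only tinkered with it when the mood struck). This morning I found that starting vsftpd with a mysterious -s argument does the trick; none of the other configuration takes effect.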

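5) Shortly after happily logging in, I found that Cyberduck couldn't download over FTP but could upload, and I had no idea what permission problem was behind it. BusyBox doesn't even have chmod, so I was a bit stuck. After more fiddling I switched to FileZilla. “Download” still failed in FileZilla, but “View/Edit” worked; I really have no idea how FTP works… I immediately changed FileZilla's default editor to Sublime Text, and at last had a decent editor for viewing and editing files.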

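6) After reading a large number of configuration files, I did find that /etc contains a db_default_Jiangsu_cfg.xml with telecomadmin and its password, but db_default_Shanghai_cfg.xml doesn't have it. There are lots of claims online, but since Telecom ships different router models, and the same model can run different software versions, most of them seem useless. The directory tree also contains /userconfig/cfg/db_user_cfg.xml, which looks like where these settings are stored, but it isn't really an XML file; it is an encrypted (processed) file. Earlier tricks, such as reading the file directly or base64-decoding its contents, no longer work. On my F420 this file is a text file, but its contents are all hexadecimal numbers, and it is only about 1/3 the size of the unprocessed db_default_cfg.xml in the same directory. I fiddled with it for a while without success and eventually gave up on this path.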

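7) Later I found another method online that works: modify /home/httpd/login.gch, which is code for the modem's admin interface, presumably a CGI script of some kind. It was probably worked out by someone inside Telecom: after calling some rather magical APIs, it retrieves the password of the telecomadmin user directly. China Telecom reportedly changes this password dynamically, but I didn't dig into whether it is looked up from the encrypted local file or remotely from Telecom's database.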

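8) Logging in with this username and password, I found that the F420 is only a wired router with no wireless capability, so to use Wi-Fi you always need a wireless router/AP behind it. Unlike the F460 that others have posted about, its admin interface has no setting for the maximum number of user connections and no WLAN management at all (if there were a limit, it would be 4, since there are only 4 wired ports). There are also some features such as DDNS that seem somewhat useful, but the configuration UI is so bad that I had no idea what to fill in. That basically ended the tinkering; steps 4-8 took up most of another morning.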

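In the end this round didn't achieve much (I had hoped to enable wireless on the F420 so I wouldn't need a separate router, but it simply doesn't have that feature), but here is a brief summary of the method anyway, in case it comes in handy with a new modem someday: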

1) Log into your modem (router), which runs BusyBox, over telnet

telnet 192.168.1.1 (your-router-ip), root/root

2) Start FTP in BusyBox

/bin/vsftpd -s &

3) Connect to FTP using FileZilla, root/root (or anonymously)

4) Back up /home/httpd/login.gch

cp /home/httpd/login.gch /home/httpd/login.gch.bak

5) Navigate to “/home/httpd/login.gch” in FileZilla, use “View/Edit” to change the file as shown in the patch below, and then upload the saved version back to your router.

--- /home/httpd/login.gch 2012-07-03 14:05:16.000000000 +0800
+++ login.gch 2012-07-03 14:03:37.000000000 +0800
@@ -152,6 +152,17 @@
set_language("langcn.conf");
langclass = "login_title_centeren";
}
+var CK_HANDLE = create_paralist();
+var login_name = "telecomadmin";
+set_para(CK_HANDLE, "Username", login_name);
+qeury_list_bycond("OBJ_USERINFO_ID", "IGD", CK_HANDLE);
+destroy_paralist(CK_HANDLE);
+CK_HANDLE = create_paralist();
+var CK_IDENTITY = query_identity(0);
+get_inst(CK_HANDLE, "OBJ_USERINFO_ID", CK_IDENTITY);
+var now_pwd = get_para(CK_HANDLE, "Password");
+now_pwd = delMoreSlash(now_pwd);
+destroy_paralist(CK_HANDLE);
%>
@@ -239,4 +250,5 @@
+<%=now_pwd;%>

6) Log in to your router at http://192.168.1.1 with the username ‘telecomadmin’; the password should be displayed in the upper left corner of the login page.

These are the two articles I found most useful. Both are about the F460, but everything in them worked for my F420 as well:

http://dohkoos.name/how-to-crack-zte-zxa10-f460-v30-for-superuser-password.html

http://laygle.com/2012/07/zte-f460-v3-0-for-an-administrator-password-and-open-the-routing-method/

Connecting the dots

niyue — Fri, 07 Oct 2011 04:31:08 +0000

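18 years ago, when I was in sixth grade, I came into contact with a computer for the first time in my life. The teacher at the Children's Palace told me it was called the “Golden Apple II” (金苹果二型). At the time I didn't know who Steve Jobs was, and knew nothing about Macs or PCs.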

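That computer had a colorful apple logo on its case. All I knew was that it had one function: you could write BASIC programs on it. I wrote a bunch of little programs back then and saved them on 5.25-inch floppy disks through an external floppy drive. My family home in Fujian should still have those disks, and many of them were never even formatted (I remember they cost a bit over 1 yuan each, which was quite expensive back then), but now it's hard to find a drive to read them. One program I still remember vividly displayed huge digits on the screen, counting from 1 to 10.

After I had played with it for most of a semester, the teacher said we were switching to new computers, a model called the “386”. When I went to use them, I found I could no longer write BASIC programs. There were lots of new features, but I had no idea what they were for. The teacher had us practice typing on them, and I spent most of my time figuring out whether SHIFT or CTRL would bring up which menu. The teacher also told us to ask our parents whether they would buy this kind of computer, so we could learn about computers at home. In an era when a household earning ten thousand yuan a year was still considered impressive, this thing was far too expensive; my family simply couldn't afford it, and nobody knew what learning it would be good for.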

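After sixth grade I didn't touch a computer for a long time, and gradually forgot about all this. In my third year of university I bought my first computer, a PC-compatible that I assembled myself at the Pacific (太平洋) computer mall in Xujiahui. That computer stayed with me for six years: homework, web browsing, projects, and reinstalling Windows/Linux more times than I can count. In the lab I even hooked it up to two CRT monitors, and the radiation nearly killed me. Six years later, when I found that my thinking had to pause and wait for it whenever I ran Ruby programs on it, I knew I needed a new computer. It was my first year at work, and I finally had the money for a laptop. Probably because I had seen a classmate buy an Apple iBook, or perhaps under Noah's influence, I eagerly went to the Apple reseller 苹果家园 in Xujiahui and bought the MacBook I still use. Back then Shanghai had no Apple Store yet, and I didn't check any Mac buyer's guide to see whether it was a good time to buy (in fact, Apple released a new MacBook a little over a month later). I used this computer happily from then on, but I still didn't connect it with the Golden Apple II from 18 years earlier. Gradually I started following news on sites like MacRumors and Apple Intelligence. More than a year later, I came across an article about Deng Xiaoping's remark that “computer literacy should start with children”, and the picture of an Apple ][ in it finally reminded me of that experience 18 years ago.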

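After that I bought an iPhone 3GS, bought my wife an iPhone 4, watched every Apple special event and WWDC keynote since 2007, and joined the iOS developer program. Most importantly, I have always felt that this industry and this work are exactly what I am happy to be doing, and looking back, that has something to do with the Apple ][ from 18 years ago.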

you can’t connect the dots looking forward; you can only connect them looking backwards. So you have to trust that the dots will somehow connect in your future. You have to trust in something — your gut, destiny, life, karma, whatever. This approach has never let me down, and it has made all the difference in my life.

http://news.stanford.edu/news/2005/june15/jobs-061505.html

S.J., you are the hero.

Move to wordpress.com

niyue — Fri, 07 Oct 2011 02:33:26 +0000

The blog has moved again. I haven't actually written much, but moving it back and forth took quite a bit of effort.

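This kind of tinkering is quite painful: every time, I have to agonize for a long while before I can make myself do the whole pile of work of choosing a service, backing up, paying, and migrating and importing data. But this time I had no choice but to move. In 2007 I rented virtual hosting from 九州未来. I spent a long time choosing back then. I had considered some foreign providers, but they were more expensive, and I worried they would be blocked by the GFW. I chose 九州未来 mainly because it was cheap: ￥12.95 a month. The service content and quality weren't great, but for an undemanding customer like me it was acceptable. Even with my infrequent usage, I ran into two long outages. Once, reportedly because some server in the data center hosted problematic content, it was shut down by the party. It was offline for many days and finally came back, without any compensation. Another outage happened later; customer service could only tell me to wait, and although it was eventually fixed, I never got an explanation. But I was too lazy to move, so I put up with it. I usually paid every six months, mainly because I figured prices for this kind of product should keep falling: the price per unit of network, storage and compute keeps dropping over time, so I didn't want to pay far ahead. Amazon EC2's prices, for example, have been going down (e.g. http://aws.amazon.com/about-aws/whats-new/2009/10/27/announcing-lower-amazon-ec2-instance-pricing). But 九州未来's 2011 price change ended my patience: the price went from ￥12.95 a month to ￥24.95 a month. With the RMB also appreciating against the US dollar, that price could now buy hosting abroad. I don't know whether this was due to domestic inflation or whether their operating costs really got out of control, but with price, their main advantage, gone, I had to pick another provider.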

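In the end I decided to use wordpress.com directly. It offers little customization, can't host the resources I had besides the blog, and may even be blocked by the GFW often, but at least it is simple to manage, the service is stable, I don't need to worry much about upgrades or backups, and my old WordPress blog could be migrated easily. Then came the whole export/import process described above, and finally I bought wordpress.com's custom domain mapping. Phew~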

The RSS feed can't be customized on the home page either, but I updated feedburner and feedsky; if you want to subscribe, you can use these addresses:
Feedburner: http://feeds.feedburner.com/niyue
Feedsky: http://feed.feedsky.com/niyue