Sunday, March 19, 2017

ClickHouse

"ClickHouse is a columnar DBMS for OLAP.

Column-oriented databases are better suited to OLAP scenarios (at least 100 times better in processing speed for most queries), for the following reasons:

1. For I/O.

1.1. For an analytical query, only a small number of table columns need to be read. In a column-oriented database, you can read just the data you need. For example, if you need 5 columns out of 100, you can expect a 20-fold reduction in I/O.

1.2. Since data is read in batches, it is easier to compress. Data stored by column is also easier to compress, because values within one column tend to be similar. This further reduces the I/O volume.

1.3. Due to the reduced I/O, more data fits in the system cache.
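The 20-fold figure in point 1.1 can be checked with a little arithmetic. The sketch below is only an illustration: the function name bytes_read, the 8-byte column width, and the row count are assumptions made for the example, not anything from ClickHouse itself. It assumes a row store must read every column of every row, while a column store reads only the columns the query names.

```python
def bytes_read(num_rows, column_widths, needed):
    """Return (row_store_bytes, column_store_bytes) for one full scan.

    Assumption: a row store reads whole rows, a column store reads
    only the columns listed in `needed`. Compression is ignored.
    """
    row_store = num_rows * sum(column_widths)
    column_store = num_rows * sum(column_widths[i] for i in needed)
    return row_store, column_store


# Hypothetical table: 100 columns, each 8 bytes wide, 1 million rows.
widths = [8] * 100
row_store, column_store = bytes_read(1_000_000, widths, needed=range(5))

# Reading 5 of 100 equally wide columns cuts I/O by a factor of 20.
reduction = row_store / column_store
print(reduction)
```

With columns of unequal width the ratio depends on which columns the query touches, so 20-fold is the equal-width case.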

For example, the query "count the number of records for each advertising platform" requires reading one "advertising platform ID" column, which takes up 1 byte per row uncompressed. If most of the traffic was not from advertising platforms, you can expect at least 10-fold compression of this column. With a fast compression algorithm, data can be decompressed at a rate of at least several gigabytes of uncompressed data per second. In other words, this query can be processed at a rate of roughly several billion rows per second on a single server. This speed is actually achieved in practice.
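The estimate above follows from two divisions. This is a back-of-the-envelope sketch, not a benchmark: the 3 GB/s decompression rate is an assumed stand-in for "several gigabytes per second", and the function names are made up for the example. The 1 byte per row and the 10-fold compression come from the quoted text.

```python
def rows_per_second(decompress_bytes_per_sec, bytes_per_row):
    """Rows scanned per second if decompression is the only limit."""
    return decompress_bytes_per_sec / bytes_per_row


def disk_bytes_per_second(decompress_bytes_per_sec, compression_ratio):
    """Compressed bytes that must arrive from disk or cache per second."""
    return decompress_bytes_per_sec / compression_ratio


# Assumed value: 3 GB/s of uncompressed output from the decompressor.
DECOMPRESS_RATE = 3_000_000_000

rows = rows_per_second(DECOMPRESS_RATE, bytes_per_row=1)
disk = disk_bytes_per_second(DECOMPRESS_RATE, compression_ratio=10)

print(rows)  # rows per second for a 1-byte column
print(disk)  # compressed bytes per second needed to keep up
```

Under these assumptions the scan handles 3 billion rows per second while needing only about 300 MB/s of compressed input, which is why the compression ratio matters as much as the decompression speed.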

2. For CPU.

Since executing a query requires processing a large number of rows, it helps to dispatch all operations for entire vectors instead of for separate rows, or to implement the query engine so that there is almost no dispatching cost. If you don't do this, with any half-decent disk subsystem, the query interpreter inevitably stalls the CPU. It makes sense to both store data in columns and process it, when possible, by columns.
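The dispatching cost mentioned above can be made visible with a toy interpreter. This is a minimal sketch with hypothetical names (OPS, eval_row_at_a_time, eval_vectorized) and says nothing about how ClickHouse is actually implemented. It only counts how many times the interpreter has to look up the operation: once per row in the first version, once per column in the second.

```python
import operator

# Toy operation table standing in for a query interpreter's dispatch step.
OPS = {"add": operator.add, "mul": operator.mul}


def eval_row_at_a_time(op_name, left, right):
    """Look up the operation again for every single row."""
    out = []
    dispatches = 0
    for a, b in zip(left, right):
        op = OPS[op_name]  # dispatch happens inside the loop
        dispatches += 1
        out.append(op(a, b))
    return out, dispatches


def eval_vectorized(op_name, left, right):
    """Look up the operation once, then run it over the whole column."""
    op = OPS[op_name]  # single dispatch for the entire vector
    out = [op(a, b) for a, b in zip(left, right)]
    return out, 1


price = [10, 20, 30, 40]
quantity = [1, 2, 3, 4]

row_result, row_dispatches = eval_row_at_a_time("mul", price, quantity)
vec_result, vec_dispatches = eval_vectorized("mul", price, quantity)

print(row_result == vec_result, row_dispatches, vec_dispatches)
```

Both versions produce the same result, but the per-row version pays the lookup for every row. In pure Python the inner loop is still interpreted, so this shows only the dispatch count; a real engine would run the per-column loop as compiled code over contiguous column data.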

Distinctive features of ClickHouse

1. True column-oriented DBMS.
2. Data compression.
3. Disk storage of data.
4. Parallel processing on multiple cores.
5. Distributed processing on multiple servers.
6. SQL support.
7. Vector engine.
8. Real-time data updates.
9. Indexes.
10. Suitable for online queries.
11. Support for approximate calculations.
12. Support for nested data structures. Support for arrays as data types.
13. Support for restrictions on query complexity, along with quotas.
14. Data replication and support for data integrity on replicas."

https://clickhouse.yandex/

https://github.com/yandex/ClickHouse
