PostgreSQL的流式实时统计应用

帮
助
中
心

网站公告

新闻动态

域名注册

虚拟主机

企业邮箱

数据库

云服务器

备案指南

mysql数据库问题

首页 » 帮助中心 » 数据库 » mysql数据库问题

PostgreSQL的流式实时统计应用

发布日期：2016-5-2 22:5:15

　　PipelineDB是基于PostgreSQL研发的一种流式关系数据库(0.8.1基于9.4.4)，这种数据库的特点是自动处理流式数据，不存储原始数据，只存储处理后的数据，所以非常适合当下流行的实时流式数据处理，例如网站流量统计，IT服务的监控统计，APPStore的浏览统计等等。 http://...

　　PipelineDB是基于PostgreSQL研发的一种流式关系数据库(0.8.1基于9.4.4)，这种数据库的特点是自动处理流式数据，不存储原始数据，只存储处理后的数据，因此非常适合当下流行的实时流式数据处理，比如APPStore的浏览统计，网站流量统计，IT服务的监控统计等等。

　　http://www.postgresql.org/about/news/1596/

　　PipelineDB, an open-source relational streaming-SQL database, publicly released version (0.7.7) today and made the product available as open-source via their website and GitHub. PipelineDB is based on, and is wire compatible with, PostgreSQL 9.4 and has added functionality including continuous SQL queries, probabilistic data structures, sliding windowing, and stream-table joins. For a full description of PipelineDB and its capabilities see their technical documentation.

　　PipelineDB’s fundamental abstraction is what is called a continuous view. These are much like regular SQL views, except that their defining SELECT queries can include streams as a source to read from. The most important property of continuous views is that they only store their output in the database. That output is then continuously updated incrementally as new data flows through streams, and raw stream data is discarded once all continuous views have read it. Let's look at a canonical example:

　　CREATE CONTINUOUS VIEW v AS SELECT COUNT(*) FROM stream

　　Only one row would ever physically exist in PipelineDB for this continuous view, and its value would simply be incremented for each new event ingested.

　　For more information on PipelineDB as a company, product and for examples and benefits, please check out their first blog post on their new website.

　　举例：

　　创建动态流视图，不需要对表进行定义，这类似活生生的NoSQL。参考代码如下所示：

　　pipeline=# CREATE CONTINUOUS VIEW v0 AS SELECT COUNT(*) FROM stream;

　　CREATE CONTINUOUS VIEW

　　pipeline=# CREATE CONTINUOUS VIEW v1 AS SELECT COUNT(*) FROM stream;

　　CREATE CONTINUOUS VIEW

　　激活流视图

　　pipeline=# ACTIVATE;

　　ACTIVATE 2

　　往流写入数据，参考代码如下所示：

　　pipeline=# INSERT INTO stream (x) VALUES (1);

　　INSERT 0 1

　　pipeline=# SET stream_targets TO v0;

　　SET

　　pipeline=# INSERT INTO stream (x) VALUES (1);

　　INSERT 0 1

　　pipeline=# SET stream_targets TO DEFAULT;

　　SET

　　pipeline=# INSERT INTO stream (x) VALUES (1);

　　INSERT 0 1

　　-- 如果不想接收流数据了，停止即可

　　pipeline=# DEACTIVATE;

　　DEACTIVATE 2

　　查询流视图，餐阿卡欧代码如下所示：

　　pipeline=# SELECT count FROM v0;

　　count

　　-------

　　(1 row)

　　pipeline=# SELECT count FROM v1;

　　count

　　-------

　　(1 row)

　　pipeline=#

　　在本地虚拟机进行试用

　　安装，参考代码如下所示：

　　初始化数据库，参考代码如下所示：

　　pdb@digoal-> psql -V

　　psql (PostgreSQL) 9.4.4

　　pdb@digoal-> cd /usr/lib/pipelinedb/bin/

　　pdb@digoal-> ll

　　total 13M

　　-rwxr-xr-x 1 root root 62K Sep 18 01:01 clusterdb

　　-rwxr-xr-x 1 root root 62K Sep 18 01:01 createdb

　　-rwxr-xr-x 1 root root 66K Sep 18 01:01 createlang

　　-rwxr-xr-x 1 root root 63K Sep 18 01:01 createuser

　　-rwxr-xr-x 1 root root 44K Sep 18 01:02 cs2cs

　　-rwxr-xr-x 1 root root 58K Sep 18 01:01 dropdb

　　-rwxr-xr-x 1 root root 66K Sep 18 01:01 droplang

　　-rwxr-xr-x 1 root root 58K Sep 18 01:01 dropuser

　　-rwxr-xr-x 1 root root 776K Sep 18 01:01 ecpg

　　-rwxr-xr-x 1 root root 28K Sep 18 00:57 gdaladdo

　　-rwxr-xr-x 1 root root 79K Sep 18 00:57 gdalbuildvrt

　　-rwxr-xr-x 1 root root 1.3K Sep 18 00:57 gdal-config

　　-rwxr-xr-x 1 root root 33K Sep 18 00:57 gdal_contour

　　-rwxr-xr-x 1 root root 188K Sep 18 00:57 gdaldem

　　-rwxr-xr-x 1 root root 74K Sep 18 00:57 gdalenhance

　　-rwxr-xr-x 1 root root 131K Sep 18 00:57 gdal_grid

　　-rwxr-xr-x 1 root root 83K Sep 18 00:57 gdalinfo

　　-rwxr-xr-x 1 root root 90K Sep 18 00:57 gdallocationinfo

　　-rwxr-xr-x 1 root root 42K Sep 18 00:57 gdalmanage

　　-rwxr-xr-x 1 root root 236K Sep 18 00:57 gdal_rasterize

　　-rwxr-xr-x 1 root root 25K Sep 18 00:57 gdalserver

　　-rwxr-xr-x 1 root root 77K Sep 18 00:57 gdalsrsinfo

　　-rwxr-xr-x 1 root root 49K Sep 18 00:57 gdaltindex

　　-rwxr-xr-x 1 root root 33K Sep 18 00:57 gdaltransform

　　-rwxr-xr-x 1 root root 158K Sep 18 00:57 gdal_translate

　　-rwxr-xr-x 1 root root 168K Sep 18 00:57 gdalwarp

　　-rwxr-xr-x 1 root root 41K Sep 18 01:02 geod

　　-rwxr-xr-x 1 root root 1.3K Sep 18 00:51 geos-config

　　lrwxrwxrwx 1 root root 4 Oct 15 10:47 invgeod -> geod

　　lrwxrwxrwx 1 root root 4 Oct 15 10:47 invproj -> proj

　　-rwxr-xr-x 1 root root 20K Sep 18 01:02 nad2bin

　　-rwxr-xr-x 1 root root 186K Sep 18 00:57 nearblack

　　-rwxr-xr-x 1 root root 374K Sep 18 00:57 ogr2ogr

　　-rwxr-xr-x 1 root root 77K Sep 18 00:57 ogrinfo

　　-rwxr-xr-x 1 root root 283K Sep 18 00:57 ogrlineref

　　-rwxr-xr-x 1 root root 47K Sep 18 00:57 ogrtindex

　　-rwxr-xr-x 1 root root 30K Sep 18 01:01 pg_config

　　-rwxr-xr-x 1 root root 30K Sep 18 01:01 pg_controldata

　　-rwxr-xr-x 1 root root 33K Sep 18 01:01 pg_isready

　　-rwxr-xr-x 1 root root 39K Sep 18 01:01 pg_resetxlog

　　-rwxr-xr-x 1 root root 183K Sep 18 01:02 pgsql2shp

　　lrwxrwxrwx 1 root root 4 Oct 15 10:47 pipeline -> psql

　　-rwxr-xr-x 1 root root 74K Sep 18 01:01 pipeline-basebackup

　　lrwxrwxrwx 1 root root 9 Oct 15 10:47 pipeline-config -> pg_config

　　-rwxr-xr-x 1 root root 44K Sep 18 01:01 pipeline-ctl

　　-rwxr-xr-x 1 root root 355K Sep 18 01:01 pipeline-dump

　　-rwxr-xr-x 1 root root 83K Sep 18 01:01 pipeline-dumpall

　　-rwxr-xr-x 1 root root 105K Sep 18 01:01 pipeline-init

　　-rwxr-xr-x 1 root root 50K Sep 18 01:01 pipeline-receivexlog

　　-rwxr-xr-x 1 root root 56K Sep 18 01:01 pipeline-recvlogical

　　-rwxr-xr-x 1 root root 153K Sep 18 01:01 pipeline-restore

　　-rwxr-xr-x 1 root root 6.2M Sep 18 01:01 pipeline-server

　　lrwxrwxrwx 1 root root 15 Oct 15 10:47 postmaster -> pipeline-server

　　-rwxr-xr-x 1 root root 49K Sep 18 01:02 proj

　　-rwxr-xr-x 1 root root 445K Sep 18 01:01 psql

　　-rwxr-xr-x 1 root root 439K Sep 18 01:02 raster2pgsql

　　-rwxr-xr-x 1 root root 62K Sep 18 01:01 reindexdb

　　-rwxr-xr-x 1 root root 181K Sep 18 01:02 shp2pgsql

　　-rwxr-xr-x 1 root root 27K Sep 18 00:57 testepsg

　　-rwxr-xr-x 1 root root 63K Sep 18 01:01 vacuumdb

　　pdb@digoal-> pipeline-init -D $PGDATA -U postgres -E UTF8 --locale=C -W

　　pdb@digoal-> cd $PGDATA

　　pdb@digoal-> ll

　　total 108K

　　drwx------ 5 pdb pdb 4.0K Oct 15 10:57 base

　　drwx------ 2 pdb pdb 4.0K Oct 15 10:57 global

　　drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_clog

　　drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_dynshmem

　　-rw------- 1 pdb pdb 4.4K Oct 15 10:57 pg_hba.conf

　　-rw------- 1 pdb pdb 1.6K Oct 15 10:57 pg_ident.conf

　　drwx------ 4 pdb pdb 4.0K Oct 15 10:57 pg_logical

　　drwx------ 4 pdb pdb 4.0K Oct 15 10:57 pg_multixact

　　drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_notify

　　drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_replslot

　　drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_serial

　　drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_snapshots

　　drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_stat

　　drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_stat_tmp

　　drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_subtrans

　　drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_tblspc

　　drwx------ 2 pdb pdb 4.0K Oct 15 10:57 pg_twophase

　　-rw------- 1 pdb pdb 4 Oct 15 10:57 PG_VERSION

　　drwx------ 3 pdb pdb 4.0K Oct 15 10:57 pg_xlog

　　-rw------- 1 pdb pdb 88 Oct 15 10:57 pipelinedb.auto.conf

　　-rw------- 1 pdb pdb 23K Oct 15 10:57 pipelinedb.conf

　　和流处理相关的参数，例如设置内存大小，是否同步，合并的batch，工作进程数等等。

　　pipelinedb.conf

　　#------------------------------------------------------------------------------

　　# CONTINUOUS VIEW OPTIONS

　　#------------------------------------------------------------------------------

　　# size of the buffer for storing unread stream tuples

　　#tuple_buffer_blocks = 128MB

　　# synchronization level for combiner commits; off, local, remote_write, or on

　　#continuous_query_combiner_synchronous_commit = off

　　# maximum amount of memory to use for combiner query executions

　　#continuous_query_combiner_work_mem = 256MB

　　# maximum memory to be used by the combiner for caching; this is independent

　　# of combiner_work_mem

　　#continuous_query_combiner_cache_mem = 32MB

　　# the default fillfactor to use for continuous views

　　#continuous_view_fillfactor = 50

　　# the time in milliseconds a continuous query process will wait for a batch

　　# to accumulate

　　# continuous_query_max_wait = 10

　　# the maximum number of events to accumulate before executing a continuous query

　　# plan on them

　　#continuous_query_batch_size = 10000

　　# the number of parallel continuous query combiner processes to use for

　　# each database

　　#continuous_query_num_combiners = 2

　　# the number of parallel continuous query worker processes to use for

　　# each database

　　#continuous_query_num_workers = 2

　　# allow direct changes to be made to materialization tables?

　　#continuous_query_materialization_table_updatable = off

　　# inserts into streams should be synchronous?

　　#synchronous_stream_insert = off

　　# continuous views that should be affected when writing to streams.

　　# it is string with comma separated values for continuous view names.

　　#stream_targets = ''

　　启动数据库，可以看到原生是支持postgis的，参考代码如下所示：

　　可以看到pipelinedb加入了hll,bloom,tdigest,cmsketch算法，还有很多需要发掘，例如窗口查询的流视图，支持grouping set,等等。

　　在我自己的笔记本中的虚拟机中的性能测试：

　　创建5个动态流视图，动态流视图就是不需要建立基表的流视图。参考代码如下所示：

　　CREATE CONTINUOUS VIEW v0 AS SELECT COUNT(*) FROM stream;

　　CREATE CONTINUOUS VIEW v1 AS SELECT sum(x::int),count(*),avg(y::int) FROM stream;

　　CREATE CONTINUOUS VIEW v001 AS SELECT sum(x::int),count(*),avg(y::int) FROM stream1;

　　CREATE CONTINUOUS VIEW v002 AS SELECT sum(x::int),count(*),avg(y::int) FROM stream2;

　　CREATE CONTINUOUS VIEW v003 AS SELECT sum(x::int),count(*),avg(y::int) FROM stream3;

　　激活流统计，参考代码如下所示：

　　activate;

　　查看数据字典，参考代码如下所示：

　　select relname from pg_class where relkind='C';

　　批量插入测试，参考代码如下所示：

　　pdb@digoal-> vi test.sql

　　insert into stream(x,y,z) select generate_series(1,1000),1,1;

　　insert into stream1(x,y,z) select generate_series(1,1000),1,1;

　　insert into stream2(x,y,z) select generate_series(1,1000),1,1;

　　insert into stream3(x,y,z) select generate_series(1,1000),1,1;

　　测试结果，注意这里需要使用simple或extended ，如果用prepared会导致只有最后一条SQL起作用。现在不清楚是pipelinedb还是pgbench的BUG。参考代码如下所示：

　　pdb@digoal-> /opt/pgsql/bin/pgbench -M extended -n -r -f ./test.sql -P 1 -c 10 -j 10 -T 100000

　　progress: 1.0 s, 133.8 tps, lat 68.279 ms stddev 58.444

　　progress: 2.0 s, 143.9 tps, lat 71.623 ms stddev 53.880

　　progress: 3.0 s, 149.5 tps, lat 66.452 ms stddev 49.727

　　progress: 4.0 s, 148.3 tps, lat 67.085 ms stddev 55.484

　　progress: 5.1 s, 145.7 tps, lat 68.624 ms stddev 67.795

　　每秒入库约58万条记录，并完成5个流视图的统计。

　　若用物理机的话，估计可以到500万每秒的级别。后面有时间再试试。

　　由于都在内存中完成，所以速度非常快。

　　pipelinedb使用了worker进程来处理数据合并。

　　压测时的top如下所示：

　　在写入 10 亿流数据后，数据库的大小依旧只有13MB，由于流数据都在内存中，处理完就丢弃了。参考代码如下所示：

　　[参考]以下是参考：

　　https://github.com/pipelinedb/pipelinedb

　　https://www.pipelinedb.com/

后面会更新一些关于mysql的相关知识，关注mysql的敬请期待。

上一条: PostgreSQL的黑科技

下一条: PostgreSQL 的秒杀场景优化

相关问题		热门问题
MYSQL的随机查询 PostgreSQL 的秒杀场景优化 PostgreSQL的流式实时统计应用 PostgreSQL的黑科技 PostgreSQL 作为图数据库存储引擎深度解析OceanBase Mongodb与Mysql的优缺点 MongoDB 的指南 PostgreSQL 9.5新特性 SQL 语句 Where 1=1 and 在 ...		Navicat批量导出数据库表到Excel表格... Redis的简介与数据类型存储深度解析OceanBase PrestoDB在京东的应用实践三种 NoSQL 数据库的比较关于RDS MySQL IOPS 使用率高的原... 阿里巴巴高级技术专家沈春辉：选择HBase是一... 通过Mysql-Front客户端解决虚拟主机D... redis cluster的使用经验造成RDS for mysql 有时出现CPU...

新手上路		支付方式	快速通道		服务与支持
域名常见问题	主机常见问题	在线支付	域名信息查询	备案信息查询	帮助中心
邮箱常见问题	云服务器问题	线下汇款	域名控制面板	主机控制面板	网络违法举报
数据库问题	备案问题		万网代备案系统		互联网不良信息举报

业务QQ： 11611616 673768899 673768855		联系电话： 023-61066666 66887777 89082222
离线联系： 13452888882 13452888883 13452888886		备案专线： 023-60887777 备案专员QQ：673768866
联系地址：重庆市九龙坡区石桥铺一城精英国际40层17号 Copyright © 重庆典名科技有限公司 023dns.com All Rights Reserved