PostgreSQL 作为图数据库存储引擎
发布日期:2016-5-2 22:5:31
CayLey是GO语言写的一个图数据库引擎,支持内置查询编辑器,RESTful API和可视化,支持MQL与JAVASCRIPT查询接口,后端存储支持文件格式,PostgreSQL,mongodb,LevelDB,Bolt,mysql。模块化设计,扩展后端存储非常容易。 本文将以PostgreSQL为例,演示一下CayLey的使用。如图1所示: 图1 安装go: yum install -y go 执行以下所示的命令,克隆cayley与依赖,参考代码已给出: mkdir -p ~/cayley && cd ~/cayley export GOPATH=`pwd` export PATH=$PATH:~/cayley/bin mkdir -p bin pkg src/github.com/google cd src/github.com/google git clone https://github.com/google/cayley cd cayley go get github.com/tools/godep godep restore go build ./cmd/cayley 样本数据如下所示: $ ll data -rw-rw-r--. 1 postgres postgres 26M Jan 17 21:45 30kmoviedata.nq.gz -rw-rw-r--. 1 postgres postgres 463 Jan 17 21:45 testdata.nq $ gunzip 30kmoviedata.nq.gz cayley使用帮助,参考代码如下所示: $ ./cayley --help No command --help Usage: cayley COMMAND [flags] Commands: init Create an empty database. load Bulk-load a quad file into the database. http Serve an HTTP endpoint on the given host and port. dump Bulk-dump the database into a quad file. repl Drop into a REPL of the given query language. version Version information. Flags: -alsologtostderr=false: log to standard error as well as files -assets="": Explicit path to the HTTP assets. -config="": Path to an explicit configuration file. -db="memstore": Database Backend. -dbpath="/tmp/testdb": Path to the database. -dump="dbdump.nq": Quad file to dump the database to (".gz" supported, "-" for stdout). -dump_type="quad": Quad file format ("json", "quad", "gml", "graphml"). -format="cquad": Quad format to use for loading ("cquad" or "nquad"). -host="127.0.0.1": Host to listen on (defaults to all). -ignoredup=false: Don't stop loading on duplicated key on add -ignoremissing=false: Don't stop loading on missing key on delete -init=false: Initialize the database before using it. Equivalent to running `cayley init` followed by the given command. -load_size=10000: Size of quadsets to load -log_backtrace_at=:0: when logging hits line file:N, emit a stack trace -log_dir="": If non-empty, write log files in this directory -logstashtype="": enable logstash logging and define the type -logstashurl="172.17.42.1:5042": logstash url and port -logtostderr=false: log to standard error instead of files -port="64210": Port to listen on. -prof="": Output profiling file. -quads="": Quad file to load before going to REPL. -query_lang="gremlin": Use this parser as the query language. -read_only=false: Disable writing via HTTP. -replication="single": Replication method. -stderrthreshold=0: logs at or above this threshold go to stderr -timeout=30s: Elapsed time until an individual query times out. -v=0: log level for V logs -vmodule=: comma-separated list of pattern=N settings for file-filtered logging 假设已有一个PostgreSQL数据库,如下所示: IP : 192.168.150.132 PORT : 1921 DBNAME : postgres USER : digoal PWD : digoal_pwd 初始化,参考代码如下所示: ./cayley init -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" 导入数据,参考代码如下所示: ./cayley load -quads="data/" -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" 50亿测试数据约2TB。 开启repl或http接口服务。参考代码如下所示: ./cayley repl -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" -host="0.0.0.0" -port="64210" 或者 ./cayley http -db=sql -dbpath="postgres://digoal:digoal_pwd@192.168.150.132:1921/postgres?sslmode=disable" -host="0.0.0.0" -port="64210" 使用http接口的图例,如图2所示: 图2 Query Shape,如图3所示: 图3 后端是PostgreSQL时,Cayley自动将MQL或JAVASCRIPT自动转换成SQL到数据库查询,并返回结果。 对于PostgreSQL作为后端的场景,优化的手段: (1)使用GPU加速HASH JOIN和数据扫描。 (2) 使用分区表,减少无用块扫描。 (3)其他通用的PG优化手段 若数据量大到单库的计算资源和IO资源性能支撑不住,可以用Greenplum来实现分布式查询。 查询接口: Javascript/Gremlin API documentation 1.图对象 根据节点ID,检索,返回路径 2.路径对象 路径相交,节点匹配等 3.查询路径对象 数值转换,等。 后面会更新一些mysql的相关文章,敬请关注。 上一条: 深度解析OceanBase 下一条: PostgreSQL的黑科技
|