• 1
  • 2
  • 3
  • 4
  • 5
mysql数据库问题 首 页  »  帮助中心  »  数据库  »  mysql数据库问题
PostgreSQL 的秒杀场景优化
发布日期:2016-5-2 23:5:1

    对同一条记录的多次更新请求是秒杀场景的典型瓶颈,然后只有一个或少量请求是成功的,其他请求是以失败或者更新不到告终。 比如,Iphone的1元秒杀,若我只放出1台Iphone,我们把它看成一条记录,秒杀开始后,谁先抢到(更新这条记录的锁),谁就算秒杀成功。 比如: 使用一个标记位来表示这条记

  秒杀场景的典型瓶颈在于对同一条记录的多次更新请求,然后只有一个或少量请求是成功的,其他请求是以失败或者更新不到告终。

  比如,Iphone的1元秒杀,如果我只放出1台Iphone,我们把它看成一条记录,秒杀开始后,谁先抢到(更新这条记录的锁),谁就算秒杀成功。

  例如:

  使用一个标记位来表示这条记录是否已经被更新,或者记录更新的次数(几台Iphone)。参考代码如下所示:

  update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5; -- 假设可以秒杀5台

  这种方法的以下弊端:

  (1)获得锁的用户在处理这条记录时,可能成功,也可能失败,或者可能需要很长时间,(例如数据库响应慢)在它结束事务前,其他会话只能等着。

  (2)等待是非常不科学的,因为对于没有获得锁的用户,等待是在浪费时间。

  因此一般的优化处理方法是先使用for update nowait的方式来避免等待,即如果无法即可获得锁,那么就不等待。

  比如,以下参考代码:

  begin;

  select 1 from tbl where id=pk for update nowait; -- 如果用户无法即刻获得锁,则返回错误。从而这个事务回滚。

  update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5;

  end;

  上面所示的这种方法可以减少用户的等待时间,因为无法即刻获得锁后就直接返回了。

  但这种方法也存在一定的弊端,对于一个商品,如果可以秒杀多台的话,我们用1条记录来存储多台,降低了秒杀的并发性。

  由于我们用的是行锁。

  解决这个问题办法有很多,最终就是要提高并发性,例如以下方法:

  1. 分段秒杀,把商品数量打散,拆成多个段,从而提高并发处理能力。

  总体来说,优化的思路是减少锁等待时间,避免串行,尽量并行。

  优化到这里就结束了吗?显然没有,以上所说的方法任意数据库都可以做到,若就这样结束怎么体现PostgreSQL的特性呢?

  PostgreSQL还提供了一个锁类型,advisory锁,这种锁比行锁更加轻量,支持会话级别和事务级别。(但是需要注意ID是全局的,否则会相互干扰,也就是说,所有参与秒杀或者需要用到advisory lock的ID需要在单个库内保持全局唯一)

  入一下示例:

  update tbl set xxx=xxx,upd_cnt=upd_cnt+1 where id=pk and upd_cnt+1<=5 and pg_try_advisory_xact_lock(:id);

  最后必须要对比一下for update nowait与advisory lock的性能。

  下面是在一台本地虚拟机上的测试。

  新建一张秒杀表,参考代码如下所示:

  postgres=# \d t1

  Table "public.t1"

  Column | Type | Modifiers

  --------+---------+-----------

  id | integer | not null

  info | text |

  Indexes:

  "t1_pkey" PRIMARY KEY, btree (id)

  只有一条记录,不断的被更新,参考代码如下所示:

  postgres=# select * from t1;

  id | info

  ----+-------------------------------

  1 | 2015-09-14 09:47:04.703904+08

  (1 row)

  压测for update nowait的方式,参考代码如下所示:

  CREATE OR REPLACE FUNCTION public.f1(i_id integer)

  RETURNS void

  LANGUAGE plpgsql

  AS $function$

  declare

  begin

  perform 1 from t1 where id=i_id for update nowait;

  update t1 set info=now()::text where id=i_id;

  exception when others then

  return;

  end;

  $function$;

  postgres@digoal-> cat test1.sql

  \setrandom id 1 1

  select f1(:id);

  压测advisory lock的方式:

  postgres@digoal-> cat test.sql

  \setrandom id 1 1

  update t1 set info=now()::text where id=:id and pg_try_advisory_xact_lock(:id);

  清除压测统计数据,参考代码如下所示:

  postgres=# select pg_stat_reset();

  pg_stat_reset

  ---------------

  (1 row)

  postgres=# select * from pg_stat_all_tables where relname='t1';

  -[ RECORD 1 ]-------+-------

  relid | 184731

  schemaname | public

  relname | t1

  seq_scan | 0

  seq_tup_read | 0

  idx_scan | 0

  idx_tup_fetch | 0

  n_tup_ins | 0

  n_tup_upd | 0

  n_tup_del | 0

  n_tup_hot_upd | 0

  n_live_tup | 0

  n_dead_tup | 0

  n_mod_since_analyze | 0

  last_vacuum |

  last_autovacuum |

  last_analyze |

  last_autoanalyze |

  vacuum_count | 0

  autovacuum_count | 0

  analyze_count | 0

  autoanalyze_count | 0

  压测结果,参考代码如下所示:

  postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test1.sql -c 20 -j 20 -T 60

  ......

  transaction type: Custom query

  scaling factor: 1

  query mode: prepared

  number of clients: 20

  number of threads: 20

  duration: 60 s

  number of transactions actually processed: 792029

  latency average: 1.505 ms

  latency stddev: 4.275 ms

  tps = 13196.542846 (including connections establishing)

  tps = 13257.270709 (excluding connections establishing)

  statement latencies in milliseconds:

  0.002625 \setrandom id 1 1

  1.502420 select f1(:id);

  postgres=# select * from pg_stat_all_tables where relname='t1';

  -[ RECORD 1 ]-------+-------

  relid | 184731

  schemaname | public

  relname | t1

  seq_scan | 0

  seq_tup_read | 0

  idx_scan | 896963 // 大多数是无用功

  idx_tup_fetch | 896963 // 大多数是无用功

  n_tup_ins | 0

  n_tup_upd | 41775

  n_tup_del | 0

  n_tup_hot_upd | 41400

  n_live_tup | 0

  n_dead_tup | 928

  n_mod_since_analyze | 41774

  last_vacuum |

  last_autovacuum |

  last_analyze |

  last_autoanalyze |

  vacuum_count | 0

  autovacuum_count | 0

  analyze_count | 0

  autoanalyze_count | 0

  postgres@digoal-> pgbench -M prepared -n -r -P 1 -f ./test.sql -c 20 -j 20 -T 60

  ......

  transaction type: Custom query

  scaling factor: 1

  query mode: prepared

  number of clients: 20

  number of threads: 20

  duration: 60 s

  number of transactions actually processed: 1392372

  latency average: 0.851 ms

  latency stddev: 2.475 ms

  tps = 23194.831054 (including connections establishing)

  tps = 23400.411501 (excluding connections establishing)

  statement latencies in milliseconds:

  0.002594 \setrandom id 1 1

  0.848536 update t1 set info=now()::text where id=:id and pg_try_advisory_xact_lock(:id);

  postgres=# select * from pg_stat_all_tables where relname='t1';

  -[ RECORD 1 ]-------+--------

  relid | 184731

  schemaname | public

  relname | t1

  seq_scan | 0

  seq_tup_read | 0

  idx_scan | 1368933 // 大多数是无用功

  idx_tup_fetch | 1368933 // 大多数是无用功

  n_tup_ins | 0

  n_tup_upd | 54957

  n_tup_del | 0

  n_tup_hot_upd | 54489

  n_live_tup | 0

  n_dead_tup | 1048

  n_mod_since_analyze | 54957

  last_vacuum |

  last_autovacuum |

  last_analyze |

  last_autoanalyze |

  vacuum_count | 0

  autovacuum_count | 0

  analyze_count | 0

  autoanalyze_count | 0

  我们注意到,不管用哪种方法,都会浪费掉很多次的无用功扫描。

  为了解决无用扫描的问题,可以使用以下所示的函数。(当然,还有更好的方法是对用户透明。)

  CREATE OR REPLACE FUNCTION public.f(i_id integer)

  RETURNS void

  LANGUAGE plpgsql

  AS $function$

  declare

  a_lock boolean := false;

  begin

  select pg_try_advisory_xact_lock(i_id) into a_lock;

  if a_lock then

  update t1 set info=now()::text where id=i_id;

  end if;

  exception when others then

  return;

  end;

  $function$;

  transaction type: Custom query

  scaling factor: 1

  query mode: prepared

  number of clients: 20

  number of threads: 20

  duration: 60 s

  number of transactions actually processed: 1217195

  latency average: 0.973 ms

  latency stddev: 3.563 ms

  tps = 20283.314001 (including connections establishing)

  tps = 20490.143363 (excluding connections establishing)

  statement latencies in milliseconds:

  0.002703 \setrandom id 1 1

  0.970209 select f(:id);

  postgres=# select * from pg_stat_all_tables where relname='t1';

  -[ RECORD 1 ]-------+-------

  relid | 184731

  schemaname | public

  relname | t1

  seq_scan | 0

  seq_tup_read | 0

  idx_scan | 75927

  idx_tup_fetch | 75927

  n_tup_ins | 0

  n_tup_upd | 75927

  n_tup_del | 0

  n_tup_hot_upd | 75902

  n_live_tup | 0

  n_dead_tup | 962

  n_mod_since_analyze | 75927

  last_vacuum |

  last_autovacuum |

  last_analyze |

  last_autoanalyze |

  vacuum_count | 0

  autovacuum_count | 0

  analyze_count | 0

  autoanalyze_count | 0

  除了吞吐率的提升,我们其实还看到真实的处理数(更新次数)也有提升,因此不仅仅是降低了等待延迟,实际上也提升了处理能力。

  最后提供一个物理机上的数据参考,使用128个并发连接,同时对一条记录进行更新:

  不做任何优化的并发处理能力,参考代码如下所示:

  transaction type: Custom query

  scaling factor: 1

  query mode: prepared

  number of clients: 128

  number of threads: 128

  duration: 100 s

  number of transactions actually processed: 285673

  latency average: 44.806 ms

  latency stddev: 45.751 ms

  tps = 2855.547375 (including connections establishing)

  tps = 2855.856976 (excluding connections establishing)

  statement latencies in milliseconds:

  0.002509 \setrandom id 1 1

  44.803299 update t1 set info=now()::text where id=:id;

  使用for update nowait的并发处理能力:

  transaction type: Custom query

  scaling factor: 1

  query mode: prepared

  number of clients: 128

  number of threads: 128

  duration: 100 s

  number of transactions actually processed: 6663253

  latency average: 1.919 ms

  latency stddev: 2.804 ms

  tps = 66623.169445 (including connections establishing)

  tps = 66630.307999 (excluding connections establishing)

  statement latencies in milliseconds:

  0.001934 \setrandom id 1 1

  1.917297 select f1(:id);

  使用advisory lock后的并发处理能力,参考代码如下所示:

  transaction type: Custom query

  scaling factor: 1

  query mode: prepared

  number of clients: 80

  number of threads: 80

  duration: 60 s

  number of transactions actually processed: 13883387

  latency average: 0.344 ms

  latency stddev: 0.535 ms

  tps = 231197.323122 (including connections establishing)

  tps = 231376.427515 (excluding connections establishing)

  statement latencies in milliseconds:

  0.344042 select f(1);

  此时的perf top

  PerfTop: 23883 irqs/sec kernel:32.2% exact: 0.0% [1000Hz cycles], (all, 32 CPUs)

  --------------------------------------------------------------------------------

  samples pcnt function DSO

  _______ _____ ____________________________ ______________________________________________

  10645.00 3.5% GetSnapshotData /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  7963.00 2.6% AllocSetAlloc /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  4720.00 1.6% _int_malloc /lib64/libc-2.12.so

  4270.00 1.4% __schedule [kernel.kallsyms]

  4234.00 1.4% fmgr_info_cxt_security /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  4217.00 1.4% LWLockAcquire /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  3958.00 1.3% hash_search_with_hash_value /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  3656.00 1.2% __GI_vfprintf /lib64/libc-2.12.so

  3572.00 1.2% update_blocked_averages [kernel.kallsyms]

  3338.00 1.1% PostgresMain /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  3267.00 1.1% __switch_to [kernel.kallsyms]

  3095.00 1.0% __strlen_sse42 /lib64/libc-2.12.so

  2996.00 1.0% memcpy /lib64/libc-2.12.so

  2930.00 1.0% _int_free /lib64/libc-2.12.so

  2568.00 0.8% LWLockRelease /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  2446.00 0.8% SearchCatCache /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  2178.00 0.7% ExecInitExpr /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  2053.00 0.7% hash_any /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  2035.00 0.7% __GI___libc_malloc /lib64/libc-2.12.so

  2009.00 0.7% _raw_spin_lock_irqsave [kernel.kallsyms]

  1804.00 0.6% exec_stmt /u02/digoal/soft_bak/pgsql9.5/lib/plpgsql.so

  1764.00 0.6% __memset_sse2 /lib64/libc-2.12.so

  1717.00 0.6% pqParseInput3 /u02/digoal/soft_bak/pgsql9.5/lib/libpq.so.5.8

  1696.00 0.6% do_select [kernel.kallsyms]

  1686.00 0.6% __strcpy_ssse3 /lib64/libc-2.12.so

  1685.00 0.6% update_curr [kernel.kallsyms]

  1619.00 0.5% enqueue_entity [kernel.kallsyms]

  1607.00 0.5% pfree/u02/digoal/soft_bak/pgsql9.5/bin/postgres

  1598.00 0.5% doCustom /u02/digoal/soft_bak/pgsql9.5/bin/pgbench

  1594.00 0.5% idle_cpu [kernel.kallsyms]

  1589.00 0.5% update_cfs_rq_blocked_load [kernel.kallsyms]

  1554.00 0.5% lapic_next_deadline [kernel.kallsyms]

  1512.00 0.5% update_cfs_shares[kernel.kallsyms]

  1491.00 0.5% MemoryContextCreate /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  1482.00 0.5% _raw_spin_lock [kernel.kallsyms]

  1423.00 0.5% palloc /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  1419.00 0.5% __GI___sigsetjmp /lib64/libc-2.12.so

  1412.00 0.5% __cfree /lib64/libc-2.12.so

  1399.00 0.5% unix_stream_recvmsg [kernel.kallsyms]

  1393.00 0.5% __fget_light [kernel.kallsyms]

  1359.00 0.4% ResourceOwnerReleaseInternal /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  1351.00 0.4% AllocSetFree /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  1277.00 0.4% unix_stream_sendmsg [kernel.kallsyms]

  1246.00 0.4% __memcmp_sse4_1 /lib64/libc-2.12.so

  1240.00 0.4% plpgsql_exec_function /u02/digoal/soft_bak/pgsql9.5/lib/plpgsql.so

  1225.00 0.4% expression_tree_walker /u02/digoal/soft_bak/pgsql9.5/bin/postgres

  1160.00 0.4% exec_stmt_block /u02/digoal/soft_bak/pgsql9.5/lib/plpgsql.so

  使用advisory lock,性能相比不做任何优化性能提升了约66倍,相比for update nowait性能提升了约1.8倍。

  这种优化能快速告诉用户是否能秒杀到此类商品,而不需要等待其他用户更新结束后才知道。所以大大降低了RT,提高了吞吐率。

  最后提一下9.5的新特性, select ,,, for update ,,, skip locked.

  http://blog.163.com/digoal@126/blog/static/163877040201551552017215/

  如果能做到UPDATE语法里面,就完美了,直接跳过无法获得锁的行。并发能力瞬间提升,也不用advisory了。

  [参考]

  1. http://www.postgresql.org/docs/9.5/static/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS

  后面会更新一些mysql的相关文章,关注mysql的朋友敬请期待。