hive 响应慢问题定位

情景描述:

大数据集群,目前有两套hiveserver2和metastore的集群,通过nginx指向进行流量互切,发现流量打到哪个metastore集群,哪个集群就特别卡顿。

那么先来回顾一下hive整个调用流程和框架

image-20201227123352163

Hive 提供的另外一个shell 客户端,也就是我们常用的hive 命令的客户端它的设计是直接启动了一个org.apache.hadoop.hive.cli.CliDriver的进程,这个进程其实主要包含了两块内容一个是提供给我们交互的cli ,另外一个就是我们的Driver 驱动引擎,这样的设计导致如果我们有多个客户端的情况下,我们就需要有多个Driver

image-20201226204741803

但是我们通过HiveServer2连接的时候我们就可以共享Driver,一方面可以简化客户端的设计降低资源损耗,另外一方面还能降低对MetaStore 的压力,减少连接的个数。

原因分析:

目前来看,变慢的原因应该是出现在hs2服务,metastore服务,具体业务,网络,服务器等原因。

1、先从简单的硬件分析入手,验证网络和服务器,这块省略验证过程

2、验证hs2服务,利用排除的方式,通过hive cli进行多次验证,发现也有缓慢的情况,正常应该1秒内返回,所以先不定位hs2的情况

3、验证metastore服务,还是先从直观简单的分析,看能否找出些现象,先看日志:

3.1 既然通过cli和hs2访问都会慢,先从简单的cli发起请求,可以通过加debug参数,发起查询

beeline –verbose=true –showNestedErrs=true –debug=true 看一下客户端是否有明显异常

3.2 cd /var/log/hive 查看一下是否有大量客户端访问

cat hadoop-cmf-hive-HIVEMETASTORE-data-hadoop-16-2.192.168.0.1.log.out | grep audit | grep -v “ugi=hue” | awk -F “ip=” ‘{print $2}’ | awk ‘{print $1}’ | sort | uniq -c | sort -nr | head

3.3 查看服务GC情况 jstat -gcutil pid interval(ms)

3.4 查看服务端日志

Error: Error while compiling statement: [Error 10308]: Attempt to acquire compile lock timed out. (state=,code=10308)
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: [Error 10308]: Attempt to acquire compile lock timed out.
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:241)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:227)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:255)
at org.apache.hive.beeline.Commands.executeInternal(Commands.java:989)
at org.apache.hive.beeline.Commands.execute(Commands.java:1180)
at org.apache.hive.beeline.Commands.sql(Commands.java:1094)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1180)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1013)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:922)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: [Error 10308]: Attempt to acquire compile lock timed out.
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:187)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:271)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:337)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:439)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:416)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:282)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:503)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

这里显示有获取metastore连接超时的异常,关键的日志是compiling statement: [Error 10308]: Attempt to acquire compile lock timed out. 编译时候获取编译索失败

顺着这个思路,查看一下代码:

关键字:Completed compiling command(queryId,参考https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/lock/CompileLock.java

分析结论如下:

并且hivethrift每次访问都会初始化metastore 重新初始化元数据,HIVESEVER2提交SQL阻塞tryAcquireCompileLock原因:

a、HIVE1.1 只支持串行编译SQL,hiveserver2并发接受到SQL请求后,在complile阶段变为串行执行。当compilie编译慢时,引起阻塞SQL的提交。

b、compile的慢的原因:complile阶段,会通过hivemetastore访问mysql。目前是hiveserver2的请求打入到同一个metastore,流量上来后,hivemetastore访问mysql速度下降。

解决方案:hiveserver2的请求分摊所有hivemetastore上

发表评论

邮箱地址不会被公开。 必填项已用*标注