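Troubleshooting slow Hive responses

Scenario:

Our big data platform currently runs two HiveServer2 + metastore clusters, and nginx switches traffic between them. Whichever metastore cluster the traffic is routed to becomes noticeably sluggish.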

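First, let's review Hive's overall call flow and architecture.

Hive also provides another shell client, the hive command we use all the time. It works by directly launching an org.apache.hadoop.hive.cli.CliDriver process, which contains two main parts: the interactive CLI, and the Driver execution engine. Because of this design, every client brings its own Driver, so multiple clients mean multiple Drivers.

When clients connect through HiveServer2, however, they share the Driver. This simplifies the client design and lowers resource usage, and it also reduces pressure on the MetaStore by cutting down the number of connections.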

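Root cause analysis:

At this point, the slowdown could be caused by the HS2 service, the metastore service, specific workloads, the network, or the servers.

1. Start with the simple hardware checks and verify the network and servers (the verification steps are omitted here).

2. Check the HS2 service by elimination. Repeated tests through the hive CLI, which runs its own Driver and does not go through HS2, are also slow, whereas a normal query should return within 1 second. So HS2 is set aside for now.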

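3. Check the metastore service, again starting with simple, direct analysis to see whether any symptoms stand out. Start with the logs:

3.1 Since access through both the CLI and HS2 is slow, start by sending a request from a simple command-line client with debug options enabled, and check whether the client shows any obvious errors:

beeline --verbose=true --showNestedErrs=true --debug=true

3.2 cd /var/log/hive and check whether a large number of clients are accessing the metastore: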

cat hadoop-cmf-hive-HIVEMETASTORE-data-hadoop-16-2.192.168.0.1.log.out | grep audit | grep -v "ugi=hue" | awk -F "ip=" '{print $2}' | awk '{print $1}' | sort | uniq -c | sort -nr | head
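The same per-IP count can be sketched in Java, for example when it has to run inside an existing tool rather than a shell. This is a minimal sketch that assumes, as the grep/awk chain implies, that each audit line contains the word audit plus whitespace-separated ugi=<user> and ip=<address> tokens; the class name AuditIpCounter and the sample lines are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Sketch of: grep audit | grep -v "ugi=hue" | awk -F "ip=" '{print $2}' | awk '{print $1}'
//            | sort | uniq -c | sort -nr | head
// Assumption: audit lines carry whitespace-separated "ugi=<user>" and "ip=<address>" tokens.
class AuditIpCounter {

    /** Returns up to `limit` (ip, count) pairs, busiest client first. */
    static List<Map.Entry<String, Long>> topIps(List<String> lines, String excludedUser, int limit) {
        Map<String, Long> counts = new HashMap<>();
        for (String line : lines) {
            if (!line.contains("audit")) continue;                 // grep audit
            if (line.contains("ugi=" + excludedUser)) continue;    // grep -v "ugi=hue"
            int idx = line.indexOf("ip=");
            if (idx < 0) continue;                                  // no ip= field on this line
            String ip = line.substring(idx + 3).trim().split("\\s+")[0]; // awk '{print $1}'
            counts.merge(ip, 1L, Long::sum);                        // sort | uniq -c
        }
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()      // sort -nr
                        .thenComparing(Map.Entry.<String, Long>comparingByKey())) // stable tie order
                .limit(limit)                                                      // head
                .collect(Collectors.toList());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical sample lines; pass a real log file path as args[0] instead.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]), StandardCharsets.ISO_8859_1)
                : Arrays.asList(
                        "INFO HiveMetaStore.audit: ugi=hive ip=/10.0.0.5 cmd=get_table",
                        "INFO HiveMetaStore.audit: ugi=hive ip=/10.0.0.5 cmd=get_database",
                        "INFO HiveMetaStore.audit: ugi=hue ip=/10.0.0.9 cmd=get_table");
        for (Map.Entry<String, Long> e : topIps(lines, "hue", 10)) {
            System.out.println(e.getValue() + " " + e.getKey()); // same shape as uniq -c output
        }
    }
}
```

On JDK 11+ it can be run directly as java AuditIpCounter.java <metastore log file>; the log is read as ISO-8859-1 so that stray non-UTF-8 bytes do not abort the scan.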

3.3 Check the service's GC behavior: jstat -gcutil pid interval(ms)
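To turn a series of jstat samples into numbers that are easy to compare between the two metastore clusters, a small parser can compute the number of full GCs, the GC time, and the peak old-generation usage over the sampling window. This is a sketch assuming the standard jstat -gcutil header (S0 S1 E O M CCS YGC YGCT FGC FGCT GCT on JDK 8), a "." decimal separator, and that the interval passed in matches the one given to jstat; GcUtilSummary is a hypothetical name.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Summarises `jstat -gcutil <pid> <interval_ms> [count]` output.
// Columns are looked up by header name, so extra columns on newer JDKs are ignored.
class GcUtilSummary {
    final double maxOldPct;       // peak "O" (old generation used, % of capacity)
    final long fullGcs;           // FGC delta: full GCs during the sampling window
    final double gcSeconds;       // GCT delta: total GC time in the window
    final double gcTimeFraction;  // gcSeconds / wall-clock time between first and last sample

    private GcUtilSummary(double maxOldPct, long fullGcs, double gcSeconds, double gcTimeFraction) {
        this.maxOldPct = maxOldPct;
        this.fullGcs = fullGcs;
        this.gcSeconds = gcSeconds;
        this.gcTimeFraction = gcTimeFraction;
    }

    static GcUtilSummary parse(List<String> output, double intervalMs) {
        String[] header = output.get(0).trim().split("\\s+");
        Map<String, Integer> col = new HashMap<>();
        for (int i = 0; i < header.length; i++) col.put(header[i], i);

        List<String[]> rows = new ArrayList<>();
        for (String line : output.subList(1, output.size())) {
            String t = line.trim();
            if (t.isEmpty() || t.startsWith("S0")) continue; // skip blanks and repeated headers
            rows.add(t.split("\\s+"));
        }
        if (rows.size() < 2) throw new IllegalArgumentException("need at least two samples");

        String[] first = rows.get(0), last = rows.get(rows.size() - 1);
        double maxOld = 0;
        for (String[] r : rows) maxOld = Math.max(maxOld, value(r, col, "O"));
        long fgc = (long) (value(last, col, "FGC") - value(first, col, "FGC"));
        double gct = value(last, col, "GCT") - value(first, col, "GCT");
        double elapsedSec = (rows.size() - 1) * intervalMs / 1000.0;
        return new GcUtilSummary(maxOld, fgc, gct, gct / elapsedSec);
    }

    // jstat can print "-" for a space that is not in use; treat it as "no value".
    private static double value(String[] row, Map<String, Integer> col, String name) {
        String v = row[col.get(name)];
        return v.equals("-") ? Double.NaN : Double.parseDouble(v);
    }

    public static void main(String[] args) throws IOException {
        double intervalMs = args.length > 0 ? Double.parseDouble(args[0]) : 1000;
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) lines.add(line);
        GcUtilSummary s = parse(lines, intervalMs);
        System.out.printf("max old gen %.1f%%, full GCs %d, GC time %.2fs (%.1f%% of wall time)%n",
                s.maxOldPct, s.fullGcs, s.gcSeconds, 100 * s.gcTimeFraction);
    }
}
```

For example, jstat -gcutil <pid> 1000 10 | java GcUtilSummary.java 1000 (JDK 11+ single-file launch). A metastore whose old generation stays near 100% while FGC keeps climbing is spending its time in GC rather than serving requests.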

3.4 Check the server-side logs:

Error: Error while compiling statement: [Error 10308]: Attempt to acquire compile lock timed out. (state=,code=10308)
org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: [Error 10308]: Attempt to acquire compile lock timed out.
at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:241)
at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:227)
at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:255)
at org.apache.hive.beeline.Commands.executeInternal(Commands.java:989)
at org.apache.hive.beeline.Commands.execute(Commands.java:1180)
at org.apache.hive.beeline.Commands.sql(Commands.java:1094)
at org.apache.hive.beeline.BeeLine.dispatch(BeeLine.java:1180)
at org.apache.hive.beeline.BeeLine.execute(BeeLine.java:1013)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:922)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:226)
at org.apache.hadoop.util.RunJar.main(RunJar.java:141)
Caused by: org.apache.hive.service.cli.HiveSQLException: Error while compiling statement: [Error 10308]: Attempt to acquire compile lock timed out.
at org.apache.hive.service.cli.operation.Operation.toSQLException(Operation.java:400)
at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:187)
at org.apache.hive.service.cli.operation.SQLOperation.runInternal(SQLOperation.java:271)
at org.apache.hive.service.cli.operation.Operation.run(Operation.java:337)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:439)
at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:416)
at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:282)
at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:503)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313)
at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:56)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:286)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)

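At first glance this looks like a timeout while reaching the metastore, but the key log line is "Error while compiling statement: [Error 10308]: Attempt to acquire compile lock timed out.": the statement failed to acquire the compile lock during compilation.

Following this lead, let's look at the code:

Search keyword: "Completed compiling command(queryId"; see https://github.com/apache/hive/blob/master/ql/src/java/org/apache/hadoop/hive/ql/lock/CompileLock.java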
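The mechanism behind Error 10308 can be illustrated with a simplified model: one shared lock serializes compilation, and a caller that cannot acquire it within a timeout gives up. This is a minimal sketch built on java.util.concurrent's ReentrantLock.tryLock(timeout, unit), not Hive's actual CompileLock implementation; SerialCompiler and the hold/timeout values are hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of serial compilation (NOT Hive's CompileLock class): one shared lock,
// acquired with a timeout; failing to get it corresponds to "Attempt to acquire compile lock timed out."
class SerialCompiler {
    private final ReentrantLock compileLock = new ReentrantLock(true); // fair: waiters served in arrival order
    private final long timeoutMs;

    SerialCompiler(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    /** Runs the compile step under the lock; returns false if the lock was not acquired in time. */
    boolean compile(Runnable compileStep) throws InterruptedException {
        if (!compileLock.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) {
            return false; // in HiveServer2 this surfaces as Error 10308
        }
        try {
            compileStep.run(); // metadata lookups happen here; a slow metastore stretches the hold time
            return true;
        } finally {
            compileLock.unlock();
        }
    }

    /** One slow compile holds the lock for holdMs; reports whether a second query gets the lock. */
    static boolean secondCallerSucceeds(long holdMs, long timeoutMs) {
        SerialCompiler compiler = new SerialCompiler(timeoutMs);
        CountDownLatch holding = new CountDownLatch(1);
        Thread slowQuery = new Thread(() -> {
            try {
                compiler.compile(() -> {
                    holding.countDown(); // first query now owns the compile lock
                    pause(holdMs);       // simulated slow compilation
                });
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        slowQuery.start();
        try {
            holding.await();                          // make sure the lock is already held
            boolean ok = compiler.compile(() -> { }); // second query: trivial compile step
            slowQuery.join();
            return ok;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    private static void pause(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("compile holds 500 ms, timeout 100 ms -> " + secondCallerSucceeds(500, 100));
        System.out.println("compile holds 50 ms, timeout 1000 ms -> " + secondCallerSucceeds(50, 1000));
    }
}
```

secondCallerSucceeds(500, 100) returns false because the first compile holds the lock longer than the second query is willing to wait, while secondCallerSucceeds(50, 1000) returns true. The fast query fails purely because of the slow one ahead of it.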

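The analysis concludes as follows:

Note that every Hive Thrift access initializes the metastore (client) and reloads the metadata. HiveServer2 SQL submissions block in tryAcquireCompileLock for two reasons:

a. Hive 1.1 only supports serial SQL compilation. HiveServer2 accepts SQL requests concurrently, but they run one at a time in the compile phase, so a slow compilation blocks the SQL submitted after it.

b. Why compilation is slow: during the compile phase, Hive accesses MySQL through the Hive metastore. Currently all HiveServer2 requests hit the same metastore, so as traffic increases, the metastore's MySQL access slows down.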
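Reasons a and b compound each other, which a back-of-the-envelope model makes visible. Assume (idealized, not measured) that n queries reach HiveServer2 at the same moment, each compile holds the serial lock for the same time c, the lock is granted first-come first-served, and a query gives up after timeout T. Query i (counting from 0) then has to wait i·c, so it fails once i·c > T. CompileQueueModel is a hypothetical name, and the numbers below are illustrative only.

```java
// Idealised model, not measured data: n queries arrive at HiveServer2 at the same moment,
// each compile holds the serial compile lock for compileMs, the lock is granted first-come
// first-served, and a query gives up after timeoutMs of waiting.
class CompileQueueModel {

    /** Number of the n queries that time out waiting for the compile lock. */
    static int timedOut(int n, long compileMs, long timeoutMs) {
        int failures = 0;
        for (int i = 0; i < n; i++) {
            long waitMs = (long) i * compileMs; // query i needs the i queries ahead of it to compile first
            if (waitMs > timeoutMs) {
                failures++; // every later query waits at least as long, so it fails too
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: 10 concurrent queries, 5 s lock timeout.
        System.out.println("2 s compile (slow metastore): " + timedOut(10, 2000, 5000) + " of 10 time out");
        System.out.println("0.4 s compile (healthy):      " + timedOut(10, 400, 5000) + " of 10 time out");
    }
}
```

With n = 10 and T = 5 s, a 2 s compile makes 7 of the 10 queries time out, while a 0.4 s compile lets all 10 through. The only input that changed is the per-query compile time, which is exactly what a saturated metastore inflates.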

Solution: spread HiveServer2 requests across all Hive metastore instances.
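One way to picture spreading the requests is a client-side round-robin over several metastore endpoints, so that no single metastore and its MySQL connections absorb all the traffic. This is a hypothetical sketch: the endpoint URIs are made up, and it does not model how a given Hive version chooses among the URIs configured in hive.metastore.uris.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical client-side balancer: hands out metastore endpoints in round-robin order
// so that no single metastore (and its MySQL connections) takes all HiveServer2 traffic.
class MetastoreRoundRobin {
    private final List<String> uris;
    private final AtomicInteger next = new AtomicInteger();

    MetastoreRoundRobin(List<String> uris) {
        if (uris.isEmpty()) {
            throw new IllegalArgumentException("at least one metastore URI is required");
        }
        this.uris = Collections.unmodifiableList(new ArrayList<>(uris));
    }

    /** Returns the endpoint the next new metastore connection should use. */
    String pick() {
        // floorMod keeps the index non-negative even after the counter overflows
        return uris.get(Math.floorMod(next.getAndIncrement(), uris.size()));
    }

    public static void main(String[] args) {
        // Made-up endpoints; 9083 is the usual metastore Thrift port.
        MetastoreRoundRobin rr = new MetastoreRoundRobin(Arrays.asList(
                "thrift://metastore-1.example:9083",
                "thrift://metastore-2.example:9083",
                "thrift://metastore-3.example:9083"));
        for (int i = 0; i < 6; i++) {
            System.out.println("connection " + i + " -> " + rr.pick());
        }
    }
}
```

The AtomicInteger counter keeps pick() safe to call from concurrent HiveServer2 sessions without a lock, and copying the list in the constructor keeps the rotation stable even if the caller later modifies its own list.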
