I use BigDataLite for a lot of my sandboxing work. This is an OVA provided by Oracle which can be run on VirtualBox, VMware, etc., and has the Cloudera Hadoop platform (CDH) along with all of Oracle’s Big Data goodies, including Big Data Discovery and Big Data Spatial and Graph (BDSG).
Something that kept tripping me up during my work with BDSG was that HBase would become unavailable. Not being an HBase expert and simply using it as a data store for my property graph data, I wrote it off as mistakes on my part. But the issue kept recurring, often enough that I dug into it. Checking the services showed the HBase master and regionserver both down:
[oracle@bigdatalite ~]$ sudo service hbase-master status;sudo service hbase-regionserver status;sudo service hbase-thrift status;sudo service zookeeper-server status
HBase master daemon is not running [FAILED]
hbase-regionserver is not running.
HBase thrift daemon is running [ OK ]
zookeeper-server is running
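The four init scripts word their status messages slightly differently, so a small helper makes the output easier to scan. This is a minimal sketch that relies only on the "is running" / "is not running" wording seen in the output above; the function names `service_state` and `check_hbase_stack` are my own and not part of CDH.

```shell
#!/usr/bin/env bash
# Map one `service <name> status` message to up / down / unknown,
# based only on the wording seen in the status output above.
service_state() {
  case "$1" in
    *"is not running"*) echo down ;;
    *"is running"*)     echo up ;;
    *)                  echo unknown ;;
  esac
}

# Print a one-line summary per service. Needs sudo, like the original command.
check_hbase_stack() {
  local svc
  for svc in hbase-master hbase-regionserver hbase-thrift zookeeper-server; do
    printf '%-20s %s\n' "$svc" "$(service_state "$(sudo service "$svc" status 2>&1)")"
  done
}
```

Calling `check_hbase_stack` on the VM in the state shown above should report the master and regionserver as down and the other two as up. Anything the helper does not recognise is reported as unknown rather than guessed at.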
Turns out that HBase throws its toys out when I suspend the VM. I don’t know if it’s the clock jumping too much, or simply a session expiring and it not exiting gracefully. I don’t know if this is a VirtualBox fault, host machine (Mac), HBase, or even ZooKeeper; nor do I especially care now that I’ve found the cause and know to look for it whilst doing sandbox work ;-)
This is the log from the HBase master:
2017-01-20 09:19:05,430 ERROR [master/bigdatalite.localdomain/127.0.0.1:60000] zookeeper.ZooKeeperWatcher: master:60000-0x159b8f3800e0014, quorum=localhost:2181, baseZNode=/hbase Received unexpected KeeperException, re-throwing exception
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:623)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)
at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1150)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1092)
at java.lang.Thread.run(Thread.java:745)
2017-01-20 09:19:05,431 ERROR [master/bigdatalite.localdomain/127.0.0.1:60000] master.ActiveMasterManager: master:60000-0x159b8f3800e0014, quorum=localhost:2181, baseZNode=/hbase Error deleting our own master address node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/master
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.getData(RecoverableZooKeeper.java:359)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.getData(ZKUtil.java:623)
at org.apache.hadoop.hbase.zookeeper.MasterAddressTracker.getMasterAddress(MasterAddressTracker.java:148)
at org.apache.hadoop.hbase.master.ActiveMasterManager.stop(ActiveMasterManager.java:267)
at org.apache.hadoop.hbase.master.HMaster.stopServiceThreads(HMaster.java:1150)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1092)
at java.lang.Thread.run(Thread.java:745)
2017-01-20 09:19:05,432 INFO [master/bigdatalite.localdomain/127.0.0.1:60000] hbase.ChoreService: Chore service for: bigdatalite.localdomain,60000,1484898484413_splitLogManager_ had [] on shutdown
2017-01-20 09:19:05,432 INFO [master/bigdatalite.localdomain/127.0.0.1:60000] flush.MasterFlushTableProcedureManager: stop: server shutting down.
2017-01-20 09:19:05,432 INFO [master/bigdatalite.localdomain/127.0.0.1:60000] ipc.RpcServer: Stopping server on 60000
2017-01-20 09:19:05,434 INFO [RpcServer.listener,port=60000] ipc.RpcServer: RpcServer.listener,port=60000: stopping
2017-01-20 09:19:05,440 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopped
2017-01-20 09:19:05,440 INFO [RpcServer.responder] ipc.RpcServer: RpcServer.responder: stopping
2017-01-20 09:19:20,449 ERROR [master/bigdatalite.localdomain/127.0.0.1:60000] zookeeper.RecoverableZooKeeper: ZooKeeper delete failed after 4 attempts
2017-01-20 09:19:20,449 WARN [master/bigdatalite.localdomain/127.0.0.1:60000] regionserver.HRegionServer: Failed deleting my ephemeral node
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /hbase/rs/bigdatalite.localdomain,60000,1484898484413
at org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.delete(ZooKeeper.java:873)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.delete(RecoverableZooKeeper.java:178)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1236)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.deleteNode(ZKUtil.java:1225)
at org.apache.hadoop.hbase.regionserver.HRegionServer.deleteMyEphemeralNode(HRegionServer.java:1431)
at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:1100)
at java.lang.Thread.run(Thread.java:745)
2017-01-20 09:19:20,511 INFO [master/bigdatalite.localdomain/127.0.0.1:60000] regionserver.HRegionServer: stopping server bigdatalite.localdomain,60000,1484898484413; zookeeper connection closed.
2017-01-20 09:19:20,511 INFO [master/bigdatalite.localdomain/127.0.0.1:60000] regionserver.HRegionServer: master/bigdatalite.localdomain/127.0.0.1:60000 exiting
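To check whether a given outage is the same problem, it is enough to look for `SessionExpiredException` in the master log. The sketch below takes the log file as an argument, because the exact path is an assumption on my part: on a package-based CDH install I would expect it under /var/log/hbase/, but verify that on your own VM. The function names are hypothetical helpers, not HBase tooling.

```shell
#!/usr/bin/env bash
# Count SessionExpiredException lines in a log file.
# Prints 0 and succeeds when there are none; fails if the file is unreadable.
count_session_expired() {
  grep -c 'SessionExpiredException' "$1" || [ $? -eq 1 ]
}

# Show each expiry together with the timestamped log line just before it,
# which is where HBase records what it was doing at the time.
show_session_expired() {
  grep -B1 'SessionExpiredException' "$1"
}
```

For example, `show_session_expired /var/log/hbase/<master log file>` (path assumed, as noted) lists the expiries with their timestamps, which can then be compared against when the VM was resumed.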
The solution, crude as it is, is just to turn it off and on again - HBase, that is:
sudo service hbase-master restart;sudo service hbase-regionserver restart;sudo service hbase-thrift restart
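Since I end up doing this after most suspends, the one-liner can be wrapped in a function that also confirms the daemons came back. This is only a sketch of the same restart, in the same order, followed by a status check; `SERVICE_CMD` is a hypothetical knob I added so the function can be dry-run without sudo, and the check relies on the "is not running" wording from the status output earlier in this post.

```shell
#!/usr/bin/env bash
# Command prefix for the init scripts. Override it for a dry run,
# e.g. SERVICE_CMD="echo service" (hypothetical knob, not part of CDH).
SERVICE_CMD=${SERVICE_CMD:-sudo service}

# Restart the three HBase daemons in the same order as the one-liner above,
# then report any that still say "is not running".
# Returns 0 if all three report running, 1 otherwise.
restart_hbase() {
  local svc failed=0
  for svc in hbase-master hbase-regionserver hbase-thrift; do
    $SERVICE_CMD "$svc" restart
  done
  for svc in hbase-master hbase-regionserver hbase-thrift; do
    if $SERVICE_CMD "$svc" status 2>&1 | grep -q 'is not running'; then
      echo "$svc still down" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Run `restart_hbase` after resuming the VM. The thrift server is restarted even though it reported as running, to match the original command.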
Happy days.
And if anyone can tell me the proper resolution to this (other than not suspending my VM), I’m all ears!