HBase Cheatsheet

HBase Architecture & Data Model

Core Components

HRegionServer	Hosts and manages HRegions; handles read/write requests.
HMaster	Assigns regions to RegionServers, handles schema changes, and performs administrative tasks.
ZooKeeper	Coordinates the cluster: tracks live RegionServers and the active HMaster, and provides distributed synchronization.
HDFS	Hadoop Distributed File System; stores HBase data persistently.
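
As a rough illustration of how these components divide the work, the sketch below models region routing: each region covers a contiguous range of row keys, and a request for a row goes to the RegionServer hosting the region whose range contains that key. The class name `RegionRouter`, the server names, and the split point are all made up for the example; a real client resolves regions through HBase's own metadata rather than a local map.

```java
import java.util.Map;
import java.util.TreeMap;

/** Toy model of region routing; server names and split points are hypothetical. */
final class RegionRouter {
    // Region start key -> hosting RegionServer. The first region starts at the empty key.
    // String order matches HBase's byte order only for ASCII keys, as used here.
    private final TreeMap<String, String> regionStartToServer = new TreeMap<>();

    void assign(String regionStartKey, String server) {
        regionStartToServer.put(regionStartKey, server);
    }

    /** Returns the server hosting the region whose key range contains rowKey. */
    String locate(String rowKey) {
        // floorEntry finds the region with the greatest start key <= rowKey
        Map.Entry<String, String> region = regionStartToServer.floorEntry(rowKey);
        if (region == null) {
            throw new IllegalStateException("no region covers " + rowKey);
        }
        return region.getValue();
    }

    public static void main(String[] args) {
        RegionRouter router = new RegionRouter();
        router.assign("", "rs1.example.com");  // rows before "m"
        router.assign("m", "rs2.example.com"); // rows from "m" onward
        System.out.println(router.locate("apple")); // prints rs1.example.com
        System.out.println(router.locate("melon")); // prints rs2.example.com
    }
}
```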

Data Model

HBase is a distributed, scalable, big data store.

It’s designed to store and retrieve data in very large, sparse tables. Data is stored as key-value pairs: each value is addressed by its row key, column family, column qualifier, and timestamp.

Table: Contains rows.
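Row Key: Unique identifier for a row. Rows are stored sorted by row key in lexicographic byte order.
Column Family: Groups related columns; families are declared in the table schema.
Column Qualifier: Identifies a column within a column family.
Version: Each cell value has a version timestamp; a cell can hold multiple versions.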
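
The terms above can be pictured as nested sorted maps. The following is a minimal sketch of that logical layout only, not of HBase's storage format; the class `CellModel` and its methods are hypothetical names introduced for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Comparator;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy in-memory model of the HBase logical layout; not the real storage format. */
final class CellModel {
    // row key -> "family:qualifier" -> timestamp (newest first) -> value bytes
    private final NavigableMap<String, NavigableMap<String, NavigableMap<Long, byte[]>>> rows =
            new TreeMap<>();

    void put(String row, String family, String qualifier, long timestamp, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>())
            .computeIfAbsent(family + ":" + qualifier,
                    c -> new TreeMap<>(Comparator.<Long>reverseOrder()))
            .put(timestamp, value.getBytes(StandardCharsets.UTF_8));
    }

    /** Returns the newest version of a cell, or null if the cell does not exist. */
    String getLatest(String row, String family, String qualifier) {
        NavigableMap<String, NavigableMap<Long, byte[]>> columns = rows.get(row);
        if (columns == null) {
            return null;
        }
        NavigableMap<Long, byte[]> versions = columns.get(family + ":" + qualifier);
        if (versions == null || versions.isEmpty()) {
            return null;
        }
        // The map is ordered newest first, so the first entry is the latest version
        return new String(versions.firstEntry().getValue(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        CellModel model = new CellModel();
        model.put("row1", "cf1", "qual1", 1L, "old");
        model.put("row1", "cf1", "qual1", 2L, "new");
        System.out.println(model.getLatest("row1", "cf1", "qual1")); // prints new
    }
}
```

Writing the same cell again with a later timestamp adds a version instead of overwriting, which is why reads return the newest value by default.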

Key Concepts

Regions	Tables are split into regions. A region contains a subset of rows. Regions are the unit of distribution and scalability.
Store	Each region contains one store per column family. A store contains a MemStore and zero or more StoreFiles (HFiles).
MemStore	In-memory buffer that holds recent writes, sorted by key, until they are flushed to a new HFile.
HFile	Immutable file of sorted key-value pairs stored on HDFS.
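
The relationship between a store, its MemStore, and its HFiles can be sketched as follows. This is a deliberately simplified model: `ToyStore` and its entry-count flush threshold are assumptions made for the example, and it leaves out the write-ahead log, compactions, and deletes that a real store also handles.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

/** Toy model of one store: a sorted in-memory buffer plus immutable sorted files. */
final class ToyStore {
    private final int flushThreshold; // hypothetical: counted in entries, not bytes
    private NavigableMap<String, String> memStore = new TreeMap<>();
    // Flushed buffers stand in for HFiles; the oldest file comes first
    private final List<NavigableMap<String, String>> storeFiles = new ArrayList<>();

    ToyStore(int flushThreshold) {
        this.flushThreshold = flushThreshold;
    }

    void put(String key, String value) {
        memStore.put(key, value);
        if (memStore.size() >= flushThreshold) {
            flush();
        }
    }

    /** Freezes the current buffer as a new sorted file and starts an empty buffer. */
    void flush() {
        if (memStore.isEmpty()) {
            return;
        }
        storeFiles.add(Collections.unmodifiableNavigableMap(memStore));
        memStore = new TreeMap<>();
    }

    /** Reads the newest value: the buffer first, then files from newest to oldest. */
    String get(String key) {
        String value = memStore.get(key);
        if (value != null) {
            return value;
        }
        for (int i = storeFiles.size() - 1; i >= 0; i--) {
            value = storeFiles.get(i).get(key);
            if (value != null) {
                return value;
            }
        }
        return null;
    }

    int storeFileCount() {
        return storeFiles.size();
    }
}
```

Because flushed files are never modified, an update to an existing key lands in the buffer or in a newer file, and reads have to consult the newest data first.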

Basic HBase Operations (CLI)

Table Management

create 'table_name', 'column_family1', 'column_family2' - Creates a new table.

list - Lists all tables.

describe 'table_name' - Describes the table schema.

disable 'table_name' - Disables a table.

enable 'table_name' - Enables a table.

drop 'table_name' - Drops a table (must be disabled first).

exists 'table_name' - Checks if table exists.

Data Manipulation

put 'table_name', 'row_key', 'column_family:qualifier', 'value' - Inserts or updates a cell value.

get 'table_name', 'row_key' - Retrieves all columns for a given row.

get 'table_name', 'row_key', 'column_family:qualifier' - Retrieves a specific cell.

scan 'table_name' - Scans the entire table.

scan 'table_name', {STARTROW => 'row_key1', STOPROW => 'row_key2'} - Scans a range of rows (STARTROW is inclusive, STOPROW is exclusive).

delete 'table_name', 'row_key', 'column_family:qualifier' - Deletes a specific cell.

deleteall 'table_name', 'row_key' - Deletes all cells in a row.
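
Range scans are easy to misread because row keys sort lexicographically, not numerically, and the stop row is excluded. The sketch below imitates that behavior with a plain `TreeMap`; `ScanRangeDemo` is a hypothetical stand-in for a table, and Java's string ordering matches HBase's byte ordering only for ASCII keys like the ones used here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Imitates STARTROW/STOPROW semantics on a sorted map of row keys. */
final class ScanRangeDemo {
    /** Returns the row keys in [startRow, stopRow): start inclusive, stop exclusive. */
    static List<String> scan(TreeMap<String, String> table, String startRow, String stopRow) {
        return new ArrayList<>(table.subMap(startRow, true, stopRow, false).keySet());
    }

    public static void main(String[] args) {
        TreeMap<String, String> table = new TreeMap<>();
        for (String key : new String[] {"row1", "row10", "row2", "row3"}) {
            table.put(key, "value");
        }
        // "row10" sorts before "row2", and "row3" is excluded as the stop row
        System.out.println(scan(table, "row1", "row3")); // prints [row1, row10, row2]
    }
}
```

Zero-padding numeric parts of a row key (for example `row01`, `row02`, `row10`) is one common way to make lexicographic order agree with numeric order.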

HBase Shell Commands

Namespace Management

create_namespace 'namespace_name' - Creates a namespace.

list_namespace - Lists all namespaces.

describe_namespace 'namespace_name' - Describes a namespace.

alter_namespace 'namespace_name', {METHOD => 'set', 'PROPERTY_NAME' => 'property_value'} - Alters a namespace property.

drop_namespace 'namespace_name' - Drops a namespace (must be empty).

Advanced Scan Operations

scan 'table_name', {COLUMNS => ['col_family1', 'col_family2:qualifier']} - Scans specific column families or columns.

scan 'table_name', {LIMIT => 10} - Limits the number of rows returned.

scan 'table_name', {REVERSED => true} - Scans in reverse order.

scan 'table_name', {FILTER => "RowFilter(=, 'binary:row_key')"} - Applies a filter to the scan.

scan 'table_name', {VERSIONS => 5} - Retrieves up to the 5 most recent versions of each cell (limited by the column family's VERSIONS setting).

count 'table_name' - Counts the number of rows in a table.

Configuration

HBase configuration is managed through hbase-site.xml. Key properties include:

hbase.rootdir: HDFS directory for HBase data.
hbase.zookeeper.quorum: Comma-separated list of ZooKeeper hosts.
hbase.cluster.distributed: Set to true for distributed mode.
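
For orientation, a minimal hbase-site.xml that sets these three properties might look like the fragment below. The host names, port, and path are placeholders, not defaults; substitute the values for your own HDFS NameNode and ZooKeeper ensemble.

```xml
<configuration>
  <!-- Placeholder NameNode address and path -->
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://namenode.example.com:8020/hbase</value>
  </property>
  <!-- Placeholder ZooKeeper hosts -->
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1.example.com,zk2.example.com,zk3.example.com</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
```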

HBase Java API

Connecting to HBase

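import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

// Example class name; rename as needed
public class HBaseConnectExample {
    public static void main(String[] args) {
        // Loads hbase-default.xml and hbase-site.xml from the classpath
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "localhost"); // Replace with your ZooKeeper quorum

        // A Connection is heavyweight and thread-safe: create it once and share it
        try (Connection connection = ConnectionFactory.createConnection(config)) {
            // Use the connection
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}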

Basic Operations (Java)

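import java.io.IOException;

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

// Example class name; rename as needed
public class HBaseBasicOps {
    // Expects an open Connection; table "mytable" with column family "cf1" must already exist
    static void run(Connection connection) {
        TableName tableName = TableName.valueOf("mytable");

        // A Table is lightweight and not thread-safe: get one per use and close it
        try (Table table = connection.getTable(tableName)) {
            // Put data
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("qual1"), Bytes.toBytes("value1"));
            table.put(put);

            // Get data
            Get get = new Get(Bytes.toBytes("row1"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("qual1")); // null if the cell does not exist
            String valueStr = Bytes.toString(value);
            System.out.println("Value: " + valueStr);

            // Scan data (a Scan with no bounds reads the whole table)
            Scan scan = new Scan();
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    // Process each row
                    System.out.println("Row: " + Bytes.toString(row.getRow()));
                }
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}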
