Google BigTable


Google is a household name for web users across the world. Its popular applications, such as Search, Analytics, Maps and Gmail, are used by millions of people in all parts of the world. The key behind its storage management capabilities is a NoSQL database system called Bigtable. This compressed, high-performance, proprietary data storage system is built on Google File System, the Chubby Lock Service, SSTable (log-structured storage like LevelDB) and a few other Google technologies.

Bigtable maps two arbitrary string values (a row key and a column key) and a timestamp, hence a three-dimensional mapping, into an associated arbitrary byte array. It is not a relational database and is better described as a sparse, distributed, multi-dimensional sorted map. It is designed to scale into the petabyte range across hundreds or thousands of machines without requiring reconfiguration.
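The mapping described above can be pictured with a small in-memory sketch. This is only a toy model, not Bigtable's implementation or client API: the class name `ToyBigtable`, its methods, and the sample row and column keys are all made up for illustration.

```python
import bisect
from typing import Dict, List, Optional, Tuple


class ToyBigtable:
    """Toy model of a sparse sorted map: (row key, column key, timestamp) -> bytes.

    Hypothetical illustration only; real Bigtable is distributed and persistent.
    """

    def __init__(self) -> None:
        # Only cells that were actually written take up space (sparse).
        self._cells: Dict[Tuple[str, str], Dict[int, bytes]] = {}
        # Distinct row keys, kept in sorted order to allow range scans.
        self._row_keys: List[str] = []

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        idx = bisect.bisect_left(self._row_keys, row)
        if idx == len(self._row_keys) or self._row_keys[idx] != row:
            self._row_keys.insert(idx, row)
        self._cells.setdefault((row, column), {})[timestamp] = value

    def get(self, row: str, column: str,
            timestamp: Optional[int] = None) -> Optional[bytes]:
        """Return the newest version, or the newest one at or before `timestamp`."""
        versions = self._cells.get((row, column))
        if not versions:
            return None
        if timestamp is None:
            return versions[max(versions)]
        eligible = [t for t in versions if t <= timestamp]
        return versions[max(eligible)] if eligible else None

    def scan_rows(self, start: str, end: str) -> List[str]:
        """Return row keys in the half-open range [start, end), in sorted order."""
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, end)
        return self._row_keys[lo:hi]


table = ToyBigtable()
table.put("com.example/index", "contents:html", 1, b"<html>v1</html>")
table.put("com.example/index", "contents:html", 2, b"<html>v2</html>")
table.put("org.sample/home", "anchor:link", 1, b"Sample")

latest = table.get("com.example/index", "contents:html")
older = table.get("com.example/index", "contents:html", timestamp=1)
```

Keeping the row keys sorted is what makes a range scan cheap, and storing several timestamped versions per cell is what gives the map its third dimension.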

Each table has multiple dimensions, one of which is a time field that allows for versioning and garbage collection. Tables are optimized for Google File System (GFS) by being split into multiple tablets: segments of the table are split along a chosen row so that each tablet is about 200 megabytes in size. When sizes threaten to grow beyond a specified limit, the tablets are compressed using the BMDiff algorithm and the Zippy compression algorithm, which is publicly known and open-sourced as Snappy [16]. Snappy is a less space-optimal variation of LZ77 but is more efficient in terms of computing time.

The locations of tablets in the GFS are recorded as database entries in multiple special tablets, which are called “META1” tablets. META1 tablets are found by querying the single “META0” tablet, which typically resides on a server of its own because clients query it often for the location of the “META1” tablet, which in turn holds the answer to where the actual data is located. Like GFS’s master server, the META0 server is not generally a bottleneck, since the processor time and bandwidth needed to discover and transmit META1 locations are minimal and clients aggressively cache locations to minimize queries.
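The two-step lookup and the client-side caching can be sketched as follows. This is a simplified illustration under stated assumptions: the names `TabletIndex` and `CachingClient`, the server and tablet names, and the choice to identify each tablet by its last row key are all hypothetical, and the cache here is keyed by individual row key rather than by key range.

```python
import bisect
from typing import Dict, List, Tuple


class TabletIndex:
    """Sorted (end_row_key, location) entries; a row belongs to the first
    entry whose end key is greater than or equal to the row key."""

    def __init__(self, entries: List[Tuple[str, str]]) -> None:
        self._entries = sorted(entries)
        self._end_keys = [end_key for end_key, _ in self._entries]

    def locate(self, row_key: str) -> str:
        i = bisect.bisect_left(self._end_keys, row_key)
        if i == len(self._end_keys):
            raise KeyError(row_key)
        return self._entries[i][1]


class CachingClient:
    """Resolves row key -> tablet location via META0, then META1, with a cache."""

    def __init__(self, meta0: TabletIndex,
                 meta1_tablets: Dict[str, TabletIndex]) -> None:
        self.meta0 = meta0
        self.meta1_tablets = meta1_tablets
        self.cache: Dict[str, str] = {}
        self.meta0_queries = 0  # counts how often META0 is actually consulted

    def locate(self, row_key: str) -> str:
        if row_key in self.cache:
            return self.cache[row_key]  # served locally, META0 is not contacted
        self.meta0_queries += 1
        meta1_name = self.meta0.locate(row_key)                     # step 1: META0
        location = self.meta1_tablets[meta1_name].locate(row_key)   # step 2: META1
        self.cache[row_key] = location
        return location


# Hypothetical layout: two META1 tablets that cover the whole key space.
meta0 = TabletIndex([("m", "meta1-a"), ("\uffff", "meta1-b")])
meta1_tablets = {
    "meta1-a": TabletIndex([("f", "tabletserver-1"), ("m", "tabletserver-2")]),
    "meta1-b": TabletIndex([("\uffff", "tabletserver-3")]),
}
client = CachingClient(meta0, meta1_tablets)

first = client.locate("apple")
second = client.locate("apple")  # answered from the cache
```

In this sketch a repeated lookup never reaches META0, which mirrors why the single META0 server is not generally a bottleneck.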

Bigtable is designed to handle massive workloads at consistently low latency and high throughput, which makes it a good choice for both operational and analytical applications, including IoT, user analytics and financial data analysis. It can serve as the storage engine for large-scale, low-latency applications as well as for throughput-intensive data processing and analytics.

Bigtable scales automatically to hundreds of petabytes and can smoothly handle millions of operations per second. Changes to the deployment configuration take effect immediately, so there is no downtime during reconfiguration. Bigtable integrates easily with popular big data tools such as Hadoop, as well as Google Cloud Platform products such as Cloud Dataflow and Dataproc. It also supports the open-source, industry-standard HBase API, which makes it easy for development teams to adopt.

Google makes the database available in a starter format so that the technology can be tried out and understood in a hosting context. It also offers a cloud setup for clients who need Google's proprietary hosting environment. The technical aspects of Bigtable need to be analysed and incorporated into big data technologies so that a feasible open-source platform can become available for hosting in the near future.
