Before getting into the why and what of time-series databases, let us first understand what time-series data is and why it matters.
To define it, a time series is a sequence of values in which each value is tagged with the time (a timestamp or time period) at which it was recorded.
Time series data is a collection of observations (behaviors) for a single subject (entity) at different points in time. It can be used to analyze trends and to forecast future values.
E.g., blood pressure and blood sugar (two behaviors) for a patient (single entity), collected on the first day of every month (multiple intervals).
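The patient example above can be written down as a small data structure. The following is a minimal sketch: the `Observation` class, the `series_for` helper, the field names, and all readings are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Observation:
    """One reading of one behavior for one entity at one point in time."""
    entity: str      # the subject being observed, e.g. a patient ID
    behavior: str    # what is measured, e.g. "bp_systolic"
    taken_on: date   # when the reading was taken
    value: float


# Hypothetical readings taken on the first day of each month.
readings = [
    Observation("patient-1", "bp_systolic", date(2023, 1, 1), 128.0),
    Observation("patient-1", "blood_sugar", date(2023, 1, 1), 96.0),
    Observation("patient-1", "bp_systolic", date(2023, 2, 1), 131.0),
    Observation("patient-1", "blood_sugar", date(2023, 2, 1), 101.0),
]


def series_for(observations, entity, behavior):
    """Return (date, value) pairs for one entity and behavior, oldest first."""
    points = [(o.taken_on, o.value) for o in observations
              if o.entity == entity and o.behavior == behavior]
    return sorted(points)


bp_series = series_for(readings, "patient-1", "bp_systolic")
```

Each (entity, behavior) pair yields its own time series, which is why the time axis, rather than the row itself, is the natural way to organize this kind of data.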
Time series data can be useful for:
- Tracking patient vitals in real time using IoT devices
- Tracking weather data hourly, daily, or weekly
- Tracking infrastructure performance
The relevance of time as an axis makes time series data distinct from other types of data. Time series are very frequently plotted via run charts. Time series data has the following three characteristics:
- Data writing: smooth, continuous, and high-throughput
- Data query: reads by time range, reads of recent data, multi-precision queries, and data mining
- Data storage: separation of hot and cold data, and multi-precision storage
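The two most common query patterns named above, reading by time range and reading recent data, can be sketched with a sorted list and binary search. This is an illustration only: the `read_range` and `read_recent` helpers and the sample data are hypothetical, and real time-series databases use far more elaborate on-disk structures.

```python
from bisect import bisect_left, bisect_right

# Hypothetical series: timestamps (Unix seconds) kept sorted,
# with the matching values stored in a parallel list.
timestamps = [1000, 1060, 1120, 1180, 1240, 1300]
values = [20.1, 20.4, 20.9, 21.3, 21.0, 20.7]


def read_range(start, end):
    """Return (timestamp, value) pairs with start <= timestamp <= end."""
    lo = bisect_left(timestamps, start)
    hi = bisect_right(timestamps, end)
    return list(zip(timestamps[lo:hi], values[lo:hi]))


def read_recent(n):
    """Return the n most recent points, oldest first."""
    if n <= 0:
        return []
    return list(zip(timestamps[-n:], values[-n:]))
```

Because writes arrive in time order, the list stays sorted without extra work, and both reads touch only the slice they need.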
Time series data is of two types:
- Measurements that are gathered at regular time intervals (metrics). E.g. Health Monitoring
- Measurements that are gathered at irregular time intervals (events). E.g. Logs and Traces
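The difference between the two types comes down to the spacing between consecutive timestamps. A minimal sketch of that check follows; the `is_regular` function, its `tolerance` parameter, and the sample timestamps are hypothetical.

```python
def is_regular(timestamps, tolerance=0):
    """Return True if consecutive timestamps are evenly spaced.

    `tolerance` is the allowed deviation (in the same unit as the
    timestamps) between any gap and the first gap.
    """
    if len(timestamps) < 3:
        return True  # fewer than two gaps: nothing to compare
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return all(abs(g - gaps[0]) <= tolerance for g in gaps)


# Hypothetical data: a sensor sampled every 60 seconds (a metric) ...
metric_times = [0, 60, 120, 180, 240]
# ... and log entries that arrive whenever something happens (events).
event_times = [3, 4, 97, 260, 261]
```

Here `is_regular(metric_times)` is true and `is_regular(event_times)` is false. A small `tolerance` allows for the clock jitter that real collectors usually show.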
Why a Time-Series Database?
Time-series databases have grown in popularity in recent years because of a sudden explosion in the amount of data being created from multiple sources.
Time-series applications such as IT infrastructure monitoring, application, microservice, and container monitoring, IoT analytics, and financial data analysis are growing rapidly.
In recent years, the time-series database has been the fastest-growing type of database in the enterprise, largely for two reasons: usability and scalability.
Usability: Built-in functions and features for analyzing trends are readily available at the data layer.
Scale: Time-series data accumulates very quickly, and general-purpose databases are not designed to handle that scale. Time-series databases offer performance improvements, including higher ingest rates and faster queries at scale.
What is a Time-Series Database?
Time series databases contain tooling that makes it possible to aggregate data into predetermined time periods.
A time-series database can also discard certain data streams as required and optimizes storage using various compression algorithms.
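To see why time series compress well, consider delta encoding, one simple technique in this family. The sketch below is an illustration under that assumption and is not the algorithm of any particular database; the function names and the sample timestamps are hypothetical.

```python
def delta_encode(timestamps):
    """Store the first timestamp, then only the difference to the previous one."""
    if not timestamps:
        return []
    deltas = [timestamps[0]]
    for prev, cur in zip(timestamps, timestamps[1:]):
        deltas.append(cur - prev)
    return deltas


def delta_decode(deltas):
    """Rebuild the original timestamps by accumulating the differences."""
    timestamps = []
    total = 0
    for d in deltas:
        total += d
        timestamps.append(total)
    return timestamps


# Hypothetical readings taken every 10 seconds.
raw = [1700000000, 1700000010, 1700000020, 1700000030]
encoded = delta_encode(raw)
```

After encoding, every value except the first is a small number (here, 10), which needs far fewer bits than a full timestamp. Regularly sampled data benefits the most, because the deltas are nearly constant.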
Time-series databases typically opt for horizontal scaling. Each data point is associated primarily with a timestamp and a separate primary key.
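The aggregation into predetermined time periods described in this section can be sketched as follows, with each point identified by a series key plus a timestamp. The `downsample` function, the host names, and the readings are hypothetical, and averaging is only one of several possible aggregates.

```python
from collections import defaultdict


def downsample(points, bucket_seconds):
    """Average values per (series_key, bucket start).

    `points` is an iterable of (series_key, unix_timestamp, value) tuples.
    """
    sums = defaultdict(lambda: [0.0, 0])  # (key, bucket) -> [running sum, count]
    for key, ts, value in points:
        bucket = ts - ts % bucket_seconds  # start of the window containing ts
        acc = sums[(key, bucket)]
        acc[0] += value
        acc[1] += 1
    return {k: total / count for k, (total, count) in sorted(sums.items())}


# Hypothetical CPU readings from two hosts.
points = [
    ("host-a", 0, 10.0), ("host-a", 30, 20.0), ("host-a", 60, 40.0),
    ("host-b", 15, 50.0), ("host-b", 75, 70.0),
]
per_minute = downsample(points, 60)
```

Each (key, bucket) pair is computed independently of the others, which is one reason this kind of data lends itself to being spread across machines.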
Let us look at some of the most popular time-series databases:
InfluxDB is an open-source time-series database and part of a comprehensive platform that offers a highly scalable data ingestion and storage engine. The platform is designed for collecting, storing, querying, visualizing, and acting on data streams in real time.
It is written in Go and is built for fast, high-availability storage and retrieval of time series data in fields such as operations monitoring, application metrics, Internet of Things sensor data, and real-time analytics.
It is also reported to outperform Elasticsearch in both query time and disk usage.
The relational and columnar database kdb+ is well known for exceptionally fast analytics on large-scale datasets in motion and at rest.
kdb+ is distinctive as an in-memory time-series database: data can be ingested and made immediately available for queries.
This makes it well suited to ingesting, storing, processing, and analyzing time-series data, including IoT sensor data used in manufacturing and financial market data.
kdb+ is fast for the following reasons:
- It has a built-in query and programming language
- It is a vector-oriented database
- It has a small footprint of about 800 KB
- It is optimized for data storage
TimescaleDB is an open-source relational database built on top of PostgreSQL that makes SQL scalable for time-series data.
It supports the full range of SQL functionality, including time-based aggregates, joins, sub-queries, window functions, and secondary indexes.
TimescaleDB is reported to run queries 10–100x faster than PostgreSQL and MongoDB, to scale horizontally to petabytes, and to write millions of data points per second.
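To make the SQL features mentioned above more concrete, the sketch below combines a join with a time-based aggregate in plain SQL. It uses Python's built-in `sqlite3` module as a stand-in so that it runs without a server; it is not TimescaleDB, and the `devices` and `readings` tables and their contents are hypothetical. TimescaleDB provides its own helpers such as `time_bucket`, which this sketch does not use.

```python
import sqlite3

# In-memory SQLite database used purely as a stand-in for a SQL engine.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE devices (id INTEGER PRIMARY KEY, location TEXT);
    CREATE TABLE readings (device_id INTEGER, ts TEXT, temperature REAL);
    INSERT INTO devices VALUES (1, 'lab'), (2, 'roof');
    INSERT INTO readings VALUES
        (1, '2023-01-01 10:05:00', 20.0),
        (1, '2023-01-01 10:35:00', 22.0),
        (1, '2023-01-01 11:10:00', 23.0),
        (2, '2023-01-01 10:20:00', 5.0);
""")

# Join readings to devices, then average per location and per hour.
rows = conn.execute("""
    SELECT d.location,
           strftime('%Y-%m-%d %H:00', r.ts) AS hour,
           AVG(r.temperature) AS avg_temp
    FROM readings AS r
    JOIN devices AS d ON d.id = r.device_id
    GROUP BY d.location, hour
    ORDER BY d.location, hour
""").fetchall()
conn.close()
```

The point of the sketch is that anyone who can already write a query like this needs no new query language to start working with time-series data in a SQL-based system.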
Given the growing adoption of time-series databases, it is worth considering one as part of the technology stack when designing a solution. The choice of a time-series database should be driven by business requirements. From a developer's point of view, anyone familiar with SQL can get started quickly with TimescaleDB, since it is built on top of PostgreSQL.