Exploring the Data Ingestion Layer

The data ingestion layer is responsible for consuming messages from the messaging layer and performing the required transformation to ingest them in the lambda layer such that the transformed output conforms to the expected storage or processing formats. This layer must also make sure that the messages are consumed in a consistent way, such that no message is lost and every message is processed at least once.

This layer is expected to have multiple consumers/threads for parallel consumption of messages. Every such consumer in this layer must be stateless and must have fast streaming capability. These streams must be drawn from the messaging layer and the generated output must also be streamed into the lambda layer. The data ingestion layer must ensure that the rate of message consumption is always more than or equal to the message ingestion rates, such that there is no latency to process the messages/events. A lower processing rate or latency in this layer will result in a pile-up of messages in the messaging layer and hence would compromise the near-real-time processing capability of the messages/events. This layer should also support a fast consumption approach for recovery from such pile-ups if required.

Hence there is an implicit need that this layer is always in near real time with minimum latency such that there are no messages are piled up in the messaging layer. In order to be near real time, this layer must have capability to continuously consume the messages/events and have enough resiliency for fail-over.

The message consumers here play a vital role of delivering the messages to the lambda layer for further processing. Hence the internal components of message consumers is similar to the data acquisition layer with the differentiation that the message consumers are aware of the message format from the messaging layer (source) and the format in which the messages need to be delivered to the lambda layer (destination). The message consumption may be done in micro-batches to achieve the required resource optimization and achieve better system efficiency.

Figure 07: Message consumers

The message consumers, however, may need to push the output stream for both batch as well as speed layer processing in the lambda layer.