- Data Lake for Enterprises
- Tomcy John, Pankaj Misra
- 2021-07-02 22:47:09
DStreams
DStreams (discretized streams) represent a stream of data as a discrete sequence of RDDs (Resilient Distributed Datasets). This applies both to input data streams and to the output streams produced by processing them. Spark Streaming ships with many DStream implementations as part of the framework, while various frameworks that integrate with Spark Streaming provide their own implementations of RDDs that can be used for DStreams.
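The idea that a DStream is a sequence of RDDs, and that an operation on a DStream is applied to each of those RDDs to yield an output stream, can be sketched as a toy model in plain Python. This is not the Spark API: the `ToyDStream` class is hypothetical, and ordinary lists stand in for RDDs.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ToyDStream:
    """Hypothetical stand-in for a DStream: an ordered list of batches,
    where each inner list plays the role of one RDD."""
    batches: List[List[Any]]

    def map(self, fn: Callable[[Any], Any]) -> "ToyDStream":
        # The operation is applied to every underlying batch ("RDD"),
        # producing a new (output) stream with the same number of batches.
        return ToyDStream([[fn(x) for x in batch] for batch in self.batches])

    def filter(self, pred: Callable[[Any], bool]) -> "ToyDStream":
        # Same idea: filtering is done batch by batch.
        return ToyDStream([[x for x in batch if pred(x)] for batch in self.batches])


# Input stream: three batches of text lines (the last batch is empty).
input_stream = ToyDStream([["a b", "c"], ["d e f"], []])

# Output stream: number of words per line, still split into the same batches.
word_counts = input_stream.map(lambda line: len(line.split()))
print(word_counts.batches)  # [[2, 1], [3], []]
```

Keeping the batch boundaries intact in the output mirrors how a transformed DStream is itself a sequence of RDDs, one per input batch.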
The data in each DStream is divided into micro-batches, which are then submitted to the core Spark engine for processing:
Figure 09: Spark streaming streams
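The micro-batching step can likewise be illustrated with a small, self-contained Python sketch. It assumes that each record carries a timestamp and that batches are fixed-length intervals starting at time 0; the function name `to_micro_batches` is hypothetical. In real Spark Streaming, the batch interval is configured on the `StreamingContext` and the grouping is done by the framework.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def to_micro_batches(events: List[Tuple[float, str]], batch_interval: float) -> List[List[str]]:
    """Hypothetical helper: group (timestamp, record) pairs into consecutive
    fixed-length intervals. Each inner list is one micro-batch (one "RDD").
    Assumes non-negative timestamps; intervals with no data yield an empty batch."""
    if batch_interval <= 0:
        raise ValueError("batch_interval must be positive")
    if not events:
        return []
    buckets: Dict[int, List[str]] = defaultdict(list)
    for ts, record in events:
        # Interval index: 0 for [0, interval), 1 for [interval, 2*interval), ...
        buckets[int(ts // batch_interval)].append(record)
    last = max(buckets)
    # .get() does not insert into the defaultdict; missing intervals become [].
    return [buckets.get(i, []) for i in range(last + 1)]


events = [(0.2, "a"), (0.9, "b"), (1.5, "c"), (3.1, "d")]
batches = to_micro_batches(events, batch_interval=1.0)
print(batches)  # [['a', 'b'], ['c'], [], ['d']]

# Each micro-batch is then handed to the processing step in order.
counts = [len(batch) for batch in batches]
print(counts)  # [2, 1, 0, 1]
```

Emitting an empty batch for an interval with no data reflects the fact that a batch is produced per interval, regardless of how much data arrived.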