
Java 8 Streams

August 10, 2013

Streams were introduced in Java 8 as a way to operate on sequences of values without modifying the underlying data storage. For example, when you filter a data set, you create a new stream rather than removing any data from the source. Streams are functional in style: each intermediate operation produces another stream, until a terminal operation is called to produce a result. All intermediate operations are lazily evaluated, which allows the library to optimize the pipeline (for example, findFirst can stop pulling elements as soon as it has a match). Every java.util.Collection implementation now supports streaming through the new stream() and parallelStream() default methods added to the java.util.Collection interface. Beyond the collections, other JDK classes also act as stream sources, including java.io.BufferedReader (lines()), java.lang.CharSequence (chars()), and java.nio.file.Files (lines(), list(), walk()).
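As a rough illustration of the points above, here is a minimal sketch showing that filtering leaves the source list untouched, that intermediate operations only run when a terminal operation pulls elements through, and that BufferedReader and CharSequence can serve as stream sources. The class and method names (StreamSources, longWords, predicateCalls, lineCount, vowelCount) are hypothetical, chosen only for this example.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper class illustrating the points above.
class StreamSources {

    // filter() builds a new stream; the source list is never modified.
    static List<String> longWords(List<String> words) {
        return words.stream()
                .filter(w -> w.length() > 4)   // intermediate (lazy)
                .collect(Collectors.toList()); // terminal: triggers evaluation
    }

    // Counts how many elements the predicate inspects before findFirst()
    // short-circuits. The mutable counter is for demonstration only and is
    // safe here solely because the stream is sequential.
    static int predicateCalls(List<String> words) {
        int[] calls = {0};
        words.stream()
                .filter(w -> { calls[0]++; return w.startsWith("s"); })
                .findFirst();
        return calls[0];
    }

    // BufferedReader.lines() as a stream source. A StringReader holds no
    // OS resources, so the reader is not closed here.
    static long lineCount(String text) {
        return new BufferedReader(new StringReader(text)).lines().count();
    }

    // CharSequence.chars() streams the characters as an IntStream.
    static long vowelCount(CharSequence text) {
        return text.chars().filter(c -> "aeiou".indexOf(c) >= 0).count();
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(
                Arrays.asList("java", "streams", "are", "lazy", "functional"));
        System.out.println(longWords(words));       // [streams, functional]
        System.out.println(words.size());           // 5: source unchanged
        System.out.println(predicateCalls(words));  // 2
        System.out.println(lineCount("a\nb\nc"));   // 3
        System.out.println(vowelCount("streams"));  // 2
    }
}
```

The predicateCalls result of 2 reflects laziness: the predicate sees "java" and "streams", and once findFirst has its match, the remaining elements are never examined.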

Streams can be used to build pipelines of operations, each producing a new stream, until a terminal operation such as forEach or findFirst is invoked. A pipeline of operations lends itself to parallel processing, but it works just as well sequentially: the functions used to build the pipeline are the same whether the stream is parallel or sequential. Note: for a parallel stream to give the same results as its sequential counterpart, the lambdas passed to operations like map and filter must be stateless, that is, they must not read or write mutable state that changes while the pipeline runs. Some intermediate operations are themselves stateful, for example distinct and sorted: they need to remember elements they have already seen, and sorted must process the entire stream before it can emit its first element, which limits how much of the work can run in parallel.
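A minimal sketch of that idea: the same pipeline method is fed a sequential stream and a parallel stream and produces the same list, because filter and map here use stateless lambdas. The class PipelineDemo and its methods are hypothetical names for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical example class.
class PipelineDemo {

    // One pipeline definition; the caller decides sequential vs parallel.
    static List<Integer> squaresOfEvens(Stream<Integer> source) {
        return source
                .filter(n -> n % 2 == 0)       // stateless: one element at a time
                .map(n -> n * n)               // stateless
                .collect(Collectors.toList()); // terminal operation
    }

    // distinct() and sorted() are stateful intermediate operations:
    // sorted() has to see every element before it can emit the first one.
    static List<Integer> distinctSorted(Stream<Integer> source) {
        return source.distinct().sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> numbers = Arrays.asList(5, 2, 8, 2, 7, 4);
        System.out.println(squaresOfEvens(numbers.stream()));         // [4, 64, 4, 16]
        System.out.println(squaresOfEvens(numbers.parallelStream())); // [4, 64, 4, 16]
        System.out.println(distinctSorted(numbers.parallelStream())); // [2, 4, 5, 7, 8]
    }
}
```

Because the source list is ordered, collect(Collectors.toList()) keeps encounter order even in the parallel case, so both calls return the same list.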

Streams don’t inherently define an order; instead they carry over the encounter order of the underlying data source. For example, a stream created from an ArrayList delivers elements in the order they appear in the list, whereas a HashMap has no defined traversal order, so neither does a stream over its keySet() or entrySet(). A stream can impose an order by calling sorted() or drop one by calling unordered(). In some cases, removing the ordering constraint can improve parallel performance, because operations such as distinct() and limit() no longer have to preserve encounter order.
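The ordering rules can be sketched as follows, assuming a simple list and a small HashMap as sources. OrderingDemo and its methods are hypothetical names; the example avoids printing a HashMap's raw key order, since that order is unspecified.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical example class.
class OrderingDemo {

    // An ArrayList (or Arrays.asList) has an encounter order; the stream keeps it.
    static List<String> fromList(List<String> list) {
        return list.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // A HashMap's keys have no defined order, so sorted() is used to impose one.
    static List<String> sortedKeys(Map<String, Integer> map) {
        return map.keySet().stream().sorted().collect(Collectors.toList());
    }

    // unordered() relaxes the encounter-order constraint, which may let a
    // parallel distinct() do less coordination. The count is unaffected.
    static long distinctCountUnordered(List<Integer> list) {
        return list.parallelStream().unordered().distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(fromList(Arrays.asList("c", "a", "b"))); // [C, A, B]

        Map<String, Integer> prices = new HashMap<>();
        prices.put("pear", 1);
        prices.put("apple", 2);
        prices.put("fig", 3);
        System.out.println(sortedKeys(prices)); // [apple, fig, pear]

        System.out.println(distinctCountUnordered(Arrays.asList(3, 1, 3, 2, 1))); // 3
    }
}
```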

Parallel streams can safely be used on non-thread-safe data sources such as ArrayList, as long as there is no interference, that is, the source is not modified while the stream is being processed. The safest way to guarantee non-interference is to never modify the data structure during processing. Leaving the underlying data untouched and only transforming it with functions is exactly the functional style that enables parallelism, so stateless operations should be strongly favored over stateful ones.
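A minimal sketch of non-interference, assuming an ordinary (non-thread-safe) ArrayList as the source: the parallel pipeline only reads the list, and results are gathered with collect() instead of being added to a shared list from inside a lambda. NonInterferenceDemo and doubled are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical example class.
class NonInterferenceDemo {

    // Safe: the source is only read, and collect() builds the result,
    // merging per-thread partial results internally.
    static List<Integer> doubled(List<Integer> source) {
        return source.parallelStream()
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    // Avoid (shown only as a comment): mutating a shared ArrayList from a
    // parallel forEach is a data race and can lose elements or throw.
    //   List<Integer> out = new ArrayList<>();
    //   source.parallelStream().forEach(n -> out.add(n * 2));

    public static void main(String[] args) {
        List<Integer> source = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            source.add(i);
        }
        List<Integer> result = doubled(source);
        System.out.println(result.size());   // 1000
        System.out.println(result.get(999)); // 1998
        System.out.println(source.size());   // 1000: source untouched
    }
}
```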

Reductions are easily parallelized if the combining operation is associative. Calculating the sum of a set of numbers, for example, parallelizes well: each processor sums its own subset, and those partial sums are then added together to produce the final answer.
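To make the associativity argument concrete, here is a small sketch: a parallel IntStream.reduce with Integer::sum, and the same "sum the subsets, then combine" idea written out by hand with two halves. ReductionDemo and its method names are hypothetical.

```java
import java.util.stream.IntStream;

// Hypothetical example class.
class ReductionDemo {

    // Addition is associative, so the runtime may split the range, sum each
    // chunk on a different thread, and then combine the partial sums.
    static int parallelSum(int n) {
        return IntStream.rangeClosed(1, n).parallel().reduce(0, Integer::sum);
    }

    // The same idea done by hand: sum two halves independently, then combine.
    static int sumOfHalves(int n) {
        int mid = n / 2;
        int left = IntStream.rangeClosed(1, mid).sum();
        int right = IntStream.rangeClosed(mid + 1, n).sum();
        return left + right;
    }

    // Subtraction is NOT associative: (a - b) - c != a - (b - c), so a parallel
    // reduce(0, (a, b) -> a - b) may not match the sequential result.

    public static void main(String[] args) {
        System.out.println(parallelSum(100)); // 5050
        System.out.println(sumOfHalves(100)); // 5050
    }
}
```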
