The Java 8 Streams API offers functional-style operations, and a simple way to execute such operations in parallel. To jump right into an example, the following two test methods show the difference between sequential and parallel execution of the the println() method over each of the elements of the stream.
As is expected, the default sequential execution will print the elements of the stream, which in this example are integers from 1 to 14, in natural order. By using the parallel() method to create a parallel stream and call the same print method, the only noticeable difference is that they are printed out of order.
To see how the parallel stream behaves, the last test applies a different method to each of the elements of the stream. In this example, the time spent by the work() method on each element is linearly proportional to its value. That is, for the element of value 1 it spends 100ms, for 2 its 200 ms and so on. The “work” it does, is simply to sleep for intervals of 100 ms, and the rest of the code is dedicated to printing and formatting the table below. However, it serves the purpose of demonstrating how parallel execution behaves, and how it relates to the underlying CPU(s).
In the result table, each element of the stream is represented by a row. Each row shows the value of the element, the time-slots the work() method was executing, and to the right the time in milliseconds when work started and finished. Each 100 ms slot is represented by a hash. As can be seen, the first row was element 9, thus it marked off nine slots, and the starting time was at 6 ms, and finish at 909 ms.
Furthermore, since this was run on a machine with four CPU cores, the stream will execute four calls in parallel. This can be seen by both the hashes and the start times of the first four rows. Next, when element 2 (fourth row) finishes at 207 ms, a new element is immediately started (element 3, fifth row).
In this example, the total number of 100 ms “units of work” can be found by the formula for the triangular number where n = 14, or 14 * (14 + 1) / 2 = 105. Meaning that, sequential execution would have taken 10.5 seconds, while four parallel CPUs managed in 3 seconds.
In the second table below, the same code is executed on a dual core CPU, and it is clear that now only two methods execute in parallel. That will of course lead to a longer overall runtime, of about 5.4 seconds for this example. This could lead to a discussion on task and scheduling optimisation, however it goes beyond this article, and what is possible with the simple parallel Stream construct.