Files list parallel stream

I was really confused by the output from the files list stream

16:25:29.032 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.031 [main] INFO StressTest - compute from parallel file with collection
16:25:29.032 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.038 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel file with collection
16:25:29.032 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel file with collection
16:25:29.039 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel file with collection
16:25:29.037 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.037 [main] INFO StressTest - compute from parallel file with collection
16:25:29.039 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel file with collection
16:25:29.039 [main] INFO StressTest - parallel compute for file names with collect
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.042 [main] INFO StressTest - compute from parallel string with collection
16:25:29.043 [main] INFO StressTest - compute from parallel string with collection
16:25:29.043 [main] INFO StressTest - compute from parallel string with collection
16:25:29.043 [main] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-3] INFO StressTest - compute from parallel string with collection
16:25:29.042 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-1] INFO StressTest - compute from parallel string with collection
16:25:29.043 [main] INFO StressTest - compute from parallel string with collection
16:25:29.043 [main] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [ForkJoinPool.commonPool-worker-2] INFO StressTest - compute from parallel string with collection
16:25:29.043 [main] INFO StressTest - parallel compute for file names
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.045 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.048 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.049 [main] INFO StressTest - compute from parallel string
16:25:29.050 [main] INFO StressTest - compute from parallel string
16:25:29.050 [main] INFO StressTest - compute from parallel string
16:25:29.050 [main] INFO StressTest - compute from parallel string
16:25:29.050 [main] INFO StressTest - parallel compute for files
16:25:29.051 [main] INFO StressTest - compute from parallel file
16:25:29.051 [main] INFO StressTest - compute from parallel file
16:25:29.051 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.052 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.053 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file
16:25:29.054 [main] INFO StressTest - compute from parallel file

Process finished with exit code 0

corresponding to code

        try {
            Files.list(Paths.get("...."))
                    .parallel()
                    .filter(Files::isRegularFile)
                    .map(Path::toFile)
                    .filter(file -> file.getName().startsWith(Constants.AB_MODEL))
                    .collect(Collectors.toList())
                    .parallelStream()
                    .forEach(s -> {
                        log.info("compute from parallel file with collection");
                    });
        } catch (IOException e) {
            e.printStackTrace();
        }

        log.info("parallel compute for file names with collect");
        try {
            Files.list(Paths.get("...."))
                    .parallel()
                    .filter(Files::isRegularFile)
                    .map(Path::toFile)
                    .filter(file -> file.getName().startsWith(Constants.AB_MODEL))
                    .map(file -> file.getName())
                    .collect(Collectors.toList())
                    .parallelStream()
                    .forEach(s -> {
                        log.info("compute from parallel string with collection");
                    });
        } catch (IOException e) {
            e.printStackTrace();
        }

        log.info("parallel compute for file names");
        try {
            Files.list(Paths.get("...."))
                    .parallel()
                    .filter(Files::isRegularFile)
                    .map(Path::toFile)
                    .filter(file -> file.getName().startsWith(Constants.AB_MODEL))
                    .map(file -> file.getName())
                    .forEach(s -> {
                        log.info("compute from parallel string");
                    });
        } catch (IOException e) {
            e.printStackTrace();
        }

        log.info("parallel compute for files");
        try {
            Files.list(Paths.get("...."))
                    .parallel()
                    .filter(Files::isRegularFile)
                    .map(Path::toFile)
                    .filter(file -> file.getName().startsWith(Constants.AB_MODEL))
                    .forEach(s -> {
                        log.info("compute from parallel file");
                    });
        } catch (IOException e) {
            e.printStackTrace();
        }

so the parallel from Files.list is resulting in a single thread to process all files (~50 files).

Unless there is a collect to do a parallel stream again, then it will split into the common pool.

After a lot of investigate and research, turns out JCP has a really not crafted implementations on parallel:

source: http://mail.openjdk.java.net/pipermail/core-libs-dev/2015-July/034539.html

so basically when it’s doing splitIterator, it was using Long.MAX_VALUE:

                     IteratorSpliterator (est. MAX_VALUE elements)
                           |                    |
ArraySpliterator (est. 1024 elements)   IteratorSpliterator (est. MAX_VALUE elements)
                                           |        |
                           /---------------/        |
                           |                        |
ArraySpliterator (est. 2048 elements)   IteratorSpliterator (est. MAX_VALUE elements)
                                           |        |
                           /---------------/        |
                           |                        |
ArraySpliterator (est. 3072 elements)   IteratorSpliterator (est. MAX_VALUE elements)
                                           |        |
                           /---------------/        |
                           |                        |
ArraySpliterator (est. 856 elements)    IteratorSpliterator (est. MAX_VALUE elements)
                                                    |
                                        (split returns null: refuses to split anymore)

source: https://stackoverflow.com/questions/34341656/why-is-files-list-parallel-stream-performing-so-much-slower-than-using-collect

code: https://github.com/openjdk/jdk/blob/d234388042e00f6933b4d723614ef42ec41e0c25/src/java.base/share/classes/java/util/Spliterators.java#L1784

            do { a[j] = i.next(); } while (++j < n && i.hasNext());

One thought on “Files list parallel stream

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s