Word count

Here’s another neat streams snippet, which in three lines handles what would take at least two loops in pre-Java 8 code. The task at hand is to count the number of occurrences of each unique word in a file. Given the file content shown below, the following Bash pipeline solves it:

cat words | tr ' ' '\n' | sort | uniq -c

File content:

one two three four
two three four
three four four

Shell output:

  4 four
  1 one
  3 three
  2 two

The Java code follows a flow similar to the pipes above: read the file, split each line on spaces, and flatten the tokens from all lines into a single stream. Finally, collect() is used with the groupingBy() collector, which groups the stream by the token itself (the word, via Function.identity()) and maps each word to its count.

    Map<String, Long> map = Files.lines(path)
        .flatMap(line -> Stream.of(line.split(" ")))
        .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
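For comparison, here is a rough sketch of the loop-based version the introduction alludes to, restricted to Java 7 features. The class name LoopWordCount and the method count() are made up for this illustration, and the sample lines are inlined rather than read from the file (in Java 7, Files.readAllLines() could supply them).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, not part of the original listing.
class LoopWordCount {

  // Two nested loops: the outer over lines, the inner over the words of a line.
  static Map<String, Long> count(List<String> lines) {
    Map<String, Long> counts = new HashMap<>();
    for (String line : lines) {
      for (String word : line.split(" ")) {
        Long seen = counts.get(word);
        // First occurrence starts at 1, otherwise increment the running count.
        counts.put(word, seen == null ? 1L : seen + 1);
      }
    }
    return counts;
  }

  public static void main(String[] args) {
    // Same content as the sample file, inlined for the sketch.
    List<String> lines = Arrays.asList(
        "one two three four",
        "two three four",
        "three four four");
    System.out.println(count(lines));
  }
}
```

Both versions build the same word-to-count map; the stream version simply folds the two loops into flatMap() and the bookkeeping into groupingBy().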

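The Java map will contain the following key-value pairs. Here, the words happen to come out sorted alphabetically. However, that order is not guaranteed: groupingBy() makes no promise about the type of Map it returns (in current implementations it is a HashMap), and a HashMap has no defined iteration order.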

  {four=4, one=1, three=3, two=2}
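If a sorted result is actually wanted, one option is the three-argument overload of groupingBy(), which takes a map factory. The sketch below passes TreeMap::new so the keys are kept in alphabetical order. The class name SortedWordCount and the method count() are invented for this example, and the lines are inlined instead of read from the file.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class, not part of the original listing.
class SortedWordCount {

  // Same pipeline as above, but the counts are collected into a TreeMap,
  // which iterates its keys in natural (alphabetical) order.
  static Map<String, Long> count(Stream<String> lines) {
    return lines
        .flatMap(line -> Stream.of(line.split(" ")))
        .collect(Collectors.groupingBy(
            Function.identity(), TreeMap::new, Collectors.counting()));
  }

  public static void main(String[] args) {
    Map<String, Long> map = count(Stream.of(
        "one two three four",
        "two three four",
        "three four four"));
    System.out.println(map); // prints {four=4, one=1, three=3, two=2}
  }
}
```

With the map factory in place, the alphabetical order no longer depends on how a HashMap happens to lay out its buckets.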

The full listing:

WordCount.java


/* Copyright rememberjava.com. Licensed under GPL 3. See http://rememberjava.com/license */
package com.rememberjava.lambda;

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

import org.junit.Test;

public class WordCount {

  // TODO: Fix path
  // @Test
  public void disabled_countWords() throws IOException {
    Path path = Paths.get("com/rememberjava/lambda/words");

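    // Files.lines() keeps the file open, so close the stream when done.
    try (Stream<String> lines = Files.lines(path)) {
      Map<String, Long> map = lines
          .flatMap(line -> Stream.of(line.split(" ")))
          .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));

      System.out.println(map);
    }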
  }

  @Test
  public void test_dummy() {
  }
}