Wednesday, November 26, 2014

java - Producing a sorted wordcount with Spark - Code Review Stack Exchange



My method using Java 8

As an addendum, I'll show how I would frame the problem in question and how I would solve it.

Input: An input file, consisting of words. Output: A list of the words, sorted by descending frequency of occurrence.

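```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Inside a method that declares "throws IOException" (thrown by Files.readAllLines).
// Count how often each blank-separated word occurs.
Map<String, Long> occurenceMap = Files.readAllLines(Paths.get("myFile.txt"))
        .stream()
        .flatMap(line -> Arrays.stream(line.split(" ")))
        .collect(Collectors.groupingBy(i -> i, Collectors.counting()));

// Sort the words from most to least frequent, keeping only the words.
List<String> sortedWords = occurenceMap.entrySet()
        .stream()
        .sorted(Comparator.comparing((Map.Entry<String, Long> entry) -> entry.getValue()).reversed())
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
```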

This performs the following steps:

  1. Read all lines into a List<String> (be careful with large files, since the whole file is held in memory!)
  2. Turn it into a Stream<String>.
  3. Flat-map each line to a Stream<String> of its words by splitting on single blanks, producing one Stream<String> of all words.
  4. Collect all elements into a Map<String, Long>, grouping by identity (i -> i) and using Collectors.counting() as the downstream collector, so that each map value is that word's count.
  5. Get a Set<Map.Entry<String, Long>> from the map.
  6. Turn it into a Stream<Map.Entry<String, Long>>.
  7. Sort by the reverse order of the value of the entry.
  8. Map each entry to its key, giving a Stream<String>; the frequency information is lost here.
  9. Collect the stream into a List<String>.
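Two of the steps above have caveats: step 1 holds the whole file in memory, and step 8 throws away the counts. The following is a minimal sketch addressing both, not part of the original answer, assuming a Java 8 runtime and a UTF-8 input file. Files.lines reads the file lazily (the word-count map itself must still fit in memory), and the sorted entries are returned instead of just the keys. It also deliberately deviates from the original split(" "): that call yields empty strings for consecutive or leading blanks, so this sketch splits on runs of whitespace and drops empty tokens. The class name WordFrequencies and the method name byFrequency are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not from the original answer: streams the file lazily
// and keeps each word's count next to the word.
public class WordFrequencies {

    public static List<Map.Entry<String, Long>> byFrequency(Path file) throws IOException {
        // Files.lines reads lazily; try-with-resources closes the file afterwards.
        try (Stream<String> lines = Files.lines(file, StandardCharsets.UTF_8)) {
            Map<String, Long> counts = lines
                    .flatMap(line -> Arrays.stream(line.split("\\s+"))) // split on runs of whitespace
                    .filter(word -> !word.isEmpty())                    // drop "" from leading blanks
                    .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
            // Same descending-by-count order as the original, but the entries are kept.
            return counts.entrySet().stream()
                    .sorted(Comparator.comparing((Map.Entry<String, Long> entry) -> entry.getValue()).reversed())
                    .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file so the example is self-contained.
        Path tmp = Files.createTempFile("words", ".txt");
        Files.write(tmp, Arrays.asList("a b  a", " c a b"), StandardCharsets.UTF_8);
        for (Map.Entry<String, Long> e : byFrequency(tmp)) {
            System.out.println(e.getKey() + " " + e.getValue()); // a 3, then b 2, then c 1
        }
        Files.delete(tmp);
    }
}
```

Words with equal counts come out in an unspecified order, because the entries come from a HashMap whose iteration order is unspecified.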

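Beware that the line .sorted(Comparator.comparing((Map.Entry<String, Long> entry) -> entry.getValue()).reversed()) should really be .sorted(Comparator.comparing(Map.Entry::getValue).reversed()), but that shorter form does not compile. Because .reversed() is called on the result of Comparator.comparing(...), the compiler infers that inner call on its own, without the Map.Entry<String, Long> type that sorted expects, so it cannot determine which entry type Map.Entry::getValue should apply to.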
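For comparison, here is a sketch of two shorter forms that do compile under Java 8 inference; they are standard JDK idioms, not the original author's code, and the class name ComparatorForms is hypothetical. Map.Entry.comparingByValue accepts a value comparator, so Comparator.reverseOrder() can be passed in directly; alternatively, the explicit type witness Map.Entry.<String, Long>comparingByValue() fixes the type arguments before .reversed() is called.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical class showing two compiling alternatives to the explicit lambda comparator.
public class ComparatorForms {

    // Option 1: pass a reversed value comparator to comparingByValue.
    public static List<String> sortedWordsA(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Option 2: supply the type arguments explicitly so .reversed() has a concrete type.
    public static List<String> sortedWordsB(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("a", 3L);
        counts.put("b", 2L);
        counts.put("c", 1L);
        System.out.println(sortedWordsA(counts)); // [a, b, c]
        System.out.println(sortedWordsB(counts)); // [a, b, c]
    }
}
```

Option 1 is usually the more readable of the two, since the reversal is stated on the value comparator itself.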

I hope the Java 8 approach gives you some interesting insights.


Read full article from java - Producing a sorted wordcount with Spark - Code Review Stack Exchange
