java - Producing a sorted wordcount with Spark - Code Review Stack Exchange
My method using Java 8
As an addendum, I'll show how I would frame the problem in your question and how I would solve it with plain Java 8.
Input: A file consisting of words.
Output: A list of the words, sorted by how frequently they occur (most frequent first).
    Map<String, Long> occurenceMap = Files.readAllLines(Paths.get("myFile.txt"))
            .stream()
            .flatMap(line -> Arrays.stream(line.split(" ")))
            .collect(Collectors.groupingBy(i -> i, Collectors.counting()));

    List<String> sortedWords = occurenceMap.entrySet()
            .stream()
            .sorted(Comparator.comparing((Map.Entry<String, Long> entry) -> entry.getValue()).reversed())
            .map(Map.Entry::getKey)
            .collect(Collectors.toList());

This will do the following steps:
- Read all lines into a List<String> (be careful with large files!).
- Turn it into a Stream<String>.
- Turn that into a Stream<String> of words by flat mapping every String to a Stream<String>, splitting on blanks.
- Collect all elements into a Map<String, Long>, grouping by identity (i -> i) and using Collectors.counting() as the downstream collector, so that each map value is the word's count.
- Get a Set<Map.Entry<String, Long>> from the map.
- Turn it into a Stream<Map.Entry<String, Long>>.
- Sort by the value of the entry, in reverse order.
- Map the results to a Stream<String>; you lose the frequency information here.
- Collect the stream into a List<String>.
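Two of the steps above come with caveats: readAllLines loads the whole file into memory, and the final mapping throws the counts away. The following is only a sketch of one possible variation, not part of the original answer. It assumes Files.lines is acceptable for reading lazily, splits on any whitespace run rather than a single blank, and keeps the counts by collecting into a LinkedHashMap. The class name SortedWordCount and the method name countAndSort are made up for illustration.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class SortedWordCount {

    // Counts the words in a stream of lines and returns word -> count,
    // iterating from the most frequent word to the least frequent one.
    public static Map<String, Long> countAndSort(Stream<String> lines) {
        Map<String, Long> counts = lines
                // Assumption: words are separated by runs of whitespace.
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // A line with leading whitespace yields an empty first token.
                .filter(word -> !word.isEmpty())
                .collect(Collectors.groupingBy(word -> word, Collectors.counting()));

        return counts.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                // LinkedHashMap preserves the sorted encounter order.
                .collect(Collectors.toMap(
                        Map.Entry::getKey,
                        Map.Entry::getValue,
                        (a, b) -> a,
                        LinkedHashMap::new));
    }

    // Reads the file lazily instead of loading every line into a List first.
    public static Map<String, Long> countAndSort(Path file) throws IOException {
        try (Stream<String> lines = Files.lines(file)) {
            return countAndSort(lines);
        }
    }

    public static void main(String[] args) {
        Map<String, Long> result = countAndSort(Stream.of("a b a", "c a b"));
        System.out.println(result); // prints {a=3, b=2, c=1}
    }
}
```

The counting map still has to fit in memory, so this only helps when the file is large but the vocabulary is not. Words with equal counts come out in no particular order.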
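Beware that the line .sorted(Comparator.comparing((Map.Entry<String, Long> entry) -> entry.getValue()).reversed()) should really be .sorted(Comparator.comparing(Map.Entry::getValue).reversed()), but that version does not compile. Because .reversed() is chained onto the result, the compiler cannot use the target type expected by sorted(...) to infer the type arguments of Comparator.comparing(...), so the lambda parameter has to be typed explicitly.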
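If the explicitly typed lambda feels clumsy, there are ways to get a descending sort by value without it. The sketch below is a suggestion rather than part of the original answer. It shows two variants built on Map.Entry.comparingByValue; the class and method names are hypothetical.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class ReversedComparators {

    // Variant 1: hand the reversed value comparator to comparingByValue,
    // so no reversed() call has to be chained at all.
    public static List<String> byReverseOrder(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder()))
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    // Variant 2: supply the type arguments explicitly, which lets
    // reversed() be chained without confusing type inference.
    public static List<String> byTypeWitness(Map<String, Long> counts) {
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new HashMap<>();
        counts.put("b", 2L);
        counts.put("c", 1L);
        counts.put("a", 3L);
        System.out.println(byReverseOrder(counts)); // prints [a, b, c]
        System.out.println(byTypeWitness(counts));  // prints [a, b, c]
    }
}
```

Both variants sort in the same order as the original line. Which one reads better is largely a matter of taste.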
I hope the Java 8 way can give you interesting insights.