Saturday, November 22, 2014

Spark: Parse CSV file and group by column value | Java Code Geeks


November 22, 2014 7:42 pm

I've found myself working with large CSV files quite frequently and, realising that my existing toolset didn't let me explore them quickly, I thought I'd spend a bit of time looking at Spark to see if it could help.

The file used here is about 1.0G on disk and 4,193,441 lines long:

$ ls -alh ~/Downloads/Crimes_-_2001_to_present.csv
-rw-r--r--@ 1 markneedham staff 1.0G 16 Nov 12:14 /Users/markneedham/Downloads/Crimes_-_2001_to_present.csv

$ wc -l ~/Downloads/Crimes_-_2001_to_present.csv
4193441 /Users/markneedham/Downloads/Crimes_-_2001_to_present.csv

We can get a rough idea of the contents of the file by looking at the first row along with the header:

$ head -n 2 ~/Downloads/Crimes_-_2001_to_present.csv
ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
9464711,HX114160,01/14/2014 05:00:00 AM,028XX E 80TH ST,0560,
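To make the "group by column value" idea from the title concrete, here is a minimal sketch in plain Python rather than Spark. It only illustrates the idea on a small scale and is not the approach the full article takes. The helper name `count_by_column` and the sample rows are made up for illustration; only the "Primary Type" column name comes from the header shown above.

```python
import csv
import io
from collections import Counter


def count_by_column(lines, column):
    """Count rows per distinct value of `column`.

    `lines` is any iterable of CSV text lines whose first line is the header
    (an open file works). Hypothetical helper, not part of any library.
    """
    reader = csv.DictReader(lines)  # handles quoted fields containing commas
    counts = Counter()
    for row in reader:
        counts[row[column]] += 1
    return counts


# Made-up rows for illustration only; they are NOT taken from the real data set.
sample = io.StringIO(
    "ID,Primary Type,Description\n"
    '1,THEFT,"RETAIL, UNDER $500"\n'
    "2,BATTERY,SIMPLE\n"
    "3,THEFT,FROM BUILDING\n"
)

for value, count in count_by_column(sample, "Primary Type").most_common():
    print(value, count)
```

For the real file, the same helper could be fed an open file handle, e.g. `open(path, newline="")`, assuming the file is well-formed CSV. Because it streams one row at a time it should cope with a 1G file, but it runs on a single core, which is the sort of limitation that makes a tool like Spark worth a look.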

Read full article from Spark: Parse CSV file and group by column value | Java Code Geeks
