Spark: Parse CSV file and group by column value | Java Code Geeks
November 22, 2014 7:42 pm Spark: Parse CSV file and group by column value I've found myself working with large CSV files quite frequently and realising that my existing toolset didn't let me explore them quickly I thought I'd spend a bit of time looking at Spark to see if it could help. $ ls -alh ~/Downloads/Crimes_-_2001_to_present.csv -rw-r--r--@ 1 markneedham staff 1.0G 16 Nov 12:14 /Users/markneedham/Downloads/Crimes_-_2001_to_present.csv $ wc -l ~/Downloads/Crimes_-_2001_to_present.csv 4193441 /Users/markneedham/Downloads/Crimes_-_2001_to_present.csv We can get a rough idea of the contents of the file by looking at the first row along with the header: $ head -n 2 ~/Downloads/Crimes_-_2001_to_present.csv ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location 9464711,HX114160,01/14/2014 05:00:00 AM,028XX E 80TH ST,0560,Read full article from Spark: Parse CSV file and group by column value | Java Code Geeks
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.