Tuesday, March 13, 2018

A simple note on Julia DataFrames

Overview

In this article, I'll write down basic DataFrame manipulation in Julia.
I have just started using Julia, and I want to keep using it as a data science and machine learning tool. As a first step, I'll focus on basic DataFrame manipulation.
The Julia version used here is 0.6.2.



How to make DataFrame

With the DataFrames module, we can manipulate data in almost the same way as with Python's pandas or R's data frames.

using DataFrames, CSV

Make new DataFrame

With DataFrame(), we can make a DataFrame from arrays. Each keyword argument becomes one column.

df = DataFrame(x = [1,2,3], y = [4,5,6])
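As a small sketch of what this constructor produces, the following checks the shape of the result. It uses only the same toy data as above; `nrow` and `ncol` are helpers exported by DataFrames.

```julia
using DataFrames

# Same constructor call as above: each keyword becomes a column.
df = DataFrame(x = [1, 2, 3], y = [4, 5, 6])

# size returns (rows, columns); nrow and ncol return each count separately.
dims = size(df)
rows = nrow(df)
cols = ncol(df)

# A column is an ordinary vector, so its element type can be inspected.
x_type = eltype(df[:, 1])
```

Checking the shape right after construction is a cheap way to confirm that all input arrays had the length I expected.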

Read file

To read a csv/tsv file, there are three functions.
  • readcsv from Julia's standard library
  • readtable from DataFrames.jl
  • CSV.read from CSV.jl
Here, I'll cover only readtable.

df = readtable("iris.csv", header=true)

Write file

To write the data out to a csv file, use CSV.write from CSV.jl.

CSV.write("output.csv", df)
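To see writing and reading together, here is a minimal round-trip sketch. It is written against recent releases of CSV.jl and DataFrames.jl, not the versions used in this note; that is an assumption, as is the temporary file path. In those releases `CSV.read` takes the target type as its second argument.

```julia
using DataFrames, CSV

df = DataFrame(x = [1, 2, 3], y = [4, 5, 6])

# Write to a temporary file so the example does not touch real data.
path = tempname() * ".csv"
CSV.write(path, df)

# Assumption: a recent CSV.jl release, where the sink type (DataFrame)
# is passed as the second argument.
df2 = CSV.read(path, DataFrame)

# The data read back should match what was written.
same = df2 == df
```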

How to access

By columns

We can get subsets by specifying column names or column numbers.

df[:,[:SepalLength, :SepalWidth]]
df[:,1]
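The sketch below runs the same two kinds of column access on a tiny table. The rows are hypothetical values shaped like the iris data, and the last two lines assume DataFrames.jl 1.x, where `df.col` and `df[!, :col]` return the stored column while `df[:, col]` returns a copy.

```julia
using DataFrames

# Hypothetical rows in the shape of the iris data; the values are illustrative.
df = DataFrame(SepalLength = [5.1, 4.9, 7.0],
               SepalWidth  = [3.5, 3.0, 3.2],
               Species     = ["setosa", "setosa", "versicolor"])

# Several columns by name: the result is a new DataFrame.
sub = df[:, [:SepalLength, :SepalWidth]]

# One column by position: the result is a vector.
col1 = df[:, 1]

# Assumption: DataFrames.jl 1.x. df.SepalLength and df[!, :SepalLength]
# both return the stored column itself, while df[:, 1] made a copy.
same_column = df.SepalLength === df[!, :SepalLength]
is_copy = col1 !== df.SepalLength
```

The copy versus no-copy difference matters when the column is modified afterwards: changes to a copy do not show up in the DataFrame.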

By rows

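By specifying the row number together with a column name, we can get a single value. Writing df[1, :] instead selects the whole first row.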

df[1,:SepalLength]
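As a rough illustration of the different row selections, the sketch below uses a few hypothetical iris-like rows. It assumes DataFrames.jl 1.x, where selecting one row with all columns gives a row object whose fields can be read by name.

```julia
using DataFrames

# Hypothetical rows in the shape of the iris data; the values are illustrative.
df = DataFrame(SepalLength = [5.1, 4.9, 7.0],
               Species     = ["setosa", "setosa", "versicolor"])

# Row and column together give a single value.
value = df[1, :SepalLength]

# A range of rows with all columns gives a smaller DataFrame.
first_two = df[1:2, :]

# Assumption: DataFrames.jl 1.x. One row with all columns is a
# DataFrameRow, and its fields are accessible by column name.
row = df[1, :]
species = row.Species
```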

By conditions

In the following code, I get the subset of rows whose Species value is setosa. The comparison .== is applied element by element and produces a Boolean mask that selects the rows.

df[df[:Species] .== "setosa", :]
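The mask can be looked at on its own, which helps when a condition does not select what I expect. This sketch uses hypothetical iris-like rows and assumes DataFrames.jl 1.x, where a column is reached with `df.Species` and `filter` accepts a function of a row.

```julia
using DataFrames

# Hypothetical rows in the shape of the iris data; the values are illustrative.
df = DataFrame(SepalLength = [5.1, 4.9, 7.0],
               Species     = ["setosa", "setosa", "versicolor"])

# The broadcast comparison builds one Bool per row.
mask = df.Species .== "setosa"

# Indexing with the mask keeps only the rows where it is true.
setosa = df[mask, :]

# filter with a row predicate selects the same rows.
setosa2 = filter(row -> row.Species == "setosa", df)
```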

Sort

sort(df, cols = :PetalLength)
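For comparison, here is a sketch of sorting as I understand it in newer DataFrames.jl releases (assumed 1.x), where the column is passed as a positional argument instead of the cols keyword. The rows are hypothetical iris-like values.

```julia
using DataFrames

# Hypothetical rows in the shape of the iris data; the values are illustrative.
df = DataFrame(PetalLength = [4.7, 1.4, 1.3],
               Species     = ["versicolor", "setosa", "setosa"])

# Ascending order by PetalLength; sort returns a new DataFrame.
asc = sort(df, :PetalLength)

# Descending order with the rev keyword.
desc = sort(df, :PetalLength, rev = true)

# The original DataFrame keeps its row order.
original_order = df.PetalLength
```

Because sort returns a new DataFrame, the result has to be assigned to a variable; sort! is the variant that reorders the DataFrame in place.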