Update documentation #119

Open: wants to merge 4 commits into main
7 changes: 7 additions & 0 deletions docs/Project.toml
@@ -1,3 +1,10 @@
[deps]
Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4"
SplitApplyCombine = "03a91e81-4c3e-53e1-a0a4-9c0c8f19dd66"
TypedTables = "9d95f2ec-7b3d-5a63-8d20-e2491e220bb9"

[sources]
TypedTables = {path = ".."}

[compat]
Documenter = "1"
6 changes: 4 additions & 2 deletions docs/make.jl
@@ -1,4 +1,6 @@
using Documenter, TypedTables
using Documenter
using TypedTables
using Documenter.Remotes: GitHub

makedocs(;
modules=[TypedTables],
@@ -29,7 +31,7 @@ makedocs(;
],
"API reference" => "man/reference.md"
],
repo="https://github.com/JuliaData/TypedTables.jl/blob/{commit}{path}#L{line}",
repo=GitHub("JuliaData/TypedTables.jl"),
sitename="TypedTables.jl",
)

7 changes: 6 additions & 1 deletion docs/src/index.md
@@ -1,3 +1,8 @@
```@meta
DocTestSetup = quote
using TypedTables
end
```
# TypedTables.jl

*Simple, fast, column-based storage for data analysis in Julia.*
@@ -26,7 +31,7 @@ That's it!

Here's a table:

```julia
```jldoctest
julia> using TypedTables

julia> t = Table(a = [1, 2, 3], b = [2.0, 4.0, 6.0])
2 changes: 1 addition & 1 deletion docs/src/man/acceleratedarrays.md
@@ -8,4 +8,4 @@ The *AcceleratedArrays* package exists to provide a way of attaching secondary a

This system allows for an extensible set of acceleration indices - such as accelerated spatial lookup using a spatial search tree, or an inverted index for searching for words in text fields.

Note: by default, the `innerjoin` operation will construct a hash-based index to perform a join on two unindexed data sources, meaning most basic data operations can be achieved at reasonable speeds.
Note: by default, the `innerjoin` operation will construct a hash-based index to perform a join on two unindexed data sources, meaning most basic data operations can be achieved at reasonable speeds.
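The note above says `innerjoin` falls back to a hash-based index when neither input is indexed. The plain-Base Julia below is a sketch of that general hash-join idea on vectors of named tuples. It is an illustration under that assumption, not the *SplitApplyCombine* implementation, and `hash_innerjoin` is a hypothetical helper.

```julia
# Hypothetical helper: a minimal hash join over two collections of rows.
# `lkey` and `rkey` extract the join key from a left row and a right row.
function hash_innerjoin(lkey, rkey, left, right)
    # Build phase: index the right-hand rows by key in a hash table (Base.Dict).
    index = Dict{Any, Vector{eltype(right)}}()
    for r in right
        push!(get!(() -> eltype(right)[], index, rkey(r)), r)
    end
    # Probe phase: look up each left row's key and emit one merged row per match.
    out = NamedTuple[]
    for l in left
        for r in get(index, lkey(l), eltype(right)[])
            push!(out, merge(l, r))
        end
    end
    return out
end

people = [(id = 1, name = "Alice"), (id = 2, name = "Bob"), (id = 3, name = "Charlie")]
orders = [(id = 2, item = "book"), (id = 1, item = "pen"), (id = 2, item = "lamp")]

joined = hash_innerjoin(p -> p.id, o -> o.id, people, orders)
# joined holds (id = 1, name = "Alice", item = "pen"),
#              (id = 2, name = "Bob", item = "book"),
#              (id = 2, name = "Bob", item = "lamp")
```

Building the hash table once means each left row costs one expected constant-time lookup instead of a scan over every right row, which is why a hash index keeps joins on unindexed data reasonably fast.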
17 changes: 11 additions & 6 deletions docs/src/man/dicttable.md
@@ -1,3 +1,8 @@
```@meta
DocTestSetup = quote
using TypedTables
end
```
# DictTable

`DictTable` is similar to `Table` except that instead of being an `AbstractArray` it is
@@ -7,7 +12,7 @@ The advantage of this is that rows can be indexed by a semantically-important ke
case is that the first column of a table is a unique, primary-key column. When you construct
a `DictTable` from arrays, it will assume the first column is the primary key.

```julia
```jldoctest dicttable
julia> using TypedTables

julia> t = DictTable(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37])
@@ -21,16 +26,16 @@ DictTable with 1 column and 3 rows:

As mentioned, rows can be indexed by the value of the primary key.

```julia
```jldoctest dicttable
julia> t["Alice"]
(name = "Alice", age = 25)
```

The columns themselves are dictionaries that can also be indexed by primary key.

```julia
```jldoctest dicttable
julia> t.age
3-element Dictionaries.Dictionary{String, Int64}
3-element Dictionaries.Dictionary{String, Int64}:
"Alice" │ 25
"Bob" │ 42
"Charlie" │ 37
@@ -42,10 +47,10 @@ julia> t.age["Alice"]
With the design of *Dictionaries.jl*, these dictionaries are able to share `Indices` so that
this has very little overhead (even with many columns).

```julia
```jldoctest dicttable
julia> keys(t.age) === t.name
true
```

Note that it is not *required* that the first column is the primary key. The `DictTable`
constructor can accept arbitrary dictionaries as columns (so long as the keys agree).
constructor can accept arbitrary dictionaries as columns (so long as the keys agree).
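To see why sharing `Indices` matters, here is a rough plain-Base analogy. It is not how `DictTable` is implemented: each non-key column is stored as its own `Dict` keyed by the primary key, which gives the same keyed lookups but makes every column hash the keys again and keep a separate hash table. The helper `row` is hypothetical.

```julia
name_col = ["Alice", "Bob", "Charlie"]   # primary-key column
age_col  = [25, 42, 37]

# Each non-key column becomes its own Dict keyed by name. Every such Dict re-hashes
# the keys and stores its own table: the per-column cost that shared `Indices` avoid.
age_by_name = Dict(zip(name_col, age_col))

# Hypothetical row lookup: assemble a NamedTuple for one primary-key value.
row(key) = (name = key, age = age_by_name[key])

row("Alice")          # (name = "Alice", age = 25)
age_by_name["Bob"]    # 42
```

With *Dictionaries.jl*, by contrast, every column dictionary can point at one shared `Indices` object, which is what the `keys(t.age) === t.name` check demonstrates.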
31 changes: 18 additions & 13 deletions docs/src/man/filter.md
@@ -1,3 +1,8 @@
```@meta
DocTestSetup = quote
using TypedTables
end
```
# Finding data

Frequently, we need to find data (i.e. rows of the table) that matches certain criteria, and there are multiple mechanisms for achieving this in Julia. Here we will briefly review `map`, `findall` and `filter` as options.
@@ -6,7 +11,7 @@ Frequently, we need to find data (i.e. rows of the table) that matches certain c

Following the previous section, we can identify rows satisfying an arbitrary predicate using the `map` function. Note that "predicate" is just a name for a function that takes an input and returns either `true` or `false`.

```julia
```jldoctest finding
julia> t = Table(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37])
Table with 2 columns and 3 rows:
name age
@@ -16,15 +21,15 @@ Table with 2 columns and 3 rows:
3 │ Charlie 37

julia> is_old = map(row -> row.age > 40, t)
3-element Array{Bool,1}:
false
true
false
3-element Vector{Bool}:
0
1
0
```

Finally, we can use "logical" (i.e. Boolean) indexing to extract the rows where the predicate is `true`.

```julia
```jldoctest finding
julia> t[is_old]
Table with 2 columns and 1 row:
name age
@@ -39,9 +44,9 @@ The `map(predicate, table)` approach will allocate one `Bool` for each row in th

If we wish to locate the indices of the rows where the predicate returns `true`, we can use Julia's `findall` function.

```julia
```jldoctest finding
julia> inds = findall(row -> row.age > 40, t)
1-element Array{Int64,1}:
1-element Vector{Int64}:
2

julia> t[inds]
@@ -57,7 +62,7 @@ This method may be less resource intensive (result in less memory allocated) if

Finally, if we wish to directly `filter` the table and obtain the rows of interest, we can do that as well.

```julia
```jldoctest finding
julia> filter(row -> row.age > 40, t)
Table with 2 columns and 1 row:
name age
@@ -71,7 +76,7 @@ Internally, the `filter` method may rely on one of the implementations above.

Julia's "generator" syntax also allows for filtering operations using `if`.

```
```jldoctest finding
julia> Table(row for row in t if row.age > 40)
Table with 2 columns and 1 row:
name age
@@ -88,9 +93,9 @@ As mentioned in other sections, it is frequently worthwhile to preselect the col

One simple example of such a transformation is to first project to the column(s) of interest, followed by using `map` or `findall` to identify the indices of the rows where `predicate` is `true`, and finally to use `getindex` or `view` to obtain the result of the full table.

```julia
```jldoctest finding
julia> inds = findall(age -> age > 40, t.age)
1-element Array{Int64,1}:
1-element Vector{Int64}:
2

julia> t[inds]
@@ -100,4 +105,4 @@ Table with 2 columns and 1 row:
1 │ Bob 42
```

Easy, peasy!
Easy, peasy!
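The approaches compared in this page rely only on generic `AbstractArray` functions, so their behaviour can be sketched with a plain `Vector` of named tuples standing in for a `Table`. That substitution is an assumption made so the sketch needs nothing beyond Base; a `Table` is itself an `AbstractArray` whose elements are named-tuple rows.

```julia
# A plain Vector of NamedTuples stands in for
# `Table(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37])`.
rows = [(name = "Alice", age = 25), (name = "Bob", age = 42), (name = "Charlie", age = 37)]
is_old(row) = row.age > 40                     # the predicate

mask = map(is_old, rows)                       # one Bool per row
inds = findall(is_old, rows)                   # only the indices where the predicate holds
kept = filter(is_old, rows)                    # the matching rows themselves
gen  = [row for row in rows if is_old(row)]    # generator syntax with `if`

# Projecting to the `age` column first, then searching it, finds the same indices.
age_inds = findall(>(40), map(row -> row.age, rows))

rows[mask] == rows[inds] == kept == gen        # true: every route selects Bob's row
```

The trade-off mirrors the prose: `map` allocates a `Bool` for every row, `findall` stores only the matching indices, and `filter` or the generator copies the matching rows directly.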
9 changes: 7 additions & 2 deletions docs/src/man/flextable.md
@@ -1,3 +1,8 @@
```@meta
DocTestSetup = quote
using TypedTables
end
```
# FlexTable

This package defines a second tabular container type, `FlexTable`, that is designed to be a more **flex**ible **table**.
@@ -15,7 +20,7 @@ Amongst other things, using `FlexTable` might allow you to more easily port your

A column can be added by using the `.` operator (also known as `setproperty!`).

```julia
```jldoctest flextable
julia> ft = FlexTable(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37])
FlexTable with 2 columns and 3 rows:
name age
@@ -37,7 +42,7 @@ FlexTable with 3 columns and 3 rows:

The same syntax is used to replace a column.

```julia
```jldoctest flextable
julia> ft.sex = ["female", "male", "male"];

julia> ft
85 changes: 46 additions & 39 deletions docs/src/man/group.md
@@ -1,3 +1,8 @@
```@meta
DocTestSetup = quote
using TypedTables
end
```
# Grouping data

It is frequently useful to break data apart into different *groups* for processing, a paradigm frequently referred to as the split-apply-combine methodology.
@@ -10,7 +15,7 @@ In a powerful environment such as Julia, that fully supports nested containers,

To demonstrate the power of grouping, this time we'll add some more rows and columns to our example data.

```julia
```jldoctest grouping
julia> t = Table(firstname = ["Alice", "Bob", "Charlie", "Adam", "Eve", "Cindy", "Arthur"], lastname = ["Smith", "Smith", "Smith", "Williams", "Williams", "Brown", "King"], age = [25, 42, 37, 65, 18, 33, 54])
Table with 3 columns and 7 rows:
firstname lastname age
@@ -25,40 +30,42 @@ Table with 3 columns and 7 rows:
```

Let's get familiar with the *basic* usage of `group` on standard (non-tabular) arrays. For example, let's group people's first names by their first letter.
```julia
```jldoctest grouping
julia> using SplitApplyCombine

julia> group(first, t.firstname)
Dict{Char,Array{String,1}} with 4 entries:
'C' => ["Charlie", "Cindy"]
'A' => ["Alice", "Adam", "Arthur"]
'E' => ["Eve"]
'B' => ["Bob"]
4-element Dictionaries.Dictionary{Char, Vector{String}}:
'A' │ ["Alice", "Adam", "Arthur"]
'B' │ ["Bob"]
'C' │ ["Charlie", "Cindy"]
'E' │ ["Eve"]
```
The groups are returned as a dictionary (a `Dictionaries.Dictionary`, as the output shows) whose keys are the first characters of people's first names. Its values are arrays listing the matching first names.

Next, we may want to group up data coming from a table (not just a single column). For example, we may want to group firstnames by lastname.

```julia
```jldoctest grouping
julia> group(getproperty(:lastname), getproperty(:firstname), t)
Dict{String,Array{String,1}} with 4 entries:
"King" => ["Arthur"]
"Williams" => ["Adam", "Eve"]
"Brown" => ["Cindy"]
"Smith" => ["Alice", "Bob", "Charlie"]
4-element Dictionaries.Dictionary{String, Vector{String}}:
"Smith" │ ["Alice", "Bob", "Charlie"]
"Williams" │ ["Adam", "Eve"]
"Brown" │ ["Cindy"]
"King" │ ["Arthur"]
```
Note that the returned structure is still not a `Table` at all: it is a dictionary (a `Dictionaries.Dictionary`) with the unique `lastname` values as keys, mapping to (non-tabular) arrays of `firstname`.

If instead the grouped elements are rows, each group will be a table. For example, we can just drop the `getproperty(:firstname)` projection to get:

```julia
```jldoctest grouping
julia> groups = group(getproperty(:lastname), t)
Groups{String,Any,Table{NamedTuple{(:firstname, :lastname, :age),Tuple{String,String,Int64}},1,NamedTuple{(:firstname, :lastname, :age),Tuple{Array{String,1},Array{String,1},Array{Int64,1}}}},Dict{String,Array{Int64,1}}} with 4 entries:
"King" => Table with 3 columns and 1 row:
"Williams" => Table with 3 columns and 2 rows:
"Brown" => Table with 3 columns and 1 row:
"Smith" => Table with 3 columns and 3 rows:
4-element Dictionaries.Dictionary{String, Table{@NamedTuple{firstname::String, lastname::String, age::Int64}, 1, @NamedTuple{firstname::Vector{String}, lastname::Vector{String}, age::Vector{Int64}}}}:
"Smith" │ @NamedTuple{firstname::String, lastname::String, age::Int64}[(fir
"Williams" │ @NamedTuple{firstname::String, lastname::String, age::Int64}[(fir
"Brown" │ @NamedTuple{firstname::String, lastname::String, age::Int64}[(fir
"King" │ @NamedTuple{firstname::String, lastname::String, age::Int64}[(fir
```
The results are only summarized (for compactness), but can be easily accessed.
```julia
```jldoctest grouping
julia> groups["Smith"]
Table with 3 columns and 3 rows:
firstname lastname age
@@ -74,18 +81,18 @@ There are additional functions provided to do grouping while copying less data.

A `groupinds` function lets you identify the indices of the rows belonging to certain groups.

```julia
```jldoctest grouping
julia> lastname_inds = groupinds(t.lastname)
Dict{String,Array{Int64,1}} with 4 entries:
"King" => [7]
"Williams" => [4, 5]
"Brown" => [6]
"Smith" => [1, 2, 3]
4-element Dictionaries.Dictionary{String, Vector{Int64}}:
"Smith" │ [1, 2, 3]
"Williams" │ [4, 5]
"Brown" │ [6]
"King" │ [7]
```

We can then use these indices to perform calculations on each group of data, for example the mean age per lastname grouping.

```julia
```jldoctest grouping
julia> using Statistics

julia> Dict(lastname => mean(t.age[inds]) for (lastname, inds) in pairs(lastname_inds))
@@ -104,22 +111,22 @@ Sometimes we can perform a split-apply-combine strategy by streaming just once o

For example, we can sum up the ages corresponding to each family name.

```julia
```jldoctest grouping
julia> groupreduce(getproperty(:lastname), getproperty(:age), +, t)
Dict{String,Int64} with 4 entries:
"King" => 54
"Williams" => 83
"Brown" => 33
"Smith" => 104
4-element Dictionaries.Dictionary{String, Int64}:
"Smith" │ 104
"Williams" │ 83
"Brown" │ 33
"King" │ 54
```

*SplitApplyCombine* provides related functions `groupsum`, `groupprod`, and so on. One particularly handy function for summarizing data by giving counts of unique values is `groupcount`.

```julia
```jldoctest grouping
julia> groupcount(t.lastname)
Dict{String,Int64} with 4 entries:
"King" => 1
"Williams" => 2
"Brown" => 1
"Smith" => 3
4-element Dictionaries.Dictionary{String, Int64}:
"Smith" │ 3
"Williams" │ 2
"Brown" │ 1
"King" │ 1
```
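For readers new to *SplitApplyCombine*, the `groupinds`, per-group mean, `groupreduce` and `groupcount` results shown in this file can be reproduced with a plain Base `Dict`. This sketch only illustrates what those functions compute; it is not the package's implementation (which returns a `Dictionaries.Dictionary`), and `my_groupinds` and `my_groupreduce` are hypothetical helpers.

```julia
lastnames = ["Smith", "Smith", "Smith", "Williams", "Williams", "Brown", "King"]
ages      = [25, 42, 37, 65, 18, 33, 54]

# Hypothetical stand-in for `groupinds`: map each key to the indices where it occurs.
function my_groupinds(ks)
    d = Dict{eltype(ks), Vector{Int}}()
    for (i, k) in pairs(ks)
        push!(get!(Vector{Int}, d, k), i)   # create an empty Vector{Int} on first sight
    end
    return d
end

# Hypothetical stand-in for `groupreduce(by, f, op, data)`: one streaming pass that
# folds `op` over the mapped values `f(x)` within each group `by(x)`.
function my_groupreduce(by, f, op, data)
    d = Dict{Any, Any}()                    # loosely typed for brevity
    for x in data
        k = by(x)
        d[k] = haskey(d, k) ? op(d[k], f(x)) : f(x)
    end
    return d
end

inds     = my_groupinds(lastnames)          # "Smith" => [1, 2, 3], "Williams" => [4, 5], ...
mean_age = Dict(k => sum(ages[v]) / length(v) for (k, v) in inds)
rows     = [(lastname = l, age = a) for (l, a) in zip(lastnames, ages)]
age_sums = my_groupreduce(r -> r.lastname, r -> r.age, +, rows)   # like groupreduce(..., +, t)
counts   = my_groupreduce(identity, _ -> 1, +, lastnames)         # like groupcount
```

Iterating a Base `Dict` yields `key => value` pairs, so `for (k, v) in inds` works here. A `Dictionaries.Dictionary` iterates its values instead, so with the package's results the same loop needs `pairs(...)`.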