diff --git a/docs/Project.toml b/docs/Project.toml index 086208b..1bffed4 100644 --- a/docs/Project.toml +++ b/docs/Project.toml @@ -1,3 +1,10 @@ [deps] Documenter = "e30172f5-a6a5-5a46-863b-614d45cd2de4" +SplitApplyCombine = "03a91e81-4c3e-53e1-a0a4-9c0c8f19dd66" TypedTables = "9d95f2ec-7b3d-5a63-8d20-e2491e220bb9" + +[sources] +TypedTables = {path = ".."} + +[compat] +Documenter = "1" diff --git a/docs/make.jl b/docs/make.jl index 7cdd881..b6f66e5 100644 --- a/docs/make.jl +++ b/docs/make.jl @@ -1,4 +1,6 @@ -using Documenter, TypedTables +using Documenter +using TypedTables +using Documenter.Remotes: GitHub makedocs(; modules=[TypedTables], @@ -29,7 +31,7 @@ makedocs(; ], "API reference" => "man/reference.md" ], - repo="https://github.com/JuliaData/TypedTables.jl/blob/{commit}{path}#L{line}", + repo=GitHub("JuliaData/TypedTables.jl"), sitename="TypedTables.jl", ) diff --git a/docs/src/index.md b/docs/src/index.md index 135abb8..122fda7 100644 --- a/docs/src/index.md +++ b/docs/src/index.md @@ -1,3 +1,8 @@ +```@meta +DocTestSetup = quote + using TypedTables +end +``` # TypedTables.jl *Simple, fast, column-based storage for data analysis in Julia.* @@ -26,7 +31,7 @@ That's it! Here's a table: -```julia +```jldoctest julia> using TypedTables julia> t = Table(a = [1, 2, 3], b = [2.0, 4.0, 6.0]) diff --git a/docs/src/man/acceleratedarrays.md b/docs/src/man/acceleratedarrays.md index e3f9706..4e47cc5 100644 --- a/docs/src/man/acceleratedarrays.md +++ b/docs/src/man/acceleratedarrays.md @@ -8,4 +8,4 @@ The *AcceleratedArrays* package exists to provide a way of attaching secondary a This system allows for an extensible set of acceleration indices - such as accelerated spatial lookup using a spatial search tree, or an inverted index for searching for words in text fields. -Note: by default, the `innerjoin` operation will construct a hash-based index to perform a join on two unindexed data sources, meaning most basic data operations can be achieved at reasonable speeds. 
\ No newline at end of file +Note: by default, the `innerjoin` operation will construct a hash-based index to perform a join on two unindexed data sources, meaning most basic data operations can be achieved at reasonable speeds. diff --git a/docs/src/man/dicttable.md b/docs/src/man/dicttable.md index 4c04f37..a34d8cc 100644 --- a/docs/src/man/dicttable.md +++ b/docs/src/man/dicttable.md @@ -1,3 +1,8 @@ +```@meta +DocTestSetup = quote + using TypedTables +end +``` # DictTable `DictTable` is similar to `Table` except that instead of being an `AbstractArray` it is @@ -7,7 +12,7 @@ The advantage of this is that rows can be indexed by a semantically-important ke case is that the first column of a table is a unique, primary-key column. When you construct a `DictTable` in with arrays it will assume the first column is the primary key. -```julia +```jldoctest dicttable julia> using TypedTables julia> t = DictTable(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37]) @@ -21,16 +26,16 @@ DictTable with 1 column and 3 rows: As mentioned, rows can be indexed by the value of the primary key. -```julia +```jldoctest dicttable julia> t["Alice"] (name = "Alice", age = 25) ``` The columns themselves are dictionaries that can be also be indexed by primary key. -```julia +```jldoctest dicttable julia> t.age -3-element Dictionaries.Dictionary{String, Int64} +3-element Dictionaries.Dictionary{String, Int64}: "Alice" │ 25 "Bob" │ 42 "Charlie" │ 37 @@ -42,10 +47,10 @@ julia> t.age["Alice"] With the design of *Dictionaries.jl*, these dictionaries are able to share `Indices` so that this has very little overhead (even with many columns). -```julia +```jldoctest dicttable julia> keys(t.age) === t.name true ``` Note that it is not *required* that the first column is the primary key. The `DictTable` -constructor can accept arbitrary dictionaries as columns (so long as the keys agree). 
\ No newline at end of file +constructor can accept arbitrary dictionaries as columns (so long as the keys agree). diff --git a/docs/src/man/filter.md b/docs/src/man/filter.md index b854496..bb251cd 100644 --- a/docs/src/man/filter.md +++ b/docs/src/man/filter.md @@ -1,3 +1,8 @@ +```@meta +DocTestSetup = quote + using TypedTables +end +``` # Finding data Frequently, we need to find data (i.e. rows of the table) that matches certain criteria, and there are multiple mechanisms for achieving this in Julia. Here we will briefly review `map`, `findall` and `filter` as options. @@ -6,7 +11,7 @@ Frequently, we need to find data (i.e. rows of the table) that matches certain c Following the previous section, we can identify row satisfying an arbitrary predicate using the `map` function. Note that "predicate" is just a name for function that takes an input and returns either `true` or `false`. -```julia +```jldoctest finding julia> t = Table(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37]) Table with 2 columns and 3 rows: name age @@ -16,15 +21,15 @@ Table with 2 columns and 3 rows: 3 │ Charlie 37 julia> is_old = map(row -> row.age > 40, t) -3-element Array{Bool,1}: - false - true - false +3-element Vector{Bool}: + 0 + 1 + 0 ``` Finally, we can use "logical" (i.e. Boolean) indexing to extract the rows where the predicate is `true`. -```julia +```jldoctest finding julia> t[is_old] Table with 2 columns and 1 row: name age @@ -39,9 +44,9 @@ The `map(predicate, table)` approach will allocate one `Bool` for each row in th If we wish to locate the indices of the rows where the predicate returns `true`, we can use Julia's `findall` function. 
-```julia +```jldoctest finding julia> inds = findall(row -> row.age > 40, t) -1-element Array{Int64,1}: +1-element Vector{Int64}: 2 julia> t[inds] @@ -57,7 +62,7 @@ This method may be less resource intensive (result in less memory allocated) if Finally, if we wish to directly `filter` the table and obtain the rows of interest, we can do that as well. -```julia +```jldoctest finding julia> filter(row -> row.age > 40, t) Table with 2 columns and 1 row: name age @@ -71,7 +76,7 @@ Internally, the `filter` method may rely on one of the implementations above. Julia's "generator" syntax also allows for filtering operations using `if`. -``` +```jldoctest finding julia> Table(row for row in t if row.age > 40) Table with 2 columns and 1 row: name age @@ -88,9 +93,9 @@ As mentioned in other sections, it is frequently worthwhile to preselect the col One simple example of such a transformation is to first project to the column(s) of interest, followed by using `map` or `findall` to identify the indices of the rows where `predicate` is `true`, and finally to use `getindex` or `view` to obtain the result of the full table. -```julia +```jldoctest finding julia> inds = findall(age -> age > 40, t.age) -1-element Array{Int64,1}: +1-element Vector{Int64}: 2 julia> t[inds] @@ -100,4 +105,4 @@ Table with 2 columns and 1 row: 1 │ Bob 42 ``` -Easy, peasy! \ No newline at end of file +Easy, peasy! diff --git a/docs/src/man/flextable.md b/docs/src/man/flextable.md index a922453..62331d6 100644 --- a/docs/src/man/flextable.md +++ b/docs/src/man/flextable.md @@ -1,3 +1,8 @@ +```@meta +DocTestSetup = quote + using TypedTables +end +``` # FlexTable This package defines a second tabular container type, `FlexTable`, that is designed to be a more **flex**ible **table**. @@ -15,7 +20,7 @@ Amongst other things, using `FlexTable` might allow you to more easily port your A column can be added by using the `.` operator (also known as `setproperty!`). 
-```julia +```jldoctest flextable julia> ft = FlexTable(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37]) FlexTable with 2 columns and 3 rows: name age @@ -37,7 +42,7 @@ FlexTable with 3 columns and 3 rows: The same syntax is used to replace a column. -```julia +```jldoctest flextable julia> ft.sex = ["female", "male", "male"]; julia> ft diff --git a/docs/src/man/group.md b/docs/src/man/group.md index 9f6afa4..fca6bea 100644 --- a/docs/src/man/group.md +++ b/docs/src/man/group.md @@ -1,3 +1,8 @@ +```@meta +DocTestSetup = quote + using TypedTables +end +``` # Grouping data It is frequently useful to break data appart into different *groups* for processing - a paradigm frequently referred to a the split-apply-combine methodology. @@ -10,7 +15,7 @@ In a powerful environment such as Julia, that fully supports nested containers, To demonstrate the power of grouping, this time we'll add some more rows and columns to our example data. -```julia +```jldoctest grouping julia> t = Table(firstname = ["Alice", "Bob", "Charlie", "Adam", "Eve", "Cindy", "Arthur"], lastname = ["Smith", "Smith", "Smith", "Williams", "Williams", "Brown", "King"], age = [25, 42, 37, 65, 18, 33, 54]) Table with 3 columns and 7 rows: firstname lastname age @@ -25,40 +30,42 @@ Table with 3 columns and 7 rows: ``` Let's get familiar with the *basic* usage of `group` on standard (non-tabular) arrays. For example, let's group people's first name by their first letter. -```julia +```jldoctest grouping +julia> using SplitApplyCombine + julia> group(first, t.firstname) -Dict{Char,Array{String,1}} with 4 entries: - 'C' => ["Charlie", "Cindy"] - 'A' => ["Alice", "Adam", "Arthur"] - 'E' => ["Eve"] - 'B' => ["Bob"] +4-element Dictionaries.Dictionary{Char, Vector{String}}: + 'A' │ ["Alice", "Adam", "Arthur"] + 'B' │ ["Bob"] + 'C' │ ["Charlie", "Cindy"] + 'E' │ ["Eve"] ``` The groups are returned as a `Dict` where they indices (or keys) of the dictionary are the first character of people's firstname string. 
The values of the `Dict` are arrays listing the matching firstnames. Next, we may want to group up data coming from a table (not just a single column). For example, we may want to group firstnames by lastname. -```julia +```jldoctest grouping julia> group(getproperty(:lastname), getproperty(:firstname), t) -Dict{String,Array{String,1}} with 4 entries: - "King" => ["Arthur"] - "Williams" => ["Adam", "Eve"] - "Brown" => ["Cindy"] - "Smith" => ["Alice", "Bob", "Charlie"] +4-element Dictionaries.Dictionary{String, Vector{String}}: + "Smith" │ ["Alice", "Bob", "Charlie"] + "Williams" │ ["Adam", "Eve"] + "Brown" │ ["Cindy"] + "King" │ ["Arthur"] ``` Note that the returned structure is still not a `Table` at all - it is a dictionary (`Dict`) with the unique `lastname` values as keys, returing (non-tabular) arrays of `firstname`. If instead, our grouping elements are `rows`, the group will be a table. For example, we can just drop the `getproperty(:firstname)` projection to get: -```julia +```jldoctest grouping julia> groups = group(getproperty(:lastname), t) -Groups{String,Any,Table{NamedTuple{(:firstname, :lastname, :age),Tuple{String,String,Int64}},1,NamedTuple{(:firstname, :lastname, :age),Tuple{Array{String,1},Array{String,1},Array{Int64,1}}}},Dict{String,Array{Int64,1}}} with 4 entries: - "King" => Table with 3 columns and 1 row:… - "Williams" => Table with 3 columns and 2 rows:… - "Brown" => Table with 3 columns and 1 row:… - "Smith" => Table with 3 columns and 3 rows:… +4-element Dictionaries.Dictionary{String, Table{@NamedTuple{firstname::String, lastname::String, age::Int64}, 1, @NamedTuple{firstname::Vector{String}, lastname::Vector{String}, age::Vector{Int64}}}}: + "Smith" │ @NamedTuple{firstname::String, lastname::String, age::Int64}[(fir… + "Williams" │ @NamedTuple{firstname::String, lastname::String, age::Int64}[(fir… + "Brown" │ @NamedTuple{firstname::String, lastname::String, age::Int64}[(fir… + "King" │ @NamedTuple{firstname::String, lastname::String, 
age::Int64}[(fir… ``` The results are only summarized (for compactness), but can be easily accessed. -```julia +```jldoctest grouping julia> groups["Smith"] Table with 3 columns and 3 rows: firstname lastname age @@ -74,18 +81,18 @@ There are additional functions provided to do grouping while copying less data. A `groupinds` function let's you identify the indices of the rows belonging to certain groups. -```julia +```jldoctest grouping julia> lastname_inds = groupinds(t.lastname) -Dict{String,Array{Int64,1}} with 4 entries: - "King" => [7] - "Williams" => [4, 5] - "Brown" => [6] - "Smith" => [1, 2, 3] +4-element Dictionaries.Dictionary{String, Vector{Int64}}: + "Smith" │ [1, 2, 3] + "Williams" │ [4, 5] + "Brown" │ [6] + "King" │ [7] ``` We can then use these indices to perform calculations on each group of data, for example the mean age per lastname grouping. -```julia +```jldoctest grouping julia> using Statistics julia> Dict(lastname => mean(t.age[inds]) for (lastname, inds) in lastname_inds) @@ -104,22 +111,22 @@ Sometimes we can perform a split-apply-combine strategy by streaming just once o For example, we can sum up the ages corresponding to each family name. -```julia +```jldoctest grouping julia> groupreduce(getproperty(:lastname), getproperty(:age), +, t) -Dict{String,Int64} with 4 entries: - "King" => 54 - "Williams" => 83 - "Brown" => 33 - "Smith" => 104 +4-element Dictionaries.Dictionary{String, Int64}: + "Smith" │ 104 + "Williams" │ 83 + "Brown" │ 33 + "King" │ 54 ``` *SplitApplyCombine* provides related functions `groupsum`, `groupprod`, and so-on. One particularly handy function for summarizing data by giving counts of unique values is `groupcount`. 
-```julia +```jldoctest grouping julia> groupcount(t.lastname) -Dict{String,Int64} with 4 entries: - "King" => 1 - "Williams" => 2 - "Brown" => 1 - "Smith" => 3 +4-element Dictionaries.Dictionary{String, Int64}: + "Smith" │ 3 + "Williams" │ 2 + "Brown" │ 1 + "King" │ 1 ``` diff --git a/docs/src/man/io.md b/docs/src/man/io.md index e049fd9..931ca50 100644 --- a/docs/src/man/io.md +++ b/docs/src/man/io.md @@ -1,3 +1,8 @@ +```@meta +DocTestSetup = quote + using TypedTables +end +``` # Input and output Input and output of `Table` and `FlexTable` are mostly handled through externally-defined interfaces. @@ -6,11 +11,11 @@ Input and output of `Table` and `FlexTable` are mostly handled through externall One can convert an `AbstractArray` of `NamedTuple`s to a `Table` using a simple constructor. -``` +```jldoctest io julia> v = [(name="Alice", age=25), (name="Bob", age=42), (name= "Charlie", age=37)] -3-element Array{NamedTuple{(:name, :age),Tuple{String,Int64}},1}: - (name = "Alice", age = 25) - (name = "Bob", age = 42) +3-element Vector{@NamedTuple{name::String, age::Int64}}: + (name = "Alice", age = 25) + (name = "Bob", age = 42) (name = "Charlie", age = 37) julia> t = Table(v) @@ -23,12 +28,12 @@ Table with 2 columns and 3 rows: ``` In this way, we have converted a row-based storage container to a column-based storage container. -One can convert back to row-based storage by `collect`ing the results in an `Array`. -```julia +One can convert back to row-based storage by `collect`ing the results in an `Array`. +```jldoctest io julia> collect(t) -3-element Array{NamedTuple{(:name, :age),Tuple{String,Int64}},1}: - (name = "Alice", age = 25) - (name = "Bob", age = 42) +3-element Vector{@NamedTuple{name::String, age::Int64}}: + (name = "Alice", age = 25) + (name = "Bob", age = 42) (name = "Charlie", age = 37) ``` @@ -63,7 +68,7 @@ Charlie,37 We can load this file from disk using the `CSV.File` constructor. 
-```julia +```julia-repl julia> using TypedTables, CSV julia> csvfile = CSV.File("input.csv") @@ -74,17 +79,17 @@ Tables.Schema: ``` Note that *CSV* has inferred the column types from the data, but by default allows for `missing` data. This can be controlled via the `allowmissing` keyword argument (as either `:all`, `:none` or `:auto`). -```julia +```julia-repl julia> CSV.File("input.csv", allowmissing=:none) CSV.File("/home/ferris/example.csv", rows=3): Tables.Schema: :name String - :age Int64 + :age Int64 ``` Either of these can finally be converted to a `Table`. -```julia +```julia-repl julia> Table(csvfile) Table with 2 columns and 3 rows: name age @@ -96,7 +101,7 @@ Table with 2 columns and 3 rows: Similarly, the *CSV.jl* package supports writing tables with `CSV.write` function. -```julia +```julia-repl julia> CSV.write("output.csv", t) "output.csv" -``` \ No newline at end of file +``` diff --git a/docs/src/man/join.md b/docs/src/man/join.md index a7a3c78..f1d1995 100644 --- a/docs/src/man/join.md +++ b/docs/src/man/join.md @@ -1,3 +1,8 @@ +```@meta +DocTestSetup = quote + using TypedTables +end +``` # Joining data The methods defined so far work on single data sources (tables) at-a-time. Sometimes, we need to *join* information together from multiple tables. @@ -10,9 +15,11 @@ Suppose `table1` has `n` rows, and `table2` has `m` rows. We can create a new ta The easiest way to do this is with the `SplitApplyCombine.product`. For a quick primer, `out = product(f, a, b)` returns an array `out` such that `out[i, j] = f(a, b)`. For example, let's take all combinations of the sums of `[1, 2, 3]` and `[10, 20, 30, 40]`. -```julia +```jldoctest +julia> using SplitApplyCombine + julia> product(+, [1, 2, 3], [10, 20, 30, 40]) -3×4 Array{Int64,2}: +3×4 Matrix{Int64}: 11 21 31 41 12 22 32 42 13 23 33 43 @@ -20,9 +27,9 @@ julia> product(+, [1, 2, 3], [10, 20, 30, 40]) One can also use `tuple` to simply collect both sets of data. 
-```julia +```jldoctest; setup = :(using TypedTables, SplitApplyCombine) julia> product(tuple, [1, 2, 3], [10, 20, 30, 40]) -3×4 Array{Tuple{Int64,Int64},2}: +3×4 Matrix{Tuple{Int64, Int64}}: (1, 10) (1, 20) (1, 30) (1, 40) (2, 10) (2, 20) (2, 30) (2, 40) (3, 10) (3, 20) (3, 30) (3, 40) @@ -30,7 +37,9 @@ julia> product(tuple, [1, 2, 3], [10, 20, 30, 40]) (Note that `tuple` is the *only* option for the similar function `Iterators.product`). Let's try this with a table. This time, for two tables with *distinct* column names, we can use the `merge` function to merge the rows into single `NamedTuple`s - for example, take this list of all pairings of firstnames and lastnames. -```julia +```jldoctest join +julia> using SplitApplyCombine + julia> t1 = Table(firstname = ["Alice", "Bob", "Charlie"]) Table with 1 column and 3 rows: firstname @@ -78,7 +87,7 @@ Finally, also note that there is a `productview` function for performing this op One can feed in multiple inputs into a generator, and Julia will automatically take the Cartesian product of all inputs. For example: -```julia +```jldoctest join julia> t3 = Table(merge(row1, row2) for row1 in t1, row2 in t2) Table with 2 columns and 12 rows: firstname lastname @@ -102,7 +111,7 @@ Table with 2 columns and 12 rows: In a nutshell: the relational "join" operation is simply the above Cartesian product followed by a filtering operation. Generally, the filtering operation will depend on information coming from *both* input data sets - for example, that the values in a particular column must match exactly. (Any filtering that depends only on information from one input table can be done more efficiently *before* the join operation). For a simple example, let's look for all pairings of firstnames and lastnames that have an equal number of characters. For efficiency, we combine this with `productview`. 
-```julia +```jldoctest join julia> filter(row -> length(row.firstname) == length(row.lastname), t3) Table with 2 columns and 2 rows: firstname lastname @@ -123,7 +132,7 @@ In fact, using the array index as the primary key can be the most efficient way As an example, let's take a simplistic `customers` and `orders` database. -```julia +```jldoctest join-2 julia> customers = Table(name = ["Alice", "Bob", "Charlie"], address = ["12 Beach Street", "163 Moon Road", "6 George Street"]) Table with 2 columns and 3 rows: name address @@ -143,7 +152,7 @@ Table with 2 columns and 4 rows: ``` To get the customer for each order is just a simple indexing operation. -```julia +```jldoctest join-2 julia> customers[orders.customer_id] Table with 2 columns and 4 rows: name address @@ -156,7 +165,7 @@ Table with 2 columns and 4 rows: ``` We can denormalize the orders and their customers to a single table by performing a `merge` on each row (in this case using broadcasting dot-syntax for brevity). -```julia +```jldoctest join-2 julia> merge.(customers[orders.customer_id], orders) Table with 4 columns and 4 rows: name address customer_id items @@ -170,7 +179,9 @@ Table with 4 columns and 4 rows: We can perform these operation lazily for cost *O*(1) using `view` and `mapview` - after which the data can be processed further. -```julia +```jldoctest join-2 +julia> using SplitApplyCombine + julia> mapview(merge, view(customers, orders.customer_id), orders) Table with 4 columns and 4 rows: name address customer_id items @@ -189,7 +200,7 @@ We now turn out attention to the relational join, implemented via *SplitApplyCom The `innerjoin` function is flexible, able to join any iterable data source via any comparing predicate, and perform an arbitrary mapping of the matching results. 
Using `?`, we can view its documentation at the REPL: -```julia +```julia-repl help?> innerjoin search: innerjoin @@ -203,7 +214,7 @@ search: innerjoin ≡≡≡≡≡≡≡≡≡ julia> innerjoin(iseven, iseven, tuple, ==, [1,2,3,4], [0,1,2]) - 6-element Array{Tuple{Int64,Int64},1}: + 6-element Vector{Tuple{Int64,Int64}}: (1, 1) (2, 0) (2, 2) @@ -216,7 +227,7 @@ Let's examine this. Assume the inputs `left` and `right` are `Table`s. We may wa As an example, we modify our `customers` table to explicitly include the customer's `id`, similarly to above. -```julia +```jldoctest join-2 julia> customers = Table(id = 1:3, name = ["Alice", "Bob", "Charlie"], address = ["12 Beach Street", "163 Moon Road", "6 George Street"]) Table with 3 columns and 3 rows: id name address @@ -225,6 +236,8 @@ Table with 3 columns and 3 rows: 2 │ 2 Bob 163 Moon Road 3 │ 3 Charlie 6 George Street +julia> using SplitApplyCombine + julia> innerjoin(getproperty(:id), getproperty(:customer_id), customers, orders) Table with 5 columns and 4 rows: id name address customer_id items @@ -243,7 +256,7 @@ See the section on Acceleration Indices for methods of (a) attaching secondary a As a final example, generators provide a convenient syntax for filtering Cartesian products of collections - that is, to perform an inner join! -```julia +```jldoctest join-2 julia> Table(merge(customer, order) for customer in customers, order in orders if customer.id == order.customer_id) Table with 5 columns and 4 rows: id name address customer_id items @@ -262,31 +275,25 @@ Currently *SplitApplyCombine* and *TypedTables* do not provide what in SQL is ca Such a query can be alternatively modeled as a hybrid group/join operation. *SplitApplyCombine* provides `leftgroupjoin` to perform precisely this. This is similar to LINQ's `GroupJoin` method. Let us investigate this query with the same data as for `innerjoin`, above. 
-```julia +```jldoctest join-2 julia> groups = leftgroupjoin(getproperty(:id), getproperty(:customer_id), customers, orders) -Dict{Int64,Table{NamedTuple{(:id, :name, :address, :customer_id, :items),Tuple{Int64,String,String,Int64,String}},1,NamedTuple{(:id, :name, :address, :customer_id, :items),Tuple{Array{Int64,1},Array{String,1},Array{String,1},Array{Int64,1},Array{String,1}}}}} with 3 entries: - 2 => Table with 5 columns and 2 rows:… - 3 => Table with 5 columns and 2 rows:… - 1 => Table with 5 columns and 0 rows:… +3-element Dictionaries.Dictionary{Int64, Vector{@NamedTuple{id::Int64, name::String, address::String, customer_id::Int64, items::String}}}: + 1 │ @NamedTuple{id::Int64, name::String, address::String, customer_id::Int64, … + 2 │ @NamedTuple{id::Int64, name::String, address::String, customer_id::Int64, … + 3 │ @NamedTuple{id::Int64, name::String, address::String, customer_id::Int64, … julia> groups[1] -Table with 5 columns and 0 rows: - id name address customer_id items - ┌────────────────────────────────────── +@NamedTuple{id::Int64, name::String, address::String, customer_id::Int64, items::String}[] julia> groups[2] -Table with 5 columns and 2 rows: - id name address customer_id items - ┌──────────────────────────────────────────── - 1 │ 2 Bob 163 Moon Road 2 Socks - 2 │ 2 Bob 163 Moon Road 2 Tie +2-element Vector{@NamedTuple{id::Int64, name::String, address::String, customer_id::Int64, items::String}}: + (id = 2, name = "Bob", address = "163 Moon Road", customer_id = 2, items = "Socks") + (id = 2, name = "Bob", address = "163 Moon Road", customer_id = 2, items = "Tie") julia> groups[3] -Table with 5 columns and 2 rows: - id name address customer_id items - ┌───────────────────────────────────────────────────── - 1 │ 3 Charlie 6 George Street 3 Shirt - 2 │ 3 Charlie 6 George Street 3 Underwear +2-element Vector{@NamedTuple{id::Int64, name::String, address::String, customer_id::Int64, items::String}}: + (id = 3, name = "Charlie", address = "6 George 
Street", customer_id = 3, items = "Shirt") + (id = 3, name = "Charlie", address = "6 George Street", customer_id = 3, items = "Underwear") ``` -As you can see - 3 groups were identified, according to the distinct keys in the `id` column of `customers`. While the first customer had no associated orders, note that an empty group has nonetheless been created. Much like SQL's `LEFT OUTER JOIN` command, `leftgroupjoin` lets us handle the case that no matching data is found. While SQL achieves this by noting there is `missing` data in the columns associated with the right table, here we use a set of nested containers (dictionaries of tables of rows) to denote the relationship. \ No newline at end of file +As you can see - 3 groups were identified, according to the distinct keys in the `id` column of `customers`. While the first customer had no associated orders, note that an empty group has nonetheless been created. Much like SQL's `LEFT OUTER JOIN` command, `leftgroupjoin` lets us handle the case that no matching data is found. While SQL achieves this by noting there is `missing` data in the columns associated with the right table, here we use a set of nested containers (dictionaries of tables of rows) to denote the relationship. diff --git a/docs/src/man/map.md b/docs/src/man/map.md index fdd5af8..1fb83f8 100644 --- a/docs/src/man/map.md +++ b/docs/src/man/map.md @@ -1,3 +1,8 @@ +```@meta +DocTestSetup = quote + using TypedTables +end +``` # Mapping rows of data Some operations on your data will act by mapping each row of data in a table to a value, or even to new rows (in the case of relational operations). In either case, you are mapping an element of table (which is an array whose elements are rows) to create a new array of computed elements (whose elements may or may not be rows, and thus may or may not be a `Table`). 
@@ -8,7 +13,7 @@ In Julia, the idiomatic way to perform such an operation is with the `map` funct One very simple example of this is extracting a column, let's say the column called `name` from a table of people's names and ages. -```julia +```jldoctest map julia> t = Table(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37]) Table with 2 columns and 3 rows: name age @@ -18,30 +23,30 @@ Table with 2 columns and 3 rows: 3 │ Charlie 37 julia> map(row -> row.name, t) -3-element Array{String,1}: - "Alice" - "Bob" +3-element Vector{String}: + "Alice" + "Bob" "Charlie" ``` This has returned and standard Julia array, which will be a *copy* of the array of the `name` column. We could also do a more complicated calculation. -```julia +```jldoctest map julia> is_old = map(row -> row.age > 40, t) -3-element Array{Bool,1}: - false - true - false +3-element Vector{Bool}: + 0 + 1 + 0 ``` Depending on your definition of "old", we have identified two younger people and one older person - though I suspect that Bob may have a different definition of old than Alice does. One can also `map` rows, which are `NamedTuple`s, to new `NamedTuples`, which will naturally result in a new tabular structure. Here is an example where we simply copy the names into a new table (but change the column name to `firstname`): -```julia +```jldoctest map julia> map(row -> (firstname = row.name,), t) Table with 1 column and 3 rows: firstname - ┌──────────── + ┌────────── 1 │ Alice 2 │ Bob 3 │ Charlie @@ -51,7 +56,7 @@ Internally, this is leveraging Julia's `similar` interface for constructing new Putting this all together, we can create a brand-new table using `map` to manipulate both columns. -```julia +```jldoctest map julia> map(row -> (name = row.name, is_old = row.age > 40), t) Table with 2 columns and 3 rows: name is_old @@ -65,22 +70,21 @@ Table with 2 columns and 3 rows: One can easily use `for` loops to iterate over your data and perform whatever mapping is required. 
For example, this loop takes the `first` character of the elements of the `name` column. -```julia +```jldoctest map julia> function firstletter(t::Table) - out = Vector{Char}(undef, length(t)) - - for i in 1:length(t) - out[i] = first(t.name[i]) - end - - return out -end + out = Vector{Char}(undef, length(t)) + for i in 1:length(t) + out[i] = first(t.name[i]) + end + return out + end +firstletter (generic function with 1 method) julia> firstletter(t) -3-element Array{Char,1}: - 'A' - 'B' - 'C' +3-element Vector{Char}: + 'A': ASCII/Unicode U+0041 (category Lu: Letter, uppercase) + 'B': ASCII/Unicode U+0042 (category Lu: Letter, uppercase) + 'C': ASCII/Unicode U+0043 (category Lu: Letter, uppercase) ``` Julia will use the type information it knows about `t` to create fast, compiled code. (Pro tip: to make the above loop *optimal*, adding an `@inbounds` annotation on the same line before the `for` loop will remove redundant array bounds checking and make the loop execute faster). @@ -94,7 +98,7 @@ Julia syntax provide for compact syntax for generators and comprehensions to def Tables can be constructed from `Geneartor`s, allowing for some pretty neat syntax. -```julia +```jldoctest map julia> Table((name=row.name, isold=row.age>40) for row in t) Table with 2 columns and 3 rows: name isold @@ -119,9 +123,9 @@ When we want to perform more complex tasks, such as `group` or `innerjoin`, we m Given a `row`, a field is extracted with the `row.name` syntax - which Julia transforms to the function call `getproperty(row, :name)`. This package defines `getproperty(:name)` as returning a new, single-argument *function* that takes a `row` and returns `row.name`. 
Thus, one way of projecting a table down to a single column is to use the `getproperty` function, like so: -``` +```jldoctest map julia> map(getproperty(:name), t) -3-element Array{String,1}: +3-element Vector{String}: "Alice" "Bob" "Charlie" @@ -135,15 +139,15 @@ A naive implementation of this would be to iterate the rows and *then* project d If we wish to get more than one column, to subset our data or to create a multi-column group or join key, we can use the `getproperties` function, which works like `getproperty` but accepts a tuple of `Symbol`s for the column names. This works well on rows or tables. -``` +```jldoctest julia> getproperties((a=1, b=2, c=3), (:a, :c)) (a = 1, c = 3) ``` By specifying just column names you can get the a curried function, as for `getproperty`. Even with just a single column selected, this function preserves the column names, in contrast to `getproperty`. For example: -``` +```jldoctest map julia> map(getproperties((:name,)), t) -Table with 2 columns and 3 rows: +Table with 1 column and 3 rows: name ┌──────── 1 │ Alice @@ -156,7 +160,7 @@ Table with 2 columns and 3 rows: Sometimes one just wants to remove one or more columns from a table, which we can do easily enough for rows or tables using `deleteproperty` and `deleteproperties`. -``` +```jldoctest map julia> deleteproperty(t, :age) Table with 1 column and 3 rows: name @@ -178,7 +182,7 @@ Table with 1 column and 3 rows: To help create arbitrary computations using data from multiple columns, the `@Compute` convenience macro is provided. Variables starting with `$` will be taken as column names. -``` +```jldoctest map julia> map(@Compute($age > 40), t) 3-element Vector{Bool}: 0 @@ -192,7 +196,7 @@ The macro is able to pass along information about which columns are necessary to The `@Select` macro goes one step further, allowing you to assemble multiple columns of data in a single step. Columns can be copied by name, and new columns can be computed. 
-``` +```jldoctest map julia> map(@Select(name, age, is_old = $age > 40), t) Table with 3 columns and 3 rows: name age is_old @@ -208,9 +212,9 @@ Once again, only the subset of columns required for each computation is iterated Since tables are just arrays, the broadcast operation is defined and behaves similarly to `map`. -``` +```jldoctest map julia> f = @Select(name, age, is_old = $age > 40) -(::TypedTables.Select{(:name, :age, :is_old), Tuple{TypedTables.GetProperty{:name}, TypedTables.GetProperty{:age}, TypedTables.Compute{(:age,), var"#9#10"}}}) (generic function with 1 method) +(::TypedTables.Select{(:name, :age, :is_old), Tuple{TypedTables.GetProperty{:name}, TypedTables.GetProperty{:age}, TypedTables.Compute{(:age,), var"#13#14"}}}) (generic function with 1 method) julia> f.(t) Table with 3 columns and 3 rows: @@ -223,4 +227,4 @@ Table with 3 columns and 3 rows: ## Lazy mapping -It is also worth mentioning the possibility of lazily mapping the values. Functions such as `mapview` from *SplitApplyCombine* can let you construct a "view" of a new table based on existing data. This way you can avoid using up precious resources, like RAM, yet can still call up data upon demand. It is worth noting that strategies like this may be used internally in more complicated grouping and joining operations. \ No newline at end of file +It is also worth mentioning the possibility of lazily mapping the values. Functions such as `mapview` from *SplitApplyCombine* can let you construct a "view" of a new table based on existing data. This way you can avoid using up precious resources, like RAM, yet can still call up data upon demand. It is worth noting that strategies like this may be used internally in more complicated grouping and joining operations. 
diff --git a/docs/src/man/reduce.md b/docs/src/man/reduce.md index 4f4d53d..c130247 100644 --- a/docs/src/man/reduce.md +++ b/docs/src/man/reduce.md @@ -1,3 +1,8 @@ +```@meta +DocTestSetup = quote + using TypedTables +end +``` # Reducing data Here we demonstrate how to ask a few questions with "scalar" answers - like "Does the table contain *x*?", or "What is the average value of *y*?" @@ -6,7 +11,7 @@ Here we demonstrate how to ask a few questions with "scalar" answers - like "Doe One of the most basic questions to ask is: "Is this element in the table/column?". Julia's `in` operator is perfect for this. -```julia +```jldoctest reduce julia> t = Table(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37]) Table with 2 columns and 3 rows: name age @@ -28,7 +33,7 @@ The `in` function can also be used as an infix operator, as in `"Alice" in t.nam The `count` method is useful for asking how many rows satisfy a certain criterion. -```julia +```jldoctest reduce julia> count(row -> row.age > 40, t) 1 ``` @@ -37,7 +42,7 @@ julia> count(row -> row.age > 40, t) Individual columns can be reduced in the typical way for Julia arrays. Some examples. -```julia +```jldoctest reduce julia> sum(t.age) 104 @@ -58,7 +63,7 @@ Note that `join` is a string joining function; see `innerjoin` (from *SplitApply It's just as easy to calculate multi-column statistics by reducing over the entire table. 
-```julia +```jldoctest reduce julia> mapreduce(row -> length(row.name) * row.age, +, t) 510 -``` \ No newline at end of file +``` diff --git a/docs/src/man/reference.md b/docs/src/man/reference.md index 7747ad8..002e1b4 100644 --- a/docs/src/man/reference.md +++ b/docs/src/man/reference.md @@ -23,9 +23,11 @@ TypedTables.columnnames ## Property selection ```@docs +Base.getproperty TypedTables.getproperties TypedTables.deleteproperty TypedTables.deleteproperties +TypedTables.propertytype ``` ## Convenience macros diff --git a/docs/src/man/table.md b/docs/src/man/table.md index 20c11ba..f0fa2c9 100644 --- a/docs/src/man/table.md +++ b/docs/src/man/table.md @@ -1,8 +1,13 @@ +```@meta +DocTestSetup = quote + using TypedTables +end +``` # Table It's simple to get started and create a table! -```julia +```jldoctest julia> using TypedTables julia> t = Table(a = [1, 2, 3], b = [2.0, 4.0, 6.0]) @@ -17,7 +22,7 @@ julia> t[1] # Get first row (a = 1, b = 2.0) julia> t.a # Get column `a` -3-element Array{Int64,1}: +3-element Vector{Int64}: 1 2 3 @@ -31,7 +36,7 @@ Table is actually a Julia array type, where each element (row) is a `NamedTuple` * Internally, a `Table` stores a (named) tuple of arrays, and is a convenient structure for column-based storage of tabular data. -Thus, manipulating data as a `Table` is as easy as manipulating arrays and named tuples - which is something Julia was specifically designed to make simple, efficient and *fun*. +Thus, manipulating data as a `Table` is as easy as manipulating arrays and named tuples - which is something Julia was specifically designed to make simple, efficient and *fun*. `Table`s (and their columns) may be an `AbstractArray` of any dimensionality. This lets you take advantage of Julia's powerful array functionality, such as multidimensional broadcasting. Each column must be an array of the same dimensionality and size of the other columns. 
@@ -49,7 +54,7 @@ Finally, since `Table` is unoppinionated about the underlying array storage (and The easiest way to create a table from columns is with keyword arguments, such as -```julia +```jldoctest creating-tables julia> t = Table(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37]) Table with 2 columns and 3 rows: name age @@ -62,7 +67,7 @@ The constructor will equally accept a `NamedTuple` of columns, as `Table((name = Also, one can easily convert the row-storage-based vector of named tuples into columnar storage using the `Table` constructor: -```julia +```jldoctest julia> Table([(name = "Alice", age = 25), (name = "Bob", age = 42), (name = "Charlie", age = 37)]) Table with 2 columns and 3 rows: name age @@ -78,13 +83,13 @@ Table with 2 columns and 3 rows: A single row of a `Table` is just a `NamedTuple`, which is easy to access. -```julia +```jldoctest creating-tables julia> t[1] (name = "Alice", age = 25) ``` Multiple rows can be indexed similarly to standard arrays in Julia: -``` +```jldoctest creating-tables julia> t[2:3] Table with 2 columns and 2 rows: name age @@ -94,7 +99,7 @@ Table with 2 columns and 2 rows: ``` One can interrogate the `length`, `size` or `axes` of a `Table` just like any other `AbstractArray`: -``` +```jldoctest creating-tables julia> length(t) 3 @@ -106,8 +111,10 @@ julia> size(t) Finally, if the backing arrays support mutation, rows can be mutated with `setindex!` -``` -julia> t[3] = (name = Charlie, name = 38) # Charlie had a birthday +```jldoctest creating-tables +julia> t[3] = (name = "Charlie", age = 38); # Charlie had a birthday + +julia> t Table with 2 columns and 3 rows: name age ┌───────────── @@ -121,24 +128,24 @@ Similarly, rows can be added or removed with `push!`, `pop!` and [so-on](https:/ ### Column access A single column can be recovered using Julia's new `getproperty` syntax using the `.` operator. 
-```julia +```jldoctest creating-tables julia> t.name -3-element Array{String,1}: - "Alice" - "Bob" +3-element Vector{String}: + "Alice" + "Bob" "Charlie" ``` Currently, the simplest way to extract more than one column is to construct a brand new table out of the columns (as in `table2 = Table(column1 = table1.column1, column2 = table1.column2, ...)`). The columns of a `Table` can be accessed directly as a `NamedTuple` of arrays using the `columns` function. -```julia +```jldoctest creating-tables julia> columns(t) -(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37]) +(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 38]) ``` There is a `columnnames` function for getting the names of the columns: -```julia +```jldoctest creating-tables julia> columnnames(t) (:name, :age) ``` @@ -151,7 +158,7 @@ Finally, the values contained in entire columns may be updated using `.=`, such From the above, we can see two identical ways to get a cell of data: -```julia +```jldoctest creating-tables julia> t[1].name "Alice" @@ -163,7 +170,7 @@ While Julia's compiler will elide a lot of unnecessary code, you may find it fas Similarly, the value of a cell can be updated via `setindex!`, for example using the syntax `t.name[1] = "Alicia"`. Note that the syntax `t[1].name = "Alicia"` will error because you are trying to mutate `t[1]`, which is an immutable *copy* of the row (completely independent from `t`). -## Comparison with other packages +## Comparison with other packages ### `DataFrame` diff --git a/docs/src/man/tutorial.md b/docs/src/man/tutorial.md index 759a85c..a6a49da 100644 --- a/docs/src/man/tutorial.md +++ b/docs/src/man/tutorial.md @@ -1,3 +1,8 @@ +```@meta +DocTestSetup = quote + using TypedTables +end +``` # Quick start tutorial After reading this tutorial, you should be able to use Julia to perform a range of data @@ -11,7 +16,7 @@ It's simple to get started and create a table! A `Table` is a wrapper around column arrays. 
Suppose you have an array containing names and an array containing ages, then you can create a table with two columns: -```julia +```jldoctest tutorial julia> t = Table(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37]) Table with 2 columns and 3 rows: name age @@ -29,7 +34,7 @@ collection of named tuples is a "relation", and `Table`s are useful for performi You can access elements (rows) exactly like any other Julia array. -```julia +```jldoctest tutorial julia> t[1] (name = "Alice", age = 25) @@ -43,7 +48,7 @@ Table with 2 columns and 2 rows: A element (row) of the table can be updated with the usual array syntax. -```julia +```jldoctest tutorial julia> t[1] = (name = "Alice", age = 26); # Alice had a birthday! julia> t @@ -57,18 +62,18 @@ Table with 2 columns and 3 rows: You can easily access a column by the tables "properties", use the `.` operator. -```julia +```jldoctest tutorial julia> t.name -3-element Array{String,1}: - "Alice" - "Bob" +3-element Vector{String}: + "Alice" + "Bob" "Charlie" ``` You can ask what the properties (column names) of a `Table` with the `propertynames` function (as well as the `columnnames` function). -```julia +```jldoctest tutorial julia> propertynames(t) (:name, :age) ``` @@ -78,7 +83,7 @@ compiler works with when considering Julia code itself. Individual cells can be accessed in two, symmetric ways. -```julia +```jldoctest tutorial julia> t.name[2] "Bob" @@ -90,7 +95,7 @@ Note that the first way is more efficient, and recommended, because in the secon intermediate value `t[2]` is assembled from the elements of *all* the columns. The first syntax also supports updating. -```julia +```jldoctest tutorial julia> t.name[2] = "Robert"; # No nicknames here... julia> t @@ -109,7 +114,7 @@ table by the old variable name, if you want. Multiple tables and additional columns can be created in the one `Table` constructor. For example, it is easy to add an additional column. 
-```julia +```jldoctest tutorial julia> Table(t; lastname = ["Smith", "Smith", "Smith"]) Table with 3 columns and 3 rows: name age lastname @@ -121,7 +126,7 @@ Table with 3 columns and 3 rows: And we can delete a column by setting it to `nothing`. -```julia +```jldoctest tutorial julia> Table(t; age = nothing) Table with 1 column and 3 rows: name @@ -139,7 +144,7 @@ is able to produce lightning fast machine code for processing your data. Sometimes, it *is* handy to be able to add, remove and rename columns without create a new `Table` container. The `FlexTable` type allows for this. -```julia +```jldoctest tutorial-flextable julia> ft = FlexTable(names = ["Alice", "Bob", "Charlie"]) FlexTable with 1 column and 3 rows: names @@ -161,7 +166,7 @@ FlexTable with 2 columns and 3 rows: A column can be deleted by setting it to `nothing`. -```julia +```jldoctest tutorial-flextable julia> ft.age = nothing; julia> ft @@ -185,7 +190,7 @@ The recommended way to handle missing data in Julia is by using `missing`, which with its very own type `Missing`. For example, we may create a table where some people haven't specified their age. -```julia +```jldoctest julia> Table(name = ["Alice", "Bob", "Charlie"], age = [25, missing, 37]) Table with 2 columns and 3 rows: name age @@ -215,7 +220,7 @@ Charlie,37 We can load this file from disk using the `CSV.read` function. -```julia +```julia-repl julia> using CSV julia> t = CSV.read("input.csv", Table) @@ -229,17 +234,17 @@ FlexTable with 2 columns and 3 rows: Similary, we can write a table to a new file `output.csv` with the `CSV.write` function. -```julia +```julia-repl julia> CSV.write("output.csv", t) ``` ## Finding data Julia's broadcasting and indexing syntax can work together to make it easy to find rows -of data based on given creteria. Suppose we wanted to find all the "old" people in the +of data based on given creteria. Suppose we wanted to find all the "old" people in the table. 
-```julia +```jldoctest find julia> t = Table(name = ["Alice", "Bob", "Charlie"], age = [25, 42, 37]) Table with 2 columns and 3 rows: name age @@ -249,19 +254,19 @@ Table with 2 columns and 3 rows: 3 │ Charlie 37 julia> t.age .> 40 -3-element BitArray{1}: - false - true - false +3-element BitVector: + 0 + 1 + 0 ``` Bob and Alice might disagree about what "old" means, but here we have identified all the -people over 40 years of age. Note the difference between the "scalar" operator `>` and the +people over 40 years of age. Note the difference between the "scalar" operator `>` and the "broadcasting" operator `.>`. We can use "logical" indexing to collect the rows for which the above predicate is `true`. -```julia +```jldoctest find julia> t[t.age .> 40] Table with 2 columns and 1 row: name age @@ -277,14 +282,14 @@ Julia has a range of standard functions for asking common questions about a set For example, we can use the `in` operator to test if an entry is in a column. -```julia +```jldoctest find julia> "Bob" in t.name true ``` Or if a given row is `in` the table. -```julia +```jldoctest find julia> (name = "Bob", age = 41) in t false ``` @@ -294,7 +299,7 @@ false We can `sum` columns, and with the `Statistics` standard library, we can find the `mean`, `median`, and so-on. -```julia +```jldoctest find julia> sum(t.age) 104 @@ -311,13 +316,13 @@ By these metrics, Bob's age *is* above average! ## Mapping data -Functions which map rows to new rows can be used to create new tables. +Functions which map rows to new rows can be used to create new tables. Below, we create an annonymous function which takes a row containing a name and an age, and returns an inital letter and whether the person is old (greater than 40), and use Julia's built-in `map` function. 
-```julia +```jldoctest find julia> map(row -> (initial = first(row.name), is_old = row.age > 40), t) Table with 2 columns and 3 rows: initial is_old @@ -334,7 +339,7 @@ The `@Select` macro returns a function that can map a row to a new row (or a tab new table) by defining a functional mapping for each output column. The above example can alternatively be written as: -```julia +```jldoctest find julia> map(@Select(initial = first($name), is_old = $age > 40), t) Table with 2 columns and 3 rows: initial is_old @@ -347,7 +352,7 @@ Table with 2 columns and 3 rows: For shorthand, the `= ...` can be omitted to simply extract a column. For example, we can reorder the columns via -``` +```jldoctest find julia> @Select(age, name)(t) Table with 2 columns and 3 rows: age name @@ -362,11 +367,11 @@ of each row.) The `@Compute` macro returns a function that maps a row to a value. As for `@Select`, the input column names are prepended with `$`, for example: -```julia +```jldoctest find julia> map(@Compute($name), t) -3-element Array{String,1}: - "Alice" - "Bob" +3-element Vector{String}: + "Alice" + "Bob" "Charlie" ``` @@ -389,8 +394,11 @@ and joining data (if you wish, you may view its documentation We will demonstrate grouping data with a slightly more complex dataset. 
-```julia -julia> t2 = Table(firstname = ["Alice", "Bob", "Charlie", "Adam", "Eve", "Cindy", "Arthur"], lastname = ["Smith", "Smith", "Smith", "Williams", "Williams", "Brown", "King"], age = [25, 42, 37, 65, 18, 33, 54]) +```jldoctest tutorial-grouping +julia> t2 = Table(; + firstname = ["Alice", "Bob", "Charlie", "Adam", "Eve", "Cindy", "Arthur"], + lastname = ["Smith", "Smith", "Smith", "Williams", "Williams", "Brown", "King"], + age = [25, 42, 37, 65, 18, 33, 54]) Table with 3 columns and 7 rows: firstname lastname age ┌───────────────────────── @@ -406,15 +414,15 @@ Table with 3 columns and 7 rows: Let us begin with basic usage of the `group` function from *SplitApplyCombine*, where we wish to group firstnames by their initial letter. -```julia +```jldoctest tutorial-grouping julia> using SplitApplyCombine julia> group(first, t2.firstname) -Dict{Char,Array{String,1}} with 4 entries: - 'C' => ["Charlie", "Cindy"] - 'A' => ["Alice", "Adam", "Arthur"] - 'E' => ["Eve"] - 'B' => ["Bob"] +4-element Dictionaries.Dictionary{Char, Vector{String}}: + 'A' │ ["Alice", "Adam", "Arthur"] + 'B' │ ["Bob"] + 'C' │ ["Charlie", "Cindy"] + 'E' │ ["Eve"] ``` The `group` function returns a dictionary (`Dict`) where the grouping key is calculated on @@ -424,13 +432,13 @@ firstnames starting with the letter `A` belong to the same group, and so on. Sometimes you may want to transform the grouped data - you can do so by passing a second mapping function. For example, we may want to group firstnames by lastname. 
-```julia -julia> group(@Compute($lastname), $Compute($firstname), t2) -Dict{String,Array{String,1}} with 4 entries: - "King" => ["Arthur"] - "Williams" => ["Adam", "Eve"] - "Brown" => ["Cindy"] - "Smith" => ["Alice", "Bob", "Charlie"] +```jldoctest tutorial-grouping +julia> group(@Compute($lastname), @Compute($firstname), t2) +4-element Dictionaries.Dictionary{String, Vector{String}}: + "Smith" │ ["Alice", "Bob", "Charlie"] + "Williams" │ ["Adam", "Eve"] + "Brown" │ ["Cindy"] + "King" │ ["Arthur"] ``` Note that the returned structure is still not a `Table` at all - it is a dictionary with the unique `lastname` values as keys, returing (non-tabular) arrays of `firstname`. @@ -438,18 +446,18 @@ unique `lastname` values as keys, returing (non-tabular) arrays of `firstname`. If instead, our group elements are rows (named tuples), each group will itslef be a table. For example, we can keep the entire row by dropping the second function. -```julia +```jldoctest tutorial-grouping julia> families = group(@Compute($lastname), t2) -Groups{String,Any,Table{NamedTuple{(:firstname, :lastname, :age),Tuple{String,String,Int64}},1,NamedTuple{(:firstname, :lastname, :age),Tuple{Array{String,1},Array{String,1},Array{Int64,1}}}},Dict{String,Array{Int64,1}}} with 4 entries: - "King" => Table with 3 columns and 1 row:… - "Williams" => Table with 3 columns and 2 rows:… - "Brown" => Table with 3 columns and 1 row:… - "Smith" => Table with 3 columns and 3 rows:… +4-element Dictionaries.Dictionary{String, Table{@NamedTuple{firstname::String, lastname::String, age::Int64}, 1, @NamedTuple{firstname::Vector{String}, lastname::Vector{String}, age::Vector{Int64}}}}: + "Smith" │ @NamedTuple{firstname::String, lastname::String, age::Int64}[(fir… + "Williams" │ @NamedTuple{firstname::String, lastname::String, age::Int64}[(fir… + "Brown" │ @NamedTuple{firstname::String, lastname::String, age::Int64}[(fir… + "King" │ @NamedTuple{firstname::String, lastname::String, age::Int64}[(fir… ``` The results 
are only summarized above (for compactness), but can be easily accessed. -```julia +```jldoctest tutorial-grouping julia> families["Smith"] Table with 3 columns and 3 rows: firstname lastname age @@ -473,7 +481,7 @@ concatenate strings). Let's suppose we have a small database of customers, and the items they have ordered from an online store. -```julia +```jldoctest tutorial-joining julia> customers = Table(id = 1:3, name = ["Alice", "Bob", "Charlie"], address = ["12 Beach Street", "163 Moon Road", "6 George Street"]) Table with 3 columns and 3 rows: id name address @@ -497,7 +505,9 @@ this column to determine the `address` that we need to send the `items` to. The function expects two functions, to describe the joining key of the first table and the joining key of the second table. We will use `getproperty` to select the columns. -```julia +```jldoctest tutorial-joining +julia> using SplitApplyCombine + julia> innerjoin(@Compute($id), @Compute($customer_id), customers, orders) Table with 5 columns and 4 rows: id name address customer_id items @@ -518,5 +528,5 @@ types of joins are covered in later sections of this manual. Congratulations on completing the introductory tutorial. You should now know enough basics to get started with data analysis in Julia using *TypedTables.jl* and related packages. -The following setions of the manual demonstrate more advanced techniques, explain the +The following setions of the manual demonstrate more advanced techniques, explain the design of this (and related) packages, and provide an API reference.