Skip to content
This repository was archived by the owner on Jul 19, 2023. It is now read-only.
This repository was archived by the owner on Jul 19, 2023. It is now read-only.

Performance issues with nonlinear_diffusion! #422

@bgroenks96

Description

@bgroenks96

nonlinear_diffusion! appears to be quite slow and unexpectedly allocates. MWE (including simple reference implementation):

using BenchmarkTools
using DiffEqOperators

const nknots = 100
# grid edges
x = LinRange(0.0,10.0,nknots+1) |> Vector
# grid cell centers
x_c = (x[1:end-1] .+ x[2:end]) ./ 2
# state variable
u = sin.(2π.*x_c)
# uniform grid spacing pretending to be non-uniform ;)
dx = diff(x_c)
# boundary padded dx
dx_pad = [dx[1], dx..., dx[end]]
# boundary padded u
q = [0.0,u...,0.0]
# conductivity (diffusion function)
# note: really this should have length(x) because it is defined on the
# boundaries of the grid cells, not on the centers? but nonlinear_diffusion!
# seems to want it to match the length of q.
k = 0.5.*ones(length(q))
# output
du = similar(u)
@benchmark nonlinear_diffusion!($du,1,1,2,$k,$q,$dx_pad,$nknots)

"""
nonlineardiffusion!(∂y::AbstractVector, x::AbstractVector, Δx::AbstractVector, k::AbstractVector, Δk::AbstractArray)

Second order Laplacian with non-linear diffusion function, k.
"""
function nonlineardiffusion!(∂y::AbstractVector, x::AbstractVector, Δx::AbstractVector, k::AbstractVector, Δk::AbstractArray)
    @inbounds let x₁ = (@view x[1:end-2]),
        x₂ = (@view x[2:end-1]),
        x₃ = (@view x[3:end]),
        k₁ = (@view k[1:end-1]),
        k₂ = (@view k[2:end]),
        Δx₁ = (@view Δx[1:end-1]),
        Δx₂ = (@view Δx[2:end]);
        @. ∂y = (k₂*(x₃ - x₂)/Δx₂ - k₁*(x₂-x₁)/Δx₁)/Δk
    end
end

du2 = similar(u)
k_ = k[1:end-1] # nonlineardiffusion! expects k to be on the boundaries
dk = diff(x)
@benchmark nonlineardiffusion!($du2, $q, $dx_pad, $k_,$ dk)

@assert all(du .≈ du2)

Benchmark result from nonlinear_diffusion!

BechmarkTools.Trial: 6922 samples with 1 evaluations.
 Range (min … max):  556.640 μs …  14.328 ms  ┊ GC (min … max): 0.00% … 93.33%
 Time  (median):     644.482 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   716.439 μs ± 607.819 μs  ┊ GC (mean ± σ):  5.61% ±  6.28%

      ▇ █▁                                                       
  ▁▂▇▃█▇██▆▇▄▆▅▄▅▅▄▃▃▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  557 μs           Histogram: frequency by time         1.21 ms <

 Memory estimate: 273.14 KiB, allocs estimate: 3766.

..and from reference implementation:

BechmarkTools.Trial: 10000 samples with 140 evaluations.
 Range (min … max):  705.121 ns …   4.145 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     730.871 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   789.835 ns ± 164.178 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▇▁█▅▄▁    ▂▁▂▁▁▁▁     ▁                                       ▁
  █████████████████████▇████████▇██▇█▇▇█▇▇█▇█▆▆▇▅▆▆▆▆█▇▇▆▅▅▄▄▄▅ █
  705 ns        Histogram: log(frequency) by time       1.37 μs <

 Memory estimate: 0 bytes, allocs estimate: 0.

edit: forgot to interpolate the arguments on the reference impl run; but it doesn't seem to make much of a difference

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions