This repository was archived by the owner on Jul 19, 2023. It is now read-only.
-
-
Notifications
You must be signed in to change notification settings - Fork 72
This repository was archived by the owner on Jul 19, 2023. It is now read-only.
Performance issues with nonlinear_diffusion! #422
Copy link
Copy link
Open
Description
nonlinear_diffusion!
appears to be quite slow and unexpectedly allocates. MWE (including simple reference implementation):
using BenchmarkTools
using DiffEqOperators
const nknots = 100
# grid edges
x = LinRange(0.0,10.0,nknots+1) |> Vector
# grid cell centers
x_c = (x[1:end-1] .+ x[2:end]) ./ 2
# state variable
u = sin.(2π.*x_c)
# uniform grid spacing pretending to be non-uniform ;)
dx = diff(x_c)
# boundary padded dx
dx_pad = [dx[1], dx..., dx[end]]
# boundary padded u
q = [0.0,u...,0.0]
# conductivity (diffusion function)
# note: really this should have length(x) because it is defined on the
# boundaries of the grid cells, not on the centers? but nonlinear_diffusion!
# seems to want it to match the length of q.
k = 0.5.*ones(length(q))
# output
du = similar(u)
@benchmark nonlinear_diffusion!($du,1,1,2,$k,$q,$dx_pad,$nknots)
"""
nonlineardiffusion!(∂y::AbstractVector, x::AbstractVector, Δx::AbstractVector, k::AbstractVector, Δk::AbstractArray)
Second order Laplacian with non-linear diffusion function, k.
"""
function nonlineardiffusion!(∂y::AbstractVector, x::AbstractVector, Δx::AbstractVector, k::AbstractVector, Δk::AbstractArray)
@inbounds let x₁ = (@view x[1:end-2]),
x₂ = (@view x[2:end-1]),
x₃ = (@view x[3:end]),
k₁ = (@view k[1:end-1]),
k₂ = (@view k[2:end]),
Δx₁ = (@view Δx[1:end-1]),
Δx₂ = (@view Δx[2:end]);
@. ∂y = (k₂*(x₃ - x₂)/Δx₂ - k₁*(x₂-x₁)/Δx₁)/Δk
end
end
du2 = similar(u)
k_ = k[1:end-1] # nonlineardiffusion! expects k to be on the boundaries
dk = diff(x)
@benchmark nonlineardiffusion!($du2, $q, $dx_pad, $k_,$ dk)
@assert all(du .≈ du2)
Benchmark result from nonlinear_diffusion!
BechmarkTools.Trial: 6922 samples with 1 evaluations.
Range (min … max): 556.640 μs … 14.328 ms ┊ GC (min … max): 0.00% … 93.33%
Time (median): 644.482 μs ┊ GC (median): 0.00%
Time (mean ± σ): 716.439 μs ± 607.819 μs ┊ GC (mean ± σ): 5.61% ± 6.28%
▇ █▁
▁▂▇▃█▇██▆▇▄▆▅▄▅▅▄▃▃▂▂▂▂▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
557 μs Histogram: frequency by time 1.21 ms <
Memory estimate: 273.14 KiB, allocs estimate: 3766.
..and from reference implementation:
BechmarkTools.Trial: 10000 samples with 140 evaluations.
Range (min … max): 705.121 ns … 4.145 μs ┊ GC (min … max): 0.00% … 0.00%
Time (median): 730.871 ns ┊ GC (median): 0.00%
Time (mean ± σ): 789.835 ns ± 164.178 ns ┊ GC (mean ± σ): 0.00% ± 0.00%
▇▁█▅▄▁ ▂▁▂▁▁▁▁ ▁ ▁
█████████████████████▇████████▇██▇█▇▇█▇▇█▇█▆▆▇▅▆▆▆▆█▇▇▆▅▅▄▄▄▅ █
705 ns Histogram: log(frequency) by time 1.37 μs <
Memory estimate: 0 bytes, allocs estimate: 0.
edit: forgot to interpolate the arguments on the reference impl run; but it doesn't seem to make much of a difference
Metadata
Metadata
Assignees
Labels
No labels