-
Notifications
You must be signed in to change notification settings - Fork 47
Optimization: Use dynamic array while evaluating circuit #1344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1344 +/- ##
==========================================
+ Coverage 79.18% 79.22% +0.03%
==========================================
Files 111 111
Lines 30441 30496 +55
==========================================
+ Hits 24106 24159 +53
- Misses 6335 6337 +2 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just small comments.
This would definitely require another PR but since the use of dynamic arrays are increasing, I think it would be nice to wrap its operations inside functions. For example, to update an element, we need to offset the pointer with a gep
operation. Then we need to load the value and operate it. And, finally, we need to store it.
Perhaps having utility functions for these arrays would be better.
src/libfuncs/circuit.rs
Outdated
let circuit_length = block.const_int( | ||
context, | ||
location, | ||
(1 + circuit_info.n_inputs + circuit_info.values.len()) * u384_layout.pad_to_align().size(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be better to use the repeat_layout
function here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is equivalent. If you prefer I can change it to use repeat_layout
instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would use it to maintain a standard.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 4293c49.
src/libfuncs/circuit.rs
Outdated
let circuit_input_length = block.const_int( | ||
context, | ||
location, | ||
circuit_info.n_inputs * u384_layout.pad_to_align().size(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done 4293c49.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! One minor question.
// Unknown values are represented as None | ||
) -> Result<[&'this Block<'ctx>; 2]> { | ||
// Throughout the evaluation of the circuit we maintain an MLIR dynamic | ||
// array of gate values. It will contain 1 + N_INPUTS + N_GATES elements. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it need to have that length? I saw that the first element is just a 1 but as far as I could understand we never use that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't manually use it, but some gate offsets implicitly refer to the first gate (with index 0), and expect it to have a value of 1.
/// This function consumes the `circuit_data` input variable, and frees the underlying memory. | ||
/// | ||
/// The original Cairo hint evaluates all gates, even in case of failure. | ||
/// This implementation exits on first error, as there is no need for the partial outputs yet. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why there is no need for the partial outputs yet?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The CircuitPartialOutputs
type is a Cairo extern type, which means that its opaque and the user cannot access it fields directly. The only way to manipulate an extern type is through libfuncs.
There is currently no libfunc that receives that type. This means that from Cairo Native's point of view, its a noop type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, just one question
Both |
Pull request was converted to draft
The base branch was changed.
Right know, we keep a vector of MLIR values while evaluating a circuit (one value per gate). This implies that we are using a lot of MLIR values at the same time, which puts pressure on the LLVM optimization phases.
This PR changes it to use a dynamic array of gate values, which uses a single register (a pointer to the array).
Minor additional optimization: Instead of converting gate values from integer to struct form in the
circuit_eval
libfunc, they are lazily converted in theget_output
libfunc. This avoid converting gate values that will never be read.EDIT: It seems that the "minor additional optimization" actually was causing the performance improvement. The main optimization of this PR doesn't achieve the expected results (there is no improvement). An additional PR was created with just the minor optimization: #1348
Benchmark
These benchmarks were executed in a Macbook M4 Pro.
I compiled several circuit-heavy contract classes, and compared the compilation time before and this PR. The contract classes were compiled with optimization level
-O2
.I benchmarked transactions involving these contracts, with optimization level
-O3
. There seems to be no significant performance change.Correctness
I replayed 25K blocks (1728032 transactions) and encountered no state diffs.
Checklist