[SystemDS-#3524] Multi-threading of transformdecode/[SystemDS-#3521] Improved Feature Transformations #2275
…e done, test passes for Bin
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2275      +/-   ##
============================================
- Coverage     72.58%   72.55%    -0.04%
- Complexity    46221    46275      +54
============================================
  Files          1489     1496       +7
  Lines        174193   174561     +368
  Branches      34182    34232      +50
============================================
+ Hits         126434   126646     +212
- Misses        38196    38347     +151
- Partials       9563     9568       +5
Thank you for the patch. I will take a look into this next week @Isso-W.
small error in test array
# Conflicts: # src/test/java/org/apache/sysds/test/functions/transform/ColumnDecoderMixedMethodsTest.java
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectOutput;
import java.util.*;
Avoid wildcard imports. Import only the required classes.
for( int j=0; j<_colList.length; j++ ) {
    int colID = _colList[j];
    double val = UtilFunctions.objectToDouble(
        out.getSchema()[colID-1], out.get(i, colID-1));
    long key = UtilFunctions.toLong(val);
    out.set(i, colID-1, getRcMapValue(j, key));
}
Why are you iterating all the columns? A column decoder should be called for each column.
if( _onOut ) { //recode on output (after dummy)
    for( int i=rl; i<ru; i++ ) {
        for( int j=0; j<_colList.length; j++ ) {
            int colID = _colList[j];
            double val = UtilFunctions.objectToDouble(
                out.getSchema()[colID-1], out.get(i, colID-1));

Remove these empty lines that you added.
protected int[] _colList;
protected String[] _colnames = null;
protected ColumnDecoder(ValueType[] schema, int[] colList) {
    _schema = schema;
    _colList = colList;
}
Why a column list? A column encoder should work on a single column.
long b1 = System.nanoTime();
out.ensureAllocatedColumns(in.getNumRows());

final int outColIndex = _colList[0] - 1;
Why is outColIndex always _colList[0] - 1?
for (int j = 0; j < _colList.length; j++) {
    double val = in.get(i, j);
    if (!Double.isNaN(val)) {
        int key = (int) Math.round(val);
        double bmin = _binMins[j][key - 1];
        double bmax = _binMaxs[j][key - 1];
        double oval = bmin + (bmax - bmin) / 2 + (val - key) * (bmax - bmin);
        out.getColumn(_colList[j] - 1).set(i, oval);
    } else {
        out.getColumn(_colList[j] - 1).set(i, val);
    }
}
I don't understand why you are iterating all columns in a column decoder.
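For readers following the thread, here is a minimal sketch of what a bin decoder restricted to a single column could look like. It reuses the decoding formula from the snippet above, but the class name SingleColumnBinDecoder, its constructor, and the plain double[] column representation are hypothetical and are not part of the SystemDS API.

```java
// Hypothetical sketch, not the SystemDS API: a bin decoder that owns exactly one column.
class SingleColumnBinDecoder {
    private final double[] binMins; // lower boundary of each bin (bin keys are 1-based)
    private final double[] binMaxs; // upper boundary of each bin

    SingleColumnBinDecoder(double[] binMins, double[] binMaxs) {
        if (binMins.length != binMaxs.length)
            throw new IllegalArgumentException("bin boundary arrays differ in length");
        this.binMins = binMins.clone();
        this.binMaxs = binMaxs.clone();
    }

    // Decodes one encoded value with the same formula as the reviewed snippet.
    double decode(double val) {
        if (Double.isNaN(val))
            return val; // missing values pass through unchanged
        int key = (int) Math.round(val);
        double bmin = binMins[key - 1];
        double bmax = binMaxs[key - 1];
        return bmin + (bmax - bmin) / 2 + (val - key) * (bmax - bmin);
    }

    // Decodes all rows of the single column this decoder is responsible for.
    double[] decodeColumn(double[] encoded) {
        double[] out = new double[encoded.length];
        for (int i = 0; i < encoded.length; i++)
            out[i] = decode(encoded[i]);
        return out;
    }
}
```

With two bins [0, 10] and [10, 20], the bin codes 1 and 2 decode to the bin midpoints 5.0 and 15.0. There is no loop over columns: the caller creates one decoder per column.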
for( int j=0; j<_colList.length; j++ )
    for( int k=_clPos[j]; k<_cuPos[j]; k++ )
        if( in.get(i, k-1) != 0 ) {
            int col = _colList[j] - 1;
            Object val = UtilFunctions.doubleToObject(out.getSchema()[col], k-_clPos[j]+1);
            synchronized(out) { out.set(i, col, val); }
        }
}
A column decoder should work on a single column that is provided.
@Isso-W, please address the comments. And fix the tests as well. Your tests are failing in the transform package, which you should be able to reproduce.
…erly on Bin and Pass-through.
# Conflicts: # src/test/java/org/apache/sysds/test/functions/transform/ColumnDecoderMixedMethodsTest.java
The latest changes look good @Isso-W.
@Isso-W, can you please post your plots of FTBench (decoding) here for others to see?
Sure @phaniarnab, here is the plot. (Three plot images were attached here.) I also created a new test that runs part of FTBench T9 through DML. I can add that test, together with the corresponding JSON, to the PR to make the plot reproducible. By the way, will this PR be merged into main?
….csv, flight.csv is too large to push
This pull request introduces a new framework for column decoding in Apache SystemDS, with the addition of a base class ColumnDecoder and several specialized implementations (ColumnDecoderBin, ColumnDecoderComposite, ColumnDecoderRecode, ColumnDecoderPassthrough and ColumnDecoderDummycode). These changes provide a flexible and extensible structure for decoding encoded data in matrix-to-frame transformations. Below are the most important changes grouped by theme:

Core Framework for Column Decoding
Adds ColumnDecoder as an abstract base class to define the structure for decoding operations, including methods for decoding (columnDecode), handling sub-range decoding (subRangeDecoder), and metadata initialization (initMetaData). It also implements Externalizable for efficient serialization.

Current Issues
ColumnDecoderDummycode is not supported yet, nor is the test case ColumnDecoderMixedMethodsTest.
DecoderDummyCode does not work together with DecoderRecode.
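To illustrate how per-column decoders relate to the multi-threading named in the PR title, here is a minimal sketch that decodes each column in its own task. It is an assumption-laden illustration, not the code in this PR: the class ParallelColumnDecodeSketch, the method decodeAll, and the use of DoubleUnaryOperator as a stand-in for a column decoder are all hypothetical, and columns are represented as plain double[] arrays rather than SystemDS frames.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch, not the SystemDS API: one task per column, no shared mutable state.
class ParallelColumnDecodeSketch {
    // in[c] holds the encoded values of column c; decoders[c] decodes values of column c.
    static double[][] decodeAll(double[][] in, DoubleUnaryOperator[] decoders, int numThreads) {
        if (in.length != decoders.length)
            throw new IllegalArgumentException("need exactly one decoder per column");
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<double[]>> tasks = new ArrayList<>();
            for (int c = 0; c < in.length; c++) {
                final int col = c;
                // Each task writes only to its own output array, so no locking is needed.
                tasks.add(pool.submit(() -> {
                    double[] out = new double[in[col].length];
                    for (int i = 0; i < out.length; i++)
                        out[i] = decoders[col].applyAsDouble(in[col][i]);
                    return out;
                }));
            }
            double[][] result = new double[in.length][];
            for (int c = 0; c < in.length; c++)
                result[c] = tasks.get(c).get(); // wait for column c to finish
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("decoding interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("column decoding failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because every task owns one output column, this layout avoids a lock on the shared output object. Decoders whose input and output columns do not map one to one would need extra coordination that this sketch does not cover.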