-
Notifications
You must be signed in to change notification settings - Fork 229
feat(compass-collection): Process schema into format for LLM submission for Mock Data Generator – CLOUDP-337090 #7205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
packages/compass-collection/src/transform-schema-to-field-info.spec.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
|
||
const result = processSchema(schema); | ||
|
||
expect(result).to.deep.equal({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah this is a good example of how the dot notation combined with the constraints we've placed simplifies the parsing and surface area for the LLM to write to
{ | ||
name: 'Array', | ||
bsonType: 'Array', | ||
path: ['cube'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To confirm my understanding, the path stays as ['cube'] because the named field/path captures arrays-within-arrays at all levels (and until there's a document)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, from my understanding
|
||
const result = processSchema(schema); | ||
|
||
expect(result).to.deep.equal({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good example here and below 👍🏼
fieldProbability?: number, | ||
arraySampleValues?: unknown[] | ||
): void { | ||
if (type.name === 'Array' || type.bsonType === 'Array') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use the type-predicate validators like isArraySchemaType
in compass-schema/src/components/field.tsx
Type guards will give you the branch in a program enough type information to prevent casts like type as ArraySchemaType
See https://www.typescriptlang.org/docs/handbook/2/narrowing.html#using-type-predicates
|
||
const arrayPath = `${currentPath}[]`; | ||
const sampleValues = | ||
arraySampleValues || getSampleValues(arrayType).slice(0, 3); // Limit full-context array sample values to 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instead of the magic 3
, use a constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had some minor feedback; lgtm overall
@@ -30,9 +29,16 @@ export type SchemaAnalysisErrorState = { | |||
error: SchemaAnalysisError; | |||
}; | |||
|
|||
export interface FieldInfo { | |||
type: string; // MongoDB type (eg. String, Double, Array, Document) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could type
be a string union type?
@@ -66,7 +66,8 @@ | |||
"react": "^17.0.2", | |||
"react-redux": "^8.1.3", | |||
"redux": "^4.2.1", | |||
"redux-thunk": "^2.4.2" | |||
"redux-thunk": "^2.4.2", | |||
"bson": "^6.10.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since bson
is only used for its types, it should be a dev dependency so it doesn't get bundled to prod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I thought at first too. However, the dependency checker was still complaining even with using import type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding 'bson'
to the ignores
in packages/compass-collection/.depcheckrc
. But not sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
function isPrimitiveSchemaType(type: SchemaType): type is PrimitiveSchemaType { | ||
return ( | ||
!isConstantSchemaType(type) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] null
and undefined
classify as primitives too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird, let's go with consistency then. I disagree with that source's typing but it may be working around some type issue
Description
Converts MongoDB raw schema analysis object into a flat, LLM-friendly format for the Mock Data Generator feature.
Notation examples:
Checklist
Motivation and Context
The existing mongodb-schema structure is overly verbose for our feature and contains nested structures that are both difficult for LLMs to parse and do not correspond to the requirements of LLM structured outputs (eg. no optional fields).
Types of changes