Skip to content

Structured Ouputs: Draft Schema Validator #1

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: next
Choose a base branch
from

Conversation

damo
Copy link
Owner

@damo damo commented Apr 24, 2025

No description provided.

Comment on lines 14 to 16
// TODO: Is there any configuration required here (e.g., for Kotlin classes)?
// The SDK `jsonMapper()` requires that all fields of classes be marked with
// "@JsonProperty", which is not desirable, as it impedes usability.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably just need these modules:

.addModule(kotlinModule())
.addModule(Jdk8Module())
.addModule(JavaTimeModule())

import com.github.victools.jsonschema.module.jackson.JacksonModule
import com.openai.models.ResponseFormatJsonSchema

internal class StructuredOutputs private constructor() {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need a class here. We can just make the functions in this class internal top-level functions

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think that was a hang-over from the earlier experiment where these functions were called from the Java example code. Putting them in a class made the access more natural than if they were top-level functions (no "Kt" at the end of the "class" name). Unless we want these to be part of the public API, they can be promoted to top-level functions, sure.

Comment on lines 30 to 47
val configBuilder =
SchemaGeneratorConfigBuilder(
com.github.victools.jsonschema.generator.SchemaVersion.DRAFT_2020_12,
OptionPreset.PLAIN_JSON,
)
// Add `"additionalProperties" : false` to all object schemas (see OpenAI).
.with(Option.FORBIDDEN_ADDITIONAL_PROPERTIES_BY_DEFAULT)
// Use `JacksonModule` to support the use of Jackson annotations to set property
// and class names and descriptions and to mark fields with `@JsonIgnore`.
.with(JacksonModule())

configBuilder
.forFields()
// For OpenAI schemas, _all_ properties _must_ be required. Override the
// interpretation of the Jackson `required` parameter to the `@JsonProperty`
// annotation: it will always be assumed to be `true`, even if explicitly `false`
// and even if there is no `@JsonProperty` annotation present.
.withRequiredCheck { true }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason not to inline this below?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I don't know what you mean.

If you mean "below" as in in the fromJson() function, then it is because extractSchema returns a JsonNode that is the input to the JsonSchemaValidator and is the basis of the unit testing approach. fromJson returns an instance of Class<T>, which is not useful in that respect.

If you mean "below" as in the next line where build() is called, then forFields() returns a type that does not have a build() function in its interface.

I'm guessing you might mean something else, though.

Comment on lines 52 to 91
fun <T> fromJson(json: String, type: Class<T>): T {
try {
return MAPPER.readValue(json, type)
} catch (e: Exception) {
// TODO: Might not want to include the JSON in the error message; it might contain
// sensitive data.
throw RuntimeException("Error parsing JSON: $json", e)
}
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
fun <T> fromJson(json: String, type: Class<T>): T {
try {
return MAPPER.readValue(json, type)
} catch (e: Exception) {
// TODO: Might not want to include the JSON in the error message; it might contain
// sensitive data.
throw RuntimeException("Error parsing JSON: $json", e)
}
}
}
fun <T> fromJson(json: String, type: Class<T>): T =
try {
MAPPER.readValue(json, type)
} catch (e: Exception) {
// TODO: Might not want to include the JSON in the error message; it might contain
// sensitive data.
throw RuntimeException("Error parsing JSON: $json", e)
}
}

} catch (e: Exception) {
// TODO: Might not want to include the JSON in the error message; it might contain
// sensitive data.
throw RuntimeException("Error parsing JSON: $json", e)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be OpenAIInvalidDataException

Comment on lines 47 to 49
// FIXME: Pretty sure this is wrong!
/** @see ChatCompletion._choices */
fun _choices(): JsonField<List<ChatCompletion.Choice>> = chatCompletion._choices()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you know what, I think we want the following above:

val choices by lazy {
    chatCompletion._choices().map { choices ->
        choices.map { Choice<T>(responseFormat, it) }
    }
}

Which will make the choices variable contain a JsonField<List<Choice<T>>>

Then we can do:

fun _choices(): JsonField<List<Choice<T>>> = choices

fun choices(): List<Choice<T>> = choices.getRequired("choices")

Does that make sense?

Similarly for the other structured classes

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I figured you'd figure this out with less figuring than me; you have more of a feel for the JsonField stuff.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we be calling that val _choices ... for consistency with the naming convention used for the function names?

Comment on lines 114 to 119
// TODO: This is probably wrong. Do we need this function?
/** @see ChatCompletion.Choice.validate */
fun validate(): ChatCompletion.Choice = choice.validate()

/** @see ChatCompletion.Choice.isValid */
fun isValid(): Boolean = choice.isValid()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the following would be correct:

fun validate() = apply {
    message().validate()
    choice.validate()
}

fun isValid(): Boolean =
    try {
        validate()
        true
    } catch (e: OpenAIInvalidDataException) {
        false
    }

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that looks about right. It would have been nice not to have any code that wasn't just a direct delegation to wrapped objects, but it looks like that is unavoidable here.

Comment on lines 27 to 33
@JvmSynthetic
internal fun from(
structuredChatCompletionCreateParams: StructuredChatCompletionCreateParams<T>
) = apply {
responseFormat = structuredChatCompletionCreateParams.responseFormat
paramsBuilder = structuredChatCompletionCreateParams.params.toBuilder()
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason this exists in other classes is to offer a toBuilder() method. Did we wanna do that here?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, I don't think so. The toBuilder() gets a bit complicated here. We have to call toBuilder() on wrapped objects, etc. and it all looks like it would massively increase the amount of testing required. I'd be in favor of dropping from() and toBuilder() unless we hit a use case that actually requires them. The thinner we keep these delegating Structured... classes, the easier they will be to maintain.

Comment on lines 389 to 415
// TODO: Confirm that these are not required/desired on this Builder class.
// /** @see ChatCompletionCreateParams.Builder.responseFormat */
// fun responseFormat(responseFormat: ChatCompletionCreateParams.ResponseFormat) =
// apply {
// paramsBuilder.responseFormat(responseFormat)
// }
//
// /** @see ChatCompletionCreateParams.Builder.responseFormat */
// fun responseFormat(responseFormat:
// JsonField<ChatCompletionCreateParams.ResponseFormat>) =
// apply {
// paramsBuilder.responseFormat(responseFormat)
// }
//
// /** @see ChatCompletionCreateParams.Builder.responseFormat */
// fun responseFormat(text: ResponseFormatText) = apply {
// paramsBuilder.responseFormat(text) }
//
// /** @see ChatCompletionCreateParams.Builder.responseFormat */
// fun responseFormat(jsonSchema: ResponseFormatJsonSchema) = apply {
// paramsBuilder.responseFormat(jsonSchema)
// }
//
// /** @see ChatCompletionCreateParams.Builder.responseFormat */
// fun responseFormat(jsonObject: ResponseFormatJsonObject) = apply {
// paramsBuilder.responseFormat(jsonObject)
// }

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think you're right that e don't need these here

* .responseFormat()
* ```
*
* @throws IllegalStateException if the required field is unset.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* @throws IllegalStateException if the required field is unset.
* @throws IllegalStateException if any required field is unset.

@damo damo force-pushed the damo/structured-outputs branch 2 times, most recently from ba504f0 to ef0e041 Compare May 5, 2025 15:33
@damo damo force-pushed the damo/structured-outputs branch from 8c0d60a to 67381ff Compare May 7, 2025 10:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants