GSoC: bowtie-perf: a Performance Tester for JSON Schema implementations #605
Comments
Hi @Julian, I would love to work on this as part of GSoC 2024. I have already started familiarising myself with Bowtie and would love to discuss this project further. Thanks!
Great! This one is quite self-contained -- if you're already familiar with how Bowtie works, my best recommendation here is to start researching ways of doing "generic" performance monitoring of applications. Specifically,
Thanks @Julian, I'll start researching and learning about it right away.
Hey @Julian, I am Ashmit Jagtap from the Indian Institute of Information Technology, Pune. I would love to work on this project under GSoC 2024. I have been contributing to AsyncAPI as well as trying to make some PRs in the bowtie project. I will start researching the things that we may need for the project and keep you posted on any queries/improvements that I may have.
Hi @Julian 👋 I also wanted to express my interest in working on this project. I've been learning a lot about
Thanks a lot for joining the JSON Schema org for this edition of GSoC! Qualification tasks will be published as comments on the project ideas by Thursday/Friday of this week. In addition, I'd like to invite you to an office hours session this Thursday at 18:30 UTC, where we'll present the ideas and the relevant dates to consider at this stage of the program. Please use this link to join the session: See you there!
Hi!
Yep exactly!
Timing is a first obvious one. Number of instructions would certainly also be interesting. And some other even more basic ones are in bowtie-json-schema/bowtie#904!
@Julian - thanks for the info! I've been thinking about this a bit more. In my mind there are two approaches that stick out:
In general, additional metrics apart from timing seem pretty complicated to collect due to the variations across implementations, but timing data seems within reach with either approach. I'm interested in your thoughts, or whether you have any additional high-level ideas for exploration. Thanks!
I like that breakdown a lot! I've definitely got more thoughts, so I'll have to come back and elaborate further.

On the async bit -- I just want to mention two possible mitigating factors. One (which I never went back and edited into the issue you saw) is that we can definitely run only one implementation at a time -- so as long as we structure the actual timing collection correctly, we could essentially run each implementation one at a time and see whether that helps improve our accuracy. And second, I still hope to do some internal refactoring to make how Bowtie runs implementations more general, so we should keep everything on the table there if that turns out to be helpful for this functionality as well!

I also would stress the benchmark part a bit. I don't think it's critical that we have a gigantic suite, but I think a good early part of this will be identifying, say, 10 representative benchmarks which we use to kick the tires and find interesting performance-adjacent learnings. Here's a recent example that I added as a benchmark to my specific implementation: https://github.com/python-jsonschema/jsonschema/blob/0bc3b1fe8770195eb9f7e5c0d7d84c7007b9a2a5/jsonschema/benchmarks/useless_keywords.py -- not sure how much sense it will make in isolation, but it's basically "imagine a giant JSON Schema where most of the keywords are useless -- does the implementation understand that it should immediately throw all those keywords away, or does it continuously re-process them every time it does validation?" (see the sketch below).

Will come back with more specific thoughts, but from clicking (though not yet reading) your links, you're on the right track for sure, so well done already.
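To make the "useless keywords" idea concrete, here is a minimal sketch in the spirit of the linked benchmark -- not the actual file. It assumes the python-jsonschema library's Draft202012Validator and the stdlib timeit; the keyword names, counts, and iteration numbers are made up for illustration.

```python
import timeit

from jsonschema import Draft202012Validator

# A schema where almost every keyword is unknown noise, plus one real
# assertion.  An implementation that preprocesses the schema should
# discard the noise once; a naive one re-walks it on every validation.
schema = {f"useless-{i}": "noise" for i in range(3000)}
schema["type"] = "integer"

validator = Draft202012Validator(schema)

# Time repeated validations of a trivially valid instance.
elapsed = timeit.timeit(lambda: validator.validate(37), number=1000)
print(f"{elapsed / 1000:.6f} seconds per validation")
```

A fast implementation's per-validation time here should be close to what it takes to validate against a bare `{"type": "integer"}` schema; a large gap suggests the noise keywords are being re-processed on every call.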
@Julian, I have started to look into ways of performance profiling and monitoring for applications, and these are some of my initial understandings:
Would love to know your thoughts on the same. |
Great! I would definitely focus on the second kind first -- i.e. performance profiling that is more language-agnostic even if less granular. We can always get to doing both. It's true that the story there is better on Linux than on other OSes; I think that's probably fine, certainly to start, as Linux is generally the main OS where people run "real applications", which is where someone is more likely to care about performance. But there are some even cruder things we can do which are also OS-agnostic, like pure timing numbers, so that's almost certainly the easiest first target regardless.
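For the Linux-specific end of that spectrum, one language-agnostic option is to shell out to the perf tool. Below is a minimal sketch under stated assumptions: a Linux host with perf installed and permission to read hardware counters; the function name and the idea of wrapping an arbitrary command are illustrative, not part of Bowtie.

```python
import subprocess

def count_instructions(cmd: list[str]) -> int | None:
    """Run ``cmd`` under ``perf stat`` and return retired instructions.

    Sketch only: assumes Linux with perf installed and a permissive
    /proc/sys/kernel/perf_event_paranoid setting.
    """
    result = subprocess.run(
        ["perf", "stat", "-e", "instructions", "-x", ",", *cmd],
        capture_output=True,
        text=True,
    )
    # With -x, perf writes CSV to stderr: value,unit,event-name,...
    for line in result.stderr.splitlines():
        fields = line.split(",")
        if len(fields) >= 3 and fields[2].startswith("instructions"):
            return int(fields[0]) if fields[0].isdigit() else None
    return None
```

Note that wrapping, say, a whole `docker run` invocation this way would count container startup too, which is one reason pure timing around just the validation work is the cruder-but-easier first target.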
@Julian, on the basis of my learnings, there are two broad ways in which we could implement time measurements while running implementations:

1. Using an existing profiling library such as cProfile or timeit (both ship with CPython).
2. Implementing it ourselves using the basics: Python provides a high-precision clock, with which we can start measuring time only after a container has been started, ignoring time spent starting or closing the containers (a sketch follows below).
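A minimal sketch of that second option, using the stdlib time.perf_counter; the `harness` object and its `start`/`run_case`/`stop` methods are hypothetical stand-ins for Bowtie's container handling, not its real API:

```python
import time

def time_validation(harness, case, runs: int = 100) -> float:
    """Average seconds per run, excluding container startup/teardown.

    ``harness`` and ``case`` are hypothetical stand-ins; this sketches
    the approach, not Bowtie's actual interface.
    """
    harness.start()                  # startup cost paid before timing
    start = time.perf_counter()      # monotonic, high-resolution clock
    for _ in range(runs):
        harness.run_case(case)
    elapsed = time.perf_counter() - start
    harness.stop()                   # teardown likewise excluded
    return elapsed / runs
```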
References:
Do you have some additional insights regarding the same? |
🚩 IMPORTANT INSTRUCTIONS REGARDING HOW AND WHERE TO SUBMIT YOUR APPLICATION 🚩 Please join this discussion in the JSON Schema Slack to get the latest, very important details on how best to submit your application to JSON Schema. See communication here.
Hello! 👋 This issue has been automatically marked as stale due to inactivity 😴 It will be closed in 180 days if no further activity occurs. To keep it active, please add a comment with more details. There can be many reasons why a specific issue has no activity. The most probable cause is a lack of time, not a lack of interest. Let us figure out together how to push this issue forward. Connect with us through our Slack channel: https://json-schema.org/slack Thank you for your patience ❤️
@Rahul-web-hub just FYI this project was done as part of GSoC'24 already. |
Project title

bowtie-perf: a Performance Tester for JSON Schema implementations

Brief Description
Bowtie is a tool which provides "universal access" to JSON Schema implementations, giving JSON Schema users a way to use any implementation supported by Bowtie.
A primary use case for Bowtie was to allow comparing implementations to one another, which can be seen on Bowtie's website and which gives anyone a way to see how correct a JSON Schema implementation is relative to the JSON Schema specification.
But it can do more! Let's write a performance tester using Bowtie, giving us a way to also compare the performance of implementations by timing how long they take to do their work. This information could be used for performance optimization, or as a second dimension that users could weigh when comparing implementations with one another.
Refs: bowtie-json-schema/bowtie#35
Expected Outcomes

A bowtie perf command which reports on implementation performance when executing its validation

Skills Required
Mentors
@Julian
Expected Difficulty
Hard
Expected Time Commitment
350 hours