The "old version" appears to output finer-grained descriptions and potentially more segmented chunks (e.g., detailed scene breakdowns for figures like the cartoon character with a telescope), while the "new version" consolidates elements into broader summaries (e.g., a high-level overview of the fishbowl illustration).
How can I get results like those from the old version?
Old:

New:
