Skip to content

Improved support for string_views for TOML and XML #445

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jun 8, 2025

Conversation

liuzicheng1987
Copy link
Contributor

No description provided.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @liuzicheng1987, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, Gemini here with a summary of this pull request. The main goal of this PR, as indicated by the title, is to improve the support for std::string_view when reading (parsing) both TOML and XML data. This is achieved by modifying the existing read functions that accept string_view to use more efficient methods provided by the underlying parsing libraries (toml++ and pugixml) that can directly handle buffer-like inputs or string views, avoiding unnecessary conversions to std::string. Additionally, a redundant read overload for std::string in the TOML reader was removed.

Highlights

  • Improved XML Parsing with string_view: The read function for XML that accepts std::string_view now uses pugi::xml_document::load_buffer with the string_view's data and size, which is more efficient than load_string when dealing with potentially non-null-terminated string views.
  • C++17 Namespace Syntax: The XML reader file (include/rfl/xml/read.hpp) was updated to use the C++17 inline namespace syntax namespace rfl ::xml.

Changelog

  • include/rfl/toml/read.hpp
    • Added #include <string_view> (Diff 1, line 6).
    • Removed the read template function overload that accepted const std::string& (Diff 2, lines 28-33).
    • Modified the read template function overload accepting const std::string_view to directly parse the input using ::toml::parse(_toml_str) instead of converting to std::string (Diff 2, lines 31-34).
  • include/rfl/xml/read.hpp
    • Updated namespace declaration from nested namespace rfl { namespace xml { to C++17 inline namespace rfl ::xml { (Diff 1, line 15).
    • Modified the read template function accepting const std::string_view to use doc.load_buffer(_xml_str.data(), _xml_str.size()) instead of doc.load_string(_xml_str.data()) for more efficient handling of string views (Diff 2, line 34).
    • Updated the closing namespace declaration to match the C++17 syntax } // namespace rfl::xml (Diff 3, line 51).
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request makes excellent improvements to std::string_view support for both TOML and XML reading.

Key highlights:

  • TOML: The read function for std::string_view now directly passes the view to toml::parse, avoiding an unnecessary and potentially costly conversion to std::string. This is a good performance optimization.
  • XML: The read function now uses pugi::xml_document::load_buffer instead of load_string when parsing from a std::string_view. This is a crucial change for correctness and security, as it correctly handles non-null-terminated views.
  • Style: The adoption of C++17 nested namespace syntax (e.g., namespace rfl::xml) is a nice touch for conciseness and readability.

The changes are clear, well-targeted, and enhance the library's efficiency and safety. Good work!

Summary of Findings

  • TOML Parsing Efficiency with string_view: The TOML reading functionality was improved to parse std::string_view directly via toml::parse, eliminating an unnecessary intermediate std::string creation. This enhances performance by avoiding memory allocation and data copying.
  • XML Parsing Safety with string_view: The XML reading functionality was made safer by switching from pugi::xml_document::load_string to load_buffer for std::string_view inputs. This prevents potential buffer over-reads if the string_view is not null-terminated, improving correctness and security.
  • Code Style Modernization: The use of C++17 nested namespace syntax (e.g., namespace rfl::xml) was adopted, improving code conciseness. This was a good stylistic update.

Merge Readiness

The changes in this PR are solid improvements and address important aspects of using std::string_view for parsing TOML and XML. The code quality is good, and the modifications enhance both performance and safety.

Based on this review, the PR appears ready for merging. As always, please ensure all CI checks pass. I am not authorized to approve pull requests, so this will require approval from another reviewer before merging.

@@ -32,7 +31,7 @@ auto read(const InputVarType& _var) {
template <class T, class... Ps>
Result<T> read(const std::string_view _xml_str) {
pugi::xml_document doc;
const auto result = doc.load_string(_xml_str.data());
const auto result = doc.load_buffer(_xml_str.data(), _xml_str.size());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Excellent change switching from doc.load_string(_xml_str.data()) to doc.load_buffer(_xml_str.data(), _xml_str.size())!

Using load_string with _xml_str.data() could be unsafe because std::string_view::data() doesn't guarantee a null-terminated string. If the underlying character sequence isn't null-terminated, load_string might read past the end of the buffer, leading to undefined behavior or security vulnerabilities.

load_buffer, by taking an explicit size, correctly and safely handles std::string_view data, regardless of null termination. This significantly improves the robustness and security of XML parsing from a std::string_view. Great catch and important fix!

Comment on lines +33 to +34
auto table = ::toml::parse(_toml_str);
return read<T, Ps...>(&table);

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

This change to directly use ::toml::parse(_toml_str) with std::string_view (and subsequently removing the std::string overload that converted std::string_view to std::string) is a good efficiency improvement.

Previously, a std::string_view might be converted to a std::string before parsing (as seen in the removed overload return read<T, Ps...>(std::string(_toml_str));), which involved an unnecessary allocation and copy.
By parsing the std::string_view directly using toml::parse (which supports std::string_view), this overhead is avoided. This is a valuable optimization, especially for performance-sensitive scenarios. Well done!

@ZXShady
Copy link
Contributor

ZXShady commented Jun 8, 2025

my issue was not comprehensive,

there is some other places like

https://github.com/getml/reflect-cpp/blob/main/src%2Frfl%2Fyaml%2FWriter.cpp#L65

@liuzicheng1987
Copy link
Contributor Author

@ZXShady , the writer should be fine. It's not exposed to the end-users. These string views are actually derived from the field names of our structs.

@ZXShady
Copy link
Contributor

ZXShady commented Jun 8, 2025

@ZXShady , the writer should be fine. It's not exposed to the end-users. These string views are actually derived from the field names of our structs.

shouldn't it be in the internal namespace then?

@liuzicheng1987
Copy link
Contributor Author

@ZXShady , I don't really think so. I actually do encourage people to write their own readers and writers and go at great length to explain how to do so. It's just that no one would ever directly use them. They would be exposed to the library-internal code. It's pretty similar to your custom enum ranges.

Basically, the string_views in the writers are the field names derived from the structs. They originally come in the form of a StringLiteral:

https://github.com/getml/reflect-cpp/blob/main/include/rfl/internal/StringLiteral.hpp

As you can see, they are null-terminated, pretty much precisely for this reason. Where we can use the size, we do, but if we can't, it's not a big problem. I understand this is a potential footgun, but the reality is that yaml-cpp does not support string views. (Let alone any of the C libraries.) If we were to replace them with std::string, this would basically force us to always create a deep copy, which is something I would very much like to avoid.

However, xml::read is exposed to the end user, of course, which is why your issue (#444) is a really great catch! Thank for this!

@ZXShady
Copy link
Contributor

ZXShady commented Jun 8, 2025

@ZXShady , I don't really think so. I actually do encourage people to write their own readers and writers and go at great length to explain how to do so. It's just that no one would ever directly use them. They would be exposed to the library-internal code. It's pretty similar to your custom enum ranges.

Basically, the string_views in the writers are the field names derived from the structs. They originally come in the form of a StringLiteral:

https://github.com/getml/reflect-cpp/blob/main/include/rfl/internal/StringLiteral.hpp

As you can see, they are null-terminated, pretty much precisely for this reason. Where we can use the size, we do, but if we can't, it's not a big problem. I understand this is a potential footgun, but the reality is that yaml-cpp does not support string views. (Let alone any of the C libraries.) If we were to replace them with std::string, this would basically force us to always create a deep copy, which is something I would very much like to avoid.

https://github.com/jbeder/yaml-cpp/blob/master/include%2Fyaml-cpp%2Femitter.h#L76

has an overload.
But I see what you mean the strings are the Literals therefore always null terminated.

When will C++ get zstring_view accepted in the standard library

@liuzicheng1987
Copy link
Contributor Author

@ZXShady , turns out it actually has a direct overload to std::string_view. That's great.

https://github.com/jbeder/yaml-cpp/blob/master/include%2Fyaml-cpp%2Femitter.h#L210

@liuzicheng1987
Copy link
Contributor Author

@ZXShady , unfortunately, the string_view support does not appear to be included in the latest version of yaml-cpp. However, I was able to get string_view support in pugixml after upgrading to pugixml 1.15.

@liuzicheng1987 liuzicheng1987 merged commit ec6ab3b into main Jun 8, 2025
26 checks passed
@liuzicheng1987 liuzicheng1987 deleted the f/string_views branch June 8, 2025 19:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants