-
Notifications
You must be signed in to change notification settings - Fork 130
Improved support for string_views for TOML and XML #445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hello @liuzicheng1987, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, Gemini here with a summary of this pull request. The main goal of this PR, as indicated by the title, is to improve the support for std::string_view
when reading (parsing) both TOML and XML data. This is achieved by modifying the existing read
functions that accept string_view
to use more efficient methods provided by the underlying parsing libraries (toml++
and pugixml
) that can directly handle buffer-like inputs or string views, avoiding unnecessary conversions to std::string
. Additionally, a redundant read
overload for std::string
in the TOML reader was removed.
Highlights
- Improved XML Parsing with string_view: The
read
function for XML that acceptsstd::string_view
now usespugi::xml_document::load_buffer
with thestring_view
's data and size, which is more efficient thanload_string
when dealing with potentially non-null-terminated string views. - C++17 Namespace Syntax: The XML reader file (
include/rfl/xml/read.hpp
) was updated to use the C++17 inline namespace syntaxnamespace rfl ::xml
.
Changelog
- include/rfl/toml/read.hpp
- Added
#include <string_view>
(Diff 1, line 6). - Removed the
read
template function overload that acceptedconst std::string&
(Diff 2, lines 28-33). - Modified the
read
template function overload acceptingconst std::string_view
to directly parse the input using::toml::parse(_toml_str)
instead of converting tostd::string
(Diff 2, lines 31-34).
- Added
- include/rfl/xml/read.hpp
- Updated namespace declaration from nested
namespace rfl { namespace xml {
to C++17 inlinenamespace rfl ::xml {
(Diff 1, line 15). - Modified the
read
template function acceptingconst std::string_view
to usedoc.load_buffer(_xml_str.data(), _xml_str.size())
instead ofdoc.load_string(_xml_str.data())
for more efficient handling of string views (Diff 2, line 34). - Updated the closing namespace declaration to match the C++17 syntax
} // namespace rfl::xml
(Diff 3, line 51).
- Updated namespace declaration from nested
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command>
or @gemini-code-assist <command>
. Below is a summary of the supported commands.
Feature | Command | Description |
---|---|---|
Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/
folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request makes excellent improvements to std::string_view
support for both TOML and XML reading.
Key highlights:
- TOML: The
read
function forstd::string_view
now directly passes the view totoml::parse
, avoiding an unnecessary and potentially costly conversion tostd::string
. This is a good performance optimization. - XML: The
read
function now usespugi::xml_document::load_buffer
instead ofload_string
when parsing from astd::string_view
. This is a crucial change for correctness and security, as it correctly handles non-null-terminated views. - Style: The adoption of C++17 nested namespace syntax (e.g.,
namespace rfl::xml
) is a nice touch for conciseness and readability.
The changes are clear, well-targeted, and enhance the library's efficiency and safety. Good work!
Summary of Findings
- TOML Parsing Efficiency with string_view: The TOML reading functionality was improved to parse
std::string_view
directly viatoml::parse
, eliminating an unnecessary intermediatestd::string
creation. This enhances performance by avoiding memory allocation and data copying. - XML Parsing Safety with string_view: The XML reading functionality was made safer by switching from
pugi::xml_document::load_string
toload_buffer
forstd::string_view
inputs. This prevents potential buffer over-reads if thestring_view
is not null-terminated, improving correctness and security. - Code Style Modernization: The use of C++17 nested namespace syntax (e.g.,
namespace rfl::xml
) was adopted, improving code conciseness. This was a good stylistic update.
Merge Readiness
The changes in this PR are solid improvements and address important aspects of using std::string_view
for parsing TOML and XML. The code quality is good, and the modifications enhance both performance and safety.
Based on this review, the PR appears ready for merging. As always, please ensure all CI checks pass. I am not authorized to approve pull requests, so this will require approval from another reviewer before merging.
@@ -32,7 +31,7 @@ auto read(const InputVarType& _var) { | |||
template <class T, class... Ps> | |||
Result<T> read(const std::string_view _xml_str) { | |||
pugi::xml_document doc; | |||
const auto result = doc.load_string(_xml_str.data()); | |||
const auto result = doc.load_buffer(_xml_str.data(), _xml_str.size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Excellent change switching from doc.load_string(_xml_str.data())
to doc.load_buffer(_xml_str.data(), _xml_str.size())
!
Using load_string
with _xml_str.data()
could be unsafe because std::string_view::data()
doesn't guarantee a null-terminated string. If the underlying character sequence isn't null-terminated, load_string
might read past the end of the buffer, leading to undefined behavior or security vulnerabilities.
load_buffer
, by taking an explicit size, correctly and safely handles std::string_view
data, regardless of null termination. This significantly improves the robustness and security of XML parsing from a std::string_view
. Great catch and important fix!
auto table = ::toml::parse(_toml_str); | ||
return read<T, Ps...>(&table); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change to directly use ::toml::parse(_toml_str)
with std::string_view
(and subsequently removing the std::string
overload that converted std::string_view
to std::string
) is a good efficiency improvement.
Previously, a std::string_view
might be converted to a std::string
before parsing (as seen in the removed overload return read<T, Ps...>(std::string(_toml_str));
), which involved an unnecessary allocation and copy.
By parsing the std::string_view
directly using toml::parse
(which supports std::string_view
), this overhead is avoided. This is a valuable optimization, especially for performance-sensitive scenarios. Well done!
my issue was not comprehensive, there is some other places like https://github.com/getml/reflect-cpp/blob/main/src%2Frfl%2Fyaml%2FWriter.cpp#L65 |
@ZXShady , the writer should be fine. It's not exposed to the end-users. These string views are actually derived from the field names of our structs. |
shouldn't it be in the internal namespace then? |
@ZXShady , I don't really think so. I actually do encourage people to write their own readers and writers and go at great length to explain how to do so. It's just that no one would ever directly use them. They would be exposed to the library-internal code. It's pretty similar to your custom enum ranges. Basically, the string_views in the writers are the field names derived from the structs. They originally come in the form of a StringLiteral: https://github.com/getml/reflect-cpp/blob/main/include/rfl/internal/StringLiteral.hpp As you can see, they are null-terminated, pretty much precisely for this reason. Where we can use the size, we do, but if we can't, it's not a big problem. I understand this is a potential footgun, but the reality is that yaml-cpp does not support string views. (Let alone any of the C libraries.) If we were to replace them with std::string, this would basically force us to always create a deep copy, which is something I would very much like to avoid. However, xml::read is exposed to the end user, of course, which is why your issue (#444) is a really great catch! Thank for this! |
https://github.com/jbeder/yaml-cpp/blob/master/include%2Fyaml-cpp%2Femitter.h#L76 has an overload. When will C++ get zstring_view accepted in the standard library |
@ZXShady , turns out it actually has a direct overload to std::string_view. That's great. https://github.com/jbeder/yaml-cpp/blob/master/include%2Fyaml-cpp%2Femitter.h#L210 |
@ZXShady , unfortunately, the string_view support does not appear to be included in the latest version of yaml-cpp. However, I was able to get string_view support in pugixml after upgrading to pugixml 1.15. |
No description provided.