Create an EncodingConverter class with both iconv and icu support. #138893
@llvm/pr-subscribers-llvm-support
Author: Abhina Sree (abhina-sree)
Changes
This patch adds a wrapper class called CharSetConverter for ConverterEBCDIC. This class is then extended to support the ICU library or the iconv library. The ICU library currently takes priority over the iconv library.
Relevant RFCs:
https://discourse.llvm.org/t/rfc-adding-a-charset-converter-to-the-llvm-support-library/69795
https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512
Patch is 29.39 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/138893.diff
9 Files Affected:
diff --git a/llvm/CMakeLists.txt b/llvm/CMakeLists.txt
index e8d9ec0d6153a..894c0e1d2e5ae 100644
--- a/llvm/CMakeLists.txt
+++ b/llvm/CMakeLists.txt
@@ -592,6 +592,10 @@ else()
option(LLVM_ENABLE_THREADS "Use threads if available." ON)
endif()
+set(LLVM_ENABLE_ICU "OFF" CACHE STRING "Use ICU for character conversion support if available. Can be ON, OFF, or FORCE_ON")
+
+set(LLVM_ENABLE_ICONV "OFF" CACHE STRING "Use iconv for character conversion support if available. Can be ON, OFF, or FORCE_ON")
+
set(LLVM_ENABLE_ZLIB "ON" CACHE STRING "Use zlib for compression/decompression if available. Can be ON, OFF, or FORCE_ON")
set(LLVM_ENABLE_ZSTD "ON" CACHE STRING "Use zstd for compression/decompression if available. Can be ON, OFF, or FORCE_ON")
diff --git a/llvm/cmake/config-ix.cmake b/llvm/cmake/config-ix.cmake
index 43311dad457ec..f7e826b34d26f 100644
--- a/llvm/cmake/config-ix.cmake
+++ b/llvm/cmake/config-ix.cmake
@@ -294,6 +294,41 @@ if(LLVM_HAS_LOGF128)
set(LLVM_HAS_LOGF128 "${HAS_LOGF128}")
endif()
+if (LLVM_ENABLE_ICU STREQUAL FORCE_ON AND LLVM_ENABLE_ICONV STREQUAL FORCE_ON)
+ message(FATAL_ERROR "LLVM_ENABLE_ICU and LLVM_ENABLE_ICONV should not both be FORCE_ON")
+endif()
+
+# Check for ICU. Only allow an optional, dynamic link for ICU so we don't impact LLVM's licensing.
+if(LLVM_ENABLE_ICU AND NOT(LLVM_ENABLE_ICONV STREQUAL FORCE_ON))
+ set(LIBRARY_SUFFIXES ${CMAKE_FIND_LIBRARY_SUFFIXES})
+ set(CMAKE_FIND_LIBRARY_SUFFIXES "${CMAKE_SHARED_LIBRARY_SUFFIX}")
+ if (LLVM_ENABLE_ICU STREQUAL FORCE_ON)
+ find_package(ICU REQUIRED COMPONENTS uc i18n)
+ if (NOT ICU_FOUND)
+ message(FATAL_ERROR "Failed to configure ICU, but LLVM_ENABLE_ICU is FORCE_ON")
+ endif()
+ else()
+ find_package(ICU COMPONENTS uc i18n)
+ endif()
+ set(HAVE_ICU ${ICU_FOUND})
+ set(CMAKE_FIND_LIBRARY_SUFFIXES ${LIBRARY_SUFFIXES})
+endif()
+
+# Check for builtin iconv to avoid licensing issues.
+if(LLVM_ENABLE_ICONV AND NOT HAVE_ICU)
+ if (LLVM_ENABLE_ICONV STREQUAL FORCE_ON)
+ find_package(Iconv REQUIRED)
+ if (NOT Iconv_FOUND OR NOT Iconv_IS_BUILT_IN)
+ message(FATAL_ERROR "Failed to configure iconv, but LLVM_ENABLE_ICONV is FORCE_ON")
+ endif()
+ else()
+ find_package(Iconv)
+ endif()
+ if(Iconv_FOUND AND Iconv_IS_BUILT_IN)
+ set(HAVE_ICONV 1)
+ endif()
+endif()
+
# function checks
check_symbol_exists(arc4random "stdlib.h" HAVE_DECL_ARC4RANDOM)
find_package(Backtrace)
diff --git a/llvm/include/llvm/Config/config.h.cmake b/llvm/include/llvm/Config/config.h.cmake
index 7efac55ab0352..3f70a0150da4f 100644
--- a/llvm/include/llvm/Config/config.h.cmake
+++ b/llvm/include/llvm/Config/config.h.cmake
@@ -236,6 +236,12 @@
/* Have host's ___chkstk_ms */
#cmakedefine HAVE____CHKSTK_MS ${HAVE____CHKSTK_MS}
+/* Define if ICU library is available */
+#cmakedefine HAVE_ICU ${HAVE_ICU}
+
+/* Define if iconv library is available */
+#cmakedefine HAVE_ICONV ${HAVE_ICONV}
+
/* Linker version detected at compile time. */
#cmakedefine HOST_LINK_VERSION "${HOST_LINK_VERSION}"
diff --git a/llvm/include/llvm/Support/CharSet.h b/llvm/include/llvm/Support/CharSet.h
new file mode 100644
index 0000000000000..6a28cd19f4143
--- /dev/null
+++ b/llvm/include/llvm/Support/CharSet.h
@@ -0,0 +1,141 @@
+//===-- CharSet.h - Characters set conversion class ---------------*- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file provides a utility class to convert between different character
+/// set encodings.
+///
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_SUPPORT_CHARSET_H
+#define LLVM_SUPPORT_CHARSET_H
+
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/StringRef.h"
+#include "llvm/Config/config.h"
+#include "llvm/Support/ErrorOr.h"
+
+#include <string>
+#include <system_error>
+
+namespace llvm {
+
+template <typename T> class SmallVectorImpl;
+
+namespace details {
+class CharSetConverterImplBase {
+
+private:
+ /// Converts a string.
+ /// \param[in] Source source string
+ /// \param[out] Result container for converted string
+ /// \return error code in case something went wrong
+ ///
+ /// The following error codes can occur, among others:
+ /// - std::errc::argument_list_too_long: The result requires more than
+ /// std::numeric_limits<size_t>::max() bytes.
+ /// - std::errc::illegal_byte_sequence: The input contains an invalid
+ /// multibyte sequence.
+ /// - std::errc::invalid_argument: The input contains an incomplete
+ /// multibyte sequence.
+ ///
+ /// If the destination charset is a stateful character set, the shift state
+ /// will be set to the initial state.
+ ///
+ /// In case of an error, the result string contains the successfully converted
+ /// part of the input string.
+ ///
+ virtual std::error_code convertString(StringRef Source,
+ SmallVectorImpl<char> &Result) = 0;
+
+ /// Resets the converter to the initial state.
+ virtual void reset() = 0;
+
+public:
+ virtual ~CharSetConverterImplBase() = default;
+
+ /// Converts a string and resets the converter to the initial state.
+ std::error_code convert(StringRef Source, SmallVectorImpl<char> &Result) {
+ auto EC = convertString(Source, Result);
+ reset();
+ return EC;
+ }
+};
+} // namespace details
+
+// Names inspired by https://wg21.link/p1885.
+namespace text_encoding {
+enum class id {
+ /// UTF-8 character set encoding.
+ UTF8,
+
+ /// IBM EBCDIC 1047 character set encoding.
+ IBM1047
+};
+} // end namespace text_encoding
+
+/// Utility class to convert between different character set encodings.
+class CharSetConverter {
+ std::unique_ptr<details::CharSetConverterImplBase> Converter;
+
+ CharSetConverter(std::unique_ptr<details::CharSetConverterImplBase> Converter)
+ : Converter(std::move(Converter)) {}
+
+public:
+ /// Creates a CharSetConverter instance.
+ /// Returns std::errc::invalid_argument in case the requested conversion is
+ /// not supported.
+ /// \param[in] CSFrom the source character encoding
+ /// \param[in] CSTo the target character encoding
+ /// \return a CharSetConverter instance or an error code
+ static ErrorOr<CharSetConverter> create(text_encoding::id CSFrom,
+ text_encoding::id CSTo);
+
+ /// Creates a CharSetConverter instance.
+ /// Returns std::errc::invalid_argument in case the requested conversion is
+ /// not supported.
+ /// \param[in] CPFrom name of the source character encoding
+ /// \param[in] CPTo name of the target character encoding
+ /// \return a CharSetConverter instance or an error code
+ static ErrorOr<CharSetConverter> create(StringRef CPFrom, StringRef CPTo);
+
+ CharSetConverter(const CharSetConverter &) = delete;
+ CharSetConverter &operator=(const CharSetConverter &) = delete;
+
+ CharSetConverter(CharSetConverter &&Other)
+ : Converter(std::move(Other.Converter)) {}
+
+ CharSetConverter &operator=(CharSetConverter &&Other) {
+ if (this != &Other)
+ Converter = std::move(Other.Converter);
+ return *this;
+ }
+
+ ~CharSetConverter() = default;
+
+ /// Converts a string.
+ /// \param[in] Source source string
+ /// \param[out] Result container for converted string
+ /// \return error code in case something went wrong
+ std::error_code convert(StringRef Source,
+ SmallVectorImpl<char> &Result) const {
+ return Converter->convert(Source, Result);
+ }
+
+ ErrorOr<std::string> convert(StringRef Source) const {
+ SmallString<100> Result;
+ auto EC = Converter->convert(Source, Result);
+ if (!EC)
+ return std::string(Result);
+ return EC;
+ }
+};
+
+} // namespace llvm
+
+#endif
diff --git a/llvm/lib/Support/CMakeLists.txt b/llvm/lib/Support/CMakeLists.txt
index df1e65f3a588c..9a7d26a35bf1a 100644
--- a/llvm/lib/Support/CMakeLists.txt
+++ b/llvm/lib/Support/CMakeLists.txt
@@ -162,6 +162,7 @@ add_llvm_component_library(LLVMSupport
CachePruning.cpp
Caching.cpp
circular_raw_ostream.cpp
+ CharSet.cpp
Chrono.cpp
COM.cpp
CodeGenCoverage.cpp
@@ -316,6 +317,14 @@ add_llvm_component_library(LLVMSupport
Demangle
)
+# Link ICU library if it is an external library.
+if(ICU_FOUND)
+ target_link_libraries(LLVMSupport
+ PRIVATE
+ ${ICU_LIBRARIES}
+ )
+endif()
+
set(llvm_system_libs ${system_libs})
# This block is only needed for llvm-config. When we deprecate llvm-config and
diff --git a/llvm/lib/Support/CharSet.cpp b/llvm/lib/Support/CharSet.cpp
new file mode 100644
index 0000000000000..6810cf9c6e376
--- /dev/null
+++ b/llvm/lib/Support/CharSet.cpp
@@ -0,0 +1,344 @@
+//===-- CharSet.cpp - Characters sets conversion class ------------*- C++ -*-=//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file provides utility classes to convert between different character
+/// set encodings.
+///
+//===----------------------------------------------------------------------===//
+
+#include "llvm/Support/CharSet.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/StringExtras.h"
+#include "llvm/Support/ConvertEBCDIC.h"
+#include "llvm/Support/raw_ostream.h"
+#include <algorithm>
+#include <limits>
+#include <system_error>
+
+#ifdef HAVE_ICU
+#include <unicode/ucnv.h>
+#elif defined(HAVE_ICONV)
+#include <iconv.h>
+#endif
+
+using namespace llvm;
+
+// Normalize the charset name with the charset alias matching algorithm proposed
+// in https://www.unicode.org/reports/tr22/tr22-8.html#Charset_Alias_Matching.
+static void normalizeCharSetName(StringRef CSName,
+ SmallVectorImpl<char> &Normalized) {
+ bool PrevDigit = false;
+ for (auto Ch : CSName) {
+ if (isAlnum(Ch)) {
+ Ch = toLower(Ch);
+ if (Ch != '0' || PrevDigit) {
+ PrevDigit = isDigit(Ch);
+ Normalized.push_back(Ch);
+ }
+ }
+ }
+}
+
+// Maps the charset name to enum constant if possible.
+static std::optional<text_encoding::id> getKnownCharSet(StringRef CSName) {
+ SmallString<16> Normalized;
+ normalizeCharSetName(CSName, Normalized);
+ if (Normalized.equals("utf8"))
+ return text_encoding::id::UTF8;
+ if (Normalized.equals("ibm1047"))
+ return text_encoding::id::IBM1047;
+ return std::nullopt;
+}
+
+LLVM_ATTRIBUTE_UNUSED static void
+HandleOverflow(size_t &Capacity, char *&Output, size_t &OutputLength,
+ SmallVectorImpl<char> &Result) {
+ // No space left in output buffer. Double the size of the underlying
+ // memory in the SmallVectorImpl, adjust pointer and length and continue
+ // the conversion.
+ Capacity = (Capacity < std::numeric_limits<size_t>::max() / 2)
+ ? 2 * Capacity
+ : std::numeric_limits<size_t>::max();
+ Result.resize(0);
+ Result.resize_for_overwrite(Capacity);
+ Output = static_cast<char *>(Result.data());
+ OutputLength = Capacity;
+}
+
+namespace {
+enum ConversionType {
+ UTF8ToIBM1047,
+ IBM1047ToUTF8,
+};
+
+// Support conversion between EBCDIC 1047 and UTF-8. This class uses
+// built-in translation tables that allow for translation between the
+// aforementioned character sets. The use of tables for conversion is only
+// possible because EBCDIC 1047 is a single-byte, stateless encoding; other
+// character sets are not supported.
+class CharSetConverterTable : public details::CharSetConverterImplBase {
+ const ConversionType ConvType;
+
+public:
+ CharSetConverterTable(ConversionType ConvType) : ConvType(ConvType) {}
+
+ std::error_code convertString(StringRef Source,
+ SmallVectorImpl<char> &Result) override;
+
+ void reset() override {}
+};
+
+std::error_code
+CharSetConverterTable::convertString(StringRef Source,
+ SmallVectorImpl<char> &Result) {
+ if (ConvType == IBM1047ToUTF8) {
+ ConverterEBCDIC::convertToUTF8(Source, Result);
+ return std::error_code();
+ } else if (ConvType == UTF8ToIBM1047) {
+ return ConverterEBCDIC::convertToEBCDIC(Source, Result);
+ }
+ llvm_unreachable("Invalid ConvType!");
+ return std::error_code();
+}
+
+#ifdef HAVE_ICU
+struct UConverterDeleter {
+ void operator()(UConverter *Converter) const {
+ if (Converter)
+ ucnv_close(Converter);
+ }
+};
+using UConverterUniquePtr = std::unique_ptr<UConverter, UConverterDeleter>;
+
+class CharSetConverterICU : public details::CharSetConverterImplBase {
+ UConverterUniquePtr FromConvDesc;
+ UConverterUniquePtr ToConvDesc;
+
+public:
+ CharSetConverterICU(UConverterUniquePtr FromConverter,
+ UConverterUniquePtr ToConverter)
+ : FromConvDesc(std::move(FromConverter)),
+ ToConvDesc(std::move(ToConverter)) {}
+
+ std::error_code convertString(StringRef Source,
+ SmallVectorImpl<char> &Result) override;
+
+ void reset() override;
+};
+
+std::error_code
+CharSetConverterICU::convertString(StringRef Source,
+ SmallVectorImpl<char> &Result) {
+ // Setup the input in case it has no backing data.
+ size_t InputLength = Source.size();
+ const char *In = InputLength ? const_cast<char *>(Source.data()) : "";
+
+ // Setup the output. We directly write into the SmallVector.
+ size_t Capacity = Result.capacity();
+ size_t OutputLength = Capacity;
+ Result.resize_for_overwrite(Capacity);
+ char *Output = static_cast<char *>(Result.data());
+ UErrorCode EC = U_ZERO_ERROR;
+
+ ucnv_setToUCallBack(&*FromConvDesc, UCNV_TO_U_CALLBACK_STOP, NULL, NULL, NULL,
+ &EC);
+ ucnv_setFromUCallBack(&*ToConvDesc, UCNV_FROM_U_CALLBACK_STOP, NULL, NULL,
+ NULL, &EC);
+ assert(U_SUCCESS(EC));
+
+ do {
+ EC = U_ZERO_ERROR;
+ const char *Input = In;
+
+ Output = InputLength ? static_cast<char *>(Result.data()) : nullptr;
+ ucnv_convertEx(&*ToConvDesc, &*FromConvDesc, &Output, Result.end(), &Input,
+ In + InputLength, /*pivotStart=*/NULL,
+ /*pivotSource=*/NULL, /*pivotTarget=*/NULL,
+ /*pivotLimit=*/NULL, /*reset=*/true,
+ /*flush=*/true, &EC);
+ if (U_FAILURE(EC)) {
+ if (EC == U_BUFFER_OVERFLOW_ERROR &&
+ Capacity < std::numeric_limits<size_t>::max()) {
+ HandleOverflow(Capacity, Output, OutputLength, Result);
+ continue;
+ }
+ // Some other error occurred.
+ Result.resize(Output - Result.data());
+ return std::error_code(EILSEQ, std::generic_category());
+ }
+ break;
+ } while (true);
+
+ Result.resize(Output - Result.data());
+ return std::error_code();
+}
+
+void CharSetConverterICU::reset() {
+ ucnv_reset(&*FromConvDesc);
+ ucnv_reset(&*ToConvDesc);
+}
+
+#elif defined(HAVE_ICONV)
+class CharSetConverterIconv : public details::CharSetConverterImplBase {
+ class UniqueIconvT {
+ iconv_t ConvDesc;
+
+ public:
+ operator iconv_t() const { return ConvDesc; }
+ UniqueIconvT(iconv_t CD) : ConvDesc(CD) {}
+ ~UniqueIconvT() {
+ if (ConvDesc != (iconv_t)-1) {
+ iconv_close(ConvDesc);
+ ConvDesc = (iconv_t)-1;
+ }
+ }
+ UniqueIconvT(UniqueIconvT &&Other) : ConvDesc(Other.ConvDesc) {
+ Other.ConvDesc = (iconv_t)-1;
+ }
+ UniqueIconvT &operator=(UniqueIconvT &&Other) {
+ if (&Other != this) {
+ ConvDesc = Other.ConvDesc;
+ Other.ConvDesc = (iconv_t)-1;
+ }
+ return *this;
+ }
+ };
+ UniqueIconvT ConvDesc;
+
+public:
+ CharSetConverterIconv(UniqueIconvT ConvDesc)
+ : ConvDesc(std::move(ConvDesc)) {}
+
+ std::error_code convertString(StringRef Source,
+ SmallVectorImpl<char> &Result) override;
+
+ void reset() override;
+};
+
+std::error_code
+CharSetConverterIconv::convertString(StringRef Source,
+ SmallVectorImpl<char> &Result) {
+ // Setup the output. We directly write into the SmallVector.
+ size_t Capacity = Result.capacity();
+ char *Output = static_cast<char *>(Result.data());
+ size_t OutputLength = Capacity;
+ Result.resize_for_overwrite(Capacity);
+
+ size_t Ret;
+ // Handle errors returned from iconv().
+ auto HandleError = [&Capacity, &Output, &OutputLength, &Result,
+ this](size_t Ret) {
+ if (Ret == static_cast<size_t>(-1)) {
+ // An error occurred. Check if we can gracefully handle it.
+ if (errno == E2BIG && Capacity < std::numeric_limits<size_t>::max()) {
+ HandleOverflow(Capacity, Output, OutputLength, Result);
+ // Reset converter
+ iconv(ConvDesc, nullptr, nullptr, nullptr, nullptr);
+ return std::error_code();
+ } else {
+ // Some other error occurred.
+ Result.resize(Output - Result.data());
+ return std::error_code(errno, std::generic_category());
+ }
+ } else {
+ // A positive return value indicates that some characters were converted
+ // in a nonreversible way, that is, replaced with a SUB symbol. Returning
+ // an error in this case makes sure that both conversion routines behave
+ // in the same way.
+ return std::make_error_code(std::errc::illegal_byte_sequence);
+ }
+ };
+
+ do {
+ // Setup the input. Use nullptr to reset iconv state if input length is
+ // zero.
+ size_t InputLength = Source.size();
+ char *Input = InputLength ? const_cast<char *>(Source.data()) : nullptr;
+ Ret = iconv(ConvDesc, &Input, &InputLength, &Output, &OutputLength);
+ if (Ret != 0) {
+ if (auto EC = HandleError(Ret))
+ return EC;
+ continue;
+ }
+ // Flush the converter
+ Ret = iconv(ConvDesc, nullptr, nullptr, &Output, &OutputLength);
+ if (Ret != 0) {
+ if (auto EC = HandleError(Ret))
+ return EC;
+ continue;
+ }
+ break;
+ } while (true);
+
+ // Re-adjust size to actual size.
+ Result.resize(Output - Result.data());
+ return std::error_code();
+}
+
+void CharSetConverterIconv::reset() {
+ iconv(ConvDesc, nullptr, nullptr, nullptr, nullptr);
+}
+
+#endif // HAVE_ICONV
+} // namespace
+
+ErrorOr<CharSetConverter> CharSetConverter::create(text_encoding::id CPFrom,
+ text_encoding::id CPTo) {
+
+ assert(CPFrom != CPTo && "Text encodings should be distinct");
+
+ ConversionType Conversion;
+ if (CPFrom == text_encoding::id::UTF8 && CPTo == text_encoding::id::IBM1047)
+ Conversion = UTF8ToIBM1047;
+ else if (CPFrom == text_encoding::id::IBM1047 &&
+ CPTo == text_encoding::id::UTF8)
+ Conversion = IBM1047ToUTF8;
+ else
+ return std::error_code(errno, std::generic_category());
+
+ std::unique_ptr<details::CharSetConverterImplBase> Converter =
+ std::make_unique<CharSetConverterTable>(Conversion);
+ return CharSetConverter(std::move(Converter));
+}
+
+ErrorOr<CharSetConverter> CharSetConverter::create(StringRef CSFrom,
+ StringRef CSTo) {
+ std::optional<text_encoding::id> From = getKnownCharSet(CSFrom);
+ std::optional<text_encoding::id> To = getKnownCharSet(CSTo);
+ if (From && To) {
+ ErrorOr<CharSetConverter> Converter = create(*From, *To);
+ if (Converter)
+ return Converter;
+ }
+#ifdef HAVE_ICU
+ UErrorCode EC = U_ZERO_ERROR;
+ UConverterUniquePtr FromConvDesc(ucnv_open(CSFrom.str().c_str(), &EC));
+ if (U_FAILURE(EC)) {
+ return std::error_code(errno, std::generic_category());
+ }
+ UConverterUniquePtr ToConvDesc(ucnv_open(CSTo.str().c_str(), &EC));
+ if (U_FAILURE(EC)) {
+ return std::error_code(errno, std::generic_category());
+ }
+ std::unique_ptr<details::CharSetConverterImplBase> Converter =
+ std::make_unique<CharSetConverterICU>(std::move(FromConvDesc),
+ std::move(ToConvDesc));
+ return CharSetConverter(std::move(Converter));
+#elif defined(HAVE_ICONV)...
[truncated]
✅ With the latest revision this PR passed the C/C++ code formatter.
5e8d930 to 52635f2
LGTM with final nits
Can we consistently use "text encoding" rather than "charset" (including in file names)? Thanks.
llvm/include/llvm/Support/CharSet.h
Outdated
  /// Converts a string and resets the converter to the initial state.
  std::error_code convert(StringRef Source, SmallVectorImpl<char> &Result) {
    auto EC = convertString(Source, Result);
    reset();
    return EC;
  }
Please provide an overload that returns ErrorOr<std::string>; I find that interface cumbersome to use in the -fexec-patch
I provided one on line 129 of this file. Is that enough, or do I need to add it to this class as well?
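For readers following this thread, the relationship between the two overload shapes can be sketched outside of LLVM. This is only an illustration under stated assumptions: `convertInto` and `convertToString` are hypothetical names, `std::variant` stands in for `ErrorOr`, `std::string` stands in for `StringRef`/`SmallVectorImpl<char>`, and the toy conversion (ASCII upper-casing) is not what the patch does.

```cpp
#include <cctype>
#include <string>
#include <system_error>
#include <variant>

// Hypothetical stand-in for the out-parameter interface. It upper-cases ASCII
// and rejects any byte >= 0x80; it is NOT the converter from the patch.
std::error_code convertInto(const std::string &Source, std::string &Result) {
  for (unsigned char Ch : Source) {
    if (Ch >= 0x80)
      return std::make_error_code(std::errc::illegal_byte_sequence);
    Result.push_back(static_cast<char>(std::toupper(Ch)));
  }
  return std::error_code();
}

// Value-returning convenience overload layered on the out-parameter one.
// std::variant is an assumption used here in place of llvm::ErrorOr.
std::variant<std::string, std::error_code>
convertToString(const std::string &Source) {
  std::string Result;
  if (std::error_code EC = convertInto(Source, Result))
    return EC;
  return Result;
}
```

The value-returning form saves the caller from declaring a buffer, while the out-parameter form lets a caller reuse one buffer across many conversions, which is presumably why the patch keeps both.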
This patch adds a wrapper class called EncodingConverter for ConverterEBCDIC. This class is then extended to support the ICU library or iconv library. The ICU library currently takes priority over the iconv library.
Relevant RFCs:
https://discourse.llvm.org/t/rfc-adding-a-charset-converter-to-the-llvm-support-library/69795
https://discourse.llvm.org/t/rfc-enabling-fexec-charset-support-to-llvm-and-clang-reposting/71512
Stacked PR to enable fexec-charset that depends on this:
#138895
See old PR for review and commit history: #74516
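The name-based `create` overload in the diff normalizes encoding names before matching them against the known encodings. A standalone sketch of that normalization step, mirroring the loop in `normalizeCharSetName` from the diff but using `std::string` and `<cctype>` instead of LLVM types, might look like the following. The function name `normalizeEncodingName` is hypothetical and not part of the patch.

```cpp
#include <cctype>
#include <string>
#include <string_view>

// Hypothetical standalone version of the name normalization in the diff:
// keep only alphanumerics, lower-case them, and drop a '0' unless the
// previously kept character was a digit.
std::string normalizeEncodingName(std::string_view Name) {
  std::string Normalized;
  bool PrevDigit = false;
  for (unsigned char Ch : Name) {
    if (!std::isalnum(Ch))
      continue; // Separators such as '-', '_' and '.' are dropped.
    char Lower = static_cast<char>(std::tolower(Ch));
    if (Lower == '0' && !PrevDigit)
      continue; // Drop zeros that would start a run of digits.
    PrevDigit = std::isdigit(Ch) != 0;
    Normalized.push_back(Lower);
  }
  return Normalized;
}
```

Under this sketch, `"UTF-8"` and `"utf_08"` both normalize to `utf8`, and `"IBM-1047"` and `"ibm01047"` both normalize to `ibm1047`, which are the two spellings `getKnownCharSet` compares against.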