Description
Please provide us with the following information:
This issue is for a: (mark with an x)
- [x] bug report -> please search issues before submitting
- [ ] feature request
- [x] documentation issue or request
- [ ] regression (a behavior that used to work and stopped in a new release)
Minimal steps to reproduce
Run python build_index.py on a Windows machine with German regional settings (the default code page there is cp1252).
The program fails with a UnicodeDecodeError, as shown in the logs below.
The problem is easily fixed by setting the console encoding to UTF-8 in the terminal/shell/PowerShell,
e.g. in PowerShell:
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
[Console]::InputEncoding = [System.Text.Encoding]::UTF8
We should add this information to the docs.
I can create a PR for this if you consider the information useful (I do :-) )
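Since the traceback below goes through subprocess.py, a code-side alternative might be to pass the encoding explicitly wherever the subprocess is started, instead of relying on the console default. The following is only a minimal sketch, assuming the child process emits UTF-8. CHILD_CODE and run_child_utf8 are hypothetical names for illustration and are not part of build_index.py or the SDK:

```python
import subprocess
import sys

# Hypothetical child process: writes the UTF-8 bytes for U+00C1 (0xC3 0x81)
# to stdout. The 0x81 byte is the one cp1252 cannot decode.
CHILD_CODE = "import sys; sys.stdout.buffer.write('\\u00c1'.encode('utf-8'))"


def run_child_utf8() -> str:
    """Run the child and decode its output as UTF-8, not the locale default."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        capture_output=True,
        encoding="utf-8",  # explicit encoding; the locale default is not used
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(repr(run_child_utf8()))
```

Whether this is applicable depends on where the subprocess call lives. If it is inside a library, the terminal setting above is likely the more practical workaround.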
Any log messages given by the failure
Failed Build
(.venv) PS C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial> python build_index.py
Data directory 'C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial\data/product-info/' exists and contains 20 files.
Crack and chunk files from local path: C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial\data/product-info/
Start embedding using connection with id = ...
Start creating index from embeddings.
Successfully created index at C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial\tutorial-index-mlindex
Method indexes: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class Index: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Exception in thread Thread-19 (_readerthread):
Traceback (most recent call last):
  File "c:\Program Files\Python311\Lib\threading.py", line 1045, in _bootstrap_inner
    self.run()
  File "c:\Program Files\Python311\Lib\threading.py", line 982, in run
    self._target(*self._args, **self._kwargs)
  File "c:\Program Files\Python311\Lib\subprocess.py", line 1599, in _readerthread
    buffer.append(fh.read())
                  ^^^^^^^^^
  File "c:\Program Files\Python311\Lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 271: character maps to <undefined>
Uploading tutorial-index-mlindex (0.0 MBs): 100%|#####################################################################################################################################| 1296/1296 [00:00<00:00, 1996.11it/s]
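The same kind of decode failure can be reproduced without the tutorial. This is a small sketch: the sample bytes are an assumption chosen for illustration (the actual subprocess output that triggered the error is not known), and decodes_cleanly is a hypothetical helper:

```python
# UTF-8 encoding of U+00C1 is b'\xc3\x81'. Byte 0x81 has no mapping in
# Python's cp1252 codec, which is the byte named in the traceback.
SAMPLE_BYTES = "\u00c1".encode("utf-8")


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if data can be decoded with the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    for name in ("cp1252", "utf-8"):
        print(name, decodes_cleanly(SAMPLE_BYTES, name))
```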
Fix, e.g. for PowerShell:
[Console]::OutputEncoding = [System.Text.Encoding]::UTF8
[Console]::InputEncoding = [System.Text.Encoding]::UTF8
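To check which encodings Python actually sees in a given shell, a short diagnostic like the following may help. It is a sketch only, and encoding_report is a hypothetical helper name. Python's UTF-8 mode (PYTHONUTF8=1 or -X utf8) is a standard interpreter feature that might serve as an alternative workaround, but it has not been verified against this tutorial:

```python
import locale
import sys


def encoding_report() -> dict:
    """Collect the encodings that affect text-mode I/O in this interpreter."""
    return {
        # Default used when text-mode files or pipes get no explicit encoding.
        "preferred": locale.getpreferredencoding(False),
        # Encoding of the stdout stream attached to this process.
        "stdout": getattr(sys.stdout, "encoding", None),
        # True when Python runs in UTF-8 mode (PYTHONUTF8=1 or -X utf8).
        "utf8_mode": bool(sys.flags.utf8_mode),
    }


if __name__ == "__main__":
    for name, value in encoding_report().items():
        print(f"{name}: {value}")
```

Running it before and after changing the console settings shows whether the change is visible to Python at all.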
Expected/desired behavior
With the fix above it runs fine, e.g.:
(.venv) PS C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial> python build_index.py
Data directory 'C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial\data/product-info/' exists and contains 20 files.
Crack and chunk files from local path: C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial\data/product-info/
Start embedding using connection with id = ...
Start creating index from embeddings.
Successfully created index at C:\work\Azure-Samples\rag-data-openai-python-promptflow\tutorial\tutorial-index-mlindex
Method indexes: This is an experimental method, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
Class Index: This is an experimental class, and may change at any time. Please see https://aka.ms/azuremlexperimental for more information.
OS and Version?
Windows 7, 8 or 10. Linux (which distribution). macOS (Yosemite? El Capitan? Sierra?)
not OS specific
Versions
not version specific
Mention any other details that might be useful
Thanks! We'll be in touch soon.