Commit 06dbeec

Merge pull request #219 from Roche/dev
version 1.2.1
2 parents e666ef8 + 4b8f09d commit 06dbeec

38 files changed

+6976
-5582
lines changed

CITATION.cff

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -5,7 +5,7 @@ authors:
55
given-names: "Otto"
66
orcid: "https://orcid.org/0000-0002-3363-9287"
77
title: "Pyreadstat"
8-
version: 1.2.0
8+
version: 1.2.1
99
doi: 10.5281/zenodo.6612282
1010
date-released: 2018-09-24
1111
url: "https://github.com/Roche/pyreadstat"

README.md

Lines changed: 31 additions & 0 deletions
@@ -45,6 +45,7 @@ the original applications in this regard.**
4545
- [Missing Values](#missing-values)
4646
+ [SPSS](#spss)
4747
+ [SAS and STATA](#sas-and-stata)
48+
- [Reading datetime and date columns](#reading-datetime-and-date-columns)
4849
- [Other options](#other-options)
4950
+ [More writing options](#more-writing-options)
5051
- [File specific options](#file-specific-options)
@@ -637,6 +638,36 @@ This is a list listing all user defined missing values.
637638
User defined missing values are currently not supported for file types other than sas7bdat,
638639
sas7bcat and dta.
639640

641+
#### Reading datetime and date columns
642+
643+
SAS, SPSS and STATA represent datetime, date and other similar concepts as a numeric column and then apply a
644+
display format on top. Roughly speaking, there are two possible internal representations: one for concepts with a granularity of a day or coarser
645+
(date, week, quarter, year, etc.) and one for concepts with a granularity finer than a day (datetime, time, hour, etc.).
646+
The first group can be converted to a Python date object and the second to a Python datetime object.
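
As an illustration of the two representations described above, here is a minimal sketch for SAS only, which counts dates as days and datetimes as seconds from 1960-01-01. The helper names are hypothetical and this is not pyreadstat's internal implementation; other packages may use different epochs and units.

```python
import datetime

# SAS epoch: date values count days, datetime values count seconds, from here.
SAS_EPOCH = datetime.datetime(1960, 1, 1)


def sas_days_to_date(value):
    """Interpret a numeric SAS date value (days since 1960-01-01) as a date."""
    return (SAS_EPOCH + datetime.timedelta(days=value)).date()


def sas_seconds_to_datetime(value):
    """Interpret a numeric SAS datetime value (seconds since 1960-01-01)."""
    return SAS_EPOCH + datetime.timedelta(seconds=value)


print(sas_days_to_date(366))           # 1961-01-01 (1960 is a leap year)
print(sas_seconds_to_datetime(86400))  # 1960-01-02 00:00:00
```

The same stored number means different things depending on the display format attached to the column, which is why the format name decides whether a date or a datetime conversion applies.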
647+
648+
Pyreadstat attempts to read columns with datetime, date and time formats that are convertible
649+
to python datetime, date and time objects automatically. However, there are other formats that are not fully convertible to
650+
any of these objects, for example SAS "YEAR" (displaying only the year), "MMYY" (displaying only month and year), etc.
651+
Because there are too many of these formats and they keep changing, it is not possible to implement a rule for each of
652+
them; these columns are therefore not transformed, and the user obtains a numeric column.
653+
654+
In order to cope with this issue, there are two options for each reader function: extra\_datetime\_formats and
655+
extra\_date\_formats that allow the user to
656+
pass these datetime or date formats, to transform the numeric values into datetime or date python objects. Then, the user
657+
can format those columns appropriately; for example, extracting the year only to an integer column in the case of 'YEAR' or
658+
formatting it to a string 'YYYY-MM' in the case of 'MMYY'. The choice between datetime or date format depends on the granularity
659+
of the data as explained above.
660+
661+
These arguments are also useful if you have a valid datetime, date or time format that is currently not recognized by pyreadstat.
662+
In those cases, feel free to file an issue asking for them to be added to the list; in the meantime, you can use these arguments to do
663+
the conversion.
664+
665+
```python
666+
import pyreadstat
667+
668+
df, meta = pyreadstat.read_sas7bdat('/path/to/a/file.sas7bdat', extra_date_formats=["YEAR", "MMYY"])
669+
```
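
The post-processing step mentioned in the README text (extracting the year for 'YEAR', or formatting as 'YYYY-MM' for 'MMYY') could be sketched as follows. This assumes the converted columns hold Python date objects, with missing values represented by something that is not a date; the helper names and sample values are hypothetical stand-ins, not part of pyreadstat.

```python
import datetime


def year_of(value):
    """Return the year as an int, or None when the value is not a date."""
    if isinstance(value, datetime.date):
        return value.year
    return None


def year_month_of(value):
    """Return a 'YYYY-MM' string, or None when the value is not a date."""
    if isinstance(value, datetime.date):
        return value.strftime("%Y-%m")
    return None


# Stand-ins for column values read with extra_date_formats=["YEAR", "MMYY"].
year_col = [datetime.date(2020, 1, 1), None]
mmyy_col = [datetime.date(2021, 11, 1), None]

print([year_of(v) for v in year_col])        # [2020, None]
print([year_month_of(v) for v in mmyy_col])  # ['2021-11', None]
```

With a pandas dataframe, the same helpers can be applied per column, for example `df["some_column"].map(year_of)`, where `some_column` is a placeholder name.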
670+
640671
#### Other options
641672

642673
You can set the encoding of the original file manually. The encoding must be a [iconv-compatible encoding](https://gist.github.com/hakre/4188459).

change_log.md

Lines changed: 8 additions & 0 deletions
@@ -1,3 +1,11 @@
1+
# 1.2.1 (github, pypi and conda 2023.02.22)
2+
* Readstat source updated to version 1.1.9
3+
* introduced recognition for pandas datatype datetime64[ns, UTC] and other datetime64 types when writing,
4+
so that this column type gets correctly written as datetime
5+
* introduced extra_datetime_formats and extra_date_formats arguments for read functions, cleaned the list of
6+
sas date, datetime and time formats to exclude those not directly convertible to python objects
7+
* improved performance of the writer when there are datetime64 columns
8+
19
# 1.2.0 (github, pypi and conda 2022.10.25)
210
* Fixed #206, #207
311
* added pyproject.toml
-13 Bytes
Binary file not shown.

docs/_build/doctrees/index.doctree

12.1 KB
Binary file not shown.

docs/_build/html/.buildinfo

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
11
# Sphinx build info version 1
22
# This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
3-
config: d1957ba96adbb9536e51c852822e9ccb
3+
config: dc63e4405a0437fb9efe8c4f5ffb3848
44
tags: 645f666f9bcd5a90fca523b33c5a78b7

docs/_build/html/_static/documentation_options.js

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
11
var DOCUMENTATION_OPTIONS = {
22
URL_ROOT: document.getElementById("documentation_options").getAttribute('data-url_root'),
3-
VERSION: '1.2.0',
3+
VERSION: '1.2.1',
44
LANGUAGE: 'None',
55
COLLAPSE_INDEX: false,
66
BUILDER: 'html',

docs/_build/html/genindex.html

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
33
<head>
44
<meta charset="utf-8" />
55
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
6-
<title>Index &mdash; pyreadstat 1.2.0 documentation</title>
6+
<title>Index &mdash; pyreadstat 1.2.1 documentation</title>
77
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
88
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
99
<!--[if lt IE 9]>

docs/_build/html/index.html

Lines changed: 11 additions & 1 deletion
@@ -4,7 +4,7 @@
44
<meta charset="utf-8" /><meta name="generator" content="Docutils 0.17.1: http://docutils.sourceforge.net/" />
55

66
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
7-
<title>Welcome to pyreadstat’s documentation! &mdash; pyreadstat 1.2.0 documentation</title>
7+
<title>Welcome to pyreadstat’s documentation! &mdash; pyreadstat 1.2.1 documentation</title>
88
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
99
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
1010
<!--[if lt IE 9]>
@@ -154,6 +154,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
154154
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
155155
user can then convert it to her preferred data format. Using dict is faster as the other types as the conversion to a pandas
156156
dataframe is avoided.</p></li>
157+
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
158+
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
157159
</ul>
158160
</dd>
159161
<dt class="field-even">Returns</dt>
@@ -252,6 +254,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
252254
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
253255
user can then convert it to her preferred data format. Using dict is faster as the other types as the conversion to a pandas
254256
dataframe is avoided.</p></li>
257+
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
258+
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
255259
</ul>
256260
</dd>
257261
<dt class="field-even">Returns</dt>
@@ -335,6 +339,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
335339
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
336340
user can then convert it to her preferred data format. Using dict is faster as the other types as the conversion to a pandas
337341
dataframe is avoided.</p></li>
342+
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
343+
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
338344
</ul>
339345
</dd>
340346
<dt class="field-even">Returns</dt>
@@ -384,6 +390,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
384390
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
385391
user can then convert it to her preferred data format. Using dict is faster as the other types as the conversion to a pandas
386392
dataframe is avoided.</p></li>
393+
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
394+
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
387395
</ul>
388396
</dd>
389397
<dt class="field-even">Returns</dt>
@@ -422,6 +430,8 @@ <h1>Metadata Object Description<a class="headerlink" href="#metadata-object-desc
422430
<li><p><strong>output_format</strong> (<em>str</em><em>, </em><em>optional</em>) – one of ‘pandas’ (default) or ‘dict’. If ‘dict’ a dictionary with numpy arrays as values will be returned, the
423431
user can then convert it to her preferred data format. Using dict is faster as the other types as the conversion to a pandas
424432
dataframe is avoided.</p></li>
433+
<li><p><strong>extra_datetime_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python datetime objects</p></li>
434+
<li><p><strong>extra_date_formats</strong> (<em>list of str</em><em>, </em><em>optional</em>) – formats to be parsed as python date objects</p></li>
425435
</ul>
426436
</dd>
427437
<dt class="field-even">Returns</dt>

docs/_build/html/py-modindex.html

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
33
<head>
44
<meta charset="utf-8" />
55
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
6-
<title>Python Module Index &mdash; pyreadstat 1.2.0 documentation</title>
6+
<title>Python Module Index &mdash; pyreadstat 1.2.1 documentation</title>
77
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
88
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
99
<!--[if lt IE 9]>

docs/_build/html/search.html

Lines changed: 1 addition & 1 deletion
@@ -3,7 +3,7 @@
33
<head>
44
<meta charset="utf-8" />
55
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
6-
<title>Search &mdash; pyreadstat 1.2.0 documentation</title>
6+
<title>Search &mdash; pyreadstat 1.2.1 documentation</title>
77
<link rel="stylesheet" href="_static/pygments.css" type="text/css" />
88
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
99

docs/_build/html/searchindex.js

Lines changed: 1 addition & 1 deletion
Some generated files are not rendered by default.

docs/conf.py

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@
2626
# The short X.Y version
2727
version = ''
2828
# The full version, including alpha/beta/rc tags
29-
release = '1.2.0'
29+
release = '1.2.1'
3030

3131

3232
# -- General configuration ---------------------------------------------------
