Skip to content

Init reference doc generation #280

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open

Init reference doc generation #280

wants to merge 4 commits into from

Conversation

gewenyu99
Copy link

👋 Helloooo, it's time for some basic reference doc generation for Python.

For now, we document:

  • client class and all methods
  • module level methods
  • types where possible

Generation script

The specs will be compiled when you run uv run python docs/generate_json_schemas.py

Automatic generation

This is out of scope for now. Instead, we'll be manually adding this and pinning versions to the Posthog.com repo for a while before we worry about automating anything.

I want to wait until :

  • we generate a few of these and feed reasonably confident
  • decide if these specs are going to be versioned or we always use latest version
  • replicate this per our Q3 goal for a few more SDKs (we might decide to change the schema/etc.)

Basically I think automating this early is going to be painful for now

@gewenyu99 gewenyu99 requested a review from a team July 8, 2025 21:31
Copy link
Contributor

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Summary

Introduces reference documentation generation for the Python SDK with a focus on comprehensive API documentation for client methods, module functions, and types.

  • Added docs/generate_json_schemas.py to create structured JSON documentation with proper error handling and docstring parsing
  • Added docs/doc_constant.py defining documentation generation constants like type exclusions and parsing patterns
  • Enhanced docstrings in posthog/contexts.py and posthog/__init__.py with standardized formats including Args, Examples, and Categories
  • Improved client method documentation in posthog/client.py with organized categories and proper type hints

6 files reviewed, 11 comments
Edit PR Review Bot Settings | Greptile

@@ -18,3 +18,4 @@ posthog-analytics
pyrightconfig.json
.env
.DS_Store
posthog-python-references.json
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Add newline at end of file to follow Git best practices

def analyze_parameter(param: inspect.Parameter, docstring: str = "") -> dict:
"""Analyze a function parameter and return its documentation."""
# Determine if parameter is optional (has default value)
is_optional = param.default == inspect.Parameter.empty
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Logic error in is_optional determination - currently returns True when parameter has no default (should be opposite)

Suggested change
is_optional = param.default == inspect.Parameter.empty
is_optional = param.default != inspect.Parameter.empty


# Types that are built-in to Python and don't need to be documented
NO_DOCS_TYPES = [
"Client",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: 'Client' probably shouldn't be in NO_DOCS_TYPES since it's a core SDK class that needs documentation

Comment on lines 354 to 357
type_info = analyze_type(obj)
types_list.append(type_info)
except Exception as e:
print(f"Error analyzing type {name}: {e}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Exception in type analysis should add type name to noDocsTypes instead of just printing error

Suggested change
type_info = analyze_type(obj)
types_list.append(type_info)
except Exception as e:
print(f"Error analyzing type {name}: {e}")
type_info = analyze_type(obj)
types_list.append(type_info)
except Exception as e:
print(f"Error analyzing type {name}: {e}")
NO_DOCS_TYPES.append(name)

Comment on lines 56 to 59
"examples_section": r'Examples:\s*\n(.*?)(?=\n\s*\n\s*Category:|\Z)',
"args_section": r'Args:\s*\n(.*?)(?=\n\s*\n\s*Examples:|\n\s*\n\s*Details:|\n\s*\n\s*Category:|\Z)',
"details_section": r'Details:\s*\n(.*?)(?=\n\s*\n\s*Examples:|\n\s*\n\s*Category:|\Z)',
"category_section": r'Category:\s*\n\s*(.+?)\s*(?:\n|$)',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: These regex patterns don't handle nested sections properly. Consider using a more robust parsing approach or adding negative lookaheads to prevent incorrect matches

Comment on lines 70 to 71
"filename": "posthog-python-references.json",
"indent": 2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Add a path variable for output directory instead of just filename to make the output location configurable

Comment on lines 154 to 159
def capture(event: str, **kwargs: Unpack[OptionalCaptureArgs]) -> Optional[str]:
"""
Capture allows you to capture anything a user does within your system, which you can later use in PostHog to find patterns in usage, work out which features to improve or where people are giving up.
A `capture` call requires
- `event name` to specify the event
- We recommend using [verb] [noun], like `movie played` or `movie updated` to easily identify what your events mean later on.
Capture takes a number of optional arguments, which are defined by the `OptionalCaptureArgs` type.
Capture anything a user does within your system.
For example:
```python
# Enter a new context (e.g. a request/response cycle, an instance of a background job, etc)
with posthog.new_context():
# Associate this context with some user, by distinct_id
posthog.identify_context('some user')
# Capture an event, associated with the context-level distinct ID ('some user')
posthog.capture('movie started')
# Capture an event associated with some other user (overriding the context-level distinct ID)
posthog.capture('movie joined', distinct_id='some-other-user')
# Capture an event with some properties
posthog.capture('movie played', properties={'movie_id': '123', 'category': 'romcom'})
# Capture an event with some properties
posthog.capture('purchase', properties={'product_id': '123', 'category': 'romcom'})
# Capture an event with some associated group
posthog.capture('purchase', groups={'company': 'id:5'})
# Adding a tag to the current context will cause it to appear on all subsequent events
posthog.tag_context('some-tag', 'some-value')
posthog.capture('another-event') # Will be captured with `'some-tag': 'some-value'` in the properties dict
```
Args:
event: The event name to specify the event
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Parameter 'event' is described in Args section but the rest of the optional parameters from OptionalCaptureArgs are not documented

Suggested change
def capture(event: str, **kwargs: Unpack[OptionalCaptureArgs]) -> Optional[str]:
"""
Capture allows you to capture anything a user does within your system, which you can later use in PostHog to find patterns in usage, work out which features to improve or where people are giving up.
A `capture` call requires
- `event name` to specify the event
- We recommend using [verb] [noun], like `movie played` or `movie updated` to easily identify what your events mean later on.
Capture takes a number of optional arguments, which are defined by the `OptionalCaptureArgs` type.
Capture anything a user does within your system.
For example:
```python
# Enter a new context (e.g. a request/response cycle, an instance of a background job, etc)
with posthog.new_context():
# Associate this context with some user, by distinct_id
posthog.identify_context('some user')
# Capture an event, associated with the context-level distinct ID ('some user')
posthog.capture('movie started')
# Capture an event associated with some other user (overriding the context-level distinct ID)
posthog.capture('movie joined', distinct_id='some-other-user')
# Capture an event with some properties
posthog.capture('movie played', properties={'movie_id': '123', 'category': 'romcom'})
# Capture an event with some properties
posthog.capture('purchase', properties={'product_id': '123', 'category': 'romcom'})
# Capture an event with some associated group
posthog.capture('purchase', groups={'company': 'id:5'})
# Adding a tag to the current context will cause it to appear on all subsequent events
posthog.tag_context('some-tag', 'some-value')
posthog.capture('another-event') # Will be captured with `'some-tag': 'some-value'` in the properties dict
```
Args:
event: The event name to specify the event
def capture(event: str, **kwargs: Unpack[OptionalCaptureArgs]) -> Optional[str]:
"""
Capture anything a user does within your system.
Args:
event: The event name to specify the event
**kwargs: Optional arguments including:
distinct_id: Unique identifier for the user
properties: Dict of event properties
timestamp: When the event occurred
groups: Dict of group types and IDs
disable_geoip: Whether to disable GeoIP lookup

Comment on lines +192 to +203
```python
# Set event properties
from posthog import capture
capture(
"distinct_id_of_the_user",
"user_signed_up",
{
"login_type": "email",
"is_free_trial": "true"
}
)
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Example code incorrectly shows 'distinct_id_of_the_user' and 'user_signed_up' as first two args, but capture() requires event name first

Suggested change
```python
# Set event properties
from posthog import capture
capture(
"distinct_id_of_the_user",
"user_signed_up",
{
"login_type": "email",
"is_free_trial": "true"
}
)
```
```python
# Set event properties
from posthog import capture
capture(
"user_signed_up",
distinct_id="distinct_id_of_the_user",
properties={
"login_type": "email",
"is_free_trial": "true"
}
)

Comment on lines +219 to +230
```python
# Set person properties
from posthog import capture
capture(
'distinct_id',
event='event_name',
properties={
'$set': {'name': 'Max Hedgehog'},
'$set_once': {'initial_url': '/blog'}
}
)
```
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Example for 'set' function incorrectly uses capture() instead of set(). The example is misleading.

Suggested change
```python
# Set person properties
from posthog import capture
capture(
'distinct_id',
event='event_name',
properties={
'$set': {'name': 'Max Hedgehog'},
'$set_once': {'initial_url': '/blog'}
}
)
```
```python
# Set person properties
from posthog import set
set(
distinct_id='distinct_id',
properties={
'name': 'Max Hedgehog',
'initial_url': '/blog'
}
)

Comment on lines 1302 to 1304
"""
Get a FeatureFlagResult object which contains the flag result and payload for a key by evaluating locally or remotely
depending on whether local evaluation is enabled and the flag can be locally evaluated.
This also captures the $feature_flag_called event unless send_feature_flag_events is False.
Get a FeatureFlagResult object
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Method docstring is too minimal. Should include Args section and return type details like other methods for consistency.

Suggested change
"""
Get a FeatureFlagResult object which contains the flag result and payload for a key by evaluating locally or remotely
depending on whether local evaluation is enabled and the flag can be locally evaluated.
This also captures the $feature_flag_called event unless send_feature_flag_events is False.
Get a FeatureFlagResult object
"""
"""
Get a FeatureFlagResult object.
Args:
key: The feature flag key.
distinct_id: The distinct ID of the user.
groups: A dictionary of group information.
person_properties: A dictionary of person properties.
group_properties: A dictionary of group properties.
only_evaluate_locally: Whether to only evaluate locally.
send_feature_flag_events: Whether to send feature flag events.
disable_geoip: Whether to disable GeoIP for this request.
Returns:
Optional[FeatureFlagResult]: The feature flag result or None if disabled/not found.
"""

posthog.capture('another-event') # Will be captured with `'some-tag': 'some-value'` in the properties dict
```
Args:
event: The event name to specify the event
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like greptile suggested, what about the hidden kwargs?

_proxy("flush")


def join():
"""Block program until the client clears the queue"""
"""
Block program until the client clears the queue.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably add something here saying that you should most likely be using shutdown rather than using join directly, and this is a function you'll use in exceptional situations only

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

YES I missed join, only had it for flush. Thanks for the reminder

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flush can be used in isolation - think guaranteeing you flush all events before a job finishes running, the job runner will keep running, so shutdown doesn't make sense, but we might want events to be 100% sent

depending on whether local evaluation is enabled and the flag can be locally evaluated.
This also captures the $feature_flag_called event unless send_feature_flag_events is False.
Get a FeatureFlagResult object
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one isnt following your proposed structure

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mean that this is different from other methods here or different form the JS doc comment structure.

If the latter,
This is a good point to discuss, this is closer to google style python doc strings. I think a part of this was to keep this closer to:

  1. what we already use (latest doc strings by olly follows closer to google stype doc strings)
  2. what is native to this language.

I think it would look pretty wild to have ts/jsdoc in python code

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I think you mean the former :P I think I missed a few methods 😆

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The former, yeah, this is just a string, there's no structure like there are for the other methods :)

DOC_DEFAULTS = {
"showDocs": True,
"releaseTag": "public",
"return_type_void": "void",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be None like the bot is suggesting?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah probably, this is a left over from the TS specs. void isn't a thing here

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have an RFC that states we should use the bin folder for scripts. Can you add that there instead? I'd be happy with a bin/docs executable, and then maybe a bin/docs folder with your helper scripts. How does that sound?

I'm looking for the RFC and can link it to you if I find it

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes ofc, let me go through suggestions. thanks as always for sitting through reviews of code like this, my eyes started watering near the end of this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants