127 changes: 61 additions & 66 deletions docs/source/components/dataset/concepts.rst
@@ -2,9 +2,9 @@
Concepts
========

Library defines couple of building blocks for processing streams of data. It is
advisable to introduce yourself with concepts defined in this document and then
read about how to use and extend the library.
The library defines several building blocks for processing data streams. It is
recommended that you familiarize yourself with the concepts described in this
document before reading about how to use and extend the library.

.. contents::
:depth: 1
@@ -17,36 +17,33 @@ Stream source

.. _iterable: https://www.php.net/manual/en/language.types.iterable.php

Stream source (or collection) is any iterable which can be iterated through,
which means either an ``array`` or instance of ``\Traversable``. In short,
any iterable_.
A stream source (or collection) is any iterable that can be iterated over, which
means either an ``array`` or an instance of ``\Traversable``. In short, it is
any iterable_.

Each stream source emits some value, indexed by key. Key is usually associated
with ``int`` or ``string`` as we rely on arrays in PHP a lot. However, library
assumes any ``iterable``, which includes, but not limits to:
Each stream source emits values indexed by a key. The key is usually an ``int``
or ``string``, as PHP arrays are commonly used. However, the library assumes any
``iterable``, including, but not limited to:

* ``\Generator``, which may emit anything as a key,
* ``\Generator``, which may emit values with any type of key,
* ``\WeakMap``, which emits objects as a key,
* and so on...
* and so on.
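
As an illustration of keys that are neither ``int`` nor ``string``, the following
is a minimal sketch in plain PHP. It does not use any library class, and the
function name ``pairs()`` is hypothetical.

.. code-block:: php

    <?php

    declare(strict_types=1);

    /**
     * Hypothetical generator that emits arrays as keys,
     * something a PHP array could not hold.
     */
    function pairs(): \Generator
    {
        yield [0, 0] => 'origin';
        yield [1, 2] => 'point';
    }

    foreach (pairs() as $key => $value) {
        // $key is an array here, not an int or a string.
        echo \sprintf("%s at (%d, %d)\n", $value, $key[0], $key[1]);
    }

    // A WeakMap emits objects as keys when iterated.
    $map          = new \WeakMap();
    $object       = new \stdClass();
    $map[$object] = 'metadata';

    foreach ($map as $key => $value) {
        \var_dump($key === $object); // bool(true)
    }

Because keys may be arbitrary values, code that consumes a stream source should
not assume that keys can be used as array offsets.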

Common denominator for stream source is that it is not rewindable. Generators,
per example, can not be rewind, you can not iterate them twice. For that reason,
even if you use an arrays (or any rewindable stream source), library assumes
that stream source is not rewindable.
The common characteristic of a stream source is that it is not rewindable.
Generators, for example, cannot be rewound; you cannot iterate over them twice.
For this reason, even when using arrays (or any other rewindable stream source),
the library assumes that the stream source is not rewindable.
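
The non-rewindable nature of generators can be observed with plain PHP alone.
The sketch below is only an illustration; ``numbers()`` is a hypothetical
function and no library code is involved.

.. code-block:: php

    <?php

    declare(strict_types=1);

    function numbers(): \Generator
    {
        yield 1;
        yield 2;
        yield 3;
    }

    $source = numbers();

    // The first pass consumes the generator.
    foreach ($source as $number) {
        echo $number, "\n";
    }

    try {
        // A second pass would require rewinding a finished generator,
        // which PHP refuses by throwing an exception.
        foreach ($source as $number) {
            echo $number, "\n";
        }
    } catch (\Exception $exception) {
        echo 'Second pass failed: ', $exception->getMessage(), "\n";
    }

An array passed through the same code could be iterated twice, which is exactly
the behaviour the library chooses not to rely on.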

Data stream, or stream wrapper
------------------------------

Data stream (or stream wrapper) is ``RunOpenCode\Component\Dataset\Stream``
class which wraps stream source providing stream processing using operators,
reducers and collectors.
A data stream (or stream wrapper) is the
``RunOpenCode\Component\Dataset\Stream`` class, which wraps a stream source and
provides stream processing using operators, reducers, collectors, and
aggregators (which will be discussed later in the document).

Class is deliberately not final and allow extension in order for you to be able
to integrate your own custom operators, reducers and collectors - should you
need to do so.

Using object oriented approach, with instance of data stream, you may apply
various operations on your source of data utilizing fluent API.
Using an object-oriented approach, you can apply various operations to your data
source through the fluent API provided by the data stream instance.

.. code-block:: php
:linenos:
@@ -57,14 +54,14 @@ various operations on your source of data utilizing fluent API.

Stream::create(/* ... */)
->map(/* ... */)
->batch(/* ... */)
->tap(/* ... */)
->takeUntil(/* ... */)
->finally(/* ... */);

.. _pipe operator: https://wiki.php.net/rfc/pipe-operator-v3

Having in mind PHP 8.5, library provides a functions as well to support
functional approach using `pipe operator`_:
With PHP 8.5 in mind, the library also provides functions to support a
functional approach using the `pipe operator`_.

.. code-block:: php
:linenos:
@@ -73,44 +70,43 @@ functional approach using `pipe operator`_:

use function RunOpenCode\Component\Dataset\stream;
use function RunOpenCode\Component\Dataset\map;
use function RunOpenCode\Component\Dataset\batch;
use function RunOpenCode\Component\Dataset\tap;
use function RunOpenCode\Component\Dataset\takeUntil;
use function RunOpenCode\Component\Dataset\finally;

stream(/* ... */)
|> map(/* ... */)
|> batch(/* ... */)
|> tap(/* ... */)
|> takeUntil(/* ... */)
|> finally(/* ... */);

Data stream is, of course, iterable and none of the operators are applied until
stream is being iterated.
A data stream is, of course, iterable, and none of the operators are applied
until the stream is iterated.
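
The deferred execution described above follows from how PHP generators work. The
sketch below is a plain PHP illustration under that assumption; ``lazyMap()`` is
a hypothetical helper, not a function of this library.

.. code-block:: php

    <?php

    declare(strict_types=1);

    /**
     * Hypothetical lazy "map": the body does not run until iteration starts.
     */
    function lazyMap(iterable $source, callable $callback): \Generator
    {
        foreach ($source as $key => $value) {
            echo "mapping {$value}\n";

            yield $key => $callback($value);
        }
    }

    // Calling the function only creates the generator; nothing is mapped yet.
    $stream = lazyMap([1, 2, 3], static fn(int $value): int => $value * 10);

    echo "nothing mapped yet\n";

    // Each item is mapped at the moment the loop asks for it.
    foreach ($stream as $value) {
        echo "received {$value}\n";
    }

When run, the "mapping" and "received" lines alternate, which shows that each
value is processed only when the consumer requests it.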

Operators
---------

You use operators to execute some "operations" against the stream of data.
Operators operate on yielded value, one by one, and they yield result of their
operations.
Operators are used to perform specific operations on a data stream. They process
each yielded value one by one and yield the result of their operation.

Library delivers a set of commonly used operators, such as ``map()``,
``filter()``, ``take()``, etc. However, you may expand set of operators by
writing your own.
The library provides a set of commonly used operators, such as ``map()``,
``filter()``, ``take()``, and others. However, you can extend the available set
of operators by implementing your own.

General idea is that with operators, you execute various operations reading
from and/or modifying original stream.
The general idea behind operators is to execute various operations that read
from and/or modify the original stream as it is being iterated.
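
To make the idea of an operator more concrete, here is a minimal sketch of two
operators written as plain PHP generator functions. The names ``keepIf()`` and
``firstN()`` are hypothetical and are not the library's own operators; the
library's extension mechanism may look different.

.. code-block:: php

    <?php

    declare(strict_types=1);

    /** Hypothetical operator: yields only the items accepted by the predicate. */
    function keepIf(iterable $source, callable $predicate): \Generator
    {
        foreach ($source as $key => $value) {
            if ($predicate($value, $key)) {
                yield $key => $value;
            }
        }
    }

    /** Hypothetical operator: stops after yielding at most $limit items. */
    function firstN(iterable $source, int $limit): \Generator
    {
        if ($limit <= 0) {
            return;
        }

        foreach ($source as $key => $value) {
            yield $key => $value;

            if (--$limit === 0) {
                return;
            }
        }
    }

    $evens = keepIf([1, 2, 3, 4, 5, 6], static fn(int $value): bool => 0 === $value % 2);

    // Operators compose: each one consumes the previous one item by item.
    foreach (firstN($evens, 2) as $key => $value) {
        echo "{$key} => {$value}\n";
    }

Each operator receives one item at a time and decides whether to pass it on,
change it, or stop, so no intermediate collection is built.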

Reducers
--------

Reducers iterate over the stream of data and reduce all of them into one single
value of any kind. Common examples of reducers are ``sum()``, ``average()``,
``min()``, ``max()``, etc. which are delivered with this library.
Reducers iterate over a data stream and reduce all elements into a single value
of any kind. Common examples of reducers include ``sum()``, ``average()``,
``min()``, and ``max()``, all of which are provided by this library.
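
Conceptually, a reducer is a fold over the stream. The sketch below shows that
idea in plain PHP; ``fold()`` is a hypothetical helper written for this
illustration and is not the library's reducer interface.

.. code-block:: php

    <?php

    declare(strict_types=1);

    /**
     * Hypothetical reducer: folds every item of the source into one value.
     */
    function fold(iterable $source, callable $reducer, mixed $initial = null): mixed
    {
        $carry = $initial;

        foreach ($source as $key => $value) {
            $carry = $reducer($carry, $value, $key);
        }

        return $carry;
    }

    $sum = fold(
        [10, 20, 30],
        static fn(int $carry, int $value): int => $carry + $value,
        0,
    );

    $max = fold(
        [10, 20, 30],
        static fn(?int $carry, int $value): int => null === $carry ? $value : \max($carry, $value),
    );

    echo $sum, "\n"; // 60
    echo $max, "\n"; // 30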

However, reducers are design to be iterable as well, and may be applied as
aggregators (which is a new concept defined by this library) which enables you
to apply reducer on stream and get both reduced value as well as iterate through
stream.
However, reducers are designed to be iterable as well and can be applied as
aggregators (a concept introduced by this library, explained later in this
document). This allows you to apply a reducer to a stream while still being able
to iterate over it and obtain the reduced value at the same time.

.. code-block:: php
:linenos:
@@ -126,17 +122,17 @@ stream.
Collectors
----------

When operators (and aggregators) are applied on stream, you can get to stream
data just by iterating.
When operators (and aggregators) are applied to a stream, you can access the
stream data simply by iterating over it.

Sometimes you want to collect all of that data into some data structure to
continue with processing using some other method.
Sometimes, however, you may want to collect all the data into a specific data
structure for further processing using other methods.

Library, in that matter, supports such concept and provides common collectors
such as ``RunOpenCode\Component\Dataset\Collector\ArrayCollector`` which
collects everything into array, or
``RunOpenCode\Component\Dataset\Collector\ListCollector`` which collects
everything into numeric ordered array and so on.
The library supports this concept and provides common collectors, such as
``RunOpenCode\Component\Dataset\Collector\ArrayCollector``, which collects all
items into an array, or
``RunOpenCode\Component\Dataset\Collector\ListCollector``, which collects items
into a numerically ordered array, and more.
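
For comparison, plain PHP offers a similar distinction through the
``preserve_keys`` argument of ``iterator_to_array()``. The sketch below assumes
that ``ArrayCollector`` keeps the emitted keys while ``ListCollector`` discards
them; that reading is an assumption based on the description above, and
``rows()`` is a hypothetical function.

.. code-block:: php

    <?php

    declare(strict_types=1);

    function rows(): \Generator
    {
        yield 'first'  => 'a';
        yield 'second' => 'b';
    }

    // Keys are preserved, similar in spirit to collecting into an array.
    $array = \iterator_to_array(rows(), true);

    // Keys are discarded and replaced with 0, 1, 2, ...
    $list = \iterator_to_array(rows(), false);

    \var_dump($array === ['first' => 'a', 'second' => 'b']); // bool(true)
    \var_dump($list === ['a', 'b']);                          // bool(true)

Note that collecting loads every item into memory, so it gives up the small
memory footprint that iteration alone provides.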

.. code-block:: php
:linenos:
@@ -161,16 +157,16 @@ everything into numeric ordered array and so on.
Aggregators
-----------

Aggregators are concept introduced with this library. General idea is that you
can both iterate stream with applied operators and calculate reduced value
simultaneously.
Aggregators are a concept introduced by this library. The general idea is that
you can iterate over a stream with applied operators while simultaneously
calculating a reduced value in a single pass.

This is useful when, per example, you are rendering a table of financial data
and at the bottom of table you want to render total and/or average sum, or
similar.
This is useful, for example, when rendering a table of financial data and you
want to display totals, averages, or similar summary values at the bottom of the
table.

Aggregators are "attached" reducers to a stream and can be accessed when stream
is fully iterated.
Aggregators are essentially reducers "attached" to a stream, and their values
can be accessed once the stream has been fully iterated.
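
The single-pass idea can be approximated in plain PHP with a generator that
yields every item unchanged and returns the reduced value at the end. This is
only a sketch of the concept; ``withTotal()`` is a hypothetical function and
does not reflect how the library implements aggregators.

.. code-block:: php

    <?php

    declare(strict_types=1);

    /**
     * Hypothetical pass-through generator: yields every item unchanged and
     * returns the running total once the source is exhausted.
     */
    function withTotal(iterable $source): \Generator
    {
        $total = 0;

        foreach ($source as $key => $value) {
            $total += $value;

            yield $key => $value;
        }

        return $total;
    }

    $stream = withTotal([100, 250, 50]);

    // Render the rows while the total is accumulated in the background.
    foreach ($stream as $amount) {
        echo $amount, "\n";
    }

    // The return value is available only after the generator has finished.
    echo 'Total: ', $stream->getReturn(), "\n"; // Total: 400

As with aggregators, asking for the result before the iteration has completed is
an error: ``getReturn()`` throws if the generator has not yet returned.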

.. code-block:: php
:linenos:
@@ -190,6 +186,5 @@ is fully iterated.

echo $stream->aggregators('sum');

Copilot AI Dec 17, 2025

The code example is using the old API. According to the changes in this PR, the property has been renamed from 'aggregators' to 'aggregated', and it returns an array, not a method call. This should be: $stream->aggregated['sum']


Knowing the concepts applied within this library, you may proceed with further
reading of documentation for this library.

With an understanding of the concepts used in this library, you can now proceed
with the rest of the documentation.
86 changes: 43 additions & 43 deletions docs/source/components/dataset/index.rst
@@ -2,29 +2,28 @@
Dataset component
=================

This library is heavily inspired by `Java Stream API`_ for dealing with
collections in functionali(ish), declarative way. In some way, it is inspired
with `ReactiveX`_ as well, only with much, much simpler approach and with less
features, of course.
This library is heavily inspired by the `Java Stream API`_ for working with
collections in a functional(ish), declarative way. In some aspects, it is also
inspired by `ReactiveX`_, but with a much simpler approach and far fewer
features.

If your problem can be described as:

I have a data stream from some source (file, database query result, etc.) and
I want to iterate through its records and do some processing with small
memory footprint.
I have a data stream from some source (file, database query result, etc.),
and I want to iterate over its records and process them using a small memory
footprint.

this is the library which can help you achieve that goal by using declarative
approach.
then this library can help you achieve that goal using a declarative approach.

There are several PHP implementations of same idea, however, this implementation
focuses on PHP ``iterable`` assuming that underlying implementation is most
probably instance of `Generator`_. Of course, it will work with ``array`` data
type, or anything which implements ``\Traversable``, however, power of this
library is in its focus of simple declarative data stream processing with small
memory footprint.
There are several PHP implementations of this idea; however, this implementation
focuses on PHP iterable values, assuming that the underlying implementation
is most likely an instance of `Generator`_. Of course, it also works with the
``array`` data type or anything that implements ``\Traversable``. The real
strength of this library lies in its focus on simple, declarative data stream
processing with minimal memory usage.

If you need full-fledged `ReactiveX`_ in PHP, please take a look at the official
implementation of the specification at `RxPHP`_.
implementation of the specification: `RxPHP`_.

.. _Java Stream API: https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html
.. _ReactiveX: https://reactivex.io
@@ -35,13 +34,14 @@ Features
--------

* Declarative approach to process data streams.
* Designed to work with any ``iterable`` using as less as possible memory.
* Provides bunch of operators, reducers and collectors, which can be easily
* Designed to work with any ``iterable`` while using as little memory as
possible.
* Provides a set of operators, reducers, and collectors that can be easily
extended or added as needed.
* Focused on small memory consumption during processing.
* Introduces concept of **aggregators** allowing you to simultaneously process
stream and reduce (aggregate) values during processing without breaking the
data stream.
* Introduces the concept of **aggregators**, allowing you to process a stream
and reduce (aggregate) values simultaneously without interrupting the data
stream.

Table of Contents
-----------------
@@ -60,10 +60,11 @@ Table of Contents
Quick example
-------------

A simple example of using this library for listing online transactions is given
below. Assume that we want to display list of online transactions executed in
some time period, and we want to show total amount for each individual currency
as well as in total, this would be a way to do that using this library:
A simple example of using this library to list online transactions is shown
below. Assume that we want to display a list of online transactions executed
within a certain time period, and we also want to calculate the total amount for
each currency as well as the overall total. This is how it can be done using
this library:

.. code-block:: php
:linenos:
@@ -120,29 +121,28 @@ as well as in total, this would be a way to do that using this library:
}
}

**Explanation of the code:** On line no 35 we fetch data from database. PHP
returns iterable which is pointer on the first row of the returned dataset,
which means that no rows are loaded into memory of the PHP virtual machine.
**Explanation of the code:** On line 35 we fetch data from the database. PHP
returns an iterable that points to the first row of the result set, which means
that no rows are loaded into the PHP virtual machine's memory upfront.

Line 41 wraps that iterable in an instance of
``RunOpenCode\Component\Dataset\Stream`` and then we apply operations which we
want to conduct against the stream during its iteration.
``RunOpenCode\Component\Dataset\Stream``. We then apply the operations that
should be executed while iterating over the stream.

Line 42 applies aggregator which will sum all transactions executed using
``EUR`` as currency, line 43 does that for ``USD``.
Line 42 applies an aggregator that sums all transactions executed in ``EUR``.
Line 43 does the same for ``USD``.

Line 44 will add new column to the row, ``converted`` which will convert all
amounts to ``EUR`` using given conversion rate.
Line 44 adds a new column, ``converted``, to each row. It holds the amount
converted to ``EUR`` using the provided conversion rate.

Lastly, lines 48 and 49 will apply aggregator which will provide us with total
sum of all transactions in ``EUR`` as well as average transaction amount in
``EUR``.
Finally, lines 48 and 49 apply aggregators that calculate the total sum of all
transactions in ``EUR`` as well as the average transaction amount in ``EUR``.

**None of the processing is executed, until you iterate stream**. Iterable is
just wrapped with processing logic, to execute it, you need to iterate it. You
will probably do that in some templating language. However, example below will
just use ``echo`` to demonstrate concept:
**None of the processing is executed until the stream is iterated**. The
iterable is only wrapped with processing logic. To execute it, you must iterate
over it. In practice, this will often be done in a templating engine.

The example below uses ``echo`` to demonstrate the concept:

.. code-block:: php
:linenos:
@@ -162,9 +162,9 @@ just use ``echo`` to demonstrate concept:
}

// Since we iterated, our aggregated values are available too.
echo \sprint('Total in EUR: %d', $stream->aggregators['total_converted']);
echo \sprint('Total in EUR: %d', $stream->aggregated['total_converted']);

Copilot AI Dec 17, 2025

The function name appears to be misspelled. It should be sprintf, not sprint.

echo "\n";
echo \sprint('Average in EUR: %d', $stream->aggregators['average_converted']);
echo \sprint('Average in EUR: %d', $stream->aggregated['average_converted']);

Copilot AI Dec 17, 2025

The function name appears to be misspelled. It should be sprintf, not sprint.


As a result, the memory footprint during this process is almost as low as the
amount of memory required to store a single row.
2 changes: 1 addition & 1 deletion docs/source/components/dataset/installation.rst
@@ -10,4 +10,4 @@ following command in your terminal:
composer require runopencode/dataset

Nothing more is required, no additional initialization and/or configration. Just

Copilot AI Dec 17, 2025

The word is misspelled. It should be "configuration" not "configration".

use the library classes.
use the library classes/functions.