Skip to content

Proposal: Add iterable\any(iterable $input, ?callable $cb=null), all(...), none(...), find(...), reduce(...) #6053

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from
492 changes: 466 additions & 26 deletions ext/standard/array.c

Large diffs are not rendered by default.

19 changes: 19 additions & 0 deletions ext/standard/basic_functions.stub.php
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,8 @@

/** @generate-class-entries */

namespace {

final class __PHP_Incomplete_Class
{
}
Expand Down Expand Up @@ -1510,3 +1512,20 @@ function sapi_windows_set_ctrl_handler(?callable $handler, bool $add = true): bo

function sapi_windows_generate_ctrl_event(int $event, int $pid = 0): bool {}
#endif
} // end global namespace

namespace iterable {

function all(iterable $iterable, ?callable $callback = null): bool {}

function any(iterable $iterable, ?callable $callback = null): bool {}

function none(iterable $iterable, ?callable $callback = null): bool {}

function find(iterable $iterable, callable $callback, mixed $default = null): mixed {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this one? I'd rather add filter, map, and flatmap if we need more functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

find() is also useful and many other programming languages include both (js, haskell, etc.)

For example, if you have an array of a million elements and only want the first match, it is much more efficient to call find() if the iterable contains a matching value (and there would be less service calls and db calls) compared to reset(array_filter(...))

Additionally, filter() and map() would be waiting on the existence of CachedIterable when it starts, because Traversables can have repeated keys yield 'key' => 1; yield 'key' => 2;

Copy link
Contributor

@morrisonlevi morrisonlevi Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additionally, filter() and map() would be waiting on the existence of CachedIterable when it starts, because Traversables can have repeated keys yield 'key' => 1; yield 'key' => 2;

This does not require a CachedIterable; they can return any iterator.

Edit: on second thought, they should not return a cached iterable. These routines are often chained together; if every piece of the chain caches their results, it will balloon memory usage.

Copy link
Contributor Author

@TysonAndre TysonAndre Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that there's nothing in the standard library yet that supports rewinding, counting, and especially arbitrary offset access

It seems much more difficult to use without the support for rewindability, random/repeated offset access, countability, etc.

But yes, I suppose you could hide the implementation entirely with InternalIterator and only support a single iteration

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Edit: on second thought, they should not return a cached iterable. These routines are often chained together; if every piece of the chain caches their results, it will balloon memory usage.

CachedIterable is basically the name chosen for a rewindable immutable key-value sequence. It isn't cached permanently, it has a regular object lifetime. I'm referring to https://wiki.php.net/rfc/cachediterable

Copy link
Contributor

@morrisonlevi morrisonlevi Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CachedIterable is basically the name chosen for a rewindable immutable key-value sequence. It isn't cached permanently, it has a regular object lifetime. I'm referring to https://wiki.php.net/rfc/cachediterable

Yes, I am specifically saying they should not return this object. It will ballon memory usage to do it that way. Think about it; you have a map + filter plus some terminator like first_n with n=100. Filter and map will each hold at least 100 values in memory that shouldn't be there while calculating the first_n.

Copy link
Contributor Author

@TysonAndre TysonAndre Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am specifically saying they should not return this object. It will ballon memory usage to do it that way. Think about it; you have a map + filter plus some terminator like first_n with n=100. Filter and map will each hold at least 100 values in memory that shouldn't be there while calculating the first_n.

  1. Many end users would want the standard library to return something that could be iterated over multiple times - this was a source of bugs in spl classes such as https://www.php.net/arrayobject prior to them being switched to iterables
$foo = map(...);
foreach ($foo as $i => $v1) {
    foreach ($foo as $i => $v2) {
        if (some_pair_predicate($v1, $v2)) {
            // do something
        }
    }
}
  1. Library/application authors that are interested in lazy generators could use or implement something such as https://github.com/nikic/iter instead - my opinion is that the standard library should provide something that is easy to understand, debug, serialize or represent, etc.
  2. It would be harder to understand why SomeFrameworkException is thrown in code unrelated to that framework when it is passed to some function that accepts a generic iterable, and harder to write correct exception handling for it if done in a lazy generation style
  3. It is possible to implement a lazy version of CachedIterable that only loads values as needed - but I hadn't proposed it due to doubts that 2/3 of voters would consider it widely useful enough to be included in php rather than as a userland or PECL library

Copy link
Contributor

@morrisonlevi morrisonlevi Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterables are made from arrays and generators. Since there is already a large body of functions which work with arrays, it seems reasonable to assume they would use an Iterable part of the standard library because they also use iterators/generators. One of the main points of generators is to reduce memory. We shouldn't add a corpus of iterable functions that will increase memory; that works directly against the goals of the iterable feature!

No, it's much better to compose something if you want it to be eager, e.g. $foo = CachedIterable::new(map(...$args)); We can include this in the examples in the docs for filter, map, etc.

Copy link
Contributor Author

@TysonAndre TysonAndre Jun 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Iterables are made from arrays and generators

Iterables are made from arrays and Traversable objects such as SplObjectStorage, ArrayObject, user-defined data structures, and Generators

We shouldn't add a corpus of iterable functions that will increase memory; that works directly against the goals of the iterable feature!

In common use cases, the memory increase may be small, especially if the lifetime of the variable or temporary expression holding the result is short.

Additionally, a lazy iterable would require keeping a reference to the previous (possibly lazy) iterable, and the iterables/arrays those reference - if the initial iterable is larger than the result then eagerly evaluating $var = map($cb, temporary()) would end up saving memory

No, it's much better to compose something if you want it to be eager, e.g. $foo = CachedIterable::new(map(...$args)); We can include this in the examples in the docs for filter, map, etc.

That seems error prone and I'm personally opposed to that.

PHP has typically been imperitive rather than functional, and focused on "cater[ing] to the skill-levels and platforms of a wide range of users" as RFC authors are repeatedly reminded in https://wiki.php.net/rfc/template and my interpretation of that is that imperitive would be much more acceptable (aside: The loosely typed language part seems less applicable nowadays)

PHP is and should remain:

  1. a pragmatic web-focused language
  2. a loosely typed language
  3. a language which caters to the skill-levels and platforms of a wide range of users

Lazy data structures would be easy to misuse (consume twice, attempt to serialize or encode, (or var_dump or inspect with Xdebug), easier to attempt to log the full iterable(consume twice) etc) without (or even with) linters and static analyzers, so this really doesn't seem like catering to a wide range of users.


Explicitly using a different family of functions to act on generators internally would probably make more sense than being the default, e.g. https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#findFirst-- and https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html (Streams are separate from java.util.Collection in java, javascript eagerly evaluates https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/Array/Map, etc)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it's much better to compose something if you want it to be eager, e.g. $foo = CachedIterable::new(map(...$args)); We can include this in the examples in the docs for filter, map, etc.

Having the eager version be longer than the lazy version (instead of shorter or the same length) would also encourage the use of the lazy version, which I'd objected to for being error prone and easy to misuse.


//function fold(iterable $iterable, callable $by, mixed $initial): mixed {}

function reduce(iterable $iterable, callable $by): mixed {}

}
32 changes: 31 additions & 1 deletion ext/standard/basic_functions_arginfo.h
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
/* This is a generated file, edit the .stub.php file instead.
* Stub hash: 088e107c889f44d0a9762c47fce668f4d82f28ba */
* Stub hash: c04b8fad2fb795a23786e6340a514c63d9edf5a1 */

ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_set_time_limit, 0, 1, _IS_BOOL, 0)
ZEND_ARG_TYPE_INFO(0, seconds, IS_LONG, 0)
Expand Down Expand Up @@ -2218,6 +2218,26 @@ ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_sapi_windows_generate_ctrl_event
ZEND_END_ARG_INFO()
#endif

ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_iterable_all, 0, 1, _IS_BOOL, 0)
ZEND_ARG_TYPE_INFO(0, iterable, IS_ITERABLE, 0)
ZEND_ARG_TYPE_INFO_WITH_DEFAULT_VALUE(0, callback, IS_CALLABLE, 1, "null")
ZEND_END_ARG_INFO()

#define arginfo_iterable_any arginfo_iterable_all

#define arginfo_iterable_none arginfo_iterable_all

ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_iterable_find, 0, 2, IS_MIXED, 0)
ZEND_ARG_TYPE_INFO(0, iterable, IS_ITERABLE, 0)
ZEND_ARG_TYPE_INFO(0, callback, IS_CALLABLE, 0)
ZEND_ARG_TYPE_INFO_WITH_DEFAULT_VALUE(0, default, IS_MIXED, 0, "null")
ZEND_END_ARG_INFO()

ZEND_BEGIN_ARG_WITH_RETURN_TYPE_INFO_EX(arginfo_iterable_reduce, 0, 2, IS_MIXED, 0)
ZEND_ARG_TYPE_INFO(0, iterable, IS_ITERABLE, 0)
ZEND_ARG_TYPE_INFO(0, by, IS_CALLABLE, 0)
ZEND_END_ARG_INFO()


ZEND_FUNCTION(set_time_limit);
ZEND_FUNCTION(header_register_callback);
Expand Down Expand Up @@ -2839,6 +2859,11 @@ ZEND_FUNCTION(sapi_windows_set_ctrl_handler);
#if defined(PHP_WIN32)
ZEND_FUNCTION(sapi_windows_generate_ctrl_event);
#endif
ZEND_FUNCTION(all);
ZEND_FUNCTION(any);
ZEND_FUNCTION(none);
ZEND_FUNCTION(find);
ZEND_FUNCTION(reduce);


static const zend_function_entry ext_functions[] = {
Expand Down Expand Up @@ -3492,6 +3517,11 @@ static const zend_function_entry ext_functions[] = {
#if defined(PHP_WIN32)
ZEND_FE(sapi_windows_generate_ctrl_event, arginfo_sapi_windows_generate_ctrl_event)
#endif
ZEND_NS_FE("iterable", all, arginfo_iterable_all)
ZEND_NS_FE("iterable", any, arginfo_iterable_any)
ZEND_NS_FE("iterable", none, arginfo_iterable_none)
ZEND_NS_FE("iterable", find, arginfo_iterable_find)
ZEND_NS_FE("iterable", reduce, arginfo_iterable_reduce)
ZEND_FE_END
};

Expand Down
68 changes: 68 additions & 0 deletions ext/standard/tests/iterable/all_array.phpt
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
--TEST--
Test all() function
--FILE--
<?php

use function iterable\all;

/*
Prototype: bool all(array $array, mixed $callback);
Description: Iterate array and stop based on return value of callback
*/

function is_int_ex($item)
{
return is_int($item);
}

echo "\n*** Testing not enough or wrong arguments ***\n";

function dump_all(...$args) {
try {
var_dump(all(...$args));
} catch (Error $e) {
printf("Caught %s: %s\n", $e::class, $e->getMessage());
}
}

dump_all();
dump_all(true);
dump_all([]);
dump_all(true, function () {});
dump_all([], true);

echo "\n*** Testing basic functionality ***\n";

dump_all([1, 2, 3], 'is_int_ex');
dump_all(['hello', 1, 2, 3], 'is_int_ex');
$iterations = 0;
dump_all(['hello', 1, 2, 3], function($item) use (&$iterations) {
++$iterations;
return is_int($item);
});
var_dump($iterations);

echo "\n*** Testing edge cases ***\n";

dump_all([], 'is_int_ex');

echo "\nDone";
?>
--EXPECT--
*** Testing not enough or wrong arguments ***
Caught ArgumentCountError: iterable\all() expects at least 1 argument, 0 given
Caught TypeError: iterable\all(): Argument #1 ($iterable) must be of type iterable, bool given
bool(true)
Caught TypeError: iterable\all(): Argument #1 ($iterable) must be of type iterable, bool given
Caught TypeError: iterable\all(): Argument #2 ($callback) must be a valid callback or null, no array or string given

*** Testing basic functionality ***
bool(true)
bool(false)
bool(false)
int(1)

*** Testing edge cases ***
bool(true)

Done
63 changes: 63 additions & 0 deletions ext/standard/tests/iterable/all_traversable.phpt
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
--TEST--
Test all() function
--FILE--
<?php

use function iterable\all;

/*
Prototype: bool all(array $array, ?callable $callback = null, int $use_type = 0);
Description: Iterate array and stop based on return value of callback
*/

function is_int_ex($item)
{
return is_int($item);
}

function dump_all(...$args) {
try {
var_dump(all(...$args));
} catch (Error $e) {
printf("Caught %s: %s\n", $e::class, $e->getMessage());
}
}


echo "\n*** Testing not enough or wrong arguments ***\n";

dump_all(new ArrayIterator());
dump_all(new ArrayIterator(), true);

echo "\n*** Testing basic functionality ***\n";

dump_all(new ArrayIterator([1, 2, 3]), 'is_int_ex');
dump_all(new ArrayIterator(['hello', 1, 2, 3]), 'is_int_ex');
$iterations = 0;
dump_all(new ArrayIterator(['hello', 1, 2, 3]), function($item) use (&$iterations) {
++$iterations;
return is_int($item);
});
var_dump($iterations);

echo "\n*** Testing edge cases ***\n";

dump_all(new ArrayIterator(), 'is_int_ex');

echo "\nDone";
?>
--EXPECT--
*** Testing not enough or wrong arguments ***
bool(true)
Caught TypeError: iterable\all(): Argument #2 ($callback) must be a valid callback or null, no array or string given

*** Testing basic functionality ***
bool(true)
bool(false)
bool(false)
int(1)

*** Testing edge cases ***
bool(true)

Done
81 changes: 81 additions & 0 deletions ext/standard/tests/iterable/any_array.phpt
Original file line number Diff line number Diff line change
@@ -0,0 +1,81 @@
--TEST--
Test any() function
--FILE--
<?php

use function iterable\any;

/*
Prototype: bool any(array $iterable, mixed $callback);
Description: Iterate array and stop based on return value of callback
*/

function is_int_ex($nr)
{
return is_int($nr);
}

echo "\n*** Testing not enough or wrong arguments ***\n";

function dump_any(...$args) {
try {
var_dump(any(...$args));
} catch (Error $e) {
printf("Caught %s: %s\n", $e::class, $e->getMessage());
}
}

dump_any();
dump_any(true);
dump_any([]);
dump_any(true, function () {});
dump_any([], true);

echo "\n*** Testing basic functionality ***\n";

dump_any(['hello', 'world'], 'is_int_ex');
dump_any(['hello', 1, 2, 3], 'is_int_ex');
$iterations = 0;
dump_any(['hello', 1, 2, 3], function($item) use (&$iterations) {
++$iterations;
return is_int($item);
});
var_dump($iterations);

echo "\n*** Testing second argument to predicate ***\n";

dump_any([1, 2, 3], function($item, $key) {
var_dump($key);
return false;
});

echo "\n*** Testing edge cases ***\n";

dump_any([], 'is_int_ex');

dump_any(['key' => 'x'], null);

echo "\nDone";
?>
--EXPECT--
*** Testing not enough or wrong arguments ***
Caught ArgumentCountError: iterable\any() expects at least 1 argument, 0 given
Caught TypeError: iterable\any(): Argument #1 ($iterable) must be of type iterable, bool given
bool(false)
Caught TypeError: iterable\any(): Argument #1 ($iterable) must be of type iterable, bool given
Caught TypeError: iterable\any(): Argument #2 ($callback) must be a valid callback or null, no array or string given

*** Testing basic functionality ***
bool(false)
bool(true)
bool(true)
int(2)

*** Testing second argument to predicate ***
Caught ArgumentCountError: Too few arguments to function {closure}(), 1 passed and exactly 2 expected

*** Testing edge cases ***
bool(false)
bool(true)

Done
Loading