explore_eager and new data structure for efficient appends #75
This PR addresses an issue I've been experiencing in interactive use. I want to run the maximization function and then try custom values based on its result. However, the only way to do that is to "initialize", and doing that blows away all previous values of `self.X` and `self.Y`.

To this end I made a function `explore_eager` (name pending), which works like `explore`, but immediately evaluates those values and adds them to your current explored data. It does this in such a way that if `self.maximize` is called again it uses the new custom points when it fits its GP.

Now, to make this change I had to do some non-trivial code refactoring. I saw a few issues in the code base that I felt I should address while I was in there. The main problem was the use of `np.vstack` and `np.append` after evaluating every single point. These numpy methods are O(n), as opposed to the amortized O(1) of Python's `list.append`. With a large sample, these will eventually start causing problems.
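To make the cost difference concrete, here is a minimal sketch (not code from this PR) contrasting growth via `np.vstack` with writing into an over-allocated buffer:

```python
import numpy as np

dim = 3

# O(n) per append: np.vstack reallocates and copies every existing row,
# so building up n points costs O(n^2) overall.
X = np.empty((0, dim))
for _ in range(1000):
    x = np.random.uniform(size=dim)
    X = np.vstack([X, x])

# Amortized O(1) per append: write into an over-allocated buffer and
# only reallocate (doubling capacity) when it fills up.
capacity, length = 16, 0
buf = np.empty((capacity, dim))
for _ in range(1000):
    x = np.random.uniform(size=dim)
    if length == capacity:
        capacity *= 2
        new_buf = np.empty((capacity, dim))
        new_buf[:length] = buf[:length]
        buf = new_buf
    buf[length] = x
    length += 1

X_view = buf[:length]  # view of the valid rows; no copy needed before scipy calls
```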
To fix this I abstracted the point storage into a class called `TargetSpace` that manages the bounds, the X array, and the Y array. Instead of using python lists, I found custom `np.empty` over-allocation with numpy views to be an effective solution. This also removes the need to cast a list of lists to an array whenever a scipy function is called (which would also be O(n)). I think the `TargetSpace` API is pretty clean and it makes the code in bayesian_optimization.py a lot simpler. The main functions are (a rough sketch follows the list):
- a random-sampling method - returns `num` random points within the bounds
- `__len__` - number of unique points added so far
- `__contains__` - returns True if we have seen the point thus far
- `self.X` and `self.Y` - the stored points and their target values
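For concreteness, here is a minimal sketch of the idea; names such as `observe_point`, `random_points`, and `init_capacity` are illustrative assumptions rather than the PR's exact API, and the real class maps points back to the target function's keyword arguments rather than passing a raw vector:

```python
import numpy as np


class TargetSpace(object):
    """Minimal sketch of the point-storage idea, not the PR's exact code."""

    def __init__(self, target_func, bounds, init_capacity=64):
        # bounds: array-like of shape (dim, 2) giving lower/upper limits per dimension
        self.target_func = target_func
        self.bounds = np.asarray(bounds, dtype=float)
        self.dim = self.bounds.shape[0]
        self._capacity = init_capacity
        self._length = 0
        # Over-allocated backing storage; only the first _length rows are valid.
        self._Xarr = np.empty((self._capacity, self.dim))
        self._Yarr = np.empty(self._capacity)
        self._seen = set()  # tuples of observed points, used by __contains__

    @property
    def X(self):
        # View (not a copy) of the points observed so far.
        return self._Xarr[:self._length]

    @property
    def Y(self):
        # View (not a copy) of the corresponding target values.
        return self._Yarr[:self._length]

    def __len__(self):
        # Number of unique points added so far.
        return self._length

    def __contains__(self, x):
        # True if this exact point has already been observed.
        return tuple(np.asarray(x, dtype=float).ravel()) in self._seen

    def random_points(self, num):
        # `num` random points drawn uniformly within the bounds.
        lower, upper = self.bounds[:, 0], self.bounds[:, 1]
        return np.random.uniform(lower, upper, size=(num, self.dim))

    def _grow(self):
        # Double capacity and copy the valid rows, keeping appends amortized O(1).
        self._capacity *= 2
        new_X = np.empty((self._capacity, self.dim))
        new_Y = np.empty(self._capacity)
        new_X[:self._length] = self._Xarr[:self._length]
        new_Y[:self._length] = self._Yarr[:self._length]
        self._Xarr, self._Yarr = new_X, new_Y

    def observe_point(self, x):
        # Evaluate the target at x and record the result; duplicates are skipped.
        x = np.asarray(x, dtype=float).ravel()
        if x in self:
            return None
        if self._length == self._capacity:
            self._grow()
        y = self.target_func(x)
        self._Xarr[self._length] = x
        self._Yarr[self._length] = y
        self._seen.add(tuple(x))
        self._length += 1
        return y
```

The key point is that `X` and `Y` are always contiguous views of the valid rows, so scipy calls can operate on them directly without a list-to-array conversion on every fit.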
Because the class manages only unique points, we can avoid removing duplicates every time a GP is fit. This should also offer some amount of speedup.
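As a usage note for the sketch above, re-observing an already-seen point is a no-op, so the arrays handed to the GP never contain duplicate rows:

```python
space = TargetSpace(lambda x: -np.sum(x ** 2), bounds=[(-1, 1), (-1, 1)])

space.observe_point([0.5, 0.5])
space.observe_point([0.5, 0.5])   # duplicate: skipped, nothing is re-evaluated
space.observe_point([0.1, -0.2])

assert len(space) == 2            # only unique points are stored
assert [0.5, 0.5] in space        # __contains__ lookup
```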
Because this is a fairly large change I also added comprehensive unit tests (targeted towards pytest).
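As an example of the kind of property the tests check (a hedged sketch using the `TargetSpace` outline above, not a test copied from the PR):

```python
import numpy as np


def test_unique_points_and_consistent_views():
    # Hypothetical pytest-style test: duplicates must not grow the arrays,
    # and the X/Y views must stay in sync with what was evaluated.
    space = TargetSpace(lambda x: float(np.sum(x)), bounds=[(0, 1), (0, 1)])

    points = space.random_points(10)
    for x in points:
        space.observe_point(x)
    for x in points:
        space.observe_point(x)  # second pass is all duplicates

    assert len(space) == 10
    assert space.X.shape == (10, 2)
    assert np.allclose(space.Y, space.X.sum(axis=1))
```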
Having this structure should make it possible to further simplify the bayesian_optimization.py codebase and API, but I wanted to keep those changes fairly minimal, at least in this first pass. (For instance, it should be possible to seamlessly accept a dict of lists / list of dicts / ndarray / list of lists / pandas dataframe as input point(s).)
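Purely as an illustration of what that future coercion might look like (not part of this PR; the helper name and behavior are assumptions), a single function could normalize each of those formats into an (n, dim) array given the parameter names:

```python
import numpy as np
import pandas as pd


def points_to_array(points, keys):
    # Illustrative helper: coerce several point formats to an (n, len(keys)) float array.
    if isinstance(points, pd.DataFrame):
        # pandas dataframe with one column per parameter
        return points[list(keys)].values.astype(float)
    if isinstance(points, dict):
        # dict of lists: {'x': [...], 'y': [...]}
        return np.column_stack([np.asarray(points[k], dtype=float) for k in keys])
    if isinstance(points, (list, tuple)) and points and isinstance(points[0], dict):
        # list of dicts: [{'x': ..., 'y': ...}, ...]
        return np.array([[row[k] for k in keys] for row in points], dtype=float)
    # ndarray or list of lists, assumed to already be in the column order of `keys`
    return np.atleast_2d(np.asarray(points, dtype=float))
```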