
explore_eager and new data structure for efficient appends #75


Merged — fmfn merged 7 commits into bayesian-optimization:master on Dec 9, 2017

Conversation

@Erotemic (Contributor) commented Dec 8, 2017

This PR addresses an issue I've been experiencing in interactive use. I want to run the maximization function and then try custom values based on its result. However, the only way to do that is to "initialize", and doing that blows away all previous values of self.X and self.Y.

To this end I made a function explore_eager (name pending), which works like explore, but immediately evaluates those values and adds them to your current explored data. It does this in such a way that if self.maximize is called again, it uses the new custom points when it fits its GP.
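
For example, the intended interactive workflow would look roughly like this (a sketch, not a verbatim excerpt from the PR; the dict-of-lists argument mirrors explore, and the target function and bounds are placeholders):

```python
from bayes_opt import BayesianOptimization

def f(x):
    return -x ** 2

bo = BayesianOptimization(f, {'x': (-4, 4)})
bo.maximize(init_points=3, n_iter=5)

# Evaluate hand-picked values right away and fold them into the
# observed data; the next maximize() call fits the GP on them too.
bo.explore_eager({'x': [0.5, -0.25]})
bo.maximize(n_iter=5)
```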

Now, to make this change I had to do some non-trivial code refactoring. I saw a few issues in the code base that I felt I should address while I was in there. The main problem was the use of np.vstack and np.append after evaluating every single point. These numpy calls are O(n), as opposed to Python's amortized O(1) list.append, so appending n points one at a time costs O(n^2) in total. With a large sample this will eventually become a bottleneck.
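
To make the cost concrete, here is a small illustration of the two growth patterns (a hypothetical micro-benchmark, not code from the PR):

```python
import numpy as np

# Growing an array one row at a time copies all n existing rows on
# every append, so n appends cost O(n^2) in total.
X = np.empty((0, 2))
for _ in range(1000):
    X = np.vstack([X, np.random.rand(1, 2)])  # O(n) copy each time

# A Python list append is amortized O(1); one final cast is O(n).
rows = []
for _ in range(1000):
    rows.append(np.random.rand(2))
X = np.asarray(rows)
```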

To fix this I abstracted the point storage into a class called TargetSpace that manages the bounds, the X array, and the Y array. Instead of Python lists, I found over-allocating with np.empty and exposing the filled prefix through numpy views to be an effective solution. This also avoids having to cast a list of lists to an array every time a scipy function is called (which would also be O(n)).

I think the TargetSpace API is pretty clean, and it makes the code in bayesian_optimization.py a lot simpler. The main functions are (a condensed sketch follows the list):

  • observe_point(x): evaluate y = f(x), record the pair, and return y; results are memoized.
  • add_observation(x, y): add a known (x, y) pair; raises a KeyError if x was already added.
  • random_points(num): generate num random points within the bounds.
  • max_point(): return the best (x, y) pair seen so far.
  • set_bounds(new_bounds): update the bounds (for consistency with the existing API).
  • __len__: the number of unique points added so far.
  • __contains__: return True if the point has been seen.
  • X: property that behaves like the old self.X.
  • Y: property that behaves like the old self.Y.
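
A condensed sketch of the storage pattern behind this API (not the merged implementation verbatim; the dict-based dedup key, the doubling growth factor, and the shape of max_point's return value are assumptions):

```python
import numpy as np

class TargetSpace:
    """Stores unique (x, y) observations in pre-allocated arrays."""

    def __init__(self, target_func, bounds, alloc=256):
        # bounds: dict mapping parameter name -> (lower, upper);
        # key order is assumed stable here for simplicity.
        self.target_func = target_func
        self.bounds = np.asarray(list(bounds.values()), dtype=float)
        self.dim = len(self.bounds)
        self._alloc = alloc
        self._Xarr = np.empty((alloc, self.dim))
        self._Yarr = np.empty(alloc)
        self._n = 0
        self._cache = {}  # hashable x -> y, for dedup and memoization

    def __len__(self):
        return self._n

    def __contains__(self, x):
        return tuple(np.asarray(x).ravel()) in self._cache

    @property
    def X(self):
        # View of the filled prefix: no copy, always array-typed.
        return self._Xarr[:self._n]

    @property
    def Y(self):
        return self._Yarr[:self._n]

    def _grow(self):
        # Double capacity so appends stay amortized O(1).
        self._alloc *= 2
        for name in ('_Xarr', '_Yarr'):
            old = getattr(self, name)
            new = np.empty((self._alloc,) + old.shape[1:])
            new[:self._n] = old[:self._n]
            setattr(self, name, new)

    def add_observation(self, x, y):
        key = tuple(np.asarray(x).ravel())
        if key in self._cache:
            raise KeyError('point {} was already added'.format(x))
        if self._n == self._alloc:
            self._grow()
        self._Xarr[self._n] = x
        self._Yarr[self._n] = y
        self._n += 1
        self._cache[key] = y

    def observe_point(self, x):
        key = tuple(np.asarray(x).ravel())
        if key in self._cache:  # memoized
            return self._cache[key]
        y = self.target_func(np.asarray(x))
        self.add_observation(x, y)
        return y

    def random_points(self, num):
        lo, hi = self.bounds[:, 0], self.bounds[:, 1]
        return np.random.uniform(lo, hi, size=(num, self.dim))

    def max_point(self):
        i = np.argmax(self.Y)
        return {'max_val': self.Y[i], 'max_params': self.X[i]}
```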

Because the class manages only unique points, we can avoid removing duplicates every time a GP is fit. This should also offer some amount of speedup.

Because this is a fairly large change I also added comprehensive unit tests (targeted towards pytest).
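
For example, tests along these lines (hypothetical samples in the spirit of the added suite, written against the TargetSpace sketch above, not copied from the PR):

```python
import numpy as np
import pytest

# Assumes the TargetSpace sketch above is importable.

def test_add_observation_rejects_duplicates():
    space = TargetSpace(lambda x: x.sum(), {'x': (0, 1), 'y': (0, 1)})
    space.add_observation([0.2, 0.8], 1.0)
    with pytest.raises(KeyError):
        space.add_observation([0.2, 0.8], 1.0)
    assert len(space) == 1

def test_views_track_appends():
    space = TargetSpace(lambda x: x.sum(), {'x': (0, 1)})
    for i in range(5):
        space.observe_point(np.array([i / 10.0]))
    assert space.X.shape == (5, 1)
    assert np.allclose(space.Y, space.X.sum(axis=1))
```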

Having this structure should make it possible to further simplify the bayesian_optimization.py codebase and API, but I wanted to keep those changes fairly minimal, at least in this first pass. (For instance, it should be possible to seamlessly accept a dict of lists / list of dicts / ndarray / list of lists / pandas DataFrame as input point(s).)

@Erotemic Erotemic changed the title from "Eval points" to "explore_eager and new data structure for efficient appends" on Dec 8, 2017
@fmfn (Member) commented Dec 8, 2017

Awesome, thanks for the PR! I'll look into it over the weekend, but I really like the idea.

@Erotemic (Contributor, Author) commented Dec 9, 2017

Cool, thanks.

I made another small addition to this PR. First, it fixes a small bug I found when x_init was empty. Then it renames the tests_* files to test_* so they can be discovered by pytest. I also added a few tests in test_bayesian_optimization.py because there weren't any; these should catch issues like the x_init bug.

I also noticed that this project isn't integrated with Travis CI, so I set up a .travis.yml and a pytest.ini, which should allow you to do CI testing with pytest. I think all you have to do is sign in with your GitHub account on https://travis-ci.org and enable the repo; it is completely free for open source projects.
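
The config can be quite small — something along these lines (a hypothetical minimal .travis.yml, not necessarily the committed file; the Python versions and dependency list are assumptions):

```yaml
language: python
python:
  - "2.7"
  - "3.5"
  - "3.6"
install:
  - pip install numpy scipy scikit-learn pytest
  - pip install -e .
script:
  - pytest
```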

@fmfn (Member) commented Dec 9, 2017

Sweet.

Yeah, I created a tentative test file a while ago, but never took the time to actually write anything. I'll figure out what needs to be done in order to plug Travis CI into this project.

Thanks so much for your contributions, really good stuff!

@fmfn fmfn merged commit 6ea60e0 into bayesian-optimization:master Dec 9, 2017