add database support, update print summary #87

Closed
wants to merge 4 commits into from

Conversation

@0800314 0800314 commented Feb 17, 2018

I find this library extremely useful, but it is not able to load historical data.
If an experiment is stopped by something unexpected or by a forced keyboard interrupt, and each observation is expensive, then restarting the same experiment with the same parameter settings can be very painful, because too much information is lost.

I wrote some code that lets it talk to a database using sqlalchemy, so it does not depend on any particular database. The suggested code for interacting with the database is in bayes_opt/examples/database_interactions.ipynb; you can also view it here (https://github.com/yanzhen0923/BayesianOptimization/blob/master/examples/database_interactions.ipynb).

I also updated the print_summary function so that the historical data can be viewed at any time, as long as the bo object has loaded it.

@codecov-io

codecov-io commented Feb 17, 2018

Codecov Report

Merging #87 into master will increase coverage by 4.58%.
The diff coverage is 88.13%.

@@            Coverage Diff             @@
##           master      #87      +/-   ##
==========================================
+ Coverage   82.87%   87.46%   +4.58%     
==========================================
  Files           4        5       +1     
  Lines         292      351      +59     
  Branches       35       38       +3     
==========================================
+ Hits          242      307      +65     
+ Misses         45       40       -5     
+ Partials        5        4       -1
| Impacted Files | Coverage Δ |
|---|---|
| bayes_opt/helpers.py | 90.12% <ø> (ø) |
| bayes_opt/bayesian_optimization.py | 77.61% <78.94%> (+11.52%) |
| bayes_opt/db_manager.py | 92.5% <92.5%> (ø) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 317f735...2773ef1.

@yanzhen0923

0800314 is my puppet ID; sorry, I did not notice that when I sent this request.

@fmfn fmfn self-requested a review February 21, 2018 19:13
@fmfn
Member

fmfn commented Feb 21, 2018

Interesting, I like the idea, and thanks for the PR. Someone recently was asking about a more general logger, and this is a good step in that direction. Give me some time to think about how this all plays out.

@yanzhen0923

Well, I actually ran into problems while using this library: I did not know where to get the summary log, and I could not resume a previous (very costly) experiment when it stopped accidentally. That is how I came up with the idea for this PR.

Maybe the description above is not quite sufficient to tell what is going on, so I think it would be nice to explain a bit more here...

  1. Update on print_summary() - mainly in helpers.py
    This function was originally not implemented. I tried to match the style the original code uses when printing the 'Initialization' part and the 'Bayesian Optimization' part, especially the color and indentation/padding. I pass space.X and space.Y from bayesian_optimization.py for printing, and the results are sorted before they are printed. I did not print 'Step' and 'Time' though, since they only make sense during the BO process.

  2. Add database support - add new file db_manager.py, add new APIs to bayesian_optimization.py.
    In db_manager.py, I use sqlalchemy to talk to the database. It can talk to many databases; in particular it supports SQLite, a lightweight database that does not require any configuration, which keeps things simple for experiments. Of course, if someone already has a configured database, they only need to change the style of the connection string. The formats are all well documented in sqlalchemy's docs.
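The print_summary() change in item 1 could look roughly like the sketch below. It is not the PR's code: the function name `format_summary`, the column widths, and the ascending sort by target value are assumptions made for illustration (the description only says the results are sorted), and plain lists stand in for space.X and space.Y.

```python
def format_summary(keys, X, Y):
    """Return summary lines for all observations, sorted by target value.

    keys: parameter names; X: list of parameter vectors; Y: list of targets.
    'Step' and 'Time' columns are left out, as described above.
    """
    # Pair each target with its parameter vector; ascending order is an assumption.
    rows = sorted(zip(Y, X), key=lambda pair: pair[0])

    header_cells = ["{:>10}".format("Value")] + ["{:>10}".format(k) for k in keys]
    header = " | ".join(header_cells)
    lines = [header, "-" * len(header)]

    for y, x in rows:
        cells = ["{:>10.5f}".format(y)] + ["{:>10.4f}".format(v) for v in x]
        lines.append(" | ".join(cells))
    return lines


if __name__ == "__main__":
    for line in format_summary(["x", "y"], [[1.0, 2.0], [0.5, 0.1]], [3.0, -1.0]):
        print(line)
```

Returning the lines instead of printing them directly keeps the formatting easy to check in a unit test.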

I wrote four functions in db_manager.py: init_db, load, save and clear. They are wrapped as APIs in bayesian_optimization.py. The idea is to use a pre-defined table called 'target_space_table' to store the data.

The connection string is passed to the API as bo.init_db(conn_str); this determines which database is used.

When the experiment is finished, if the user calls bo.save(), the database manager obtains space.X and space.Y, pickles them as objects, and saves them into the database table.
This table always has 1 row and 3 columns, since the multi-dimensional data are pickled as objects: one column for the primary key (id), one for space.X and one for space.Y.
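As an illustration of the one-row layout, here is a minimal sketch of the four operations. It deliberately uses the standard-library sqlite3 module instead of sqlalchemy so that it runs without extra dependencies; the column names `x` and `y` and the function signatures are assumptions, not the contents of db_manager.py.

```python
import pickle
import sqlite3

TABLE = "target_space_table"  # table name taken from the description above


def init_db(path=":memory:"):
    """Open (or create) the database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS {} "
        "(id INTEGER PRIMARY KEY, x BLOB, y BLOB)".format(TABLE)
    )
    conn.commit()
    return conn


def save(conn, X, Y):
    """Pickle X and Y and store them in the single row with id = 1."""
    # INSERT OR REPLACE with a fixed id keeps the table at exactly one row.
    conn.execute(
        "INSERT OR REPLACE INTO {} (id, x, y) VALUES (1, ?, ?)".format(TABLE),
        (pickle.dumps(X), pickle.dumps(Y)),
    )
    conn.commit()


def load(conn):
    """Return (X, Y) from the stored row, or None if nothing was saved."""
    row = conn.execute("SELECT x, y FROM {} WHERE id = 1".format(TABLE)).fetchone()
    if row is None:
        return None
    return pickle.loads(row[0]), pickle.loads(row[1])


def clear(conn):
    """Remove all previously saved data."""
    conn.execute("DELETE FROM {}".format(TABLE))
    conn.commit()
```

Because the row holds pickled objects, it should only be loaded from a database you trust: unpickling untrusted data can execute arbitrary code.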

If the user calls bo.load(), db_manager.load() returns the objects needed for initialization. They are added to the BO object as initialization information by calling bo.initialize(init_objects) and then bo.maximize(init_points=0, n_iter=0).

The bo.clear() API is just a way to remove all the previous data programmatically.

Usage, including exception handling, is shown in examples/database_interactions.ipynb.

I also added a test for db_manager.py. It asserts that repeatedly saving, loading, and then maximizing the same function for the same number of steps grows the bo initialization size by that number of steps each time.
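The load-then-resume flow and the property the test checks can be illustrated with a toy stand-in. Everything here is hypothetical: `ToyOptimizer` is a random search, not the library's optimizer, and a plain dict stands in for the database.

```python
import random


class ToyOptimizer:
    """Hypothetical stand-in for the optimizer: random search over [0, 1]."""

    def __init__(self, f, seed=0):
        self.f = f
        self.X, self.Y = [], []
        self._rng = random.Random(seed)

    def initialize(self, points):
        # Seed the run with previously observed (x, y) pairs
        # instead of paying for those evaluations again.
        for x, y in points:
            self.X.append(x)
            self.Y.append(y)

    def maximize(self, n_iter):
        # Each step costs one evaluation of f and adds one observation.
        for _ in range(n_iter):
            x = self._rng.random()
            self.X.append(x)
            self.Y.append(self.f(x))


def run_session(store, f, n_iter, seed):
    """Load saved observations from `store`, run n_iter steps, save again."""
    opt = ToyOptimizer(f, seed=seed)
    opt.initialize(zip(store.get("X", []), store.get("Y", [])))
    opt.maximize(n_iter)
    store["X"], store["Y"] = list(opt.X), list(opt.Y)
    return opt


if __name__ == "__main__":
    store = {}
    for session in range(4):
        opt = run_session(store, lambda x: -(x - 0.3) ** 2, n_iter=3, seed=session)
        print(session, len(opt.X))
```

Each session starts from everything the earlier sessions observed, so the number of known points grows by `n_iter` per save/load cycle.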

Hope that helps!

@fmfn
Member

fmfn commented Jul 6, 2018

Sorry for the extreme delay, but I'm finally getting around to making some well overdue updates to this library and addressing some PRs.

The mindset here was always to keep this library as barebones as possible. More robust alternatives exist for those in need (such as spearmint and others), so I try to keep features and complexity to a minimum here. Nevertheless, the need for proper logging and the ability to pick things back up where they stopped can't be ignored.

While I really like the code, I believe a simpler approach would be more in line with this library's style. In another PR someone introduced the notion of a callback, which I believe might be the way to go: a generic object that gets called at every step. I think I will pursue that avenue and not merge this PR as is.

I'll leave this PR open for the time being and I will ping you once progress has been made with callbacks to see how we can provide a DB callback by default.
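For readers unfamiliar with the pattern, a callback of the kind mentioned above could be sketched as follows. This is only an illustration under assumed names (`ListLogger`, `optimize`, and the `(step, params, target)` call signature are hypothetical), not the design the library adopted.

```python
class ListLogger:
    """Hypothetical callback: records every (step, params, target) it receives."""

    def __init__(self):
        self.records = []

    def __call__(self, step, params, target):
        # Copy params so later mutation by the caller cannot change the log.
        self.records.append((step, dict(params), target))


def optimize(f, candidates, callbacks=()):
    """Evaluate f on each candidate and notify every callback after each step."""
    best = None
    for step, params in enumerate(candidates):
        target = f(**params)
        for callback in callbacks:
            callback(step, params, target)
        if best is None or target > best[1]:
            best = (params, target)
    return best


if __name__ == "__main__":
    logger = ListLogger()
    best = optimize(
        lambda x: -(x - 1) ** 2,
        [{"x": 0.0}, {"x": 1.0}, {"x": 2.0}],
        callbacks=[logger],
    )
    print(best[0], len(logger.records))
```

A database-backed callback would have the same shape and would write each record to storage instead of appending it to a list.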

@fmfn fmfn closed this Nov 7, 2018
4 participants