add database support, update print summary #87

0800314 · 2018-02-17T14:45:47Z

I find this library extremely useful; however, it is not able to load historical data.
If an experiment stops unexpectedly or is interrupted from the keyboard, and each observation is expensive, then restarting the same experiment with the same parameter settings can be very painful, since too much information is lost.

I wrote some code that lets it talk to a database using sqlalchemy, so that it does not depend on any particular database. The suggested code for interacting with the database is in bayes_opt/examples/database_interactions.ipynb; you can also view it here (https://github.com/yanzhen0923/BayesianOptimization/blob/master/examples/database_interactions.ipynb).

I also updated the print_summary function so that the historical data can be viewed at any time, as long as the bo object has loaded it.

codecov-io · 2018-02-17T15:07:02Z

Codecov Report

Merging #87 into master will increase coverage by 4.58%.
The diff coverage is 88.13%.

@@            Coverage Diff             @@
##           master      #87      +/-   ##
==========================================
+ Coverage   82.87%   87.46%   +4.58%     
==========================================
  Files           4        5       +1     
  Lines         292      351      +59     
  Branches       35       38       +3     
==========================================
+ Hits          242      307      +65     
+ Misses         45       40       -5     
+ Partials        5        4       -1

Impacted Files	Coverage Δ
bayes_opt/helpers.py	`90.12% <ø> (ø)`	⬆️
bayes_opt/bayesian_optimization.py	`77.61% <78.94%> (+11.52%)`	⬆️
bayes_opt/db_manager.py	`92.5% <92.5%> (ø)`

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 317f735...2773ef1.

yanzhen0923 · 2018-02-17T15:19:53Z

0800314 is my puppet id, sorry; I did not notice that when I sent this request.

fmfn · 2018-02-21T19:16:04Z

Interesting, I like the idea, and thanks for the PR. Someone recently was asking about a more general logger, and this is a good step in that direction. Give me some time to think about how this all plays out.

yanzhen0923 · 2018-02-22T10:17:48Z

Well, I actually ran into problems while using this library: I didn't know where to get the summary log, and I could not resume a previous (very costly) experiment when it stopped accidentally. That is how I came up with the idea for this PR.

Maybe the description above is not quite sufficient to tell what is going on, so I think it would be nice if I explained a bit more here...

Update on print_summary() - mainly in helpers.py
This function was originally not implemented. I tried to match the style the original code uses when printing the 'Initialization' part and the 'Bayesian Optimization' part, especially the color and the indentation/padding. I pass space.X and space.Y from bayesian_optimization.py for printing, and the results are sorted before they are printed. I did not print 'Step' and 'Time', though, since they only make sense during the BO process.
Add database support - add new file db_manager.py, add new APIs to bayesian_optimization.py.
In db_manager.py, I use sqlalchemy to talk to the database. It can talk to many databases; in particular it supports SQLite, a lightweight database that does not require any configuration, which keeps things simple for experiments. Of course, if someone already has a configured database, they only need to change the connection string. The formats are all well documented in sqlalchemy's docs.

I wrote four functions inside db_manager.py - init_db, load, save and clear - and they are wrapped as APIs in bayesian_optimization.py. The idea is to use a pre-defined table called 'target_space_table' to store the data.

The connection string is passed to the bo.init_db API as bo.init_db(conn_str); this determines which database is used.

Whenever the experiment is finished, if the user calls bo.save(), the database manager obtains space.X and space.Y, pickles them as objects, and saves them into the database table.
This table always has 1 row and 3 columns, since the multi-dimensional data are pickled as objects: one column for the primary key (id), one for space.X and one for space.Y.
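
The storage scheme described above can be sketched roughly as follows. This is only an illustration, not the code from this PR: it uses the standard-library sqlite3 module instead of sqlalchemy so that it has no dependencies, and the column names (`x_blob`, `y_blob`) and function signatures are assumptions. Only the table name 'target_space_table' and the four operation names come from the description.

```python
import pickle
import sqlite3

TABLE = "target_space_table"  # table name taken from the description above


def init_db(conn_str=":memory:"):
    """Open the database and create the single storage table if it is missing."""
    conn = sqlite3.connect(conn_str)
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {TABLE} "
        "(id INTEGER PRIMARY KEY, x_blob BLOB, y_blob BLOB)"
    )
    conn.commit()
    return conn


def save(conn, X, Y):
    """Pickle X and Y and keep them in one row (id = 1), replacing any old row."""
    conn.execute(
        f"INSERT OR REPLACE INTO {TABLE} (id, x_blob, y_blob) VALUES (1, ?, ?)",
        (pickle.dumps(X), pickle.dumps(Y)),
    )
    conn.commit()


def load(conn):
    """Return (X, Y) from the stored row, or None if nothing has been saved."""
    row = conn.execute(
        f"SELECT x_blob, y_blob FROM {TABLE} WHERE id = 1"
    ).fetchone()
    if row is None:
        return None
    return pickle.loads(row[0]), pickle.loads(row[1])


def clear(conn):
    """Remove all previously saved data."""
    conn.execute(f"DELETE FROM {TABLE}")
    conn.commit()
```

Calling `save` twice still leaves a single row, which matches the "always 1 row" behaviour described above. One caveat with this design in general: unpickling data from a database that other people can write to is unsafe, so it is best suited to a local experiment store.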

If the user calls bo.load(), db_manager.load() returns objects for initialization; they are added to the BO object as initialization information after calling bo.initialize(init_objects) and bo.maximize(init_points=0, n_iters=0).

The bo.clear() API is just a way to remove all the previous data programmatically.

The usage including exception handling is written in examples/database_interactions.ipynb.

I also added a test for db_manager.py. It asserts that repeatedly loading, maximizing the same function for the same number of steps, and saving again increases the size of the bo initialization data by exactly that number of steps each round.
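
The property that this test checks can be illustrated with a toy stand-in. `ToyOptimizer` and `run_round` below are hypothetical and are not part of this PR or the library: the optimizer does plain random search rather than Bayesian optimization, and the history is stored through the standard-library sqlite3 module rather than sqlalchemy. The sketch only shows the invariant, namely that each load/maximize/save round grows the initial data by the number of steps.

```python
import pickle
import random
import sqlite3


class ToyOptimizer:
    """Hypothetical stand-in for the BO object: random search over [0, 1]."""

    def __init__(self, f, seed=0):
        self.f = f
        self.rng = random.Random(seed)
        self.X, self.Y = [], []

    def initialize(self, X, Y):
        # Previously observed points become the starting data.
        self.X, self.Y = list(X), list(Y)

    def maximize(self, n_iter):
        for _ in range(n_iter):
            x = self.rng.random()
            self.X.append(x)
            self.Y.append(self.f(x))


def run_round(conn, f, n_iter, seed):
    """Load history (if any), run n_iter more steps, save; return the initial size."""
    opt = ToyOptimizer(f, seed)
    row = conn.execute("SELECT data FROM history WHERE id = 1").fetchone()
    if row is not None:
        opt.initialize(*pickle.loads(row[0]))
    initial_size = len(opt.X)
    opt.maximize(n_iter)
    conn.execute(
        "INSERT OR REPLACE INTO history (id, data) VALUES (1, ?)",
        (pickle.dumps((opt.X, opt.Y)),),
    )
    conn.commit()
    return initial_size


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE history (id INTEGER PRIMARY KEY, data BLOB)")
sizes = [run_round(conn, lambda x: -(x - 0.3) ** 2, n_iter=5, seed=s) for s in range(4)]
print(sizes)  # [0, 5, 10, 15]
```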

Hope that helps!

fmfn · 2018-07-06T21:28:17Z

Sorry for the extreme delay, but I'm finally getting around to making some well overdue updates to this library and addressing some PRs.

The mindset here was always to keep this library as barebones as possible. More robust alternatives exist for those in need (such as spearmint and others), so I try to keep features and complexity to a minimum here. Nevertheless, the need for proper logging and the ability to pick things back up where they stopped can't be ignored.

While I really like the code, I believe a simpler approach would be more in line with this library's style. In another PR someone introduced the notion of a callback, which I believe might be the way to go: a generic object that gets called at every step. I think I will pursue that avenue and not merge this PR as is.
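
To make the callback idea concrete, here is a rough sketch of what "an object that gets called at every step" could look like. Nothing here is the library's API: `JSONLogger`, `optimize` and `load_history` are hypothetical names, the loop just evaluates a fixed list of candidates instead of doing Bayesian optimization, and a JSON-lines file stands in for a database.

```python
import json
import os
import tempfile


class JSONLogger:
    """Hypothetical callback: appends every observation to a JSON-lines file."""

    def __init__(self, path):
        self.path = path

    def __call__(self, step, params, target):
        record = {"step": step, "params": params, "target": target}
        with open(self.path, "a") as fh:
            fh.write(json.dumps(record) + "\n")


def optimize(f, candidates, callbacks=()):
    """Toy loop (not BO): evaluate each candidate and notify every callback."""
    best = None
    for step, params in enumerate(candidates):
        target = f(**params)
        for callback in callbacks:
            callback(step, params, target)
        if best is None or target > best[1]:
            best = (params, target)
    return best


def load_history(path):
    """Read back the logged observations, e.g. to resume a stopped run."""
    with open(path) as fh:
        return [json.loads(line) for line in fh]


log_path = os.path.join(tempfile.mkdtemp(), "observations.jsonl")
best = optimize(
    lambda x: -(x - 1) ** 2,
    [{"x": 0.0}, {"x": 1.0}, {"x": 2.0}],
    callbacks=[JSONLogger(log_path)],
)
history = load_history(log_path)
```

Because the log is written after every step, an interrupted run loses at most the evaluation in progress; a database-backed callback would follow the same shape with a different `__call__` body.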

I'll leave this PR open for the time being and I will ping you once progress has been made with callbacks to see how we can provide a DB callback by default.

yanzhen0923 added 4 commits February 17, 2018 15:18

add database support, update print summary

6263cb7

update db manager test

402b5af

remove test db

0bd2280

update setup.py, add sqlalchemy requirement

2773ef1

fmfn self-requested a review February 21, 2018 19:13

fmfn closed this Nov 7, 2018
