javadbchem Wiki

A universal chemistry database system, using Java and any rdbms

Performance

To give an idea of the performance, here are some figures measured on a machine with a dual-core Intel E5300 2.6 GHz CPU and 4 GB of RAM. The test database held 41921 molecule entries (taken from nmrshiftdb2).

A substructure search with pyrrole took 3 s altogether. Note that the critical step is the exact search: the fingerprint-based prefiltering takes about 10 ms and yields 3657 entries. To get 300 actual structures (300 being the cut-off), 637 of these entries had to be screened. The cut-off is crucial for very common substructures: if it is set too high, large numbers of structures may have to be screened (and in most cases long result lists are not of interest anyway).
A substructure search with a steroid skeleton took 11 s: 241 hits were found by fingerprint, and 233 of these were actual hits.
A substructure search with cholesterol took 36 s: 210 hits were found by fingerprint, and 74 of these were actual hits.
A substructure search for a complete structure in the database (Brujavanone A) took around 50 ms: only one structure was found by fingerprint, so only one structure needed to be screened.
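
The two stages behind these figures can be sketched roughly as follows. This is only an illustration of the idea, not the javadbchem code: the class, the Entry type and the method names are made up for this example, the fingerprints are assumed to be stored as java.util.BitSet, and the expensive exact check (the subgraph isomorphism) is passed in as a Predicate, since in practice it would come from a cheminformatics library.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical sketch: fingerprint prefilter, then exact check with a cut-off. */
final class TwoStageSearchSketch {

    /** A stored entry: an id plus its precomputed fingerprint (illustrative type). */
    static final class Entry {
        final int id;
        final BitSet fingerprint;

        Entry(int id, BitSet fingerprint) {
            this.id = id;
            this.fingerprint = fingerprint;
        }
    }

    /** Stage 1 (cheap): keep entries whose fingerprint contains every query bit. */
    static List<Entry> prefilter(List<Entry> entries, BitSet queryFingerprint) {
        List<Entry> candidates = new ArrayList<>();
        for (Entry e : entries) {
            BitSet missing = (BitSet) queryFingerprint.clone();
            missing.andNot(e.fingerprint); // query bits the entry does not have
            if (missing.isEmpty()) {
                candidates.add(e);
            }
        }
        return candidates;
    }

    /** Stage 2 (expensive): run the exact check until cutOff hits are collected. */
    static List<Integer> verify(List<Entry> candidates,
                                Predicate<Entry> exactMatch, int cutOff) {
        List<Integer> hits = new ArrayList<>();
        for (Entry e : candidates) {
            if (hits.size() >= cutOff) {
                break; // stop early: the remaining candidates are never screened
            }
            if (exactMatch.test(e)) {
                hits.add(e.id);
            }
        }
        return hits;
    }
}
```

The early break in verify is what the cut-off buys: in the pyrrole case above, only 637 of the 3657 prefiltered entries had to go through the exact check before 300 hits were collected.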

So times vary. They depend on (a) the size of the substructure to search for (larger means the subgraph isomorphism is slower), (b) the specificity of the substructure ("unusual" bits give better prefiltering) and (c) the cut-off. Experience shows that the algorithm works well in practice, but you can always find slow cases (I think this is true of any substructure search).

Exact structure searches are fast and run in practically constant time, since the SMILES is used (in theory the SMILES generation takes longer for larger structures, but this is in the millisecond range).
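
Why a SMILES-based exact search is close to constant time can be shown with a small sketch. It assumes that each structure is stored under a canonical SMILES string and that the query is converted to the same canonical form before the lookup (that conversion is not shown here). The class and its methods are made up for this example, and an in-memory HashMap stands in for what would be an indexed column in the rdbms.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: exact structure search as a lookup by canonical SMILES. */
final class ExactSearchSketch {

    // Maps a canonical SMILES string to the ids of all entries with that structure.
    private final Map<String, List<Integer>> idsBySmiles = new HashMap<>();

    /** Registers an entry under its (already canonical) SMILES string. */
    void add(int id, String canonicalSmiles) {
        idsBySmiles.computeIfAbsent(canonicalSmiles, k -> new ArrayList<>()).add(id);
    }

    /** Returns the ids stored under the given SMILES; no graph matching is involved. */
    List<Integer> find(String canonicalSmiles) {
        return idsBySmiles.getOrDefault(canonicalSmiles, Collections.emptyList());
    }
}
```

The lookup cost does not depend on the number of stored structures, so the only part that grows with molecule size is generating the SMILES for the query.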

The similarity searches are also fast, since they use only the prefiltering step (milliseconds in the setup described above).
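
As an illustration of a fingerprint-only comparison, here is a minimal sketch of a Tanimoto coefficient on two bit fingerprints. Which similarity measure javadbchem actually uses is not stated on this page, so Tanimoto is an assumption here, as are the class and method names; fingerprints are again assumed to be java.util.BitSet.

```java
import java.util.BitSet;

/** Hypothetical sketch: Tanimoto similarity of two bit fingerprints. */
final class TanimotoSketch {

    /** Returns |a AND b| / |a OR b|, or 0.0 if both fingerprints are empty. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b); // bits set in both fingerprints
        BitSet either = (BitSet) a.clone();
        either.or(b);  // bits set in at least one fingerprint
        int union = either.cardinality();
        return union == 0 ? 0.0 : (double) common.cardinality() / union;
    }
}
```

Only bit operations on fixed-size fingerprints are needed and no graph matching, which fits the millisecond timings mentioned above.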

Inserting the 41921 structures into the database from an SDF file took around 40 minutes, i.e. around a thousand structures per minute. This includes the time for reading the SDF file, generating InChI and (chiral) SMILES, and calculating the weight.
