javadbchem Wiki

A universal chemistry database system, using Java and any rdbms

HowTo Usage

Once you have generated the code as described in the how-to on code generation, you can put it to use in your actual chemistry project. You need to build that project independently of javadbchem, which only generates jar files and sql scripts for you. A few remarks on integration:

Database: javadbchem uses an underlying relational database. If your project uses a database as well, you can either combine both into one database or run separate databases (or schemas, depending on what your rdbms calls them). If you run them separately, you can access them separately and do not need to worry about any interaction. However, if you want everything in one database (recommended if your entities should hold references to molecules, atoms or bonds), you must integrate the two. You can do this by adding your entities to the example-schema.xml file described in the how-to on code generation. You could also change the sql scripts generated by Torque or write a second sql script, but changing example-schema.xml is probably easiest.
Torque: As explained before, Torque is used for accessing the database. Torque is a simple but (I find) efficient OR mapper. It maps each row of a table to an object, so instead of reading rows from the MOLECULE table you retrieve DBMolecule objects (all Torque objects are prefixed with DB in javadbchem to avoid confusion). If you want to use javadbchem, you have to use Torque, because the chemical code is in the generated Torque classes. You can still access your own tables in any other way, and you could even access the javadbchem tables directly, but this might lead to trouble. In particular, you should never mix both kinds of access on the same data: loading a DBMolecule object, keeping it in memory, updating the table directly and then reading from the DBMolecule object will give you stale values.
CDK: By default, a copy of the Chemistry Development Kit is included with javadbchem and the generated code offers use of it. You do not need to use CDK; this is explained further down.
For your application to run, you need to make sure that the database system is installed on the target system, that the sql scripts are run as required (see below) and that the tanimoto similarity function (see below) is installed. You must integrate this into your installation procedure.

So, here is the actual how-to. These steps should be followed:

Create your database using the generated sql scripts. In your database (or schema) run src/sql/<yourprojectname>-schema.sql, which creates the tables etc. For mysql it might be necessary to run SET FOREIGN_KEY_CHECKS=0 first. We recommend using mysql 5.5 or later, where InnoDB is the default engine, so the foreign keys defined in the script take effect. If you use an older mysql version, it is best to add InnoDB as the engine to the sql script before running it. Then run src/sql/populate-mysql.sql. This creates the table used for primary key generation in Torque. You can also use other methods (sequences etc.); check the Torque documentation. populate-mysql.sql should also work with other rdbms; scripts may be added once other rdbms have been tried.

Put the required jars in your project. You should have a dist/<yourprojectname>.jar file. Put this and all the jars from lib into your actual project.

Put the torque.properties file in a location which is accessible from your code. All configuration for Torque (including the database connection parameters) is kept in a properties file. A sample file can be found in test/net/sf/javadbchem/test. The section TORQUE PROPERTIES is the important one. Note that the "molecules" part of the property names is only a label used within the properties; the name of the actual database is given in the torque.dsfactory.molecules.connection.url property. The configuration must match a) what you used when generating the Torque files (database type) and b) the configuration of your database instance.
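A quick sanity check of the properties file before handing it to Torque can save some debugging. The following is only a sketch, not part of javadbchem: the class name TorquePropertiesCheck and its methods are made up for illustration, and it checks just the two keys mentioned on this page (torque.database.default and the connection.url of the default database). A real torque.properties needs more entries than these, see the sample file.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper (not part of javadbchem or Torque). */
final class TorquePropertiesCheck {

    /** Parses properties text; in a real project you would load the file instead. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    /** Returns the names of the checked keys that are missing or blank. */
    static List<String> missingKeys(Properties props) {
        List<String> missing = new ArrayList<>();
        String defaultDb = props.getProperty("torque.database.default", "").trim();
        if (defaultDb.isEmpty()) {
            missing.add("torque.database.default");
            return missing; // the url key name depends on the default database label
        }
        // For the label "molecules" this is torque.dsfactory.molecules.connection.url
        String urlKey = "torque.dsfactory." + defaultDb + ".connection.url";
        if (props.getProperty(urlKey, "").trim().isEmpty()) {
            missing.add(urlKey);
        }
        return missing;
    }
}
```

If missingKeys() returns an empty list, the two keys are at least present; whether the url actually points to a reachable database is a different question and only shows once Torque connects.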

Write actual code. Now you can start writing code in your project which writes, reads and searches structures. You need to initialize the Torque system by calling the Torque.init() method once. This is a static method which takes the location of the torque.properties file as a parameter. Once this method has been called, all calls to Torque will access the database configured as torque.database.default in the torque.properties file. (You can actually configure several databases in there, see the Torque documentation for details. For our purpose, the molecules database is the default and we use this.) The class test/net/sf/javadbchem/test/BaseTests.java shows the possible operations. They are also explained here:

Save structure: If you call a line like: DBMolecule dbmol1 = DBMoleculePeer.saveMolecule(mol, new boolean[mol.getBondCount()], "1-1-1", new String[]{"Test"}); the structure in mol is saved to the database. mol in this case must be an instance of the CDK IAtomContainer interface. If the structure does not yet exist, it will be saved to the database. If it is already there (see the page on structure identity for what this means), the existing entry for that structure is returned instead, so you will not get duplicate entries. In addition to the structure, the following parameters are given:
- boolean[mol.getBondCount()]: This is for identifying double bond configurations (see structure identity). We have all false here.
- "1-1-1": A CAS number. You can also pass "" or null, which means no CAS number. The CAS number will be saved with the molecule information, but it will not be used for structure identification in any way.
- new String[]{"Test"}: An array of trivial names. Again, they will be saved, but not used for identification.

The method returns a DBMolecule object. This is a Torque object representing one row in your MOLECULE table. You can retrieve the fields (specifically, getMoleculeId() will give you the primary key) and you can use the convenience methods provided. The methods getAsCDKMoleculeX() and getAsMolFileX() allow you to retrieve the structure. They exist in various versions, see the JavaDoc for details. Such an object will be the result of any search described below, and you normally work with these objects and retrieve the structure in your format from them.

The saveMolecule() method also exists in a version with a String instead of the IAtomContainer. It works the same way, but you do not need a CDK object. Together with the getAsMolFile methods you can work without CDK if you like. If you use CDK anyway, we recommend that you use your objects directly.

Have a look at the page on structure identity to see how implicit hydrogens, aromaticity and normalization are handled.

Retrieve structures: The DBMoleculePeer class contains various static methods to retrieve structures (i. e. DBMolecule objects). Note that there are also methods provided by Torque itself, mainly for retrieving by primary key. Since they have no chemical meaning, you should not normally use them. You can also change the DBMolecule objects (or any other value in the database), but this is not recommended, since it puts consistency at risk.
- Retrieve single structure: getMolecule(IAtomContainer, boolean[]) or getMolecule(String, boolean[]). Parameters are as explained for Save structure. This method returns null if the structure does not exist. Alternatively the method searchMolecule(IAtomContainer, boolean[], StructureSearchTypeEnum, int) with mode EXACT can be used. This will either return one result or an empty list.
- Substructure/Superstructure search: Use the method searchMolecule(IAtomContainer, boolean[], StructureSearchTypeEnum, int) with modes SUBSTRUCTURE and SUPERSTRUCTURE for this. It will return all structures of which the input structure is a substructure or a superstructure respectively. IAtomContainer and boolean[] parameters are as in Save structure. The int gives the maximum number of hits to return. You should set this to a reasonable value (e. g. a few hundred), since without a limit a search with e. g. a single carbon will return the whole database. This will take a long time and be of no value to the user. If you do want an unlimited search, you can pass Integer.MAX_VALUE. The method uses fingerprint prefiltering, so it should work at reasonable speed (see performance page for details).
- Similarity: The method searchMolecule(IAtomContainer, boolean[], StructureSearchTypeEnum, int) also does similarity search. SIMILARITY mode calculates the tanimoto similarity between the fingerprint of the search structure and the fingerprints of the database structures. Structures are ordered by similarity value; the similarity of a structure can be retrieved with the getSimilarity() method of DBMolecule. The limit value cuts off the list after the highest ranked structures. SUBSTRUCTURE_SIMILARITY mode does the same but restricts the search to structures of which the search structure is a substructure, i. e. it will return the same results as SUBSTRUCTURE mode, but ordered by tanimoto similarity. The identical structure should come first, followed by progressively larger structures. Note that the tanimoto similarity calculation is currently done by a udf in mysql. This is the only bit of code which is probably not easy to port to other rdbms. In order to use the tanimoto similarity in mysql, you need to install the udf (if you have not, you will get an sql error). See tanimoto page for details.
- The method searchMolecule() also exists with a molfile (String) as first parameter. It works the same way as searchMolecule() with the IAtomContainer.
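
To make the fingerprint terms used above a bit more concrete, here is a small standalone sketch of the textbook definitions on java.util.BitSet fingerprints. It is not the javadbchem implementation (that lives in the generated classes and, for tanimoto, in the mysql udf); the class FingerprintSketch and its method names are invented for this illustration. It assumes the usual fingerprint property that every bit set for a substructure is also set for any structure containing it.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Comparator;
import java.util.List;

/** Illustration only; names are hypothetical and not part of javadbchem. */
final class FingerprintSketch {

    /** Tanimoto coefficient: common bits divided by bits set in either fingerprint. */
    static double tanimoto(BitSet a, BitSet b) {
        BitSet common = (BitSet) a.clone();
        common.and(b);
        BitSet either = (BitSet) a.clone();
        either.or(b);
        int union = either.cardinality();
        // Two empty fingerprints are treated as similarity 0 here (a choice of this sketch).
        return union == 0 ? 0.0 : (double) common.cardinality() / union;
    }

    /**
     * Substructure prefilter: the target can only contain the query if every
     * query bit is also set in the target. A hit still needs a real structure match.
     */
    static boolean mayContain(BitSet target, BitSet query) {
        BitSet missing = (BitSet) query.clone();
        missing.andNot(target);
        return missing.isEmpty();
    }

    /** Orders candidates by descending similarity to the query and keeps at most limit hits. */
    static List<BitSet> rank(BitSet query, List<BitSet> candidates, int limit) {
        List<BitSet> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingDouble((BitSet c) -> tanimoto(query, c)).reversed());
        return new ArrayList<>(sorted.subList(0, Math.min(limit, sorted.size())));
    }
}
```

The prefilter only rules structures out; it never confirms a match. That is why it is cheap enough to run over the whole table before the expensive graph matching is done on the remaining candidates.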
