OpenModeller Maxent

Introduction

The first implementation of a Maximum Entropy algorithm in openModeller (OM-MAXENT) was released in version 0.6 in 2008. This implementation was based on an existing third-party library, and the resulting models did not have the same quality as those produced by other algorithms. The algorithm was later rewritten based on Matlab code provided by S. Phillips, and many other versions were subsequently released. Most of this work was performed by Elisângela Rodrigues as part of her doctorate. The resulting models improved considerably compared with the first version, but they still differed from those produced by the Maxent software.

This page describes the activity under OpenBio to produce yet another version of OM-MAXENT, this time compatible with the original Maxent software. Compatibility here means acceptable differences, since a complex algorithm is being implemented by two different programs. Note also that not all functionality available in Maxent was implemented in this new version: in particular, the jackknifing tool, the collecting bias input and the use of categorical maps are currently not available in OM-MAXENT, as Use Case 2 only needs the linear, quadratic, product, threshold, hinge and autofeatures parameters. Compatibility with respect to these parameters was therefore the focus of the new version.

Methods

The strategy for achieving compatibility involved the following steps: 1) finding a way to guarantee that OM-MAXENT and Maxent are run with exactly the same input; 2) defining criteria to compare the output produced by the two implementations, including logs and maps; 3) making successive adjustments to the openModeller code; 4) comparing the results until an acceptable difference is reached. Because the algorithm is deterministic, comparing results was greatly simplified.

The first step required using the same algorithm parameters, input points, environmental layers and background points. A standard experiment based on the example data provided by openModeller was created: 65 presence points of Thalurania furcata boliviana and 2 environmental layers (rain_coolest and temp_avg). The layers had to be converted into the ArcInfo ASCII Grid format supported by Maxent. Ten thousand background points were sampled beforehand, and the corresponding environmental values were retrieved with om_sampler. To make sure that both programs used the same background points, these were provided to openModeller as absence points (through the parameter "Use absences as background") and to Maxent with the parameter "-e".
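As an illustration of the layer conversion step, the sketch below writes a small raster in the ArcInfo ASCII Grid layout (a six-line header followed by the rows from north to south). It is not the tool used in this work; the function names `write_ascii_grid` and `grid_to_text`, the corner coordinates and the cell values are hypothetical and chosen only for the example.

```python
import io


def write_ascii_grid(out, rows, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Write a 2D list of cell values (top row first) as an ArcInfo ASCII Grid.

    Cells holding None are written as the NODATA value.
    """
    nrows = len(rows)
    ncols = len(rows[0])
    if any(len(row) != ncols for row in rows):
        raise ValueError("all rows must have the same number of columns")

    # Six-line header describing the grid geometry.
    out.write(f"ncols {ncols}\n")
    out.write(f"nrows {nrows}\n")
    out.write(f"xllcorner {xllcorner}\n")
    out.write(f"yllcorner {yllcorner}\n")
    out.write(f"cellsize {cellsize}\n")
    out.write(f"NODATA_value {nodata}\n")

    # One text line per raster row, values separated by spaces.
    for row in rows:
        out.write(" ".join(str(nodata if v is None else v) for v in row) + "\n")


def grid_to_text(rows, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Convenience wrapper returning the grid as a string (hypothetical helper)."""
    buffer = io.StringIO()
    write_ascii_grid(buffer, rows, xllcorner, yllcorner, cellsize, nodata)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up 2x2 layer with one missing cell.
    print(grid_to_text([[1.5, None], [3.0, 4.25]], -70.0, -20.0, 0.5))
```

Writing to any file-like object keeps the sketch easy to test; in practice the same function could be pointed at an open `.asc` file.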

The following criteria were used for comparing results:

  1. Correlation (r) between the produced maps.
  2. Number of iterations.
  3. Proportion of matching best features selected after each iteration.
  4. Final loss.
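The four criteria above can be sketched as a small comparison routine. This is only an illustration under stated assumptions: each run is represented by a hypothetical dictionary holding the map values as a flat list (`map`), the best feature chosen at each iteration (`features`, so its length is the number of iterations) and the final loss (`loss`). Neither program produces such a structure directly; it would have to be built from the output maps and logs.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equally sized value lists."""
    n = len(a)
    if n != len(b) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def matching_feature_proportion(features_a, features_b):
    """Fraction of iterations at which both runs selected the same feature.

    Iterations are compared position by position; iterations present in
    only one run count as mismatches.
    """
    longest = max(len(features_a), len(features_b))
    if longest == 0:
        return 1.0
    matches = sum(1 for x, y in zip(features_a, features_b) if x == y)
    return matches / longest


def compare_runs(run_a, run_b):
    """Compute the four comparison criteria for two runs (hypothetical layout)."""
    return {
        "correlation": pearson_r(run_a["map"], run_b["map"]),
        "iteration_difference": abs(len(run_a["features"]) - len(run_b["features"])),
        "matching_features": matching_feature_proportion(
            run_a["features"], run_b["features"]
        ),
        "loss_difference": abs(run_a["loss"] - run_b["loss"]),
    }
```

Comparing features position by position is a design choice of this sketch; it penalises any shift in the selection order, which is the stricter reading of "matching best features selected after each iteration".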

The next step was to change OM-MAXENT so that both programs produced a similar log (in practice, OM-MAXENT had to produce a more verbose log to help identify potential issues).

The Maxent version used in this work was 3.3.3e, and the Java VM version was 1.7.0.03.

Compatibility was achieved in three stages, referred to as levels 1 to 3 in the Results.

The following differences were in principle considered acceptable for this work: minimum correlation of 0.98, maximum difference of 3 iterations, at least 90% matching features, and a maximum difference of 0.01 in the final loss.
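The acceptance thresholds listed above can be expressed as a simple check. The sketch below is illustrative only: the `Tolerances` class and the `failed_criteria` function are hypothetical names, and the inputs are assumed to have been computed beforehand from the maps and logs of the two programs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerances:
    """Acceptable differences, with defaults taken from the text above."""
    min_correlation: float = 0.98
    max_iteration_difference: int = 3
    min_matching_features: float = 0.90  # proportion, i.e. 90%
    max_loss_difference: float = 0.01


def failed_criteria(correlation, iterations_a, iterations_b,
                    matching_features, loss_difference, tol=Tolerances()):
    """Return the names of the criteria that fall outside the tolerances.

    An empty list means the two runs are considered compatible.
    """
    failures = []
    if correlation < tol.min_correlation:
        failures.append("correlation")
    if abs(iterations_a - iterations_b) > tol.max_iteration_difference:
        failures.append("iterations")
    if matching_features < tol.min_matching_features:
        failures.append("matching features")
    if abs(loss_difference) > tol.max_loss_difference:
        failures.append("final loss")
    return failures
```

For example, `failed_criteria(0.9999983, 60, 60, 1.0, 0.0)` returns an empty list, since every value lies within the tolerances.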

Results

Level 1 compatibility was achieved in revision #5468 (openModeller Subversion repository).

Level 2 compatibility was achieved in revision #5506 (openModeller Subversion repository).

Level 3 compatibility was achieved in revision #5519 (openModeller Subversion repository).

Additional adjustments were made after that, so the latest revision recommended for tests is #5529.

Autofeatures OFF

| Parameters | Number of iterations (Maxent) | Number of iterations (OM-MAXENT) | Proportion of matching best features | Difference in final loss | Correlation between the maps | Observations |
|---|---|---|---|---|---|---|
| linear | 60 | 60 | 98.33% | 0.0 | 1 | Only the last two best features didn't match, due to differences in the 15th decimal place of the delta loss bound. |
| linear, quadratic | 201 | 201 | 100% | 0.0 | 1 | |
| linear, product | 60 | 60 | 100% | 0.0 | 0.9999983 | |
| linear, quadratic, product | 180 | 180 | 100% | 0.0 | 1 | |
| linear, threshold | 161 | 161 | 100% | 0.0 | 1 | |
| linear, quadratic, threshold | 420 | 420 | 100% | 0.0 | 1 | |
| linear, product, threshold | 141 | 141 | 100% | 0.0 | 1 | |
| linear, quadratic, product, threshold | 161 | 161 | 100% | 0.0 | 1 | |
| linear, hinge | 420 | 420 | 100% | 0.0 | 1 | |
| linear, quadratic, hinge | 441 | 441 | 100% | 0.0 | 1 | |
| linear, product, hinge | 341 | 341 | 100% | 0.0 | 1 | |
| linear, threshold, hinge | 420 | 420 | 100% | 0.0 | 1 | |
| linear, quadratic, threshold, hinge | 500 | 500 | 100% | 0.0 | 1 | |
| linear, product, threshold, hinge | 341 | 341 | 100% | 0.0 | 1 | |
| linear, quadratic, product, hinge | 341 | 341 | 100% | 0.0 | 1 | |
| linear, quadratic, product, threshold, hinge | 341 | 341 | 100% | 0.0 | 1 | |
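
The observation recorded for the linear-only run with autofeatures off (a mismatch caused by differences in the 15th decimal place of the delta loss bound) reflects a general property of floating-point arithmetic. The sketch below is not taken from either implementation: the feature names and values are made up, and it only shows how summing the same terms in a different order can change which candidate has the smallest value.

```python
def best_feature(bounds):
    """Return the name with the smallest bound; ties go to the first entry."""
    return min(bounds, key=lambda item: item[1])[0]


# The same three terms summed in two different orders give results that
# differ only in the last bits of the double-precision representation.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)


def select(feature_b_total):
    """Pick the best of two hypothetical candidates with nearly equal bounds."""
    candidates = [
        ("feature_a", -0.6),
        ("feature_b", -feature_b_total),
    ]
    return best_feature(candidates)


if __name__ == "__main__":
    print(repr(left_to_right), repr(right_to_left))
    print(select(left_to_right), select(right_to_left))
```

With one summation order the second candidate is marginally smaller and is selected; with the other order the two bounds are exactly equal and the first candidate wins the tie. A difference of this size has no visible effect on the final map, which is consistent with a correlation of 1 despite a matching proportion below 100%.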

Autofeatures ON

| Parameters | Number of iterations (Maxent) | Number of iterations (OM-MAXENT) | Proportion of matching best features | Difference in final loss | Correlation between the maps | Observations |
|---|---|---|---|---|---|---|
| linear | 60 | 60 | 98.33% | 0.0 | 1 | |
| linear, quadratic | 201 | 201 | 100% | 0.0 | 1 | |
| linear, product | 60 | 60 | 98.33% | 0.0 | 1 | |
| linear, quadratic, product | 201 | 201 | 100% | 0.0 | 1 | |
| linear, threshold | 60 | 60 | 98.33% | 0.0 | 1 | |
| linear, quadratic, threshold | 201 | 201 | 100% | 0.0 | 1 | |
| linear, product, threshold | 60 | 60 | 98.33% | 0.0 | 1 | |
| linear, quadratic, product, threshold | 201 | 201 | 100% | 0.0 | 1 | |
| linear, hinge | 420 | 420 | 100% | 0.0 | 1 | |
| linear, quadratic, hinge | 441 | 441 | 100% | 0.0 | 1 | |
| linear, product, hinge | 420 | 420 | 100% | 0.0 | 1 | |
| linear, threshold, hinge | 420 | 420 | 100% | 0.0 | 1 | |
| linear, quadratic, threshold, hinge | 441 | 441 | 100% | 0.0 | 1 | |
| linear, product, threshold, hinge | 420 | 420 | 100% | 0.0 | 1 | |
| linear, quadratic, product, hinge | 441 | 441 | 100% | 0.0 | 1 | |
| linear, quadratic, product, threshold, hinge | 341 | 341 | 100% | 0.0 | 1 | |

The same tests were performed with different numbers of input points, as this can influence results, especially when autofeatures is activated. The following numbers of points were used in the extra tests: 10, 13, 15, 30, 60, 80 and 90. All tests resulted in correlations greater than 0.99, more than 90% matching features and differences of no more than 2 iterations. These extra tests were performed with Java VM 1.5, under which there is usually a greater difference in the number of iterations and matching features.
