# Discrete Model Genetic Algorithm

The discrete model genetic algorithm is designed for arbitrary systems containing pauci-disperse solutions. It is used as a very robust nonlinear least squares fitting algorithm that is usable even for highly nonlinear problems and guarantees convergence without the typical problems of local minima associated with traditional gradient based nonlinear least squares approaches. The following systems are examples that can be modeled with this method:

### Model Types

• reversible associations
• mixed reversible associations including non-interacting contaminants
• molecules with constant vbar in a co-sedimenting solvent that forms a density and/or viscosity gradient
• compressible solvents
• concentration dependent non-ideality in the sedimentation and diffusion coefficients
• multiple discrete species with different vbar

## Discrete GA overview

The following is a description of the application of discrete GA in the backend US_MPI_Analysis program on HPC resources.

• The high-level discrete GA (discrGA) flow is very similar to normal GA (normGA).
• A set of genes is constructed (100 by default) in each of a set of populations (demes).
• Each gene is a definition of solutes that can be easily formed into a model.
• The fitness of a gene is determined by constructing the implied model, creating a simulation from that model, and computing the fitness by finding the variation (square of RMSD) from comparing the simulation to experimental data.
• At the end of each generation, genes are ordered by fitness and the most fit are communicated to a master that orders the best genes from all demes.
• After the final generation, the most fit gene is used to construct the model that is the output of GA or of a GA-MC iteration.
• Mutations of genes proceed according to analysis parameters that set percentages of mutation types and other controls, such as:
• fraction of genes that self-mutate;
• fraction of genes that cross-mutate;
• fraction of genes that succumb to plague (die, replaced by new genes);
• fraction of genes that are elite (best fits that remain unchanged);
• fraction of genes that emigrate and immigrate (move to other demes).
• Random numbers control mutation actions, such as which gene parts mutate and by how much.
• Discrete GA is initialized by reading a constraints model file that defines a base set of solutes; and defines which attributes may vary and by how much.
• Where normGA defines genes to have as many solutes as there are buckets and mutates by varying a randomly chosen solute by random amounts across X and/or Y; discrGA defines genes to have a set number of components (usually small, like 3) and mutates by varying randomly chosen combinations of floatable attributes within the components and associations.

## The mechanics of discrGA include the following actions

### Initialization

• At job parameter parsing time, a constraints model XML file is read.
• A library routine constructs a constraints object from the file.
• Each initial gene is constructed by starting with the implied base model and applying random variations along the ranges of all float attributes.

### Self-mutation

• A gene to mutate is randomly chosen.
• A random number from 0.0 to 1.0 is generated.
• That number is mapped to a number that specifies which of all possible combinations of floatable attributes may vary.
• Each of the implied currently selected floatable attributes has its value varied by a randomly chosen fraction of its range.
• A mutated gene is thus constructed.

### Cross-mutation

• A gene to mutate and another gene are randomly chosen.
• A combination of attributes to vary is randomly chosen as with self-mutation.
• The values of those attributes are taken from the first gene and all other attributes taken from the second gene to form a new cross-mutated gene.

### Plague, Elite, Immigration

• These function are performed in the same way as normGA.

### Fitness

• The current work model associated with each gene is converted to a full model with unspecified attribute values computed.
• The full model is used with ASTFEM to construct a simulation.
• As with normGA, the sum of difference squares between simulation and experiment data (variation) is computed and that variation value is the fitness value.
• The smallest fitness values are the best.
• The completion of each generation results in a list of genes sorted into best-to-worst fitness order.

In most other ways, discrGA proceeds like normGA.

## Details of float attribute variations

• For normGA, any mutation will involve one of three variations.
• A solute will correspond to a position in a bucket that has an X and a Y range.
• It may mutate by changing X, Y, or both X and Y.
• For discrGA, the mutation variations are more complex.
• If N is the number of float attributes across all components and associations; and
• M is the number of possible attribute variations; then
• M = (2 N) - 1 .
• For example, if there are 3 float attributes -- call them A, B, C -- then
• the number of possible attribute value changes in a mutation is 7 [ (2 3) - 1 ].
• There can be changes to A, B, C, A+B, A+C, B+C, or A+B+C.
• The initial random number governing value change variations is in the range [0.0, 1.0) .
• The value found can be mapped to a range, in our example, of [1.0, 8.0).
• That in turn can be fixed to be one of 1, 2, 3, 4, 5, 6, 7.
• Represented in binary, the choices are 001, 010, 011, 100, 101, 110, 111.
• The binary positions can then be associated with float attributes, so our variation flag is "CBA".
• 001 corresponds to A; 010 to B; 011 to A+B; and so on.
• Like with normGA, discrGA mutation
• chooses the new attribute value using a Gaussian distribution about the current value;
• and gradually biases the choice towards the current value in the later generations.
• For discrGA, some float attributes (MW, kD, k_off) can interpret ranges on a logarithmic scale.