Parametrically Constrained Spectrum Analysis

This document explores a proposed extension of the Genetic Algorithm (GA) to handle the case where multiple parameters may be varied in determining the models (genes) to evaluate.

In the present use of GA, genes are constructed by varying sedimentation coefficient (s) and frictional ratio (k) values within buckets. The mutations of genes involve direct random variations of s and/or k.

In what we may call the Multiple Parameter Genetic Algorithm (MPGA), it is assumed that there exists a function k = f(s), where f depends on multiple parameters (say A, B, C, D). The random variations in this case are applied to the values of A through D, so s and k are varied only indirectly.

The high-level form of MPGA is the same as that of GA. That is, a population of genes (a deme) undergoes mutations. After each round of mutations (a generation), each gene is assigned a fitness value: the variance of the residual between the experimental data and a simulation of the gene's model, as determined by astfem_rsa/NNLS. Ranking the genes by fitness allows a gradual convergence towards the best models.
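The generational loop described above can be sketched as follows. This is a minimal illustration, not the UltraScan implementation: the `evolve` function, the keep-the-best selection rule, and the toy fitness and mutation functions are all assumptions made for the example. In particular, `toy_fitness` merely stands in for the astfem_rsa/NNLS variance, with lower values treated as better.

```python
import random


def evolve(deme, fitness, mutate, generations, rng):
    """Run a number of generations on one deme; return it ranked, best first.

    Assumption for this sketch: after each generation the parents and their
    mutated copies are ranked together and the best len(deme) genes survive.
    """
    size = len(deme)
    for _ in range(generations):
        # One round of mutations: every gene produces one mutated copy.
        candidates = deme + [mutate(gene, rng) for gene in deme]
        # Rank by fitness; a lower residual variance means a better model.
        candidates.sort(key=fitness)
        deme = candidates[:size]
    return deme


def toy_fitness(gene):
    """Hypothetical stand-in for the astfem_rsa/NNLS residual variance."""
    return (gene - 3.0) ** 2


def toy_mutate(gene, rng):
    """Hypothetical mutation: a small random perturbation of a scalar gene."""
    return gene + rng.gauss(0.0, 0.1)


if __name__ == "__main__":
    rng = random.Random(0)
    start = [rng.uniform(0.0, 10.0) for _ in range(20)]
    ranked = evolve(start, toy_fitness, toy_mutate, 30, rng)
    print("best gene:", ranked[0], "fitness:", toy_fitness(ranked[0]))
```

Because the parents compete with their mutated copies in this sketch, the best fitness in the deme can never get worse from one generation to the next.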

MPGA differs from GA at the lower level, in the specifics of mutation. In the MPGA case, s and k are mutated indirectly: the multiple parameters (e.g., A through D) are varied randomly, and a specific function then determines the resulting changes to s and k. Where GA buckets are 2-dimensional (ranges of s and k), MPGA buckets are multi-dimensional (a range for each of the parameters A through D).

One example ( c(s) )

One example of MPGA is the cofdistro algorithm in UltraScan II. In this case, the parameters are par_a, par_b, and par_c. The function using these parameters is

k = par_a + par_b/(1 + par_c * s)

The variations in par_a, par_b, and par_c depend on a "shape_distro" setting, which takes one of the following values:

  • 1 - aggregating fibrils
  • 2 - aggregating clathrin
  • 3 - large aggregates are globular
  • 4 - slowly increasing shape

The s value varies linearly from s-min to s-max by an increment dependent on a "resolution" parameter.
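As an illustration of the function and the s grid described above, here is a small sketch. The function names are made up for this example, and the exact increment is an assumption: the grid below places "resolution" evenly spaced points on the range, including both s-min and s-max.

```python
def s_grid(s_min, s_max, resolution):
    """Return `resolution` evenly spaced s values from s_min to s_max.

    Assumption: both end points are included, so the increment is
    (s_max - s_min) / (resolution - 1).
    """
    if resolution < 2:
        raise ValueError("resolution must be at least 2")
    step = (s_max - s_min) / (resolution - 1)
    return [s_min + i * step for i in range(resolution)]


def frictional_ratio(s, par_a, par_b, par_c):
    """Evaluate k = par_a + par_b / (1 + par_c * s)."""
    return par_a + par_b / (1.0 + par_c * s)


def curve(s_min, s_max, resolution, par_a, par_b, par_c):
    """Return the (s, k) points on the k = f(s) curve for one parameter set."""
    return [(s, frictional_ratio(s, par_a, par_b, par_c))
            for s in s_grid(s_min, s_max, resolution)]


if __name__ == "__main__":
    # Hypothetical values chosen only to show the shape of the output.
    for s, k in curve(1.0, 10.0, 10, par_a=1.0, par_b=2.0, par_c=1.0):
        print(f"s = {s:5.2f}   k = {k:.4f}")
```

With positive par_b and par_c, as in the usage values here, k decreases from par_a + par_b towards par_a as s grows.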

c(s) Mutations

In the classic GA case, a deme consists of a set number of genes (default 100). Each gene in that population has as many solutes as there are buckets. A mutation of a single gene involves random selection of buckets (solutes) and random variations of s and/or k within the solute.
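A classic GA mutation of this kind might look like the sketch below. The `Bucket` type, the function names, and the choice to redraw a value uniformly within its bucket are assumptions for illustration; the actual mutation rule may perturb values differently.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Bucket:
    """A 2-dimensional bucket: a range of s and a range of k."""
    s_lo: float
    s_hi: float
    k_lo: float
    k_hi: float


def random_gene(buckets, rng):
    """Build a gene with one (s, k) solute per bucket."""
    return [(rng.uniform(b.s_lo, b.s_hi), rng.uniform(b.k_lo, b.k_hi))
            for b in buckets]


def mutate_gene(gene, buckets, rng):
    """Return a copy of `gene` with one randomly selected solute mutated.

    Assumption: the mutation redraws s, k, or both uniformly within the
    selected bucket.
    """
    i = rng.randrange(len(buckets))
    bucket = buckets[i]
    s, k = gene[i]
    what = rng.choice(("s", "k", "both"))
    if what in ("s", "both"):
        s = rng.uniform(bucket.s_lo, bucket.s_hi)
    if what in ("k", "both"):
        k = rng.uniform(bucket.k_lo, bucket.k_hi)
    mutated = list(gene)
    mutated[i] = (s, k)
    return mutated


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical buckets, for illustration only.
    buckets = [Bucket(1.0, 2.0, 1.0, 2.0), Bucket(3.0, 5.0, 1.2, 1.8)]
    gene = random_gene(buckets, rng)
    print(gene)
    print(mutate_gene(gene, buckets, rng))
```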

In the c(s) type of MPGA, there is also a set number of genes in a deme. For each gene, initial values of par_a, par_b, and par_c are selected at random. The number of solutes in a gene is equal to the number of points on the k=f(s) curve; that is, equal to the "resolution" setting. A mutation of a gene involves random variation of par_a and/or par_b and/or par_c and therefore affects every solute of the gene. The range of s values and the number of resolution points are fixed for all genes of all demes. The range of k values depends on f(s), but is limited to reasonable values (e.g., k-min>=1.0).
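The c(s) gene handling described above can be sketched as follows. The parameter ranges in `PARAM_RANGES`, the function names, and the use of clamping to enforce the lower limit on k are assumptions for this example. The ranges are chosen with non-negative par_c so that, for positive s, the denominator 1 + par_c * s is never zero.

```python
import random

K_MIN = 1.0  # lower limit on k, following the k-min>=1.0 example in the text

# Hypothetical parameter ranges; these play the role of the
# multi-dimensional MPGA bucket.
PARAM_RANGES = {
    "par_a": (1.0, 2.0),
    "par_b": (0.0, 3.0),
    "par_c": (0.0, 1.0),
}


def random_params(rng):
    """Initial random selection of par_a, par_b, par_c for one gene."""
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in PARAM_RANGES.items()}


def mutate_params(params, rng):
    """Redraw one, two, or all three parameters within their ranges."""
    names = list(PARAM_RANGES)
    chosen = rng.sample(names, rng.randint(1, len(names)))
    mutated = dict(params)
    for name in chosen:
        lo, hi = PARAM_RANGES[name]
        mutated[name] = rng.uniform(lo, hi)
    return mutated


def solutes(params, s_values):
    """Derive the gene's solutes: one (s, k) point per fixed s value.

    Assumption: k values below K_MIN are clamped to K_MIN.
    """
    result = []
    for s in s_values:
        k = params["par_a"] + params["par_b"] / (1.0 + params["par_c"] * s)
        result.append((s, max(k, K_MIN)))
    return result


if __name__ == "__main__":
    rng = random.Random(2)
    s_values = [1.0, 2.0, 3.0, 4.0, 5.0]  # fixed for all genes of all demes
    gene = random_params(rng)
    print(solutes(gene, s_values))
    print(solutes(mutate_params(gene, rng), s_values))
```

Note that a single parameter change moves every solute of the gene along a new k=f(s) curve, while the s values themselves stay fixed.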

The final output of such a c(s) run is the single model with the best fitness value. It would consist of solutes whose s and k values all fall on a single k=f(s) curve.

Last modified on Jul 8, 2014 11:14:17 PM