Opened 7 years ago

# PCSA Tikhonov Regularization

Reported by: Owned by: emre gegorbet major future ultrascan3 review

NNLS solves min |Ax-b|_2
For Tikhonov Regularization (TR)
you additionally want to minimize |x|_2
Add a variable regularization factor "alpha"
After normal setup for NNLS
The matrix A will have the simulations in each column and b will have the experimental data.
Identify the simulations with numbers, 1,2,...,c
create the c by c identity matrix multiplied by alpha and place below A, creating an augmented matrix.
augment b with c zeros

e.g.

A = [ sim1, sim2, ... , simc ] where each sim is a column vector
A_augmented =
[

sim1, sim2, ..., simc
alpha, 0, ..., 0
0, alpha, 0, ..., 0
...
0, 0, ..., alpha

]

b_augmented =
[

b
0
0
0
...
0

]

Place this system into NNLS
(this will not work with subgrids* [we could later, but this will require some careful work], you must solve the one big NNLS system,
that's why I recommend for PCSA right now only).

Once you get the results,
you will want to vary alpha
and produce the parametric (on alpha) plot of
|x|=x_02 + x_12 + ... + x_c2 vs rmsd2.
This will often give an L shaped curve, the sharpness of which will vary from problem-to-problem. The idea is to minimize the rmsd while simultaneously minimizing the norm of x.
The regularization value at the elbow of the L is called the L curve criterion and is frequently chosen as the optimal value of alpha.
This second bit of determining alpha should be available as an option and particularly as we are getting a handle on good alphas.
It may turn out that alpha falls in a good range, but I feel this will likely be somewhat experiment dependent.
Note: alpha=0 is like doing no regularization.
alpha = 1 is probably a reasonable maximum.
I would suggest starting with alpha = 0, .1, .2, ..., 1 and plot th e parametric curve described above.

Let me know if you have any questions.
-e.

### comment:1 Changed 7 years ago by emre

• Description modified (diff)
• Owner changed from dzollars to gegorbet

### comment:2 follow-up: ↓ 3 Changed 7 years ago by emre

• Description modified (diff)

### comment:3 in reply to: ↑ 2 ; follow-up: ↓ 5 Changed 7 years ago by gegorbet

The method for TR with one specific alpha is understood. The initial questions are about how this fits into PCSA. Currently, PCSA first computes and orders results for a fixed set of PC variations. Next, it uses Levenberg-Marquardt to refine the best result from stage one. Would the TR method be a stage three? Would the B matrix be constructed from the final stage-two (LM) result?

As a test of the basic NNLS-TR coding, can it be assumed that any arbitrary alpha (say, 0.4) would give better results than non-regularization (alpha=0)?

### comment:4 Changed 7 years ago by gegorbet

• Status changed from new to assigned

### comment:5 in reply to: ↑ 3 ; follow-up: ↓ 6 Changed 7 years ago by emre

I initially envisioned the usage of TR uniformly (i.e. at every NNLS stage). This could be optional and then you run an additional final NNLS with TR, but I think the best solution would be to do it at every NNLS instance and leave the rest of the code alone.
This because you will get different RMSDs at every NNLS with TR on, and the minimum may move. This is not certain, and there may be
some computational efficiency to postpone TR until the last instance (as a 3rd stage), but it seems possible you may get a different regularized solution.

No, the RMSD will go UP with the additional constraint on the magnitude of NNLS result vector x. What is gained is smoothness
in the vector x, the sharp peaks will be penalized and this is
the effect of TR.

Would the B matrix be constructed from the final stage-two (LM) result?

b is a vector (containing the experimental data augmented with zeros when TR is on).
A is a matrix (containing the simulations augmented with the identity matrix multiplied by alpha when TR is on).
To do a stage three computation (only done when TR is not used for NNLS during stage one and two), you would take the standard b vector, augment with zeros, take best last PCSA A matrix, augument with the identity time alpha and run NNLS again. Your resulting x vector of concentrations would be different.

### comment:6 in reply to: ↑ 5 Changed 7 years ago by gegorbet

I initially envisioned the usage of TR uniformly (i.e. at every NNLS stage). This could be optional and then you run an additional final NNLS with TR, but I think the best solution would be to do it at every NNLS instance and leave the rest of the code alone.
This because you will get different RMSDs at every NNLS with TR on, and the minimum may move. This is not certain, and there may be
some computational efficiency to postpone TR until the last instance (as a 3rd stage), but it seems possible you may get a different regularized solution.

Wouldn't that mean that you would eventually have to construct an outer loop of alpha values and the whole process get repeated for each one? This is going to take a long time. Also: won't we have to change the external routine for returning "fitness" for L-M? Replace low-RMSD with low-X-Norm+low-RMSD?

No, the RMSD will go UP with the additional constraint on the magnitude of NNLS result vector x. What is gained is smoothness
in the vector x, the sharp peaks will be penalized and this is
the effect of TR.

Wow! I don't quite understand this. Minimizing RMSD is no longer the goal? A solution which has a larger RMSD might be regarded as a "better" one? If you actually have just two macro-molecules of interest, you don't want to see sharp peaks representing them?

Would the B matrix be constructed from the final stage-two (LM) result?

b is a vector (containing the experimental data augmented with zeros when TR is on).
A is a matrix (containing the simulations augmented with the identity matrix multiplied by alpha when TR is on).
To do a stage three computation (only done when TR is not used for NNLS during stage one and two), you would take the standard b vector, augment with zeros, take best last PCSA A matrix, augument with the identity time alpha and run NNLS again. Your resulting x vector of concentrations would be different.

I, or course, meant "A". That is, I meant to ask "Would the A matrix be constructed from stage-two result?". And you have answered my question.

### comment:7 Changed 7 years ago by gegorbet

Tikhonov regularization has been implemented in PCSA. It has just been committed. It needs a bit more use and independent verification of its correctness and utility.