Calculate gradient of the spectral norm analytically
Given a matrix $F \in \mathbb{C}^{m \times n}$ with $m>n$, another (non-symmetric) matrix $A$ of size $n \times n$, and the spectral norm
$$\|A-F^*\operatorname{diag}(b)F\|_2 = \sigma_{\max}(A-F^*\operatorname{diag}(b)F) = \sqrt{\lambda_{\max}\left( (A-F^*\operatorname{diag}(b)F)^* (A-F^*\operatorname{diag}(b)F) \right)},$$
how do I compute $\nabla_b \|A-F^*\operatorname{diag}(b)F\|_2$ analytically, where $b \in \mathbb{C}^{m \times 1}$ is a vector and $*$ denotes the conjugate transpose?
I need the gradient because I want to find $b$ by minimizing $\|A-F^*\operatorname{diag}(b)F\|_2$ with gradient descent. Is that possible?
linear-algebra optimization matrix-calculus gradient-descent spectral-norm
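For a concrete feel of the objective, here is a small numerical sketch (sizes and random data are made up purely for illustration) confirming that the spectral norm of the residual matrix equals the square root of the largest eigenvalue of $M^*M$:

```python
import numpy as np

# Made-up sizes and random data, purely for illustration.
m, n = 6, 4
rng = np.random.default_rng(0)
F = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = rng.standard_normal(m) + 1j * rng.standard_normal(m)

M = A - F.conj().T @ np.diag(b) @ F                       # matrix inside the norm
spec = np.linalg.norm(M, 2)                               # largest singular value
eig = np.sqrt(np.linalg.eigvalsh(M.conj().T @ M).max())   # sqrt(lambda_max(M* M))
print(np.isclose(spec, eig))  # True
```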
If you only care about the result and your problem is not too large, use a Monte Carlo based method. These don't converge as fast, but they don't require you to know the gradient; see, for example, the Metropolis-Hastings algorithm. The idea is to evaluate the objective at a random point not too far from the current one, and move to it with a certain probability that is larger if the new value is better than the previous one.
– Aleksejs Fomins
Jan 17 at 14:08
@AleksejsFomins Is there an analytic way to do this?
– abina shr
Jan 17 at 14:13
To do what? Metropolis-Hastings, no: it is a numerical technique. You plug in a metric you want to minimize and get back a vector that minimizes it. Gradient descent, which you expressed the intent of using, is not analytical either; it is a numerical technique for finding a minimum or maximum. It is a little faster than MH, but it only works reliably if your problem has a single minimum/maximum. Perhaps you are asking whether it is possible to find the gradient analytically; I don't know. My point is that maybe you don't even need it.
– Aleksejs Fomins
Jan 17 at 14:53
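The derivative-free approach suggested in these comments can be sketched as follows. This is only an illustration with made-up problem sizes and tuning constants (`T`, `step`, iteration count are arbitrary); Metropolis-Hastings is normally stated for sampling, and here it is reduced to a simple accept/reject rule for minimization:

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4  # made-up sizes for illustration
F = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

def objective(b):
    return np.linalg.norm(A - F.conj().T @ np.diag(b) @ F, 2)

# Metropolis-style random walk: propose a nearby point; always accept an
# improvement, accept a worse point with probability exp(-increase / T).
b = np.zeros(m, dtype=complex)
cur = best = objective(b)
T, step = 0.1, 0.05  # hypothetical temperature and step size
for _ in range(2000):
    prop = b + step * (rng.standard_normal(m) + 1j * rng.standard_normal(m))
    f = objective(prop)
    if rng.random() < np.exp(min(0.0, (cur - f) / T)):
        b, cur = prop, f
        best = min(best, cur)
print(best)  # best objective value seen; at or below the starting value
```

No gradient is needed, which is the point of the comment; the price is slower convergence and hand-tuned proposal parameters.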
edited Jan 17 at 14:10
abina shr
asked Jan 17 at 13:09
1 Answer
Let's use $\big\{F^T,\,F^C,\,F^H=(F^C)^T\big\}$ to denote the $\big\{$transpose, complex, Hermitian$\big\}$ conjugates of $F$, respectively.
Let's also use the Frobenius product $(:)$ notation instead of the trace function, i.e.
$$A:B = {\rm Tr}(A^TB)$$
[NB: The use of $A^T$ (rather than $A^H$) on the RHS is deliberate.]
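As a quick sanity check of this notation (a sketch with random illustrative data): with the plain transpose, the Frobenius product is just the sum of elementwise products, even for complex matrices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

frob = np.trace(A.T @ B)       # A : B, with a plain (not conjugate) transpose
elem = (A * B).sum()           # equivalently, the sum of elementwise products
print(np.isclose(frob, elem))  # True
```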
Define the variables
$$\eqalign{
B &= {\rm Diag}(b) \cr
X &= F^HBF-A \cr
}$$
Given the SVD of $X$
$$\eqalign{
X &= USV^H \cr
U &= \big[\,u_1\,u_2\ldots u_n\,\big],\quad&u_k&\in{\mathbb C}^{n\times 1} \cr
S &= {\rm Diag}(\sigma_k),&S&\in{\mathbb R}^{n\times n} \cr
V &= \big[\,v_1\,v_2\ldots v_n\,\big],&v_k&\in{\mathbb C}^{n\times 1} \cr
}$$
where the $\sigma_k$ are ordered such that $\,\sigma_1>\sigma_2\ge\ldots\ge\sigma_n\ge 0$.
The gradient of the spectral norm $\phi = \|X\|_2$ can be written as
$$G = \frac{\partial\phi}{\partial X} = (u_1v_1^H)^C = u_1^Cv_1^T$$
To find the gradient with respect to the vector $b$, write the differential and perform a change of variables:
$$\eqalign{
d\phi &= G:dX \cr
&= G:F^H\,dB\,F \cr
&= F^CGF^T:dB \cr
&= F^CGF^T:{\rm Diag}(db) \cr
&= {\rm diag}\big(F^CGF^T\big):db \cr
\frac{\partial\phi}{\partial b}
&= {\rm diag}\big(F^CGF^T\big) \cr
&= {\rm diag}\big((Fu_1)^C(Fv_1)^T\big) \cr
}$$
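A numerical sanity check of the final formula can be sketched as follows (made-up sizes and random data). For a real coordinate perturbation $db = t\,e_k$, the differential above predicts $d\phi = {\rm Re}(g_k)\,t$ where $g = {\rm diag}\big((Fu_1)^C(Fv_1)^T\big)$, so the real part of the analytic gradient should match a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 6, 4  # made-up sizes for illustration
F = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = rng.standard_normal(m) + 1j * rng.standard_normal(m)

def phi(b):
    return np.linalg.norm(F.conj().T @ np.diag(b) @ F - A, 2)

# Analytic gradient g = diag((F u1)^C (F v1)^T) from the top singular
# pair of X = F^H Diag(b) F - A.
X = F.conj().T @ np.diag(b) @ F - A
U, s, Vh = np.linalg.svd(X)
u1, v1 = U[:, 0], Vh[0, :].conj()
g = (F @ u1).conj() * (F @ v1)   # diagonal of the rank-1 matrix (Fu1)^C (Fv1)^T

# Central finite difference along the real coordinate direction e_k.
t, k = 1e-6, 0
e = np.zeros(m)
e[k] = 1.0
fd = (phi(b + t * e) - phi(b - t * e)) / (2 * t)
print(np.isclose(fd, g[k].real, atol=1e-5))  # True
```

The check assumes the top singular value is simple (true almost surely for random data), since the formula is only a valid gradient where $\sigma_1 > \sigma_2$.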
Nice proof. I found this post useful in understanding one of the steps better.
– Aleksejs Fomins
Jan 18 at 17:12
@AleksejsFomins Looking at that post (especially python_enthusiast's answer) made me realize that I had mistakenly calculated the complex conjugate of $G$, which I have now corrected. Thanks.
– greg
Jan 18 at 22:26
edited Jan 18 at 22:08
answered Jan 17 at 18:29
greg