Calculate gradient of the spectral norm analytically


























Given a matrix $F \in \mathbb{C}^{m \times n}$ with $m>n$, a (non-symmetric) matrix $A$ of size $n \times n$, and the spectral norm



$$\|A-F^*\operatorname{diag}(b)F\|_2 = \sigma_{\max}(A-F^*\operatorname{diag}(b)F) = \sqrt{\lambda_{\max}\left( (A-F^*\operatorname{diag}(b)F)^* (A-F^*\operatorname{diag}(b)F) \right)},$$



how do I compute $\nabla_b \|A-F^*\operatorname{diag}(b)F\|_2$ analytically, where $b \in \mathbb{C}^{m \times 1}$ is a vector and $*$ denotes the conjugate transpose?



I need the gradient because I want to find $b$ by minimizing $\|A-F^*\operatorname{diag}(b)F\|_2$ via gradient descent. Is this possible?

































  • If you only care about the result and your problem is not too large, use a Monte Carlo based method. These don't converge as fast, but they don't require you to know the gradient. See, for example, the Metropolis-Hastings algorithm. The idea is to evaluate the objective at a random point not too far from the current one, and move to it with a probability that is larger when the new value is better than the old one.
    – Aleksejs Fomins
    Jan 17 at 14:08
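The random-walk scheme described in this comment can be sketched in a few lines of NumPy. This is only an illustration, not code from the thread; the function names (`objective`, `metropolis_minimize`) and all parameter defaults are made up for the example, and the best point seen is tracked so the result never degrades below the starting value.

```python
import numpy as np

rng = np.random.default_rng(1)

def objective(b, A, F):
    # spectral norm ||A - F* diag(b) F||_2 (largest singular value)
    return np.linalg.norm(A - F.conj().T @ np.diag(b) @ F, 2)

def metropolis_minimize(A, F, b0, steps=2000, scale=0.05, temp=0.01):
    """Random-walk Metropolis: propose a nearby point, always accept
    improvements, accept worse points with probability
    exp(-(f_new - f_old)/temp), and keep the best point ever seen."""
    b, f = b0, objective(b0, A, F)
    best_b, best_f = b, f
    for _ in range(steps):
        cand = b + scale * (rng.standard_normal(b.shape)
                            + 1j * rng.standard_normal(b.shape))
        fc = objective(cand, A, F)
        if fc < f or rng.random() < np.exp((f - fc) / temp):
            b, f = cand, fc
            if f < best_f:
                best_b, best_f = b, f
    return best_b, best_f
```

As the comment notes, this converges more slowly than a gradient method, but it needs only objective evaluations.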












  • @AleksejsFomins Is there an analytic way to do this?
    – abina shr
    Jan 17 at 14:13










  • To do what? Metropolis-Hastings, no: it is a numerical technique. You plug in a metric you want to minimize and you get back a vector that minimizes it. Gradient descent, which you have expressed the intent of using, is also not analytical; it is a numerical technique for finding a minimum or maximum. It is somewhat faster than MH, but it only works reliably if your problem has a single minimum/maximum. Perhaps you are asking whether the gradient itself can be found analytically; I don't know. What I am trying to say is that maybe you don't even need it.
    – Aleksejs Fomins
    Jan 17 at 14:53


















linear-algebra optimization matrix-calculus gradient-descent spectral-norm

edited Jan 17 at 14:10

asked Jan 17 at 13:09

abina shr

1 Answer































Let's use $\big\{F^T,\,F^C,\,F^H=(F^C)^T\big\}$ to denote the transpose, the complex conjugate, and the Hermitian (conjugate) transpose of $F$, respectively.



Let's also use the Frobenius product $(:)$ notation instead of the trace function, i.e.
$$A:B = {\rm Tr}(A^TB)$$
[NB: The use of $A^T$ (rather than $A^H$) on the RHS is deliberate.]



Define the variables
$$\eqalign{
B &= {\rm Diag}(b) \cr
X &= F^HBF-A \cr
}$$

Given the SVD of $X$
$$\eqalign{
X &= USV^H \cr
U &= \big[\,u_1\;u_2\ldots u_n\,\big],\,&u_k&\in{\mathbb C}^{n\times 1} \cr
S &= {\rm Diag}(\sigma_k),&S&\in{\mathbb R}^{n\times n} \cr
V &= \big[\,v_1\;v_2\ldots v_n\,\big],&v_k&\in{\mathbb C}^{n\times 1} \cr
}$$

where the $\sigma_k$ are ordered such that $\,\sigma_1>\sigma_2\ge\ldots\ge\sigma_n\ge 0$.



The gradient of the spectral norm $\phi = \|X\|_2$ can be written as
$$G = \frac{\partial\phi}{\partial X} = (u_1v_1^H)^C = u_1^Cv_1^T$$
To find the gradient with respect to the vector $b$, write the differential and perform a change of variables:
$$\eqalign{
d\phi &= G:dX \cr
 &= G:F^H\,dB\,F \cr
 &= F^CGF^T:dB \cr
 &= F^CGF^T:{\rm Diag}(db) \cr
 &= {\rm diag}\big(F^CGF^T\big):db \cr
\frac{\partial\phi}{\partial b}
 &= {\rm diag}\big(F^CGF^T\big) \cr
 &= {\rm diag}\big((Fu_1)^C(Fv_1)^T\big) \cr
}$$
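Not part of the original answer, but the final formula is easy to check numerically. The NumPy sketch below (variable names are illustrative) compares the analytic gradient $g = {\rm diag}\big((Fu_1)^C(Fv_1)^T\big)$ against finite differences of $\phi$, assuming $\sigma_1$ is a simple singular value: a real perturbation of $b_0$ should change $\phi$ at rate ${\rm Re}(g_0)$, and an imaginary one at rate $-{\rm Im}(g_0)$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
F = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = rng.standard_normal(m) + 1j * rng.standard_normal(m)

def phi(b):
    # phi = ||X||_2 with X = F^H Diag(b) F - A
    return np.linalg.norm(F.conj().T @ np.diag(b) @ F - A, 2)

# analytic gradient: dphi/db = diag((F u1)^C (F v1)^T)
X = F.conj().T @ np.diag(b) @ F - A
U, s, Vh = np.linalg.svd(X)        # singular values sorted descending
u1 = U[:, 0]                       # leading left singular vector
v1 = Vh[0, :].conj()               # leading right singular vector
g = (F @ u1).conj() * (F @ v1)

# finite-difference check on the first coordinate of b
eps = 1e-7
e0 = np.zeros(m)
e0[0] = 1.0
fd_re = (phi(b + eps * e0) - phi(b)) / eps        # ~ Re(g[0])
fd_im = (phi(b + 1j * eps * e0) - phi(b)) / eps   # ~ -Im(g[0])
```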






























  • Nice proof. I found this post useful in understanding one of the steps better.
    – Aleksejs Fomins
    Jan 18 at 17:12






  • @AleksejsFomins Looking at that post (especially python_enthusiast's answer) made me realize that I had mistakenly calculated the complex conjugate of $G$, which I have now corrected. Thanks.
    – greg
    Jan 18 at 22:26














edited Jan 18 at 22:08

answered Jan 17 at 18:29

greg