Calculate gradient of the spectral norm analytically


























Given a matrix $F \in \mathbb{C}^{m \times n}$ with $m>n$, a (non-symmetric) matrix $A$ of size $n \times n$, and the spectral norm



$$\|A-F^*\operatorname{diag}(b)F\|_2 = \sigma_{\max}(A-F^*\operatorname{diag}(b)F) = \sqrt{\lambda_{\max}\left( (A-F^*\operatorname{diag}(b)F)^* (A-F^*\operatorname{diag}(b)F) \right)},$$



how do I compute $\nabla_b \|A-F^*\operatorname{diag}(b)F\|_2$ analytically, where $b \in \mathbb{C}^{m \times 1}$ is a vector and $*$ denotes the conjugate transpose?



I need the gradient because I want to find $b$ by minimizing $\|A-F^*\operatorname{diag}(b)F\|_2$ via gradient descent. Is this possible?

































  • If you only care about the result and your problem is not too large, use a Monte Carlo based method. These don't converge as fast, but they don't require you to know the gradient. See, for example, the Metropolis-Hastings algorithm. The idea is to evaluate the objective at a random point not too far from the current one, and move to it with a probability that is larger when the new value is better than the old one.
    – Aleksejs Fomins
    Jan 17 at 14:08
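The random-walk scheme described in this comment can be sketched in a few lines of NumPy. This is only an illustration, not code from the thread; the function names (`objective`, `metropolis_minimize`) and all parameter defaults are made up for the example, and the best point seen is tracked so the result never degrades below the starting value.

```python
import numpy as np

rng = np.random.default_rng(1)

def objective(b, A, F):
    # spectral norm ||A - F* diag(b) F||_2 (largest singular value)
    return np.linalg.norm(A - F.conj().T @ np.diag(b) @ F, 2)

def metropolis_minimize(A, F, b0, steps=2000, scale=0.05, temp=0.01):
    """Random-walk Metropolis: propose a nearby point, always accept
    improvements, accept worse points with probability
    exp(-(f_new - f_old)/temp), and keep the best point ever seen."""
    b, f = b0, objective(b0, A, F)
    best_b, best_f = b, f
    for _ in range(steps):
        cand = b + scale * (rng.standard_normal(b.shape)
                            + 1j * rng.standard_normal(b.shape))
        fc = objective(cand, A, F)
        if fc < f or rng.random() < np.exp((f - fc) / temp):
            b, f = cand, fc
            if f < best_f:
                best_b, best_f = b, f
    return best_b, best_f
```

As the comment notes, this converges more slowly than a gradient method, but it needs only objective evaluations.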












  • @AleksejsFomins Is there an analytic way to do this?
    – abina shr
    Jan 17 at 14:13










  • To do what? Metropolis-Hastings, no: it is a numerical technique. You plug in a metric you want to minimize and you get back a vector that minimizes it. Gradient descent, which you have expressed the intent of using, is also not analytical; it is a numerical technique for finding a minimum or maximum. It is somewhat faster than MH, but it only works reliably if your problem has a single minimum/maximum. Perhaps you are asking whether the gradient itself can be found analytically; I don't know. What I am trying to say is that maybe you don't even need it.
    – Aleksejs Fomins
    Jan 17 at 14:53


















linear-algebra optimization matrix-calculus gradient-descent spectral-norm

edited Jan 17 at 14:10

asked Jan 17 at 13:09

abina shr

1 Answer































Let's use $\big\{F^T,\,F^C,\,F^H=(F^C)^T\big\}$ to denote the transpose, the complex conjugate, and the Hermitian (conjugate) transpose of $F$, respectively.



Let's also use the Frobenius product $(:)$ notation instead of the trace function, i.e.
$$A:B = {\rm Tr}(A^TB)$$
[NB: The use of $A^T$ (rather than $A^H$) on the RHS is deliberate.]



Define the variables
$$\eqalign{
B &= {\rm Diag}(b) \cr
X &= F^HBF-A \cr
}$$

Given the SVD of $X$
$$\eqalign{
X &= USV^H \cr
U &= \big[\,u_1\;u_2\ldots u_n\,\big],\,&u_k&\in{\mathbb C}^{n\times 1} \cr
S &= {\rm Diag}(\sigma_k),&S&\in{\mathbb R}^{n\times n} \cr
V &= \big[\,v_1\;v_2\ldots v_n\,\big],&v_k&\in{\mathbb C}^{n\times 1} \cr
}$$

where the $\sigma_k$ are ordered such that $\,\sigma_1>\sigma_2\ge\ldots\ge\sigma_n\ge 0$.



The gradient of the spectral norm $\phi = \|X\|_2$ can be written as
$$G = \frac{\partial\phi}{\partial X} = (u_1v_1^H)^C = u_1^Cv_1^T$$
To find the gradient with respect to the vector $b$, write the differential and perform a change of variables:
$$\eqalign{
d\phi &= G:dX \cr
 &= G:F^H\,dB\,F \cr
 &= F^CGF^T:dB \cr
 &= F^CGF^T:{\rm Diag}(db) \cr
 &= {\rm diag}\big(F^CGF^T\big):db \cr
\frac{\partial\phi}{\partial b}
 &= {\rm diag}\big(F^CGF^T\big) \cr
 &= {\rm diag}\big((Fu_1)^C(Fv_1)^T\big) \cr
}$$
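Not part of the original answer, but the final formula is easy to check numerically. The NumPy sketch below (variable names are illustrative) compares the analytic gradient $g = {\rm diag}\big((Fu_1)^C(Fv_1)^T\big)$ against finite differences of $\phi$, assuming $\sigma_1$ is a simple singular value: a real perturbation of $b_0$ should change $\phi$ at rate ${\rm Re}(g_0)$, and an imaginary one at rate $-{\rm Im}(g_0)$.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 6, 4
F = rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
b = rng.standard_normal(m) + 1j * rng.standard_normal(m)

def phi(b):
    # phi = ||X||_2 with X = F^H Diag(b) F - A
    return np.linalg.norm(F.conj().T @ np.diag(b) @ F - A, 2)

# analytic gradient: dphi/db = diag((F u1)^C (F v1)^T)
X = F.conj().T @ np.diag(b) @ F - A
U, s, Vh = np.linalg.svd(X)        # singular values sorted descending
u1 = U[:, 0]                       # leading left singular vector
v1 = Vh[0, :].conj()               # leading right singular vector
g = (F @ u1).conj() * (F @ v1)

# finite-difference check on the first coordinate of b
eps = 1e-7
e0 = np.zeros(m)
e0[0] = 1.0
fd_re = (phi(b + eps * e0) - phi(b)) / eps        # ~ Re(g[0])
fd_im = (phi(b + 1j * eps * e0) - phi(b)) / eps   # ~ -Im(g[0])
```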






























  • Nice proof. I found this post useful in understanding one of the steps better.
    – Aleksejs Fomins
    Jan 18 at 17:12






  • @AleksejsFomins Looking at that post (especially python_enthusiast's answer) made me realize that I had mistakenly calculated the complex conjugate of $G$, which I have now corrected. Thanks.
    – greg
    Jan 18 at 22:26














edited Jan 18 at 22:08

answered Jan 17 at 18:29

greg