Entropy Lower Bound in Terms of $\ell_2$ Norm
Define
$$
\begin{align}
H(p_1, \dots, p_n) &= \sum_{i=1}^n p_i\log(1/p_i)\\
&=\log n+\sum_{i=1}^n\sum_{k=2}^\infty (-1)^{k + 1}\, n^{k - 1}\, \frac{(p_i - 1/n)^k}{k (k - 1)},
\end{align}
$$
where $p_1,\dots,p_n\ge0$ sum to $1$.
Then we have the classic inequality
$H(p_1,p_2)\ge(\log2)\bigl(1-2\bigl((p_1-1/2)^2+(p_2-1/2)^2\bigr)\bigr)=(\log 2)\bigl(1-2\|p-1/2\|_2^2\bigr)$, and we might wonder if that could be extended to $n>2$, in particular with something like
$$\begin{align}
H(p_1,\dots,p_n)&\ge(\log n)\left(1-c_n\|p-1/n\|^2_2\right).
\end{align}$$
From experiments with $n=3$, it seems that
$c_n\ge\frac{2 n \log(n/2)}{(n-2) \log n}=2(1-O(1/\log n))$ suffices, but I don't have a proof of this. It is also slightly inconvenient that the right-hand side can go below $0$, which did not happen in the $n=2$ case.
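Concretely, the experiment amounts to something like the following sketch (natural logarithms; the Dirichlet sampling and the helper names are just illustrative choices, not necessarily the exact code used):

```python
import numpy as np

# Sample random points of the probability simplex and record the smallest gap
# H(p) - log(n)*(1 - c_n*||p - 1/n||_2^2), which should stay >= 0 if the conjecture holds.

def entropy(p):
    """Shannon entropy in nats, with the convention 0*log 0 = 0."""
    q = p[p > 0]
    return -np.sum(q * np.log(q))

def bound(p, c_n):
    n = len(p)
    return np.log(n) * (1 - c_n * np.sum((p - 1.0 / n) ** 2))

n = 3
c_n = 2 * n * np.log(n / 2) / ((n - 2) * np.log(n))  # the conjectured constant
rng = np.random.default_rng(0)
worst_gap = np.inf
for _ in range(100_000):
    p = rng.dirichlet(np.full(n, 0.3))  # small alpha biases samples toward the simplex boundary
    worst_gap = min(worst_gap, entropy(p) - bound(p, c_n))
print(f"smallest observed H(p) - bound(p): {worst_gap:.6f}")
```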
Bounding the terms individually, we can get $H(p_1,\dots,p_n)\ge-2+4\sum_{i=1}^n\frac{p_i}{1+p_i}$, which is non-negative but less directly related to the $\ell_2$ norm. We can also bound $H\ge n/4-\|p-1/2\|_2^2$, but somehow bounding centered at $1/n$ seems more natural.
Is there a well-known lower bound like this, relating $H(p)$ to $\|p\|_2$? Ideally, one that is asymptotically tight at $p_1=\dots=p_n=1/n$ and is always positive.
real-analysis inequality summation information-theory entropy
edited Dec 26 '18 at 13:09
asked Dec 20 '18 at 21:49
Thomas Ahle
Not really what you want, but you have the inequality $$\log n \geq H(p) \geq \log \frac{1}{\lVert p\rVert_2^2}$$ based on the standard relation between Rényi entropies.
– Clement C.
Dec 21 '18 at 1:27
4 Answers
Defining $p_i=1/n+q_i$ (so that $\sum_i q_i=0$) we get, using nats and the bound $\log(1+x)\le x$:
$$\begin{align}
H({\bf p}) &=-\sum p_i \log(p_i)\\
&=-\sum (1/n +q_i) \log(1/n +q_i)\\
&=-\sum ( 1/n +q_i) [\log(1/n ) + \log(1+ n q_i )]\\
&= \log(n) -\sum ( 1/n +q_i) \log(1+ n q_i)\\
&\ge \log(n) -\sum ( 1/n +q_i)\, n q_i\\
& = \log(n) - n\sum q_i^2\\
& = \log(n) - n \, \lVert{\bf p}- 1/n\rVert^2_2.
\end{align}$$
Or, if you prefer,
$$ H({\bf p}) \ge \log(n)\left(1 - \frac{n}{\log n}\sum q_i^2 \right). $$
Of course, the bound is useless if $\sum q_i^2\ge \frac{\log(n)}{n}$.
answered Dec 20 '18 at 23:11 by leonbloy (edited Dec 21 '18 at 10:45)
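A quick numerical sanity check of this bound, as a minimal sketch (natural logarithms; the uniform-Dirichlet sampling is just an illustrative choice):

```python
import numpy as np

# Check H(p) >= log(n) - n * ||p - 1/n||_2^2 on random distributions (natural logs).
rng = np.random.default_rng(1)
for n in (2, 3, 10, 100):
    for _ in range(10_000):
        p = rng.dirichlet(np.ones(n))        # entries are a.s. strictly positive
        H = -np.sum(p * np.log(p))
        lower = np.log(n) - n * np.sum((p - 1.0 / n) ** 2)
        assert H >= lower - 1e-12, (n, H, lower)
print("bound held on all sampled distributions")
```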
As a side note, $$\lVert \mathbf{p} - 1/n\rVert_2^2 = \lVert \mathbf{p}\rVert_2^2 - 1/n$$ (in case the OP insists on having $n \lVert \mathbf{p}\rVert_2^2 - 1$ instead of $n\lVert \mathbf{p} - 1/n\rVert_2^2$).
– Clement C.
Dec 21 '18 at 1:29
That's a nice derivation. I suppose I could try to use some stronger upper bounds on $\log(1+n q_i)$, but it doesn't seem like there is any hope of improving $n/\log n$ to something closer to $2$?
– Thomas Ahle
Dec 21 '18 at 11:28
It seems doubtful to me... I also wonder how you could conjecture a bound like $c_n\ge\frac{2 n \log(n/2)}{(n-2) \log n}=2(1-O(1/\log n))$ experimenting only with $n=3$...
– leonbloy
Dec 21 '18 at 11:33
@leonbloy I know, it's a bit tongue in cheek, but for $n=3$ I can plot it, and it seemed that the worst case was when all $p=0$ except for two with $p=1/2$, so I optimized for that case, and then checked numerically for larger $n$.
– Thomas Ahle
Dec 21 '18 at 13:08
You can also check that $c_n$ cannot be a constant by taking another extreme example, $\mathbf{p}$ uniform on a subset of $n/2$ elements (and $0$ elsewhere). Then $\lVert \mathbf{p} - 1/n\rVert_2^2 = 1/n$, so you want $$\log\frac{n}{2}\geq \log n\left(1 - \frac{c_n}{n}\right).$$
– Clement C.
Dec 21 '18 at 16:34
Let $(X,\upsilon)$ be a finite measure space. Let $\sigma=\frac{\upsilon}{V}$ be the uniform probability distribution on $X$ ($V=\upsilon(X)$). Let $\rho$ be an absolutely continuous probability distribution with density $p$. Then the inequality
$$\begin{align}-h(p)+\ln V&\leq V\lVert p-\tfrac{1}{V}\rVert_2^2\\
h(p)&\stackrel{\mathrm{def}}{=}-\int_X p(x)\ln p(x)\,\mathrm{d}\upsilon_{x}\\
\lVert p-q\rVert^2_2&\stackrel{\mathrm{def}}{=}\int_X\lvert p(x)-q(x)\rvert^2\,\mathrm{d}\upsilon_x
\end{align}$$
is exactly the inequality between the $\chi^2$-divergence and the KL divergence
$$\begin{align}D(\rho\parallel\sigma)&\leq \chi^2(\rho\parallel\sigma)\text{,}\\
D(\rho\parallel\sigma)&\stackrel{\text{def}}{=}\int_X\left(\frac{\mathrm{d}\rho}{\mathrm{d}\sigma}\ln\frac{\mathrm{d}\rho}{\mathrm{d}\sigma}-\frac{\mathrm{d}\rho}{\mathrm{d}\sigma}+1\right)\mathrm{d}\sigma_x\\
\chi^2(\rho\parallel\sigma)&\stackrel{\text{def}}{=}\int_X\left(\frac{\mathrm{d}\rho}{\mathrm{d}\sigma}-1\right)^2\mathrm{d}\sigma_x\text{;}
\end{align}$$
this inequality in turn follows from
$$t\ln t - t + 1 \leq (t-1)^2\text{.}$$
answered Dec 21 '18 at 1:30 by K B Dave
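In the discrete case ($X=\{1,\dots,n\}$ with counting measure, so $V=n$ and $\sigma$ uniform), the two sides reduce to $\log n-H(p)$ and $n\lVert p-1/n\rVert_2^2$; a minimal illustrative sketch of the check (natural logarithms):

```python
import numpy as np

# Discrete instance of the answer: with sigma uniform on n points,
# D(rho||sigma) = log(n) - H(p) and chi^2(rho||sigma) = n * ||p - 1/n||_2^2.
rng = np.random.default_rng(2)
for n in (2, 5, 50):
    for _ in range(10_000):
        p = rng.dirichlet(np.ones(n))
        kl = np.log(n) + np.sum(p * np.log(p))    # D(rho || sigma) = log(n) - H(p)
        chi2 = n * np.sum((p - 1.0 / n) ** 2)     # chi^2(rho || sigma)
        assert kl <= chi2 + 1e-12
print("KL <= chi^2 held on all samples")
```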
Doesn't then using $\mathrm{KL} \le \log(1 + \chi^2)$ lead to something stronger? Presumably one should get something like $H(P) \ge - \log \|P\|_2^2.$
– stochasticboy321
Dec 23 '18 at 0:37
@stochasticboy321 that would be the Rényi entropy inequality mentioned in a comment under the OP. It's not really stronger though, more like a different bound.
– Clement C.
Dec 23 '18 at 22:18
@ClementC. Oh, missed that. It is strictly an improvement though. One way to see this is by simply noting by the above that the bound corresponds to $\mathrm{KL}\le \chi^2,$ which is looser than $\mathrm{KL} \le \log(1+ \chi^2)$. Also, on noting that $( P -\mathbf{1}/n ) \perp \mathbf{1}/n$, one can rearrange this to $H(P) \ge \log n - \log(1 + \| nP - \mathbf{1}\|_2^2)$. This has the nice advantage of the base of the log not affecting the bound. Of course for $P \approx \mathbf{1}/n$ these are equivalent, and your tightness analysis stands.
– stochasticboy321
Dec 24 '18 at 18:46
@stochasticboy321 If you made that an answer, I'd upvote it.
– Clement C.
Dec 25 '18 at 0:04
To complement Leon Bloy's answer and show that the bound he (and K B Dave) obtained cannot be significantly improved upon (i.e., that $c_n = \Omega\!\left(\frac{n}{\log n}\right)$ is necessary): Fix $\varepsilon \in (0,1]$, and assume without loss of generality that $n=2m$ is even. Define $\mathbf{p}^{(\varepsilon)}$ as the probability mass function (over $\{1,\dots,n\}$) such that
$$
\mathbf{p}^{(\varepsilon)}_i = \begin{cases} \frac{1+\varepsilon}{n} & \text{ if } i \leq m, \\
\frac{1-\varepsilon}{n} & \text{ if } i > m.
\end{cases}
$$
Note that
$$\begin{align}
H(\mathbf{p}^{(\varepsilon)}) &= \sum_{i=1}^m \frac{1+\varepsilon}{n}\log \frac{n}{1+\varepsilon}
+ \sum_{i=m+1}^n \frac{1-\varepsilon}{n}\log \frac{n}{1-\varepsilon} \\
&= \log n - \frac{1}{2}\left((1+\varepsilon)\log(1+\varepsilon) + (1-\varepsilon)\log(1-\varepsilon)\right) \\
&= \log n - \frac{\varepsilon^2}{2} + o(\varepsilon^3) \tag{1}
\end{align}$$
while, writing $\mathbf{u}_n$ for the uniform distribution on $\{1,\dots,n\}$,
$$
\lVert \mathbf{p}^{(\varepsilon)} - \mathbf{u}_n\rVert_2^2 = \frac{\varepsilon^2}{n} \tag{2}
$$
so that
$$
H(\mathbf{p}^{(\varepsilon)}) = \log n \left(1 - \left(1/2+o_\varepsilon(1)\right)\cdot\frac{n}{\log n}\lVert \mathbf{p}^{(\varepsilon)} - \mathbf{u}_n\rVert_2^2\right). \tag{3}
$$
If you want to avoid the asymptotics as $\varepsilon \to 0$, you can still fix $\varepsilon = 1$ (for instance) and get that $$H(\mathbf{p}^{(\varepsilon)}) = \log n \left(1 - c'_\varepsilon\frac{n}{\log n}\lVert \mathbf{p}^{(\varepsilon)} - \mathbf{u}_n\rVert_2^2\right)$$ for some constant $c'_\varepsilon > 0$.
answered Dec 21 '18 at 18:17 by Clement C.
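The effective constant in $(3)$ can also be computed directly for this family; below is a small illustrative sketch (natural logarithms, the helper name `c_eff` is mine). Its output should approach $1/2$ as $\varepsilon\to 0$ and equal $\log 2\approx 0.693$ at $\varepsilon=1$:

```python
import numpy as np

# For p^(eps) as above, compute c_eff with
# H(p) = log(n) * (1 - c_eff * (n/log n) * ||p - u_n||_2^2).
def c_eff(n, eps):
    m = n // 2
    p = np.concatenate([np.full(m, (1 + eps) / n), np.full(m, (1 - eps) / n)])
    H = -np.sum(p[p > 0] * np.log(p[p > 0]))   # handles the eps = 1 case, where half the entries are 0
    sq = np.sum((p - 1.0 / n) ** 2)            # equals eps^2 / n
    return (1 - H / np.log(n)) * np.log(n) / (n * sq)

n = 1000
for eps in (1.0, 0.5, 0.1, 0.01):
    print(f"eps = {eps}: c_eff = {c_eff(n, eps):.4f}")
```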
That's very good to know. Thank you!
– Thomas Ahle
Dec 21 '18 at 18:22
@ThomasAhle You're welcome!
– Clement C.
Dec 21 '18 at 18:23
This is just scribbling down some observations in the comments, to have them more, well, visible.
As observed by K B Dave, the inequality of leonbloy is an instance of $\mathrm{KL} \le \chi^2.$ This can thus be improved by using a tighter inequality of this form. Note that the resulting inequality was mentioned by Clement in the original comments.
We have $$\mathrm{KL}(P\|Q) = \int \log \frac{\mathrm{d}P}{\mathrm{d}Q} \,\mathrm{d}P \le \log \int \left( \frac{\mathrm{d}P}{\mathrm{d}Q}\right) \,\mathrm{d}P = \log \int \left( \frac{\mathrm{d}P}{\mathrm{d}Q}\right)^2 \,\mathrm{d}Q = \log (1 + \chi^2(P\|Q)), $$ where the inequality follows by the concavity of $\log$ and Jensen's inequality. Applying the above to this case yields $$H(P) = \log n - \mathrm{KL}(P\|\mathbf{1}/n) \ge \log n - \log\Bigl(1 + \sum_x \frac{(P_x - 1/n)^2}{1/n}\Bigr) = \log n - \log \bigl(1 + n\|P - \mathbf{1}/n\|_2^2\bigr).$$
Note that if $n\|P - \mathbf{1}/n\|_2^2 \ll 1,$ we may use $\log (1 + x) \approx x$ for small values of $x$ to recover the original bound.
We may also state the above bound in the following equivalent way: Observe that $1 + \chi^2(P\|Q) = \mathbb{E}_Q[ ({\mathrm{d}P}/{\mathrm{d}Q})^2].$ Again plugging in $Q$ uniform on $n$ letters yields that the latter expectation is simply $n\|P\|_2^2,$ and we get that
$$ H(P) \ge - \log \|P\|_2^2,$$ the form stated by Clement. Note that $\|P\|_2 \le 1$ for every distribution, so the form above makes it obvious that the inequalities are non-trivial (in that the RHS is $\ge 0$) for every $P$.
answered Dec 25 '18 at 18:09 by stochasticboy321 (community wiki)
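A short numerical comparison of the bounds, as an illustrative sketch (natural logarithms): it checks both $H(P)\ge-\log\|P\|_2^2$ and the claim that this dominates leonbloy's bound $\log n - n\|P-\mathbf{1}/n\|_2^2$ (which follows from $\log(1+x)\le x$).

```python
import numpy as np

# Compare H(P) >= -log ||P||_2^2 (this answer) with H(P) >= log n - n*||P - 1/n||_2^2 (leonbloy).
rng = np.random.default_rng(3)
for n in (3, 10, 100):
    for _ in range(10_000):
        p = rng.dirichlet(np.ones(n))
        H = -np.sum(p * np.log(p))
        renyi2 = -np.log(np.sum(p ** 2))              # always >= 0 since ||p||_2 <= 1
        linear = np.log(n) - n * np.sum((p - 1.0 / n) ** 2)
        assert H >= renyi2 - 1e-12 and renyi2 >= linear - 1e-12
print("H >= -log ||p||_2^2 >= log n - n*||p - 1/n||_2^2 on all samples")
```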
As a side terminology note, $H(P)$ is the Shannon entropy (Rényi entropy for $\alpha=1$) while $-\log \lVert P\rVert_2^2$ is the Rényi entropy of order $\alpha=2$. Thus, the inequality follows from monotonicity of Rényi entropies.
– Clement C.
Dec 27 '18 at 12:54