Entropy Lower Bound in Terms of $\ell_2$ Norm

Define
$$
\begin{align}
H(p_1, \dots, p_n) &= \sum_{i=1}^n p_i\log(1/p_i)\\
&=\log n+\sum_{i=1}^n\sum_{k=2}^\infty (-1)^{k + 1} n^{k - 1} \frac{(p_i - 1/n)^k}{k (k - 1)},
\end{align}
$$

where $p_1,\dots,p_n\ge0$ sum to $1$.



Then we have the classic inequality
$H(p_1,p_2)\ge(\log 2)\left(1-2\left((p_1-1/2)^2+(p_2-1/2)^2\right)\right)=(\log 2)\left(1-2\|p-1/2\|_2^2\right)$, and we might wonder if that could be extended to $n>2$,
in particular with something like
$$\begin{align}
H(p_1,\dots,p_n)&\ge(\log n)\left(1-c_n\|p-1/n\|^2_2\right).
\end{align}$$

From experiments with $n=3$, it seems like
$c_n\ge\frac{2 n \log(n/2)}{(n-2) \log n}=2(1-O(1/\log n))$ suffices, but I don't have a proof of this. It is also slightly inconvenient that the right-hand side can go below $0$, which didn't happen in the $n=2$ case.
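
For concreteness, here is a minimal sketch of the kind of numerical experiment I mean (Python, natural logarithms; it samples the simplex at random, and the helper names are just placeholders):

    import numpy as np

    def entropy(p):
        """Shannon entropy in nats, with the convention 0*log(0) = 0."""
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    def smallest_valid_c(n, trials=200_000, seed=0):
        """Estimate the smallest c such that H(p) >= log(n) * (1 - c * ||p - 1/n||_2^2)
        over random points of the simplex (a lower estimate of the true worst case)."""
        rng = np.random.default_rng(seed)
        worst = 0.0
        for _ in range(trials):
            p = rng.dirichlet(np.ones(n))
            d2 = np.sum((p - 1.0 / n) ** 2)
            if d2 > 1e-12:
                worst = max(worst, (1 - entropy(p) / np.log(n)) / d2)
        return worst

    n = 3
    c_conjectured = 2 * n * np.log(n / 2) / ((n - 2) * np.log(n))  # the conjectured c_n above
    # If the conjecture is right, the random-search estimate stays below c_conjectured.
    print(smallest_valid_c(n), c_conjectured)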



Bounding the terms individually, we can get $H(p_1,\dots,p_n)\ge-2+4\sum_{i=1}^n\frac{p_i}{1+p_i}$, which is non-negative, but not as easy to relate to the $\ell_2$ norm. We can also bound $H\ge n/4-\|p-1/2\|_2^2$, but somehow a bound centered at $1/n$ seems more natural.
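
Both of these alternative bounds can be checked numerically in the same way (again a minimal sketch):

    import numpy as np

    rng = np.random.default_rng(3)
    n = 6
    for _ in range(50_000):
        p = rng.dirichlet(np.ones(n))
        H = -np.sum(p[p > 0] * np.log(p[p > 0]))
        bound_a = -2 + 4 * np.sum(p / (1 + p))    # term-by-term bound, always >= 0
        bound_b = n / 4 - np.sum((p - 0.5) ** 2)  # equals 1 - ||p||_2^2
        assert H >= bound_a - 1e-9 and H >= bound_b - 1e-9
    print("both alternative lower bounds held on all samples")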



Is there a well-known lower bound like this, relating $H(p)$ to $\|p\|_2$? Ideally, one that is asymptotically tight at $p_1=\dots=p_n=1/n$ and is always positive.










real-analysis inequality summation information-theory entropy

asked Dec 20 '18 at 21:49 by Thomas Ahle; edited Dec 26 '18 at 13:09


  • Not really what you want, but you have the inequality $$\log n \geq H(p) \geq \log \frac{1}{\lVert p\rVert_2^2}$$ based on the standard relation between Rényi entropies.
    – Clement C.
    Dec 21 '18 at 1:27

4 Answers

Answer by leonbloy (answered Dec 20 '18 at 23:11, edited Dec 21 '18 at 10:45):

Defining $p_i=1/n+q_i$ we get (using nats):



$$\begin{align}
H({\bf p}) &=-\sum p_i \log(p_i)\\
&=-\sum (1/n +q_i) \log(1/n +q_i)\\
&=-\sum ( 1/n +q_i) [\log(1/n ) + \log(1+ n q_i )]\\
&= \log(n) -\sum ( 1/n +q_i) \log(1+ n q_i)\\
&\ge \log(n) -\sum ( 1/n +q_i)\, n q_i\\
& = \log(n) - n\sum q_i^2\\
& = \log(n) - n \, \lVert{\bf p}- 1/n\rVert^2_2
\end{align}$$



Or, if you prefer



$$ H({\bf p}) \ge \log(n)\left(1 - \frac{n}{\log n}\sum q_i^2 \right) $$



Of course, the bound is useless if $\sum q_i^2\ge \frac{\log(n)}{n}$.
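
A quick numerical sanity check of this bound (a minimal Python sketch, natural logarithms, random points of the simplex):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10
    for _ in range(100_000):
        p = rng.dirichlet(np.ones(n))
        H = -np.sum(p[p > 0] * np.log(p[p > 0]))            # entropy in nats
        bound = np.log(n) - n * np.sum((p - 1.0 / n) ** 2)  # log(n) - n*||p - 1/n||_2^2
        assert H >= bound - 1e-9, (H, bound)
    print("H(p) >= log(n) - n*||p - 1/n||_2^2 held on all samples")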






  • As a side note, $$\lVert \mathbf{p} - 1/n\rVert_2^2 = \lVert \mathbf{p}\rVert_2^2 - 1/n$$ (in case the OP insists on having $n \lVert \mathbf{p}\rVert_2^2 - 1$ instead of $n\lVert \mathbf{p} - 1/n\rVert_2^2$).
    – Clement C.
    Dec 21 '18 at 1:29










  • That's a nice derivation. I suppose I could try to use some stronger upper bounds on $\log(1+n q_i)$, but it doesn't seem like there is any hope of improving $n/\log n$ to something closer to $2$?
    – Thomas Ahle
    Dec 21 '18 at 11:28










  • It seems doubtful to me... I also wonder how you could conjecture a bound like $c_n\ge\frac{2 n \log(n/2)}{(n-2) \log n}=2(1-O(1/\log n))$ experimenting only with $n=3$...
    – leonbloy
    Dec 21 '18 at 11:33












  • @leonbloy I know, it's a bit tongue in cheek, but for $n=3$ I can plot it, and it seemed that the worst case was when all $p_i=0$ except for two with $p_i=1/2$, so I optimized for that case, and then checked numerically for larger $n$.
    – Thomas Ahle
    Dec 21 '18 at 13:08






  • You can also check that $c_n$ cannot be a constant by taking another extreme example: $\mathbf{p}$ uniform on a subset of $n/2$ elements (and $0$ elsewhere). Then $\lVert \mathbf{p} - 1/n\rVert_2^2 = 1/n$, so you want $$\log\frac{n}{2}\geq \log n\left(1 - \frac{c_n}{n}\right)$$
    – Clement C.
    Dec 21 '18 at 16:34

Answer by K B Dave (answered Dec 21 '18 at 1:30):

Let $(X,\upsilon)$ be a finite measure space. Let $\sigma=\frac{\upsilon}{V}$ be the uniform probability distribution on $X$ ($V=\upsilon(X)$). Let $\rho$ be an absolutely continuous probability distribution with density $p$. Then the inequality
$$\begin{align}-h(p)+\ln V&\leq V\lVert p-\tfrac{1}{V}\rVert_2^2\\
h(p)&\stackrel{\mathrm{def}}{=}-\int_X p(x)\ln p(x)\,\mathrm{d}\upsilon_{x}\\
\lVert p-q\rVert^2_2&\stackrel{\mathrm{def}}{=}\int_X\lvert p(x)-q(x)\rvert^2\,\mathrm{d}\upsilon_x
\end{align}$$

is exactly the inequality between the $\chi^2$-divergence and the KL divergence
$$\begin{align}D(\rho\parallel\sigma)&\leq \chi^2(\rho\parallel\sigma)\text{,}\\
D(\rho\parallel\sigma)&\stackrel{\text{def}}{=}\int_X\left(\frac{\mathrm{d}\rho}{\mathrm{d}\sigma}\ln\frac{\mathrm{d}\rho}{\mathrm{d}\sigma}-\frac{\mathrm{d}\rho}{\mathrm{d}\sigma}+1\right)\mathrm{d}\sigma_x\\
\chi^2(\rho\parallel\sigma)&\stackrel{\text{def}}{=}\int_X\left(\frac{\mathrm{d}\rho}{\mathrm{d}\sigma}-1\right)^2\mathrm{d}\sigma_x\text{;}
\end{align}$$

this inequality in turn follows from
$$t\ln t - t + 1 \leq (t-1)^2\text{.}$$
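
A quick numerical illustration of both steps (a sketch, not a proof): the pointwise bound $t\ln t-t+1\le(t-1)^2$, and the resulting $D\le\chi^2$ for random discrete distributions against the uniform one.

    import numpy as np

    # Pointwise bound: t*ln(t) - t + 1 <= (t - 1)^2 on a grid of t >= 0.
    t = np.linspace(1e-9, 10.0, 100_001)
    assert np.all(t * np.log(t) - t + 1 <= (t - 1) ** 2 + 1e-12)

    # Integrated version: D(p || uniform) <= chi^2(p || uniform) for random pmfs.
    rng = np.random.default_rng(0)
    n = 8
    for _ in range(10_000):
        p = rng.dirichlet(np.ones(n))
        kl = np.sum(p[p > 0] * np.log(n * p[p > 0]))  # D(p || uniform)
        chi2 = n * np.sum((p - 1.0 / n) ** 2)         # chi^2(p || uniform)
        assert kl <= chi2 + 1e-12
    print("t*ln(t) - t + 1 <= (t-1)^2 and D <= chi^2 held on all samples")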






  • Doesn't then using $KL \le \log(1 + \chi^2)$ lead to something stronger? Presumably one should get something like $H(P) \ge - \log \|P\|_2^2.$
    – stochasticboy321
    Dec 23 '18 at 0:37










  • @stochasticboy321 that would be the Rényi entropy inequality mentioned in a comment under the OP. It's not really stronger though, more like a different bound.
    – Clement C.
    Dec 23 '18 at 22:18






  • @ClementC. Oh, missed that. It is strictly an improvement though. One way to see this is by simply noting by the above that the bound corresponds to $KL\le \chi^2,$ which is looser than $KL \le \log(1+ \chi^2)$. Also, on noting that $( P -\mathbf{1}/n ) \perp \mathbf{1}/n$, one can rearrange this to $H(P) \ge \log n - \log(1 + n\| P - \mathbf{1}/n\|_2^2)$. This has the nice advantage of the base of the log not affecting the bound. Of course for $P \approx \mathbf{1}/n$ these are equivalent, and your tightness analysis stands.
    – stochasticboy321
    Dec 24 '18 at 18:46












  • @stochasticboy321 If you made that an answer, I'd upvote it.
    – Clement C.
    Dec 25 '18 at 0:04




















Answer by Clement C. (answered Dec 21 '18 at 18:17):

To complement Leon Bloy's answer and show the bound he (and K B Dave) obtained cannot be significantly improved upon (i.e., that $c_n = \Omega\!\left(\frac{n}{\log n}\right)$ is necessary): Fix $\varepsilon \in (0,1]$, and assume without loss of generality that $n=2m$ is even. Define $\mathbf{p}^{(\varepsilon)}$ as the probability mass function (over $\{1,\dots,n\}$) such that
$$
\mathbf{p}^{(\varepsilon)}_i = \begin{cases} \frac{1+\varepsilon}{n} & \text{ if } i \leq m \\
\frac{1-\varepsilon}{n} & \text{ if } i > m
\end{cases}
$$

Note that
$$\begin{align}
H(\mathbf{p}^{(\varepsilon)}) &= \sum_{i=1}^m \frac{1+\varepsilon}{n}\log \frac{n}{1+\varepsilon}
+ \sum_{i=m+1}^n \frac{1-\varepsilon}{n}\log \frac{n}{1-\varepsilon} \\
&= \log n - \frac{1}{2}\left((1+\varepsilon)\log(1+\varepsilon) + (1-\varepsilon)\log(1-\varepsilon)\right) \\
&= \log n - \frac{\varepsilon^2}{2} + o(\varepsilon^3) \tag{1}
\end{align}$$

while
$$
\lVert \mathbf{p}^{(\varepsilon)} - \mathbf{u}_n\rVert_2^2 = \frac{\varepsilon^2}{n} \tag{2}
$$

so that
$$
H(\mathbf{p}^{(\varepsilon)}) = \log n \left(1 - \left(1/2+o_\varepsilon(1)\right)\cdot\frac{n}{\log n}\lVert \mathbf{p}^{(\varepsilon)} - \mathbf{u}_n\rVert_2^2\right) \tag{3}
$$

If you want to avoid the asymptotics as $\varepsilon \to 0$, you can still fix $\varepsilon = 1$ (for instance) and get that $$H(\mathbf{p}^{(\varepsilon)}) = \log n \left(1 - c'_\varepsilon\frac{n}{\log n}\lVert \mathbf{p}^{(\varepsilon)} - \mathbf{u}_n\rVert_2^2\right)$$ for some constant $c'_\varepsilon > 0$.
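
A small numerical check of this construction (a Python sketch; `effective_constant` is just an ad hoc helper name): the effective constant in front of $\frac{n}{\log n}\lVert \mathbf{p}^{(\varepsilon)} - \mathbf{u}_n\rVert_2^2$ indeed approaches $1/2$ as $\varepsilon \to 0$.

    import numpy as np

    def effective_constant(n, eps):
        """Constant c with H(p) = log(n) * (1 - c * (n/log n) * ||p - u_n||_2^2)
        for the pmf putting (1+eps)/n on half the points and (1-eps)/n on the rest."""
        m = n // 2
        p = np.concatenate([np.full(m, (1 + eps) / n), np.full(m, (1 - eps) / n)])
        H = -np.sum(p[p > 0] * np.log(p[p > 0]))
        d2 = np.sum((p - 1.0 / n) ** 2)  # equals eps**2 / n, matching (2)
        return (1 - H / np.log(n)) / ((n / np.log(n)) * d2)

    for eps in [1.0, 0.5, 0.1, 0.01]:
        print(eps, effective_constant(1000, eps))  # tends to 1/2 as eps -> 0, cf. (3)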






  • That's very good to know. Thank you!
    – Thomas Ahle
    Dec 21 '18 at 18:22










  • @ThomasAhle You're welcome!
    – Clement C.
    Dec 21 '18 at 18:23




















Answer by stochasticboy321 (community wiki, answered Dec 25 '18 at 18:09):

This is just scribbling down some observations in the comments, to have them more, well, visible.





As observed by K B Dave, the inequality of leonbloy is an instance of $\mathrm{KL} \le \chi^2.$ This can thus be improved by using a tighter inequality of this form. Note that the resulting inequality was mentioned by Clement in the original comments.

We have $$\mathrm{KL}(P\|Q) = \int \log \frac{\mathrm{d}P}{\mathrm{d}Q} \,\mathrm{d}P \le \log \int \left( \frac{\mathrm{d}P}{\mathrm{d}Q}\right) \,\mathrm{d}P = \log \int \left( \frac{\mathrm{d}P}{\mathrm{d}Q}\right)^2 \,\mathrm{d}Q = \log (1 + \chi^2(P\|Q)), $$ where the inequality follows by the concavity of $\log$ and Jensen's inequality. Applying the above to this case yields $$H(P) = \log n - \mathrm{KL}(P\|\mathbf{1}/n) \ge \log n - \log\left(1 + \sum_x \frac{(P_x - 1/n)^2}{1/n}\right) = \log n - \log (1 + n\|P - \mathbf{1}/n\|_2^2).$$

Note that if $n\|P - \mathbf{1}/n\|_2^2 \ll 1,$ we may use $\log (1 + x) \approx x$ for small values of $x$ to recover the original bound.

We may also state the above bound in the following equivalent way: Observe that $1 + \chi^2(P\|Q) = \mathbb{E}_Q[ ({\mathrm{d}P}/{\mathrm{d}Q})^2].$ Again plugging in $Q$ uniform on $n$ letters yields that the latter expectation is simply $n\|P\|_2^2,$ and we get that

$$ H(P) \ge - \log \|P\|_2^2,$$ the form stated by Clement. Note that $\|P\|_2 \le 1$ for every distribution, so the form above makes it obvious that the inequalities are non-trivial (in that the RHS is $\ge 0$) for every $P$.
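
A short numerical comparison of the two forms (a sketch; random points of the simplex, natural logarithms), checking both that $-\log\|P\|_2^2$ really lower-bounds $H(P)$ and that it dominates the linear bound $\log n - n\|P-\mathbf{1}/n\|_2^2$:

    import numpy as np

    rng = np.random.default_rng(2)
    n = 20
    for _ in range(50_000):
        p = rng.dirichlet(np.ones(n))
        H = -np.sum(p[p > 0] * np.log(p[p > 0]))
        linear_bound = np.log(n) - n * np.sum((p - 1.0 / n) ** 2)  # from KL <= chi^2
        log_bound = -np.log(np.sum(p ** 2))                        # from KL <= log(1 + chi^2)
        assert log_bound >= linear_bound - 1e-9  # log(1+x) <= x, so the log form dominates
        assert H >= log_bound - 1e-9             # H(P) >= -log ||P||_2^2
    print("the log-form bound dominates the linear one and held on all samples")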






  • As a side terminology note, $H(P)$ is the Shannon entropy (Rényi entropy for $\alpha=1$) while $-\log \lVert P\rVert_2^2$ is the Rényi entropy of order $\alpha=2$. Thus, the inequality follows from monotonicity of Rényi entropies.
    – Clement C.
    Dec 27 '18 at 12:54












