What does it mean for an expression to be an orthogonal projection onto the latent space

On page 576 of Bishop's PRML, it is stated that

$$
(\mathbf{W}_{ML}^{T}\mathbf{W}_{ML})^{-1}\mathbf{W}_{ML}^{T}(\mathbf{x} - \bar{\mathbf{x}})
$$

represents an orthogonal projection of the data point $\mathbf{x}$ onto the latent space.

Here $\mathbf{W}$ is a $D \times M$ matrix, $\mathbf{x}$ is $D \times 1$, and the latent space is $M$-dimensional.

What does it mean that this expression represents an orthogonal projection onto the latent space, how do we know that it is one, and why is it important?

linear-algebra machine-learning






asked Jan 2 at 13:48 by Sandi, edited Jan 2 at 14:03

2 Answers

This question arises in the context of principal component analysis. To keep the notation simple, let's assume that the dataset is centred at the origin, i.e. $\bar{\mathbf x} = \mathbf 0$. The goal is to approximate a data point $\mathbf x \in \mathbb R^n$ as closely as possible by a point chosen from the $d$-dimensional subspace spanned by the columns of an $n \times d$ matrix $\mathbf W$. In other words, we wish to find
$$ \mathbf x_\star := \mathbf W \mathbf z_\star, $$

where
$$ \mathbf z_\star := \operatorname{argmin}_{\mathbf z \in \mathbb R^d} \| \mathbf x - \mathbf W \mathbf z \|^2. $$

[This $\mathbf x_\star$ is called the "orthogonal projection" of $\mathbf x$ onto the subspace spanned by the columns of $\mathbf W$, because the residual $\mathbf x - \mathbf x_\star$ turns out to be orthogonal to that subspace, and in particular to $\mathbf x_\star$ itself. Geometrically, this is quite intuitive.]

Let's go ahead and compute this approximation. First, we find $\mathbf z_\star$ by differentiation:
$$ \mathbf 0 = \left. \frac{\partial}{\partial \mathbf z} \| \mathbf x - \mathbf W \mathbf z \|^2 \right|_{\mathbf z = \mathbf z_\star} = -2 \mathbf W^T (\mathbf x - \mathbf W \mathbf z_\star) \implies \mathbf z_\star = (\mathbf W^T \mathbf W)^{-1} \mathbf W^T \mathbf x. $$

In machine learning, this $\mathbf z_\star$ is the latent vector for the data point, and it corresponds to the expression in your question (with $\bar{\mathbf x} = \mathbf 0$). The approximation $\mathbf x_\star$ is then given by $\mathbf x_\star = \mathbf W \mathbf z_\star$.

[Just for fun, let's verify that $\mathbf x_\star$ and $\mathbf x - \mathbf x_\star$ are orthogonal, justifying the phrase "orthogonal projection":
\begin{align} \mathbf x_\star \cdot (\mathbf x - \mathbf x_\star) &= \mathbf x^T \mathbf W (\mathbf W^T \mathbf W)^{-1} \mathbf W^T \left( \mathbf x - \mathbf W (\mathbf W^T \mathbf W)^{-1} \mathbf W^T \mathbf x \right) \\ &= \mathbf x^T \mathbf W (\mathbf W^T \mathbf W)^{-1} \mathbf W^T \mathbf x - \mathbf x^T \mathbf W (\mathbf W^T \mathbf W)^{-1} (\mathbf W^T \mathbf W) (\mathbf W^T \mathbf W)^{-1} \mathbf W^T \mathbf x \\ &= \mathbf x^T \mathbf W (\mathbf W^T \mathbf W)^{-1} \mathbf W^T \mathbf x - \mathbf x^T \mathbf W (\mathbf W^T \mathbf W)^{-1} \mathbf W^T \mathbf x \\ &= 0. \end{align}
]
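
As a numerical sanity check (my addition, not part of the original answer), here is a minimal Python/NumPy sketch: it evaluates $\mathbf z_\star = (\mathbf W^T \mathbf W)^{-1} \mathbf W^T \mathbf x$ for a random full-column-rank $\mathbf W$ and a centred $\mathbf x$, and confirms that the residual $\mathbf x - \mathbf W \mathbf z_\star$ is orthogonal to the column space of $\mathbf W$. The dimensions and variable names are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 2                        # ambient dimension D = 5, latent dimension M = 2

W = rng.standard_normal((n, d))    # columns span the latent subspace (assumed full column rank)
x = rng.standard_normal(n)         # a data point, already centred (x_bar = 0)

# Latent vector: z* = (W^T W)^{-1} W^T x  -- the expression from the question
z_star = np.linalg.solve(W.T @ W, W.T @ x)

# Orthogonal projection of x onto the column space of W
x_star = W @ z_star

# The residual is orthogonal to every column of W, hence also to x_star
residual = x - x_star
print(np.allclose(W.T @ residual, 0.0))    # True
print(np.isclose(x_star @ residual, 0.0))  # True
```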






answered Jan 6 at 22:20 by Kenny Wong (edited Jan 6 at 22:25)

I don't know what machine learning or a latent space is, but I understand orthogonal projections. $\mathbf x$ is a data point and you want to find the closest point $\mathbf y$ to it in the latent space; that closest point is obtained by orthogonal projection. Geometrically, you can write $\mathbf x$ as a linear combination of $M+1$ orthogonal vectors, $M$ of which lie in the latent space. The remaining vector, call it $\mathbf z$, is the component of $\mathbf x$ orthogonal to the latent space.

Note that $\|\mathbf z\|$ is the distance from $\mathbf x$ to the latent space.
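
To illustrate this geometric picture (again my own sketch, not part of the original answer, assuming NumPy): build an orthonormal basis of the latent space, split $\mathbf x$ into its component inside the space plus an orthogonal remainder $\mathbf z$, and check that $\|\mathbf z\|$ equals the distance from $\mathbf x$ to the space.

```python
import numpy as np

rng = np.random.default_rng(1)
D, M = 5, 2

W = rng.standard_normal((D, M))    # columns span the M-dimensional latent space
x = rng.standard_normal(D)         # the data point

# Orthonormal basis Q for the column space of W (M mutually orthogonal unit vectors)
Q, _ = np.linalg.qr(W)

# Component of x inside the latent space, and the orthogonal remainder z
x_in = Q @ (Q.T @ x)
z = x - x_in

# z is orthogonal to every basis vector of the latent space
print(np.allclose(Q.T @ z, 0.0))   # True

# ||z|| is the distance from x to the latent space: the least-squares fit gets no closer
c = np.linalg.lstsq(W, x, rcond=None)[0]
print(np.isclose(np.linalg.norm(z), np.linalg.norm(x - W @ c)))  # True
```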






answered Jan 2 at 14:08 by Joel Pereira