How can I merge PDF files without duplicating fonts?












I need to merge about 100 PDF files into one. Each file uses more or less the same unsubsetted fonts. All the options I have tried so far (pdfunite, gs, etc.) are not intelligent about font duplication: the merged PDF ends up with 100 copies of the same font and is therefore much larger than it needs to be.



Is there a way to do any one of the following:




  1. Merge the PDFs without duplicating fonts?

  2. De-duplicate the fonts in the PDF later?

  3. Remove fonts from the PDF entirely?


The ideal solution will have a commercially friendly open source license (e.g. not AGPL).










  • stackoverflow.com/questions/21979200/…
    – Tom Brossman
    Nov 2 '18 at 19:24

  • @TomBrossman iText's PdfSmartCopy that the solution you linked to relies on would have been an option, except for the AGPL license.
    – user2771609
    Nov 2 '18 at 20:23

  • @TomBrossman You are not wrong, but please don't make askubuntu toxic and be polite, you are violating the code of conduct.
    – user2771609
    Nov 3 '18 at 15:38

  • Thank you for identifying this 'toxic' matter, I suggest you flag any code of conduct breaches you identify to the moderators of this site so they can take a look at them.
    – Tom Brossman
    Nov 3 '18 at 17:33
















pdf ghostscript poppler






edited Dec 31 '18 at 19:53 by Kurt Pfeifle

asked Nov 1 '18 at 21:32 by user2771609








1 Answer
Contrary to what you say, recent versions of Ghostscript have become quite efficient at merging multiple PDFs into a single one while avoiding embedding an identical font multiple times.



Inputs



Here are the details about 3 input PDFs, which I'll merge into a single output:




for i in {1..3}; do pdffonts ${i}.pdf ; echo ; done

name                       type              encoding         emb sub uni object ID
-------------------------- ----------------- ---------------- --- --- --- ---------
Helvetica                  Type 1C           WinAnsi          yes no  no       8  0

name                       type              encoding         emb sub uni object ID
-------------------------- ----------------- ---------------- --- --- --- ---------
Helvetica                  Type 1C           WinAnsi          yes no  no       8  0

name                       type              encoding         emb sub uni object ID
-------------------------- ----------------- ---------------- --- --- --- ---------
Helvetica                  Type 1C           WinAnsi          yes no  no       8  0


Merging



Now merge these three PDF input files with the help of pdftk.




pdftk 1.pdf 2.pdf 3.pdf cat output merged.pdf


Output



Now check the font status of the output merged.pdf:




pdffonts merged.pdf

name                       type              encoding         emb sub uni object ID
-------------------------- ----------------- ---------------- --- --- --- ---------
Helvetica                  Type 1C           WinAnsi          yes no  no       5  0
Helvetica                  Type 1C           WinAnsi          yes no  no      14  0
Helvetica                  Type 1C           WinAnsi          yes no  no      23  0


OK, not there yet: Helvetica is still embedded three times, once per input file.
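
With 100 input files the pdffonts listing gets long, so it may help to summarize it. The following is a minimal sketch, not part of the original test: the helper name count_font_copies is made up for illustration, and it assumes that the font name is the first whitespace-separated column and that the listing starts with two header lines, as in the output above. Identical names are only a hint; they do not prove that the embedded font programs are byte-identical.

```shell
# Hypothetical helper: summarize pdffonts output read from stdin.
# Prints "<count> <font name>", most frequent font first.
count_font_copies() {
    # Skip the two header lines, count occurrences of the first column.
    awk 'NR > 2 && NF > 0 { n[$1]++ }
         END { for (name in n) print n[name], name }' |
        sort -k1,1nr -k2
}

# Usage (requires pdffonts from poppler-utils):
#   pdffonts merged.pdf | count_font_copies
```

For the merged.pdf listing above this would report three entries named Helvetica; after a successful de-duplication each font name should appear only once.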



Optimize with Ghostscript




gs -o optim.pdf -sDEVICE=pdfwrite merged.pdf

GPL Ghostscript GIT PRERELEASE 9.27 (2018-11-20)
Copyright (C) 2018 Artifex Software, Inc. All rights reserved.
This software comes with NO WARRANTY: see the file PUBLIC for details.
Processing pages 1 through 3.
Page 1
Page 2
Page 3
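
Ghostscript's pdfwrite device also accepts several input files in one invocation, so the separate pdftk step may not be needed. The sketch below is an assumption-laden shortcut rather than what was tested above: the wrapper name merge_with_gs is made up for illustration, and whether identical fonts actually end up shared depends on your Ghostscript version and your fonts, so check the result with pdffonts.

```shell
# Hypothetical wrapper: merge PDFs with Ghostscript in a single pass.
# $1 = output file, remaining arguments = input PDFs in page order.
merge_with_gs() {
    if [ "$#" -lt 2 ]; then
        echo "usage: merge_with_gs OUTPUT.pdf INPUT.pdf..." >&2
        return 1
    fi
    output=$1
    shift
    # -o sets the output file (and implies batch mode); -q silences the banner.
    gs -q -o "$output" -sDEVICE=pdfwrite "$@"
}

# Usage:
#   merge_with_gs merged.pdf 1.pdf 2.pdf 3.pdf
#   merge_with_gs merged.pdf {1..100}.pdf    # bash brace expansion
```

Passing the inputs as explicit arguments keeps the page order under your control, which a plain *.pdf glob would not (it sorts 10.pdf before 2.pdf).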


Check font statuses and file sizes




ls -lh {1..3}.pdf merged.pdf optim.pdf

-rw-r--r-- 1 kurtpfeifle staff 51K Dec 31 20:25 1.pdf
-rw-r--r-- 1 kurtpfeifle staff 51K Dec 31 20:25 2.pdf
-rw-r--r-- 1 kurtpfeifle staff 51K Dec 31 20:25 3.pdf
-rw-r--r-- 1 kurtpfeifle staff 147K Dec 31 20:32 merged.pdf
-rw-r--r-- 1 kurtpfeifle staff 7.5K Dec 31 20:34 optim.pdf
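
Going by the listing above, the file shrinks from 147K to 7.5K, a reduction of roughly 95%. To compute that figure for your own files, something like the following could be used. It is only a sketch; the helper name size_reduction_pct is made up for illustration.

```shell
# Hypothetical helper: percentage by which file $2 is smaller than file $1.
size_reduction_pct() {
    before=$(wc -c < "$1")
    after=$(wc -c < "$2")
    # awk does the floating-point arithmetic; refuse to divide by zero.
    awk -v b="$before" -v a="$after" \
        'BEGIN { if (b == 0) exit 1; printf "%.1f\n", (b - a) * 100 / b }'
}

# Usage:
#   size_reduction_pct merged.pdf optim.pdf
```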


Conclusion



I tested this with Ghostscript v9.25; the log shown above comes from a 9.27 Git prerelease build.



If this doesn't work for you, you'll need to...




  1. ...tell us the version of Ghostscript you are using;

  2. ...provide a link to (some of) your input PDFs for more detailed analysis.




I'm aware that this answer does not exactly meet your license requirements. But your false statement about Ghostscript prompted me to post it anyway, so that other people interested in this topic can still benefit from it.






answered Dec 31 '18 at 19:43 by Kurt Pfeifle





























