#1707 Move binary blobs out of Git

Open
opened 3 months ago by repentinus · 2 comments

We have hundreds of megabytes of binary blobs in our Git repo. Some of them have been moved to the picture base; the rest should also be moved out of our VCS tree. Once that is done, the repository history should be rewritten to eliminate these blobs. This would significantly speed up new clones of the repo and likely also reduce the load on our servers.

=== All extensions by reverse size ===
Format: unpacked size, packed size, date deleted, extension name
   420825177  407494099 <present>  .jpg
   282831498  240183426 <present>  .png
   200025350  140198446 <present>  .pdf
    40690120   40291361 <present>  .JPG
   554714454   33399215 <present>  .xhtml
    32919326   32238870 2007-07-01 .avi
    21956981   20637520 <present>  .jpeg
    29936321   16258652 2020-04-29 .odp
    12134889   12036549 2007-07-01 .ogg
    11105875   11099316 <present>  .gz
    79663199    7967165 <present>  .svg
   137773355    7691499 <present>  .xml
     6788476    6743746 <present>  .gif
    41093062    4787443 <present>  .xcf
     3304687    2843225 <present>  .odt
     1797460    1796178 2006-05-09 .zip
     6083434    1650096 <present>  .js
     1637068    1631693 <present>  .woff
     6165974    1528792 2007-08-23 .ps
     3707895    1142600 2006-10-13 .db
    35462955     598727 <present>  .xsl
   116742461     580130 <present>  .css
     3908431     569805 <present>  .html
     1675776     544438 2020-05-05 .doc
      735354     515661 <present>  .asc
     1448755     472992 <present>  .psd
      452428     452586 2004-06-01 .tgz
      363120     363405 <present>  .woff2
     3872323     334409 <present>  <no extension>
     1777265     265731 <present>  .txt
      917270     260720 2006-05-09 .eps
      448636     256434 <present>  .ttf
      229854     228281 <present>  .torrent
      296472     215976 2020-09-22 .otf
      278077     210504 <present>  .eot
      356373     157843 <present>  .skp
     7795698     140199 <present>  .less
     3043292     130376 <present>  .sh
     1841331     126954 <present>  .php
     1204688     120332 <present>  .sla
     5589042     114092 <present>  .pl
      287789     111734 <present>  .skb
      270368     108239 <present>  .map
      782423      90567 2020-07-03 .tex
       92466      88633 2006-05-09 .sxw
      499712      50123 <present>  .indd
     2031302      43846 2018-10-30 .texi
       45874      40878 <present>  .ott
       53208      38075 2019-06-24 .ods
       53144      35126 2020-09-22 .TTF
       96334      29418 <present>  .sources
      173029      22985 2006-05-09 .rtf
       98895      19150 <present>  .orig
       91624      16612 2017-07-14 .xhtml~
       56571      16295 <present>  .md
       40449      15355 <present>  .disabled
       40892      13251 2016-11-22 .inc
      144001      11710 2004-06-01 .shtml
      224404      11561 <present>  .draft
       27132      11515 <present>  .py
      168024      11353 <present>  .pm
       12044      10018 <present>  .old
       30340       9967 2020-07-22 .srt
       43495       9164 <present>  .xhml
       29125       9119 2015-10-29 .outdated-translations-patch
       41085       8007 <present>  .diff
       54865       6408 2014-06-17 .patch
       13983       5631 2002-05-27 .sgml
       18604       4924 <present>  .ico
       16253       4082 2004-05-26 .pxhtml
        8913       3995 <present>  .xthml
       22621       3883 <present>  .xhmtl
       15351       3613 2019-12-20 .rss
       14488       3579 2004-06-01 .dtd
       17362       3304 2015-02-24 .bak
       15110       2547 2004-06-01 .dsssl
       14822       2287 2004-06-01 .xalan
        4980       1626 2005-06-09 .broken
        3801       1572 <present>  .json
        9734       1294 <present>  .tt2
        6766       1281 2019-03-13 .pot
        6848       1275 2019-03-13 .po
        2914       1252 2004-06-01 .dsl
        7083       1183 2002-05-27 .IST
        5664       1166 2020-04-23 .PL
        3806       1138 2013-01-17 .xml~
        2236       1015 2004-12-02 .stp
        4511        988 2020-04-23 .cgi
       21738        839 2010-04-02 .conf
        3904        831 2004-09-03 .mk
        1659        817 2004-06-01 .sample
        3122        810 2002-05-27 .hutton
        2413        611 2002-05-27 .fig
        1726        545 2002-05-27 .dot
       12341        504 2003-02-01 .xsltproc
        2120        475 2004-06-01 .sig
         808        471 2004-05-26 .preproc
        1393        455 <present>  .yml
        1987        446 <present>  .xtml
       13795        420 <present>  .xhtm
         892        399 2004-06-01 .java
         665        367 2014-12-21 .x
         510        300 2006-05-09 .thermo
         821        296 2004-05-24 .kpweb
         276        266 2020-11-12 .license
         198        160 2019-05-26 .prod
         102        118 <present>  .translation
          94        114 2003-02-17 .lang
          82         84 2016-11-22 .csv
          80         82 2019-03-11 .tmp
          84         81 <present>  .directory
        5637         62 <present>  .×html
          33         41 <present>  .txt~
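
To make the numbers above easier to work with, here is a minimal sketch of a parser for this report layout. The header resembles what `git filter-repo --analyze` writes, but the thread does not say which tool produced it, so that is an assumption. The names `ExtRow`, `parse_report` and `packed_total` are hypothetical helpers, not part of any existing tool.

```python
from typing import List, NamedTuple, Optional, Set


class ExtRow(NamedTuple):
    unpacked: int           # unpacked size in bytes
    packed: int             # packed size in bytes
    deleted: Optional[str]  # deletion date, or None if still present
    ext: str                # extension name as printed in the report


def parse_report(text: str) -> List[ExtRow]:
    """Parse lines of the form: unpacked, packed, date deleted, extension."""
    rows = []
    for line in text.splitlines():
        # Split into at most four fields so "<no extension>" stays intact.
        parts = line.strip().split(None, 3)
        # Skip the title and "Format:" lines, which do not start with two integers.
        if len(parts) != 4 or not (parts[0].isdigit() and parts[1].isdigit()):
            continue
        unpacked, packed, date, ext = parts
        deleted = None if date == "<present>" else date
        rows.append(ExtRow(int(unpacked), int(packed), deleted, ext))
    return rows


def packed_total(rows: List[ExtRow], exts: Set[str]) -> int:
    """Sum packed sizes for the given lowercase extensions, ignoring case."""
    return sum(r.packed for r in rows if r.ext.lower() in exts)
```

With the full report loaded into a string, `packed_total(parse_report(report), {".jpg", ".jpeg", ".png", ".gif"})` would give the packed size attributable to common image formats, which is one way to estimate how much a history rewrite could save.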
repentinus added the disruptive label 3 months ago

Thanks for compiling this information!

I am all in favour of deleting blobs from /picturebase (as all of them are in pics.fsfe.org), and we can also think about removing unused resources from both the main branch and the history. The same goes for PDF files, which I would rather put on download.fsfe.org.

However, I would not say that no binary files should ever be present in the tree. Small logos, icons etc can definitely stay there IMHO, also to allow people without access to our new picturebase to amend these things.
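
One way to apply the "small files may stay" idea is to list only the binary files above a size limit as candidates for moving. The sketch below assumes a set of extensions and a 100 KiB threshold; neither value comes from this thread, and `large_binaries` is a hypothetical helper.

```python
import os
from typing import Iterator, Tuple

# Assumption: the thread does not define which extensions count as binary
# or what "small" means; both values below are placeholders.
BINARY_EXTS = {".jpg", ".jpeg", ".png", ".gif", ".pdf", ".xcf", ".psd"}
SMALL_LIMIT = 100 * 1024  # bytes


def large_binaries(root: str, limit: int = SMALL_LIMIT) -> Iterator[Tuple[str, int]]:
    """Yield (path relative to root, size in bytes) for binaries above limit."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            ext = os.path.splitext(name)[1].lower()
            if ext not in BINARY_EXTS:
                continue
            path = os.path.join(dirpath, name)
            # Skip broken symlinks and other non-regular entries.
            if not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size > limit:
                yield os.path.relpath(path, root), size
```

Running `sorted(large_binaries("."), key=lambda item: -item[1])` in a checkout would list the largest candidates first. Small logos and icons fall below the limit and are left alone, and the resulting paths could then be reviewed by hand before anything is moved or removed from history.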


> However, I would not say that no binary files should ever be present in the tree. Small logos, icons etc can definitely stay there IMHO, also to allow people without access to our new picturebase to amend these things.

I'd like to underline this for use cases like the /order directory, which contains pictures of the merchandise items. Sometimes I think the benefit of having things that belong together in one place outweighs the few megabytes of additional data in Git.
