Read-Only Filestore

This is one of the projects I created to test a number of things and to deal with an annoyance when working with backups. The central idea is that we work with read-only files, which we copy and want to store somewhere. On the one hand this is a typical backup situation. On the other hand the same concept appears in functional programming languages, and the idea here was to speed up calculations substantially by distinguishing truly different inputs to functions. That is to say: if a calculation was already performed on a file with the same content, then it should not be redone. Of course, this is only the pretext. The small tests below do not come even close to solving such a memoization problem efficiently. In fact, the problem I discovered with this approach is that comparing file content takes longer and longer as the store grows, to the point that it is better to compare files only at the proper place instead of comparing them at every opportunity we have.
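To make the memoization idea concrete, here is a minimal sketch in Python. It is not part of the tools on this page. It assumes that a SHA-256 digest of the file content is an acceptable stand-in for a full content comparison, and the names content_digest and memoize_on_content are made up for this illustration.

```python
import hashlib

_cache = {}  # (function name, content digest) -> previously computed result


def content_digest(path, chunk_size=1 << 20):
    """Return a SHA-256 hex digest of the file content (assumed identity of the input)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def memoize_on_content(func):
    """Skip func(path) when it already ran on a file with the same content."""
    def wrapper(path):
        key = (func.__name__, content_digest(path))
        if key not in _cache:
            _cache[key] = func(path)
        return _cache[key]
    return wrapper
```

Note that the digest still has to read every input file completely, so this only pays off when the calculation itself is much more expensive than reading the file.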

Hard linking duplicate files

Anyway, I'm getting sidetracked here. Historically, I first took the program fdupes, written by Adrian Lopez, and since I liked it (I actually still like it), I modified it somewhat so that it makes hard links between duplicate files. This makes it possible to have an efficient store on disk without any alteration to the directory content.

ftp://ftp.yellowcouch.org/tools/
md5: 3ea39e35ed7a87492f504e07f3498e68
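The principle of hard linking duplicates can be sketched in a few lines of Python. This is not the modified fdupes source (which is a C program), only an illustration under assumptions: candidate files are grouped by size and then compared byte by byte, and the function name hardlink_duplicates and the temporary suffix .hardlink-tmp are invented for the example.

```python
import filecmp
import os
from collections import defaultdict


def hardlink_duplicates(root):
    """Replace duplicate files under root by hard links; return the number of links made."""
    # Files of different size cannot be equal, so group by size first.
    by_size = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    linked = 0
    for paths in by_size.values():
        originals = []  # one representative per distinct content
        for path in paths:
            for keep in originals:
                if os.path.samefile(keep, path):
                    break  # already linked to this representative
                if filecmp.cmp(keep, path, shallow=False):
                    # Link under a temporary name, then rename over the duplicate,
                    # so the directory entry never disappears.
                    tmp = path + ".hardlink-tmp"
                    os.link(keep, tmp)
                    os.replace(tmp, path)
                    linked += 1
                    break
            else:
                originals.append(path)
    return linked
```

The directory listing stays the same afterwards; only the number of distinct inodes drops. Hard links require all files to live on the same filesystem.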

An observation with larger stores is that this program does its job nicely, but that it takes more and more time as the store grows. In the end this little program needed 6 hours to go through 13 GB spread over around 500000 files, which is too long, especially if one realizes that most of those files had already been compared against one another and that there is no good reason to compare them _again_ at a later stage. As such I set out to create a tool that sorts files into a store and works incrementally. Each new file is 'imported' into the store.

Importing files into a Read-Only Store

The source below contains a program that takes a directory and imports the full content of that directory into a store by linking from within the store to the files in that directory. This means that none of the files in the directory to import can be writable. They should all be treated as read-only. The advantage of this system is that it is faster than finding duplicate files afterwards. The disadvantage is that those files _should not change_. So if you are undisciplined and think you can just start modifying these files: it will screw up your entire store forever. A change to one file is disastrous and cannot be repaired. So be careful.

ftp://ftp.yellowcouch.org/tools/rofiles.tgz
md5: 41ebcd4c11b048e56970c75be4e3b0a1
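The incremental idea can be sketched as follows. This is not the code from rofiles.tgz, and several things are assumptions made for the illustration: known content is recognized through a SHA-256 digest, the digests are kept in a hypothetical index.json inside the store, store entries are hard links named after the numeric id, and ids are simply handed out in order.

```python
import hashlib
import json
import os

INDEX_NAME = "index.json"  # hypothetical index file, not part of the original tool


def digest(path):
    """SHA-256 hex digest of a file, used here to recognize known content."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def import_files(source, store):
    """Import every file under source into store; return (marker, id, path) tuples."""
    index_path = os.path.join(store, INDEX_NAME)
    index = {}
    if os.path.exists(index_path):
        with open(index_path) as f:
            index = json.load(f)

    report = []
    for dirpath, _dirs, files in os.walk(source):
        for name in sorted(files):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue
            key = digest(path)
            if key in index:
                report.append((" ", index[key], path))  # content already in the store
            else:
                file_id = len(index) + 1
                # The store links to the imported file instead of copying it.
                os.link(path, os.path.join(store, str(file_id)))
                index[key] = file_id
                report.append(("+", file_id, path))

    with open(index_path, "w") as f:
        json.dump(index, f)
    return report
```

Each import only looks at the new files and the index, so files that were already in the store are never compared against each other again. Because the store entry is a link to the very same file, writing to an imported file would silently change the stored content, which is why the files must stay read-only.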

An example of a run is something like

    import-files ~/Test

which will import each of the files directly or indirectly under Test and store them in the current directory, which is considered the target store. Each file is assigned a unique id, which can then be used further in relational databases or the like. Below is the output of such a run:

+1    /D/werner/Test/ProteomeEffect/Algorithms/FKRP1/3dmap-flat    aa/a
+2    /D/werner/Test/ProteomeEffect/Algorithms/FKRP1/3dmap    ba/a
+3    /D/werner/Test/ProteomeEffect/Algorithms/FKRP1/cols.cpp    ca/a
+4    /D/werner/Test/ProteomeEffect/Algorithms/FKRP1/cols    da/a
+5    /D/werner/Test/ProteomeEffect/Algorithms/FKRP1/3dmap-flat.cpp    ea/a
+6    /D/werner/Test/ProteomeEffect/Algorithms/FKRP1/3dmap-flat.cpp~    fa/a
+7    /D/werner/Test/ProteomeEffect/Algorithms/FKRP1/3dmap-flat.o    ga/a
+8    /D/werner/Test/ProteomeEffect/Algorithms/FKRP1/3dmap.cpp    ha/a
+9    /D/werner/Test/ProteomeEffect/Algorithms/FKRP1/3dmap.cpp~    ia/a
+10    /D/werner/Test/ProteomeEffect/Algorithms/FKRP1/3dmap.o    ja/a
+11    /D/werner/Test/ProteomeEffect/Algorithms/FKRP1/fkrp-protein-ranking.xls    ka/a
+54    /D/werner/Test/ProteomeEffect/Algorithms/FKRP3/fkrp-influences.xls    1a/a
+55    /D/werner/Test/ProteomeEffect/Algorithms/FKRP3/scrambled.csv.bz2    2a/a
+56    /D/werner/Test/ProteomeEffect/Algorithms/FKRP3/get-data.sh    3a/a
+57    /D/werner/Test/ProteomeEffect/Algorithms/FKRP3/get-data.sh~    4a/a
[...]
+475    /D/werner/Test/ProteomeEffect/Paper/BIB/LetterEditor1-BIB.doc    Oh/a
+476    /D/werner/Test/ProteomeEffect/Paper/BIB/LetterEditor1-BIB.pdf    Ph/a
+477    /D/werner/Test/ProteomeEffect/Paper/BIB/s1-ln452578695844769-1939656818Hwf102653189IdV-6983646684525786PDF_HI0001.pdf    Qh/a
+478    /D/werner/Test/ProteomeEffect/Paper/test.csv    Rh/a
 467    /D/werner/Test/ProteomeEffect/Paper/mk5-biological-process-25_files/std_hide.png    Gh/a
 468    /D/werner/Test/ProteomeEffect/Paper/mk5-biological-process-25_files/down.png    Hh/a
 469    /D/werner/Test/ProteomeEffect/Paper/mk5-biological-process-25_files/up.png    Ih/a
+614    /D/werner/Test/ProteomeEffect/Paper/InfluenceMapping8_tex.bbl    3j/a
+615    /D/werner/Test/ProteomeEffect/Paper/InfluenceMapping8_tex.blg    4j/a
+616    /D/werner/Test/ProteomeEffect/Paper/InfluenceMapping8_tex.dvi    5j/a
+617    /D/werner/Test/ProteomeEffect/Paper/InfluenceMapping8_tex.log    6j/a
+618    /D/werner/Test/ProteomeEffect/Paper/InfluenceMapping8_tex.pdf    7j/a
+619    /D/werner/Test/ProteomeEffect/Paper/InfluenceMapping8_tex.tex    8j/a
+620    /D/werner/Test/ProteomeEffect/Paper/InfluenceMapping9.lyx    9j/a
+621    /D/werner/Test/ProteomeEffect/Paper/PlosResponse1.doc    ak/a
+622    /D/werner/Test/ProteomeEffect/Paper/Ranked500-labeled.psd    bk/a
+623    /D/werner/Test/ProteomeEffect/Paper/bmc_article.aux    ck/a
+624    /D/werner/Test/ProteomeEffect/Paper/bmc_article.bib    dk/a
+625    /D/werner/Test/ProteomeEffect/Paper/bmc_article.bst    ek/a
+626    /D/werner/Test/ProteomeEffect/Paper/bmc_article.cls    fk/a
+627    /D/werner/Test/ProteomeEffect/Paper/bmc_article.dvi    gk/a
+628    /D/werner/Test/ProteomeEffect/Paper/bmc_article.log    hk/a
+629    /D/werner/Test/ProteomeEffect/Paper/bmc_article.tex    ik/a
+630    /D/werner/Test/ProteomeEffect/Paper/mk5-influence-map-regprop-filtered.ods    jk/a
 190    /D/werner/Test/ProteomeEffect/Paper/mk5-influence-map-regprop.ods    dd/a
+631    /D/werner/Test/ProteomeEffect/Paper/mk5-influence-red-vs-green-filtered.ods    kk/a
 89    /D/werner/Test/ProteomeEffect/Paper/mk5-influence-red-vs-green.ods    Ab/a
+632    /D/werner/Test/ProteomeEffect/Paper/mk5-swissprot.csv    lk/a
+633    /D/werner/Test/ProteomeEffect/Paper/mk5-unigene.csv    mk/a
+634    /D/werner/Test/ProteomeEffect/Paper/readme.html    nk/a
+635    /D/werner/Test/ProteomeEffect/Paper/references.bib    ok/a
+636    /D/werner/Test/ProteomeEffect/Paper/resultset.csv    pk/a
+637    /D/werner/Test/ProteomeEffect/Paper/PlosCB/58630_1_auth_cl_1_k7163g_convrt.pdf    qk/a
+638    /D/werner/Test/ProteomeEffect/Paper/PlosCB/58630_1_art_1_k46y84_convrt.pdf    rk/a
+639    /D/werner/Test/ProteomeEffect/Paper/PlosCB/58630_1_art_1_k714w3_convrt.pdf    sk/a
+640    /D/werner/Test/ProteomeEffect/Paper/PlosCB/58630_1_auth_cl_0_k453zy.pdf    tk/a
 612    /D/werner/Test/ProteomeEffect/Paper/PlosCB/InfluenceMapping9b.pdf    1j/a
+641    /D/werner/Test/ProteomeEffect/Paper/PlosCB/LetterEditor1-PLOSCB.doc    uk/a
+642    /D/werner/Test/ProteomeEffect/Paper/PlosCB/LetterEditor1-PLOSCB.pdf    vk/a
+643    /D/werner/Test/ProteomeEffect/Paper/PlosCB/LetterEditor2-PLOSCB.doc    wk/a

The entries that start with a + are new files. The entries that start without a + were already known in the store under the given id.
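The last column of the listing appears to be derived from the id. The following Python sketch reproduces the names shown above. It is inferred from this listing only, not taken from the source code: the first two characters look like (id - 1) written in base 62, lowest digit first, with the alphabet a-z, A-Z, 0-9. The meaning of the trailing /a is not stated anywhere, so the sketch copies it as a fixed suffix and refuses ids that would need more than two characters. The name store_name is made up.

```python
import string

# 62 symbols: a-z stand for 0..25, A-Z for 26..51, 0-9 for 52..61 (inferred from the listing)
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits


def store_name(file_id):
    """Return the store name seen in the listing for ids 1..3844."""
    n = file_id - 1
    if not 0 <= n < 62 * 62:
        raise ValueError("the listing shows no ids outside 1..3844")
    low, high = n % 62, n // 62
    # Lowest digit first; the "/a" suffix is copied verbatim, its meaning is unknown.
    return ALPHABET[low] + ALPHABET[high] + "/a"
```

For example, store_name(54) gives 1a/a and store_name(475) gives Oh/a, matching the entries for ids 54 and 475 above.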

Werner Van Belle - werner@yellowcouch.org
http://werner.yellowcouch.org/