Subscribe: Comments for The Pseudo Random Bit Bucket
http://moinakg.wordpress.com/comments/feed/


Moinakg's Ramblings



Last Build Date: Tue, 20 Jun 2017 19:26:54 +0000

 



Comment on High Performance Content Defined Chunking by casync — A tool for distributing file system images | Artificia Intelligence

Tue, 20 Jun 2017 19:26:54 +0000

[…] Encoding: Let’s take a large linear data stream, split it into variable-sized chunks (the size of each being a function of the chunk’s contents), and store these chunks in individual, compressed files in some directory, each file named after a strong hash value of its contents, so that the hash value may be used as a key for retrieving the full chunk data. Let’s call this directory a „chunk store“. At the same time, generate a „chunk index“ file that lists these chunk hash values plus their respective chunk sizes in a simple linear array. The chunking algorithm is supposed to create variable, but similarly sized chunks from the data stream, and do so in a way that the same data results in the same chunks even if placed at varying offsets. For more information see this blog story. […]
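
The encoding described in this excerpt can be sketched roughly as follows. This is a minimal illustration, not casync's or Pcompress's actual implementation: the polynomial rolling hash, the constants `WINDOW`, `MASK`, `MIN_SIZE` and `MAX_SIZE`, and the function names are all assumptions made for the example, and the "chunk store" is a plain in-memory dict rather than a directory of compressed files.

```python
import hashlib
import zlib

# All constants below are illustrative assumptions, not values from any real tool.
WINDOW = 16            # number of trailing bytes covered by the rolling hash
MASK = (1 << 6) - 1    # cut when the low 6 hash bits are zero
MIN_SIZE = 16          # never cut before the window is full
MAX_SIZE = 256         # force a cut so chunks stay similarly sized
BASE = 257
MOD = (1 << 31) - 1
POW = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window


def chunk_boundaries(data):
    """Yield (start, end) offsets of content-defined chunks in data."""
    start = 0
    h = 0
    for i, byte in enumerate(data):
        pos = i - start  # offset of this byte inside the current chunk
        if pos >= WINDOW:
            # Slide the window: drop the oldest byte's contribution.
            h = (h - data[i - WINDOW] * POW) % MOD
        h = (h * BASE + byte) % MOD
        size = pos + 1
        if (size >= MIN_SIZE and (h & MASK) == 0) or size >= MAX_SIZE:
            yield start, i + 1
            start = i + 1
            h = 0
    if start < len(data):
        yield start, len(data)  # trailing partial chunk


def store_stream(data, chunk_store):
    """Put chunks into chunk_store (hash -> compressed bytes); return the chunk index."""
    index = []
    for start, end in chunk_boundaries(data):
        chunk = data[start:end]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in chunk_store:  # identical chunks are stored only once
            chunk_store[digest] = zlib.compress(chunk)
        index.append((digest, len(chunk)))  # linear array of (hash, size)
    return index


def restore_stream(index, chunk_store):
    """Rebuild the original stream by looking up each hash in the chunk store."""
    return b"".join(zlib.decompress(chunk_store[digest]) for digest, _ in index)
```

Because a cut depends only on the last `WINDOW` bytes, storing the same data again at a different offset should reuse most of the chunks already present in the store after the first few boundaries resynchronize.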



Comment on Pcompress 2.1 released with fixes and performance enhancements by Pcompress 2.1 - Linux mint, centos, ubuntu - OSWorld.pl - mały świat wielkich systemów!

Mon, 19 Jan 2015 23:27:37 +0000

[…] the release of Pcompress 2.1, a tool for compression, decompression, and deduplication that takes advantage of the capabilities […]



Comment on Scaling Deduplication in Pcompress – Petascale in RAM by moinakg

Thu, 01 Jan 2015 06:39:57 +0000

Interesting paper. However, it is not the same thing that I discussed. The paper you referenced talks about re-organizing similar segments into a single compression region, whereas I have not looked at that. The basic idea of detecting similarity regions appears the same but, in my case, I am de-duplicating chunks within multiple similar regions without changing the storage locations of those regions. In de-duplication, chunks within a region are replaced with pointers to identical chunks in another region. The two similar regions are not necessarily migrated next to each other.
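
To make the distinction concrete, here is a minimal sketch of in-place de-duplication across regions. It is not Pcompress code: the function names, the tuple layout, and the use of a single exact-match hash table (instead of a similarity index that first pairs up similar regions) are assumptions made purely for illustration.

```python
import hashlib


def dedupe_regions(regions):
    """De-duplicate chunks across regions without moving any region.

    regions is a list of regions, each a list of chunk bytes. The result has
    the same shape: a unique chunk stays as ("data", chunk), while a repeated
    chunk becomes ("ref", region_no, chunk_no) pointing at its first occurrence.
    """
    first_seen = {}  # chunk hash -> (region_no, chunk_no) of the first occurrence
    deduped = []
    for r, region in enumerate(regions):
        out_region = []
        for c, chunk in enumerate(region):
            digest = hashlib.sha256(chunk).digest()
            if digest in first_seen:
                out_region.append(("ref",) + first_seen[digest])
            else:
                first_seen[digest] = (r, c)
                out_region.append(("data", chunk))
        deduped.append(out_region)  # region keeps its original position
    return deduped


def resolve(deduped, r, c):
    """Return the chunk bytes at (r, c), following a pointer if needed."""
    entry = deduped[r][c]
    if entry[0] == "ref":
        _, ref_region, ref_chunk = entry
        return deduped[ref_region][ref_chunk][1]  # refs always target a "data" entry
    return entry[1]
```

Note that the regions themselves are never reordered or merged; only individual duplicate chunks are swapped for pointers, which is the difference from the data-reordering approach in the referenced paper.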



Comment on Scaling Deduplication in Pcompress – Petascale in RAM by Phanu

Thu, 01 Jan 2015 01:35:51 +0000

Your mechanism looks like "Migratory Compression: Coarse-grained Data Reordering to Improve Compressibility". Reference: https://www.usenix.org/conference/fast14/technical-sessions/presentation/lin



Comment on Persisting the In-Memory Hash Index in Pcompress by moinakg

Fri, 12 Dec 2014 10:30:05 +0000

Thanks, I will look at LMDB. It is very interesting. I'd definitely prefer something which is already there rather than re-inventing the wheel.



Comment on Persisting the In-Memory Hash Index in Pcompress by jbd

Thu, 11 Dec 2014 22:16:27 +0000

Hi, Even if it's not my area of expertise, I really have the feeling you should have a look at LMDB (http://symas.com/mdb/), especially regarding your concerns about data corruption and synchronous writes. It also has a lot of things you could be interested in, even for your (random, but not that much) hash lookup stuff. LMDB is a really, really nice piece of software. If you've got time, there is some cool technical testimony about it in this thread (hyc_symas is the LMDB author): https://news.ycombinator.com/item?id=8732891 Keep up the good work with pcompress and the articles in this blog! [I don't know if my initial post is being moderated or even reached your website; I was on my mobile phone at that time without enough battery :)]



Comment on Persisting the In-Memory Hash Index in Pcompress by Jbd

Thu, 11 Dec 2014 20:41:00 +0000

Hi, LMDB seems to be a perfect fit for what you're trying to achieve. http://symas.com/mdb/



Comment on Scaling Deduplication in Pcompress – Petascale in RAM by Oren Tirosh

Tue, 07 Oct 2014 14:14:38 +0000

Aronovich, Lior, et al. "The design of a similarity based deduplication system." Proceedings of SYSTOR 2009: The Israeli Experimental Systems Conference. ACM, 2009.



Comment on Scaling Deduplication in Pcompress – Petascale in RAM by moinakg

Tue, 07 Oct 2014 13:34:04 +0000

Hi Sam, I did not see your comment earlier. I probably missed the email. The in-memory lookup table is a similarity-based scheme. However, it also leverages locality since it deals with chunks in groups. A whole group of nearby chunks is processed at once.



Comment on Scaling Deduplication in Pcompress – Petascale in RAM by moinakg

Tue, 07 Oct 2014 13:30:00 +0000

Thanks for the pointer. I wasn't aware of this. I'll have to check the details, if they have a paper.