Sun May 30 18:43:46 PDT 2010  Kevan Carstensen
  * Code cleanup
  
    - Change 'readv' to 'readvs' in remote_slot_readv in the storage
      server, to more adequately convey what the argument is.

Fri Jun 4 12:48:04 PDT 2010  Kevan Carstensen
  * Add a notion of the mutable file version number to interfaces.py

Fri Jun 4 12:52:17 PDT 2010  Kevan Carstensen
  * Add a salt hasher for MDMF uploads

Fri Jun 4 12:55:27 PDT 2010  Kevan Carstensen
  * Add MDMF and SDMF version numbers to interfaces.py

Fri Jun 11 12:17:29 PDT 2010  Kevan Carstensen
  * Alter the mutable file servermap to read MDMF files

Fri Jun 11 12:21:50 PDT 2010  Kevan Carstensen
  * Add tests for new MDMF proxies

Mon Jun 14 14:34:59 PDT 2010  Kevan Carstensen
  * Alter MDMF proxy tests to reflect the new form of caching

Mon Jun 14 14:37:21 PDT 2010  Kevan Carstensen
  * Add tests and support functions for servermap tests

Tue Jun 22 17:13:32 PDT 2010  Kevan Carstensen
  * Make a segmented downloader
  
    Rework the current mutable file Retrieve class to download segmented
    files. The rewrite preserves the semantics and basic conceptual state
    machine of the old Retrieve class, but adapts them to work with files
    with more than one segment, which involves a fairly substantial
    rewrite.
    
    I've also adapted some existing SDMF tests to work with the new
    downloader, as necessary.
    
    TODO:
      - Write tests for MDMF functionality.
      - Finish writing and testing salt functionality.

Tue Jun 22 17:17:08 PDT 2010  Kevan Carstensen
  * Tell NodeMaker and MutableFileNode about the distinction between SDMF and MDMF

Tue Jun 22 17:17:32 PDT 2010  Kevan Carstensen
  * Assorted servermap fixes
  
    - Check for failure when setting the private key
    - Check for failure when setting other things
    - Check for doneness in a way that is resilient to hung servers
    - Remove dead code
    - Reorganize error and success handling methods, and make sure they
      get used.

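As a reading aid for the MDMF patches in this bundle, the fixed-size MDMF share header that the test helper build_test_mdmf_share (in "Add tests for new MDMF proxies") writes can be sketched with Python's struct module. It consists of a 73-byte checkstring (">BQ32s32s"), 18 bytes of encoding parameters (">BBQQ"), and a 52-byte offset table (">LQQQQQQ"), for 143 bytes in total. The sketch below follows that test fixture only; the helper names compute_offsets, pack_header, and unpack_header are hypothetical, and the production layout in allmydata.mutable.layout may differ.

```python
import struct

# Assumption: mirrors the build_test_mdmf_share test fixture, not
# necessarily the layout implemented in allmydata.mutable.layout.
CHECKSTRING_FORMAT = ">BQ32s32s"  # version, seqnum, root hash, salt hash
ENCODING_FORMAT = ">BBQQ"         # k, N, segment size, data length
OFFSETS_FORMAT = ">LQQQQQQ"       # 4-byte share-data offset, six 8-byte offsets
# With the big-endian ">" prefix, struct adds no padding, so the three
# pieces concatenate into a single header format.
HEADER_FORMAT = ">BQ32s32sBBQQLQQQQQQ"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 73 + 18 + 52 = 143

# Section order used by the fixture's offset table.
OFFSET_NAMES = ("share_data", "enc_privkey", "block_hash_tree",
                "share_hash_chain", "signature", "verification_key", "EOF")


def compute_offsets(salts_len, section_lengths):
    """Hypothetical helper: place the salts right after the header, then
    each variable-length section in fixture order, and return the start
    offset of every section plus EOF."""
    if len(section_lengths) != len(OFFSET_NAMES) - 1:
        raise ValueError("expected one length per section before EOF")
    offsets = {}
    position = HEADER_SIZE + salts_len  # share data follows the salts
    for name, length in zip(OFFSET_NAMES, section_lengths):
        offsets[name] = position
        position += length
    offsets["EOF"] = position
    return offsets


def pack_header(seqnum, root_hash, salt_hash, k, n, segsize, datalen,
                offsets, version=1):
    """Hypothetical helper: pack the 143-byte header (version 1 = MDMF)."""
    return struct.pack(HEADER_FORMAT, version, seqnum, root_hash, salt_hash,
                       k, n, segsize, datalen,
                       *[offsets[name] for name in OFFSET_NAMES])


def unpack_header(data):
    """Hypothetical helper: the inverse of pack_header."""
    fields = struct.unpack(HEADER_FORMAT, data[:HEADER_SIZE])
    version, seqnum, root_hash, salt_hash, k, n, segsize, datalen = fields[:8]
    return {
        "version": version, "seqnum": seqnum,
        "root_hash": root_hash, "salt_hash": salt_hash,
        "k": k, "n": n, "segsize": segsize, "datalen": datalen,
        "offsets": dict(zip(OFFSET_NAMES, fields[8:])),
    }


if __name__ == "__main__":
    # Six 16-byte salts, 13 bytes of share data, a 7-byte encrypted key;
    # the hash-tree lengths (192, 204) are illustrative, since they depend
    # on serialization helpers not shown here.
    offsets = compute_offsets(6 * 16, [13, 7, 192, 204, 9, 6])
    header = pack_header(0, b"a" * 32, b"a" * 32, 3, 10, 6, 36, offsets)
    print(unpack_header(header)["offsets"]["share_data"])  # 239
```

Unlike the SDMF fixture (build_test_sdmf_share), whose offset table starts with the signature, this MDMF layout places the signature and verification key at the end of the share. That is why the servermap changes below may need a second remote read to fetch the signature.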
Wed Jun 23 16:32:03 PDT 2010  Kevan Carstensen
  * Add objects for MDMF shares in support of a new segmented uploader
  
    This patch adds the following:
      - MDMFSlotWriteProxy, which can write MDMF shares to the storage
        server in the new format.
      - MDMFSlotReadProxy, which can read both SDMF and MDMF shares from
        the storage server.
    
    This patch also includes tests for these new objects.

Wed Jun 23 16:32:48 PDT 2010  Kevan Carstensen
  * A first stab at a segmented uploader
  
    This uploader will upload MDMF files segment by segment. It will only
    do this if it thinks that the filenode that it is uploading
    represents an MDMF file; otherwise, it uploads the file as SDMF.
    
    My TODO list so far:
      - More robust peer selection; we'll want to use something like
        servers of happiness to figure out reliability and unreliability.
      - Clean up.

Wed Jun 23 16:35:03 PDT 2010  Kevan Carstensen
  * Make the mutable downloader batch its reads

New patches:

[Code cleanup
Kevan Carstensen **20100531014346
 Ignore-this: 697378037e83290267f108a4a88b8776
 
 - Change 'readv' to 'readvs' in remote_slot_readv in the storage
   server, to more adequately convey what the argument is.
] { hunk ./src/allmydata/storage/server.py 569 self) return share - def remote_slot_readv(self, storage_index, shares, readv): + def remote_slot_readv(self, storage_index, shares, readvs): start = time.time() self.count("readv") si_s = si_b2a(storage_index) hunk ./src/allmydata/storage/server.py 590 if sharenum in shares or not shares: filename = os.path.join(bucketdir, sharenum_s) msf = MutableShareFile(filename, self) - datavs[sharenum] = msf.readv(readv) + datavs[sharenum] = msf.readv(readvs) log.msg("returning shares %s" % (datavs.keys(),), facility="tahoe.storage", level=log.NOISY, parent=lp) self.add_latency("readv", time.time() - start) } [Add a notion of the mutable file version number to interfaces.py Kevan Carstensen **20100604194804 Ignore-this: fd767043437c3cd694807687e6dc677 ] hunk ./src/allmydata/interfaces.py 807 writer-visible data using this writekey. """ + def set_version(version): + """Tahoe-LAFS supports SDMF and MDMF mutable files. By default, + we upload in SDMF for reasons of compatibility. If you want to + change this, set_version will let you do that. + + To say that this file should be uploaded in SDMF, pass in a 0. To + say that the file should be uploaded as MDMF, pass in a 1. 
+ """ + + def get_version(): + """Returns the mutable file protocol version.""" + class NotEnoughSharesError(Exception): """Download was unable to get enough shares""" [Add a salt hasher for MDMF uploads Kevan Carstensen **20100604195217 Ignore-this: 3072f4c4e75efa078f31aac3a56d36b2 ] { hunk ./src/allmydata/util/hashutil.py 90 MUTABLE_READKEY_TAG = "allmydata_mutable_writekey_to_readkey_v1" MUTABLE_DATAKEY_TAG = "allmydata_mutable_readkey_to_datakey_v1" MUTABLE_STORAGEINDEX_TAG = "allmydata_mutable_readkey_to_storage_index_v1" +MUTABLE_SALT_TAG = "allmydata_mutable_segment_salt_v1" # dirnodes DIRNODE_CHILD_WRITECAP_TAG = "allmydata_mutable_writekey_and_salt_to_dirnode_child_capkey_v1" hunk ./src/allmydata/util/hashutil.py 134 def plaintext_segment_hasher(): return tagged_hasher(PLAINTEXT_SEGMENT_TAG) +def mutable_salt_hash(data): + return tagged_hash(MUTABLE_SALT_TAG, data) +def mutable_salt_hasher(): + return tagged_hasher(MUTABLE_SALT_TAG) + KEYLEN = 16 IVLEN = 16 } [Add MDMF and SDMF version numbers to interfaces.py Kevan Carstensen **20100604195527 Ignore-this: 5736d229076ea432b9cf40fcee9b4749 ] hunk ./src/allmydata/interfaces.py 8 HASH_SIZE=32 +SDMF_VERSION=0 +MDMF_VERSION=1 + Hash = StringConstraint(maxLength=HASH_SIZE, minLength=HASH_SIZE)# binary format 32-byte SHA256 hash Nodeid = StringConstraint(maxLength=20, [Alter the mutable file servermap to read MDMF files Kevan Carstensen **20100611191729 Ignore-this: f05748597749f07b16cdbb711fae92e5 ] { hunk ./src/allmydata/mutable/servermap.py 7 from itertools import count from twisted.internet import defer from twisted.python import failure -from foolscap.api import DeadReferenceError, RemoteException, eventually +from foolscap.api import DeadReferenceError, RemoteException, eventually, \ + fireEventually from allmydata.util import base32, hashutil, idlib, log from allmydata.storage.server import si_b2a from allmydata.interfaces import IServermapUpdaterStatus hunk ./src/allmydata/mutable/servermap.py 17 from 
allmydata.mutable.common import MODE_CHECK, MODE_ANYTHING, MODE_WRITE, MODE_READ, \ DictOfSets, CorruptShareError, NeedMoreDataError from allmydata.mutable.layout import unpack_prefix_and_signature, unpack_header, unpack_share, \ - SIGNED_PREFIX_LENGTH + SIGNED_PREFIX_LENGTH, MDMFSlotReadProxy class UpdateStatus: implements(IServermapUpdaterStatus) hunk ./src/allmydata/mutable/servermap.py 254 """Return a set of versionids, one for each version that is currently recoverable.""" versionmap = self.make_versionmap() - recoverable_versions = set() for (verinfo, shares) in versionmap.items(): (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, hunk ./src/allmydata/mutable/servermap.py 366 self._servers_responded = set() # how much data should we read? + # SDMF: # * if we only need the checkstring, then [0:75] # * if we need to validate the checkstring sig, then [543ish:799ish] # * if we need the verification key, then [107:436ish] hunk ./src/allmydata/mutable/servermap.py 374 # * if we need the encrypted private key, we want [-1216ish:] # * but we can't read from negative offsets # * the offset table tells us the 'ish', also the positive offset - # A future version of the SMDF slot format should consider using - # fixed-size slots so we can retrieve less data. For now, we'll just - # read 2000 bytes, which also happens to read enough actual data to - # pre-fetch a 9-entry dirnode. + # MDMF: + # * Checkstring? [0:72] + # * If we want to validate the checkstring, then [0:72], [143:?] -- + # the offset table will tell us for sure. + # * If we need the verification key, we have to consult the offset + # table as well. + # At this point, we don't know which we are. Our filenode can + # tell us, but it might be lying -- in some cases, we're + # responsible for telling it which kind of file it is. 
self._read_size = 4000 if mode == MODE_CHECK: # we use unpack_prefix_and_signature, so we need 1k hunk ./src/allmydata/mutable/servermap.py 432 self._queries_completed = 0 sb = self._storage_broker + # All of the peers, permuted by the storage index, as usual. full_peerlist = sb.get_servers_for_index(self._storage_index) self.full_peerlist = full_peerlist # for use later, immutable self.extra_peers = full_peerlist[:] # peers are removed as we use them hunk ./src/allmydata/mutable/servermap.py 439 self._good_peers = set() # peers who had some shares self._empty_peers = set() # peers who don't have any shares self._bad_peers = set() # peers to whom our queries failed + self._readers = {} # peerid -> dict(sharewriters), filled in + # after responses come in. k = self._node.get_required_shares() hunk ./src/allmydata/mutable/servermap.py 443 + # For what cases can these conditions work? if k is None: # make a guess k = 3 hunk ./src/allmydata/mutable/servermap.py 456 self.num_peers_to_query = k + self.EPSILON if self.mode == MODE_CHECK: + # We want to query all of the peers. initial_peers_to_query = dict(full_peerlist) must_query = set(initial_peers_to_query.keys()) self.extra_peers = [] hunk ./src/allmydata/mutable/servermap.py 464 # we're planning to replace all the shares, so we want a good # chance of finding them all. We will keep searching until we've # seen epsilon that don't have a share. + # We don't query all of the peers because that could take a while. self.num_peers_to_query = N + self.EPSILON initial_peers_to_query, must_query = self._build_initial_querylist() self.required_num_empty_peers = self.EPSILON hunk ./src/allmydata/mutable/servermap.py 474 # might also avoid the round trip required to read the encrypted # private key. - else: + else: # MODE_READ, MODE_ANYTHING + # 2k peers is good enough. 
initial_peers_to_query, must_query = self._build_initial_querylist() # this is a set of peers that we are required to get responses from: hunk ./src/allmydata/mutable/servermap.py 485 # set as we get responses. self._must_query = must_query + # This tells the done check whether requests are still being + # processed. We should not return until every response has been + # processed and the servermap updated (including handling + # connection errors). + self._processing = 0 + # now initial_peers_to_query contains the peers that we should ask, # self.must_query contains the peers that we must have heard from # before we can consider ourselves finished, and self.extra_peers hunk ./src/allmydata/mutable/servermap.py 495 # contains the overflow (peers that we should tap if we don't get # enough responses) + # I guess that self._must_query is a subset of + # initial_peers_to_query? + assert set(must_query).issubset(set(initial_peers_to_query)) self._send_initial_requests(initial_peers_to_query) self._status.timings["initial_queries"] = time.time() - self._started hunk ./src/allmydata/mutable/servermap.py 554 # errors that aren't handled by _query_failed (and errors caused by # _query_failed) get logged, but we still want to check for doneness. d.addErrback(log.err) - d.addBoth(self._check_for_done) d.addErrback(self._fatal_error) return d hunk ./src/allmydata/mutable/servermap.py 584 self._servermap.reachable_peers.add(peerid) self._must_query.discard(peerid) self._queries_completed += 1 + # self._processing counts the number of queries that have + # completed, but are still processing. We wait until all queries + # are done processing before returning a result to the client. + # TODO: Should we do this? A response to the initial query means + # that we may not have to query the server for anything else, + # but if we're dealing with an MDMF share, we'll probably have + # to ask it for its signature, unless we cache those someplace, + # and even then.
+ self._processing += 1 if not self._running: self.log("but we're not running, so we'll ignore it", parent=lp, level=log.NOISY) hunk ./src/allmydata/mutable/servermap.py 605 else: self._empty_peers.add(peerid) - last_verinfo = None - last_shnum = None + ss, storage_index = stuff + ds = [] + + + def _tattle(ignored, status): + print status + print ignored + return ignored + + def _cache(verinfo, shnum, now, data): + self._queries_oustand + self._node._add_to_cache(verinfo, shnum, 0, data, now) + return shnum, verinfo + + def _corrupt(e, shnum, data): + # This gets raised when there was something wrong with + # the remote server. Specifically, when there was an + # error unpacking the remote data from the server, or + # when the signature is invalid. + print e + f = failure.Failure() + self.log(format="bad share: %(f_value)s", f_value=str(f.value), + failure=f, parent=lp, level=log.WEIRD, umid="h5llHg") + # Notify the server that its share is corrupt. + self.notify_server_corruption(peerid, shnum, str(e)) + # By flagging this as a bad peer, we won't count any of + # the other shares on that peer as valid, though if we + # happen to find a valid version string amongst those + # shares, we'll keep track of it so that we don't need + # to validate the signature on those again. + self._bad_peers.add(peerid) + self._last_failure = f + # 393CHANGE: Use the reader for this. 
+ checkstring = data[:SIGNED_PREFIX_LENGTH] + self._servermap.mark_bad_share(peerid, shnum, checkstring) + self._servermap.problems.append(f) + for shnum,datav in datavs.items(): data = datav[0] hunk ./src/allmydata/mutable/servermap.py 644 - try: - verinfo = self._got_results_one_share(shnum, data, peerid, lp) - last_verinfo = verinfo - last_shnum = shnum - self._node._add_to_cache(verinfo, shnum, 0, data, now) - except CorruptShareError, e: - # log it and give the other shares a chance to be processed - f = failure.Failure() - self.log(format="bad share: %(f_value)s", f_value=str(f.value), - failure=f, parent=lp, level=log.WEIRD, umid="h5llHg") - self.notify_server_corruption(peerid, shnum, str(e)) - self._bad_peers.add(peerid) - self._last_failure = f - checkstring = data[:SIGNED_PREFIX_LENGTH] - self._servermap.mark_bad_share(peerid, shnum, checkstring) - self._servermap.problems.append(f) - pass - - self._status.timings["cumulative_verify"] += (time.time() - now) + reader = MDMFSlotReadProxy(ss, + storage_index, + shnum, + data) + self._readers.setdefault(peerid, dict())[shnum] = reader + # our goal, with each response, is to validate the version + # information and share data as best we can at this point -- + # we do this by validating the signature. To do this, we + # need to do the following: + # - If we don't already have the public key, fetch the + # public key. We use this to validate the signature. + friendly_peer = idlib.shortnodeid_b2a(peerid) + if not self._node.get_pubkey(): + # fetch and set the public key. + d = reader.get_verification_key() + d.addCallback(self._try_to_set_pubkey) + else: + # we already have the public key. + d = defer.succeed(None) + # Neither of these two branches return anything of + # consequence, so the first entry in our deferredlist will + # be None. hunk ./src/allmydata/mutable/servermap.py 667 - if self._need_privkey and last_verinfo: - # send them a request for the privkey. We send one request per - # server. 
- lp2 = self.log("sending privkey request", - parent=lp, level=log.NOISY) - (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, - offsets_tuple) = last_verinfo - o = dict(offsets_tuple) + # - Next, we need the version information. We almost + # certainly got this by reading the first thousand or so + # bytes of the share on the storage server, so we + # shouldn't need to fetch anything at this step. + d2 = reader.get_verinfo() + # - Next, we need the signature. For an SDMF share, it is + # likely that we fetched this when doing our initial fetch + # to get the version information. In MDMF, this lives at + # the end of the share, so unless the file is quite small, + # we'll need to do a remote fetch to get it. + d3 = reader.get_signature() + # Once we have all three of these responses, we can move on + # to validating the signature hunk ./src/allmydata/mutable/servermap.py 681 - self._queries_outstanding.add(peerid) - readv = [ (o['enc_privkey'], (o['EOF'] - o['enc_privkey'])) ] - ss = self._servermap.connections[peerid] - privkey_started = time.time() - d = self._do_read(ss, peerid, self._storage_index, - [last_shnum], readv) - d.addCallback(self._got_privkey_results, peerid, last_shnum, - privkey_started, lp2) - d.addErrback(self._privkey_query_failed, peerid, last_shnum, lp2) - d.addErrback(log.err) - d.addCallback(self._check_for_done) - d.addErrback(self._fatal_error) + # Does the node already have a privkey? If not, we'll try to + # fetch it here. 
+ if not self._node.get_privkey(): + d4 = reader.get_encprivkey() + d4.addCallback(lambda results, shnum=shnum, peerid=peerid: + self._try_to_validate_privkey(results, peerid, shnum, lp)) + else: + d4 = defer.succeed(None) hunk ./src/allmydata/mutable/servermap.py 690 + dl = defer.DeferredList([d, d2, d3, d4]) + dl.addCallback(lambda results, shnum=shnum, peerid=peerid: + self._got_signature_one_share(results, shnum, peerid, lp)) + dl.addErrback(lambda error, shnum=shnum, data=data: + _corrupt(error, shnum, data)) + ds.append(dl) + # dl is a deferred list that will fire when all of the shares + # that we found on this peer are done processing. When dl fires, + # we know that processing is done, so we can decrement the + # semaphore-like thing that we incremented earlier. + dl = defer.DeferredList(ds) + def _done_processing(ignored): + self._processing -= 1 + return ignored + dl.addCallback(_done_processing) + # Are we done? Done means that there are no more queries to + # send, that there are no outstanding queries, and that we + # haven't received any queries that are still processing. If we + # are done, self._check_for_done will cause the done deferred + # that we returned to our caller to fire, which tells them that + # they have a complete servermap, and that we won't be touching + # the servermap anymore. + dl.addBoth(self._check_for_done) + dl.addErrback(self._fatal_error) # all done! 
hunk ./src/allmydata/mutable/servermap.py 715 + return dl self.log("_got_results done", parent=lp, level=log.NOISY) hunk ./src/allmydata/mutable/servermap.py 718 + def _try_to_set_pubkey(self, pubkey_s): + if self._node.get_pubkey(): + return # don't go through this again if we don't have to + fingerprint = hashutil.ssk_pubkey_fingerprint_hash(pubkey_s) + assert len(fingerprint) == 32 + if fingerprint != self._node.get_fingerprint(): + raise CorruptShareError(peerid, shnum, + "pubkey doesn't match fingerprint") + self._node._populate_pubkey(self._deserialize_pubkey(pubkey_s)) + assert self._node.get_pubkey() + + def notify_server_corruption(self, peerid, shnum, reason): ss = self._servermap.connections[peerid] ss.callRemoteOnly("advise_corrupt_share", hunk ./src/allmydata/mutable/servermap.py 735 "mutable", self._storage_index, shnum, reason) - def _got_results_one_share(self, shnum, data, peerid, lp): + + def _got_signature_one_share(self, results, shnum, peerid, lp): + # It is our job to give versioninfo to our caller. We need to + # raise CorruptShareError if the share is corrupt for any + # reason, something that our caller will handle. self.log(format="_got_results: got shnum #%(shnum)d from peerid %(peerid)s", shnum=shnum, peerid=idlib.shortnodeid_b2a(peerid), hunk ./src/allmydata/mutable/servermap.py 745 level=log.NOISY, parent=lp) - - # this might raise NeedMoreDataError, if the pubkey and signature - # live at some weird offset. That shouldn't happen, so I'm going to - # treat it as a bad share. 
- (seqnum, root_hash, IV, k, N, segsize, datalength, - pubkey_s, signature, prefix) = unpack_prefix_and_signature(data) - - if not self._node.get_pubkey(): - fingerprint = hashutil.ssk_pubkey_fingerprint_hash(pubkey_s) - assert len(fingerprint) == 32 - if fingerprint != self._node.get_fingerprint(): - raise CorruptShareError(peerid, shnum, - "pubkey doesn't match fingerprint") - self._node._populate_pubkey(self._deserialize_pubkey(pubkey_s)) - - if self._need_privkey: - self._try_to_extract_privkey(data, peerid, shnum, lp) - - (ig_version, ig_seqnum, ig_root_hash, ig_IV, ig_k, ig_N, - ig_segsize, ig_datalen, offsets) = unpack_header(data) + _, verinfo, signature, __ = results + (seqnum, + root_hash, + saltish, + segsize, + datalen, + k, + n, + prefix, + offsets) = verinfo[1] offsets_tuple = tuple( [(key,value) for key,value in offsets.items()] ) hunk ./src/allmydata/mutable/servermap.py 757 - verinfo = (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, + # XXX: This should be done for us in the method, so + # presumably you can go in there and fix it. + verinfo = (seqnum, + root_hash, + saltish, + segsize, + datalen, + k, + n, + prefix, offsets_tuple) hunk ./src/allmydata/mutable/servermap.py 768 + # This tuple uniquely identifies a share on the grid; we use it + # to keep track of the ones that we've already seen. if verinfo not in self._valid_versions: hunk ./src/allmydata/mutable/servermap.py 772 - # it's a new pair. Verify the signature. - valid = self._node.get_pubkey().verify(prefix, signature) + # This is a new version tuple, and we need to validate it + # against the public key before keeping track of it. + valid = self._node.get_pubkey().verify(prefix, signature[1]) if not valid: hunk ./src/allmydata/mutable/servermap.py 776 - raise CorruptShareError(peerid, shnum, "signature is invalid") + raise CorruptShareError(peerid, shnum, + "signature is invalid") hunk ./src/allmydata/mutable/servermap.py 779 - # ok, it's a valid verinfo. 
Add it to the list of validated - # versions. - self.log(" found valid version %d-%s from %s-sh%d: %d-%d/%d/%d" - % (seqnum, base32.b2a(root_hash)[:4], - idlib.shortnodeid_b2a(peerid), shnum, - k, N, segsize, datalength), - parent=lp) - self._valid_versions.add(verinfo) - # We now know that this is a valid candidate verinfo. + # ok, it's a valid verinfo. Add it to the list of validated + # versions. + self.log(" found valid version %d-%s from %s-sh%d: %d-%d/%d/%d" + % (seqnum, base32.b2a(root_hash)[:4], + idlib.shortnodeid_b2a(peerid), shnum, + k, n, segsize, datalen), + parent=lp) + self._valid_versions.add(verinfo) + # We now know that this is a valid candidate verinfo. Whether or + # not this instance of it is valid is a matter for the next + # statement; at this point, we just know that if we see this + # version info again, its signature checks out and we can + # skip the signature-checking step. hunk ./src/allmydata/mutable/servermap.py 793 + # (peerid, shnum) are bound in the method invocation. if (peerid, shnum) in self._servermap.bad_shares: # we've been told that the rest of the data in this share is # unusable, so don't add it to the servermap. hunk ./src/allmydata/mutable/servermap.py 808 self.versionmap.add(verinfo, (shnum, peerid, timestamp)) return verinfo + def _deserialize_pubkey(self, pubkey_s): verifier = rsa.create_verifying_key_from_string(pubkey_s) return verifier hunk ./src/allmydata/mutable/servermap.py 813 - def _try_to_extract_privkey(self, data, peerid, shnum, lp): - try: - r = unpack_share(data) - except NeedMoreDataError, e: - # this share won't help us. oh well.
- offset = e.encprivkey_offset - length = e.encprivkey_length - self.log("shnum %d on peerid %s: share was too short (%dB) " - "to get the encprivkey; [%d:%d] ought to hold it" % - (shnum, idlib.shortnodeid_b2a(peerid), len(data), - offset, offset+length), - parent=lp) - # NOTE: if uncoordinated writes are taking place, someone might - # change the share (and most probably move the encprivkey) before - # we get a chance to do one of these reads and fetch it. This - # will cause us to see a NotEnoughSharesError(unable to fetch - # privkey) instead of an UncoordinatedWriteError . This is a - # nuisance, but it will go away when we move to DSA-based mutable - # files (since the privkey will be small enough to fit in the - # write cap). - - return - - (seqnum, root_hash, IV, k, N, segsize, datalen, - pubkey, signature, share_hash_chain, block_hash_tree, - share_data, enc_privkey) = r - - return self._try_to_validate_privkey(enc_privkey, peerid, shnum, lp) def _try_to_validate_privkey(self, enc_privkey, peerid, shnum, lp): hunk ./src/allmydata/mutable/servermap.py 815 - + """ + Given an encrypted private key from a remote server, I decrypt + it, derive a writekey from it, and validate that writekey against + the writekey stored in my node. If it is valid, then I set the + privkey and encprivkey properties of the node. + """ alleged_privkey_s = self._node._decrypt_privkey(enc_privkey) alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s) if alleged_writekey != self._node.get_writekey(): hunk ./src/allmydata/mutable/servermap.py 925 # return self._send_more_queries(outstanding) : send some more queries # return self._done() : all done # return : keep waiting, no new queries - lp = self.log(format=("_check_for_done, mode is '%(mode)s', " "%(outstanding)d queries outstanding, " "%(extra)d extra peers available, " hunk ./src/allmydata/mutable/servermap.py 943 self.log("but we're not running", parent=lp, level=log.NOISY) return + if self._processing > 0: + # wait until all in-progress responses are processed before returning.
+ return + if self._must_query: # we are still waiting for responses from peers that used to have # a share, so we must continue to wait. No additional queries are hunk ./src/allmydata/mutable/servermap.py 1134 self._servermap.last_update_time = self._started # the servermap will not be touched after this self.log("servermap: %s" % self._servermap.summarize_versions()) + eventually(self._done_deferred.callback, self._servermap) def _fatal_error(self, f): } [Add tests for new MDMF proxies Kevan Carstensen **20100611192150 Ignore-this: 986d2cb867cbd4477b131cd951cd9eac ] { hunk ./src/allmydata/test/test_storage.py 2 -import time, os.path, stat, re, simplejson, struct +import time, os.path, stat, re, simplejson, struct, shutil from twisted.trial import unittest hunk ./src/allmydata/test/test_storage.py 22 from allmydata.storage.expirer import LeaseCheckingCrawler from allmydata.immutable.layout import WriteBucketProxy, WriteBucketProxy_v2, \ ReadBucketProxy -from allmydata.interfaces import BadWriteEnablerError -from allmydata.test.common import LoggingServiceParent +from allmydata.mutable.layout import MDMFSlotWriteProxy, MDMFSlotReadProxy, \ + LayoutInvalid +from allmydata.interfaces import BadWriteEnablerError, MDMF_VERSION, \ + SDMF_VERSION +from allmydata.test.common import LoggingServiceParent, ShouldFailMixin from allmydata.test.common_web import WebRenderingMixin from allmydata.web.storage import StorageStatus, remove_prefix hunk ./src/allmydata/test/test_storage.py 1286 self.failUnless(os.path.exists(prefixdir), prefixdir) self.failIf(os.path.exists(bucketdir), bucketdir) + +class MDMFProxies(unittest.TestCase, ShouldFailMixin): + def setUp(self): + self.sparent = LoggingServiceParent() + self._lease_secret = itertools.count() + self.ss = self.create("MDMFProxies storage test server") + self.rref = RemoteBucket() + self.rref.target = self.ss + self.secrets = (self.write_enabler("we_secret"), + self.renew_secret("renew_secret"), + 
self.cancel_secret("cancel_secret")) + self.segment = "aaaaaa" + self.block = "aa" + self.salt = "a" * 16 + self.block_hash = "a" * 32 + self.block_hash_tree = [self.block_hash for i in xrange(6)] + self.share_hash = self.block_hash + self.share_hash_chain = dict([(i, self.share_hash) for i in xrange(6)]) + self.signature = "foobarbaz" + self.verification_key = "vvvvvv" + self.encprivkey = "private" + self.root_hash = self.block_hash + self.salt_hash = self.root_hash + self.block_hash_tree_s = self.serialize_blockhashes(self.block_hash_tree) + self.share_hash_chain_s = self.serialize_sharehashes(self.share_hash_chain) + + + def tearDown(self): + self.sparent.stopService() + shutil.rmtree(self.workdir("MDMFProxies storage test server")) + + + def write_enabler(self, we_tag): + return hashutil.tagged_hash("we_blah", we_tag) + + + def renew_secret(self, tag): + return hashutil.tagged_hash("renew_blah", str(tag)) + + + def cancel_secret(self, tag): + return hashutil.tagged_hash("cancel_blah", str(tag)) + + + def workdir(self, name): + basedir = os.path.join("storage", "MutableServer", name) + return basedir + + + def create(self, name): + workdir = self.workdir(name) + ss = StorageServer(workdir, "\x00" * 20) + ss.setServiceParent(self.sparent) + return ss + + + def build_test_mdmf_share(self, tail_segment=False, empty=False): + # Start with the checkstring + data = struct.pack(">BQ32s32s", + 1, + 0, + self.root_hash, + self.salt_hash) + self.checkstring = data + # Next, the encoding parameters + if tail_segment: + data += struct.pack(">BBQQ", + 3, + 10, + 6, + 33) + elif empty: + data += struct.pack(">BBQQ", + 3, + 10, + 0, + 0) + else: + data += struct.pack(">BBQQ", + 3, + 10, + 6, + 36) + # Now we'll build the offsets. + # The header -- everything up to the salts -- is 143 bytes long. + # The shares come after the salts. 
+ if empty: + salts = "" + else: + salts = self.salt * 6 + share_offset = 143 + len(salts) + if tail_segment: + sharedata = self.block * 6 + elif empty: + sharedata = "" + else: + sharedata = self.block * 6 + "a" + # The encrypted private key comes after the shares + encrypted_private_key_offset = share_offset + len(sharedata) + # The blockhashes come after the private key + blockhashes_offset = encrypted_private_key_offset + len(self.encprivkey) + # The sharehashes come after the blockhashes + sharehashes_offset = blockhashes_offset + len(self.block_hash_tree_s) + # The signature comes after the share hash chain + signature_offset = sharehashes_offset + len(self.share_hash_chain_s) + # The verification key comes after the signature + verification_offset = signature_offset + len(self.signature) + # The EOF comes after the verification key + eof_offset = verification_offset + len(self.verification_key) + data += struct.pack(">LQQQQQQ", + share_offset, + encrypted_private_key_offset, + blockhashes_offset, + sharehashes_offset, + signature_offset, + verification_offset, + eof_offset) + self.offsets = {} + self.offsets['share_data'] = share_offset + self.offsets['enc_privkey'] = encrypted_private_key_offset + self.offsets['block_hash_tree'] = blockhashes_offset + self.offsets['share_hash_chain'] = sharehashes_offset + self.offsets['signature'] = signature_offset + self.offsets['verification_key'] = verification_offset + self.offsets['EOF'] = eof_offset + # Next, we'll add in the salts, + data += salts + # the share data, + data += sharedata + # the private key, + data += self.encprivkey + # the block hash tree, + data += self.block_hash_tree_s + # the share hash chain, + data += self.share_hash_chain_s + # the signature, + data += self.signature + # and the verification key + data += self.verification_key + return data + + + def write_test_share_to_server(self, + storage_index, + tail_segment=False, + empty=False): + """ + I write some data for the read tests to read 
to self.ss + + If tail_segment=True, then I will write a share that has a + smaller tail segment than other segments. + """ + write = self.ss.remote_slot_testv_and_readv_and_writev + data = self.build_test_mdmf_share(tail_segment, empty) + # Finally, we write the whole thing to the storage server in one + # pass. + testvs = [(0, 1, "eq", "")] + tws = {} + tws[0] = (testvs, [(0, data)], None) + readv = [(0, 1)] + results = write(storage_index, self.secrets, tws, readv) + self.failUnless(results[0]) + + + def build_test_sdmf_share(self, empty=False): + if empty: + sharedata = "" + else: + sharedata = self.segment * 6 + blocksize = len(sharedata) / 3 + block = sharedata[:blocksize] + prefix = struct.pack(">BQ32s16s BBQQ", + 0, # version, + 0, + self.root_hash, + self.salt, + 3, + 10, + len(sharedata), + len(sharedata), + ) + post_offset = struct.calcsize(">BQ32s16sBBQQLLLLQQ") + signature_offset = post_offset + len(self.verification_key) + sharehashes_offset = signature_offset + len(self.signature) + blockhashes_offset = sharehashes_offset + len(self.share_hash_chain_s) + sharedata_offset = blockhashes_offset + len(self.block_hash_tree_s) + encprivkey_offset = sharedata_offset + len(block) + eof_offset = encprivkey_offset + len(self.encprivkey) + offsets = struct.pack(">LLLLQQ", + signature_offset, + sharehashes_offset, + blockhashes_offset, + sharedata_offset, + encprivkey_offset, + eof_offset) + final_share = "".join([prefix, + offsets, + self.verification_key, + self.signature, + self.share_hash_chain_s, + self.block_hash_tree_s, + block, + self.encprivkey]) + self.offsets = {} + self.offsets['signature'] = signature_offset + self.offsets['share_hash_chain'] = sharehashes_offset + self.offsets['block_hash_tree'] = blockhashes_offset + self.offsets['share_data'] = sharedata_offset + self.offsets['enc_privkey'] = encprivkey_offset + self.offsets['EOF'] = eof_offset + return final_share + + + def write_sdmf_share_to_server(self, + storage_index, + empty=False): + # 
Some tests need SDMF shares to verify that we can still + # read them. This method writes one, which resembles but is not + # a real SDMF share: its keys, hashes, and signature are dummy + # test values. + assert self.rref + write = self.ss.remote_slot_testv_and_readv_and_writev + share = self.build_test_sdmf_share(empty) + testvs = [(0, 1, "eq", "")] + tws = {} + tws[0] = (testvs, [(0, share)], None) + readv = [] + results = write(storage_index, self.secrets, tws, readv) + self.failUnless(results[0]) + + + def test_read(self): + self.write_test_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + # Check that every getter returns what we expect. + d = defer.succeed(None) + def _check_block_and_salt((block, salt)): + self.failUnlessEqual(block, self.block) + self.failUnlessEqual(salt, self.salt) + + for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mr.get_block_and_salt(i)) + d.addCallback(_check_block_and_salt) + + d.addCallback(lambda ignored: + mr.get_encprivkey()) + d.addCallback(lambda encprivkey: + self.failUnlessEqual(self.encprivkey, encprivkey)) + + d.addCallback(lambda ignored: + mr.get_blockhashes()) + d.addCallback(lambda blockhashes: + self.failUnlessEqual(self.block_hash_tree, blockhashes)) + + d.addCallback(lambda ignored: + mr.get_sharehashes()) + d.addCallback(lambda sharehashes: + self.failUnlessEqual(self.share_hash_chain, sharehashes)) + + d.addCallback(lambda ignored: + mr.get_signature()) + d.addCallback(lambda signature: + self.failUnlessEqual(signature, self.signature)) + + d.addCallback(lambda ignored: + mr.get_verification_key()) + d.addCallback(lambda verification_key: + self.failUnlessEqual(verification_key, self.verification_key)) + + d.addCallback(lambda ignored: + mr.get_seqnum()) + d.addCallback(lambda seqnum: + self.failUnlessEqual(seqnum, 0)) + + d.addCallback(lambda ignored: + mr.get_root_hash()) + d.addCallback(lambda root_hash: + self.failUnlessEqual(self.root_hash, root_hash)) + + d.addCallback(lambda ignored: + mr.get_salt_hash()) + d.addCallback(lambda salt_hash: +
self.failUnlessEqual(self.salt_hash, salt_hash)) + + d.addCallback(lambda ignored: + mr.get_seqnum()) + d.addCallback(lambda seqnum: + self.failUnlessEqual(0, seqnum)) + + d.addCallback(lambda ignored: + mr.get_encoding_parameters()) + def _check_encoding_parameters((k, n, segsize, datalen)): + self.failUnlessEqual(k, 3) + self.failUnlessEqual(n, 10) + self.failUnlessEqual(segsize, 6) + self.failUnlessEqual(datalen, 36) + d.addCallback(_check_encoding_parameters) + + d.addCallback(lambda ignored: + mr.get_checkstring()) + d.addCallback(lambda checkstring: + self.failUnlessEqual(checkstring, self.checkstring)) + return d + + + def test_read_with_different_tail_segment_size(self): + self.write_test_share_to_server("si1", tail_segment=True) + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d = mr.get_block_and_salt(5) + def _check_tail_segment(results): + block, salt = results + self.failUnlessEqual(len(block), 1) + self.failUnlessEqual(block, "a") + d.addCallback(_check_tail_segment) + return d + + + def test_get_block_with_invalid_segnum(self): + self.write_test_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d = defer.succeed(None) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "test invalid segnum", + None, + mr.get_block_and_salt, 7)) + return d + + + def test_get_encoding_parameters_first(self): + self.write_test_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d = mr.get_encoding_parameters() + def _check_encoding_parameters((k, n, segment_size, datalen)): + self.failUnlessEqual(k, 3) + self.failUnlessEqual(n, 10) + self.failUnlessEqual(segment_size, 6) + self.failUnlessEqual(datalen, 36) + d.addCallback(_check_encoding_parameters) + return d + + + def test_get_seqnum_first(self): + self.write_test_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d = mr.get_seqnum() + d.addCallback(lambda seqnum: + self.failUnlessEqual(seqnum, 0)) + return d + + + def test_get_root_hash_first(self): +
self.write_test_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d = mr.get_root_hash() + d.addCallback(lambda root_hash: + self.failUnlessEqual(root_hash, self.root_hash)) + return d + + + def test_get_salt_hash_first(self): + self.write_test_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d = mr.get_salt_hash() + d.addCallback(lambda salt_hash: + self.failUnlessEqual(salt_hash, self.salt_hash)) + return d + + + def test_get_checkstring_first(self): + self.write_test_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d = mr.get_checkstring() + d.addCallback(lambda checkstring: + self.failUnlessEqual(checkstring, self.checkstring)) + return d + + + def test_write_read_vectors(self): + # When writing for us, the storage server will return to us a + # read vector, along with its result. If a write fails because + # the test vectors failed, this read vector can help us to + # diagnose the problem. This test ensures that the read vector + # is working appropriately. + mw = self._make_new_mw("si1", 0) + d = defer.succeed(None) + + # Write one share. This should return a checkstring of nothing, + # since there is no data there. 
+ d.addCallback(lambda ignored: + mw.put_block(self.block, 0, self.salt)) + def _check_first_write(results): + result, readvs = results + self.failUnless(result) + self.failIf(readvs) + d.addCallback(_check_first_write) + # Now, there should be a different checkstring returned when + # we write other shares + d.addCallback(lambda ignored: + mw.put_block(self.block, 1, self.salt)) + def _check_next_write(results): + result, readvs = results + self.failUnless(result) + self.expected_checkstring = mw.get_checkstring() + self.failUnlessIn(0, readvs) + self.failUnlessEqual(readvs[0][0], self.expected_checkstring) + d.addCallback(_check_next_write) + # Add the other four shares + for i in xrange(2, 6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + d.addCallback(_check_next_write) + # Add the encrypted private key + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + d.addCallback(_check_next_write) + # Add the block hash tree and share hash tree + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(_check_next_write) + d.addCallback(lambda ignored: + mw.put_sharehashes(self.share_hash_chain)) + d.addCallback(_check_next_write) + # Add the root hash and the salt hash. This should change the + # checkstring, but not in a way that we'll be able to see right + # now, since the read vectors are applied before the write + # vectors. + d.addCallback(lambda ignored: + mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + def _check_old_testv_after_new_one_is_written(results): + result, readvs = results + self.failUnless(result) + self.failUnlessIn(0, readvs) + self.failUnlessEqual(self.expected_checkstring, + readvs[0][0]) + new_checkstring = mw.get_checkstring() + self.failIfEqual(new_checkstring, + readvs[0][0]) + d.addCallback(_check_old_testv_after_new_one_is_written) + # Now add the signature. 
This should succeed, meaning that the + # data gets written and the read vector matches what the writer + # thinks should be there. + d.addCallback(lambda ignored: + mw.put_signature(self.signature)) + d.addCallback(_check_next_write) + # The checkstring remains the same for the rest of the process. + return d + + + def test_blockhashes_after_share_hash_chain(self): + mw = self._make_new_mw("si1", 0) + d = defer.succeed(None) + # Put everything up to and including the share hash chain + for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(lambda ignored: + mw.put_sharehashes(self.share_hash_chain)) + # Now try to put a block hash tree after the share hash chain. + # This won't necessarily overwrite the share hash chain, but it + # is a bad idea in general -- if we write one that is anything + # other than the exact size of the initial one, we will either + # overwrite the share hash chain, or give the reader (who uses + # the offset of the share hash chain as an end boundary) a + # shorter tree than they know to read, which will result in them + # reading junk. There is little reason to support it as a use + # case, so we should disallow it altogether. 
+ d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "test same blockhashes", + None, + mw.put_blockhashes, self.block_hash_tree)) + return d + + + def test_encprivkey_after_blockhashes(self): + mw = self._make_new_mw("si1", 0) + d = defer.succeed(None) + # Put everything up to and including the block hash tree + for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "out of order private key", + None, + mw.put_encprivkey, self.encprivkey)) + return d + + + def test_share_hash_chain_after_signature(self): + mw = self._make_new_mw("si1", 0) + d = defer.succeed(None) + # Put everything up to and including the signature + for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(lambda ignored: + mw.put_sharehashes(self.share_hash_chain)) + d.addCallback(lambda ignored: + mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + d.addCallback(lambda ignored: + mw.put_signature(self.signature)) + # Now try to put the share hash chain again. This should fail + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "out of order share hash chain", + None, + mw.put_sharehashes, self.share_hash_chain)) + return d + + + def test_signature_after_verification_key(self): + mw = self._make_new_mw("si1", 0) + d = defer.succeed(None) + # Put everything up to and including the verification key. 
+ for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(lambda ignored: + mw.put_sharehashes(self.share_hash_chain)) + d.addCallback(lambda ignored: + mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + d.addCallback(lambda ignored: + mw.put_signature(self.signature)) + d.addCallback(lambda ignored: + mw.put_verification_key(self.verification_key)) + # Now try to put the signature again. This should fail + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "signature after verification", + None, + mw.put_signature, self.signature)) + return d + + + def test_uncoordinated_write(self): + # Make two mutable writers, both pointing to the same storage + # server, both at the same storage index, and try writing to the + # same share. + mw1 = self._make_new_mw("si1", 0) + mw2 = self._make_new_mw("si1", 0) + d = defer.succeed(None) + def _check_success(results): + result, readvs = results + self.failUnless(result) + + def _check_failure(results): + result, readvs = results + self.failIf(result) + + d.addCallback(lambda ignored: + mw1.put_block(self.block, 0, self.salt)) + d.addCallback(_check_success) + d.addCallback(lambda ignored: + mw2.put_block(self.block, 0, self.salt)) + d.addCallback(_check_failure) + return d + + + def test_invalid_salt_size(self): + # Salts need to be 16 bytes in size. Writes that attempt to + # write more or less than this should be rejected. 
+ mw = self._make_new_mw("si1", 0) + invalid_salt = "a" * 17 # 17 bytes + another_invalid_salt = "b" * 15 # 15 bytes + d = defer.succeed(None) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "salt too big", + None, + mw.put_block, self.block, 0, invalid_salt)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "salt too small", + None, + mw.put_block, self.block, 0, + another_invalid_salt)) + return d + + + def test_write_test_vectors(self): + # If we give the write proxy a bogus test vector at + # any point during the process, it should fail to write. + mw = self._make_new_mw("si1", 0) + mw.set_checkstring("this is a lie") + # The initial write should be expecting to find the improbable + # checkstring above in place; finding nothing, it should fail. + d = defer.succeed(None) + d.addCallback(lambda ignored: + mw.put_block(self.block, 0, self.salt)) + def _check_failure(results): + result, readv = results + self.failIf(result) + d.addCallback(_check_failure) + # Now set the checkstring to the empty string, which + # indicates that no share is there. + d.addCallback(lambda ignored: + mw.set_checkstring("")) + d.addCallback(lambda ignored: + mw.put_block(self.block, 0, self.salt)) + def _check_success(results): + result, readv = results + self.failUnless(result) + d.addCallback(_check_success) + # Now set the checkstring to something wrong + d.addCallback(lambda ignored: + mw.set_checkstring("something wrong")) + # This should fail to do anything + d.addCallback(lambda ignored: + mw.put_block(self.block, 1, self.salt)) + d.addCallback(_check_failure) + # Now set it back to what it should be. 
+ d.addCallback(lambda ignored: + mw.set_checkstring(mw.get_checkstring())) + for i in xrange(1, 6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + d.addCallback(_check_success) + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + d.addCallback(_check_success) + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(_check_success) + d.addCallback(lambda ignored: + mw.put_sharehashes(self.share_hash_chain)) + d.addCallback(_check_success) + def _keep_old_checkstring(ignored): + self.old_checkstring = mw.get_checkstring() + mw.set_checkstring("foobarbaz") + d.addCallback(_keep_old_checkstring) + d.addCallback(lambda ignored: + mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + d.addCallback(_check_failure) + d.addCallback(lambda ignored: + self.failUnlessEqual(self.old_checkstring, mw.get_checkstring())) + def _restore_old_checkstring(ignored): + mw.set_checkstring(self.old_checkstring) + d.addCallback(_restore_old_checkstring) + d.addCallback(lambda ignored: + mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + # The checkstring should have been set appropriately for us on + # the last write; if we try to change it to something else, + # that change should cause the signature step to fail. + d.addCallback(lambda ignored: + mw.set_checkstring("something else")) + d.addCallback(lambda ignored: + mw.put_signature(self.signature)) + d.addCallback(_check_failure) + d.addCallback(lambda ignored: + mw.set_checkstring(mw.get_checkstring())) + d.addCallback(lambda ignored: + mw.put_signature(self.signature)) + d.addCallback(_check_success) + d.addCallback(lambda ignored: + mw.put_verification_key(self.verification_key)) + d.addCallback(_check_success) + return d + + + def test_offset_only_set_on_success(self): + # The write proxy should detect when a write has failed, and + # should not advance its record of progress (its offsets) when + # that happens.
+ mw = self._make_new_mw("si1", 0) + d = defer.succeed(None) + for i in xrange(1, 6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + def _break_checkstring(ignored): + self._old_checkstring = mw.get_checkstring() + mw.set_checkstring("foobarbaz") + + def _fix_checkstring(ignored): + mw.set_checkstring(self._old_checkstring) + + d.addCallback(_break_checkstring) + + # Setting the encrypted private key shouldn't work now, which is + # to be expected and is tested elsewhere. We also want to make + # sure that we can't add the block hash tree after a failed + # write of this sort. + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "test out-of-order blockhashes", + None, + mw.put_blockhashes, self.block_hash_tree)) + d.addCallback(_fix_checkstring) + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + d.addCallback(_break_checkstring) + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "test out-of-order sharehashes", + None, + mw.put_sharehashes, self.share_hash_chain)) + d.addCallback(_fix_checkstring) + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(_break_checkstring) + d.addCallback(lambda ignored: + mw.put_sharehashes(self.share_hash_chain)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "out-of-order root hash", + None, + mw.put_root_and_salt_hashes, + self.root_hash, self.salt_hash)) + d.addCallback(_fix_checkstring) + d.addCallback(lambda ignored: + mw.put_sharehashes(self.share_hash_chain)) + d.addCallback(_break_checkstring) + d.addCallback(lambda ignored: + mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "out-of-order signature", + None, + mw.put_signature, self.signature)) + 
d.addCallback(_fix_checkstring) + d.addCallback(lambda ignored: + mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + d.addCallback(_break_checkstring) + d.addCallback(lambda ignored: + mw.put_signature(self.signature)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "out-of-order verification key", + None, + mw.put_verification_key, + self.verification_key)) + d.addCallback(_fix_checkstring) + d.addCallback(lambda ignored: + mw.put_signature(self.signature)) + d.addCallback(_break_checkstring) + d.addCallback(lambda ignored: + mw.put_verification_key(self.verification_key)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "out-of-order finish", + None, + mw.finish_publishing)) + return d + + + def serialize_blockhashes(self, blockhashes): + return "".join(blockhashes) + + + def serialize_sharehashes(self, sharehashes): + ret = "".join([struct.pack(">H32s", i, sharehashes[i]) + for i in sorted(sharehashes.keys())]) + return ret + + + def test_write(self): + # This translates to a file with 6 6-byte segments, and with 2-byte + # blocks. + mw = self._make_new_mw("si1", 0) + mw2 = self._make_new_mw("si1", 1) + # Test writing some blocks. + read = self.ss.remote_slot_readv + def _check_block_write(i, share): + self.failUnlessEqual(read("si1", [share], [(239 + (i * 2), 2)]), + {share: [self.block]}) + self.failUnlessEqual(read("si1", [share], [(143 + (i * 16), 16)]), + {share: [self.salt]}) + d = defer.succeed(None) + for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + d.addCallback(lambda ignored, i=i: + _check_block_write(i, 0)) + # Now try the same thing, but with share 1 instead of share 0. 
+ for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mw2.put_block(self.block, i, self.salt)) + d.addCallback(lambda ignored, i=i: + _check_block_write(i, 1)) + + def _spy_on_results(results): + print read("si1", [], [(0, 40000000)]) + return results + + # Next, we make a fake encrypted private key, and put it onto the + # storage server. + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + # So far, we have: + # header: 143 bytes + # salts: 16 * 6 = 96 bytes + # blocks: 2 * 6 = 12 bytes + # = 251 bytes + expected_private_key_offset = 251 + self.failUnlessEqual(len(self.encprivkey), 7) + d.addCallback(lambda ignored: + self.failUnlessEqual(read("si1", [0], [(251, 7)]), + {0: [self.encprivkey]})) + + # Next, we put a fake block hash tree. + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + # The block hash tree got inserted at: + # header + salts + blocks: 251 bytes + # encrypted private key: 7 bytes + # = 258 bytes + expected_block_hash_offset = 258 + self.failUnlessEqual(len(self.block_hash_tree_s), 32 * 6) + d.addCallback(lambda ignored: + self.failUnlessEqual(read("si1", [0], [(expected_block_hash_offset, 32 * 6)]), + {0: [self.block_hash_tree_s]})) + + # Next, put a fake share hash chain + d.addCallback(lambda ignored: + mw.put_sharehashes(self.share_hash_chain)) + # The share hash chain got inserted at: + # header + salts + blocks + private key = 258 bytes + # block hash tree: 32 * 6 = 192 bytes + # = 450 bytes + expected_share_hash_offset = 450 + d.addCallback(lambda ignored: + self.failUnlessEqual(read("si1", [0],[(expected_share_hash_offset, (32 + 2) * 6)]), + {0: [self.share_hash_chain_s]})) + + # Next, we put what is supposed to be the root hash of + # our share hash tree but isn't, along with the flat hash + # of all the salts. 
+ d.addCallback(lambda ignored: + mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + # The root hash gets inserted at byte 9 (its position is in the header, + # and is fixed). The salt hash comes right after it, at byte 41. + def _check(ignored): + self.failUnlessEqual(read("si1", [0], [(9, 32)]), + {0: [self.root_hash]}) + self.failUnlessEqual(read("si1", [0], [(41, 32)]), + {0: [self.salt_hash]}) + d.addCallback(_check) + + # Next, we put a signature of the header block. + d.addCallback(lambda ignored: + mw.put_signature(self.signature)) + # The signature gets written to: + # header + salts + blocks + private key + block hash tree = 450 bytes + # share hash chain: (32 + 2) * 6 = 204 bytes + # = 654 bytes + expected_signature_offset = 654 + self.failUnlessEqual(len(self.signature), 9) + d.addCallback(lambda ignored: + self.failUnlessEqual(read("si1", [0], [(expected_signature_offset, 9)]), + {0: [self.signature]})) + + # Next, we put the verification key + d.addCallback(lambda ignored: + mw.put_verification_key(self.verification_key)) + # The verification key gets written to: + # 654 + 9 = 663 bytes + expected_verification_key_offset = 663 + self.failUnlessEqual(len(self.verification_key), 6) + d.addCallback(lambda ignored: + self.failUnlessEqual(read("si1", [0], [(expected_verification_key_offset, 6)]), + {0: [self.verification_key]})) + + def _check_signable(ignored): + # Make sure that the signable is what we think it should be. + signable = mw.get_signable() + verno, seq, roothash, salthash, k, n, segsize, datalen = \ + struct.unpack(">BQ32s32sBBQQ", + signable) + self.failUnlessEqual(verno, 1) + self.failUnlessEqual(seq, 0) + self.failUnlessEqual(roothash, self.root_hash) + self.failUnlessEqual(salthash, self.salt_hash) + self.failUnlessEqual(k, 3) + self.failUnlessEqual(n, 10) + self.failUnlessEqual(segsize, 6) + self.failUnlessEqual(datalen, 36) + d.addCallback(_check_signable) + # Next, we cause the offset table to be published.
+ d.addCallback(lambda ignored: + mw.finish_publishing()) + expected_eof_offset = 669 + + # The offset table starts at byte 91. Happily, we have already + # worked out most of these offsets above, but we want to make + # sure that the representation on disk agrees with what we've + # calculated. + # + # (we don't have an explicit offset for the AES salts, because + # we know that they start right after the header) + def _check_offsets(ignored): + # Check the version number to make sure that it is correct. + expected_version_number = struct.pack(">B", 1) + self.failUnlessEqual(read("si1", [0], [(0, 1)]), + {0: [expected_version_number]}) + # Check the sequence number to make sure that it is correct + expected_sequence_number = struct.pack(">Q", 0) + self.failUnlessEqual(read("si1", [0], [(1, 8)]), + {0: [expected_sequence_number]}) + # Check that the encoding parameters (k, N, segment size, data + # length) are what they should be. These are 3, 10, 6, 36 + expected_k = struct.pack(">B", 3) + self.failUnlessEqual(read("si1", [0], [(73, 1)]), + {0: [expected_k]}) + expected_n = struct.pack(">B", 10) + self.failUnlessEqual(read("si1", [0], [(74, 1)]), + {0: [expected_n]}) + expected_segment_size = struct.pack(">Q", 6) + self.failUnlessEqual(read("si1", [0], [(75, 8)]), + {0: [expected_segment_size]}) + expected_data_length = struct.pack(">Q", 36) + self.failUnlessEqual(read("si1", [0], [(83, 8)]), + {0: [expected_data_length]}) + # 91 4 The offset of the share data + expected_offset = struct.pack(">L", 239) + self.failUnlessEqual(read("si1", [0], [(91, 4)]), + {0: [expected_offset]}) + # 95 8 The offset of the encrypted private key + expected_offset = struct.pack(">Q", expected_private_key_offset) + self.failUnlessEqual(read("si1", [0], [(95, 8)]), + {0: [expected_offset]}) + # 103 8 The offset of the block hash tree + expected_offset = struct.pack(">Q", expected_block_hash_offset) + self.failUnlessEqual(read("si1", [0], [(103, 8)]), + {0: [expected_offset]}) + # 111
8 The offset of the share hash chain + expected_offset = struct.pack(">Q", expected_share_hash_offset) + self.failUnlessEqual(read("si1", [0], [(111, 8)]), + {0: [expected_offset]}) + # 119 8 The offset of the signature + expected_offset = struct.pack(">Q", expected_signature_offset) + self.failUnlessEqual(read("si1", [0], [(119, 8)]), + {0: [expected_offset]}) + # 127 8 The offset of the verification key + expected_offset = struct.pack(">Q", expected_verification_key_offset) + self.failUnlessEqual(read("si1", [0], [(127, 8)]), + {0: [expected_offset]}) + # 135 8 offset of the EOF + expected_offset = struct.pack(">Q", expected_eof_offset) + self.failUnlessEqual(read("si1", [0], [(135, 8)]), + {0: [expected_offset]}) + # = 143 bytes in total. + d.addCallback(_check_offsets) + return d + + def _make_new_mw(self, si, share, datalength=36): + # This is a file of size 36 bytes. Since it has a segment + # size of 6, we know that it has 6 byte segments, which will + # be split into blocks of 2 bytes because our FEC k + # parameter is 3. + mw = MDMFSlotWriteProxy(share, self.rref, si, self.secrets, 0, 3, 10, + 6, datalength) + return mw + + + def test_write_rejected_with_too_many_blocks(self): + mw = self._make_new_mw("si0", 0) + + # Try writing too many blocks. We should not be able to write + # more than 6 + # blocks into each share. + d = defer.succeed(None) + for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "too many blocks", + None, + mw.put_block, self.block, 7, self.salt)) + return d + + + def test_write_rejected_with_invalid_salt(self): + # Try writing an invalid salt. Salts are 16 bytes -- any more or + # less should cause an error. 
+ mw = self._make_new_mw("si1", 0) + bad_salt = "a" * 17 # 17 bytes + d = defer.succeed(None) + # Use a valid segment number (0) so that the only problem with + # this write is the salt. + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "test_invalid_salt", + None, mw.put_block, self.block, 0, bad_salt)) + return d + + + def test_write_rejected_with_invalid_salt_hash(self): + # Try writing an invalid salt hash. These should be SHA256d, and + # 32 bytes long as a result. + mw = self._make_new_mw("si2", 0) + # 31 bytes != 32 bytes + invalid_salt_hash = "b" * 31 + d = defer.succeed(None) + # Before this test can work, we need to put the blocks and salts, + # the encrypted private key, the block hash tree, and the share + # hash chain. Otherwise, we'll see failures that match what we + # are looking for, but are caused by the constraints imposed on + # operation ordering. + for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(lambda ignored: + mw.put_sharehashes(self.share_hash_chain)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "invalid salt hash", + None, mw.put_root_and_salt_hashes, + self.root_hash, invalid_salt_hash)) + return d + + + def test_write_rejected_with_invalid_root_hash(self): + # Try writing an invalid root hash. This should be SHA256d, and + # 32 bytes long as a result. + mw = self._make_new_mw("si2", 0) + # 17 bytes != 32 bytes + invalid_root_hash = "a" * 17 + d = defer.succeed(None) + # Before this test can work, we need to put the blocks and salts, + # the encrypted private key, the block hash tree, and the share + # hash chain. Otherwise, we'll see failures that match what we + # are looking for, but are caused by the constraints imposed on + # operation ordering.
+ for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(lambda ignored: + mw.put_sharehashes(self.share_hash_chain)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "invalid root hash", + None, mw.put_root_and_salt_hashes, + invalid_root_hash, self.salt_hash)) + return d + + + def test_write_rejected_with_invalid_blocksize(self): + # The blocksize implied by the writer that we get from + # _make_new_mw is 2 bytes -- any more or any less than this + # should be cause for failure, unless it is the tail segment, + # whose blocks may legitimately be smaller. + invalid_block = "a" + mw = self._make_new_mw("si3", 0, 33) # implies a tail segment with + # one-byte blocks + # 1 byte != 2 bytes + d = defer.succeed(None) + d.addCallback(lambda ignored, invalid_block=invalid_block: + self.shouldFail(LayoutInvalid, "test blocksize too small", + None, mw.put_block, invalid_block, 0, + self.salt)) + invalid_block = invalid_block * 3 + # 3 bytes != 2 bytes + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "test blocksize too large", + None, + mw.put_block, invalid_block, 0, self.salt)) + for i in xrange(5): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + # Try to put an invalid tail segment + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "test invalid tail segment", + None, + mw.put_block, self.block, 5, self.salt)) + valid_block = "a" + d.addCallback(lambda ignored: + mw.put_block(valid_block, 5, self.salt)) + return d + + + def test_write_enforces_order_constraints(self): + # We require that the MDMFSlotWriteProxy be interacted with in a + # specific way.
+ # That way is: + # 0: __init__ + # 1: write blocks and salts + # 2: Write the encrypted private key + # 3: Write the block hashes + # 4: Write the share hashes + # 5: Write the root hash and salt hash + # 6: Write the signature and verification key + # 7: Write the file. + # + # Some of these can be performed out-of-order, and some can't. + # The dependencies that I want to test here are: + # - Private key before block hashes + # - share hashes and block hashes before root hash + # - root hash before signature + # - signature before verification key + mw0 = self._make_new_mw("si0", 0) + # Write some shares + d = defer.succeed(None) + for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mw0.put_block(self.block, i, self.salt)) + # Try to write the block hashes before writing the encrypted + # private key + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "block hashes before key", + None, mw0.put_blockhashes, + self.block_hash_tree)) + + # Write the private key. + d.addCallback(lambda ignored: + mw0.put_encprivkey(self.encprivkey)) + + + # Try to write the share hash chain without writing the block + # hash tree + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "share hash chain before " + "block hash tree", + None, + mw0.put_sharehashes, self.share_hash_chain)) + + # Try to write the root hash and salt hash without writing either the + # block hashes or the share hashes + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "root hash before share hashes", + None, + mw0.put_root_and_salt_hashes, + self.root_hash, self.salt_hash)) + + # Now write the block hashes and try again + d.addCallback(lambda ignored: + mw0.put_blockhashes(self.block_hash_tree)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "root hash before share hashes", + None, mw0.put_root_and_salt_hashes, + self.root_hash, self.salt_hash)) + + # We haven't yet put the root hash on the share, so we shouldn't + # be able to sign it. 
+ d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "signature before root hash", + None, mw0.put_signature, self.signature)) + + d.addCallback(lambda ignored: + self.failUnlessRaises(LayoutInvalid, mw0.get_signable)) + + # ...and, since that fails, we also shouldn't be able to put the + # verification key. + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "key before signature", + None, mw0.put_verification_key, + self.verification_key)) + + # Now write the share hashes and verify that it works. + d.addCallback(lambda ignored: + mw0.put_sharehashes(self.share_hash_chain)) + + # We should still be unable to sign the header + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "signature before hashes", + None, + mw0.put_signature, self.signature)) + + # We should be able to write the root hash now too + d.addCallback(lambda ignored: + mw0.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + + # We should still be unable to put the verification key + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "key before signature", + None, mw0.put_verification_key, + self.verification_key)) + + d.addCallback(lambda ignored: + mw0.put_signature(self.signature)) + + # We shouldn't be able to write the offsets to the remote server + # until the offset table is finished, that is, until we have + # written the verification key. + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "offsets before verification key", + None, + mw0.finish_publishing)) + + d.addCallback(lambda ignored: + mw0.put_verification_key(self.verification_key)) + return d + + + def test_end_to_end(self): + mw = self._make_new_mw("si1", 0) + # Write a share using the mutable writer, and make sure that the + # reader knows how to read everything back to us.
+ d = defer.succeed(None) + for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(lambda ignored: + mw.put_sharehashes(self.share_hash_chain)) + d.addCallback(lambda ignored: + mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + d.addCallback(lambda ignored: + mw.put_signature(self.signature)) + d.addCallback(lambda ignored: + mw.put_verification_key(self.verification_key)) + d.addCallback(lambda ignored: + mw.finish_publishing()) + + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + def _check_block_and_salt((block, salt)): + self.failUnlessEqual(block, self.block) + self.failUnlessEqual(salt, self.salt) + + for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mr.get_block_and_salt(i)) + d.addCallback(_check_block_and_salt) + + d.addCallback(lambda ignored: + mr.get_encprivkey()) + d.addCallback(lambda encprivkey: + self.failUnlessEqual(self.encprivkey, encprivkey)) + + d.addCallback(lambda ignored: + mr.get_blockhashes()) + d.addCallback(lambda blockhashes: + self.failUnlessEqual(self.block_hash_tree, blockhashes)) + + d.addCallback(lambda ignored: + mr.get_sharehashes()) + d.addCallback(lambda sharehashes: + self.failUnlessEqual(self.share_hash_chain, sharehashes)) + + d.addCallback(lambda ignored: + mr.get_signature()) + d.addCallback(lambda signature: + self.failUnlessEqual(signature, self.signature)) + + d.addCallback(lambda ignored: + mr.get_verification_key()) + d.addCallback(lambda verification_key: + self.failUnlessEqual(verification_key, self.verification_key)) + + d.addCallback(lambda ignored: + mr.get_seqnum()) + d.addCallback(lambda seqnum: + self.failUnlessEqual(seqnum, 0)) + + d.addCallback(lambda ignored: + mr.get_root_hash()) + d.addCallback(lambda root_hash: + self.failUnlessEqual(self.root_hash, root_hash)) + + 
d.addCallback(lambda ignored: + mr.get_salt_hash()) + d.addCallback(lambda salt_hash: + self.failUnlessEqual(self.salt_hash, salt_hash)) + + d.addCallback(lambda ignored: + mr.get_encoding_parameters()) + def _check_encoding_parameters((k, n, segsize, datalen)): + self.failUnlessEqual(k, 3) + self.failUnlessEqual(n, 10) + self.failUnlessEqual(segsize, 6) + self.failUnlessEqual(datalen, 36) + d.addCallback(_check_encoding_parameters) + + d.addCallback(lambda ignored: + mr.get_checkstring()) + d.addCallback(lambda checkstring: + self.failUnlessEqual(checkstring, mw.get_checkstring())) + return d + + + def test_is_sdmf(self): + # The MDMFSlotReadProxy should also know how to read SDMF files, + # since it will encounter them on the grid. Callers use the + # is_sdmf method to test this. + self.write_sdmf_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d = mr.is_sdmf() + d.addCallback(lambda issdmf: + self.failUnless(issdmf)) + return d + + + def test_reads_sdmf(self): + # The slot read proxy should, naturally, know how to tell us + # about data in the SDMF format + self.write_sdmf_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d = defer.succeed(None) + d.addCallback(lambda ignored: + mr.is_sdmf()) + d.addCallback(lambda issdmf: + self.failUnless(issdmf)) + + # What do we need to read? 
+ # - The sharedata + # - The salt + d.addCallback(lambda ignored: + mr.get_block_and_salt(0)) + def _check_block_and_salt(results): + block, salt = results + self.failUnlessEqual(block, self.block * 6) + self.failUnlessEqual(salt, self.salt) + d.addCallback(_check_block_and_salt) + + # - The blockhashes + d.addCallback(lambda ignored: + mr.get_blockhashes()) + d.addCallback(lambda blockhashes: + self.failUnlessEqual(self.block_hash_tree, + blockhashes, + blockhashes)) + # - The sharehashes + d.addCallback(lambda ignored: + mr.get_sharehashes()) + d.addCallback(lambda sharehashes: + self.failUnlessEqual(self.share_hash_chain, + sharehashes)) + # - The keys + d.addCallback(lambda ignored: + mr.get_encprivkey()) + d.addCallback(lambda encprivkey: + self.failUnlessEqual(encprivkey, self.encprivkey, encprivkey)) + d.addCallback(lambda ignored: + mr.get_verification_key()) + d.addCallback(lambda verification_key: + self.failUnlessEqual(verification_key, + self.verification_key, + verification_key)) + # - The signature + d.addCallback(lambda ignored: + mr.get_signature()) + d.addCallback(lambda signature: + self.failUnlessEqual(signature, self.signature, signature)) + + # - The sequence number + d.addCallback(lambda ignored: + mr.get_seqnum()) + d.addCallback(lambda seqnum: + self.failUnlessEqual(seqnum, 0, seqnum)) + + # - The root hash + # - The salt hash (to verify that it is None) + d.addCallback(lambda ignored: + mr.get_root_hash()) + d.addCallback(lambda root_hash: + self.failUnlessEqual(root_hash, self.root_hash, root_hash)) + d.addCallback(lambda ignored: + mr.get_salt_hash()) + d.addCallback(lambda salt_hash: + self.failIf(salt_hash)) + return d + + + def test_only_reads_one_segment_sdmf(self): + # SDMF shares have only one segment, so it doesn't make sense to + # read more segments than that. The reader should know this and + # complain if we try to do that. 
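The block sizes these tests rely on follow from simple arithmetic: a 36-byte file with 6-byte segments and k=3 has six segments of 2-byte blocks, a 33-byte file ends in a 3-byte tail segment whose blocks are 1 byte, and an SDMF file is a single segment, so its one block is 12 bytes. The helper below is a hypothetical sketch of that calculation, assuming each segment is split evenly into k blocks with sizes rounded up; the real uploader may pad segments differently.

```python
def segment_layout(datalen, segsize, k):
    """Hypothetical helper: return (num_segments, block_size,
    tail_block_size) for a file of datalen bytes cut into segsize-byte
    segments, each of which is encoded into blocks of segsize / k bytes."""
    if datalen == 0:
        # An empty file has no segments and no blocks to fetch.
        return (0, 0, 0)
    num_segments = -(-datalen // segsize)            # ceiling division
    tail_segsize = datalen - (num_segments - 1) * segsize
    block_size = -(-segsize // k)
    tail_block_size = -(-tail_segsize // k)
    return (num_segments, block_size, tail_block_size)


mdmf = segment_layout(36, 6, 3)    # six segments, 2-byte blocks throughout
short = segment_layout(33, 6, 3)   # same, but the tail block is 1 byte
sdmf = segment_layout(36, 36, 3)   # one segment, one 12-byte block
```

With a single SDMF segment there is no segment 1 to fetch, which is why the reader is expected to reject get_block_and_salt(1).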
+ self.write_sdmf_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d = defer.succeed(None) + d.addCallback(lambda ignored: + mr.is_sdmf()) + d.addCallback(lambda issdmf: + self.failUnless(issdmf)) + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "test bad segment", + None, + mr.get_block_and_salt, 1)) + return d + + + def test_read_with_prefetched_mdmf_data(self): + # The MDMFSlotReadProxy will prefill certain fields if you pass + # it data that you have already fetched. This is useful for + # cases like the Servermap, which prefetches ~2 KB of data while + # finding out which shares are on the remote peer so that it + # doesn't waste round trips. + mdmf_data = self.build_test_mdmf_share() + # We're telling it enough to figure out whether it is SDMF or + # MDMF. + mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:1]) + self.failUnlessEqual(mr._version_number, MDMF_VERSION) + + # Now we're telling it more, but still not enough to flesh out + # the rest of the encoding parameter, so none of them should be + # set. + mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:10]) + self.failUnlessEqual(mr._version_number, MDMF_VERSION) + self.failIf(mr._sequence_number) + + # This should be enough to flesh out the encoding parameters of + # an MDMF file. + mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:91]) + self.failUnlessEqual(mr._version_number, MDMF_VERSION) + self.failUnlessEqual(mr._root_hash, self.root_hash), + self.failUnlessEqual(mr._sequence_number, 0) + self.failUnlessEqual(mr._required_shares, 3) + self.failUnlessEqual(mr._total_shares, 10) + + # This should be enough to fill in the encoding parameters and + # a little more, but not enough to complete the offset table.
+ mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:100]) + self.failUnlessEqual(mr._version_number, MDMF_VERSION) + self.failUnlessEqual(mr._root_hash, self.root_hash) + self.failUnlessEqual(mr._sequence_number, 0) + self.failUnlessEqual(mr._required_shares, 3) + self.failUnlessEqual(mr._total_shares, 10) + self.failIf(mr._offsets) + + # This should be enough to fill in both the encoding parameters + # and the table of offsets + mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:143]) + self.failUnlessEqual(mr._version_number, MDMF_VERSION) + self.failUnlessEqual(mr._root_hash, self.root_hash) + self.failUnlessEqual(mr._sequence_number, 0) + self.failUnlessEqual(mr._required_shares, 3) + self.failUnlessEqual(mr._total_shares, 10) + self.failUnless(mr._offsets) + + + def test_read_with_prefetched_sdmf_data(self): + sdmf_data = self.build_test_sdmf_share() + # Feed it just enough data to check the share type + mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:1]) + self.failUnlessEqual(mr._version_number, SDMF_VERSION) + self.failIf(mr._sequence_number) + + # Now feed it more data, but not enough data to populate the + # encoding parameters. The results should be exactly the same as + # before. + mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:10]) + self.failUnlessEqual(mr._version_number, SDMF_VERSION) + self.failIf(mr._sequence_number) + + # Now feed it enough data to populate the encoding parameters + mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:75]) + self.failUnlessEqual(mr._version_number, SDMF_VERSION) + self.failUnlessEqual(mr._sequence_number, 0) + self.failUnlessEqual(mr._root_hash, self.root_hash) + self.failUnlessEqual(mr._required_shares, 3) + self.failUnlessEqual(mr._total_shares, 10) + + # Now feed it enough data to populate the encoding parameters + # and then some, but not enough to fill in the offset table. 
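The prefix lengths used to slice the prefetched data line up with the struct formats packed in the verinfo tests: the SDMF signed prefix is 75 bytes and the MDMF one is 91 bytes. The sketch below recomputes them with struct.calcsize. The SDMF offset-table format `>LLLLQQ` is an assumption taken from Tahoe's SDMF layout rather than from this patch; with it, the full SDMF header comes to the 107 bytes the tests use. The 143-byte MDMF figure depends on an offset table this section does not show, so it is not derived here.

```python
import struct

# ">" selects big-endian byte order with no padding; struct ignores the
# space, which only groups the fields visually.
SDMF_SIGNED_PREFIX = ">BQ32s16s BBQQ"  # version, seqnum, root hash, 16-byte salt (IV),
                                       # k, N, segsize, datalen
MDMF_SIGNED_PREFIX = ">BQ32s32s BBQQ"  # same, with a 32-byte salt hash instead
SDMF_OFFSETS = ">LLLLQQ"               # assumed SDMF offset-table format

sdmf_prefix_len = struct.calcsize(SDMF_SIGNED_PREFIX)              # 75 bytes
mdmf_prefix_len = struct.calcsize(MDMF_SIGNED_PREFIX)              # 91 bytes
sdmf_header_len = sdmf_prefix_len + struct.calcsize(SDMF_OFFSETS)  # 107 bytes
```

Computing the lengths from the format strings, rather than hard-coding 75, 91 and 107, keeps the test slices in step with the layout if a field changes.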
+ mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:100]) + self.failUnlessEqual(mr._version_number, SDMF_VERSION) + self.failUnlessEqual(mr._sequence_number, 0) + self.failUnlessEqual(mr._root_hash, self.root_hash) + self.failUnlessEqual(mr._required_shares, 3) + self.failUnlessEqual(mr._total_shares, 10) + self.failIf(mr._offsets) + + # Now fill in the offset table. + mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:107]) + self.failUnlessEqual(mr._version_number, SDMF_VERSION) + self.failUnlessEqual(mr._sequence_number, 0) + self.failUnlessEqual(mr._root_hash, self.root_hash) + self.failUnlessEqual(mr._required_shares, 3) + self.failUnlessEqual(mr._total_shares, 10) + self.failUnless(mr._offsets) + + + def test_read_with_prefetched_bogus_data(self): + bogus_data = "kjkasdlkjsjkdjksajdjsadjsajdskaj" + # This shouldn't do anything. + mr = MDMFSlotReadProxy(self.rref, "si1", 0, bogus_data) + self.failIf(mr._version_number) + + + def test_read_with_empty_mdmf_file(self): + # Some tests upload a file with no contents to test things + # unrelated to the actual handling of the content of the file. + # The reader should behave intelligently in these cases. + self.write_test_share_to_server("si1", empty=True) + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + # We should be able to get the encoding parameters, and they + # should be correct. 
+ d = defer.succeed(None) + d.addCallback(lambda ignored: + mr.get_encoding_parameters()) + def _check_encoding_parameters(params): + self.failUnlessEqual(len(params), 4) + k, n, segsize, datalen = params + self.failUnlessEqual(k, 3) + self.failUnlessEqual(n, 10) + self.failUnlessEqual(segsize, 0) + self.failUnlessEqual(datalen, 0) + d.addCallback(_check_encoding_parameters) + + # We should not be able to fetch a block, since there are no + # blocks to fetch + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "get block on empty file", + None, + mr.get_block_and_salt, 0)) + return d + + + def test_read_with_empty_sdmf_file(self): + self.write_sdmf_share_to_server("si1", empty=True) + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + # We should be able to get the encoding parameters, and they + # should be correct + d = defer.succeed(None) + d.addCallback(lambda ignored: + mr.get_encoding_parameters()) + def _check_encoding_parameters(params): + self.failUnlessEqual(len(params), 4) + k, n, segsize, datalen = params + self.failUnlessEqual(k, 3) + self.failUnlessEqual(n, 10) + self.failUnlessEqual(segsize, 0) + self.failUnlessEqual(datalen, 0) + d.addCallback(_check_encoding_parameters) + + # It does not make sense to get a block in this format, so we + # should not be able to. + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "get block on an empty file", + None, + mr.get_block_and_salt, 0)) + return d + + + def test_verinfo_with_sdmf_file(self): + self.write_sdmf_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + # We should be able to get the version information. 
+ d = defer.succeed(None) + d.addCallback(lambda ignored: + mr.get_verinfo()) + def _check_verinfo(verinfo): + self.failUnless(verinfo) + self.failUnlessEqual(len(verinfo), 9) + (seqnum, + root_hash, + salt, + segsize, + datalen, + k, + n, + prefix, + offsets) = verinfo + self.failUnlessEqual(seqnum, 0) + self.failUnlessEqual(root_hash, self.root_hash) + self.failUnlessEqual(salt, self.salt) + self.failUnlessEqual(segsize, 36) + self.failUnlessEqual(datalen, 36) + self.failUnlessEqual(k, 3) + self.failUnlessEqual(n, 10) + expected_prefix = struct.pack(">BQ32s16s BBQQ", + 0, + seqnum, + root_hash, + salt, + k, + n, + segsize, + datalen) + self.failUnlessEqual(prefix, expected_prefix) + self.failUnlessEqual(offsets, self.offsets) + d.addCallback(_check_verinfo) + return d + + + def test_verinfo_with_mdmf_file(self): + self.write_test_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d = defer.succeed(None) + d.addCallback(lambda ignored: + mr.get_verinfo()) + def _check_verinfo(verinfo): + self.failUnless(verinfo) + self.failUnlessEqual(len(verinfo), 9) + (seqnum, + root_hash, + salt_hash, + segsize, + datalen, + k, + n, + prefix, + offsets) = verinfo + self.failUnlessEqual(seqnum, 0) + self.failUnlessEqual(root_hash, self.root_hash) + self.failUnlessEqual(salt_hash, self.salt_hash) + self.failUnlessEqual(segsize, 6) + self.failUnlessEqual(datalen, 36) + self.failUnlessEqual(k, 3) + self.failUnlessEqual(n, 10) + expected_prefix = struct.pack(">BQ32s32s BBQQ", + 1, + seqnum, + root_hash, + salt_hash, + k, + n, + segsize, + datalen) + self.failUnlessEqual(prefix, expected_prefix) + self.failUnlessEqual(offsets, self.offsets) + d.addCallback(_check_verinfo) + return d + + class Stats(unittest.TestCase): def setUp(self): } [Alter MDMF proxy tests to reflect the new form of caching Kevan Carstensen **20100614213459 Ignore-this: 3e84dbd1b6ea103be36e0e98babe79d4 ] { hunk ./src/allmydata/test/test_storage.py 23 from allmydata.immutable.layout import 
WriteBucketProxy, WriteBucketProxy_v2, \ ReadBucketProxy from allmydata.mutable.layout import MDMFSlotWriteProxy, MDMFSlotReadProxy, \ - LayoutInvalid + LayoutInvalid, MDMFSIGNABLEHEADER, \ + SIGNED_PREFIX from allmydata.interfaces import BadWriteEnablerError, MDMF_VERSION, \ SDMF_VERSION from allmydata.test.common import LoggingServiceParent, ShouldFailMixin hunk ./src/allmydata/test/test_storage.py 105 class RemoteBucket: + def __init__(self): + self.read_count = 0 + self.write_count = 0 + def callRemote(self, methname, *args, **kwargs): def _call(): meth = getattr(self.target, "remote_" + methname) hunk ./src/allmydata/test/test_storage.py 113 return meth(*args, **kwargs) + + if methname == "slot_readv": + self.read_count += 1 + if methname == "slot_writev": + self.write_count += 1 + return defer.maybeDeferred(_call) hunk ./src/allmydata/test/test_storage.py 121 + class BucketProxy(unittest.TestCase): def make_bucket(self, name, size): basedir = os.path.join("storage", "BucketProxy", name) hunk ./src/allmydata/test/test_storage.py 2605 mr.get_block_and_salt(0)) def _check_block_and_salt(results): block, salt = results + # Our original file is 36 bytes long, so with k=3 each + # share is 12 bytes in size. The share consists entirely of + # the letter 'a'; self.block contains two 'a's, so + # 6 * self.block is what we are looking for. self.failUnlessEqual(block, self.block * 6) self.failUnlessEqual(salt, self.salt) d.addCallback(_check_block_and_salt) hunk ./src/allmydata/test/test_storage.py 2687 # finding out which shares are on the remote peer so that it # doesn't waste round trips. mdmf_data = self.build_test_mdmf_share() - # We're telling it enough to figure out whether it is SDMF or - # MDMF. - mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:1]) - self.failUnlessEqual(mr._version_number, MDMF_VERSION) - - # Now we're telling it more, but still not enough to flesh out - # the rest of the encoding parameter, so none of them should be - # set.
- mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:10]) - self.failUnlessEqual(mr._version_number, MDMF_VERSION) - self.failIf(mr._sequence_number) - - # This should be enough to flesh out the encoding parameters of - # an MDMF file. - mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:91]) - self.failUnlessEqual(mr._version_number, MDMF_VERSION) - self.failUnlessEqual(mr._root_hash, self.root_hash), - self.failUnlessEqual(mr._sequence_number, 0) - self.failUnlessEqual(mr._required_shares, 3) - self.failUnlessEqual(mr._total_shares, 10) - - # This should be enough to fill in the encoding parameters and - # a little more, but not enough to complete the offset table. - mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:100]) - self.failUnlessEqual(mr._version_number, MDMF_VERSION) - self.failUnlessEqual(mr._root_hash, self.root_hash) - self.failUnlessEqual(mr._sequence_number, 0) - self.failUnlessEqual(mr._required_shares, 3) - self.failUnlessEqual(mr._total_shares, 10) - self.failIf(mr._offsets) + self.write_test_share_to_server("si1") + def _make_mr(ignored, length): + mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:length]) + return mr hunk ./src/allmydata/test/test_storage.py 2692 + d = defer.succeed(None) # This should be enough to fill in both the encoding parameters hunk ./src/allmydata/test/test_storage.py 2694 - # and the table of offsets - mr = MDMFSlotReadProxy(self.rref, "si1", 0, mdmf_data[:143]) - self.failUnlessEqual(mr._version_number, MDMF_VERSION) - self.failUnlessEqual(mr._root_hash, self.root_hash) - self.failUnlessEqual(mr._sequence_number, 0) - self.failUnlessEqual(mr._required_shares, 3) - self.failUnlessEqual(mr._total_shares, 10) - self.failUnless(mr._offsets) + # and the table of offsets, which will complete the version + # information tuple. 
+ d.addCallback(_make_mr, 143) + d.addCallback(lambda mr: + mr.get_verinfo()) + def _check_verinfo(verinfo): + self.failUnless(verinfo) + self.failUnlessEqual(len(verinfo), 9) + (seqnum, + root_hash, + salt_hash, + segsize, + datalen, + k, + n, + prefix, + offsets) = verinfo + self.failUnlessEqual(seqnum, 0) + self.failUnlessEqual(root_hash, self.root_hash) + self.failUnlessEqual(salt_hash, self.salt_hash) + self.failUnlessEqual(segsize, 6) + self.failUnlessEqual(datalen, 36) + self.failUnlessEqual(k, 3) + self.failUnlessEqual(n, 10) + expected_prefix = struct.pack(MDMFSIGNABLEHEADER, + 1, + seqnum, + root_hash, + salt_hash, + k, + n, + segsize, + datalen) + self.failUnlessEqual(expected_prefix, prefix) + self.failUnlessEqual(self.rref.read_count, 0) + d.addCallback(_check_verinfo) + # This is not enough data to read a block and its salt, so the + # wrapper should attempt to read this from the remote server. + d.addCallback(_make_mr, 143) + d.addCallback(lambda mr: + mr.get_block_and_salt(0)) + def _check_block_and_salt((block, salt)): + self.failUnlessEqual(block, self.block) + self.failUnlessEqual(salt, self.salt) + self.failUnlessEqual(self.rref.read_count, 1) + # The file that we're playing with has 6 segments, so there + # are 6 * 16 = 96 bytes of salts before the share data begins. + # Each block has two bytes, so 143 + 96 + 2 = 241 bytes should + # be enough to read one block. + d.addCallback(_make_mr, 241) + d.addCallback(lambda mr: + mr.get_block_and_salt(0)) + d.addCallback(_check_block_and_salt) + return d def test_read_with_prefetched_sdmf_data(self): hunk ./src/allmydata/test/test_storage.py 2752 sdmf_data = self.build_test_sdmf_share() - # Feed it just enough data to check the share type - mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:1]) - self.failUnlessEqual(mr._version_number, SDMF_VERSION) - self.failIf(mr._sequence_number) - - # Now feed it more data, but not enough data to populate the - # encoding parameters.
The results should be exactly the same as - # before. - mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:10]) - self.failUnlessEqual(mr._version_number, SDMF_VERSION) - self.failIf(mr._sequence_number) - - # Now feed it enough data to populate the encoding parameters - mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:75]) - self.failUnlessEqual(mr._version_number, SDMF_VERSION) - self.failUnlessEqual(mr._sequence_number, 0) - self.failUnlessEqual(mr._root_hash, self.root_hash) - self.failUnlessEqual(mr._required_shares, 3) - self.failUnlessEqual(mr._total_shares, 10) - - # Now feed it enough data to populate the encoding parameters - # and then some, but not enough to fill in the offset table. - mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:100]) - self.failUnlessEqual(mr._version_number, SDMF_VERSION) - self.failUnlessEqual(mr._sequence_number, 0) - self.failUnlessEqual(mr._root_hash, self.root_hash) - self.failUnlessEqual(mr._required_shares, 3) - self.failUnlessEqual(mr._total_shares, 10) - self.failIf(mr._offsets) - - # Now fill in the offset table. - mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:107]) - self.failUnlessEqual(mr._version_number, SDMF_VERSION) - self.failUnlessEqual(mr._sequence_number, 0) - self.failUnlessEqual(mr._root_hash, self.root_hash) - self.failUnlessEqual(mr._required_shares, 3) - self.failUnlessEqual(mr._total_shares, 10) - self.failUnless(mr._offsets) + self.write_sdmf_share_to_server("si1") + def _make_mr(ignored, length): + mr = MDMFSlotReadProxy(self.rref, "si1", 0, sdmf_data[:length]) + return mr hunk ./src/allmydata/test/test_storage.py 2757 + d = defer.succeed(None) + # This should be enough to get us the encoding parameters, + # offset table, and everything else we need to build a verinfo + # string. 
+ d.addCallback(_make_mr, 107) + d.addCallback(lambda mr: + mr.get_verinfo()) + def _check_verinfo(verinfo): + self.failUnless(verinfo) + self.failUnlessEqual(len(verinfo), 9) + (seqnum, + root_hash, + salt, + segsize, + datalen, + k, + n, + prefix, + offsets) = verinfo + self.failUnlessEqual(seqnum, 0) + self.failUnlessEqual(root_hash, self.root_hash) + self.failUnlessEqual(salt, self.salt) + self.failUnlessEqual(segsize, 36) + self.failUnlessEqual(datalen, 36) + self.failUnlessEqual(k, 3) + self.failUnlessEqual(n, 10) + expected_prefix = struct.pack(SIGNED_PREFIX, + 0, + seqnum, + root_hash, + salt, + k, + n, + segsize, + datalen) + self.failUnlessEqual(expected_prefix, prefix) + self.failUnlessEqual(self.rref.read_count, 0) + d.addCallback(_check_verinfo) + # This shouldn't be enough to read any share data. + d.addCallback(_make_mr, 107) + d.addCallback(lambda mr: + mr.get_block_and_salt(0)) + def _check_block_and_salt((block, salt)): + self.failUnlessEqual(block, self.block * 6) + self.failUnlessEqual(salt, self.salt) + # TODO: Fix the read routine so that it reads only the data + # that it has cached if it can't read all of it. + self.failUnlessEqual(self.rref.read_count, 2) hunk ./src/allmydata/test/test_storage.py 2806 - def test_read_with_prefetched_bogus_data(self): - bogus_data = "kjkasdlkjsjkdjksajdjsadjsajdskaj" - # This shouldn't do anything. - mr = MDMFSlotReadProxy(self.rref, "si1", 0, bogus_data) - self.failIf(mr._version_number) + # This should be enough to read share data. 
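The rewritten prefetch tests count remote reads through RemoteBucket.read_count to check that the proxy answers from the data it was given whenever that data covers the request. The following is a minimal synchronous sketch of that idea. CountingRemote and PrefetchingReader are hypothetical names, not the MDMFSlotReadProxy API, and the byte layout (143 bytes of header, then 6 * 16 = 96 bytes of salts, then a 2-byte block) is taken from the test comment rather than verified against the real share format.

```python
class CountingRemote(object):
    """Hypothetical stand-in for a storage server: holds share bytes
    and counts how many reads reach it."""
    def __init__(self, data):
        self.data = data
        self.read_count = 0

    def read(self, offset, length):
        self.read_count += 1
        return self.data[offset:offset + length]


class PrefetchingReader(object):
    """Hypothetical reader that answers from prefetched bytes when they
    cover the requested range and falls back to the remote otherwise."""
    def __init__(self, remote, prefetched=b""):
        self.remote = remote
        self.cache = prefetched

    def read(self, offset, length):
        if offset + length <= len(self.cache):
            return self.cache[offset:offset + length]
        return self.remote.read(offset, length)


# Assumed layout: 143 header bytes, 96 salt bytes, then the first block.
share = b"H" * 143 + b"s" * 96 + b"aa"
remote = CountingRemote(share)
block_from_cache = PrefetchingReader(remote, share[:241]).read(239, 2)
block_from_remote = PrefetchingReader(remote, share[:143]).read(239, 2)
# Only the second reader had to go to the remote, so read_count is 1.
```

Serving all-or-nothing from the cache mirrors the TODO in the SDMF test: a reader whose cache covers only part of a request still pays a full remote read.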
+ d.addCallback(_make_mr, self.offsets['share_data']) + d.addCallback(lambda mr: + mr.get_block_and_salt(0)) + d.addCallback(_check_block_and_salt) + return d def test_read_with_empty_mdmf_file(self): } [Add tests and support functions for servermap tests Kevan Carstensen **20100614213721 Ignore-this: 583734d2f728fc80637b5c0c0f4c0fc ] { hunk ./src/allmydata/test/test_mutable.py 103 d = fireEventually() d.addCallback(lambda res: _call()) return d + def callRemoteOnly(self, methname, *args, **kwargs): d = self.callRemote(methname, *args, **kwargs) d.addBoth(lambda ignore: None) hunk ./src/allmydata/test/test_mutable.py 152 chr(ord(original[byte_offset]) ^ 0x01) + original[byte_offset+1:]) +def add_two(original, byte_offset): + # It isn't enough to simply flip the low bit of the version + # byte, because that turns 0 into 1, which is a valid version + # number. So we XOR with 0x02 instead, mapping 0 to 2 and 1 to 3. + return (original[:byte_offset] + + chr(ord(original[byte_offset]) ^ 0x02) + + original[byte_offset+1:]) + def corrupt(res, s, offset, shnums_to_corrupt=None, offset_offset=0): # if shnums_to_corrupt is None, corrupt all shares. Otherwise it is a # list of shnums to corrupt. hunk ./src/allmydata/test/test_mutable.py 188 real_offset = offset1 real_offset = int(real_offset) + offset2 + offset_offset assert isinstance(real_offset, int), offset - shares[shnum] = flip_bit(data, real_offset) + if offset1 == 0: # verbyte + f = add_two + else: + f = flip_bit + shares[shnum] = f(data, real_offset) return res def make_storagebroker(s=None, num_peers=10): hunk ./src/allmydata/test/test_mutable.py 625 d.addCallback(_created) return d - def publish_multiple(self): + def publish_mdmf(self): + # like publish_one, except that the result is guaranteed to be + # an MDMF file. + # self.CONTENTS should have more than one segment.
+ self.CONTENTS = "This is an MDMF file" * 100000 + self._storage = FakeStorage() + self._nodemaker = make_nodemaker(self._storage) + self._storage_broker = self._nodemaker.storage_broker + d = self._nodemaker.create_mutable_file(self.CONTENTS, version=1) + def _created(node): + self._fn = node + self._fn2 = self._nodemaker.create_from_cap(node.get_uri()) + d.addCallback(_created) + return d + + + def publish_sdmf(self): + # like publish_one, except that the result is guaranteed to be + # an SDMF file + self.CONTENTS = "This is an SDMF file" * 1000 + self._storage = FakeStorage() + self._nodemaker = make_nodemaker(self._storage) + self._storage_broker = self._nodemaker.storage_broker + d = self._nodemaker.create_mutable_file(self.CONTENTS, version=0) + def _created(node): + self._fn = node + self._fn2 = self._nodemaker.create_from_cap(node.get_uri()) + d.addCallback(_created) + return d + + + def publish_multiple(self, version=0): self.CONTENTS = ["Contents 0", "Contents 1", "Contents 2", hunk ./src/allmydata/test/test_mutable.py 665 self._copied_shares = {} self._storage = FakeStorage() self._nodemaker = make_nodemaker(self._storage) - d = self._nodemaker.create_mutable_file(self.CONTENTS[0]) # seqnum=1 + d = self._nodemaker.create_mutable_file(self.CONTENTS[0], version=version) # seqnum=1 def _created(node): self._fn = node # now create multiple versions of the same file, and accumulate hunk ./src/allmydata/test/test_mutable.py 689 d.addCallback(_created) return d + def _copy_shares(self, ignored, index): shares = self._storage._peers # we need a deep copy hunk ./src/allmydata/test/test_mutable.py 842 self._storage._peers = {} # delete all shares ms = self.make_servermap d = defer.succeed(None) - +# d.addCallback(lambda res: ms(mode=MODE_CHECK)) d.addCallback(lambda sm: self.failUnlessNoneRecoverable(sm)) hunk ./src/allmydata/test/test_mutable.py 894 return d + def test_servermapupdater_finds_mdmf_files(self): + # First, publish an MDMF file.
We just need to + # make sure that when we run the ServermapUpdater, the file is + # reported to have one recoverable version. + d = defer.succeed(None) + d.addCallback(lambda ignored: + self.publish_mdmf()) + d.addCallback(lambda ignored: + self.make_servermap(mode=MODE_CHECK)) + # Calling make_servermap also updates the servermap in the mode + # that we specify, so we just need to see what it says. + def _check_servermap(sm): + self.failUnlessEqual(len(sm.recoverable_versions()), 1) + d.addCallback(_check_servermap) + # Now, we upload more versions + d.addCallback(lambda ignored: + self.publish_multiple(version=1)) + d.addCallback(lambda ignored: + self.make_servermap(mode=MODE_CHECK)) + def _check_servermap_multiple(sm): + v = sm.recoverable_versions() + i = sm.unrecoverable_versions() + d.addCallback(_check_servermap_multiple) + return d + test_servermapupdater_finds_mdmf_files.todo = ("I don't know how to " + "write this yet") + + + def test_servermapupdater_finds_sdmf_files(self): + d = defer.succeed(None) + d.addCallback(lambda ignored: + self.publish_sdmf()) + d.addCallback(lambda ignored: + self.make_servermap(mode=MODE_CHECK)) + d.addCallback(lambda servermap: + self.failUnlessEqual(len(servermap.recoverable_versions()), 1)) + return d + class Roundtrip(unittest.TestCase, testutil.ShouldFailMixin, PublishMixin): def setUp(self): hunk ./src/allmydata/test/test_mutable.py 1084 return d def test_corrupt_all_verbyte(self): - # when the version byte is not 0, we hit an UnknownVersionError error - # in unpack_share(). + # when the version byte is not 0 or 1, we hit an UnknownVersionError + # in unpack_share().
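Why the version-byte corruption uses add_two rather than flip_bit: with SDMF and MDMF versions 0 and 1, flipping the low bit just swaps one valid version for the other, while XORing with 0x02 always yields 2 or 3. This sketch assumes those two version numbers and reimplements both helpers with bytearray so they run on Python 3; it is illustrative, not the test suite's code.

```python
SDMF_VERSION = 0   # assumed values, matching the "not 0 or 1" comment above
MDMF_VERSION = 1
VALID_VERSIONS = (SDMF_VERSION, MDMF_VERSION)


def flip_bit(data, byte_offset):
    """Flip the low bit of one byte, like flip_bit in test_mutable.py."""
    b = bytearray(data)
    b[byte_offset] ^= 0x01
    return bytes(b)


def add_two(data, byte_offset):
    """XOR one byte with 0x02, like add_two in test_mutable.py."""
    b = bytearray(data)
    b[byte_offset] ^= 0x02
    return bytes(b)


for version in VALID_VERSIONS:
    share = bytes(bytearray([version])) + b"rest of share"
    flipped = bytearray(flip_bit(share, 0))[0]  # 0 -> 1, 1 -> 0: still valid
    bumped = bytearray(add_two(share, 0))[0]    # 0 -> 2, 1 -> 3: invalid
```

Because the corrupted byte is never a valid version, every share fails to unpack, which is consistent with the larger problem count the updated test expects.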
d = self._test_corrupt_all(0, "UnknownVersionError") def _check_servermap(servermap): # and the dump should mention the problems hunk ./src/allmydata/test/test_mutable.py 1091 s = StringIO() dump = servermap.dump(s).getvalue() - self.failUnless("10 PROBLEMS" in dump, dump) + self.failUnless("30 PROBLEMS" in dump, dump) d.addCallback(_check_servermap) return d hunk ./src/allmydata/test/test_mutable.py 2153 self.basedir = "mutable/Problems/test_privkey_query_missing" self.set_up_grid(num_servers=20) nm = self.g.clients[0].nodemaker - LARGE = "These are Larger contents" * 2000 # about 50KB + LARGE = "These are Larger contents" * 2000 # about 50KiB nm._node_cache = DevNullDictionary() # disable the nodecache d = nm.create_mutable_file(LARGE) } [Make a segmented downloader Kevan Carstensen **20100623001332 Ignore-this: f3543532a5d573cc884c17a4ebbf451e Rework the current mutable file Retrieve class to download segmented files. The rewrite preserves the semantics and basic conceptual state machine of the old Retrieve class, but adapts them to work with files with more than one segment, which involves a fairly substantial rewrite. I've also adapted some existing SDMF tests to work with the new downloader, as necessary. TODO: - Write tests for MDMF functionality. 
- Finish writing and testing salt functionality ] { hunk ./src/allmydata/mutable/retrieve.py 9 from twisted.python import failure from foolscap.api import DeadReferenceError, eventually, fireEventually from allmydata.interfaces import IRetrieveStatus, NotEnoughSharesError -from allmydata.util import hashutil, idlib, log +from allmydata.util import hashutil, idlib, log, mathutil from allmydata import hashtree, codec from allmydata.storage.server import si_b2a from pycryptopp.cipher.aes import AES hunk ./src/allmydata/mutable/retrieve.py 16 from pycryptopp.publickey import rsa from allmydata.mutable.common import DictOfSets, CorruptShareError, UncoordinatedWriteError -from allmydata.mutable.layout import SIGNED_PREFIX, unpack_share_data +from allmydata.mutable.layout import SIGNED_PREFIX, unpack_share_data, \ + MDMFSlotReadProxy class RetrieveStatus: implements(IRetrieveStatus) hunk ./src/allmydata/mutable/retrieve.py 103 self.verinfo = verinfo # during repair, we may be called upon to grab the private key, since # it wasn't picked up during a verify=False checker run, and we'll - # need it for repair to generate the a new version. + # need it for repair to generate a new version. self._need_privkey = fetch_privkey if self._node.get_privkey(): self._need_privkey = False hunk ./src/allmydata/mutable/retrieve.py 108 + if self._need_privkey: + # TODO: Evaluate the need for this. We'll use it if we want + # to limit how many queries are on the wire for the privkey + # at once. + self._privkey_query_markers = [] # one Marker for each time we've + # tried to get the privkey. 
+ self._status = RetrieveStatus() self._status.set_storage_index(self._storage_index) self._status.set_helper(False) hunk ./src/allmydata/mutable/retrieve.py 124 offsets_tuple) = self.verinfo self._status.set_size(datalength) self._status.set_encoding(k, N) + self.readers = {} def get_status(self): return self._status hunk ./src/allmydata/mutable/retrieve.py 148 self.remaining_sharemap = DictOfSets() for (shnum, peerid, timestamp) in shares: self.remaining_sharemap.add(shnum, peerid) + # If the servermap update fetched anything, it fetched at least 1 + # KiB, so we ask for that much. + # TODO: Change the cache methods to allow us to fetch all of the + # data that they have, then change this method to do that. + any_cache, timestamp = self._node._read_from_cache(self.verinfo, + shnum, + 0, + 1000) + ss = self.servermap.connections[peerid] + reader = MDMFSlotReadProxy(ss, + self._storage_index, + shnum, + any_cache) + reader.peerid = peerid + self.readers[shnum] = reader + self.shares = {} # maps shnum to validated blocks hunk ./src/allmydata/mutable/retrieve.py 166 + self._active_readers = [] # list of active readers for this dl. + self._validated_readers = set() # set of readers that we have + # validated the prefix of + self._block_hash_trees = {} # shnum => hashtree + # TODO: Make this into a file-backed consumer or something to + # conserve memory. + self._plaintext = "" # how many shares do we need? hunk ./src/allmydata/mutable/retrieve.py 175 - (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, + (seqnum, + root_hash, + IV, + segsize, + datalength, + k, + N, + prefix, offsets_tuple) = self.verinfo hunk ./src/allmydata/mutable/retrieve.py 184 - assert len(self.remaining_sharemap) >= k - # we start with the lowest shnums we have available, since FEC is - # faster if we're using "primary shares" - self.active_shnums = set(sorted(self.remaining_sharemap.keys())[:k]) - for shnum in self.active_shnums: - # we use an arbitrary peer who has the share. 
If shares are - # doubled up (more than one share per peer), we could make this - # run faster by spreading the load among multiple peers. But the - # algorithm to do that is more complicated than I want to write - # right now, and a well-provisioned grid shouldn't have multiple - # shares per peer. - peerid = list(self.remaining_sharemap[shnum])[0] - self.get_data(shnum, peerid) hunk ./src/allmydata/mutable/retrieve.py 185 - # control flow beyond this point: state machine. Receiving responses - # from queries is the input. We might send out more queries, or we - # might produce a result. hunk ./src/allmydata/mutable/retrieve.py 186 + # We need one share hash tree for the entire file; its leaves + # are the roots of the block hash trees for the shares that + # comprise it, and its root is in the verinfo. + self.share_hash_tree = hashtree.IncompleteHashTree(N) + self.share_hash_tree.set_hashes({0: root_hash}) + + # This will set up both the segment decoder and the tail segment + # decoder, as well as a variety of other instance variables that + # the download process will use. + self._setup_encoding_parameters() + assert len(self.remaining_sharemap) >= k + + self.log("starting download") + self._add_active_peers() + # The download process beyond this is a state machine. + # _add_active_peers will select the peers that we want to use + # for the download, and then attempt to start downloading. After + # each segment, it will check for doneness, reacting to broken + # peers and corrupt shares as necessary. If it runs out of good + # peers before downloading all of the segments, _done_deferred + # will errback. Otherwise, it will eventually callback with the + # contents of the mutable file. 
return self._done_deferred hunk ./src/allmydata/mutable/retrieve.py 210 - def get_data(self, shnum, peerid): - self.log(format="sending sh#%(shnum)d request to [%(peerid)s]", - shnum=shnum, - peerid=idlib.shortnodeid_b2a(peerid), - level=log.NOISY) - ss = self.servermap.connections[peerid] - started = time.time() - (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, + + def _setup_encoding_parameters(self): + """ + I set up the encoding parameters, including k, n, the number + of segments associated with this file, and the segment decoder. + I do not set the tail segment decoder, which is set in the + method that decodes the tail segment, as it is single-use. + """ + # XXX: Or is it? What if servers fail in the last step? + (seqnum, + root_hash, + IV, + segsize, + datalength, + k, + n, + known_prefix, offsets_tuple) = self.verinfo hunk ./src/allmydata/mutable/retrieve.py 228 - offsets = dict(offsets_tuple) + self._required_shares = k + self._total_shares = n + self._segment_size = segsize + self._data_length = datalength + if datalength and segsize: + self._num_segments = mathutil.div_ceil(datalength, segsize) + self._tail_data_size = datalength % segsize + else: + self._num_segments = 0 + self._tail_data_size = 0 + + self._segment_decoder = codec.CRSDecoder() + self._segment_decoder.set_params(segsize, k, n) + self._current_segment = 0 + + if not self._tail_data_size: + self._tail_data_size = segsize hunk ./src/allmydata/mutable/retrieve.py 246 - # we read the checkstring, to make sure that the data we grab is from - # the right version. 
- readv = [ (0, struct.calcsize(SIGNED_PREFIX)) ] + self._tail_segment_size = mathutil.next_multiple(self._tail_data_size, + self._required_shares) + if self._tail_segment_size == self._segment_size: + self._tail_decoder = self._segment_decoder + else: + self._tail_decoder = codec.CRSDecoder() + self._tail_decoder.set_params(self._tail_segment_size, + self._required_shares, + self._total_shares) + + self.log("got encoding parameters: " + "k: %d " + "n: %d " + "%d segments of %d bytes each (%d byte tail segment)" % \ + (k, n, self._num_segments, self._segment_size, + self._tail_segment_size)) + + for i in xrange(self._total_shares): + # So we don't have to do this later. + self._block_hash_trees[i] = hashtree.IncompleteHashTree(self._num_segments) hunk ./src/allmydata/mutable/retrieve.py 267 - # We also read the data, and the hashes necessary to validate them - # (share_hash_chain, block_hash_tree, share_data). We don't read the - # signature or the pubkey, since that was handled during the - # servermap phase, and we'll be comparing the share hash chain - # against the roothash that was validated back then. + # If we have more than one segment, we are an MDMF file, which + # means that we need to validate the salts as we receive them. + self._salt_hash_tree = hashtree.IncompleteHashTree(self._num_segments) + self._salt_hash_tree[0] = IV # from the prefix. + hunk ./src/allmydata/mutable/retrieve.py 273 - readv.append( (offsets['share_hash_chain'], - offsets['enc_privkey'] - offsets['share_hash_chain'] ) ) + def _add_active_peers(self): + """ + I populate self._active_readers with enough active readers to + retrieve the contents of this mutable file. I am called before + downloading starts, and (eventually) after each validation + error, connection error, or other problem in the download. + """ + # TODO: It would be cool to investigate other heuristics for + # reader selection.
For instance, the cost (in time the user + # spends waiting for their file) of selecting a really slow peer + # that happens to have a primary share is probably more than + # selecting a really fast peer that doesn't have a primary + # share. Maybe the servermap could be extended to provide this + # information; it could keep track of latency information while + # it gathers more important data, and then this routine could + # use that to select active readers. + # + # (these and other questions would be easier to answer with a + # robust, configurable tahoe-lafs simulator, which modeled node + # failures, differences in node speed, and other characteristics + # that we expect storage servers to have. You could have + # presets for really stable grids (like allmydata.com), + # friendnets, make it easy to configure your own settings, and + # then simulate the effect of big changes on these use cases + # instead of just reasoning about what the effect might be. Out + # of scope for MDMF, though.) hunk ./src/allmydata/mutable/retrieve.py 300 - # if we need the private key (for repair), we also fetch that - if self._need_privkey: - readv.append( (offsets['enc_privkey'], - offsets['EOF'] - offsets['enc_privkey']) ) + # We need at least self._required_shares readers to download a + # segment. + needed = self._required_shares - len(self._active_readers) + # XXX: Why don't format= log messages work here? + self.log("adding %d peers to the active peers list" % needed) hunk ./src/allmydata/mutable/retrieve.py 306 - m = Marker() - self._outstanding_queries[m] = (peerid, shnum, started) + # We favor lower numbered shares, since FEC is faster with + # primary shares than with other shares, and lower-numbered + # shares are more likely to be primary than higher numbered + # shares. 
+ active_shnums = set(sorted(self.remaining_sharemap.keys())) + active_shnums = list(active_shnums)[:needed] + if len(active_shnums) < needed: + # We don't have enough readers to retrieve the file; fail. + return self._failed() hunk ./src/allmydata/mutable/retrieve.py 316 - # ask the cache first - got_from_cache = False - datavs = [] - for (offset, length) in readv: - (data, timestamp) = self._node._read_from_cache(self.verinfo, shnum, - offset, length) - if data is not None: - datavs.append(data) - if len(datavs) == len(readv): - self.log("got data from cache") - got_from_cache = True - d = fireEventually({shnum: datavs}) - # datavs is a dict mapping shnum to a pair of strings - else: - d = self._do_read(ss, peerid, self._storage_index, [shnum], readv) - self.remaining_sharemap.discard(shnum, peerid) + for shnum in active_shnums: + self._active_readers.append(self.readers[shnum]) + self.log("added reader for share %d" % shnum) + assert len(self._active_readers) == self._required_shares + # Conceptually, this is part of the _add_active_peers step. It + # validates the prefixes of newly added readers to make sure + # that they match what we are expecting for self.verinfo. If + # validation is successful, _validate_active_prefixes will call + # _download_current_segment for us. If validation is + # unsuccessful, then _validate_active_prefixes will remove the peer and + # call _add_active_peers again, where we will attempt to rectify + # the problem by choosing another peer. + return self._validate_active_prefixes() hunk ./src/allmydata/mutable/retrieve.py 330 - d.addCallback(self._got_results, m, peerid, started, got_from_cache) - d.addErrback(self._query_failed, m, peerid) - # errors that aren't handled by _query_failed (and errors caused by - # _query_failed) get logged, but we still want to check for doneness.
- def _oops(f): - self.log(format="problem in _query_failed for sh#%(shnum)d to %(peerid)s", - shnum=shnum, - peerid=idlib.shortnodeid_b2a(peerid), - failure=f, - level=log.WEIRD, umid="W0xnQA") - d.addErrback(_oops) - d.addBoth(self._check_for_done) - # any error during _check_for_done means the download fails. If the - # download is successful, _check_for_done will fire _done by itself. - d.addErrback(self._done) - d.addErrback(log.err) - return d # purely for testing convenience hunk ./src/allmydata/mutable/retrieve.py 331 - def _do_read(self, ss, peerid, storage_index, shnums, readv): - # isolate the callRemote to a separate method, so tests can subclass - # Publish and override it - d = ss.callRemote("slot_readv", storage_index, shnums, readv) - return d + def _validate_active_prefixes(self): + """ + I check to make sure that the prefixes on the peers that I am + currently reading from match the prefix that we want to see, as + said in self.verinfo. + + If I find that all of the active peers have acceptable prefixes, + I pass control to _download_current_segment, which will use + those peers to do cool things. If I find that some of the active + peers have unacceptable prefixes, I will remove them from active + peers (and from further consideration) and call + _add_active_peers to attempt to rectify the situation. I keep + track of which peers I have already validated so that I don't + need to do so again. + """ + assert self._active_readers, "No more active readers" hunk ./src/allmydata/mutable/retrieve.py 348 - def remove_peer(self, peerid): + ds = [] + new_readers = set(self._active_readers) - self._validated_readers + self.log('validating %d newly-added active readers' % len(new_readers)) + + for reader in new_readers: + # We force a remote read here -- otherwise, we are relying + # on cached data that we already verified as valid, and we + # won't detect an uncoordinated write that has occurred + # since the last servermap update. 
+ d = reader.get_prefix(force_remote=True) + d.addCallback(self._try_to_validate_prefix, reader) + ds.append(d) + dl = defer.DeferredList(ds, consumeErrors=True) + def _check_results(results): + # Each result in results will be of the form (success, msg). + # We don't care about msg, but success will tell us whether + # or not the checkstring validated. If it didn't, we need to + # remove the offending (peer,share) from our active readers, + # and ensure that active readers is again populated. + bad_readers = [] + for i, result in enumerate(results): + if not result[0]: + reader = self._active_readers[i] + f = result[1] + assert isinstance(f, failure.Failure) + + self.log("The reader %s failed to " + "properly validate: %s" % \ + (reader, str(f.value))) + bad_readers.append((reader, f)) + else: + reader = self._active_readers[i] + self.log("the reader %s checks out, so we'll use it" % \ + reader) + self._validated_readers.add(reader) + # Each time we validate a reader, we check to see if + # we need the private key. If we do, we politely ask + # for it and then continue computing. If we find + # that we haven't gotten it at the end of + # segment decoding, then we'll take more drastic + # measures. + if self._need_privkey: + d = reader.get_encprivkey() + d.addCallback(self._try_to_validate_privkey, reader) + if bad_readers: + # We do them all at once, or else we screw up list indexing. + for (reader, f) in bad_readers: + self._mark_bad_share(reader, f) + return self._add_active_peers() + else: + return self._download_current_segment() + # The next step will assert that it has enough active + # readers to fetch shares; we just need to remove it. + dl.addCallback(_check_results) + return dl + + + def _try_to_validate_prefix(self, prefix, reader): + """ + I check that the prefix returned by a candidate server for + retrieval matches the prefix that the servermap knows about + (and, hence, the prefix that was validated earlier). 
If it does, + I return normally, which means that I approve of the use of the + candidate server for segment retrieval. If it doesn't, I raise + UncoordinatedWriteError, which means that another server must be chosen. + """ + (seqnum, + root_hash, + IV, + segsize, + datalength, + k, + N, + known_prefix, + offsets_tuple) = self.verinfo + if known_prefix != prefix: + self.log("prefix from share %d doesn't match" % reader.shnum) + raise UncoordinatedWriteError("Mismatched prefix -- this could " + "indicate an uncoordinated write") + # Otherwise, we're okay -- no issues. + + + def _remove_reader(self, reader): + """ + At various points, we will wish to remove a peer from + consideration and/or use. These include, but are not necessarily + limited to: + + - A connection error. + - A mismatched prefix (that is, a prefix that does not match + our conception of the version information string). + - A failing block hash, salt hash, or share hash, which can + indicate disk failure/bit flips, or network trouble. + + This method will do that. I will make sure that the + (shnum,reader) combination represented by my reader argument is + not used for anything else during this download. I will not + advise the reader of any corruption, something that my callers + may wish to do on their own. + """ + # TODO: When you're done writing this, see if this is ever + # actually used for something that _mark_bad_share isn't. I have + # a feeling that they will be used for very similar things, and + # that having them both here is just going to be an epic amount + # of code duplication. + # + # (well, okay, not epic, but meaningful) + self.log("removing reader %s" % reader) + # Remove the reader from _active_readers + self._active_readers.remove(reader) + # TODO: self.readers.remove(reader)? for shnum in list(self.remaining_sharemap.keys()): hunk ./src/allmydata/mutable/retrieve.py 460 - self.remaining_sharemap.discard(shnum, peerid) + # TODO: Make sure that we set reader.peerid somewhere.
+ self.remaining_sharemap.discard(shnum, reader.peerid) hunk ./src/allmydata/mutable/retrieve.py 463 - def _got_results(self, datavs, marker, peerid, started, got_from_cache): - now = time.time() - elapsed = now - started - if not got_from_cache: - self._status.add_fetch_timing(peerid, elapsed) - self.log(format="got results (%(shares)d shares) from [%(peerid)s]", - shares=len(datavs), - peerid=idlib.shortnodeid_b2a(peerid), - level=log.NOISY) - self._outstanding_queries.pop(marker, None) - if not self._running: - return hunk ./src/allmydata/mutable/retrieve.py 464 - # note that we only ask for a single share per query, so we only - # expect a single share back. On the other hand, we use the extra - # shares if we get them.. seems better than an assert(). + def _mark_bad_share(self, reader, f): + """ + I mark the (peerid, shnum) encapsulated by my reader argument as + a bad share, which means that it will not be used anywhere else. + + There are several reasons to want to mark something as a bad + share. These include: + + - A connection error to the peer. + - A mismatched prefix (that is, a prefix that does not match + our local conception of the version information string). + - A failing block hash, salt hash, share hash, or other + integrity check. hunk ./src/allmydata/mutable/retrieve.py 478 - for shnum,datav in datavs.items(): - (prefix, hash_and_data) = datav[:2] + This method will ensure that readers that we wish to mark bad + (for these reasons or other reasons) are not used for the rest + of the download. Additionally, it will attempt to tell the + remote peer (with no guarantee of success) that its share is + corrupt. 
+ """ + self.log("marking share %d on server %s as bad" % \ + (reader.shnum, reader)) + self._remove_reader(reader) + self._bad_shares.add((reader.peerid, reader.shnum)) + self._status.problems[reader.peerid] = f + self._last_failure = f + self.notify_server_corruption(reader.peerid, reader.shnum, f.value) + + + def _download_current_segment(self): + """ + I download, validate, decode, decrypt, and assemble the segment + that this Retrieve is currently responsible for downloading. + """ + assert len(self._active_readers) >= self._required_shares + if self._current_segment < self._num_segments: + d = self._process_segment(self._current_segment) + else: + d = defer.succeed(None) + d.addCallback(self._check_for_done) + return d + + + def _process_segment(self, segnum): + """ + I download, validate, decode, and decrypt one segment of the + file that this Retrieve is retrieving. This means coordinating + the process of getting k blocks of that file, validating them, + assembling them into one segment with the decoder, and then + decrypting them. + """ + self.log("processing segment %d" % segnum) + + # TODO: The old code uses a marker. Should this code do that + # too? What did the Marker do? + assert len(self._active_readers) >= self._required_shares + + # We need to ask each of our active readers for its block and + # salt. We will then validate those. If validation is + # successful, we will assemble the results into plaintext. + ds = [] + for reader in self._active_readers: + d = reader.get_block_and_salt(segnum) + d.addCallback(self._validate_block, segnum, reader) + d.addErrback(self._validation_failed, reader) + ds.append(d) + dl = defer.DeferredList(ds) + dl.addCallback(self._maybe_decode_and_decrypt_segment, segnum) + return dl + + + def _maybe_decode_and_decrypt_segment(self, blocks_and_salts, segnum): + """ + I take the results of fetching and validating the blocks from a + callback chain in another method. 
If the results are such that + they tell me that validation and fetching succeeded without + incident, I will proceed with decoding and decryption. + Otherwise, I will do nothing. + """ + self.log("trying to decode and decrypt segment %d" % segnum) + failures = False + for block_and_salt in blocks_and_salts: + if not block_and_salt[0] or block_and_salt[1] == None: + self.log("some validation operations failed; not proceeding") + failures = True + break + if not failures: + self.log("everything looks ok, building segment %d" % segnum) + d = self._decode_blocks(blocks_and_salts, segnum) + d.addCallback(self._decrypt_segment) + d.addErrback(self._decoding_or_decrypting_failed) + d.addCallback(self._set_segment) + return d + else: + return defer.succeed(None) + + + def _set_segment(self, segment): + """ + Given a plaintext segment, I register that segment with the + target that is handling the file download. + """ + self.log("got plaintext for segment %d" % self._current_segment) + self._plaintext += segment + self._current_segment += 1 + + + def _validation_failed(self, f, reader): + """ + I am called when a block or a salt fails to correctly validate. + I react to this failure by notifying the remote server of + corruption, and then removing the remote peer from further + activity. + """ + self.log("validation failed on share %d, peer %s, segment %d: %s" % \ + (reader.shnum, reader, self._current_segment, str(f))) + self._mark_bad_share(reader, f) + return + + + def _decoding_or_decrypting_failed(self, f): + """ + I am called when a list of blocks fails to decode into a segment + of crypttext, or fails to decrypt (for whatever reason) into a + segment of plaintext. I exist to make a log message about this + failure: my other job is to mark a share as corrupt, which is + not hard. + """ + # XXX: Is this correct? When we're dealing with validation + # failures, it's easy to say that one share or one server was + # responsible for the failure. 
Is it so easy when decoding or + # decrypting fails? Maybe we should just log here, and try + # again? Of course, that could lead to infinite loops if + # something *is* wrong, because the state machine will just keep + # trying to download the broken segment over and over and + # over... + self.log("decoding or decrypting failed on segment %d: %s" % \ + (self._current_segment, str(f.value))) + for reader in self._active_readers: + self._mark_bad_share(reader, f) + + assert len(self._active_readers) == 0 + return + + + def _validate_block(self, (block, salt), segnum, reader): + """ + I validate a block from one share on a remote server. + """ + # Grab the part of the block hash tree that is necessary to + # validate this block, then generate the block hash root. + d = self._get_needed_hashes(reader, segnum) + def _handle_validation(block_and_sharehashes): + self.log("validating share %d for segment %d" % (reader.shnum, + segnum)) + blockhashes, sharehashes = block_and_sharehashes + blockhashes = dict(enumerate(blockhashes[1])) + bht = self._block_hash_trees[reader.shnum] + # If we needed sharehashes in the last step, we'll want to + # get those dealt with before we start processing the + # blockhashes. + if self.share_hash_tree.needed_hashes(reader.shnum): + try: + self.share_hash_tree.set_hashes(hashes=sharehashes[1]) + except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \ + IndexError), e: + # XXX: This is a stupid message -- make it more + # informative. 
+ raise CorruptShareError(reader.peerid, + reader.shnum, + "corrupt hashes: %s" % e) + + if not bht[0]: + share_hash = self.share_hash_tree.get_leaf(reader.shnum) + if not share_hash: + raise CorruptShareError(reader.peerid, + reader.shnum, + "missing the root hash") + bht.set_hashes({0: share_hash}) + + if bht.needed_hashes(segnum, include_leaf=True): + try: + bht.set_hashes(blockhashes) + except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \ + IndexError), e: + raise CorruptShareError(reader.peerid, + reader.shnum, + "block hash tree failure: %s" % e) + + blockhash = hashutil.block_hash(block) + self.log("got blockhash %s" % [blockhash]) + self.log("comparing to tree %s" % bht) + # If this works without an error, then validation is + # successful. try: hunk ./src/allmydata/mutable/retrieve.py 659 - self._got_results_one_share(shnum, peerid, - prefix, hash_and_data) - except CorruptShareError, e: - # log it and give the other shares a chance to be processed - f = failure.Failure() - self.log(format="bad share: %(f_value)s", - f_value=str(f.value), failure=f, - level=log.WEIRD, umid="7fzWZw") - self.notify_server_corruption(peerid, shnum, str(e)) - self.remove_peer(peerid) - self.servermap.mark_bad_share(peerid, shnum, prefix) - self._bad_shares.add( (peerid, shnum) ) - self._status.problems[peerid] = f - self._last_failure = f - pass - if self._need_privkey and len(datav) > 2: - lp = None - self._try_to_validate_privkey(datav[2], peerid, shnum, lp) - # all done! + bht.set_hashes(leaves={segnum: blockhash}) + except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \ + IndexError), e: + raise CorruptShareError(reader.peerid, + reader.shnum, + "block hash tree failure: %s" % e) + + # TODO: Validate the salt, too. 
+ self.log('share %d is valid for segment %d' % (reader.shnum, + segnum)) + return {reader.shnum: (block, salt)} + d.addCallback(_handle_validation) + return d + + + def _get_needed_hashes(self, reader, segnum): + """ + I get the hashes needed to validate segnum from the reader, then return + to my caller when this is done. + """ + bht = self._block_hash_trees[reader.shnum] + needed = bht.needed_hashes(segnum, include_leaf=True) + # The root of the block hash tree is also a leaf in the share + # hash tree. So we don't need to fetch it from the remote + # server. In the case of files with one segment, this means that + # we won't fetch any block hash tree from the remote server, + # since the hash of each share of the file is the entire block + # hash tree, and is a leaf in the share hash tree. This is fine, + # since any share corruption will be detected in the share hash + # tree. + needed.discard(0) + # XXX: not now, causes test failures. + self.log("getting blockhashes for segment %d, share %d: %s" % \ + (segnum, reader.shnum, str(needed))) + d1 = reader.get_blockhashes(needed) + if self.share_hash_tree.needed_hashes(reader.shnum): + need = self.share_hash_tree.needed_hashes(reader.shnum) + self.log("also need sharehashes for share %d: %s" % (reader.shnum, + str(need))) + d2 = reader.get_sharehashes(need) + else: + d2 = defer.succeed(None) + dl = defer.DeferredList([d1, d2]) + return dl + + + def _decode_blocks(self, blocks_and_salts, segnum): + """ + I take a list of k blocks and salts, and decode that into a + single encrypted segment. + """ + d = {} + # We want to merge our dictionaries to the form + # {shnum: blocks_and_salts} + # + # The dictionaries come from validate block that way, so we just + # need to merge them. + for block_and_salt in blocks_and_salts: + d.update(block_and_salt[1]) + + # All of these blocks should have the same salt; in SDMF, it is + # the file-wide IV, while in MDMF it is the per-segment salt. 
In + # either case, we just need to get one of them and use it. + # + # d.items()[0] is like (shnum, (block, salt)) + # d.items()[0][1] is like (block, salt) + # d.items()[0][1][1] is the salt. + salt = d.items()[0][1][1] + # Next, extract just the blocks from the dict. We'll use the + # salt in the next step. + share_and_shareids = [(k, v[0]) for k, v in d.items()] + d2 = dict(share_and_shareids) + shareids = [] + shares = [] + for shareid, share in d2.items(): + shareids.append(shareid) + shares.append(share) + + assert len(shareids) >= self._required_shares, len(shareids) + # zfec really doesn't want extra shares + shareids = shareids[:self._required_shares] + shares = shares[:self._required_shares] + self.log("decoding segment %d" % segnum) + if segnum == self._num_segments - 1: + d = defer.maybeDeferred(self._tail_decoder.decode, shares, shareids) + else: + d = defer.maybeDeferred(self._segment_decoder.decode, shares, shareids) + def _process(buffers): + segment = "".join(buffers) + self.log(format="now decoding segment %(segnum)s of %(numsegs)s", + segnum=segnum, + numsegs=self._num_segments, + level=log.NOISY) + self.log(" joined length %d, datalength %d" % + (len(segment), self._data_length)) + if segnum == self._num_segments - 1: + size_to_use = self._tail_data_size + else: + size_to_use = self._segment_size + segment = segment[:size_to_use] + self.log(" segment len=%d" % len(segment)) + return segment, salt + d.addCallback(_process) + return d + + + def _decrypt_segment(self, segment_and_salt): + """ + I take a single segment and its salt, and decrypt it. I return + the plaintext of the segment that is in my argument. 
+ """ + segment, salt = segment_and_salt + self._status.set_status("decrypting") + self.log("decrypting segment %d" % self._current_segment) + started = time.time() + key = hashutil.ssk_readkey_data_hash(salt, self._node.get_readkey()) + decryptor = AES(key) + plaintext = decryptor.process(segment) + self._status.timings["decrypt"] = time.time() - started + return plaintext + def notify_server_corruption(self, peerid, shnum, reason): ss = self.servermap.connections[peerid] hunk ./src/allmydata/mutable/retrieve.py 786 ss.callRemoteOnly("advise_corrupt_share", "mutable", self._storage_index, shnum, reason) - def _got_results_one_share(self, shnum, peerid, - got_prefix, got_hash_and_data): - self.log("_got_results: got shnum #%d from peerid %s" - % (shnum, idlib.shortnodeid_b2a(peerid))) - (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, - offsets_tuple) = self.verinfo - assert len(got_prefix) == len(prefix), (len(got_prefix), len(prefix)) - if got_prefix != prefix: - msg = "someone wrote to the data since we read the servermap: prefix changed" - raise UncoordinatedWriteError(msg) - (share_hash_chain, block_hash_tree, - share_data) = unpack_share_data(self.verinfo, got_hash_and_data) - - assert isinstance(share_data, str) - # build the block hash tree. SDMF has only one leaf. - leaves = [hashutil.block_hash(share_data)] - t = hashtree.HashTree(leaves) - if list(t) != block_hash_tree: - raise CorruptShareError(peerid, shnum, "block hash tree failure") - share_hash_leaf = t[0] - t2 = hashtree.IncompleteHashTree(N) - # root_hash was checked by the signature - t2.set_hashes({0: root_hash}) - try: - t2.set_hashes(hashes=share_hash_chain, - leaves={shnum: share_hash_leaf}) - except (hashtree.BadHashError, hashtree.NotEnoughHashesError, - IndexError), e: - msg = "corrupt hashes: %s" % (e,) - raise CorruptShareError(peerid, shnum, msg) - self.log(" data valid! 
len=%d" % len(share_data)) - # each query comes down to this: placing validated share data into - # self.shares - self.shares[shnum] = share_data hunk ./src/allmydata/mutable/retrieve.py 787 - def _try_to_validate_privkey(self, enc_privkey, peerid, shnum, lp): + def _try_to_validate_privkey(self, enc_privkey, reader): alleged_privkey_s = self._node._decrypt_privkey(enc_privkey) alleged_writekey = hashutil.ssk_writekey_hash(alleged_privkey_s) hunk ./src/allmydata/mutable/retrieve.py 793 if alleged_writekey != self._node.get_writekey(): self.log("invalid privkey from %s shnum %d" % - (idlib.nodeid_b2a(peerid)[:8], shnum), - parent=lp, level=log.WEIRD, umid="YIw4tA") + (reader, reader.shnum), + level=log.WEIRD, umid="YIw4tA") return # it's good hunk ./src/allmydata/mutable/retrieve.py 798 - self.log("got valid privkey from shnum %d on peerid %s" % - (shnum, idlib.shortnodeid_b2a(peerid)), - parent=lp) + self.log("got valid privkey from shnum %d on reader %s" % + (reader.shnum, reader)) privkey = rsa.create_signing_key_from_string(alleged_privkey_s) self._node._populate_encprivkey(enc_privkey) self._node._populate_privkey(privkey) hunk ./src/allmydata/mutable/retrieve.py 805 self._need_privkey = False + def _query_failed(self, f, marker, peerid): self.log(format="query to [%(peerid)s] failed", peerid=idlib.shortnodeid_b2a(peerid), hunk ./src/allmydata/mutable/retrieve.py 822 self.log(format="error during query: %(f_value)s", f_value=str(f.value), failure=f, level=level, umid="gOJB5g") - def _check_for_done(self, res): - # exit paths: - # return : keep waiting, no new queries - # return self._send_more_queries(outstanding) : send some more queries - # fire self._done(plaintext) : download successful - # raise exception : download fails - - self.log(format="_check_for_done: running=%(running)s, decoding=%(decoding)s", - running=self._running, decoding=self._decoding, - level=log.NOISY) - if not self._running: - return - if self._decoding: - return - (seqnum, root_hash, 
IV, segsize, datalength, k, N, prefix, - offsets_tuple) = self.verinfo - - if len(self.shares) < k: - # we don't have enough shares yet - return self._maybe_send_more_queries(k) - if self._need_privkey: - # we got k shares, but none of them had a valid privkey. TODO: - # look further. Adding code to do this is a bit complicated, and - # I want to avoid that complication, and this should be pretty - # rare (k shares with bitflips in the enc_privkey but not in the - # data blocks). If we actually do get here, the subsequent repair - # will fail for lack of a privkey. - self.log("got k shares but still need_privkey, bummer", - level=log.WEIRD, umid="MdRHPA") - - # we have enough to finish. All the shares have had their hashes - # checked, so if something fails at this point, we don't know how - # to fix it, so the download will fail. hunk ./src/allmydata/mutable/retrieve.py 823 - self._decoding = True # avoid reentrancy - self._status.set_status("decoding") - now = time.time() - elapsed = now - self._started - self._status.timings["fetch"] = elapsed - - d = defer.maybeDeferred(self._decode) - d.addCallback(self._decrypt, IV, self._node.get_readkey()) - d.addBoth(self._done) - return d # purely for test convenience - - def _maybe_send_more_queries(self, k): - # we don't have enough shares yet. Should we send out more queries? - # There are some number of queries outstanding, each for a single - # share. If we can generate 'needed_shares' additional queries, we do - # so. If we can't, then we know this file is a goner, and we raise - # NotEnoughSharesError. 
- self.log(format=("_maybe_send_more_queries, have=%(have)d, k=%(k)d, " - "outstanding=%(outstanding)d"), - have=len(self.shares), k=k, - outstanding=len(self._outstanding_queries), - level=log.NOISY) - - remaining_shares = k - len(self.shares) - needed = remaining_shares - len(self._outstanding_queries) - if not needed: - # we have enough queries in flight already - - # TODO: but if they've been in flight for a long time, and we - # have reason to believe that new queries might respond faster - # (i.e. we've seen other queries come back faster, then consider - # sending out new queries. This could help with peers which have - # silently gone away since the servermap was updated, for which - # we're still waiting for the 15-minute TCP disconnect to happen. - self.log("enough queries are in flight, no more are needed", - level=log.NOISY) - return - - outstanding_shnums = set([shnum - for (peerid, shnum, started) - in self._outstanding_queries.values()]) - # prefer low-numbered shares, they are more likely to be primary - available_shnums = sorted(self.remaining_sharemap.keys()) - for shnum in available_shnums: - if shnum in outstanding_shnums: - # skip ones that are already in transit - continue - if shnum not in self.remaining_sharemap: - # no servers for that shnum. note that DictOfSets removes - # empty sets from the dict for us. - continue - peerid = list(self.remaining_sharemap[shnum])[0] - # get_data will remove that peerid from the sharemap, and add the - # query to self._outstanding_queries - self._status.set_status("Retrieving More Shares") - self.get_data(shnum, peerid) - needed -= 1 - if not needed: - break - - # at this point, we have as many outstanding queries as we can. If - # needed!=0 then we might not have enough to recover the file. 
- if needed: - format = ("ran out of peers: " - "have %(have)d shares (k=%(k)d), " - "%(outstanding)d queries in flight, " - "need %(need)d more, " - "found %(bad)d bad shares") - args = {"have": len(self.shares), - "k": k, - "outstanding": len(self._outstanding_queries), - "need": needed, - "bad": len(self._bad_shares), - } - self.log(format=format, - level=log.WEIRD, umid="ezTfjw", **args) - err = NotEnoughSharesError("%s, last failure: %s" % - (format % args, self._last_failure)) - if self._bad_shares: - self.log("We found some bad shares this pass. You should " - "update the servermap and try again to check " - "more peers", - level=log.WEIRD, umid="EFkOlA") - err.servermap = self.servermap - raise err - - return - - def _decode(self): - started = time.time() - (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, - offsets_tuple) = self.verinfo + def _check_for_done(self, res): + """ + I check to see if this Retrieve object has successfully finished + its work. hunk ./src/allmydata/mutable/retrieve.py 828 - # shares_dict is a dict mapping shnum to share data, but the codec - # wants two lists. - shareids = []; shares = [] - for shareid, share in self.shares.items(): - shareids.append(shareid) - shares.append(share) + I can exit in the following ways: + - If there are no more segments to download, then I exit by + causing self._done_deferred to fire with the plaintext + content requested by the caller. + - If there are still segments to be downloaded, and there + are enough active readers (readers which have not broken + and have not given us corrupt data) to continue + downloading, I send control back to + _download_current_segment. + - If there are still segments to be downloaded but there are + not enough active peers to download them, I ask + _add_active_peers to add more peers. If it is successful, + it will call _download_current_segment. If there are not + enough peers to retrieve the file, then that will cause + _done_deferred to errback. 
+ """ + self.log("checking for doneness") + if self._current_segment == self._num_segments: + # No more segments to download, we're done. + self.log("got plaintext, done") + return self._done() hunk ./src/allmydata/mutable/retrieve.py 850 - assert len(shareids) >= k, len(shareids) - # zfec really doesn't want extra shares - shareids = shareids[:k] - shares = shares[:k] + if len(self._active_readers) >= self._required_shares: + # More segments to download, but we have enough good peers + # in self._active_readers that we can do that without issue, + # so go nab the next segment. + self.log("not done yet: on segment %d of %d" % \ + (self._current_segment + 1, self._num_segments)) + return self._download_current_segment() hunk ./src/allmydata/mutable/retrieve.py 858 - fec = codec.CRSDecoder() - fec.set_params(segsize, k, N) + self.log("not done yet: on segment %d of %d, need to add peers" % \ + (self._current_segment + 1, self._num_segments)) + return self._add_active_peers() hunk ./src/allmydata/mutable/retrieve.py 862 - self.log("params %s, we have %d shares" % ((segsize, k, N), len(shares))) - self.log("about to decode, shareids=%s" % (shareids,)) - d = defer.maybeDeferred(fec.decode, shares, shareids) - def _done(buffers): - self._status.timings["decode"] = time.time() - started - self.log(" decode done, %d buffers" % len(buffers)) - segment = "".join(buffers) - self.log(" joined length %d, datalength %d" % - (len(segment), datalength)) - segment = segment[:datalength] - self.log(" segment len=%d" % len(segment)) - return segment - def _err(f): - self.log(" decode failed: %s" % f) - return f - d.addCallback(_done) - d.addErrback(_err) - return d hunk ./src/allmydata/mutable/retrieve.py 863 - def _decrypt(self, crypttext, IV, readkey): - self._status.set_status("decrypting") - started = time.time() - key = hashutil.ssk_readkey_data_hash(IV, readkey) - decryptor = AES(key) - plaintext = decryptor.process(crypttext) - self._status.timings["decrypt"] = time.time() - 
started - return plaintext + def _done(self): + """ + I am called by _check_for_done when the download process has + finished successfully. After making some useful logging + statements, I return the decrypted contents to the owner of this + Retrieve object through self._done_deferred. + """ + eventually(self._done_deferred.callback, self._plaintext) hunk ./src/allmydata/mutable/retrieve.py 872 - def _done(self, res): - if not self._running: - return - self._running = False - self._status.set_active(False) - self._status.timings["total"] = time.time() - self._started - # res is either the new contents, or a Failure - if isinstance(res, failure.Failure): - self.log("Retrieve done, with failure", failure=res, - level=log.UNUSUAL) - self._status.set_status("Failed") - else: - self.log("Retrieve done, success!") - self._status.set_status("Finished") - self._status.set_progress(1.0) - # remember the encoding parameters, use them again next time - (seqnum, root_hash, IV, segsize, datalength, k, N, prefix, - offsets_tuple) = self.verinfo - self._node._populate_required_shares(k) - self._node._populate_total_shares(N) - eventually(self._done_deferred.callback, res) hunk ./src/allmydata/mutable/retrieve.py 873 + def _failed(self): + """ + I am called by _add_active_peers when there are not enough + active peers left to complete the download. After making some + useful logging statements, I return an exception to that effect + to the caller of this Retrieve object through + self._done_deferred. 
+ """ + format = ("ran out of peers: " + "have %(have)d of %(total)d segments " + "found %(bad)d bad shares " + "encoding %(k)d-of-%(n)d") + args = {"have": self._current_segment, + "total": self._num_segments, + "k": self._required_shares, + "n": self._total_shares, + "bad": len(self._bad_shares)} + e = NotEnoughSharesError("%s, last failure: %s" % (format % args, + str(self._last_failure))) + f = failure.Failure(e) + eventually(self._done_deferred.callback, f) hunk ./src/allmydata/test/test_mutable.py 309 d.addCallback(_created) return d + def test_create_with_initial_contents_function(self): data = "initial contents" def _make_contents(n): hunk ./src/allmydata/test/test_mutable.py 594 self.failUnless(p._pubkey.verify(sig_material, signature)) #self.failUnlessEqual(signature, p._privkey.sign(sig_material)) self.failUnless(isinstance(share_hash_chain, dict)) - self.failUnlessEqual(len(share_hash_chain), 4) # ln2(10)++ + # TODO: Revisit this to make sure that the additional + # share hashes are really necessary. + # + # (just because they magically make the tests pass does + # not mean that they are necessary) + # ln2(10)++ + 1 for leaves. 
+ self.failUnlessEqual(len(share_hash_chain), 5) for shnum,share_hash in share_hash_chain.items(): self.failUnless(isinstance(shnum, int)) self.failUnless(isinstance(share_hash, str)) hunk ./src/allmydata/test/test_mutable.py 915 def _check_servermap(sm): self.failUnlessEqual(len(sm.recoverable_versions()), 1) d.addCallback(_check_servermap) - # Now, we upload more versions - d.addCallback(lambda ignored: - self.publish_multiple(version=1)) - d.addCallback(lambda ignored: - self.make_servermap(mode=MODE_CHECK)) - def _check_servermap_multiple(sm): - v = sm.recoverable_versions() - i = sm.unrecoverable_versions() - d.addCallback(_check_servermap_multiple) return d hunk ./src/allmydata/test/test_mutable.py 916 - test_servermapupdater_finds_mdmf_files.todo = ("I don't know how to " - "write this yet") def test_servermapupdater_finds_sdmf_files(self): hunk ./src/allmydata/test/test_mutable.py 1163 def _check(res): f = res[0] self.failUnless(f.check(NotEnoughSharesError)) - self.failUnless("someone wrote to the data since we read the servermap" in str(f)) + self.failUnless("uncoordinated write" in str(f)) return self._test_corrupt_all(1, "ran out of peers", corrupt_early=False, failure_checker=_check) hunk ./src/allmydata/test/test_mutable.py 1937 d.addCallback(lambda res: self.shouldFail(NotEnoughSharesError, "test_retrieve_surprise", - "ran out of peers: have 0 shares (k=3)", + "ran out of peers: have 0 of 1", n.download_version, self.old_map, self.old_map.best_recoverable_version(), hunk ./src/allmydata/test/test_mutable.py 1946 d.addCallback(_created) return d + def test_unexpected_shares(self): # upload the file, take a servermap, shut down one of the servers, # upload it again (causing shares to appear on a new server), then } [Tell NodeMaker and MutableFileNode about the distinction between SDMF and MDMF Kevan Carstensen **20100623001708 Ignore-this: 92c723fd536264be2eef9e2a919d334f ] { hunk ./src/allmydata/mutable/filenode.py 8 from twisted.internet import 
defer, reactor from foolscap.api import eventually from allmydata.interfaces import IMutableFileNode, \ - ICheckable, ICheckResults, NotEnoughSharesError + ICheckable, ICheckResults, NotEnoughSharesError, MDMF_VERSION, SDMF_VERSION from allmydata.util import hashutil, log from allmydata.util.assertutil import precondition from allmydata.uri import WriteableSSKFileURI, ReadonlySSKFileURI hunk ./src/allmydata/mutable/filenode.py 67 self._sharemap = {} # known shares, shnum-to-[nodeids] self._cache = ResponseCache() self._most_recent_size = None + # filled in after __init__ if we're being created for the first time; + # filled in by the servermap updater before publishing, otherwise. + # set to this default value in case neither of those things happen, + # or in case the servermap can't find any shares to tell us what + # to publish as. + # TODO: Set this back to None, and find out why the tests fail + # with it set to None. + self._protocol_version = SDMF_VERSION # all users of this MutableFileNode go through the serializer. This # takes advantage of the fact that Deferreds discard the callbacks hunk ./src/allmydata/mutable/filenode.py 472 def _did_upload(self, res, size): self._most_recent_size = size return res + + + def set_version(self, version): + # I can be set in two ways: + # 1. When the node is created. + # 2. (for an existing share) when the Servermap is updated + # before I am read. 
+ assert version in (MDMF_VERSION, SDMF_VERSION) + self._protocol_version = version + + + def get_version(self): + return self._protocol_version hunk ./src/allmydata/nodemaker.py 3 import weakref from zope.interface import implements -from allmydata.interfaces import INodeMaker +from allmydata.util.assertutil import precondition +from allmydata.interfaces import INodeMaker, MustBeDeepImmutableError, \ + SDMF_VERSION, MDMF_VERSION from allmydata.immutable.filenode import ImmutableFileNode, LiteralFileNode from allmydata.immutable.upload import Data from allmydata.mutable.filenode import MutableFileNode hunk ./src/allmydata/nodemaker.py 92 return self._create_dirnode(filenode) return None - def create_mutable_file(self, contents=None, keysize=None): + def create_mutable_file(self, contents=None, keysize=None, + version=SDMF_VERSION): n = MutableFileNode(self.storage_broker, self.secret_holder, self.default_encoding_parameters, self.history) hunk ./src/allmydata/nodemaker.py 96 + n.set_version(version) d = self.key_generator.generate(keysize) d.addCallback(n.create_with_keys, contents) d.addCallback(lambda res: n) hunk ./src/allmydata/nodemaker.py 102 return d - def create_new_mutable_directory(self, initial_children={}): + def create_new_mutable_directory(self, initial_children={}, + version=SDMF_VERSION): + # initial_children must have metadata (i.e. 
{} instead of None) + for (name, (node, metadata)) in initial_children.iteritems(): + precondition(isinstance(metadata, dict), + "create_new_mutable_directory requires metadata to be a dict, not None", metadata) + node.raise_error() d = self.create_mutable_file(lambda n: hunk ./src/allmydata/nodemaker.py 110 - pack_children(n, initial_children)) + pack_children(n, initial_children), + version) d.addCallback(self._create_dirnode) return d } [Assorted servermap fixes Kevan Carstensen **20100623001732 Ignore-this: d54c4b5de327960ea4ffe096664b5a65 - Check for failure when setting the private key - Check for failure when setting other things - Check for doneness in a way that is resilient to hung servers - Remove dead code - Reorganize error and success handling methods, and make sure they get used. ] { hunk ./src/allmydata/mutable/servermap.py 485 # set as we get responses. self._must_query = must_query - # This tells the done check whether requests are still being - # processed. We should wait before returning until at least - # updated correctly (and dealing with connection errors. - self._processing = 0 - # now initial_peers_to_query contains the peers that we should ask, # self.must_query contains the peers that we must have heard from # before we can consider ourselves finished, and self.extra_peers hunk ./src/allmydata/mutable/servermap.py 550 # _query_failed) get logged, but we still want to check for doneness. d.addErrback(log.err) d.addErrback(self._fatal_error) + d.addCallback(self._check_for_done) return d def _do_read(self, ss, peerid, storage_index, shnums, readv): hunk ./src/allmydata/mutable/servermap.py 569 d = ss.callRemote("slot_readv", storage_index, shnums, readv) return d + + def _got_corrupt_share(self, e, shnum, peerid, data, lp): + """ + I am called when a remote server returns a corrupt share in + response to one of our queries. By corrupt, I mean a share + without a valid signature. 
I then record the failure, notify the + server of the corruption, and record the share as bad. + """ + f = failure.Failure(e) + self.log(format="bad share: %(f_value)s", f_value=str(f.value), + failure=f, parent=lp, level=log.WEIRD, umid="h5llHg") + # Notify the server that its share is corrupt. + self.notify_server_corruption(peerid, shnum, str(e)) + # By flagging this as a bad peer, we won't count any of + # the other shares on that peer as valid, though if we + # happen to find a valid version string amongst those + # shares, we'll keep track of it so that we don't need + # to validate the signature on those again. + self._bad_peers.add(peerid) + self._last_failure = f + # XXX: Use the reader for this? + checkstring = data[:SIGNED_PREFIX_LENGTH] + self._servermap.mark_bad_share(peerid, shnum, checkstring) + self._servermap.problems.append(f) + + + def _cache_good_sharedata(self, verinfo, shnum, now, data): + """ + If one of my queries returns successfully (which means that we + were able to and successfully did validate the signature), I + cache the data that we initially fetched from the storage + server. This will help reduce the number of roundtrips that need + to occur when the file is downloaded, or when the file is + updated. + """ + self._node._add_to_cache(verinfo, shnum, 0, data, now) + + def _got_results(self, datavs, peerid, readsize, stuff, started): lp = self.log(format="got result from [%(peerid)s], %(numshares)d shares", peerid=idlib.shortnodeid_b2a(peerid), hunk ./src/allmydata/mutable/servermap.py 618 self._servermap.reachable_peers.add(peerid) self._must_query.discard(peerid) self._queries_completed += 1 - # self._processing counts the number of queries that have - # completed, but are still processing. We wait until all queries - # are done processing before returning a result to the client. - # TODO: Should we do this? 
A response to the initial query means - # that we may not have to query the server for anything else, - # but if we're dealing with an MDMF share, we'll probably have - # to ask it for its signature, unless we cache those sometplace, - # and even then. - self._processing += 1 if not self._running: self.log("but we're not running, so we'll ignore it", parent=lp, level=log.NOISY) hunk ./src/allmydata/mutable/servermap.py 633 ss, storage_index = stuff ds = [] - - def _tattle(ignored, status): - print status - print ignored - return ignored - - def _cache(verinfo, shnum, now, data): - self._queries_oustand - self._node._add_to_cache(verinfo, shnum, 0, data, now) - return shnum, verinfo - - def _corrupt(e, shnum, data): - # This gets raised when there was something wrong with - # the remote server. Specifically, when there was an - # error unpacking the remote data from the server, or - # when the signature is invalid. - print e - f = failure.Failure() - self.log(format="bad share: %(f_value)s", f_value=str(f.value), - failure=f, parent=lp, level=log.WEIRD, umid="h5llHg") - # Notify the server that its share is corrupt. - self.notify_server_corruption(peerid, shnum, str(e)) - # By flagging this as a bad peer, we won't count any of - # the other shares on that peer as valid, though if we - # happen to find a valid version string amongst those - # shares, we'll keep track of it so that we don't need - # to validate the signature on those again. - self._bad_peers.add(peerid) - self._last_failure = f - # 393CHANGE: Use the reader for this. - checkstring = data[:SIGNED_PREFIX_LENGTH] - self._servermap.mark_bad_share(peerid, shnum, checkstring) - self._servermap.problems.append(f) - for shnum,datav in datavs.items(): data = datav[0] reader = MDMFSlotReadProxy(ss, hunk ./src/allmydata/mutable/servermap.py 646 # need to do the following: # - If we don't already have the public key, fetch the # public key. We use this to validate the signature. 
- friendly_peer = idlib.shortnodeid_b2a(peerid) if not self._node.get_pubkey(): # fetch and set the public key. d = reader.get_verification_key() hunk ./src/allmydata/mutable/servermap.py 649 - d.addCallback(self._try_to_set_pubkey) + d.addCallback(lambda results, shnum=shnum, peerid=peerid: + self._try_to_set_pubkey(results, peerid, shnum, lp)) + # XXX: Make self._pubkey_query_failed? + d.addErrback(lambda error, shnum=shnum, peerid=peerid, data=data: + self._got_corrupt_share(error, shnum, peerid, data, lp)) else: # we already have the public key. d = defer.succeed(None) hunk ./src/allmydata/mutable/servermap.py 666 # bytes of the share on the storage server, so we # shouldn't need to fetch anything at this step. d2 = reader.get_verinfo() + d2.addErrback(lambda error, shnum=shnum, peerid=peerid, data=data: + self._got_corrupt_share(error, shnum, peerid, data, lp)) # - Next, we need the signature. For an SDMF share, it is # likely that we fetched this when doing our initial fetch # to get the version information. In MDMF, this lives at hunk ./src/allmydata/mutable/servermap.py 674 # the end of the share, so unless the file is quite small, # we'll need to do a remote fetch to get it. d3 = reader.get_signature() + d3.addErrback(lambda error, shnum=shnum, peerid=peerid, data=data: + self._got_corrupt_share(error, shnum, peerid, data, lp)) # Once we have all three of these responses, we can move on # to validating the signature hunk ./src/allmydata/mutable/servermap.py 681 # Does the node already have a privkey? If not, we'll try to # fetch it here.
- if not self._node.get_privkey(): + if self._need_privkey: d4 = reader.get_encprivkey() d4.addCallback(lambda results, shnum=shnum, peerid=peerid: self._try_to_validate_privkey(results, peerid, shnum, lp)) hunk ./src/allmydata/mutable/servermap.py 685 + d4.addErrback(lambda error, shnum=shnum, peerid=peerid: + self._privkey_query_failed(error, peerid, shnum, lp)) else: d4 = defer.succeed(None) hunk ./src/allmydata/mutable/servermap.py 694 dl.addCallback(lambda results, shnum=shnum, peerid=peerid: self._got_signature_one_share(results, shnum, peerid, lp)) dl.addErrback(lambda error, shnum=shnum, data=data: - _corrupt(error, shnum, data)) + self._got_corrupt_share(error, shnum, peerid, data, lp)) + dl.addCallback(lambda verinfo, shnum=shnum, peerid=peerid, data=data: + self._cache_good_sharedata(verinfo, shnum, now, data)) ds.append(dl) # dl is a deferred list that will fire when all of the shares # that we found on this peer are done processing. When dl fires, hunk ./src/allmydata/mutable/servermap.py 702 # we know that processing is done, so we can decrement the # semaphore-like thing that we incremented earlier. - dl = defer.DeferredList(ds) - def _done_processing(ignored): - self._processing -= 1 - return ignored - dl.addCallback(_done_processing) + dl = defer.DeferredList(ds, fireOnOneErrback=True) # Are we done? Done means that there are no more queries to # send, that there are no outstanding queries, and that we # haven't received any queries that are still processing. If we hunk ./src/allmydata/mutable/servermap.py 710 # that we returned to our caller to fire, which tells them that # they have a complete servermap, and that we won't be touching # the servermap anymore. - dl.addBoth(self._check_for_done) + dl.addCallback(self._check_for_done) dl.addErrback(self._fatal_error) # all done!
hunk ./src/allmydata/mutable/servermap.py 713 - return dl self.log("_got_results done", parent=lp, level=log.NOISY) hunk ./src/allmydata/mutable/servermap.py 714 + return dl + hunk ./src/allmydata/mutable/servermap.py 717 - def _try_to_set_pubkey(self, pubkey_s): + def _try_to_set_pubkey(self, pubkey_s, peerid, shnum, lp): if self._node.get_pubkey(): return # don't go through this again if we don't have to fingerprint = hashutil.ssk_pubkey_fingerprint_hash(pubkey_s) hunk ./src/allmydata/mutable/servermap.py 773 if verinfo not in self._valid_versions: # This is a new version tuple, and we need to validate it # against the public key before keeping track of it. + assert self._node.get_pubkey() valid = self._node.get_pubkey().verify(prefix, signature[1]) if not valid: raise CorruptShareError(peerid, shnum, hunk ./src/allmydata/mutable/servermap.py 892 self._queries_completed += 1 self._last_failure = f - def _got_privkey_results(self, datavs, peerid, shnum, started, lp): - now = time.time() - elapsed = now - started - self._status.add_per_server_time(peerid, "privkey", started, elapsed) - self._queries_outstanding.discard(peerid) - if not self._need_privkey: - return - if shnum not in datavs: - self.log("privkey wasn't there when we asked it", - level=log.WEIRD, umid="VA9uDQ") - return - datav = datavs[shnum] - enc_privkey = datav[0] - self._try_to_validate_privkey(enc_privkey, peerid, shnum, lp) def _privkey_query_failed(self, f, peerid, shnum, lp): self._queries_outstanding.discard(peerid) hunk ./src/allmydata/mutable/servermap.py 906 self._servermap.problems.append(f) self._last_failure = f + def _check_for_done(self, res): # exit paths: # return self._send_more_queries(outstanding) : send some more queries hunk ./src/allmydata/mutable/servermap.py 930 self.log("but we're not running", parent=lp, level=log.NOISY) return - if self._processing > 0: - # wait until more results are done before returning. 
- return - if self._must_query: # we are still waiting for responses from peers that used to have # a share, so we must continue to wait. No additional queries are } [Add objects for MDMF shares in support of a new segmented uploader Kevan Carstensen **20100623233203 Ignore-this: 9fa8319bc5e9142da7e70e1c91da2300 This patch adds the following: - MDMFSlotWriteProxy, which can write MDMF shares to the storage server in the new format. - MDMFSlotReadProxy, which can read both SDMF and MDMF shares from the storage server. This patch also includes tests for these new object. ] { hunk ./src/allmydata/interfaces.py 7 ChoiceOf, IntegerConstraint, Any, RemoteInterface, Referenceable HASH_SIZE=32 +SALT_SIZE=16 SDMF_VERSION=0 MDMF_VERSION=1 hunk ./src/allmydata/mutable/layout.py 4 import struct from allmydata.mutable.common import NeedMoreDataError, UnknownVersionError +from allmydata.interfaces import HASH_SIZE, SALT_SIZE, SDMF_VERSION, \ + MDMF_VERSION +from allmydata.util import mathutil, observer +from twisted.python import failure +from twisted.internet import defer + + +# These strings describe the format of the packed structs they help process +# Here's what they mean: +# +# PREFIX: +# >: Big-endian byte order; the most significant byte is first (leftmost). +# B: The version information; an 8 bit version identifier. Stored as +# an unsigned char. This is currently 00 00 00 00; our modifications +# will turn it into 00 00 00 01. +# Q: The sequence number; this is sort of like a revision history for +# mutable files; they start at 1 and increase as they are changed after +# being uploaded. Stored as an unsigned long long, which is 8 bytes in +# length. +# 32s: The root hash of the share hash tree. We use sha-256d, so we use 32 +# characters = 32 bytes to store the value. +# 16s: The salt for the readkey. This is a 16-byte random value, stored as +# 16 characters. +# +# SIGNED_PREFIX additions, things that are covered by the signature: +# B: The "k" encoding parameter. 
We store this as an 8-bit character, +# which is convenient because our erasure coding scheme cannot +# encode if you ask for more than 255 pieces. +# B: The "N" encoding parameter. Stored as an 8-bit character for the +# same reasons as above. +# Q: The segment size of the uploaded file. This will essentially be the +# length of the file in SDMF. An unsigned long long, so we can store +# files of quite large size. +# Q: The data length of the uploaded file. Modulo padding, this will be +# the same as the segment size field. Like the segment size field, it is +# an unsigned long long and can be quite large. +# +# HEADER additions: +# L: The offset of the signature. An unsigned long. +# L: The offset of the share hash chain. An unsigned long. +# L: The offset of the block hash tree. An unsigned long. +# L: The offset of the share data. An unsigned long. +# Q: The offset of the encrypted private key. An unsigned long long, to +# account for the possibility of a lot of share data. +# Q: The offset of the EOF. An unsigned long long, to account for the +# possibility of a lot of share data. +# +# After all of these, we have the following: +# - The verification key: Occupies the space between the end of the header +# and the start of the signature (i.e.: data[HEADER_LENGTH:o['signature']]). +# - The signature, which goes from the signature offset to the share hash +# chain offset. +# - The share hash chain, which goes from the share hash chain offset to +# the block hash tree offset. +# - The share data, which goes from the share data offset to the encrypted +# private key offset. +# - The encrypted private key, which goes from its offset to the end of the file. +# +# The block hash tree in this encoding has only one leaf, so the offset of +# the share data will be 32 bytes more than the offset of the block hash tree. +# Given this, we may need to check to see how many bytes a reasonably sized +# block hash tree will take up.
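The field widths listed in the comment above can be checked with Python's standard `struct` module. This is an illustrative sketch only: `pack_signed_prefix` is a hypothetical helper modeled on `pack_prefix`, and the `HEADER` string is assumed from the six offsets (four `L`, two `Q`) described above rather than copied from layout.py.

```python
import struct

# PREFIX and SIGNED_PREFIX match the patch; HEADER is assumed from the
# comment above (four 'L' offsets followed by two 'Q' offsets).
PREFIX = ">BQ32s16s"              # version, seqnum, root hash, IV
SIGNED_PREFIX = ">BQ32s16s BBQQ"  # ... plus k, N, segment size, data length
HEADER = ">BQ32s16s BBQQ LLLLQQ"  # ... plus the six offsets

def pack_signed_prefix(seqnum, root_hash, iv, k, n, segsize, datalength,
                       version=0):
    """Hypothetical helper: pack the signed SDMF prefix.

    '>' means big-endian with no alignment padding, so each size is just
    the sum of the field widths (1 + 8 + 32 + 16 + 1 + 1 + 8 + 8 = 75).
    """
    return struct.pack(SIGNED_PREFIX, version, seqnum, root_hash, iv,
                       k, n, segsize, datalength)

# With k=3, a 1000-byte file is padded up to a 1002-byte segment: the
# "modulo padding" relationship between segment size and data length.
signed = pack_signed_prefix(1, b"\x00" * 32, b"\x11" * 16, 3, 10, 1002, 1000)

# pack_checkstring packs the same leading fields with PREFIX, so the
# checkstring equals the first calcsize(PREFIX) bytes of the signed prefix.
checkstring = signed[:struct.calcsize(PREFIX)]
assert checkstring == struct.pack(PREFIX, 0, 1, b"\x00" * 32, b"\x11" * 16)

print(struct.calcsize(PREFIX), struct.calcsize(SIGNED_PREFIX),
      struct.calcsize(HEADER))  # 57 75 107
```

Because `>` disables alignment padding, the header length can be derived directly from the comment's field list, which is why the offsets table can be located without reading any other part of the share.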
PREFIX = ">BQ32s16s" # each version has a different prefix SIGNED_PREFIX = ">BQ32s16s BBQQ" # this is covered by the signature hunk ./src/allmydata/mutable/layout.py 191 return (share_hash_chain, block_hash_tree, share_data) -def pack_checkstring(seqnum, root_hash, IV): +def pack_checkstring(seqnum, root_hash, IV, version=0): return struct.pack(PREFIX, hunk ./src/allmydata/mutable/layout.py 193 - 0, # version, + version, seqnum, root_hash, IV) hunk ./src/allmydata/mutable/layout.py 266 encprivkey]) return final_share +def pack_prefix(seqnum, root_hash, IV, + required_shares, total_shares, + segment_size, data_length): + prefix = struct.pack(SIGNED_PREFIX, + 0, # version, + seqnum, + root_hash, + IV, + required_shares, + total_shares, + segment_size, + data_length, + ) + return prefix + + +MDMFHEADER = ">BQ32s32sBBQQ LQQQQQQQ" +MDMFHEADERWITHOUTOFFSETS = ">BQ32s32sBBQQ" +MDMFHEADERSIZE = struct.calcsize(MDMFHEADER) +MDMFCHECKSTRING = ">BQ32s32s" +MDMFSIGNABLEHEADER = ">BQ32s32sBBQQ" +MDMFOFFSETS = ">LQQQQQQQ" + +class MDMFSlotWriteProxy: + #implements(IMutableSlotWriter) TODO + + """ + I represent a remote write slot for an MDMF mutable file. + + I abstract away from my caller the details of block and salt + management, and the implementation of the on-disk format for MDMF + shares. 
+ """ + + # Expected layout, MDMF: + # offset: size: name: + #-- signed part -- + # 0 1 version number (01) + # 1 8 sequence number + # 9 32 share tree root hash + # 41 32 salt tree root hash + # 73 1 The "k" encoding parameter + # 74 1 The "N" encoding parameter + # 75 8 The segment size of the uploaded file + # 83 8 The data length of the uploaded file + #-- end signed part -- + # 91 4 The offset of the share data + # 95 8 The offset of the encrypted private key + # 103 8 The offset of the block hash tree + # 111 8 The offset of the salt hash tree + # 119 8 The offset of the share hash chain + # 127 8 The offset of the signature + # 135 8 The offset of the verification key + # 143 8 The offset of the EOF + # + # followed by salts, share data, the encrypted private key, the + # block hash tree, the salt hash tree, the share hash chain, a + # signature over the first eight fields, and a verification key. + # + # The checkstring is the first four fields -- the version number, + # sequence number, root hash and root salt hash. This is consistent + # in meaning with what we have with SDMF files, except now instead of + # using the literal salt, we use a value derived from all of the + # salts. + # + # The ordering of the offsets is different to reflect the dependencies + # that we'll run into with an MDMF file. The expected write flow is + # something like this: + # + # 0: Initialize with the sequence number, encoding + # parameters and data length. From this, we can deduce the + # number of segments, and from that we can deduce the size of + # the AES salt field, telling us where to write AES salts, and + # where to write share data. We can also figure out where the + # encrypted private key should go, because we can figure out + # how big the share data will be. + # + # 1: Encrypt, encode, and upload the file in chunks. Do something + # like + # + # put_block(data, segnum, salt) + # + # to write a block and a salt to the disk.
We can do both of + # these operations now because we have enough of the offsets to + # know where to put them. + # + # 2: Put the encrypted private key. Use: + # + # put_encprivkey(encprivkey) + # + # Now that we know the length of the private key, we can fill + # in the offset for the block hash tree. + # + # 3: We're now in a position to upload the block hash tree for + # a share. Put that using something like: + # + # put_blockhashes(block_hash_tree) + # + # Note that block_hash_tree is a list of hashes -- we'll take + # care of the details of serializing that appropriately. When + # we get the block hash tree, we are also in a position to + # calculate the offset for the share hash chain, and fill that + # into the offsets table. + # + # 4: At the same time, we're in a position to upload the salt hash + # tree. This is a Merkle tree over all of the salts. We use a + # Merkle tree so that we can validate each block,salt pair as + # we download them later. We do this using + # + # put_salthashes(salt_hash_tree) + # + # When you do this, I automatically put the root of the tree + # (the hash at index 0 of the list) in its appropriate slot in + # the signed prefix of the share. + # + # 5: We're now in a position to upload the share hash chain for + # a share. Do that with something like: + # + # put_sharehashes(share_hash_chain) + # + # share_hash_chain should be a dictionary mapping shnums to + # 32-byte hashes -- the wrapper handles serialization. + # We'll know where to put the signature at this point, also. + # The root of this tree will be put explicitly in the next + # step. + # + # TODO: Why? Why not just include it in the tree here? + # + # 6: Before putting the signature, we must first put the + # root_hash. Do this with: + # + # put_root_hash(root_hash). 
+ # + # In terms of knowing where to put this value, it was always + # possible to place it, but it makes sense semantically to + # place it after the share hash tree, so that's why you do it + # in this order. + # + # 7: With the root hash put, we can now sign the header. Use: + # + # get_signable() + # + # to get the part of the header that you want to sign, and use: + # + # put_signature(signature) + # + # to write your signature to the remote server. + # + # 8: Add the verification key, and finish. Do: + # + # put_verification_key(key) + # + # and + # + # finish_publishing() + # + # Checkstring management: + # + # To write to a mutable slot, we have to provide test vectors to ensure + # that we are writing to the same data that we think we are. These + # vectors allow us to detect uncoordinated writes; that is, writes + # where both we and some other writer are writing to the + # mutable slot, and to report those back to the parts of the program + # doing the writing. + # + # With SDMF, this was easy -- all of the share data was written in + # one go, so it was easy to detect uncoordinated writes, and we only + # had to do it once. With MDMF, not all of the file is written at + # once. + # + # If a share is new, we write out as much of the header as we can + # before writing out anything else. This gives other writers a + # canary that they can use to detect uncoordinated writes, and, if + # they do the same thing, gives us the same canary. We then update + # the share. We won't be able to write out two fields of the header + # -- the share tree hash and the salt hash -- until we finish + # writing out the share. We only require the writer to provide the + # initial checkstring, and keep track of what it should be after + # updates ourselves. + # + # If we haven't written anything yet, then on the first write (which + # will probably be a block + salt of a share), we'll also write out + # the header. On subsequent passes, we'll expect to see the header.
+ # This changes in two places: + # + # - When we write out the salt hash + # - When we write out the root of the share hash tree + # + # since these values will change the header. It is possible that we + # can just make those be written in one operation to minimize + # disruption. + def __init__(self, + shnum, + rref, # a remote reference to a storage server + storage_index, + secrets, # (write_enabler, renew_secret, cancel_secret) + seqnum, # the sequence number of the mutable file + required_shares, + total_shares, + segment_size, + data_length): # the length of the original file + self._shnum = shnum + self._rref = rref + self._storage_index = storage_index + self._seqnum = seqnum + self._required_shares = required_shares + assert self._shnum >= 0 and self._shnum < total_shares + self._total_shares = total_shares + # We build up the offset table as we write things. It is the + # last thing we write to the remote server. + self._offsets = {} + self._testvs = [] + self._secrets = secrets + # The segment size needs to be a multiple of the k parameter -- + # any padding should have been carried out by the publisher + # already. + assert segment_size % required_shares == 0 + self._segment_size = segment_size + self._data_length = data_length + + # These are set later -- we define them here so that we can + # check for their existence easily + + # This is the root of the share hash tree -- the Merkle tree + # over the roots of the block hash trees computed for shares in + # this upload. + self._root_hash = None + # This is the root of the salt hash tree -- the Merkle tree over + # the hashes of the salts used for each segment of the file. + self._salt_hash = None + + # We haven't yet written anything to the remote bucket. By + # setting this, we tell the _write method as much. The write + # method will then know that it also needs to add a write vector + # for the checkstring (or what we have of it) to the first write + # request. 
We'll then record that value for future use. If + # we're expecting something to be there already, we need to call + # set_checkstring before we write anything to tell the first + # write about that. + self._written = False + + # When writing data to the storage servers, we get a read vector + # for free. We'll read the checkstring, which will help us + # figure out what's gone wrong if a write fails. + self._readv = [(0, struct.calcsize(MDMFCHECKSTRING))] + + # We calculate the number of segments because it tells us + # where the salt table ends and the share data begins, + # and also because it provides a useful amount of bounds checking. + self._num_segments = mathutil.div_ceil(self._data_length, + self._segment_size) + self._block_size = self._segment_size / self._required_shares + # We also calculate the size of the last (tail) block, which may + # be smaller than the others, to help us check block sizes later. + tail_size = self._data_length % self._segment_size + if not tail_size: + self._tail_block_size = self._block_size + else: + self._tail_block_size = mathutil.next_multiple(tail_size, + self._required_shares) + self._tail_block_size /= self._required_shares + + # We already know where the AES salts start: right after the end + # of the header (which is defined as the signable part + the offsets). + # We need to calculate where the share data starts, since we're + # responsible (after this method) for being able to write it. + self._offsets['share-data'] = MDMFHEADERSIZE + self._offsets['share-data'] += self._num_segments * SALT_SIZE + # We can also calculate where the encrypted private key begins + # from what we now know. + self._offsets['enc_privkey'] = self._offsets['share-data'] + self._offsets['enc_privkey'] += self._block_size * (self._num_segments - 1) + self._offsets['enc_privkey'] += self._tail_block_size + # We'll wait for the rest. Callers can now call my "put_block" and + # "set_checkstring" methods.
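The offset arithmetic in `__init__` above, together with the byte layout in the class comment, can be checked with a small standalone sketch. It is written for Python 3 (hence `//` where the patch's Python 2 code uses `/`). `div_ceil` and `next_multiple` are hypothetical stand-ins for `allmydata.util.mathutil`, `mdmf_writer_offsets` is a hypothetical name, and `SALT_SIZE = 16` is an assumption based on the 16-byte salt field of the SDMF prefix.

```python
import struct

# Format strings as defined in the layout.py patch above.
MDMFHEADER = ">BQ32s32sBBQQ LQQQQQQQ"
MDMFHEADERWITHOUTOFFSETS = ">BQ32s32sBBQQ"
MDMFOFFSETS = ">LQQQQQQQ"

# Assumption: 16-byte per-segment salts (the SDMF prefix uses a 16-byte salt/IV).
SALT_SIZE = 16


def div_ceil(n, d):
    """Hypothetical stand-in for mathutil.div_ceil: ceil(n / d) for ints."""
    return (n + d - 1) // d


def next_multiple(n, k):
    """Hypothetical stand-in for mathutil.next_multiple: smallest multiple of k >= n."""
    return div_ceil(n, k) * k


def mdmf_writer_offsets(required_shares, segment_size, data_length):
    """Recompute the offsets that MDMFSlotWriteProxy.__init__ derives up front."""
    # The publisher is expected to pad segment_size to a multiple of k.
    assert segment_size % required_shares == 0
    num_segments = div_ceil(data_length, segment_size)
    block_size = segment_size // required_shares
    # The last segment may be short; its block is rounded up so that it
    # divides evenly among the k shares.
    tail_size = data_length % segment_size
    if not tail_size:
        tail_block_size = block_size
    else:
        tail_block_size = next_multiple(tail_size, required_shares) // required_shares
    # Salts come right after the full header (signed part + offset table),
    # and share data comes right after the salts.
    share_data = struct.calcsize(MDMFHEADER) + num_segments * SALT_SIZE
    # The encrypted private key follows the last (tail) block.
    enc_privkey = share_data + block_size * (num_segments - 1) + tail_block_size
    return {
        "num_segments": num_segments,
        "block_size": block_size,
        "tail_block_size": tail_block_size,
        "share-data": share_data,
        "enc_privkey": enc_privkey,
    }


print(struct.calcsize(MDMFHEADERWITHOUTOFFSETS),
      struct.calcsize(MDMFOFFSETS),
      struct.calcsize(MDMFHEADER))  # 91 60 151
print(mdmf_writer_offsets(3, 300, 1000))
```

With k = 3, a 300-byte segment size and a 1000-byte file, this gives four segments, 100-byte blocks and a 34-byte tail block (100 bytes rounded up to 102, then split three ways). Share data starts at byte 215 (151 + 4 * 16) and the encrypted private key at byte 549. The 91-byte and 60-byte sizes are also the two reads that `MDMFSlotReadProxy._maybe_fetch_offsets_and_header` issues for an MDMF share.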
+ + + def set_checkstring(self, checkstring): + """ + Set the checkstring that I will use as a test vector for my share. + + By default, I assume that I am writing new shares to the grid. + If you don't explicitly set your own checkstring, I will use + one that requires that the remote share not exist. You will want + to use this method if you are updating a share in-place; + otherwise, writes will fail. + """ + # You're allowed to overwrite checkstrings with this method; + # I assume that users know what they are doing when they call + # it. + if checkstring == "": + # We special-case this, since len("") == 0, but the storage + # server needs a test vector of length 1 to check for an empty + # (nonexistent) share, which is what an empty checkstring means. + # Clearing self._testvs makes _write supply that test vector. + self._testvs = [] + else: + self._testvs = [] + self._testvs.append((0, len(checkstring), "eq", checkstring)) + + + def __repr__(self): + return "MDMFSlotWriteProxy for share %d" % self._shnum + + + def get_checkstring(self): + """ + I return a representation of what the checkstring for my share + on the server will look like. + """ + if self._root_hash: + roothash = self._root_hash + else: + roothash = "\x00" * 32 + # If both self._salt_hash and self._root_hash are set, we've + # written both of these things to the server. self._salt_hash will be + # set first, though, and if self._root_hash isn't also set then + # neither of them has been written to the server, so we need to leave + # them alone. + if self._salt_hash and self._root_hash: + salthash = self._salt_hash + else: + salthash = "\x00" * 32 + checkstring = struct.pack(MDMFCHECKSTRING, + 1, + self._seqnum, + roothash, + salthash) + return checkstring + + + def put_block(self, data, segnum, salt): + """ + Put the encrypted-and-encoded data segment in the slot, along + with the salt.
+ """ + if segnum >= self._num_segments: + raise LayoutInvalid("I won't overwrite the private key") + if len(salt) != SALT_SIZE: + raise LayoutInvalid("I was given a salt of size %d, but " + "I wanted a salt of size %d" + % (len(salt), SALT_SIZE)) + if segnum + 1 == self._num_segments: + if len(data) != self._tail_block_size: + raise LayoutInvalid("I was given the wrong size block to write") + elif len(data) != self._block_size: + raise LayoutInvalid("I was given the wrong size block to write") + + # We want to write at offsets['share-data'] + segnum * block_size. + assert self._offsets + assert self._offsets['share-data'] + + offset = self._offsets['share-data'] + segnum * self._block_size + datavs = [tuple([offset, data])] + # We also have to write the salt. It goes in the salt table, + # which starts right after the header: + salt_offset = MDMFHEADERSIZE + SALT_SIZE * segnum + datavs.append(tuple([salt_offset, salt])) + return self._write(datavs) + + + def put_encprivkey(self, encprivkey): + """ + Put the encrypted private key in the remote slot. + """ + assert self._offsets + assert self._offsets['enc_privkey'] + # You shouldn't re-write the encprivkey after the block hash + # tree is written, since that could cause the private key to run + # into the block hash tree. Before it writes the block hash + # tree, the block hash tree writing method writes the offset of + # the salt hash tree. So that's a good indicator of whether or + # not the block hash tree has been written. + if "salt_hash_tree" in self._offsets: + raise LayoutInvalid("You must write this before the block hash tree") + + self._offsets['block_hash_tree'] = self._offsets['enc_privkey'] + len(encprivkey) + datavs = [(tuple([self._offsets['enc_privkey'], encprivkey]))] + def _on_failure(): + del(self._offsets['block_hash_tree']) + return self._write(datavs, on_failure=_on_failure) + + + def put_blockhashes(self, blockhashes): + """ + Put the block hash tree in the remote slot.
+ + The encrypted private key must be put before the block hash + tree, since we need to know how large it is to know where the + block hash tree should go. The block hash tree must be put + before the salt hash tree, since its size determines the + offset of the salt hash tree. + """ + assert self._offsets + assert isinstance(blockhashes, list) + if "block_hash_tree" not in self._offsets: + raise LayoutInvalid("You must put the encrypted private key " + "before you put the block hash tree") + # put_salthashes defines the share hash chain offset, so if that + # offset exists, the salt hash tree has already been written. + if "share_hash_chain" in self._offsets: + raise LayoutInvalid("You must put the block hash tree before " + "you put the salt hash tree") + blockhashes_s = "".join(blockhashes) + self._offsets['salt_hash_tree'] = self._offsets['block_hash_tree'] + len(blockhashes_s) + datavs = [] + datavs.append(tuple([self._offsets['block_hash_tree'], blockhashes_s])) + def _on_failure(): + del(self._offsets['salt_hash_tree']) + return self._write(datavs, on_failure=_on_failure) + + + def put_salthashes(self, salthashes): + """ + Put the salt hash tree in the remote slot. + + The block hash tree must be put before the salt hash tree, since + its size tells us where we need to put the salt hash tree. This + method must be called before the share hash chain can be + uploaded, since the size of the salt hash tree tells us where + the share hash chain can go. + """ + assert self._offsets + assert isinstance(salthashes, list) + if "salt_hash_tree" not in self._offsets: + raise LayoutInvalid("You must put the block hash tree " + "before putting the salt hash tree") + if "signature" in self._offsets: + raise LayoutInvalid("You must put the salt hash tree " + "before you put the share hash chain") + # The root of the salt hash tree is at index 0. We'll write this when + # we put the root hash later; we just keep track of it for now.
+ self._salt_hash = salthashes[0] + salthashes_s = "".join(salthashes[1:]) + self._offsets['share_hash_chain'] = self._offsets['salt_hash_tree'] + len(salthashes_s) + datavs = [] + datavs.append(tuple([self._offsets['salt_hash_tree'], salthashes_s])) + def _on_failure(): + del(self._offsets['share_hash_chain']) + return self._write(datavs, on_failure=_on_failure) + + + def put_sharehashes(self, sharehashes): + """ + Put the share hash chain in the remote slot. + + The salt hash tree must be put before the share hash chain, + since we need to know where the salt hash tree ends before we + can know where the share hash chain starts. The share hash chain + must be put before the signature, since the length of the packed + share hash chain determines the offset of the signature. Also, + semantically, you must know what the root of the salt hash tree + is before you can generate a valid signature. + """ + assert isinstance(sharehashes, dict) + if "share_hash_chain" not in self._offsets: + raise LayoutInvalid("You need to put the salt hash tree before " + "you can put the share hash chain") + # The signature comes after the share hash chain. If the + # signature has already been written, we must not write another + # share hash chain. The signature writes the verification key + # offset when it gets sent to the remote server, so we look for + # that. 
+ if "verification_key" in self._offsets: + raise LayoutInvalid("You must write the share hash chain " + "before you write the signature") + datavs = [] + sharehashes_s = "".join([struct.pack(">H32s", i, sharehashes[i]) + for i in sorted(sharehashes.keys())]) + self._offsets['signature'] = self._offsets['share_hash_chain'] + len(sharehashes_s) + datavs.append(tuple([self._offsets['share_hash_chain'], sharehashes_s])) + def _on_failure(): + del(self._offsets['signature']) + return self._write(datavs, on_failure=_on_failure) + + + def put_root_hash(self, roothash): + """ + Put the root hash (the root of the share hash tree) in the + remote slot. + """ + # It does not make sense to be able to put the root + # hash without first putting the share hashes, since you need + # the share hashes to generate the root hash. + # + # Signature is defined by the routine that places the share hash + # chain, so it's a good thing to look for in finding out whether + # or not the share hash chain exists on the remote server. + if "signature" not in self._offsets: + raise LayoutInvalid("You need to put the share hash chain " + "before you can put the root share hash") + if len(roothash) != HASH_SIZE: + raise LayoutInvalid("hashes and salts must be exactly %d bytes" + % HASH_SIZE) + datavs = [] + self._root_hash = roothash + # To write both of these values, we update the checkstring on + # the remote server, which includes them + checkstring = self.get_checkstring() + datavs.append(tuple([0, checkstring])) + # This write, if successful, changes the checkstring, so we need + # to update our internal checkstring to be consistent with the + # one on the server. 
+ def _on_success(): + self._testvs = [(0, len(checkstring), "eq", checkstring)] + def _on_failure(): + self._root_hash = None + self._salt_hash = None + return self._write(datavs, + on_success=_on_success, + on_failure=_on_failure) + + + def get_signable(self): + """ + Get the first eight fields of the mutable file; the parts that + are signed. + """ + if not self._root_hash or not self._salt_hash: + raise LayoutInvalid("You need to set the root hash and the " + "salt hash before getting something to " + "sign") + return struct.pack(MDMFSIGNABLEHEADER, + 1, + self._seqnum, + self._root_hash, + self._salt_hash, + self._required_shares, + self._total_shares, + self._segment_size, + self._data_length) + + + def put_signature(self, signature): + """ + Put the signature field to the remote slot. + + I require that the root hash and share hash chain have been put + to the grid before I will write the signature to the grid. + """ + if "signature" not in self._offsets: + raise LayoutInvalid("You must put the share hash chain " + # It does not make sense to put a signature without first + # putting the root hash and the salt hash (since otherwise + # the signature would be incomplete), so we don't allow that. + "before putting the signature") + if not self._root_hash: + raise LayoutInvalid("You must complete the signed prefix " + "before computing a signature") + # If we put the signature after we put the verification key, we + # could end up running into the verification key, and will + # probably screw up the offsets as well. So we don't allow that. + # The method that writes the verification key defines the EOF + # offset before writing the verification key, so look for that. 
+ if "EOF" in self._offsets: + raise LayoutInvalid("You must write the signature before the verification key") + + self._offsets['verification_key'] = self._offsets['signature'] + len(signature) + datavs = [] + datavs.append(tuple([self._offsets['signature'], signature])) + def _on_failure(): + del(self._offsets['verification_key']) + return self._write(datavs, on_failure=_on_failure) + + + def put_verification_key(self, verification_key): + """ + Put the verification key into the remote slot. + + I require that the signature have been written to the storage + server before I allow the verification key to be written to the + remote server. + """ + if "verification_key" not in self._offsets: + raise LayoutInvalid("You must put the signature before you " + "can put the verification key") + self._offsets['EOF'] = self._offsets['verification_key'] + len(verification_key) + datavs = [] + datavs.append(tuple([self._offsets['verification_key'], verification_key])) + def _on_failure(): + del(self._offsets['EOF']) + return self._write(datavs, on_failure=_on_failure) + + + def finish_publishing(self): + """ + Write the offset table and encoding parameters to the remote + slot, since that's the only thing we have yet to publish at this + point. 
+ """ + if "EOF" not in self._offsets: + raise LayoutInvalid("You must put the verification key before " + "you can publish the offsets") + offsets_offset = struct.calcsize(MDMFHEADERWITHOUTOFFSETS) + offsets = struct.pack(MDMFOFFSETS, + self._offsets['share-data'], + self._offsets['enc_privkey'], + self._offsets['block_hash_tree'], + self._offsets['salt_hash_tree'], + self._offsets['share_hash_chain'], + self._offsets['signature'], + self._offsets['verification_key'], + self._offsets['EOF']) + datavs = [] + datavs.append(tuple([offsets_offset, offsets])) + encoding_parameters_offset = struct.calcsize(MDMFCHECKSTRING) + params = struct.pack(">BBQQ", + self._required_shares, + self._total_shares, + self._segment_size, + self._data_length) + datavs.append(tuple([encoding_parameters_offset, params])) + return self._write(datavs) + + + def _write(self, datavs, on_failure=None, on_success=None): + """I write the data vectors in datavs to the remote slot.""" + tw_vectors = {} + new_share = False + if not self._testvs: + self._testvs = [] + self._testvs.append(tuple([0, 1, "eq", ""])) + new_share = True + if not self._written: + # Write a new checkstring to the share when we write it, so + # that we have something to check later. + new_checkstring = self.get_checkstring() + datavs.append((0, new_checkstring)) + def _first_write(): + self._written = True + self._testvs = [(0, len(new_checkstring), "eq", new_checkstring)] + on_success = _first_write + tw_vectors[self._shnum] = (self._testvs, datavs, None) + datalength = sum([len(x[1]) for x in datavs]) + d = self._rref.callRemote("slot_testv_and_readv_and_writev", + self._storage_index, + self._secrets, + tw_vectors, + self._readv) + def _result(results): + if isinstance(results, failure.Failure) or not results[0]: + # Do nothing; the write was unsuccessful. 
+ if on_failure: on_failure() + else: + if on_success: on_success() + return results + d.addCallback(_result) + return d + + +class MDMFSlotReadProxy: + """ + I read from a mutable slot filled with data written in the MDMF data + format (which is described above), or in the older SDMF format. + + I can be initialized with some amount of data, which I will use (if + it is valid) to eliminate some of the need to fetch it from servers. + """ + def __init__(self, + rref, + storage_index, + shnum, + data=""): + # Start the initialization process. + self._rref = rref + self._storage_index = storage_index + self.shnum = shnum + + # Before doing anything, the reader is probably going to want to + # verify that the signature is correct. To do that, they'll need + # the verification key, and the signature. To get those, we'll + # need the offset table. So the first request made through any of + # my getters fetches the header and the offset table, on the + # assumption that a reader will need them first. + + # The fact that these encoding parameters are None tells us + # that we haven't yet fetched them from the remote share, so we + # should. We could just not set them, but the checks will be + # easier to read if we don't have to use hasattr. + self._version_number = None + self._sequence_number = None + self._root_hash = None + self._salt_hash = None + self._salt = None + self._required_shares = None + self._total_shares = None + self._segment_size = None + self._data_length = None + self._offsets = None + + # If the user has chosen to initialize us with some data, we'll + # try to satisfy subsequent data requests with that data before + # asking the storage server for it. + self._data = data + # The filenode's cache returns None if there isn't any cached + # data, but the way we index the cached data requires a string, + # so convert None to "".
+ if self._data is None: + self._data = "" + + self._queue_observers = observer.ObserverList() + self._readvs = [] + + + def _maybe_fetch_offsets_and_header(self, force_remote=False): + """ + I fetch the offset table and the header from the remote slot if + I don't already have them. If I do have them, I do nothing and + return a Deferred that has already fired with None. + """ + if self._offsets: + return defer.succeed(None) + # At this point, we may be either SDMF or MDMF. Fetching 91 + # bytes will be enough to get information for both SDMF and + # MDMF, though we'll be left with 16 more bytes than we + # need if this ends up being SDMF (whose header is 75 bytes). + # We could just fetch the first byte, which would save the extra + # bytes at the cost of an additional roundtrip after we parse + # the result. + readvs = [(0, 91)] + d = self._read(readvs, force_remote=force_remote) + d.addCallback(self._process_encoding_parameters) + + # Now, we have the encoding parameters, which will tell us + # where we need to look for the offset table. + def _fetch_offsets(ignored): + if self._version_number == 0: + # In SDMF, the offset table starts at byte 75, and + # extends for 32 bytes + readv = [(75, 32)] # struct.calcsize(">LLLLQQ") == 32 + + elif self._version_number == 1: + # In MDMF, the offset table starts at byte 91 and + # extends for 60 bytes + readv = [(91, 60)] # struct.calcsize(">LQQQQQQQ") == 60 + else: + raise LayoutInvalid("I only understand SDMF and MDMF") + return readv + + d.addCallback(_fetch_offsets) + d.addCallback(lambda readv: + self._read(readv, force_remote=force_remote)) + d.addCallback(self._process_offsets) + return d + + + def _process_encoding_parameters(self, encoding_parameters): + assert self.shnum in encoding_parameters + encoding_parameters = encoding_parameters[self.shnum][0] + # The first byte is the version number. It will tell us what + # to do next.
+ (verno,) = struct.unpack(">B", encoding_parameters[:1]) + if verno == MDMF_VERSION: + (verno, + seqnum, + root_hash, + salt_hash, + k, + n, + segsize, + datalen) = struct.unpack(MDMFHEADERWITHOUTOFFSETS, + encoding_parameters) + self._salt_hash = salt_hash + if segsize == 0 and datalen == 0: + # Empty file, no segments. + self._num_segments = 0 + else: + self._num_segments = mathutil.div_ceil(datalen, segsize) + + elif verno == SDMF_VERSION: + (verno, + seqnum, + root_hash, + salt, + k, + n, + segsize, + datalen) = struct.unpack(">BQ32s16s BBQQ", + encoding_parameters[:75]) + self._salt = salt + if segsize == 0 and datalen == 0: + # empty file + self._num_segments = 0 + else: + # non-empty SDMF files have one segment. + self._num_segments = 1 + else: + raise UnknownVersionError("You asked me to read mutable file " + "version %d, but I only understand " + "%d and %d" % (verno, SDMF_VERSION, + MDMF_VERSION)) + + self._version_number = verno + self._sequence_number = seqnum + self._root_hash = root_hash + self._required_shares = k + self._total_shares = n + self._segment_size = segsize + self._data_length = datalen + + self._block_size = self._segment_size / self._required_shares + # We can upload empty files, and need to account for this fact + # so as to avoid zero-division and zero-modulo errors. 
+ if datalen > 0: + tail_size = self._data_length % self._segment_size + else: + tail_size = 0 + if not tail_size: + self._tail_block_size = self._block_size + else: + self._tail_block_size = mathutil.next_multiple(tail_size, + self._required_shares) + self._tail_block_size /= self._required_shares + + + def _process_offsets(self, offsets): + assert self.shnum in offsets + offsets = offsets[self.shnum][0] + if self._version_number == 0: + (signature, + share_hash_chain, + block_hash_tree, + share_data, + enc_privkey, + EOF) = struct.unpack(">LLLLQQ", offsets) + self._offsets = {} + self._offsets['signature'] = signature + self._offsets['share_data'] = share_data + self._offsets['block_hash_tree'] = block_hash_tree + self._offsets['share_hash_chain'] = share_hash_chain + self._offsets['enc_privkey'] = enc_privkey + self._offsets['EOF'] = EOF + elif self._version_number == 1: + (share_data, + encprivkey, + blockhashes, + salthashes, + sharehashes, + signature, + verification_key, + eof) = struct.unpack(MDMFOFFSETS, offsets) + self._offsets = {} + self._offsets['share_data'] = share_data + self._offsets['enc_privkey'] = encprivkey + self._offsets['block_hash_tree'] = blockhashes + self._offsets['salt_hash_tree']= salthashes + self._offsets['share_hash_chain'] = sharehashes + self._offsets['signature'] = signature + self._offsets['verification_key'] = verification_key + self._offsets['EOF'] = eof + + + def get_block_and_salt(self, segnum, queue=False): + """ + I return (block, salt), where block is the block data and + salt is the salt used to encrypt that segment. 
+ """ + d = self._maybe_fetch_offsets_and_header() + def _then(ignored): + base_share_offset = self._offsets['share_data'] + if self._version_number == 1: + base_salt_offset = struct.calcsize(MDMFHEADER) + salt_offset = base_salt_offset + SALT_SIZE * segnum + else: + salt_offset = None # no per-segment salts in SDMF + return base_share_offset, salt_offset + + d.addCallback(_then) + + def _calculate_share_offset(share_and_salt_offset): + base_share_offset, salt_offset = share_and_salt_offset + if segnum + 1 > self._num_segments: + raise LayoutInvalid("Not a valid segment number") + + share_offset = base_share_offset + self._block_size * segnum + if segnum + 1 == self._num_segments: + length = self._tail_block_size + else: + length = self._block_size + readvs = [(share_offset, length)] + if salt_offset is not None: + readvs.insert(0,(salt_offset, SALT_SIZE)) + return readvs + + d.addCallback(_calculate_share_offset) + d.addCallback(lambda readvs: + self._read(readvs, queue=queue)) + def _process_results(results): + assert self.shnum in results + if self._version_number == 0: + # We only read the share data, but we know the salt from + # when we fetched the header + data = results[self.shnum] + if not data: + data = "" + else: + assert len(data) == 1 + data = data[0] + salt = self._salt + else: + data = results[self.shnum] + if not data: + salt = data = "" + else: + assert len(data) == 2 + salt, data = results[self.shnum] + return data, salt + d.addCallback(_process_results) + return d + + + def get_blockhashes(self, needed=None, queue=False, force_remote=False): + """ + I return the block hash tree. + + I take an optional argument, needed, which is a set of indices + corresponding to hashes that I should fetch. If this argument is + missing, I will fetch the entire block hash tree; otherwise, I + may attempt to fetch fewer hashes, based on what needed says + that I should do.
Note that I may fetch as many hashes as I + want, so long as the set of hashes that I do fetch is a superset + of the ones that I am asked for, so callers should be prepared + to tolerate additional hashes. + """ + # TODO: Return only the parts of the block hash tree necessary + # to validate the blocknum provided? + # This is a good idea, but it is hard to implement correctly. It + # is bad to fetch any one block hash more than once, so we + # probably just want to fetch the whole thing at once and then + # serve it. + if needed == set([]): + return defer.succeed([]) + d = self._maybe_fetch_offsets_and_header() + def _then(ignored): + blockhashes_offset = self._offsets['block_hash_tree'] + if self._version_number == 1: + blockhashes_length = self._offsets['salt_hash_tree'] - blockhashes_offset + else: + blockhashes_length = self._offsets['share_data'] - blockhashes_offset + readvs = [(blockhashes_offset, blockhashes_length)] + return readvs + d.addCallback(_then) + d.addCallback(lambda readvs: + self._read(readvs, queue=queue, force_remote=force_remote)) + def _build_block_hash_tree(results): + assert self.shnum in results + + rawhashes = results[self.shnum][0] + results = [rawhashes[i:i+HASH_SIZE] + for i in range(0, len(rawhashes), HASH_SIZE)] + return results + d.addCallback(_build_block_hash_tree) + return d + + + def get_salthashes(self, needed=None, queue=False): + """ + I return the salt hash tree. + + I accept an optional argument, needed, which is a set of indices + corresponding to hashes that I should fetch. If this argument is + missing, I will fetch and return the entire salt hash tree. + Otherwise, I may fetch any part of the salt hash tree, so long + as the part that I fetch and return is a superset of the part + that my caller has asked for. Callers should be prepared to + tolerate this behavior. + + This method is only meaningful for MDMF files, as only MDMF + files have a salt hash tree. 
If the remote file is an SDMF file, + the Deferred I return will fire with False. + """ + # TODO: Only fetch the leaf nodes implied by needed. + if needed == set([]): + return defer.succeed([]) + d = self._maybe_fetch_offsets_and_header() + def _then(ignored): + if self._version_number == 0: + return [] + else: + salthashes_offset = self._offsets['salt_hash_tree'] + salthashes_length = self._offsets['share_hash_chain'] - salthashes_offset + return [(salthashes_offset, salthashes_length)] + d.addCallback(_then) + def _maybe_read(readvs): + if readvs: + return self._read(readvs, queue=queue) + else: + return False + d.addCallback(_maybe_read) + def _process_results(results): + if not results: + return False + assert self.shnum in results + + rawhashes = results[self.shnum][0] + results = [rawhashes[i:i+HASH_SIZE] + for i in range(0, len(rawhashes), HASH_SIZE)] + return results + d.addCallback(_process_results) + return d + + + def get_sharehashes(self, needed=None, queue=False, force_remote=False): + """ + I return the part of the share hash chain needed to validate + this share. + + I take an optional argument, needed. Needed is a set of indices + that correspond to the hashes that I should fetch. If needed is + not present, I will fetch and return the entire share hash + chain. Otherwise, I may fetch and return any part of the share + hash chain that is a superset of the part that I am asked to + fetch. Callers should be prepared to deal with more hashes than + they've asked for.
+ """ + if needed == set([]): + return defer.succeed([]) + d = self._maybe_fetch_offsets_and_header() + + def _make_readvs(ignored): + sharehashes_offset = self._offsets['share_hash_chain'] + if self._version_number == 0: + sharehashes_length = self._offsets['block_hash_tree'] - sharehashes_offset + else: + sharehashes_length = self._offsets['signature'] - sharehashes_offset + readvs = [(sharehashes_offset, sharehashes_length)] + return readvs + d.addCallback(_make_readvs) + d.addCallback(lambda readvs: + self._read(readvs, queue=queue, force_remote=force_remote)) + def _build_share_hash_chain(results): + assert self.shnum in results + + sharehashes = results[self.shnum][0] + results = [sharehashes[i:i+(HASH_SIZE + 2)] + for i in range(0, len(sharehashes), HASH_SIZE + 2)] + results = dict([struct.unpack(">H32s", data) + for data in results]) + return results + d.addCallback(_build_share_hash_chain) + return d + + + def get_encprivkey(self, queue=False): + """ + I return the encrypted private key. + """ + d = self._maybe_fetch_offsets_and_header() + + def _make_readvs(ignored): + privkey_offset = self._offsets['enc_privkey'] + if self._version_number == 0: + privkey_length = self._offsets['EOF'] - privkey_offset + else: + privkey_length = self._offsets['block_hash_tree'] - privkey_offset + readvs = [(privkey_offset, privkey_length)] + return readvs + d.addCallback(_make_readvs) + d.addCallback(lambda readvs: + self._read(readvs, queue=queue)) + def _process_results(results): + assert self.shnum in results + privkey = results[self.shnum][0] + return privkey + d.addCallback(_process_results) + return d + + + def get_signature(self, queue=False): + """ + I return the signature of my share. 
+ """ + d = self._maybe_fetch_offsets_and_header() + + def _make_readvs(ignored): + signature_offset = self._offsets['signature'] + if self._version_number == 1: + signature_length = self._offsets['verification_key'] - signature_offset + else: + signature_length = self._offsets['share_hash_chain'] - signature_offset + readvs = [(signature_offset, signature_length)] + return readvs + d.addCallback(_make_readvs) + d.addCallback(lambda readvs: + self._read(readvs, queue=queue)) + def _process_results(results): + assert self.shnum in results + signature = results[self.shnum][0] + return signature + d.addCallback(_process_results) + return d + + + def get_verification_key(self, queue=False): + """ + I return the verification key. + """ + d = self._maybe_fetch_offsets_and_header() + + def _make_readvs(ignored): + if self._version_number == 1: + vk_offset = self._offsets['verification_key'] + vk_length = self._offsets['EOF'] - vk_offset + else: + vk_offset = struct.calcsize(">BQ32s16sBBQQLLLLQQ") + vk_length = self._offsets['signature'] - vk_offset + readvs = [(vk_offset, vk_length)] + return readvs + d.addCallback(_make_readvs) + d.addCallback(lambda readvs: + self._read(readvs, queue=queue)) + def _process_results(results): + assert self.shnum in results + verification_key = results[self.shnum][0] + return verification_key + d.addCallback(_process_results) + return d + + + def get_encoding_parameters(self): + """ + I return (k, n, segsize, datalen) + """ + d = self._maybe_fetch_offsets_and_header() + d.addCallback(lambda ignored: + (self._required_shares, + self._total_shares, + self._segment_size, + self._data_length)) + return d + + + def get_seqnum(self): + """ + I return the sequence number for this share. 
+ """ + d = self._maybe_fetch_offsets_and_header() + d.addCallback(lambda ignored: + self._sequence_number) + return d + + + def get_root_hash(self): + """ + I return the root of the share hash tree + """ + d = self._maybe_fetch_offsets_and_header() + d.addCallback(lambda ignored: self._root_hash) + return d + + + def get_salt_hash(self): + """ + I return the flat salt hash + """ + d = self._maybe_fetch_offsets_and_header() + d.addCallback(lambda ignored: self._salt_hash) + return d + + + def get_checkstring(self): + """ + I return the packed representation of the following: + + - version number + - sequence number + - root hash + - salt hash (the salt, for SDMF shares) + + which my users use as a checkstring to detect other writers. + """ + d = self._maybe_fetch_offsets_and_header() + def _build_checkstring(ignored): + if self._salt_hash: + checkstring = struct.pack(MDMFCHECKSTRING, + self._version_number, + self._sequence_number, + self._root_hash, + self._salt_hash) + else: + checkstring = struct.pack(PREFIX, + self._version_number, + self._sequence_number, + self._root_hash, + self._salt) + return checkstring + d.addCallback(_build_checkstring) + return d + + + def get_prefix(self, force_remote): + d = self._maybe_fetch_offsets_and_header(force_remote) + d.addCallback(lambda ignored: + self._build_prefix()) + return d + + + def _build_prefix(self): + # The prefix is another name for the part of the remote share + # that gets signed. It consists of everything up to and + # including the datalength, packed by struct.
+ if self._version_number == SDMF_VERSION: + format_string = SIGNED_PREFIX + salt_to_use = self._salt + else: + format_string = MDMFSIGNABLEHEADER + salt_to_use = self._salt_hash + return struct.pack(format_string, + self._version_number, + self._sequence_number, + self._root_hash, + salt_to_use, + self._required_shares, + self._total_shares, + self._segment_size, + self._data_length) + + + def _get_offsets_tuple(self): + # The offsets tuple is another component of the version + # information tuple. It is basically our offsets dictionary, + # itemized and in a tuple. + return self._offsets.copy() + + + def get_verinfo(self): + """ + I return my verinfo tuple. This is used by the ServermapUpdater + to keep track of versions of mutable files. + + The verinfo tuple for MDMF files contains: + - seqnum + - root hash + - salt hash + - segsize + - datalen + - k + - n + - prefix (the thing that you sign) + - a tuple of offsets + + The verinfo tuple for SDMF files is the same, but contains a + 16-byte IV instead of a hash of salts. + """ + d = self._maybe_fetch_offsets_and_header() + def _build_verinfo(ignored): + if self._version_number == SDMF_VERSION: + salt_to_use = self._salt + else: + salt_to_use = self._salt_hash + return (self._sequence_number, + self._root_hash, + salt_to_use, + self._segment_size, + self._data_length, + self._required_shares, + self._total_shares, + self._build_prefix(), + self._get_offsets_tuple()) + d.addCallback(_build_verinfo) + return d + + + def flush(self): + """ + I flush my queue of read vectors. 
+ """ + d = self._read(self._readvs) + def _then(results): + self._readvs = [] + self._queue_observers.notify(results) + self._queue_observers = observer.ObserverList() + return d.addCallback(_then) + + + def _read(self, readvs, force_remote=False, queue=False): + unsatisfiable = filter(lambda x: x[0] + x[1] > len(self._data), readvs) + # TODO: It's entirely possible to tweak this so that it just + # fulfills the requests that it can, and not demand that all + # requests are satisfiable before running it. + if not unsatisfiable and not force_remote: + results = [self._data[offset:offset+length] + for (offset, length) in readvs] + results = {self.shnum: results} + return defer.succeed(results) + else: + if queue: + start = len(self._readvs) + self._readvs += readvs + end = len(self._readvs) + def _get_results(results, start, end): + if not self.shnum in results: + return {self.shnum: [""]} + return {self.shnum: results[self.shnum][start:end]} + d = defer.Deferred() + d.addCallback(_get_results, start, end) + self._queue_observers.subscribe(d.callback) + return d + return self._rref.callRemote("slot_readv", + self._storage_index, + [self.shnum], + readvs) + + + def is_sdmf(self): + """I tell my caller whether my remote file is SDMF (True) or MDMF (False). + """ + d = self._maybe_fetch_offsets_and_header() + d.addCallback(lambda ignored: + self._version_number == 0) + return d + + +class LayoutInvalid(Exception): + """ + This isn't a valid MDMF mutable file + """ hunk ./src/allmydata/test/test_storage.py 24 ReadBucketProxy from allmydata.mutable.layout import MDMFSlotWriteProxy, MDMFSlotReadProxy, \ LayoutInvalid, MDMFSIGNABLEHEADER, \ - SIGNED_PREFIX + SIGNED_PREFIX, MDMFHEADER from allmydata.interfaces import BadWriteEnablerError, MDMF_VERSION, \ SDMF_VERSION from allmydata.test.common import LoggingServiceParent, ShouldFailMixin hunk ./src/allmydata/test/test_storage.py 1321 self.encprivkey = "private" 
self.root_hash = self.block_hash self.salt_hash = self.root_hash +
self.salt_hash_tree = [self.salt_hash for i in xrange(6)] self.block_hash_tree_s = self.serialize_blockhashes(self.block_hash_tree) self.share_hash_chain_s = self.serialize_sharehashes(self.share_hash_chain) hunk ./src/allmydata/test/test_storage.py 1324 + # blockhashes and salt hashes are serialized in the same way, + # only we lop off the first element and store that in the + # header. + self.salt_hash_tree_s = self.serialize_blockhashes(self.salt_hash_tree[1:]) def tearDown(self): hunk ./src/allmydata/test/test_storage.py 1393 salts = "" else: salts = self.salt * 6 - share_offset = 143 + len(salts) + share_offset = 151 + len(salts) if tail_segment: sharedata = self.block * 6 elif empty: hunk ./src/allmydata/test/test_storage.py 1404 encrypted_private_key_offset = share_offset + len(sharedata) # The blockhashes come after the private key blockhashes_offset = encrypted_private_key_offset + len(self.encprivkey) - # The sharehashes come after the blockhashes - sharehashes_offset = blockhashes_offset + len(self.block_hash_tree_s) + # The salthashes come after the blockhashes + salthashes_offset = blockhashes_offset + len(self.block_hash_tree_s) + # The sharehashes come after the salt hashes + sharehashes_offset = salthashes_offset + len(self.salt_hash_tree_s) # The signature comes after the share hash chain signature_offset = sharehashes_offset + len(self.share_hash_chain_s) # The verification key comes after the signature hunk ./src/allmydata/test/test_storage.py 1414 verification_offset = signature_offset + len(self.signature) # The EOF comes after the verification key eof_offset = verification_offset + len(self.verification_key) - data += struct.pack(">LQQQQQQ", + data += struct.pack(">LQQQQQQQ", share_offset, encrypted_private_key_offset, blockhashes_offset, hunk ./src/allmydata/test/test_storage.py 1418 + salthashes_offset, sharehashes_offset, signature_offset, verification_offset, hunk ./src/allmydata/test/test_storage.py 1427 self.offsets['share_data'] = 
share_offset self.offsets['enc_privkey'] = encrypted_private_key_offset self.offsets['block_hash_tree'] = blockhashes_offset + self.offsets['salt_hash_tree'] = salthashes_offset self.offsets['share_hash_chain'] = sharehashes_offset self.offsets['signature'] = signature_offset self.offsets['verification_key'] = verification_offset hunk ./src/allmydata/test/test_storage.py 1440 data += self.encprivkey # the block hash tree, data += self.block_hash_tree_s + # the salt hash tree + data += self.salt_hash_tree_s # the share hash chain, data += self.share_hash_chain_s # the signature, hunk ./src/allmydata/test/test_storage.py 1562 d.addCallback(lambda blockhashes: self.failUnlessEqual(self.block_hash_tree, blockhashes)) + d.addCallback(lambda ignored: + mr.get_salthashes()) + d.addCallback(lambda salthashes: + self.failUnlessEqual(self.salt_hash_tree[1:], salthashes)) + d.addCallback(lambda ignored: mr.get_sharehashes()) d.addCallback(lambda sharehashes: hunk ./src/allmydata/test/test_storage.py 1618 return d + def test_read_salthashes_on_sdmf_file(self): + self.write_sdmf_share_to_server("si1") + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d = defer.succeed(None) + d.addCallback(lambda ignored: + mr.get_salthashes()) + d.addCallback(lambda results: + self.failIf(results)) + return d + + def test_read_with_different_tail_segment_size(self): self.write_test_share_to_server("si1", tail_segment=True) mr = MDMFSlotReadProxy(self.rref, "si1", 0) hunk ./src/allmydata/test/test_storage.py 1744 mw.put_blockhashes(self.block_hash_tree)) d.addCallback(_check_next_write) d.addCallback(lambda ignored: + mw.put_salthashes(self.salt_hash_tree)) + d.addCallback(_check_next_write) + d.addCallback(lambda ignored: mw.put_sharehashes(self.share_hash_chain)) d.addCallback(_check_next_write) # Add the root hash and the salt hash. This should change the hunk ./src/allmydata/test/test_storage.py 1754 # now, since the read vectors are applied before the write # vectors. 
d.addCallback(lambda ignored: - mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + mw.put_root_hash(self.root_hash)) def _check_old_testv_after_new_one_is_written(results): result, readvs = results self.failUnless(result) hunk ./src/allmydata/test/test_storage.py 1775 return d - def test_blockhashes_after_share_hash_chain(self): + def test_blockhashes_after_salt_hash_tree(self): mw = self._make_new_mw("si1", 0) d = defer.succeed(None) hunk ./src/allmydata/test/test_storage.py 1778 - # Put everything up to and including the share hash chain + # Put everything up to and including the salt hash tree for i in xrange(6): d.addCallback(lambda ignored, i=i: mw.put_block(self.block, i, self.salt)) hunk ./src/allmydata/test/test_storage.py 1787 d.addCallback(lambda ignored: mw.put_blockhashes(self.block_hash_tree)) d.addCallback(lambda ignored: - mw.put_sharehashes(self.share_hash_chain)) - # Now try to put a block hash tree after the share hash chain. + mw.put_salthashes(self.salt_hash_tree)) + # Now try to put a block hash tree after the salt hash tree # This won't necessarily overwrite the share hash chain, but it # is a bad idea in general -- if we write one that is anything # other than the exact size of the initial one, we will either hunk ./src/allmydata/test/test_storage.py 1804 return d + def test_salt_hash_tree_after_share_hash_chain(self): + mw = self._make_new_mw("si1", 0) + d = defer.succeed(None) + # Put everything up to and including the share hash chain + for i in xrange(6): + d.addCallback(lambda ignored, i=i: + mw.put_block(self.block, i, self.salt)) + d.addCallback(lambda ignored: + mw.put_encprivkey(self.encprivkey)) + d.addCallback(lambda ignored: + mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(lambda ignored: + mw.put_salthashes(self.salt_hash_tree)) + d.addCallback(lambda ignored: + mw.put_sharehashes(self.share_hash_chain)) + + # Now try to put the salt hash tree again. 
This should fail for + # the same reason that it fails in the previous test. + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "test repeat salthashes", + None, + mw.put_salthashes, self.salt_hash_tree)) + return d + + def test_encprivkey_after_blockhashes(self): mw = self._make_new_mw("si1", 0) d = defer.succeed(None) hunk ./src/allmydata/test/test_storage.py 1859 d.addCallback(lambda ignored: mw.put_blockhashes(self.block_hash_tree)) d.addCallback(lambda ignored: + mw.put_salthashes(self.salt_hash_tree)) + d.addCallback(lambda ignored: mw.put_sharehashes(self.share_hash_chain)) d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 1863 - mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + mw.put_root_hash(self.root_hash)) d.addCallback(lambda ignored: mw.put_signature(self.signature)) # Now try to put the share hash chain again. This should fail hunk ./src/allmydata/test/test_storage.py 1886 d.addCallback(lambda ignored: mw.put_blockhashes(self.block_hash_tree)) d.addCallback(lambda ignored: + mw.put_salthashes(self.salt_hash_tree)) + d.addCallback(lambda ignored: mw.put_sharehashes(self.share_hash_chain)) d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 1890 - mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + mw.put_root_hash(self.root_hash)) d.addCallback(lambda ignored: mw.put_signature(self.signature)) d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 1991 mw.put_blockhashes(self.block_hash_tree)) d.addCallback(_check_success) d.addCallback(lambda ignored: + mw.put_salthashes(self.salt_hash_tree)) + d.addCallback(_check_success) + d.addCallback(lambda ignored: mw.put_sharehashes(self.share_hash_chain)) d.addCallback(_check_success) def _keep_old_checkstring(ignored): hunk ./src/allmydata/test/test_storage.py 2001 mw.set_checkstring("foobarbaz") d.addCallback(_keep_old_checkstring) d.addCallback(lambda ignored: - mw.put_root_and_salt_hashes(self.root_hash, 
self.salt_hash)) + mw.put_root_hash(self.root_hash)) d.addCallback(_check_failure) d.addCallback(lambda ignored: self.failUnlessEqual(self.old_checkstring, mw.get_checkstring())) hunk ./src/allmydata/test/test_storage.py 2009 mw.set_checkstring(self.old_checkstring) d.addCallback(_restore_old_checkstring) d.addCallback(lambda ignored: - mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + mw.put_root_hash(self.root_hash)) + d.addCallback(_check_success) # The checkstring should have been set appropriately for us on # the last write; if we try to change it to something else, # that change should cause the verification key step to fail. hunk ./src/allmydata/test/test_storage.py 2071 d.addCallback(_fix_checkstring) d.addCallback(lambda ignored: mw.put_blockhashes(self.block_hash_tree)) + d.addCallback(lambda ignored: + mw.put_salthashes(self.salt_hash_tree)) d.addCallback(_break_checkstring) d.addCallback(lambda ignored: mw.put_sharehashes(self.share_hash_chain)) hunk ./src/allmydata/test/test_storage.py 2079 d.addCallback(lambda ignored: self.shouldFail(LayoutInvalid, "out-of-order root hash", None, - mw.put_root_and_salt_hashes, - self.root_hash, self.salt_hash)) + mw.put_root_hash, self.root_hash)) d.addCallback(_fix_checkstring) d.addCallback(lambda ignored: mw.put_sharehashes(self.share_hash_chain)) hunk ./src/allmydata/test/test_storage.py 2085 d.addCallback(_break_checkstring) d.addCallback(lambda ignored: - mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + mw.put_root_hash(self.root_hash)) d.addCallback(lambda ignored: self.shouldFail(LayoutInvalid, "out-of-order signature", None, hunk ./src/allmydata/test/test_storage.py 2092 mw.put_signature, self.signature)) d.addCallback(_fix_checkstring) d.addCallback(lambda ignored: - mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + mw.put_root_hash(self.root_hash)) d.addCallback(_break_checkstring) d.addCallback(lambda ignored: mw.put_signature(self.signature)) hunk 
./src/allmydata/test/test_storage.py 2131 mw2 = self._make_new_mw("si1", 1) # Test writing some blocks. read = self.ss.remote_slot_readv + expected_salt_offset = struct.calcsize(MDMFHEADER) + expected_share_offset = expected_salt_offset + (16 * 6) def _check_block_write(i, share): hunk ./src/allmydata/test/test_storage.py 2134 - self.failUnlessEqual(read("si1", [share], [(239 + (i * 2), 2)]), + self.failUnlessEqual(read("si1", [share], [(expected_share_offset + (i * 2), 2)]), {share: [self.block]}) hunk ./src/allmydata/test/test_storage.py 2136 - self.failUnlessEqual(read("si1", [share], [(143 + (i * 16), 16)]), + self.failUnlessEqual(read("si1", [share], [(expected_salt_offset + (i * 16), 16)]), {share: [self.salt]}) d = defer.succeed(None) for i in xrange(6): hunk ./src/allmydata/test/test_storage.py 2151 d.addCallback(lambda ignored, i=i: _check_block_write(i, 1)) - def _spy_on_results(results): - print read("si1", [], [(0, 40000000)]) - return results - # Next, we make a fake encrypted private key, and put it onto the # storage server. d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 2160 # salts: 16 * 6 = 96 bytes # blocks: 2 * 6 = 12 bytes # = 251 bytes - expected_private_key_offset = 251 + expected_private_key_offset = expected_share_offset + len(self.block) * 6 self.failUnlessEqual(len(self.encprivkey), 7) d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 2163 - self.failUnlessEqual(read("si1", [0], [(251, 7)]), + self.failUnlessEqual(read("si1", [0], [(expected_private_key_offset, 7)]), {0: [self.encprivkey]})) # Next, we put a fake block hash tree. 
hunk ./src/allmydata/test/test_storage.py 2173 # header + salts + blocks: 251 bytes # encrypted private key: 7 bytes # = 258 bytes - expected_block_hash_offset = 258 + expected_block_hash_offset = expected_private_key_offset + len(self.encprivkey) self.failUnlessEqual(len(self.block_hash_tree_s), 32 * 6) d.addCallback(lambda ignored: self.failUnlessEqual(read("si1", [0], [(expected_block_hash_offset, 32 * 6)]), hunk ./src/allmydata/test/test_storage.py 2179 {0: [self.block_hash_tree_s]})) + # Next, we put a fake salt hash tree. + d.addCallback(lambda ignored: + mw.put_salthashes(self.salt_hash_tree)) + # The salt hash tree got inserted at + # header + salts + blocks + private key = 258 bytes + # block hash tree: 32 * 6 = 192 bytes + # = 450 bytes + expected_salt_hash_offset = expected_block_hash_offset + len(self.block_hash_tree_s) + d.addCallback(lambda ignored: + self.failUnlessEqual(read("si1", [0], [(expected_salt_hash_offset, 32 * 5)]), {0: [self.salt_hash_tree_s]})) + # Next, put a fake share hash chain d.addCallback(lambda ignored: mw.put_sharehashes(self.share_hash_chain)) hunk ./src/allmydata/test/test_storage.py 2196 # The share hash chain got inserted at: # header + salts + blocks + private key = 258 bytes # block hash tree: 32 * 6 = 192 bytes - # = 450 bytes - expected_share_hash_offset = 450 + # salt hash tree: 32 * 5 = 160 bytes + # = 610 + expected_share_hash_offset = expected_salt_hash_offset + len(self.salt_hash_tree_s) d.addCallback(lambda ignored: self.failUnlessEqual(read("si1", [0],[(expected_share_hash_offset, (32 + 2) * 6)]), {0: [self.share_hash_chain_s]})) hunk ./src/allmydata/test/test_storage.py 2204 # Next, we put what is supposed to be the root hash of - # our share hash tree but isn't, along with the flat hash - # of all the salts. 
+ # our share hash tree but isn't d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 2206 - mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + mw.put_root_hash(self.root_hash)) # The root hash gets inserted at byte 9 (its position is in the header, # and is fixed). The salt is right after it. def _check(ignored): hunk ./src/allmydata/test/test_storage.py 2220 d.addCallback(lambda ignored: mw.put_signature(self.signature)) # The signature gets written to: - # header + salts + blocks + block and share hash tree = 654 - expected_signature_offset = 654 + # header + salts + blocks + block and salt and share hash tree = 814 + expected_signature_offset = expected_share_hash_offset + len(self.share_hash_chain_s) self.failUnlessEqual(len(self.signature), 9) d.addCallback(lambda ignored: self.failUnlessEqual(read("si1", [0], [(expected_signature_offset, 9)]), hunk ./src/allmydata/test/test_storage.py 2231 d.addCallback(lambda ignored: mw.put_verification_key(self.verification_key)) # The verification key gets written to: - # 654 + 9 = 663 bytes - expected_verification_key_offset = 663 + # 814 + 9 = 823 bytes + expected_verification_key_offset = expected_signature_offset + len(self.signature) self.failUnlessEqual(len(self.verification_key), 6) d.addCallback(lambda ignored: self.failUnlessEqual(read("si1", [0], [(expected_verification_key_offset, 6)]), hunk ./src/allmydata/test/test_storage.py 2256 # Next, we cause the offset table to be published. d.addCallback(lambda ignored: mw.finish_publishing()) - expected_eof_offset = 669 + expected_eof_offset = expected_verification_key_offset + len(self.verification_key) # The offset table starts at byte 91.
Happily, we have already # worked out most of these offsets above, but we want to make hunk ./src/allmydata/test/test_storage.py 2289 self.failUnlessEqual(read("si1", [0], [(83, 8)]), {0: [expected_data_length]}) # 91 4 The offset of the share data - expected_offset = struct.pack(">L", 239) + expected_offset = struct.pack(">L", expected_share_offset) self.failUnlessEqual(read("si1", [0], [(91, 4)]), {0: [expected_offset]}) # 95 8 The offset of the encrypted private key hunk ./src/allmydata/test/test_storage.py 2300 expected_offset = struct.pack(">Q", expected_block_hash_offset) self.failUnlessEqual(read("si1", [0], [(103, 8)]), {0: [expected_offset]}) - # 111 8 The offset of the share hash chain - expected_offset = struct.pack(">Q", expected_share_hash_offset) + # 111 8 The offset of the salt hash tree + expected_offset = struct.pack(">Q", expected_salt_hash_offset) self.failUnlessEqual(read("si1", [0], [(111, 8)]), {0: [expected_offset]}) hunk ./src/allmydata/test/test_storage.py 2304 - # 119 8 The offset of the signature - expected_offset = struct.pack(">Q", expected_signature_offset) + # 119 8 The offset of the share hash chain + expected_offset = struct.pack(">Q", expected_share_hash_offset) self.failUnlessEqual(read("si1", [0], [(119, 8)]), {0: [expected_offset]}) hunk ./src/allmydata/test/test_storage.py 2308 - # 127 8 The offset of the verification key - expected_offset = struct.pack(">Q", expected_verification_key_offset) + # 127 8 The offset of the signature + expected_offset = struct.pack(">Q", expected_signature_offset) self.failUnlessEqual(read("si1", [0], [(127, 8)]), {0: [expected_offset]}) hunk ./src/allmydata/test/test_storage.py 2312 - # 135 8 offset of the EOF - expected_offset = struct.pack(">Q", expected_eof_offset) + # 135 8 offset of the verification_key + expected_offset = struct.pack(">Q", expected_verification_key_offset) self.failUnlessEqual(read("si1", [0], [(135, 8)]), {0: [expected_offset]}) hunk ./src/allmydata/test/test_storage.py 
2316 - # = 143 bytes in total. + # 143 8 offset of the EOF + expected_offset = struct.pack(">Q", expected_eof_offset) + self.failUnlessEqual(read("si1", [0], [(143, 8)]), + {0: [expected_offset]}) d.addCallback(_check_offsets) return d hunk ./src/allmydata/test/test_storage.py 2362 return d - def test_write_rejected_with_invalid_salt_hash(self): - # Try writing an invalid salt hash. These should be SHA256d, and - # 32 bytes long as a result. - mw = self._make_new_mw("si2", 0) - invalid_salt_hash = "b" * 31 - d = defer.succeed(None) - # Before this test can work, we need to put some blocks + salts, - # a block hash tree, and a share hash tree. Otherwise, we'll see - # failures that match what we are looking for, but are caused by - # the constraints imposed on operation ordering. - for i in xrange(6): - d.addCallback(lambda ignored, i=i: - mw.put_block(self.block, i, self.salt)) - d.addCallback(lambda ignored: - mw.put_encprivkey(self.encprivkey)) - d.addCallback(lambda ignored: - mw.put_blockhashes(self.block_hash_tree)) - d.addCallback(lambda ignored: - mw.put_sharehashes(self.share_hash_chain)) - d.addCallback(lambda ignored: - self.shouldFail(LayoutInvalid, "invalid root hash", - None, mw.put_root_and_salt_hashes, - self.root_hash, invalid_salt_hash)) - return d - - def test_write_rejected_with_invalid_root_hash(self): # Try writing an invalid root hash. This should be SHA256d, and # 32 bytes long as a result. 
hunk ./src/allmydata/test/test_storage.py 2381 d.addCallback(lambda ignored: mw.put_blockhashes(self.block_hash_tree)) d.addCallback(lambda ignored: + mw.put_salthashes(self.salt_hash_tree)) + d.addCallback(lambda ignored: mw.put_sharehashes(self.share_hash_chain)) d.addCallback(lambda ignored: self.shouldFail(LayoutInvalid, "invalid root hash", hunk ./src/allmydata/test/test_storage.py 2386 - None, mw.put_root_and_salt_hashes, - invalid_root_hash, self.salt_hash)) + None, mw.put_root_hash, invalid_root_hash)) return d hunk ./src/allmydata/test/test_storage.py 2461 mw0.put_encprivkey(self.encprivkey)) + # Try to write the salt hash tree without writing the block hash + # tree. + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "salt hash tree before bht", + None, + mw0.put_salthashes, self.salt_hash_tree)) + + # Try to write the share hash chain without writing the block # hash tree d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 2473 self.shouldFail(LayoutInvalid, "share hash chain before " - "block hash tree", + "salt hash tree", None, mw0.put_sharehashes, self.share_hash_chain)) hunk ./src/allmydata/test/test_storage.py 2478 # Try to write the root hash and salt hash without writing either the - # block hashes or the share hashes + # block hashes or the salt hashes or the share hashes d.addCallback(lambda ignored: self.shouldFail(LayoutInvalid, "root hash before share hashes", None, hunk ./src/allmydata/test/test_storage.py 2482 - mw0.put_root_and_salt_hashes, - self.root_hash, self.salt_hash)) + mw0.put_root_hash, self.root_hash)) # Now write the block hashes and try again d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 2487 mw0.put_blockhashes(self.block_hash_tree)) + + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "share hash before salt hashes", + None, + mw0.put_sharehashes, self.share_hash_chain)) d.addCallback(lambda ignored: self.shouldFail(LayoutInvalid, "root hash before 
share hashes", hunk ./src/allmydata/test/test_storage.py 2494 - None, mw0.put_root_and_salt_hashes, - self.root_hash, self.salt_hash)) + None, mw0.put_root_hash, self.root_hash)) # We haven't yet put the root hash on the share, so we shouldn't # be able to sign it. hunk ./src/allmydata/test/test_storage.py 2512 None, mw0.put_verification_key, self.verification_key)) - # Now write the share hashes and verify that it works. + # Now write the salt hashes, and try again. d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 2514 - mw0.put_sharehashes(self.share_hash_chain)) + mw0.put_salthashes(self.salt_hash_tree)) + + d.addCallback(lambda ignored: + self.shouldFail(LayoutInvalid, "root hash before share hashes", + None, + mw0.put_root_hash, self.root_hash)) # We should still be unable to sign the header d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 2527 None, mw0.put_signature, self.signature)) + # Now write the share hashes. + d.addCallback(lambda ignored: + mw0.put_sharehashes(self.share_hash_chain)) # We should be able to write the root hash now too d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 2532 - mw0.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + mw0.put_root_hash(self.root_hash)) # We should still be unable to put the verification key d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 2569 d.addCallback(lambda ignored: mw.put_blockhashes(self.block_hash_tree)) d.addCallback(lambda ignored: + mw.put_salthashes(self.salt_hash_tree)) + d.addCallback(lambda ignored: mw.put_sharehashes(self.share_hash_chain)) d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 2573 - mw.put_root_and_salt_hashes(self.root_hash, self.salt_hash)) + mw.put_root_hash(self.root_hash)) d.addCallback(lambda ignored: mw.put_signature(self.signature)) d.addCallback(lambda ignored: hunk ./src/allmydata/test/test_storage.py 2768 # This should be enough to fill in both 
the encoding parameters # and the table of offsets, which will complete the version # information tuple. - d.addCallback(_make_mr, 143) + d.addCallback(_make_mr, 151) d.addCallback(lambda mr: mr.get_verinfo()) def _check_verinfo(verinfo): hunk ./src/allmydata/test/test_storage.py 2804 d.addCallback(_check_verinfo) # This is not enough data to read a block and a share, so the # wrapper should attempt to read this from the remote server. - d.addCallback(_make_mr, 143) + d.addCallback(_make_mr, 151) d.addCallback(lambda mr: mr.get_block_and_salt(0)) def _check_block_and_salt((block, salt)): hunk ./src/allmydata/test/test_storage.py 2815 # are 6 * 16 = 96 bytes of salts before we can write shares. # Each block has two bytes, so 143 + 96 + 2 = 241 bytes should # be enough to read one block. - d.addCallback(_make_mr, 241) + d.addCallback(_make_mr, 249) d.addCallback(lambda mr: mr.get_block_and_salt(0)) d.addCallback(_check_block_and_salt) hunk ./src/allmydata/test/test_storage.py 3022 return d + def test_reader_queue(self): + self.write_test_share_to_server('si1') + mr = MDMFSlotReadProxy(self.rref, "si1", 0) + d1 = mr.get_block_and_salt(0, queue=True) + d2 = mr.get_blockhashes(queue=True) + d3 = mr.get_salthashes(queue=True) + d4 = mr.get_sharehashes(queue=True) + d5 = mr.get_signature(queue=True) + d6 = mr.get_verification_key(queue=True) + dl = defer.DeferredList([d1, d2, d3, d4, d5, d6]) + mr.flush() + def _print(results): + self.failUnlessEqual(len(results), 6) + # We have one read for version information, one for offsets, and + # one for everything else. + self.failUnlessEqual(self.rref.read_count, 3) + block, salt = results[0][1] # results[0][0] is a boolean that says + # whether or not the operation + # worked.
+ self.failUnlessEqual(self.block, block) + self.failUnlessEqual(self.salt, salt) + + blockhashes = results[1][1] + self.failUnlessEqual(self.block_hash_tree, blockhashes) + + salthashes = results[2][1] + self.failUnlessEqual(self.salt_hash_tree[1:], salthashes) + + sharehashes = results[3][1] + self.failUnlessEqual(self.share_hash_chain, sharehashes) + + signature = results[4][1] + self.failUnlessEqual(self.signature, signature) + + verification_key = results[5][1] + self.failUnlessEqual(self.verification_key, verification_key) + dl.addCallback(_print) + return dl + + class Stats(unittest.TestCase): def setUp(self): } [A first stab at a segmented uploader Kevan Carstensen **20100623233248 Ignore-this: 8df33da0795f4ff5948d4878933d03a2 This uploader will upload, segment-by-segment, MDMF files. It will only do this if it thinks that the filenode that it is uploading represents an MDMF file; otherwise, it uploads the file as SDMF. My TODO list so far: - More robust peer selection; we'll want to use something like servers of happiness to figure out reliability and unreliability. - Clean up. 
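A minimal, standalone sketch of the segment arithmetic this uploader performs in setup_encoding_parameters() and push_segment(): round the 128 KiB default segment size up to a multiple of k, count the segments, size the tail segment, and split one encrypted segment into k zero-padded pieces for FEC. The helper names (mdmf_segment_layout, split_segment) are hypothetical, and the piece size is assumed to be ceil(segment_length / k), which is what CRSEncoder.get_block_size() is expected to return; this is an illustration, not the Tahoe-LAFS implementation.

```python
# Hypothetical, standalone sketch of the MDMF segment arithmetic; these helpers
# are illustrative and are not part of the Tahoe-LAFS API.
KiB = 1024
DEFAULT_MAX_SEGMENT_SIZE = 128 * KiB


def div_ceil(n, d):
    """Ceiling division for non-negative integers."""
    return (n + d - 1) // d


def next_multiple(n, k):
    """Smallest multiple of k that is >= n."""
    return div_ceil(n, k) * k


def mdmf_segment_layout(datalen, k, max_segsize=DEFAULT_MAX_SEGMENT_SIZE):
    """Return (segment_size, num_segments, tail_segment_size) for an MDMF file."""
    # FEC splits every segment into k equal pieces, so round up to a multiple of k.
    segment_size = next_multiple(max_segsize, k)
    num_segments = div_ceil(datalen, segment_size)
    tail_segment_size = datalen % segment_size
    if tail_segment_size == 0:
        # An exact multiple: the tail segment is a full-sized segment.
        tail_segment_size = segment_size
    return segment_size, num_segments, tail_segment_size


def split_segment(crypttext, k):
    """Split one encrypted segment into k equal, zero-padded pieces for FEC."""
    # Assumption: the piece size is ceil(len(crypttext) / k).
    piece_size = div_ceil(len(crypttext), k)
    pieces = []
    for i in range(k):
        piece = crypttext[i * piece_size:(i + 1) * piece_size]
        pieces.append(piece + b"\x00" * (piece_size - len(piece)))
    return pieces


print(mdmf_segment_layout(300000, 3))  # (131073, 3, 37854)
print(split_segment(b"abcdefg", 3))
```

For a 300,000-byte file with k = 3, the segment size is rounded up to 131,073 bytes, giving three segments with a 37,854-byte tail. A file whose length is an exact multiple of the segment size gets a full-sized tail rather than an empty one, mirroring the tail_segment_size == 0 check in setup_encoding_parameters().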
] { hunk ./src/allmydata/mutable/publish.py 8 from zope.interface import implements from twisted.internet import defer from twisted.python import failure -from allmydata.interfaces import IPublishStatus +from allmydata.interfaces import IPublishStatus, SDMF_VERSION, MDMF_VERSION from allmydata.util import base32, hashutil, mathutil, idlib, log from allmydata import hashtree, codec from allmydata.storage.server import si_b2a hunk ./src/allmydata/mutable/publish.py 19 UncoordinatedWriteError, NotEnoughServersError from allmydata.mutable.servermap import ServerMap from allmydata.mutable.layout import pack_prefix, pack_share, unpack_header, pack_checkstring, \ - unpack_checkstring, SIGNED_PREFIX + unpack_checkstring, SIGNED_PREFIX, MDMFSlotWriteProxy + +KiB = 1024 +DEFAULT_MAX_SEGMENT_SIZE = 128 * KiB class PublishStatus: implements(IPublishStatus) hunk ./src/allmydata/mutable/publish.py 112 self._status.set_helper(False) self._status.set_progress(0.0) self._status.set_active(True) + # We use this to control how the file is written. + version = self._node.get_version() + assert version in (SDMF_VERSION, MDMF_VERSION) + self._version = version def get_status(self): return self._status hunk ./src/allmydata/mutable/publish.py 134 simultaneous write. """ - # 1: generate shares (SDMF: files are small, so we can do it in RAM) - # 2: perform peer selection, get candidate servers - # 2a: send queries to n+epsilon servers, to determine current shares - # 2b: based upon responses, create target map - # 3: send slot_testv_and_readv_and_writev messages - # 4: as responses return, update share-dispatch table - # 4a: may need to run recovery algorithm - # 5: when enough responses are back, we're done + # 0. Setup encoding parameters, encoder, and other such things. + # 1. Encrypt, encode, and publish segments. 
self.log("starting publish, datalen is %s" % len(newdata)) self._status.set_size(len(newdata)) hunk ./src/allmydata/mutable/publish.py 187 self.bad_peers = set() # peerids who have errbacked/refused requests self.newdata = newdata - self.salt = os.urandom(16) hunk ./src/allmydata/mutable/publish.py 188 + # This will set self.segment_size, self.num_segments, and + # self.fec. self.setup_encoding_parameters() # if we experience any surprises (writes which were rejected because hunk ./src/allmydata/mutable/publish.py 238 self.bad_share_checkstrings[key] = old_checkstring self.connections[peerid] = self._servermap.connections[peerid] - # create the shares. We'll discard these as they are delivered. SDMF: - # we're allowed to hold everything in memory. + # Now, the process dovetails -- if this is an SDMF file, we need + # to write an SDMF file. Otherwise, we need to write an MDMF + # file. + if self._version == MDMF_VERSION: + return self._publish_mdmf() + else: + return self._publish_sdmf() + #return self.done_deferred + + def _publish_mdmf(self): + # Next, we find homes for all of the shares that we don't have + # homes for yet. + # TODO: Make this part do peer selection. + self.update_goal() + self.writers = {} + # For each (peerid, shnum) in self.goal, we make an + # MDMFSlotWriteProxy for that peer. We'll use this to write + # shares to the peer. 
+ for key in self.goal: + peerid, shnum = key + write_enabler = self._node.get_write_enabler(peerid) + renew_secret = self._node.get_renewal_secret(peerid) + cancel_secret = self._node.get_cancel_secret(peerid) + secrets = (write_enabler, renew_secret, cancel_secret) + + self.writers[shnum] = MDMFSlotWriteProxy(shnum, + self.connections[peerid], + self._storage_index, + secrets, + self._new_seqnum, + self.required_shares, + self.total_shares, + self.segment_size, + len(self.newdata)) + if (peerid, shnum) in self._servermap.servermap: + old_versionid, old_timestamp = self._servermap.servermap[key] + (old_seqnum, old_root_hash, old_salt, old_segsize, + old_datalength, old_k, old_N, old_prefix, + old_offsets_tuple) = old_versionid + old_checkstring = pack_checkstring(old_seqnum, + old_root_hash, + old_salt, 1) + self.writers[shnum].set_checkstring(old_checkstring) + + # Now, we start pushing shares. + self._status.timings["setup"] = time.time() - self._started + def _start_pushing(res): + self._started_pushing = time.time() + return res + + # First, we encrypt, encode, and publish the shares that we need + # to encrypt, encode, and publish. + + # This will eventually hold the block hash chain for each share + # that we publish. We define it this way so that empty publishes + # will still have something to write to the remote slot. 
+ self.blockhashes = dict([(i, []) for i in xrange(self.total_shares)]) + self.sharehash_leaves = None # eventually [sharehashes] + self.sharehashes = {} # shnum -> [sharehash leaves necessary to + # validate the share] hunk ./src/allmydata/mutable/publish.py 299 + d = defer.succeed(None) + self.log("Starting push") + for i in xrange(self.num_segments - 1): + d.addCallback(lambda ignored, i=i: + self.push_segment(i)) + d.addCallback(self._turn_barrier) + # We have at least one segment, so we will have a tail segment + if self.num_segments > 0: + d.addCallback(lambda ignored: + self.push_tail_segment()) + + d.addCallback(lambda ignored: + self.push_encprivkey()) + d.addCallback(lambda ignored: + self.push_blockhashes()) + d.addCallback(lambda ignored: + self.push_salthashes()) + d.addCallback(lambda ignored: + self.push_sharehashes()) + d.addCallback(lambda ignored: + self.push_toplevel_hashes_and_signature()) + d.addCallback(lambda ignored: + self.finish_publishing()) + return d + + + def _publish_sdmf(self): self._status.timings["setup"] = time.time() - self._started hunk ./src/allmydata/mutable/publish.py 327 + self.salt = os.urandom(16) + d = self._encrypt_and_encode() d.addCallback(self._generate_shares) def _start_pushing(res): hunk ./src/allmydata/mutable/publish.py 340 return self.done_deferred + def setup_encoding_parameters(self): hunk ./src/allmydata/mutable/publish.py 342 - segment_size = len(self.newdata) + if self._version == MDMF_VERSION: + segment_size = DEFAULT_MAX_SEGMENT_SIZE # 128 KiB by default + else: + segment_size = len(self.newdata) # SDMF is only one segment # this must be a multiple of self.required_shares segment_size = mathutil.next_multiple(segment_size, self.required_shares) hunk ./src/allmydata/mutable/publish.py 355 segment_size) else: self.num_segments = 0 - assert self.num_segments in [0, 1,] # SDMF restrictions + if self._version == SDMF_VERSION: + assert self.num_segments in (0, 1) # SDMF + return + # calculate the tail segment 
size. + self.tail_segment_size = len(self.newdata) % segment_size + + if self.tail_segment_size == 0: + # The tail segment is the same size as the other segments. + self.tail_segment_size = segment_size + + # We'll make an encoder ahead-of-time for the normal-sized + # segments (defined as any segment of segment_size size). + # (the part of the code that puts the tail segment will make its + # own encoder for that part) + fec = codec.CRSEncoder() + fec.set_params(self.segment_size, + self.required_shares, self.total_shares) + self.piece_size = fec.get_block_size() + self.fec = fec + # This is not technically part of the encoding parameters, but + # that we are setting up the encoder and encoding parameters is + # a good indicator that we will soon need it. + self.salt_hashes = [] + + + def push_segment(self, segnum): + started = time.time() + segsize = self.segment_size + self.log("Pushing segment %d of %d" % (segnum + 1, self.num_segments)) + data = self.newdata[segsize * segnum:segsize*(segnum + 1)] + assert len(data) == segsize + + salt = os.urandom(16) + self.salt_hashes.append(hashutil.mutable_salt_hash(salt)) + + key = hashutil.ssk_readkey_data_hash(salt, self.readkey) + enc = AES(key) + crypttext = enc.process(data) + assert len(crypttext) == len(data) + + now = time.time() + self._status.timings["encrypt"] = now - started + started = now + + # now apply FEC + + self._status.set_status("Encoding") + crypttext_pieces = [None] * self.required_shares + piece_size = self.piece_size + for i in range(len(crypttext_pieces)): + offset = i * piece_size + piece = crypttext[offset:offset+piece_size] + piece = piece + "\x00"*(piece_size - len(piece)) # padding + crypttext_pieces[i] = piece + assert len(piece) == piece_size + d = self.fec.encode(crypttext_pieces) + def _done_encoding(res): + elapsed = time.time() - started + self._status.timings["encode"] = elapsed + return res + d.addCallback(_done_encoding) + + def _push_shares_and_salt(results): + shares, shareids =
results + dl = [] + for i in xrange(len(shares)): + sharedata = shares[i] + shareid = shareids[i] + block_hash = hashutil.block_hash(sharedata) + self.blockhashes[shareid].append(block_hash) + + # find the writer for this share + d = self.writers[shareid].put_block(sharedata, segnum, salt) + dl.append(d) + # TODO: Naturally, we need to check on the results of these. + return defer.DeferredList(dl) + d.addCallback(_push_shares_and_salt) + return d + + + def push_tail_segment(self): + # This is essentially the same as push_segment, except that we + # don't use the cached encoder that we use elsewhere. + self.log("Pushing tail segment") + started = time.time() + segsize = self.segment_size + data = self.newdata[segsize * (self.num_segments-1):] + assert len(data) == self.tail_segment_size + salt = os.urandom(16) + self.salt_hashes.append(hashutil.mutable_salt_hash(salt)) + + key = hashutil.ssk_readkey_data_hash(salt, self.readkey) + enc = AES(key) + crypttext = enc.process(data) + assert len(crypttext) == len(data) + + now = time.time() + self._status.timings['encrypt'] = now - started + started = now + + self._status.set_status("Encoding") + tail_fec = codec.CRSEncoder() + tail_fec.set_params(self.tail_segment_size, + self.required_shares, + self.total_shares) + + crypttext_pieces = [None] * self.required_shares + piece_size = tail_fec.get_block_size() + for i in range(len(crypttext_pieces)): + offset = i * piece_size + piece = crypttext[offset:offset+piece_size] + piece = piece + "\x00"*(piece_size - len(piece)) # padding + crypttext_pieces[i] = piece + assert len(piece) == piece_size + d = tail_fec.encode(crypttext_pieces) + def _push_shares_and_salt(results): + shares, shareids = results + dl = [] + for i in xrange(len(shares)): + sharedata = shares[i] + shareid = shareids[i] + block_hash = hashutil.block_hash(sharedata) + self.blockhashes[shareid].append(block_hash) + # find the writer for this share + d = self.writers[shareid].put_block(sharedata, + 
self.num_segments - 1, + salt) + dl.append(d) + # TODO: Naturally, we need to check on the results of these. + return defer.DeferredList(dl) + d.addCallback(_push_shares_and_salt) + return d + + + def push_encprivkey(self): + started = time.time() + encprivkey = self._encprivkey + dl = [] + def _spy_on_writer(results): + print results + return results + for shnum, writer in self.writers.iteritems(): + d = writer.put_encprivkey(encprivkey) + dl.append(d) + d = defer.DeferredList(dl) + return d + + + def push_blockhashes(self): + started = time.time() + dl = [] + def _spy_on_results(results): + print results + return results + self.sharehash_leaves = [None] * len(self.blockhashes) + for shnum, blockhashes in self.blockhashes.iteritems(): + t = hashtree.HashTree(blockhashes) + self.blockhashes[shnum] = list(t) + # set the leaf for future use. + self.sharehash_leaves[shnum] = t[0] + d = self.writers[shnum].put_blockhashes(self.blockhashes[shnum]) + dl.append(d) + d = defer.DeferredList(dl) + return d + + + def push_salthashes(self): + started = time.time() + dl = [] + t = hashtree.HashTree(self.salt_hashes) + pushing = list(t) + for shnum in self.writers.iterkeys(): + d = self.writers[shnum].put_salthashes(t) + dl.append(d) + dl = defer.DeferredList(dl) + return dl + + + def push_sharehashes(self): + share_hash_tree = hashtree.HashTree(self.sharehash_leaves) + share_hash_chain = {} + ds = [] + def _spy_on_results(results): + print results + return results + for shnum in xrange(len(self.sharehash_leaves)): + needed_indices = share_hash_tree.needed_hashes(shnum) + self.sharehashes[shnum] = dict( [ (i, share_hash_tree[i]) + for i in needed_indices] ) + d = self.writers[shnum].put_sharehashes(self.sharehashes[shnum]) + ds.append(d) + self.root_hash = share_hash_tree[0] + d = defer.DeferredList(ds) + return d + + + def push_toplevel_hashes_and_signature(self): + # We need to do three things here: + # - Push the root hash + # - Get the checkstring of the
resulting layout; sign that. + # - Push the signature + ds = [] + def _spy_on_results(results): + print results + return results + for shnum in xrange(self.total_shares): + d = self.writers[shnum].put_root_hash(self.root_hash) + ds.append(d) + d = defer.DeferredList(ds) + def _make_and_place_signature(ignored): + signable = self.writers[0].get_signable() + self.signature = self._privkey.sign(signable) + + ds = [] + for (shnum, writer) in self.writers.iteritems(): + d = writer.put_signature(self.signature) + ds.append(d) + return defer.DeferredList(ds) + d.addCallback(_make_and_place_signature) + return d + + + def finish_publishing(self): + # We're almost done -- we just need to put the verification key + # and the offsets + ds = [] + verification_key = self._pubkey.serialize() + + def _spy_on_results(results): + print results + return results + for (shnum, writer) in self.writers.iteritems(): + d = writer.put_verification_key(verification_key) + d.addCallback(lambda ignored, writer=writer: + writer.finish_publishing()) + ds.append(d) + return defer.DeferredList(ds) + + + def _turn_barrier(self, res): + # putting this method in a Deferred chain imposes a guaranteed + # reactor turn between the pre- and post- portions of that chain. + # This can be useful to limit memory consumption: since Deferreds do + # not do tail recursion, code which uses defer.succeed(result) for + # consistency will cause objects to live for longer than you might + # normally expect. + return fireEventually(res) + def _fatal_error(self, f): self.log("error during loop", failure=f, level=log.UNUSUAL) hunk ./src/allmydata/mutable/publish.py 739 self.log_goal(self.goal, "after update: ") - def _encrypt_and_encode(self): # this returns a Deferred that fires with a list of (sharedata, # sharenum) tuples. 
TODO: cache the ciphertext, only produce the hunk ./src/allmydata/mutable/publish.py 780 d.addCallback(_done_encoding) return d + def _generate_shares(self, shares_and_shareids): # this sets self.shares and self.root_hash self.log("_generate_shares") hunk ./src/allmydata/mutable/publish.py 1168 self._status.set_progress(1.0) eventually(self.done_deferred.callback, res) - hunk ./src/allmydata/test/test_mutable.py 593 k, N, segsize, datalen) self.failUnless(p._pubkey.verify(sig_material, signature)) #self.failUnlessEqual(signature, p._privkey.sign(sig_material)) - self.failUnless(isinstance(share_hash_chain, dict)) - # TODO: Revisit this to make sure that the additional - # share hashes are really necessary. - # - # (just because they magically make the tests pass does - # not mean that they are necessary) - # ln2(10)++ + 1 for leaves. - self.failUnlessEqual(len(share_hash_chain), 5) + self.failUnlessEqual(len(share_hash_chain), 4) # ln2(10)++ for shnum,share_hash in share_hash_chain.items(): self.failUnless(isinstance(shnum, int)) self.failUnless(isinstance(share_hash, str)) } [Make the mutable downloader batch its reads Kevan Carstensen **20100623233503 Ignore-this: a948f48080d11f5d0c2c67be9105452b ] { hunk ./src/allmydata/mutable/retrieve.py 215 """ I set up the encoding parameters, including k, n, the number of segments associated with this file, and the segment decoder. - I do not set the tail segment decoder, which is set in the - method that decodes the tail segment, as it is single-use. """ hunk ./src/allmydata/mutable/retrieve.py 216 - # XXX: Or is it? What if servers fail in the last step? (seqnum, root_hash, IV, hunk ./src/allmydata/mutable/retrieve.py 457 self._active_readers.remove(reader) # TODO: self.readers.remove(reader)? for shnum in list(self.remaining_sharemap.keys()): - # TODO: Make sure that we set reader.peerid somewhere. 
self.remaining_sharemap.discard(shnum, reader.peerid) hunk ./src/allmydata/mutable/retrieve.py 486 self._bad_shares.add((reader.peerid, reader.shnum)) self._status.problems[reader.peerid] = f self._last_failure = f - self.notify_server_corruption(reader.peerid, reader.shnum, f.value) + self.notify_server_corruption(reader.peerid, reader.shnum, + str(f.value)) def _download_current_segment(self): hunk ./src/allmydata/mutable/retrieve.py 523 # successful, we will assemble the results into plaintext. ds = [] for reader in self._active_readers: - d = reader.get_block_and_salt(segnum) - d.addCallback(self._validate_block, segnum, reader) - d.addErrback(self._validation_failed, reader) - ds.append(d) + d = reader.get_block_and_salt(segnum, queue=True) + d2 = self._get_needed_hashes(reader, segnum) + dl = defer.DeferredList([d, d2]) + dl.addCallback(self._validate_block, segnum, reader) + dl.addErrback(self._validation_failed, reader) + ds.append(dl) + reader.flush() dl = defer.DeferredList(ds) dl.addCallback(self._maybe_decode_and_decrypt_segment, segnum) return dl hunk ./src/allmydata/mutable/retrieve.py 609 return - def _validate_block(self, (block, salt), segnum, reader): + def _validate_block(self, results, segnum, reader): """ I validate a block from one share on a remote server. """ hunk ./src/allmydata/mutable/retrieve.py 615 # Grab the part of the block hash tree that is necessary to # validate this block, then generate the block hash root. 
- d = self._get_needed_hashes(reader, segnum) - def _handle_validation(block_and_sharehashes): - self.log("validating share %d for segment %d" % (reader.shnum, + self.log("validating share %d for segment %d" % (reader.shnum, segnum)) hunk ./src/allmydata/mutable/retrieve.py 617 - blockhashes, sharehashes = block_and_sharehashes - blockhashes = dict(enumerate(blockhashes[1])) - bht = self._block_hash_trees[reader.shnum] - # If we needed sharehashes in the last step, we'll want to - # get those dealt with before we start processing the - # blockhashes. - if self.share_hash_tree.needed_hashes(reader.shnum): - try: - self.share_hash_tree.set_hashes(hashes=sharehashes[1]) - except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \ - IndexError), e: - # XXX: This is a stupid message -- make it more - # informative. - raise CorruptShareError(reader.peerid, - reader.shnum, - "corrupt hashes: %s" % e) + # Did we fail to fetch either of the things that we were + # supposed to? Fail if so. + if not (results[0][0] and results[1][0]): + # handled by the errback handler.
+ raise CorruptShareError(reader.peerid, reader.shnum, "Connection error") hunk ./src/allmydata/mutable/retrieve.py 623 - if not bht[0]: - share_hash = self.share_hash_tree.get_leaf(reader.shnum) - if not share_hash: - raise CorruptShareError(reader.peerid, - reader.shnum, - "missing the root hash") - bht.set_hashes({0: share_hash}) + block_and_salt, block_and_sharehashes = results + block, salt = block_and_salt[1] + blockhashes, sharehashes = block_and_sharehashes[1] hunk ./src/allmydata/mutable/retrieve.py 627 - if bht.needed_hashes(segnum, include_leaf=True): - try: - bht.set_hashes(blockhashes) - except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \ - IndexError), e: - raise CorruptShareError(reader.peerid, - reader.shnum, - "block hash tree failure: %s" % e) + blockhashes = dict(enumerate(blockhashes[1])) + self.log("the reader gave me the following blockhashes: %s" % \ + blockhashes.keys()) + self.log("the reader gave me the following sharehashes: %s" % \ + sharehashes[1].keys()) + bht = self._block_hash_trees[reader.shnum] hunk ./src/allmydata/mutable/retrieve.py 634 - blockhash = hashutil.block_hash(block) - self.log("got blockhash %s" % [blockhash]) - self.log("comparing to tree %s" % bht) - # If this works without an error, then validation is - # successful. + if bht.needed_hashes(segnum, include_leaf=True): try: hunk ./src/allmydata/mutable/retrieve.py 636 - bht.set_hashes(leaves={segnum: blockhash}) + bht.set_hashes(blockhashes) except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \ IndexError), e: raise CorruptShareError(reader.peerid, hunk ./src/allmydata/mutable/retrieve.py 643 reader.shnum, "block hash tree failure: %s" % e) - # TODO: Validate the salt, too. - self.log('share %d is valid for segment %d' % (reader.shnum, - segnum)) - return {reader.shnum: (block, salt)} - d.addCallback(_handle_validation) - return d + blockhash = hashutil.block_hash(block) + # If this works without an error, then validation is + # successful.
+ try: + bht.set_hashes(leaves={segnum: blockhash}) + except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \ + IndexError), e: + raise CorruptShareError(reader.peerid, + reader.shnum, + "block hash tree failure: %s" % e) + + # Reaching this point means that we know that this segment + # is correct. Now we need to check to see whether the share + # hash chain is also correct. + # SDMF wrote share hash chains that didn't contain the + # leaves, which would be produced from the block hash tree. + # So we need to validate the block hash tree first. If + # successful, then bht[0] will contain the root for the + # shnum, which will be a leaf in the share hash tree, which + # will allow us to validate the rest of the tree. + if self.share_hash_tree.needed_hashes(reader.shnum, + include_leaf=True): + try: + self.share_hash_tree.set_hashes(hashes=sharehashes[1], + leaves={reader.shnum: bht[0]}) + except (hashtree.BadHashError, hashtree.NotEnoughHashesError, \ + IndexError), e: + raise CorruptShareError(reader.peerid, + reader.shnum, + "corrupt hashes: %s" % e) + + # TODO: Validate the salt, too. + self.log('share %d is valid for segment %d' % (reader.shnum, + segnum)) + return {reader.shnum: (block, salt)} def _get_needed_hashes(self, reader, segnum): hunk ./src/allmydata/mutable/retrieve.py 695 # hash tree, and is a leaf in the share hash tree. This is fine, # since any share corruption will be detected in the share hash # tree. - needed.discard(0) - # XXX: not now, causes test failures. 
+ #needed.discard(0) self.log("getting blockhashes for segment %d, share %d: %s" % \ (segnum, reader.shnum, str(needed))) hunk ./src/allmydata/mutable/retrieve.py 698 - d1 = reader.get_blockhashes(needed) + d1 = reader.get_blockhashes(needed, queue=True, force_remote=True) if self.share_hash_tree.needed_hashes(reader.shnum): need = self.share_hash_tree.needed_hashes(reader.shnum) self.log("also need sharehashes for share %d: %s" % (reader.shnum, hunk ./src/allmydata/mutable/retrieve.py 703 str(need))) - d2 = reader.get_sharehashes(need) + d2 = reader.get_sharehashes(need, queue=True, force_remote=True) else: hunk ./src/allmydata/mutable/retrieve.py 705 - d2 = defer.succeed(None) + d2 = defer.succeed({}) # the logic in the next method + # expects a dict dl = defer.DeferredList([d1, d2]) return dl } Context: [docs: about.html link to home page early on, and be decentralized storage instead of cloud storage this time around zooko@zooko.com**20100619065318 Ignore-this: dc6db03f696e5b6d2848699e754d8053 ] [docs: update about.html, especially to have a non-broken link to quickstart.html, and also to comment out the broken links to "for Paranoids" and "for Corporates" zooko@zooko.com**20100619065124 Ignore-this: e292c7f51c337a84ebfeb366fbd24d6c ] [TAG allmydata-tahoe-1.7.0 zooko@zooko.com**20100619052631 Ignore-this: d21e27afe6d85e2e3ba6a3292ba2be1 ] [docs: update relnotes.txt for Tahoe-LAFS v1.7.0! zooko@zooko.com**20100619052048 Ignore-this: 1dd2c851f02adf3ab5a33040051fe05a ... 
and remove relnotes-short.txt (just use the first section of relnotes.txt for that purpose) ] [docs: update known_issues.txt with more detail about web browser "safe-browsing" features and slightly tweaked formatting zooko@zooko.com**20100619051734 Ignore-this: afc10be0da2517ddd0b58e42ef9aa46d ] [docs: quickstart.html: link to 1.7.0 zip file and add UTF-8 BOM zooko@zooko.com**20100619050124 Ignore-this: 5104fc90af542b97662b4016da975f34 ] [docs: more CREDITS for Kevan, plus utf-8 BOM zooko@zooko.com**20100619045809 Ignore-this: ee9c3b7cf7e385c8ca396091cebc9ca6 ] [docs: update NEWS for release 1.7.0 zooko@zooko.com**20100619045750 Ignore-this: 112c352fd52297ebff8138896fc6353d ] [docs: apply patch from duck for #937 about "tahoe run" not working on introducers zooko@zooko.com**20100619040754 Ignore-this: d7213313f16e524996e91058e287a954 ] [webapi.txt: fix statement about leap seconds. david-sarah@jacaranda.org**20100619035603 Ignore-this: 80b685446e915877a421cf3e31cedf30 ] [running.html: Tahoe->Tahoe-LAFS in what used to be using.html, and #tahoe->#tahoe-lafs (IRC channel). david-sarah@jacaranda.org**20100619033152 Ignore-this: a0dfdfb46eab639aaa064981fb933c5c ] [test_backupdb.py: skip test_unicode if we can't represent the test filenames. david-sarah@jacaranda.org**20100619022620 Ignore-this: 6ee564b6c07f9bb0e89a25dc5b37194f ] [test_web.py: correct a test that was missed in the change to not write ctime/mtime. david-sarah@jacaranda.org**20100619021718 Ignore-this: 92edc2e1fd43b3e86e6b49bc43bae122 ] [dirnode.py: stop writing 'ctime' and 'mtime' fields. Includes documentation and test changes. david-sarah@jacaranda.org**20100618230119 Ignore-this: 709119898499769dd64c7977db7c84a6 ] [test_storage.py: print more information on test failures. david-sarah@jacaranda.org**20100617034623 Ignore-this: cc9a8656802a718ca4f2a6a530d35977 ] [running.html: describe where 'bin/tahoe' is only once. 
david-sarah@jacaranda.org**20100617033603 Ignore-this: 6d92d9d8c77f3dfddfa7d061cbf2a791 ] [Merge using.html into running.html. david-sarah@jacaranda.org**20100617012857 Ignore-this: a0fa8b56621fdb976bef4e5f4f6c824a ] [Remove firewall section from running.html and say to read configuration.txt instead. david-sarah@jacaranda.org**20100617004513 Ignore-this: d2e46fffa4855b01093e8240b5fd1eff ] [FTP-and-SFTP.txt: add Known Issues section. david-sarah@jacaranda.org**20100619004311 Ignore-this: 8d9b1da941cbc24657bb6ec268f984dd ] [FTP-and-SFTP.txt: remove description of public key format that is not actually implemented. Document that SFTP does not support server private keys with passphrases, and that FTP cannot list directories containing mutable files. david-sarah@jacaranda.org**20100619001738 Ignore-this: bf9ef53b85b934822ec76060e1fcb3cb ] [configuration.txt and servers-of-happiness.txt: 1 <= happy <= N, not k <= happy <= N. Also minor wording changes. david-sarah@jacaranda.org**20100618050710 Ignore-this: edac0716e753e1f1c4c755c85bec9a19 ] [test_cli.py: fix test failure in CLI.test_listdir_unicode_good due to filenames returned from listdir_unicode no longer being normalized. david-sarah@jacaranda.org**20100618045110 Ignore-this: 598ffaef02d71e075f7e08fac44f48ff ] [tahoe backup: unicode tests. david-sarah@jacaranda.org**20100618035211 Ignore-this: 88ebab9f3218f083fdc635bff6599b60 ] [CLI: allow Unicode patterns in exclude option to 'tahoe backup'. david-sarah@jacaranda.org**20100617033901 Ignore-this: 9d971129e1c8bae3c1cc3220993d592e ] [dirnodes: fix normalization hole where childnames in directories created by nodemaker.create_mutable/immutable_directory would not be normalized. Add a test that we normalize names coming out of a directory. david-sarah@jacaranda.org**20100618000249 Ignore-this: 46a9226eff1003013b067edbdbd4c25b ] [dirnode.py: comments about normalization changes. 
david-sarah@jacaranda.org**20100617041411 Ignore-this: 9040c4854e73a71dbbb55b50ea3b41b2 ] [stringutils.py: remove unused import. david-sarah@jacaranda.org**20100617034440 Ignore-this: 16ec7d737c34665156c2ac486acd545a ] [test_stringutils.py: take account of the output of listdir_unicode no longer being normalized. Also use Unicode escapes, not UTF-8. david-sarah@jacaranda.org**20100617034409 Ignore-this: 47f3f072f0e2efea0abeac130f84c56f ] [test_dirnode.py: partial tests for normalization changes. david-sarah@jacaranda.org**20100617034025 Ignore-this: 2e3169dd8b120d42dff35bd267dcb417 ] [SFTP: get 'ctime' attribute from 'tahoe:linkmotime'. david-sarah@jacaranda.org**20100617033744 Ignore-this: b2fabe12235f2e2a487c0b56c39953e7 ] [stringutils.py: don't NFC-normalize the output of listdir_unicode. david-sarah@jacaranda.org**20100617015537 Ignore-this: 93c9b6f3d7c6812a0afa8d9e1b0b4faa ] [stringutils.py: Add encoding argument to quote_output. Also work around a bug in locale.getpreferredencoding on older Pythons. david-sarah@jacaranda.org**20100616042012 Ignore-this: 48174c37ad95205997e4d3cdd81f1e28 ] [Provisional patch to NFC-normalize filenames going in and out of Tahoe directories. david-sarah@jacaranda.org**20100616031450 Ignore-this: ed08c9d8df37ef0b7cca42bb562c996b ] [how_to_make_a_tahoe-lafs_release.txt: reordering, add fuse-sshfs@lists.sourceforge.list as place to send relnotes. david-sarah@jacaranda.org**20100618041854 Ignore-this: 2e380a6e72917d3a20a65ceccd9a4df ] [running.html: fix overeager replacement of 'tahoe' with 'Tahoe-LAFS', and some simplifications. david-sarah@jacaranda.org**20100617000952 Ignore-this: 472b4b531c866574ed79f076b58495b5 ] [Add a specification for servers of happiness. 
Kevan Carstensen **20100524003508 Ignore-this: 982e2be8a411be5beaf3582bdfde6151 ] [Note that servers of happiness only applies to immutable files for the moment Kevan Carstensen **20100524042836 Ignore-this: cf83cac7a2b3ed347ae278c1a7d9a176 ] [Add a note about running Tahoe-LAFS on a small grid to running.html zooko@zooko.com**20100616140227 Ignore-this: 14dfbff0d47144f7c2375108c6055dc2 also Change "tahoe" and "Tahoe" to "Tahoe-LAFS" in running.html author: Kevan Carstensen ] [test_system.py: investigate failure in allmydata.test.test_system.SystemTest.test_upload_and_download_random_key due to bytes_sent not being an int david-sarah@jacaranda.org**20100616001648 Ignore-this: 9c78092ab7bfdc909acae3a144ddd1f8 ] [SFTP: remove a dubious use of 'pragma: no cover'. david-sarah@jacaranda.org**20100613164356 Ignore-this: 8f96a81b1196017ed6cfa1d914e56fa5 ] [SFTP: test that renaming onto a just-opened file fails. david-sarah@jacaranda.org**20100612033709 Ignore-this: 9b14147ad78b16a5ab0e0e4813491414 ] [SFTP: further small improvements to test coverage. Also ensure that after a test failure, later tests don't fail spuriously due to the checks for heisenfile leaks. david-sarah@jacaranda.org**20100612030737 Ignore-this: 4ec1dd3d7542be42007987a2f51508e7 ] [SFTP: further improve test coverage (paths containing '.', bad data for posix-rename extension, and error in test of openShell). david-sarah@jacaranda.org**20100611213142 Ignore-this: 956f9df7f9e8a66b506ca58dd9a5dbe7 ] [SFTP: improve test coverage for no-write on mutable files, and check for heisenfile table leaks in all relevant tests. Delete test_memory_leak since it is now redundant. david-sarah@jacaranda.org**20100611205752 Ignore-this: 88be1cf323c10dd534a4b8fdac121e31 ] [CLI.txt: introduce 'create-alias' before 'add-alias', document Unicode argument support, and other minor updates. 
david-sarah@jacaranda.org**20100610225547 Ignore-this: de7326e98d79291cdc15aed86ae61fe8 ] [SFTP: add test for extension of file opened with FXF_APPEND. david-sarah@jacaranda.org**20100610182647 Ignore-this: c0216d26453ce3cb4b92eef37d218fb4 ] [NEWS: add UTF-8 coding declaration. david-sarah@jacaranda.org**20100609234851 Ignore-this: 3e6ef125b278e0a982c88d23180a78ae ] [tests: bump up the timeout on this iputil test from 2s to 4s zooko@zooko.com**20100609143017 Ignore-this: 786b7f7bbc85d45cdf727a6293750798 ] [docs: a few tweaks to NEWS and CREDITS and make quickstart.html point to 1.7.0β! zooko@zooko.com**20100609142927 Ignore-this: f8097d3062f41f06c4420a7c84a56481 ] [docs: Update NEWS file with new features and bugfixes in 1.7.0 francois@ctrlaltdel.ch**20100609091120 Ignore-this: 8c1014e4469ef530e5ff48d7d6ae71c5 ] [docs: wording fix, thanks to Jeremy Visser, fix #987 francois@ctrlaltdel.ch**20100609081103 Ignore-this: 6d2e627e0f1cd58c0e1394e193287a4b ] [SFTP: fix most significant memory leak described in #1045 (due to a file being added to all_heisenfiles under more than one direntry when renamed). david-sarah@jacaranda.org**20100609080003 Ignore-this: 490b4c14207f6725d0dd32c395fbcefa ] [test_stringutils.py: Fix test failure on CentOS builder, possibly Python 2.4.3-related. david-sarah@jacaranda.org**20100609065056 Ignore-this: 503b561b213baf1b92ae641f2fdf080a ] [Fix for Unicode-related test failures on Zooko's OS X 10.6 machine. david-sarah@jacaranda.org**20100609055448 Ignore-this: 395ad16429e56623edfa74457a121190 ] [docs: update relnote.txt for Tahoe-LAFS v1.7.0β zooko@zooko.com**20100609054602 Ignore-this: 52e1bf86a91d45315960fb8806b7a479 ] [stringutils.py, sftpd.py: Portability fixes for Python <= 2.5. 
david-sarah@jacaranda.org**20100609013302 Ignore-this: 9d9ce476ee1b96796e0f48cc5338f852 ] [setup: move the mock library from install_requires to tests_require (re: #1016) zooko@zooko.com**20100609050542 Ignore-this: c51a4ff3e19ed630755be752d2233db4 ] [Back out Windows-specific Unicode argument support for v1.7. david-sarah@jacaranda.org**20100609000803 Ignore-this: b230ffe6fdaf9a0d85dfe745b37b42fb ] [_auto_deps.py: allow Python 2.4.3 on Redhat-based distributions. david-sarah@jacaranda.org**20100609003646 Ignore-this: ad3cafdff200caf963024873d0ebff3c ] [setup: show-tool-versions.py: print out the output from the unix command "locale" and re-arrange encoding data a little bit zooko@zooko.com**20100609040714 Ignore-this: 69382719b462d13ff940fcd980776004 ] [setup: add zope.interface to the packages described by show-tool-versions.py zooko@zooko.com**20100609034915 Ignore-this: b5262b2af5c953a5f68a60bd48dcaa75 ] [CREDITS: update François's Description zooko@zooko.com**20100608155513 Ignore-this: a266b438d25ca2cb28eafff75aa4b2a ] [CREDITS: jsgf zooko@zooko.com**20100608143052 Ignore-this: 10abe06d40b88e22a9107d30f1b84810 ] [setup: rename the setuptools_trial .egg that comes bundled in the base dir to not have "-py2.6" in its name, since it works with other versions of python as well zooko@zooko.com**20100608041607 Ignore-this: 64fe386d2e5fba0ab441116e74dad5a3 ] [setup: rename the darcsver .egg that comes bundled in the base dir to not have "-py2.6" in its name, since it works with other versions of python as well zooko@zooko.com**20100608041534 Ignore-this: 53f925f160256409cf01b76d2583f83f ] [SFTP: suppress NoSuchChildError if heisenfile attributes have been updated in setAttrs, in the case where the parent is available. david-sarah@jacaranda.org**20100608063753 Ignore-this: 8c72a5a9c15934f8fe4594ba3ee50ddd ] [SFTP: ignore permissions when opening a file (needed for sshfs interoperability). 
david-sarah@jacaranda.org**20100608055700 Ignore-this: f87f6a430f629326a324ddd94426c797 ] [test_web.py: fix pyflakes warnings introduced by byterange patch. david-sarah@jacaranda.org**20100608042012 Ignore-this: a7612724893b51d1154dec4372e0508 ] [Improve HTTP/1.1 byterange handling Jeremy Fitzhardinge **20100310025913 Ignore-this: 6d69e694973d618f0dc65983735cd9be Fix parsing of a Range: header to support: - multiple ranges (parsed, but not returned) - suffix byte ranges ("-2139") - correct handling of incorrectly formatted range headers (correct behaviour is to ignore the header and return the full file) - return appropriate error for ranges outside the file Multiple ranges are parsed, but only the first range is returned. Returning multiple ranges requires using the multipart/byterange content type. ] [tests: bump up the timeout on these tests; MM's buildslave is sometimes extremely slow on tests, but it will complete them if given enough time. MM is working on making that buildslave more predictable in how long it takes to run tests. zooko@zooko.com**20100608033754 Ignore-this: 98dc27692c5ace1e4b0650b6680629d7 ] [test_cli.py: remove invalid 'test_listdir_unicode_bad' test. david-sarah@jacaranda.org**20100607183730 Ignore-this: fadfe87980dc1862f349bfcc21b2145f ] [check_memory.py: adapt to servers-of-happiness changes. david-sarah@jacaranda.org**20100608013528 Ignore-this: c6b28411c543d1aea2f148a955f7998 ] [show-tool-versions.py: platform.linux_distribution() is not always available david-sarah@jacaranda.org**20100608004523 Ignore-this: 793fb4050086723af05d06bed8b1b92a ] [show-tool-versions.py: show platform.linux_distribution() david-sarah@jacaranda.org**20100608003829 Ignore-this: 81cb5e5fc6324044f0fc6d82903c8223 ] [Remove the 'tahoe debug consolidate' subcommand. 
david-sarah@jacaranda.org**20100607183757 Ignore-this: 4b14daa3ae557cea07d6e119d25dafe9 ] [common_http.py, tahoe_cp.py: Fix an error in calling the superclass constructor in HTTPError and MissingSourceError (introduced by the Unicode fixes). david-sarah@jacaranda.org**20100607174714 Ignore-this: 1a118d593d81c918a4717c887f033aec ] [tests: drastically increase timeout of this very time-consuming test in honor of François's ARM box zooko@zooko.com**20100607115929 Ignore-this: bf1bb52ffb6b5ccae71d4dde14621bc8 ] [setup: update authorship, datestamp, licensing, and add special exceptions to allow combination with Eclipse- and QPL- licensed code zooko@zooko.com**20100607062329 Ignore-this: 5a1d7b12dfafd61283ea65a245416381 ] [FTP-and-SFTP.txt: minor technical correction to doc for 'no-write' flag. david-sarah@jacaranda.org**20100607061600 Ignore-this: 66aee0c1b6c00538602d08631225e114 ] [test_stringutils.py: trivial error in exception message for skipped test. david-sarah@jacaranda.org**20100607061455 Ignore-this: f261a5d4e2b8fe3bcc37e02539ba1ae2 ] [More Unicode test fixes. david-sarah@jacaranda.org**20100607053358 Ignore-this: 6a271fb77c31f28cb7bdba63b26a2dd2 ] [Unicode fixes for platforms with non-native-Unicode filesystems. david-sarah@jacaranda.org**20100607043238 Ignore-this: 2134dc1793c4f8e50350bd749c4c98c2 ] [Unicode fixes. david-sarah@jacaranda.org**20100607010215 Ignore-this: d58727b5cd2ce00e6b6dae3166030138 ] [setup: organize misc/ scripts and tools and remove obsolete ones zooko@zooko.com**20100607051618 Ignore-this: 161db1158c6b7be8365b0b3dee2e0b28 This is for ticket #1068. ] [quickstart.html: link to snapshots page, sorted with most recent first. david-sarah@jacaranda.org**20100606221127 Ignore-this: 93ea7e6ee47acc66f6daac9cabffed2d ] [quickstart.html: We haven't released 1.7beta yet. 
david-sarah@jacaranda.org**20100606220301 Ignore-this: 4e18898cfdb08cc3ddd1ff94d43fdda7 ] [setup: loosen the Desert Island test to allow it to check the network for new packages as long as it doesn't actually download any zooko@zooko.com**20100606175717 Ignore-this: e438a8eb3c1b0e68080711ec6ff93ffa (You can look but don't touch.) ] [Raise Python version requirement to 2.4.4 for non-UCS-2 builds, to avoid a critical Python security bug. david-sarah@jacaranda.org**20100605031713 Ignore-this: 2df2b6d620c5d8191c79eefe655059e2 ] [setup: have the buildbots print out locale.getpreferredencoding(), locale.getdefaultlocale(), locale.getlocale(), and os.path.supports_unicode_filenames zooko@zooko.com**20100605162932 Ignore-this: 85e31e0e0e1364e9215420e272d58116 Even though that latter one is completely useless, I'm curious. ] [unicode tests: fix missing import zooko@zooko.com**20100604142630 Ignore-this: db437fe8009971882aaea9de05e2bc3 ] [unicode: make test_cli test a non-ascii argument, and make the fallback term encoding be locale.getpreferredencoding() zooko@zooko.com**20100604141251 Ignore-this: b2bfc07942f69141811e59891842bd8c ] [unicode: always decode json manifest as utf-8 then encode for stdout zooko@zooko.com**20100604084840 Ignore-this: ac481692315fae870a0f3562bd7db48e pyflakes pointed out that the exception handler fallback called an un-imported function, showing that the fallback wasn't being exercised. I'm not 100% sure that this patch is right and would appreciate François or someone reviewing it. 
] [fix flakes zooko@zooko.com**20100604075845 Ignore-this: 3e6a84b78771b0ad519e771a13605f0 ] [fix syntax of assertion handling that isn't portable to older versions of Python zooko@zooko.com**20100604075805 Ignore-this: 3a12b293aad25883fb17230266eb04ec ] [test_stringutils.py: Skip test test_listdir_unicode_good if filesystem supports only ASCII filenames Francois Deppierraz **20100521160839 Ignore-this: f2ccdbd04c8d9f42f1efb0eb80018257 ] [test_stringutils.py: Skip test_listdir_unicode on mocked platform which cannot store non-ASCII filenames Francois Deppierraz **20100521160559 Ignore-this: b93fde736a8904712b506e799250a600 ] [test_stringutils.py: Add a test class for OpenBSD 4.1 with LANG=C Francois Deppierraz **20100521140053 Ignore-this: 63f568aec259cef0e807752fc8150b73 ] [test_stringutils.py: Mock the open() call in test_open_unicode Francois Deppierraz **20100521135817 Ignore-this: d8be4e56a6eefe7d60f97f01ea20ac67 This test ensure that open(a_unicode_string) is used on Unicode platforms (Windows or MacOS X) and that open(a_correctly_encoded_bytestring) on other platforms such as Unix. ] [test_stringutils.py: Fix a trivial Python 2.4 syntax incompatibility Francois Deppierraz **20100521093345 Ignore-this: 9297e3d14a0dd37d0c1a4c6954fd59d3 ] [test_cli.py: Fix tests when sys.stdout.encoding=None and refactor this code into functions Francois Deppierraz **20100520084447 Ignore-this: cf2286e225aaa4d7b1927c78c901477f ] [Fix handling of correctly encoded unicode filenames (#534) Francois Deppierraz **20100520004356 Ignore-this: 8a3a7df214a855f5a12dc0eeab6f2e39 Tahoe CLI commands working on local files, for instance 'tahoe cp' or 'tahoe backup', have been improved to correctly handle filenames containing non-ASCII characters. In the case where Tahoe encounters a filename which cannot be decoded using the system encoding, an error will be returned and the operation will fail. 
Under Linux, this typically happens when the filesystem contains filenames encoded with another encoding, for instance latin1, than the system locale, for instance UTF-8. In such case, you'll need to fix your system with tools such as 'convmv' before using Tahoe CLI. All CLI commands have been improved to support non-ASCII parameters such as filenames and aliases on all supported Operating Systems except Windows as of now. ] [stringutils.py: Unicode helper functions + associated tests Francois Deppierraz **20100520004105 Ignore-this: 7a73fc31de2fd39d437d6abd278bfa9a This file contains a bunch of helper functions which converts unicode string from and to argv, filenames and stdout. ] [Add dependency on Michael Foord's mock library Francois Deppierraz **20100519233325 Ignore-this: 9bb01bf1e4780f6b98ed394c3b772a80 ] [Resolve merge conflict for sftpd.py david-sarah@jacaranda.org**20100603182537 Ignore-this: ba8b543e51312ac949798eb8f5bd9d9c ] [SFTP: possible fix for metadata times being shown as the epoch. david-sarah@jacaranda.org**20100602234514 Ignore-this: bdd7dfccf34eff818ff88aa4f3d28790 ] [SFTP: further improvements to test coverage. david-sarah@jacaranda.org**20100602234422 Ignore-this: 87eeee567e8d7562659442ea491e187c ] [SFTP: improve test coverage. Also make creating a directory fail when permissions are read-only (rather than ignoring the permissions). david-sarah@jacaranda.org**20100602041934 Ignore-this: a5e9d9081677bc7f3ddb18ca7a1f531f ] [dirnode.py: fix a bug in the no-write change for Adder, and improve test coverage. Add a 'metadata' argument to create_subdirectory, with documentation. Also update some comments in test_dirnode.py made stale by the ctime/mtime change. david-sarah@jacaranda.org**20100602032641 Ignore-this: 48817b54cd63f5422cb88214c053b03b ] [SFTP: fix a bug that caused the temporary files underlying EncryptedTemporaryFiles not to be closed. 
david-sarah@jacaranda.org**20100601055310 Ignore-this: 44fee4cfe222b2b1690f4c5e75083a52 ] [SFTP: changes for #1063 ('no-write' field) including comment:1 (clearing owner write permission diminishes to a read cap). Includes documentation changes, but not tests for the new behaviour. david-sarah@jacaranda.org**20100601051139 Ignore-this: eff7c08bd47fd52bfe2b844dabf02558 ] [SFTP: the same bug as in _sync_heisenfiles also occurred in two other places. david-sarah@jacaranda.org**20100530060127 Ignore-this: 8d137658fc6e4596fa42697476c39aa3 ] [SFTP: another try at fixing the _sync_heisenfiles bug. david-sarah@jacaranda.org**20100530055254 Ignore-this: c15f76f32a60083a6b7de6ca0e917934 ] [SFTP: fix silly bug in _sync_heisenfiles ('f is not ignore' vs 'not (f is ignore)'). david-sarah@jacaranda.org**20100530053807 Ignore-this: 71c4bc62613bf8fef835886d8eb61c27 ] [SFTP: log when a sync completes. david-sarah@jacaranda.org**20100530051840 Ignore-this: d99765663ceb673c8a693dfcf88c25ea ] [SFTP: fix bug in previous logging patch. david-sarah@jacaranda.org**20100530050000 Ignore-this: 613e4c115f03fe2d04c621b510340817 ] [SFTP: more logging to track down OpenOffice hang. david-sarah@jacaranda.org**20100530040809 Ignore-this: 6c11f2d1eac9f62e2d0f04f006476a03 ] [SFTP: avoid blocking close on a heisenfile that has been abandoned or never changed. Also, improve the logging to help track down a case where OpenOffice hangs on opening a file with FXF_READ|FXF_WRITE. david-sarah@jacaranda.org**20100530025544 Ignore-this: 9919dddd446fff64de4031ad51490d1c ] [Move suppression of DeprecationWarning about BaseException.message from sftpd.py to main __init__.py. Also, remove the global suppression of the 'integer argument expected, got float' warning, which turned out to be a bug. 
david-sarah@jacaranda.org**20100529050537 Ignore-this: 87648afa0dec0d2e73614007de102a16 ] [SFTP: cater to clients that assume a file is created as soon as they have made an open request; also, fix some race conditions associated with closing a file at about the same time as renaming or removing it. david-sarah@jacaranda.org**20100529045253 Ignore-this: 2404076b2154ff2659e2b10e0b9e813c ] [SFTP: 'sync' any open files at a direntry before opening any new file at that direntry. This works around the sshfs misbehaviour of returning success to clients immediately on close. david-sarah@jacaranda.org**20100525230257 Ignore-this: 63245d6d864f8f591c86170864d7c57f ] [SFTP: handle removing a file while it is open. Also some simplifications of the logout handling. david-sarah@jacaranda.org**20100525184210 Ignore-this: 660ee80be6ecab783c60452a9da896de ] [SFTP: a posix-rename response should actually return an FXP_STATUS reply, not an FXP_EXTENDED_REPLY as Twisted Conch assumes. Work around this by raising an SFTPError with code FX_OK. david-sarah@jacaranda.org**20100525033323 Ignore-this: fe2914d3ef7f5194bbeaf3f2dda2ad7d ] [SFTP: fix problem with posix-rename code returning a Deferred for the renamed filenode, not for the result of the request (an empty string). david-sarah@jacaranda.org**20100525020209 Ignore-this: 69f7491df2a8f7ea92d999a6d9f0581d ] [SFTP: fix time handling to make sure floats are not passed into twisted.conch, and to print times in the future less ambiguously in directory listings. david-sarah@jacaranda.org**20100524230412 Ignore-this: eb1a3fb72492fa2fb19667b6e4300440 ] [SFTP: name of the POSIX rename extension should be 'posix-rename@openssh.com', not 'extposix-rename@openssh.com'. david-sarah@jacaranda.org**20100524021156 Ignore-this: f90eb1ff9560176635386ee797a3fdc7 ] [SFTP: avoid race condition where .write could be called on an OverwriteableFileConsumer after it had been closed. 
david-sarah@jacaranda.org**20100523233830 Ignore-this: 55d381064a15bd64381163341df4d09f ] [SFTP: log tracebacks for RAISEd exceptions. david-sarah@jacaranda.org**20100523221535 Ignore-this: c76a7852df099b358642f0631237cc89 ] [SFTP: more logging to investigate behaviour of getAttrs(path). david-sarah@jacaranda.org**20100523204236 Ignore-this: e58fd35dc9015316e16a9f49f19bb469 ] [SFTP: fix pyflakes warnings; drop 'noisy' versions of eventually_callback and eventually_errback; robustify conversion of exception messages to UTF-8. david-sarah@jacaranda.org**20100523140905 Ignore-this: 420196fc58646b05bbc9c3732b6eb314 ] [SFTP: fixes and test cases for renaming of open files. david-sarah@jacaranda.org**20100523032549 Ignore-this: 32e0726be0fc89335f3035157e202c68 ] [SFTP: Increase test_sftp timeout to cater for francois' ARM buildslave. david-sarah@jacaranda.org**20100522191639 Ignore-this: a5acf9660d304677048ab4dd72908ad8 ] [SFTP: Fix error in support for getAttrs on an open file, to index open files by directory entry rather than path. Extend that support to renaming open files. Also, implement the extposix-rename@openssh.org extension, and some other minor refactoring. david-sarah@jacaranda.org**20100522035836 Ignore-this: 8ef93a828e927cce2c23b805250b81a4 ] [SFTP tests: fix test_openDirectory_and_attrs that was failing in timezones west of UTC. david-sarah@jacaranda.org**20100520181027 Ignore-this: 9beaf602beef437c11c7e97f54ce2599 ] [SFTP: allow getAttrs to succeed on a file that has been opened for creation but not yet uploaded or linked (part of #1050). david-sarah@jacaranda.org**20100520035613 Ignore-this: 2f59107d60d5476edac19361ccf6cf94 ] [SFTP: improve logging so that results of requests are (usually) logged. david-sarah@jacaranda.org**20100520003652 Ignore-this: 3f59eeee374a3eba71db9be31d5a95 ] [SFTP: add tests for more combinations of open flags. 
david-sarah@jacaranda.org**20100519053933 Ignore-this: b97ee351b1e8ecfecabac70698060665 ] [SFTP: allow FXF_WRITE | FXF_TRUNC (#1050). david-sarah@jacaranda.org**20100519043240 Ignore-this: bd70009f11d07ac6e9fd0d1e3fa87a9b ] [SFTP: remove another case where we were logging data. david-sarah@jacaranda.org**20100519012713 Ignore-this: 83115daf3a90278fed0e3fc267607584 ] [SFTP: avoid logging all data passed to callbacks. david-sarah@jacaranda.org**20100519000651 Ignore-this: ade6d69a473ada50acef6389fc7fdf69 ] [SFTP: fixes related to reporting of permissions (needed for sshfs). david-sarah@jacaranda.org**20100518054521 Ignore-this: c51f8a5d0dc76b80d33ffef9b0541325 ] [SFTP: change error code returned for ExistingChildError to FX_FAILURE (fixes gvfs with some picky programs such as gedit). david-sarah@jacaranda.org**20100518004205 Ignore-this: c194c2c9aaf3edba7af84b7413cec375 ] [SFTP: fixed bugs that caused hangs during write (#1037). david-sarah@jacaranda.org**20100517044228 Ignore-this: b8b95e82c4057367388a1e6baada993b ] [SFTP: work around a probable bug in twisted.conch.ssh.session:loseConnection(). Also some minor error handling cleanups. david-sarah@jacaranda.org**20100517012606 Ignore-this: 5d3da7c4219cb0c14547e7fd70c74204 ] [SFTP: Support statvfs extensions, avoid logging actual data, and decline shell sessions politely. david-sarah@jacaranda.org**20100516154347 Ignore-this: 9d05d23ba77693c03a61accd348ccbe5 ] [SFTP: fix error in SFTPUserHandler arguments introduced by execCommand patch. david-sarah@jacaranda.org**20100516014045 Ignore-this: f5ee494dc6ad6aa536cc8144bd2e3d19 ] [SFTP: implement execCommand to interoperate with clients that issue a 'df -P -k /' command. Also eliminate use of Zope adaptation. david-sarah@jacaranda.org**20100516012754 Ignore-this: 2d0ed28b759f67f83875b1eaf5778992 ] [sftpd.py: 'log.OPERATIONAL' should be just 'OPERATIONAL'. 
david-sarah@jacaranda.org**20100515155533 Ignore-this: f2347cb3301bbccc086356f6edc685 ] [Attempt to fix #1040 by making SFTPUser implement ISession. david-sarah@jacaranda.org**20100515005719 Ignore-this: b3baaf088ba567e861e61e347195dfc4 ] [Eliminate Windows newlines from sftpd.py. david-sarah@jacaranda.org**20100515005656 Ignore-this: cd54fd25beb957887514ae76e08c277 ] [Update SFTP implementation and tests: fix #1038 and switch to foolscap logging; also some code reorganization. david-sarah@jacaranda.org**20100514043113 Ignore-this: 262f76d953dcd4317210789f2b2bf5da ] [Tests for new SFTP implementation david-sarah@jacaranda.org**20100512060552 Ignore-this: 20308d4a59b3ebc868aad55ae0a7a981 ] [New SFTP implementation: mutable files, read/write support, streaming download, Unicode filenames, and more david-sarah@jacaranda.org**20100512055407 Ignore-this: 906f51c48d974ba9cf360c27845c55eb ] [setup: adjust make clean target to ignore our bundled build tools zooko@zooko.com**20100604051250 Ignore-this: d24d2a3b849000790cfbfab69237454e ] [setup: bundle a copy of setuptools_trial as an unzipped egg in the base dir of the Tahoe-LAFS source tree zooko@zooko.com**20100604044648 Ignore-this: a4736e9812b4dab2d5a2bc4bfc5c3b28 This is to work-around this Distribute issue: http://bitbucket.org/tarek/distribute/issue/55/revision-control-plugin-automatically-installed-as-a-build-dependency-is-not-present-when-another-build-dependency-is-being ] [setup: bundle a copy of darcsver in unzipped egg form in the root of the Tahoe-LAFS source tree zooko@zooko.com**20100604044146 Ignore-this: a51a52e82dd3a39225657ffa27decae2 This is to work-around this Distribute issue: http://bitbucket.org/tarek/distribute/issue/55/revision-control-plugin-automatically-installed-as-a-build-dependency-is-not-present-when-another-build-dependency-is-being ] [quickstart.html: warn against installing Python at a path containing spaces. 
david-sarah@jacaranda.org**20100604032413 Ignore-this: c7118332573abd7762d9a897e650bc6a ] [setup: undo the previous patch to quote the executable in scripts zooko@zooko.com**20100604025204 Ignore-this: beda3b951c49d1111478618b8cabe005 The problem isn't in the script, it is in the cli.exe script that is built by setuptools. This might be related to http://bugs.python.org/issue6792 and http://bugs.python.org/setuptools/issue2 Or it might be a separate issue involving the launcher.c code e.g. http://tahoe-lafs.org/trac/zetuptoolz/browser/launcher.c?rev=576#L210 and its handling of the interpreter name. ] [setup: put quotes around the path to executable in case it has spaces in it, when building a tahoe.exe for win32 zooko@zooko.com**20100604020836 Ignore-this: 478684843169c94a9c14726fedeeed7d ] [Add must_exist, must_be_directory, and must_be_file arguments to DirectoryNode.delete. This will be used to fixes a minor condition in the SFTP frontend. david-sarah@jacaranda.org**20100527194529 Ignore-this: 6d8114cef4450c52c57639f82852716f ] [Fix test failures in test_web caused by changes to web page titles in #1062. Also, change a 'target' field to '_blank' instead of 'blank' in welcome.xhtml. david-sarah@jacaranda.org**20100603232105 Ignore-this: 6e2cc63f42b07e2a3b2d1a857abc50a6 ] [misc/show-tool-versions.py: Display additional Python interpreter encoding informations (stdout, stdin and filesystem) Francois Deppierraz **20100521094313 Ignore-this: 3ae9b0b07fd1d53fb632ef169f7c5d26 ] [dirnode.py: Fix bug that caused 'tahoe' fields, 'ctime' and 'mtime' not to be updated when new metadata is present. david-sarah@jacaranda.org**20100602014644 Ignore-this: 5bac95aa897b68f2785d481e49b6a66 ] [dirnode.py: Fix #1034 (MetadataSetter does not enforce restriction on setting 'tahoe' subkeys), and expose the metadata updater for use by SFTP. Also, support diminishing a child cap to read-only if 'no-write' is set in the metadata. 
david-sarah@jacaranda.org**20100601045428 Ignore-this: 14f26e17e58db97fad0dcfd350b38e95 ] [Change doc comments in interfaces.py to take into account unknown nodes. david-sarah@jacaranda.org**20100528171922 Ignore-this: d2fde6890b3bca9c7275775f64fbff56 ] [Trivial whitespace changes. david-sarah@jacaranda.org**20100527194114 Ignore-this: 98d611bc54ee20b01a5f6b334ff61b2d ] [Suppress 'integer argument expected, got float' DeprecationWarning everywhere david-sarah@jacaranda.org**20100523221157 Ignore-this: 80efd7e27798f5d2ad66c7a53e7048e5 ] [Change shouldFail to avoid Unicode errors when converting Failure to str david-sarah@jacaranda.org**20100512060754 Ignore-this: 86ed419d332d9c33090aae2cde1dc5df ] [SFTP: relax pyasn1 version dependency to >= 0.0.8a. david-sarah@jacaranda.org**20100520181437 Ignore-this: 2c7b3dee7b7e14ba121d3118193a386a ] [SFTP: add pyasn1 as dependency, needed if we are using Twisted >= 9.0.0. david-sarah@jacaranda.org**20100516193710 Ignore-this: 76fd92e8a950bb1983a90a09e89c54d3 ] [allmydata.org -> tahoe-lafs.org in __init__.py david-sarah@jacaranda.org**20100603063530 Ignore-this: f7d82331d5b4a3c4c0938023409335af ] [small change to CREDITS david-sarah@jacaranda.org**20100603062421 Ignore-this: 2909cdbedc19da5573dec810fc23243 ] [Resolve conflict in patch to change imports to absolute. david-sarah@jacaranda.org**20100603054608 Ignore-this: 15aa1caa88e688ffa6dc53bed7dcca7d ] [Minor documentation tweaks. 
david-sarah@jacaranda.org**20100603054458 Ignore-this: e30ae407b0039dfa5b341d8f88e7f959 ] [title_rename_xhtml.dpatch.txt freestorm77@gmail.com**20100529172542 Ignore-this: d2846afcc9ea72ac443a62ecc23d121b - Renamed xhtml Title from "Allmydata - Tahoe" to "Tahoe-LAFS" - Renamed Tahoe to Tahoe-LAFS in page content - Changed Tahoe-LAFS home page link to http://tahoe-lafs.org (added target="blank") - Deleted commented css script in info.xhtml ] [tests: refactor test_web.py to have less duplication of literal caps-from-the-future zooko@zooko.com**20100519055146 Ignore-this: 49e5412e6cc4566ca67f069ffd850af6 This is a prelude to a patch which will add tests of caps from the future which have non-ascii chars in them. ] [doc_reformat_stats.txt freestorm77@gmail.com**20100424114615 Ignore-this: af315db5f7e3a17219ff8fb39bcfcd60 - Added heading format begining and ending by "==" - Added Index - Added Title Note: No change are made in paragraphs content **END OF DESCRIPTION*** Place the long patch description above the ***END OF DESCRIPTION*** marker. The first line of this file will be the patch name. 
This patch contains the following changes: M ./docs/stats.txt -2 +2 ] [doc_reformat_performance.txt freestorm77@gmail.com**20100424114444 Ignore-this: 55295ff5cd8a5b67034eb661a5b0699d - Added heading format begining and ending by "==" - Added Index - Added Title Note: No change are made in paragraphs content ] [doc_refomat_logging.txt freestorm77@gmail.com**20100424114316 Ignore-this: 593f0f9914516bf1924dfa6eee74e35f - Added heading format begining and ending by "==" - Added Index - Added Title Note: No change are made in paragraphs content ] [doc_reformat_known_issues.txt freestorm77@gmail.com**20100424114118 Ignore-this: 9577c3965d77b7ac18698988cfa06049 - Added heading format begining and ending by "==" - Added Index - Added Title Note: No change are made in paragraphs content ] [doc_reformat_helper.txt freestorm77@gmail.com**20100424120649 Ignore-this: de2080d6152ae813b20514b9908e37fb - Added heading format begining and ending by "==" - Added Index - Added Title Note: No change are made in paragraphs content ] [doc_reformat_garbage-collection.txt freestorm77@gmail.com**20100424120830 Ignore-this: aad3e4c99670871b66467062483c977d - Added heading format begining and ending by "==" - Added Index - Added Title Note: No change are made in paragraphs content ] [doc_reformat_FTP-and-SFTP.txt freestorm77@gmail.com**20100424121334 Ignore-this: 3736b3d8f9a542a3521fbb566d44c7cf - Added heading format begining and ending by "==" - Added Index - Added Title Note: No change are made in paragraphs content ] [doc_reformat_debian.txt freestorm77@gmail.com**20100424120537 Ignore-this: 45fe4355bb869e55e683405070f47eff - Added heading format begining and ending by "==" - Added Index - Added Title Note: No change are made in paragraphs content ] [doc_reformat_configuration.txt freestorm77@gmail.com**20100424104903 Ignore-this: 4fbabc51b8122fec69ce5ad1672e79f2 - Added heading format begining and ending by "==" - Added Index - Added Title Note: No change are made in paragraphs content 
] [doc_reformat_CLI.txt freestorm77@gmail.com**20100424121512 Ignore-this: 2d3a59326810adcb20ea232cea405645 - Added heading format begining and ending by "==" - Added Index - Added Title Note: No change are made in paragraphs content ] [doc_reformat_backupdb.txt freestorm77@gmail.com**20100424120416 Ignore-this: fed696530e9d2215b6f5058acbedc3ab - Added heading format begining and ending by "==" - Added Index - Added Title Note: No change are made in paragraphs content ] [doc_reformat_architecture.txt freestorm77@gmail.com**20100424120133 Ignore-this: 6e2cab4635080369f2b8cadf7b2f58e - Added heading format begining and ending by "==" - Added Index - Added Title Note: No change are made in paragraphs content ] [Correct harmless indentation errors found by pylint david-sarah@jacaranda.org**20100226052151 Ignore-this: 41335bce830700b18b80b6e00b45aef5 ] [Change relative imports to absolute david-sarah@jacaranda.org**20100226071433 Ignore-this: 32e6ce1a86e2ffaaba1a37d9a1a5de0e ] [Document reason for the trialcoverage version requirement being 0.3.3. david-sarah@jacaranda.org**20100525004444 Ignore-this: 2f9f1df6882838b000c063068f258aec ] [Downgrade version requirement for trialcoverage to 0.3.3 (from 0.3.10), to avoid needing to compile coveragepy on Windows. david-sarah@jacaranda.org**20100524233707 Ignore-this: 9c397a374c8b8017e2244b8a686432a8 ] [Suppress deprecation warning for twisted.web.error.NoResource when using Twisted >= 9.0.0. 
david-sarah@jacaranda.org**20100516205625 Ignore-this: 2361a3023cd3db86bde5e1af759ed01 ] [docs: CREDITS for Jeremy Visser zooko@zooko.com**20100524081829 Ignore-this: d7c1465fd8d4e25b8d46d38a1793465b ] [test: show stdout and stderr in case of non-zero exit code from "tahoe" command zooko@zooko.com**20100524073348 Ignore-this: 695e81cd6683f4520229d108846cd551 ] [setup: upgrade bundled zetuptoolz to zetuptoolz-0.6c15dev and make it unpacked and directly loaded by setup.py zooko@zooko.com**20100523205228 Ignore-this: 24fb32aaee3904115a93d1762f132c7 Also fix the relevant "make clean" target behavior. ] [setup: remove bundled zipfile egg of setuptools zooko@zooko.com**20100523205120 Ignore-this: c68b5f2635bb93d1c1fa7b613a026f9e We're about to replace it with bundled unpacked source code of setuptools, which is much nicer for debugging and evolving under revision control. ] [setup: remove bundled copy of setuptools_trial-0.5.2.tar zooko@zooko.com**20100522221539 Ignore-this: 140f90eb8fb751a509029c4b24afe647 Hopefully it will get installed automatically as needed and we won't bundle it anymore. ] [setup: remove bundled setuptools_darcs-1.2.8.tar zooko@zooko.com**20100522015333 Ignore-this: 378b1964b513ae7fe22bae2d3478285d This version of setuptools_darcs had a bug when used on Windows which has been fixed in setuptools_darcs-1.2.9. Hopefully we will not need to bundle a copy of setuptools_darcs-1.2.9 in with Tahoe-LAFS and can instead rely on it to be downloaded from PyPI or bundled in the "tahoe deps" separate tarball. 
] [tests: fix pyflakes warnings in bench_dirnode.py zooko@zooko.com**20100521202511 Ignore-this: f23d55b4ed05e52865032c65a15753c4 ] [setup: if the string '--reporter=bwverbose-coverage' appears on sys.argv then you need trialcoverage zooko@zooko.com**20100521122226 Ignore-this: e760c45dcfb5a43c1dc1e8a27346bdc2 ] [tests: don't let bench_dirnode.py do stuff and have side-effects at import time (unless __name__ == '__main__') zooko@zooko.com**20100521122052 Ignore-this: 96144a412250d9bbb5fccbf83b8753b8 ] [tests: increase timeout to give François's ARM buildslave a chance to complete the tests zooko@zooko.com**20100520134526 Ignore-this: 3dd399fdc8b91149c82b52f955b50833 ] [run_trial.darcspath freestorm77@gmail.com**20100510232829 Ignore-this: 5ebb4df74e9ea8a4bdb22b65373d1ff2 ] [docs: line-wrap README.txt zooko@zooko.com**20100518174240 Ignore-this: 670a02d360df7de51ebdcf4fae752577 ] [Hush pyflakes warnings Kevan Carstensen **20100515184344 Ignore-this: fd602c3bba115057770715c36a87b400 ] [setup: new improved misc/show-tool-versions.py zooko@zooko.com**20100516050122 Ignore-this: ce9b1de1b35b07d733e6cf823b66335a ] [Improve code coverage of the Tahoe2PeerSelector tests. Kevan Carstensen **20100515032913 Ignore-this: 793151b63ffa65fdae6915db22d9924a ] [Remove a comment that no longer makes sense. Kevan Carstensen **20100514203516 Ignore-this: 956983c7e7c7e4477215494dfce8f058 ] [docs: update docs/architecture.txt to more fully and correctly explain the upload procedure zooko@zooko.com**20100514043458 Ignore-this: 538b6ea256a49fed837500342092efa3 ] [Fix up the behavior of #778, per reviewers' comments Kevan Carstensen **20100514004917 Ignore-this: 9c20b60716125278b5456e8feb396bff - Make some important utility functions clearer and more thoroughly documented. - Assert in upload.servers_of_happiness that the buckets attributes of PeerTrackers passed to it are mutually disjoint. - Get rid of some silly non-Pythonisms that I didn't see when I first wrote these patches. 
- Make sure that should_add_server returns true when queried about a shnum that it doesn't know about yet. - Change Tahoe2PeerSelector.preexisting_shares to map a shareid to a set of peerids, alter dependencies to deal with that. - Remove upload.should_add_servers, because it is no longer necessary - Move upload.shares_of_happiness and upload.shares_by_server to a utility file. - Change some points in Tahoe2PeerSelector. - Compute servers_of_happiness using a bipartite matching algorithm that we know is optimal instead of an ad-hoc greedy algorithm that isn't. - Change servers_of_happiness to just take a sharemap as an argument, change its callers to merge existing_shares and used_peers before calling it. - Change an error message in the encoder to be more appropriate for servers of happiness. - Clarify the wording of an error message in immutable/upload.py - Refactor a happiness failure message to happinessutil.py, and make immutable/upload.py and immutable/encode.py use it. - Move the word "only" as far to the right as possible in failure messages. - Use a better definition of progress during peer selection. - Do read-only peer share detection queries in parallel, not sequentially. - Clean up logging semantics; print the query statistics whenever an upload is unsuccessful, not just in one case. ] [Alter the error message when an upload fails, per some comments in #778. Kevan Carstensen **20091230210344 Ignore-this: ba97422b2f9737c46abeb828727beb1 When I first implemented #778, I just altered the error messages to refer to servers where they referred to shares. The resulting error messages weren't very good. These are a bit better. 
] [Change "UploadHappinessError" to "UploadUnhappinessError" Kevan Carstensen **20091205043037 Ignore-this: 236b64ab19836854af4993bb5c1b221a ] [Alter the error message returned when peer selection fails Kevan Carstensen **20091123002405 Ignore-this: b2a7dc163edcab8d9613bfd6907e5166 The Tahoe2PeerSelector returned either NoSharesError or NotEnoughSharesError for a variety of error conditions that weren't informatively described by them. This patch creates a new error, UploadHappinessError, replaces uses of NoSharesError and NotEnoughSharesError with it, and alters the error message raised with the errors to be more in line with the new servers_of_happiness behavior. See ticket #834 for more information. ] [Eliminate overcounting of servers_of_happiness in Tahoe2PeerSelector; also reorganize some things. Kevan Carstensen **20091118014542 Ignore-this: a6cb032cbff74f4f9d4238faebd99868 ] [Change stray "shares_of_happiness" to "servers_of_happiness" Kevan Carstensen **20091116212459 Ignore-this: 1c971ba8c3c4d2e7ba9f020577b28b73 ] [Alter Tahoe2PeerSelector to make sure that it recognizes existing shares on readonly servers, fixing an issue in #778 Kevan Carstensen **20091116192805 Ignore-this: 15289f4d709e03851ed0587b286fd955 ] [Alter 'immutable/encode.py' and 'immutable/upload.py' to use servers_of_happiness instead of shares_of_happiness. Kevan Carstensen **20091104111222 Ignore-this: abb3283314820a8bbf9b5d0cbfbb57c8 ] [Alter the signature of set_shareholders in IEncoder to add a 'servermap' parameter, which gives IEncoders enough information to perform a sane check for servers_of_happiness.
Kevan Carstensen **20091104033241 Ignore-this: b3a6649a8ac66431beca1026a31fed94 ] [Alter CiphertextDownloader to work with servers_of_happiness Kevan Carstensen **20090924041932 Ignore-this: e81edccf0308c2d3bedbc4cf217da197 ] [Revisions of the #778 tests, per reviewers' comments Kevan Carstensen **20100514012542 Ignore-this: 735bbc7f663dce633caeb3b66a53cf6e - Fix comments and confusing naming. - Add tests for the new error messages suggested by David-Sarah and Zooko. - Alter existing tests for new error messages. - Make sure that the tests continue to work with the trunk. - Add a test for a mutual disjointedness assertion that I added to upload.servers_of_happiness. - Fix the comments to correctly reflect read-onlyness - Add a test for an edge case in should_add_server - Add an assertion to make sure that share redistribution works as it should - Alter tests to work with revised servers_of_happiness semantics - Remove tests for should_add_server, since that function no longer exists. - Alter tests to know about merge_peers, and to use it before calling servers_of_happiness. - Add tests for merge_peers. - Add Zooko's puzzles to the tests. - Edit encoding tests to expect the new kind of failure message. - Edit tests to expect error messages with the word "only" moved as far to the right as possible. - Extended and cleaned up some helper functions. - Changed some tests to call more appropriate helper functions. - Added a test for the failing redistribution algorithm - Added a test for the progress message - Added a test for the upper bound on readonly peer share discovery. ] [Alter various unit tests to work with the new happy behavior Kevan Carstensen **20100107181325 Ignore-this: 132032bbf865e63a079f869b663be34a ] [Replace "UploadHappinessError" with "UploadUnhappinessError" in tests. Kevan Carstensen **20091205043453 Ignore-this: 83f4bc50c697d21b5f4e2a4cd91862ca ] [Add tests for the behavior described in #834. 
Kevan Carstensen **20091123012008 Ignore-this: d8e0aa0f3f7965ce9b5cea843c6d6f9f ] [Re-work 'test_upload.py' to be more readable; add more tests for #778 Kevan Carstensen **20091116192334 Ignore-this: 7e8565f92fe51dece5ae28daf442d659 ] [Test Tahoe2PeerSelector to make sure that it recognizes existing shares on readonly servers Kevan Carstensen **20091109003735 Ignore-this: 12f9b4cff5752fca7ed32a6ebcff6446 ] [Add more tests for comment:53 in ticket #778 Kevan Carstensen **20091104112849 Ignore-this: 3bb2edd299a944cc9586e14d5d83ec8c ] [Add a test for upload.shares_by_server Kevan Carstensen **20091104111324 Ignore-this: f9802e82d6982a93e00f92e0b276f018 ] [Minor tweak to an existing test -- make the first server read-write, instead of read-only Kevan Carstensen **20091104034232 Ignore-this: a951a46c93f7f58dd44d93d8623b2aee ] [Alter tests to use the new form of set_shareholders Kevan Carstensen **20091104033602 Ignore-this: 3deac11fc831618d11441317463ef830 ] [Refactor some behavior into a mixin, and add tests for the behavior described in #778 "Kevan Carstensen" **20091030091908 Ignore-this: a6f9797057ca135579b249af3b2b66ac ] [Alter NoNetworkGrid to allow the creation of readonly servers for testing purposes. Kevan Carstensen **20091018013013 Ignore-this: e12cd7c4ddeb65305c5a7e08df57c754 ] [Update 'docs/architecture.txt' to reflect readonly share discovery kevan@isnotajoke.com**20100514003852 Ignore-this: 7ead71b34df3b1ecfdcfd3cb2882e4f9 ] [Alter the wording in docs/architecture.txt to more accurately describe the servers_of_happiness behavior. Kevan Carstensen **20100428002455 Ignore-this: 6eff7fa756858a1c6f73728d989544cc ] [Alter wording in 'interfaces.py' to be correct wrt #778 "Kevan Carstensen" **20091205034005 Ignore-this: c9913c700ac14e7a63569458b06980e0 ] [Update 'docs/configuration.txt' to reflect the servers_of_happiness behavior.
Kevan Carstensen **20091205033813 Ignore-this: 5e1cb171f8239bfb5b565d73c75ac2b8 ] [Clarify quickstart instructions for installing pywin32 david-sarah@jacaranda.org**20100511180300 Ignore-this: d4668359673600d2acbc7cd8dd44b93c ] [web: add a simple test that you can load directory.xhtml zooko@zooko.com**20100510063729 Ignore-this: e49b25fa3c67b3c7a56c8b1ae01bb463 ] [setup: fix typos in misc/show-tool-versions.py zooko@zooko.com**20100510063615 Ignore-this: 2181b1303a0e288e7a9ebd4c4855628 ] [setup: show code-coverage tool versions in show-tools-versions.py zooko@zooko.com**20100510062955 Ignore-this: 4b4c68eb3780b762c8dbbd22b39df7cf ] [docs: update README, mv it to README.txt, update setup.py zooko@zooko.com**20100504094340 Ignore-this: 40e28ca36c299ea1fd12d3b91e5b421c ] [Dependency on Windmill test framework is not needed yet. david-sarah@jacaranda.org**20100504161043 Ignore-this: be088712bec650d4ef24766c0026ebc8 ] [tests: pass z to tar so that BSD tar will know to ungzip zooko@zooko.com**20100504090628 Ignore-this: 1339e493f255e8fc0b01b70478f23a09 ] [setup: update comments and URLs in setup.cfg zooko@zooko.com**20100504061653 Ignore-this: f97692807c74bcab56d33100c899f829 ] [setup: reorder and extend the show-tool-versions script, the better to glean information about our new buildslaves zooko@zooko.com**20100504045643 Ignore-this: 836084b56b8d4ee8f1de1f4efb706d36 ] [CLI: Support for https url in option --node-url Francois Deppierraz **20100430185609 Ignore-this: 1717176b4d27c877e6bc67a944d9bf34 This patch modifies the regular expression used for verifying of '--node-url' parameter. Support for accessing a Tahoe gateway over HTTPS was already present, thanks to Python's urllib. 
] [backupdb.did_create_directory: use REPLACE INTO, not INSERT INTO + ignore error Brian Warner **20100428050803 Ignore-this: 1fca7b8f364a21ae413be8767161e32f This handles the case where we upload a new tahoe directory for a previously-processed local directory, possibly creating a new dircap (if the metadata had changed). Now we replace the old dirhash->dircap record. The previous behavior left the old record in place (with the old dircap and timestamps), so we'd never stop creating new directories and never converge on a null backup. ] ["tahoe webopen": add --info flag, to get ?t=info Brian Warner **20100424233003 Ignore-this: 126b0bb6db340fabacb623d295eb45fa Also fix some trailing whitespace. ] [docs: install.html http-equiv refresh to quickstart.html zooko@zooko.com**20100421165708 Ignore-this: 52b4b619f9dde5886ae2cd7f1f3b734b ] [docs: install.html -> quickstart.html zooko@zooko.com**20100421155757 Ignore-this: 6084e203909306bed93efb09d0e6181d It is not called "installing" because that implies that it is going to change the configuration of your operating system. It is not called "building" because that implies that you need developer tools like a compiler. Also I added a stern warning against looking at the "InstallDetails" wiki page, which I have renamed to "AdvancedInstall". 
] [Fix another typo in tahoe_storagespace munin plugin david-sarah@jacaranda.org**20100416220935 Ignore-this: ad1f7aa66b554174f91dfb2b7a3ea5f3 ] [Add dependency on windmill >= 1.3 david-sarah@jacaranda.org**20100416190404 Ignore-this: 4437a7a464e92d6c9012926b18676211 ] [licensing: phrase the OpenSSL-exemption in the vocabulary of copyright instead of computer technology, and replicate the exemption from the GPL to the TGPPL zooko@zooko.com**20100414232521 Ignore-this: a5494b2f582a295544c6cad3f245e91 ] [munin-tahoe_storagespace freestorm77@gmail.com**20100221203626 Ignore-this: 14d6d6a587afe1f8883152bf2e46b4aa Plugin configuration rename ] [setup: add licensing declaration for setuptools (noticed by the FSF compliance folks) zooko@zooko.com**20100309184415 Ignore-this: 2dfa7d812d65fec7c72ddbf0de609ccb ] [setup: fix error in licensing declaration from Shawn Willden, as noted by the FSF compliance division zooko@zooko.com**20100309163736 Ignore-this: c0623d27e469799d86cabf67921a13f8 ] [CREDITS to Jacob Appelbaum zooko@zooko.com**20100304015616 Ignore-this: 70db493abbc23968fcc8db93f386ea54 ] [desert-island-build-with-proper-versions jacob@appelbaum.net**20100304013858] [docs: a few small edits to try to guide newcomers through the docs zooko@zooko.com**20100303231902 Ignore-this: a6aab44f5bf5ad97ea73e6976bc4042d These edits were suggested by my watching over Jake Appelbaum's shoulder as he completely ignored/skipped/missed install.html and also as he decided that debian.txt wouldn't help him with basic installation. Then I threw in a few docs edits that have been sitting around in my sandbox asking to be committed for months. ] [TAG allmydata-tahoe-1.6.1 david-sarah@jacaranda.org**20100228062314 Ignore-this: eb5f03ada8ea953ee7780e7fe068539 ] Patch bundle hash: d56838e7d35a3bd2740093ea894b5cf64e5169ba
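The "Fix up the behavior of #778" patch listed above says servers_of_happiness is now computed "using a bipartite matching algorithm that we know is optimal instead of an ad-hoc greedy algorithm", and that it takes a sharemap (share id to set of peer ids) as its argument. The sketch below illustrates that idea, assuming "happiness" means the size of a maximum matching between shares and servers. It uses Kuhn's augmenting-path algorithm and is not taken from Tahoe-LAFS's happinessutil.py; the function name mirrors the patch description, but this signature is a hypothetical illustration.

```python
from typing import Dict, Hashable, Set


def servers_of_happiness(sharemap: Dict[int, Set[Hashable]]) -> int:
    """Return the size of a maximum matching between shares and servers.

    Hypothetical sketch: sharemap maps a share number to the set of
    server (peer) ids that hold, or will hold, that share. Each server
    may be matched to at most one share and each share to at most one
    server.
    """
    # match_of_server[peerid] = share number currently matched to that server
    match_of_server: Dict[Hashable, int] = {}

    def try_assign(shnum: int, visited: Set[Hashable]) -> bool:
        # Search for an augmenting path that starts at this share.
        for peerid in sharemap[shnum]:
            if peerid in visited:
                continue
            visited.add(peerid)
            # Use a free server, or re-route the share that currently occupies it.
            if peerid not in match_of_server or try_assign(match_of_server[peerid], visited):
                match_of_server[peerid] = shnum
                return True
        return False

    # Each successful augmentation grows the matching by exactly one edge.
    return sum(1 for shnum in sharemap if try_assign(shnum, set()))
```

For example, with `{0: {"A", "B"}, 1: {"A"}}` a greedy pass that gives share 0 to server A would leave share 1 unplaced, while the augmenting path moves share 0 to server B and yields a happiness of 2. This is the kind of undercount the patch description attributes to the old greedy approach.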