MERRITT-L Archives

Merritt users list

MERRITT-L@LISTSERV.UCOP.EDU

Subject:
From:
Perry Willett <[log in to unmask]>
Reply To:
Merritt users list <[log in to unmask]>
Date:
Wed, 27 May 2015 22:49:56 +0000
We fixed a significant problem with ingest last week. Connections between the ingest server and the storage server were being dropped at random, causing submissions to fail. The problem occurred regularly with large files (and we're seeing a lot more large files lately). It may have affected smaller files as well, but large files were more likely to be hit. If the connection is dropped, the ingest process now resumes the transfer from the point at which it was interrupted (similar to wget's "--continue" option). We tested it over the Memorial Day weekend with large files, some over 100 GB, and they all succeeded, so we're confident that we've fixed this problem.
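
To illustrate the resume-on-drop behavior described above, here is a minimal Python sketch. It is not Merritt's actual code: the FlakySource class, the resumable_copy function, the chunk size, and the retry limit are all hypothetical, and the "connection" is simulated in memory rather than over a network.

```python
import io


class FlakySource:
    """Hypothetical stand-in for a remote file whose connection drops once."""

    def __init__(self, data: bytes, fail_after: int):
        self.data = data
        self.fail_after = fail_after  # bytes served before the simulated drop
        self.dropped = False

    def read_from(self, offset: int, chunk_size: int):
        """Yield chunks starting at `offset`; raise once to mimic a drop."""
        pos = offset
        while pos < len(self.data):
            if not self.dropped and pos >= self.fail_after:
                self.dropped = True
                raise ConnectionError("simulated dropped connection")
            chunk = self.data[pos:pos + chunk_size]
            yield chunk
            pos += len(chunk)


def resumable_copy(source, dest, chunk_size=4, max_retries=3):
    """Copy source to dest, resuming at the bytes already written after a drop.

    Returns the number of retries that were needed.
    """
    retries = 0
    while True:
        offset = dest.tell()  # resume point: how much has been written so far
        try:
            for chunk in source.read_from(offset, chunk_size):
                dest.write(chunk)
            return retries
        except ConnectionError:
            retries += 1
            if retries > max_retries:
                raise


data = bytes(range(20))
src = FlakySource(data, fail_after=8)
out = io.BytesIO()
retries_used = resumable_copy(src, out)
print(retries_used, out.getvalue() == data)
```

The key point is that a retry starts from the current length of the destination rather than from zero, so bytes transferred before the drop are not sent again.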

We determined that the problem occurred only in the UCOP Data Center, but couldn't diagnose it further. We could not reproduce it in the Berkeley Data Center. We're planning to move out of the UCOP Data Center to the Berkeley Data Center soon, so we don't plan to investigate further unless it recurs.

In addition to this problem, we also ran into capacity limits on our ingest servers. Our three ingest servers each had 200 GB of storage; we've expanded them to 1 TB each. Submissions are assembled on the ingest server before being sent on to storage, so an entire submission can now be up to one terabyte in size. If you're submitting a manifest, this means that the total size of all the objects in the manifest must be less than 1 TB. If you're submitting a container (zip, tar or gz), the combined size of the container and the uncompressed object must be less than 1 TB, because we unpack the container on the ingest server. Once the submission has been fully processed and ingested into storage, the initial payload on the ingest server is deleted.
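
As a rough illustration of the two size rules above, the following Python sketch checks whether a submission would fit. The function names are hypothetical, and the 1 TB limit is assumed to mean a decimal terabyte (10^12 bytes), since the message does not say whether TB or TiB is meant.

```python
# Assumption: 1 TB = 10**12 bytes; the announcement does not specify TB vs. TiB.
TB = 10**12
GB = 10**9
INGEST_CAPACITY = 1 * TB


def manifest_fits(object_sizes, capacity=INGEST_CAPACITY):
    """A manifest fits if the total size of all listed objects is under the limit."""
    return sum(object_sizes) < capacity


def container_fits(container_size, uncompressed_size, capacity=INGEST_CAPACITY):
    """A container fits if the archive plus its unpacked contents is under the limit,
    because both are present on the ingest server while it is unpacked."""
    return container_size + uncompressed_size < capacity


print(manifest_fits([400 * GB, 500 * GB]))   # 900 GB in total
print(container_fits(600 * GB, 600 * GB))    # 1200 GB needed on the ingest server
```

Note that a 600 GB container fails the check even though it is well under 1 TB on its own, because the unpacked copy has to sit alongside it.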

This should alleviate most of the recent ingest problems that we and you have encountered. Be aware that other people might be submitting content at the same time as you. As mentioned, we have three ingest servers, each with 1 TB of storage, and submissions are sent to them in round-robin order. Two large submissions can end up on the same ingest server if someone else submits two smaller ones in between (this has happened). If you plan to submit more than, say, 200 GB of data at any given time, you might contact us just to give us a heads-up. We can let you know if there is other significant activity.

Let us know if you have any questions. Sorry for the inconvenience this has caused some of you, and best wishes,

Perry

Perry Willett
Digital Preservation Services Manager
California Digital Library
415 20th St., 4th Floor
Oakland CA 94612-2901
Ph: 510-987-0078
Fax: 510-893-5212
Email: [log in to unmask]
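
The round-robin scenario mentioned in the message (two large submissions landing on the same ingest server when two smaller ones arrive in between) can be sketched in Python as follows. The server names, submission names, and sizes are made up for illustration, and the real dispatcher may well work differently.

```python
from itertools import cycle

SERVER_CAPACITY_GB = 1000  # 1 TB per ingest server, as stated in the message


def assign_round_robin(submissions, servers=("ingest-1", "ingest-2", "ingest-3")):
    """Assign each (name, size_gb) submission to the next server in rotation."""
    load = {server: [] for server in servers}
    for submission, server in zip(submissions, cycle(servers)):
        load[server].append(submission)
    return load


# Hypothetical arrival order: two large submissions separated by two small ones.
arrivals = [("large-A", 600), ("small-B", 5), ("small-C", 5), ("large-D", 600)]
load = assign_round_robin(arrivals)

for server, items in load.items():
    total = sum(size for _, size in items)
    status = "OVER CAPACITY" if total > SERVER_CAPACITY_GB else "ok"
    print(server, [name for name, _ in items], total, "GB", status)
```

With three servers, the fourth submission wraps around to the first server, so both large submissions share ingest-1 and together exceed its 1 TB of storage.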
