
Optimizing S3 image file access for image processing, within a Django app
<p>For the django apps I typically build, S3 is a no-brainer for storing any non-trivial static data... most notably images. It makes page loads much faster than they would otherwise be. I use the S3BotoStorage filesystem backend in the <a href="https://bitbucket.org/david/django-storages/wiki/Home" rel="nofollow">django-storages package</a> and I have found it to be fantastically transparent and hassle-free w/r/t implementation.</p> <p>Not so much w/r/t operation, though: now, I'm building out a small family of apps, which all depend on a Django-centric image-processing platform. Most of the processor-bound operations I'm doing can be handled within an HTTP request lifecycle; for the few processes that are more demanding, I use an async signal queue and a RESTful API to defuse potential bottlenecks through timing and UI considerations.</p> <p>That's all great when working with image data local to the processing app. S3 throws a monkey wrench into it by making all file-object operations totally nondeterministic. The problem isn't failures (I get a random IOError or somesuch from inside the django-storages app maybe once a week), but the time it takes to access files, and the total lack of any sort of filesystem cache.</p> <p>I've done a bit of refactoring to support S3 -- scrubbing all absolute paths out of the codebase; implementing retries and workarounds for uncoöperative Boto requests -- my impetus for building out the signal queue, in fact, was to mitigate the S3 file-access overhead (the details of which I will spare you). The point is that if I'm supporting S3, I'd like to support it in the most awesome/productive way possible.</p> <p>Naturally, I don't want to screw things up or complicate things further by putting in the kind of caching layer that will take babysitting -- I'm after a straightforward (and preferably uncomplicated) way to speed up the file-object operations I'm performing on S3 images. For example, if I read from a given file object several times within a reasonable timeframe, it'd be great if the subsequent reads were cached enough so that each read didn't have to fetch the file anew from S3.</p> <p>Does anyone have a module recommendation, a sample implementation, a configuration tactic, or any combo of the above, with which I might address my S3 file-op woes?</p>
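To illustrate the kind of caching the question is asking about, here is a minimal sketch of a TTL read-through cache. The `CachedReader` class and its `fetch` callable are hypothetical names, not part of django-storages; in a real app, `fetch` would be something like `lambda name: default_storage.open(name).read()` against the S3 backend.

```python
import time
from typing import Callable, Dict, Tuple

class CachedReader:
    """Keep file contents in memory for a short TTL so that repeated
    reads of the same key within the window skip the slow backend
    fetch (e.g. an S3 round-trip). Hypothetical helper for illustration."""

    def __init__(self, fetch: Callable[[str], bytes], ttl: float = 60.0):
        self._fetch = fetch    # slow backend read, e.g. an S3 GET
        self._ttl = ttl        # seconds an entry stays fresh
        self._cache: Dict[str, Tuple[float, bytes]] = {}

    def read(self, name: str) -> bytes:
        now = time.monotonic()
        entry = self._cache.get(name)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]            # cache hit: no backend call
        data = self._fetch(name)       # miss or stale: fetch anew
        self._cache[name] = (now, data)
        return data
```

An in-process dict like this only helps within one worker; for a cache shared across processes you would store the bytes via Django's cache framework (`django.core.cache`) or wrap the storage backend itself, but the read-through-with-TTL shape stays the same.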