Postmortem

Exploding Image

During my last summer internship, I was assigned a bug-fixing job. It seemed like no big deal at first: just fix a timeout while uploading images. But when I raised the timeout to 1 minute, it still timed out. Raising it to 5 minutes didn't fix it either. Even with a 30-minute timeout, no improvement. Then I asked the ops team to check the debug log. And, kaboom!! We got a sawtooth memory pattern and some RuntimeErrorException leftovers.

Sawtooth footprint

Surely, something was wrong. I tried to figure out how the image preprocessing worked. We used imgscalr back then; their site claimed it was (up until this post was made) the fastest Java image-scaling library ever. I ran a few tests against large images, and even a corrupted one. And then I realized something bad. The only pre-processing check we did was on the file size: if the file didn't exceed a few MB, we went ahead and cropped and rescaled. But what if some huge image with only monotonic colors showed up, like this?

It's a trap!

If you can already guess what happened here, you may want to skip the part below.

Image compression

Let's review a bit about our sight first. We see our surroundings by receiving light emitted and/or reflected by the objects around us. Our eye lenses focus that light onto the retina, which contains millions of light-receptor nerves: the cone and rod cells. Every discrete signal arriving at the cone and rod cells is merged and perceived by our brain as a single, continuous image. Thanks to this, we can actually see a single line by connecting only a few dots.

And then the monitor was invented. Based on the fact above, we started building discrete grids of lights with certain colors to simulate a sight. Every square in the grid (later called a pixel) contains a combination of 3 or 4 color channels (sRGB or CMYK). Now we can emit enough light to be perceived as an image.

We could just save this raw grid as, for example, a 2D array. But this representation doesn't suit storage at all. If we want to save, say, a 1024 × 720 image, we need about 1024 × 720 × 4 = 2,949,120 bytes, or about 2.8 MB. Back then, when the average disk capacity was only about 1 GB, it could hold only around 360 images at that resolution. Way too little! So how come we could store videos with thousands of frames?
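The arithmetic above, spelled out in code (a quick sketch; the 4 bytes per pixel assumes an RGBA layout):

```java
public class RawSize {
    public static void main(String[] args) {
        long bytes = 1024L * 720 * 4;          // width * height * bytes per pixel (RGBA)
        double mb = bytes / (1024.0 * 1024.0); // 2,949,120 bytes is about 2.8 MB
        long perGB = (long) (1024 / mb);       // how many such images fit on a 1 GB disk
        System.out.println(bytes + " bytes, " + mb + " MB, ~" + perGB + " images per GB");
    }
}
```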

That's where image compression plays a role. Image compression helps us eliminate the information redundancy between pixels. Various algorithms have been developed to compress an image, and what we usually call an image format is, most of the time, tied to a certain compression method. For example, the JFIF/JPEG format performs lossy compression and is widely known for how tiny its output is.

Workaround

After seeing how images are saved in a compressed format, you may have guessed what that image did to the preprocessor. Yup: once those nearly uniform pics are uncompressed, they can take a lot (and I really mean a lot) of space. Unless your Java heap is set to several gigabytes, and your EC2 instance can afford that much memory too, your system will definitely crash.
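To see how extreme the gap can be, here is a small demo using only the standard javax.imageio API (a sketch: the helper name `sizes` and the 2000-pixel side are my own choices, and PNG stands in for whatever compressed format the upload used). It encodes a uniform white image and compares the on-disk size with the decoded in-heap footprint:

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

public class CompressionDemo {
    /** Returns {compressedBytes, decodedBytes} for a uniform white side x side image. */
    static long[] sizes(int side) throws IOException {
        BufferedImage img = new BufferedImage(side, side, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = img.createGraphics();
        g.setColor(Color.WHITE);
        g.fillRect(0, 0, side, side);
        g.dispose();

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(img, "png", out); // uniform pixels compress extremely well

        long decoded = (long) side * side * 4; // TYPE_INT_RGB keeps one 4-byte int per pixel
        return new long[] { out.size(), decoded };
    }

    public static void main(String[] args) throws IOException {
        long[] s = sizes(2000);
        System.out.println("on disk: " + s[0] + " bytes, decoded in heap: " + s[1] + " bytes");
    }
}
```

On my runs the on-disk size is a few kilobytes while the decoded grid needs 16 MB of heap; a decompression bomb is exactly this ratio, weaponized.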

One simple trick to get rid of this is to read the width and height metadata separately, before decoding anything. Once we are really sure the pics don't exceed our limit, we load the content using whichever lib suits. It should be enough for now. But, you know, what really drives us to keep moving forward are problems. I'm looking forward to the next one.
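A minimal sketch of that check in plain javax.imageio (the class, the method name, and the 25-megapixel limit here are hypothetical, not our production code). `ImageReader.getWidth`/`getHeight` read only the header, so no pixel data ever touches the heap:

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.Iterator;
import javax.imageio.ImageIO;
import javax.imageio.ImageReader;
import javax.imageio.stream.ImageInputStream;

public class DimensionGuard {
    // Hypothetical limit: 25 MP is ~100 MB once decoded as 4-byte RGBA pixels.
    static final long MAX_PIXELS = 25_000_000L;

    /** Checks the declared dimensions from the image header without decoding pixels. */
    static boolean withinLimit(InputStream in) throws IOException {
        try (ImageInputStream iis = ImageIO.createImageInputStream(in)) {
            if (iis == null) {
                return false; // could not open the stream: reject
            }
            Iterator<ImageReader> readers = ImageIO.getImageReaders(iis);
            if (!readers.hasNext()) {
                return false; // unknown or corrupt format: reject it outright
            }
            ImageReader reader = readers.next();
            try {
                reader.setInput(iis, true);
                long pixels = (long) reader.getWidth(0) * reader.getHeight(0);
                return pixels <= MAX_PIXELS;
            } finally {
                reader.dispose();
            }
        }
    }
}
```

Note that the check consumes the stream, so hand a fresh (or reset) stream to imgscalr, or whatever decoder, for the actual crop and rescale afterwards.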