Discussion:
[compress] How to implement zip-bomb protection with Java 10
Dominik Stadler
2018-03-31 08:26:52 UTC
Permalink
Hi,

Apache POI opens zip files on a regular basis because Microsoft Excel/Word/... files are zip files in their newer formats. To prevent certain denial-of-service attacks, we have added functionality when opening zip files to refuse to read entries that expand enormously and could thus overwhelm main memory: a small malicious file can explode when uncompressed into memory. We call this zip-bomb protection.

Up to Java 9 we could use a workaround via reflection to inject a counting InputStream into ZipFile/ZipEntry, detect an explosion of the expanded data, and thus prevent zip bombs.

However, in Java 10 this is no longer possible because the implementation of ZipFile was changed in a way that prevents it (a hard cast to ZipFile$ZipFileInputStream inside ZipFile).

So we are looking for a different way to count the number of extracted bytes during extraction, so that we can stop as soon as the compression ratio reaches a certain limit.

Could we do this with Commons Compress, i.e. wrap the InputStreams with some counting logic so that we can stop during extraction?

Or does anybody know of a way to do zip-bomb detection differently, without resorting to reflection?
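(For illustration: the counting idea can be sketched without reflection using only java.util.zip, by wrapping the raw stream in a counting FilterInputStream before handing it to ZipInputStream. The class names, the 100:1 ratio limit and the 64KB grace threshold below are invented for this sketch, not POI's actual values.)

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipInputStream;

// Counts the bytes pulled from the underlying (compressed) stream.
class CountingInputStream extends FilterInputStream {
    long count;
    CountingInputStream(InputStream in) { super(in); }
    @Override public int read() throws IOException {
        int b = super.read();
        if (b >= 0) count++;
        return b;
    }
    @Override public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n > 0) count += n;
        return n;
    }
}

class ZipBombCheck {
    // Hypothetical limits: flag anything expanding beyond 100:1, but only
    // after a grace amount so tiny entries are never flagged.
    static final double MAX_RATIO = 100.0;
    static final long GRACE = 64 * 1024;

    static long readAllEntriesChecked(InputStream rawZip) throws IOException {
        CountingInputStream compressed = new CountingInputStream(rawZip);
        ZipInputStream zip = new ZipInputStream(compressed);
        byte[] buf = new byte[8192];
        long uncompressed = 0;
        while (zip.getNextEntry() != null) {
            int n;
            while ((n = zip.read(buf)) > 0) {
                uncompressed += n;
                // compressed.count is approximate (ZipInputStream buffers
                // ahead), but close enough for a ratio check.
                if (uncompressed > GRACE
                        && uncompressed > MAX_RATIO * compressed.count) {
                    throw new IOException("compression ratio suspicious, possible zip bomb");
                }
            }
        }
        return uncompressed;
    }
}
```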

Thanks... Dominik.
Apache POI PMC/Committer



---------------------------------------------------------------------
To unsubscribe, e-mail: user-***@commons.apache.org
For additional commands, e-mail: user-***@commons.apache.org
Stefan Bodewig
2018-03-31 09:31:41 UTC
Permalink
Post by Dominik Stadler
Apache POI is opening zip-files on a regular basis because Microsoft
Excel/Word/... files are zip-files in their newer format. In order to
prevent some types of denial-of-service-attacks, we have added
functionality when opening Zip-files to not read files which expand a
lot and thus could be used to overwhelm the main memory by providing
small malicious file which explodes when uncompressed into memory. We
call this zip-bomb protection.
Up to Java 9 we could use some workaround via reflection to inject a
counting-InputStream into ZipFile/ZipEntry to detect an explosion in
expanded data and this way prevent zip-bombs.
This is using java.util.zip.ZipFile, not Commons Compress, right? I just
want to double check.
Post by Dominik Stadler
So we are looking for a different way to count the number of extracted
bytes while extracting to be able to stop as soon as the compression
ratio reaches a certain limit.
At least in Commons Compress, if you use ZipFile - as opposed to
ZipArchiveInputStream - then the compressed and uncompressed sizes of each
ZipArchiveEntry are known before you try to read the stream. Can't you
simply reject reading entries whose uncompressed size is too big?
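(A minimal sketch of that size check, using the JDK's java.util.zip.ZipFile so it is self-contained; Commons Compress's ZipFile/ZipArchiveEntry expose the same information via ZipArchiveEntry#getSize(). The class name and the maxEntrySize limit are made up for the sketch.)

```java
import java.io.IOException;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

class EntrySizeCheck {
    // Reject any entry whose declared uncompressed size is unknown (-1)
    // or larger than the caller's limit, before reading any entry data.
    static void checkEntries(ZipFile zip, long maxEntrySize) throws IOException {
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            ZipEntry e = entries.nextElement();
            long size = e.getSize();
            if (size == -1 || size > maxEntrySize) {
                throw new IOException("rejecting entry " + e.getName()
                        + ": declared uncompressed size " + size);
            }
        }
    }
}
```

As the rest of this thread notes, the declared sizes come from the central directory and can be forged, so this check alone is not sufficient.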

If you want to handle it on the stream level, there is a
BoundedInputStream class in Commons Compress that you could wrap around
the stream returned by ZipFile#getInputStream in order to enforce an
upper limit of bytes being read. This stream class expects to be used in
a certain context, though, and won't close the wrapped stream when close
is called - if you want to go down this route, you'll have to close the
underlying stream yourself.

Hope this helps

Stefan

kiwiwings
2018-03-31 18:05:23 UTC
Permalink
Post by Stefan Bodewig
This is using java.util.zip.ZipFile, not Commons Compress, right? I just
want to double check.
Correct
Post by Stefan Bodewig
... then the compressed and uncompressed sizes of each
ZipArchiveEntry are known before you try to read the stream. Can't you
simply reject reading entries whose uncompressed size is too big?
Are those sizes taken from the central directory?
Isn't it possible to tamper with those entries?
If so, would you rely on them?
Post by Stefan Bodewig
If you want to handle it on the stream level, there is a
BoundedInputStream class in Commons Compress that you could wrap around
the stream returned by ZipFile#getInputStream in order to enforce an
upper limit of bytes being read.
To prevent false positives with people who put an insane amount of data into
Excel sheets, I've chosen the ratio approach, i.e. comparing how many bytes of
the compressed stream have been read vs. the uncompressed size.
There's another limit check which does exactly what you've described, but
this defaults to around 2GB.
Although POI's normal processing mode has problems dealing with such big
files anyway, there's also a SAX streaming mode which can benefit from this.

I haven't dived into the compress code yet, but my hope is that the ratio
logic can also be applied there.

For reference, I've looked through the archives to find related discussions
[1], but maybe missed the private ones. Those discussions were similar to
this thread, i.e. about taking the metadata/sizes from the central
directory.

So my approach would be to try to move to commons-compress, check if the
ratio logic can be added, and if that is possible, we might also need to look
at the performance trade-off (if any?) and decide if we want a switchable
implementation ...

Andi

[1]
https://issues.apache.org/jira/browse/COMPRESS-445
https://issues.apache.org/jira/browse/COMPRESS-386
http://apache-commons.680414.n4.nabble.com/compress-Security-considerations-bomb-links-absolute-paths-tp4698822.html




Stefan Bodewig
2018-04-01 10:11:39 UTC
Permalink
Post by kiwiwings
Post by Stefan Bodewig
... then the compressed and uncompressed sizes of each
ZipArchiveEntry are known before you try to read the stream. Can't you
simply reject reading entries whose uncompressed size is too big?
Are those sizes taken from the central directory?
Isn't it possible to tamper with those entries?
If so, would you rely on them?
You are certainly correct. In that case wrapping the InputStream in a
BoundedInputStream capped at the claimed uncompressed size would
work. If the uncompressed size hasn't been tampered with, then all is well;
otherwise the stream will simply stop reading any further.
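(The capped-stream idea can be sketched in plain Java. Like the Commons Compress BoundedInputStream described above, this stand-in stops returning data at the limit and deliberately does not close the wrapped stream; the class name is invented.)

```java
import java.io.IOException;
import java.io.InputStream;

// Returns at most `limit` bytes from the wrapped stream, then reports
// end-of-stream. Deliberately does NOT close the wrapped stream,
// mirroring the behaviour described for BoundedInputStream.
class CappedInputStream extends InputStream {
    private final InputStream in;
    private long remaining;

    CappedInputStream(InputStream in, long limit) {
        this.in = in;
        this.remaining = limit;
    }

    @Override public int read() throws IOException {
        if (remaining <= 0) return -1;
        int b = in.read();
        if (b >= 0) remaining--;
        return b;
    }

    @Override public int read(byte[] buf, int off, int len) throws IOException {
        if (remaining <= 0) return -1;
        int n = in.read(buf, off, (int) Math.min(len, remaining));
        if (n > 0) remaining -= n;
        return n;
    }
}
```

Wrapping the entry's stream with the claimed uncompressed size as the limit means a forged size field only causes truncated output, never unbounded memory use.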
Post by kiwiwings
To prevent false positives with people who put an insane amount of
data into Excel sheets, I've chosen the ratio way, i.e. comparing how
many bytes of the compressed stream are read vs. the uncompressed
size. There's another limit check which does exactly what you've
explained, but this defaults to around 2GB. Although POI's normal
processing mode has problems dealing with such big files anyway,
there's also a SAX streaming mode which can benefit from this.
I haven't dived into the compress code yet, but my hope is that the ratio
logic can also be applied there.
The CompressorInputStreams provide counts of the compressed bytes read,
but at least the InputStream provided by ZipFile#getInputStream does
not. So I'm afraid there is currently no easy way to tell how many
compressed bytes have been processed at any point in time.
Post by kiwiwings
For a reference I've looked through the archives to find related discussions
[1], but maybe missed the private ones.
I'm not aware of any private discussion.

In this thread as well as in COMPRESS-445 we are only talking about ZIP,
but all the other archive formats (and compression formats)
can certainly also be used for "bombs". Compress' test resources contain
tar.gz files of about 8MB that expand to 8+GB - these are the tests
for really big tars.

What I'd really want to do is provide a solution that works for all our
formats and not just for ZIP.
Post by kiwiwings
So the discussion were similar to this thread, i.e. about taking the
metadata/sizes from the central directory.
True, and you are absolutely correct that this is short-sighted when
somebody who crafts a malicious input may just put the wrong values in
there.
Post by kiwiwings
So my approach would be to try to move to commons-compress, check if the
ratio logic can be added and if this is possible, we might also need to look
at the performance trade-off (if any?) and decide if we want a switchable
implementation ...
The first thing you'd need is information about how much of the
uncompressed data has been read so far. Given an API of

InputStream getInputStream(ZipArchiveEntry e)

there isn't much room for providing that information apart from
documenting "InputStream will be a FooInputStream" and have
FooInputStream provide the information.

An alternative may be a progress listener approach which would
frequently provide information about the number of bytes processed. This
has been discussed before (on this list a few days ago and in
COMPRESS-207).

I'm absolutely open to providing the means to implement the
protection. The dev list might be the better place. Right now I'm not
convinced it should be enabled by default, but we can discuss that once
we get there.

Stefan

Andreas Beeker
2018-04-09 00:07:08 UTC
Permalink
Just a short update on this - I've provided a patch for POI to use Commons Compress [1].
So now we can focus on how the zip-bomb handling can be provided by Commons Compress.
As you already mentioned with "InputStream will be a FooInputStream", what I have
in mind is some kind of interface which the InputStream can be cast to in order
to request further compression-ratio stats.

I think this pull mechanism is easier from a user perspective than registering
a progress handler or getting the metadata pushed by a callback.
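(A sketch of what such a pull-style interface might look like - the interface and method names here are hypothetical, not an existing Commons Compress API.)

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical statistics interface a decompressing stream could implement.
interface CompressionStatistics {
    long getCompressedCount();   // bytes consumed from the raw stream
    long getUncompressedCount(); // bytes handed out to the caller
}

// Caller side: cast if the stream supports it, then poll the ratio
// while reading - the "pull" mechanism described above.
class RatioGuard {
    static void copyChecked(InputStream in, double maxRatio) throws IOException {
        byte[] buf = new byte[8192];
        while (in.read(buf) >= 0) {
            if (in instanceof CompressionStatistics) {
                CompressionStatistics stats = (CompressionStatistics) in;
                long c = stats.getCompressedCount();
                if (c > 0 && stats.getUncompressedCount() > maxRatio * c) {
                    throw new IOException("possible zip bomb: ratio exceeded");
                }
            }
        }
    }
}
```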

Andi.

[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=62187
Stefan Bodewig
2018-04-11 14:48:48 UTC
Permalink
Post by Andreas Beeker
Just a short update on this - I've provided a patch for POI to use
commons compress [1] So now we can focus on how the zip bomb handling
can be provided by commons compress, i.e. as you already have
mentioned with "InputStream will be a FooInputStream", some kind of
interface which the InputStream can be cast to, to request further
compression ratio stats, is what I have in mind.
I think this pull mechanism is easier from a user perspective than
registering a progress handler or getting the metadata pushed by a
callback.
Sounds good to me.

If you want to discuss implementation details, we probably better move
over to the dev list.

Thanks

Stefan


Robert Paasche
2018-04-09 08:30:41 UTC
Permalink
Hi,

sorry it's a little bit late, but have you tried opening a bug report for
the hard cast at https://bugreport.java.com/ ?

Best
Rob

Robert Paasche
Senior Developer

pripares GmbH
Altheimer Eck 2
80331 München

Tel +49 (0)89 45 22 808 - 30
Fax +49 (0)89 45 22 808 - 58
Mail ***@pripares.com
Web www.pripares.com

Handelsregister: Registergericht München HRB 138701
Sitz der Gesellschaft: München
Geschäftsführer: Aßmann Christoph, Ertl Andreas


This e-mail may contain confidential and/or privileged information. If you
are not the intended recipient (or have received this e-mail in error)
please notify the sender immediately and delete this e-mail. Any
unauthorized copying, disclosure or distribution of the material in this
e-mail is strictly forbidden.