Categories
Debugging Java

Breaking JarURLConnection

Consider the following:

URL url = new URL("jar:file:test.jar!file.txt");
String one = readStringFromUrl(url);
String two = readStringFromUrl(url);
assertThat(one, equalTo(two));

This should totally be a green test, right? And if you’re using a current Java version (like 21, or 17, hell, even 11 will do), it is a green test.

With Java 8 and a special JAR file, however, it is not… which I found out going the other way, i.e. with a test that failed with Java 11 when it was fine with Java 8.

This test created a JAR file with two files in it; both had the same name but different content. This may or may not be technically valid but it is possible to do that, by convincing Java’s JarFile (using evil reflection magic) that it has never seen the name of the file after writing it once. This specially-crafted JAR file was then handed into a ClassLoader, which was in turn used with a ServiceLoader. With Java 8, it then returned two objects, of two different classes, as expected.

(In hindsight, that expectation was… weird, to say the least, as I knew that the ClassLoader method used by the ServiceLoader returned URL objects, and I knew that the URLs it returned would be identical; why I thought they could ever yield different data, I’m not entirely sure.)

With Java 11, two objects were returned but they were both of the same class. At first I suspected that the JAR was somehow written incorrectly but could quickly verify that it was indeed very much written as intended, just like before.

The next step was to dive right into the ServiceLoader. It uses a lot of different iterators to create the iterator it returns but I finally managed to find the place where the URLs returned by the class loader were being opened, read, and parsed. The logic here changed a bit between Java 8 and Java 11 but I even after running it dozens of times in the debugger and staring at it for prolonged amounts of time I did not achieve any more clarity.

Okay, so, what about the code that reads data from the URLs? That job is delegated all the way down to a JarURLConnection (of which there are two!), and lo and behold, apparently it’s using a cache! A comparison between Java 8 and Java 11 showed that this particular code didn’t really change, though, so that cannot be the reason for the observed differences in behaviour, either.

So I dug deeper into how the URL connections were actually made, and here I finally struck gold: The implementation of ZipFile.getEntry() changed drastically between Java 8 and Java 11, from a native implementation in Java 8 to a Java-based solution. The latter uses a hash table based on the hash of an entry’s name so requesting two entries with the same name (as I did in the test) would indeed return the same entry. I am guessing that Java 8’s native implementation, when asked to locate an entry, actually locates the next entry with the given name, i.e. it did not always start at the beginning of the ZIP file’s central directory. I can’t be bothered to locate the native source code to confirm that, though.