You could utilize key-value pairs with hash tables.
I said this first.
I was laughed at.
Edit: Does the fact that the files are logs from a server containing visiting IPs help?
I guess it would help with bucketing/string matching, but I can't think of any other way that it's different from the generic problem.
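For what it's worth, here is a rough sketch of the hash table idea. The exact problem isn't spelled out here, so this assumes the task is to find IPs that show up in both of two log files, with one IP per line; both of those are guesses, and the function name `common_ips` is made up for illustration.

```python
def common_ips(lines_a, lines_b):
    """Return {ip: number of times it appears in lines_b} for IPs also in lines_a.

    Assumption: each line holds exactly one IP address as plain text.
    """
    # Hash set of every IP seen in the first log: O(1) average lookup.
    seen = set()
    for line in lines_a:
        ip = line.strip()
        if ip:
            seen.add(ip)

    # Key-value pairs: IP -> how many times it appears in the second log.
    counts = {}
    for line in lines_b:
        ip = line.strip()
        if ip in seen:
            counts[ip] = counts.get(ip, 0) + 1
    return counts


if __name__ == "__main__":
    log_a = ["10.0.0.1\n", "10.0.0.2\n"]
    log_b = ["10.0.0.2\n", "10.0.0.3\n", "10.0.0.2\n"]
    print(common_ips(log_a, log_b))
```

Each file is read once, so the work is roughly linear in the total number of lines, at the cost of holding the first file's distinct IPs in memory.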
So another question: Are you sure it wasn't the right answer?
I've heard of people in our major having technical interviews in which, after they come to the correct solution, the interviewer asserts that it is not the correct solution in order to determine whether they will abandon the rational choice just because someone seemingly more knowledgeable disagrees.
... Though that said, if the files are in fact IP logs, we can assume the data is already sorted, which can possibly be used to our advantage. A log of IPs connecting to a server will be sorted by connection time, due to the temporal nature of a log file. Depending on the sort of server, that may be very useful data.
If it's a server in which the client machine checks in at precisely the same time each day, it becomes trivial to find matching connections.
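As a rough sketch of that "same time each day" case: assuming each day's log has already been parsed into a list of `(seconds_since_midnight, ip)` tuples in time order (the format, the function name `match_by_time`, and the `tolerance` parameter are all hypothetical), a single merge-style pass over two days pairs up the connections that happened at the same time of day.

```python
def match_by_time(day1, day2, tolerance=0):
    """Pair entries from two time-sorted logs whose time of day agrees.

    Assumptions: each log is a list of (seconds_since_midnight, ip) tuples,
    already sorted by time, and clients check in at the same time each day
    (within `tolerance` seconds).
    """
    matches = []
    i = j = 0
    while i < len(day1) and j < len(day2):
        t1, ip1 = day1[i]
        t2, ip2 = day2[j]
        if abs(t1 - t2) <= tolerance:
            # Same time of day: treat these as the same client's check-in.
            matches.append((ip1, ip2))
            i += 1
            j += 1
        elif t1 < t2:
            i += 1  # day1 entry has no partner at this time; skip it
        else:
            j += 1  # day2 entry has no partner at this time; skip it
    return matches


if __name__ == "__main__":
    monday = [(10, "10.0.0.1"), (20, "10.0.0.2"), (30, "10.0.0.3")]
    tuesday = [(10, "10.0.0.1"), (25, "10.0.0.9"), (30, "10.0.0.3")]
    print(match_by_time(monday, tuesday))
```

Because both logs are already in time order, no extra sort or hash table is needed. This pairs entries by time alone, so several clients checking in during the same second would need smarter handling than this sketch has.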
Even if we assume a client machine connects only when a user tells it to (as with a web server), the log still gives information that lets us guess at what times a user is more likely to connect. Go to your (or someone else's) bay12 profile page and click 'Show Stats'; users tend to have fairly regular patterns of activity. If I were a normal human, the probability of me posting at 4AM given a post at 4PM would be much lower than the probability of me posting at 5PM given a post at 4PM. That said, exploiting this temporal correlation becomes much more difficult, and less useful, the less regular the client connections are known to be.
Edit: Furthermore, IPs contain inherent data about physical location, based on which blocks are allocated to which region. Users may also be much more likely to access a server at certain times of day relative to local time, and local time can be roughly guessed at based on which RIR the IP is from. There may even be more precise methods of figuring out location, though I'm not personally familiar with them.
The problem may also have been a case where all the information wasn't supplied up front, and they wanted you to inquire further about details like the purpose of the server in order to prevent the answer from being too obvious from the start. I must say though, this problem is quite a fun little puzzle.