Topic: if self.isCoder(): post() #Programming Thread (Read 902533 times)

Nadaka · « **Reply #2415 on:** May 18, 2012, 12:58:01 pm »

Quote from: alway on May 18, 2012, 12:52:02 pm

You could utilize key-value pairs with hash tables.

If the list won't fit in memory, the hash table probably won't either.

Siquo · « **Reply #2416 on:** May 18, 2012, 01:01:03 pm »

Use collisions to make disk-based hash buckets?

Twiggie · « **Reply #2417 on:** May 18, 2012, 01:11:26 pm »

Quote from: alway on May 18, 2012, 12:52:02 pm

You could utilize key-value pairs with hash tables.

i said this first.

i was laughed at

edit: does the fact that the files are server logs containing visiting IPs help?
i guess it would help with bucketing/string matching, but i can't think of any other way that it's different from the generic problem.

Willfor · « **Reply #2418 on:** May 18, 2012, 02:53:35 pm »

I think I may have just figured out how to solve the inherent problems of digitally mixing music in the most stupidly obvious way possible. I keep thinking "someone must have figured this out before me!" but I keep googling it, and so far I can't find anyone else who has tried this particular method. Mostly because I think it adds quite a bit of a load onto everything. Either that, or people have run into brick walls trying to implement it.

Unfortunately, I think it will be a lot of work to try to do it myself, but damn...

Adding together Bézier curves! You'd have to render all of the mixing stems as Bézier curves, of course, but it might be far smoother than having faders [paraphrasing] controlling the bit depth of linear arrays of integers or floats.
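One thing that makes the idea at least plausible: a Bézier curve is linear in its control points, so adding two curves of the same degree gives the same result as adding their control points. A minimal sketch, assuming each stem segment is a 1-D cubic Bézier over the same time interval (the `bezier` helper and the control values are made up for illustration):

```python
from math import comb

def bezier(ctrl, t):
    """Evaluate a 1-D Bezier curve with control values `ctrl` at t in [0, 1]."""
    n = len(ctrl) - 1
    return sum(comb(n, i) * (1 - t) ** (n - i) * t ** i * c
               for i, c in enumerate(ctrl))

# Two hypothetical stems, each described by four control values.
stem_a = [0.0, 0.8, -0.3, 0.1]
stem_b = [0.2, -0.5, 0.4, 0.0]

# Mixing = adding control points pairwise.
mixed = [a + b for a, b in zip(stem_a, stem_b)]

# The mixed curve matches the sum of the two curves at every sampled t.
for t in (0.0, 0.25, 0.5, 0.75, 1.0):
    assert abs(bezier(mixed, t) - (bezier(stem_a, t) + bezier(stem_b, t))) < 1e-12
```

This only shows that the summing step is cheap; fitting real audio to curves in the first place is the part that would presumably carry the load.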

alway · « **Reply #2419 on:** May 18, 2012, 03:17:20 pm »

Quote from: Twiggie on May 18, 2012, 01:11:26 pm

Quote from: alway on May 18, 2012, 12:52:02 pm
You could utilize key-value pairs with hash tables.

i said this first.

i was laughed at

edit: does the fact that the files are server logs containing visiting IPs help?
i guess it would help with bucketing/string matching, but i can't think of any other way that it's different from the generic problem.

So another question: Are you sure it wasn't the right answer?

I've heard of people in our major having technical interviews in which, after they come to the correct solution, the interviewer asserts that it is not the correct solution in order to determine whether they will abandon the rational choice just because someone seemingly more knowledgeable disagrees.

... Though that said, if the files are in fact IP logs, we can actually assume there is a pre-existing sort done on the data which can possibly be used to our advantage. A log of IPs connecting to a server will be sorted by the time at which they connected, due to the temporal nature of a log file. Depending on the sort of server, that may be very useful data.
If it's a server in which the client machine checks in at precisely the same time each day, it becomes trivial to find matching connections.
Even if we assume a client machine is connecting only when a user tells it to (like a web server), it gives information which allows us to guess at what time a user is more likely to connect. Go to your (or someone else's) bay12 profile page and click 'Show Stats;' users would tend to have fairly regular patterns of activity. If I were a normal human, the probability of me posting at 4AM given a post at 4PM is much less than the probability of posting at 5PM given a post at 4PM. Though utilizing this temporal correlation becomes much more difficult and less useful the less regular the client connection is known to be.

Edit: Furthermore, IPs contain inherent data about physical location based on which blocks are allocated to which region. Users may also be much more likely to access a server at certain times of day relative to local time; and local time can be roughly guessed at based on which RIR the IP is from. There may even be more precise methods of figuring out location; though I'm not personally familiar with them.

The problem may also have been a case where all the information wasn't supplied up front, and they wanted you to inquire further about details like the purpose of the server in order to prevent the answer from being too obvious from the start. I must say though, this problem is quite a fun little puzzle.

Twiggie · « **Reply #2420 on:** May 18, 2012, 03:34:15 pm »

the files are too big to fit into memory. you're not going to get them to fit by using more memory...

interesting thoughts about using the locations. though ofc it's not going to give you a perfect answer.

actually. could you implement a hashmap in secondary memory? hmmmmmmmmm

Sowelu · « **Reply #2421 on:** May 18, 2012, 03:42:58 pm »

Danger with a bucketed approach: You don't know, ahead of time, what distribution the IDs have. If they go from 0 to 2 million, and you say "I'll make four buckets in intervals of 500k", 99% of them might land in the first bucket.

So your buckets need to be small enough that they can always fit in half your memory, even if completely full. Still, it's my preferred method by far. Two lists of buckets in secondary memory, one per input file. Append to the end of the appropriate bucket.

Can an ID appear more than once in the file? If so, things get ugly because a bucket still might not fit in primary memory until it's been made unique. You don't want to test whether a value is already present in secondary memory, and you really don't want to sort in secondary memory. Hm.
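A rough sketch of the bucketed approach, assuming the task is to find IDs that appear in both files, that each file holds one ID per line, and that buckets are chosen by hashing the ID rather than by value range (the function names, file layout and `n_buckets` default are all made up for illustration):

```python
import os
import tempfile
import zlib

def partition(path, out_dir, prefix, n_buckets):
    """Stream one input file, appending each ID to the bucket file picked by its hash."""
    outs = [open(os.path.join(out_dir, f"{prefix}_{b}.txt"), "w")
            for b in range(n_buckets)]
    try:
        with open(path) as src:
            for line in src:
                key = line.strip()
                if key:
                    outs[zlib.crc32(key.encode()) % n_buckets].write(key + "\n")
    finally:
        for f in outs:
            f.close()

def common_ids(path_a, path_b, n_buckets=64):
    """Yield each ID present in both files once, one bucket pair at a time."""
    with tempfile.TemporaryDirectory() as tmp:
        partition(path_a, tmp, "a", n_buckets)
        partition(path_b, tmp, "b", n_buckets)
        for b in range(n_buckets):
            # Stream bucket A into a set: repeats collapse as they are read.
            with open(os.path.join(tmp, f"a_{b}.txt")) as fa:
                seen = {line.strip() for line in fa}
            # Stream bucket B against that set without loading it.
            with open(os.path.join(tmp, f"b_{b}.txt")) as fb:
                for line in fb:
                    key = line.strip()
                    if key in seen:
                        seen.discard(key)  # report each match only once
                        yield key
```

Because bucket A is streamed into a set, only its unique IDs need to fit in memory, not the whole bucket file, and bucket B is never loaded at all. Hashing the ID spreads distinct IDs roughly evenly across buckets, which sidesteps the interval problem above.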

Twiggie · « **Reply #2422 on:** May 18, 2012, 03:54:30 pm »

They can appear more than once, I think, but more than one appearance in the same file can be ignored.

I guess that'll add another layer of complexity, having to check if it's already in the bucket.

And yes, it did just occur to me that the buckets everyone is talking about are essentially a chained hashmap in secondary memory with a low number of slots. :p

also @alway, yeah they were expressing doubt at my gut feelings about complexities, but when i showed they were right they agreed.

although i'm not sure they agreed with my assessment that using the shunting yard algorithm to convert between infix and postfix was O(n)...
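For what it's worth, the O(n) claim holds up: every token is pushed onto the operator stack at most once and popped at most once, so the total work is linear in the number of tokens. A minimal sketch, assuming pre-split tokens, well-formed input, and only the four left-associative binary operators (the `to_postfix` name is made up):

```python
def to_postfix(tokens):
    """Convert a list of infix tokens to postfix with the shunting-yard algorithm."""
    prec = {"+": 1, "-": 1, "*": 2, "/": 2}
    out, ops = [], []
    for tok in tokens:
        if tok in prec:
            # Pop operators of higher or equal precedence (all left-associative here).
            while ops and ops[-1] != "(" and prec[ops[-1]] >= prec[tok]:
                out.append(ops.pop())
            ops.append(tok)
        elif tok == "(":
            ops.append(tok)
        elif tok == ")":
            # Pop back to the matching "(" and discard it.
            while ops[-1] != "(":
                out.append(ops.pop())
            ops.pop()
        else:
            out.append(tok)  # operand goes straight to the output
    while ops:
        out.append(ops.pop())
    return out

print(" ".join(to_postfix("3 + 4 * ( 2 - 1 )".split())))  # prints "3 4 2 1 - * +"
```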

kaenneth · « **Reply #2423 on:** May 18, 2012, 06:48:20 pm »

For my project of looking for files with duplicate content on my 10TB array I used a form of hash buckets. Specifically, I hashed the first 1KB of each file down to a single byte (stuffed as a char into a string) to make 256 starting buckets; then I only had to do a detailed comparison of the items that were in the same bucket.

The main refinement to that was a rule that if a bucket had more than 2 items, the next 2KB of each item's contents would get hashed into another byte and appended; if still more than 4, the next 4KB... more than 8, then the next 8KB... but that's not gonna apply to the user ID case given.
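A simplified sketch of the first stage only, assuming CRC32 truncated to one byte as the hash and `filecmp` for the detailed comparison (neither is confirmed as what was actually used, and the progressive 2KB/4KB/8KB refinement is left out):

```python
import filecmp
import zlib
from collections import defaultdict

def find_duplicates(paths):
    """Return groups of paths whose contents are byte-for-byte identical."""
    buckets = defaultdict(list)
    for p in paths:
        with open(p, "rb") as f:
            head = f.read(1024)                      # first 1KB only
        buckets[zlib.crc32(head) & 0xFF].append(p)   # one byte: at most 256 buckets

    groups = []
    for members in buckets.values():
        remaining = list(members)
        while remaining:
            first = remaining.pop(0)
            same, rest = [first], []
            for other in remaining:
                # Full comparison, but only between files in the same bucket.
                if filecmp.cmp(first, other, shallow=False):
                    same.append(other)
                else:
                    rest.append(other)
            remaining = rest
            if len(same) > 1:
                groups.append(same)
    return groups
```

The point of the bucketing is that the expensive full comparison only runs between files that already agree on the cheap one-byte hash.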

Blargityblarg · « **Reply #2424 on:** May 19, 2012, 09:37:35 am »

So I've got myself an assignment in Java to hand in, and it ain't playing nice; I had to code a bunch of classes and a few methods in a partly-prewritten class to match a supplied output trace. The issue is my 'populateWorld' method, which is meant to instantiate a bunch of City objects and a bunch of Country objects, jam the cities into the countries and the countries into a World object that was given as a parameter to the method. It's instantiating everything just fine, but with the putting it all together, it appears that the addCountry method I'm using to put the countries into the World isn't correctly copying over the cities within the countries; every Country object inside the World object is claiming not to have any cities inside it, whereas checking the countries before they're put into the world is fine.

E: To clarify, each World object has an array of Countries within it, and each Country has an array of Cities within it.

Nadaka · « **Reply #2425 on:** May 19, 2012, 10:55:15 am »

Using buckets on IPs in the real world... to get a more uniform distribution across buckets, your key should most likely NOT be the highest-order part of the IP.
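A quick way to see the effect, using a made-up sample in which most visitors come from a handful of networks (the addresses, the three key functions and the bucket count of 16 are all invented for illustration):

```python
from collections import Counter
import zlib

# Hypothetical sample: 1000 hosts from each of three networks.
ips = [f"{net}.{host // 256}.{host % 256}"
       for net in ("10.0", "192.168", "172.16")
       for host in range(1000)]

def by_first_octet(ip, n):
    return int(ip.split(".")[0]) % n

def by_last_octet(ip, n):
    return int(ip.rsplit(".", 1)[1]) % n

def by_hash(ip, n):
    return zlib.crc32(ip.encode()) % n

for key in (by_first_octet, by_last_octet, by_hash):
    sizes = Counter(key(ip, 16) for ip in ips)
    print(key.__name__, "buckets used:", len(sizes), "largest:", max(sizes.values()))
```

With this sample the first-octet key can only ever fill three buckets, one per network, while the low-order octet spreads the same addresses over all sixteen.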

Aqizzar · « **Reply #2426 on:** May 19, 2012, 01:06:35 pm »

I love realizing that I've learned things. I also love object oriented programming. I can't even imagine how to program without it. You'd have to, like, instantiate everything...

I just wish I had more time to do this for myself. Now that I program for a living, by the time I get home I'm sick of seeing Visual Studio staring back at me. Dammit, I've been poking at this roguelike for over four months now, and I'm just now getting around to interacting with NPCs. Not to mention my right hand hurts just putting it on the mouse.

Quote from: Blargityblarg on May 19, 2012, 09:37:35 am

It's instantiating everything just fine, but with the putting it all together, it appears that the addCountry method I'm using to put the countries into the World isn't correctly copying over the cities within the countries; every Country object inside the World object is claiming not to have any cities inside it, whereas checking the countries before they're put into the world is fine.

I had issues like this in Java. Are you sure you're not copying new instances of the same Countries into the World? If you think you can get away with it, throw some code up here so we can look at it.
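Since the actual code hasn't been posted, this is only a guess at the kind of bug I mean, sketched in Python for brevity rather than Java: an add method that builds a fresh Country from the name instead of storing the object it was handed. The class and method names here are hypothetical.

```python
class Country:
    def __init__(self, name):
        self.name = name
        self.cities = []

class World:
    def __init__(self):
        self.countries = []

    def add_country_buggy(self, country):
        # Builds a brand-new Country: the name survives, the cities do not.
        self.countries.append(Country(country.name))

    def add_country(self, country):
        # Stores the Country that was passed in, cities included.
        self.countries.append(country)

world = World()
france = Country("France")
france.cities.append("Paris")

world.add_country_buggy(france)
world.add_country(france)
print(world.countries[0].cities)  # prints []
print(world.countries[1].cities)  # prints ['Paris']
```

In Java the same thing happens if addCountry does `new Country(...)` or copies fields one by one and forgets the cities array.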

Skyrunner · « **Reply #2427 on:** May 19, 2012, 03:01:25 pm »

What's the difference between OOP and non-OOP? How can I tell if I'm OOPing or not? o_o

In other news, I think I have to rewrite my simulator because all the bugs are confusing and hard to understand. ): For example, I increment a certain counter anywhere from zero to thirteen times, yet somehow it ends up at 31~60. T_T

And I always copy my array of genes in sequential order. That means since I generate it in ascending order, it should stay sequential after moving. But again something happens and the list is randomly shuffled...

I'm so confused D:

GalenEvil · « **Reply #2428 on:** May 19, 2012, 07:51:34 pm »

Yeah, I had some issues with Java and containers once also, but I managed to work through it okay. I'd like to see your code also, BlargityBlarg, if you can get away with it. If your assignment keeps you from getting full source code from outside sources, we could probably put up a pseudocode answer for you.

@Skyrunner: the difference that I know of between OOP and non-OOP is how modular your code is. For example, in non-OOP code, instead of having a worker class that does add/remove/arithmetic operations on the things passed to it, you have all of that code within the main program, inlined alongside the worker methods. OOP is really good for things that need code ambiguity or code reusability in multiple areas of the program, or that need a lot of objects of the same type, each with different information associated with it. To tell if you are OOPing it up, look at how modular your code is.
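A tiny side-by-side may help. This is just an illustrative sketch with made-up names, not anything from your simulator: the same bookkeeping written first as plain data plus free functions, then as a class that bundles the data with the operations on it.

```python
# Non-OOP style: plain data, and separate functions that operate on it.
def make_account(owner):
    return {"owner": owner, "balance": 0}

def deposit(account, amount):
    account["balance"] += amount

# OOP style: the data and the operations on it live together in one type.
class Account:
    def __init__(self, owner):
        self.owner = owner
        self.balance = 0

    def deposit(self, amount):
        self.balance += amount

plain = make_account("sky")
deposit(plain, 10)

obj = Account("sky")
obj.deposit(10)
```

If most of your code looks like the second half, with many objects of the same type each carrying its own state, you're OOPing.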

What is your simulator doing Sky? Or what is it supposed to do?

Skyrunner · « **Reply #2429 on:** May 19, 2012, 08:00:21 pm »

What my simulator is supposed to do:

(When told to simulate one generation)

Tell 2 foxes to hunt 15 times each for a total of 30 times.
Juggle the various gene-storing arrays.
Have them mostly be in order, because I didn't tell them to shuffle them.
On average, there should be 3 or so rabbits killed.
The gene list should be missing exactly the number of rabbits that are dead.

What it is doing:

The 2 foxes somehow hunt more than 30 times and somehow kill 31~40 rabbits
Shuffle the gene arrays
But somehow the gene list has increased by three.

...T_T

Bay 12 Games Forum
