Topic: if self.isCoder(): post() #Programming Thread (Read 903734 times)

MagmaMcFry · « **Reply #6840 on:** January 05, 2015, 06:20:38 pm »

Alternative approach: Instead of keeping a filehandle open, you could store a short string buffer per file and refill that with the next part of your file every time it runs out.
Either you have lots of files open at once, or you need to open and close every file approximately Lots of times. You decide which is better. It probably depends on the numbers in your problem.

A variation of my approaches would be to run multiple iterations of merging just N files at once and writing the intermediate files to disk temporarily. It'd be much slower, but it would guarantee O(1) CPU memory usage. Again, you decide. I'm not familiar with those performance specifics.

da_nang · « **Reply #6841 on:** January 05, 2015, 06:54:59 pm »

If you have access to a self-balancing binary search tree class (or just a regular BST if you feel lucky), you should be able to fill a tree with the integers from all the files by parsing each file individually (noting that a tree has unique elements) and then simply traverse the tree. Excluding file operations, it should be O(n log n).

ed boy · « **Reply #6842 on:** January 05, 2015, 07:07:43 pm »

Quote from: da_nang on January 05, 2015, 06:54:59 pm

If you have access to a self-balancing binary search tree class (or just a regular BST if you feel lucky), you should be able to fill a tree with the integers from all the files by parsing each file individually (noting that a tree has unique elements) and then simply traverse the tree. Excluding file operations, it should be O(n log n).

how much memory would a binary search tree take compared to loading the values individually?

da_nang · « **Reply #6843 on:** January 05, 2015, 07:32:42 pm »

Quote from: ed boy on January 05, 2015, 07:07:43 pm

Quote from: da_nang on January 05, 2015, 06:54:59 pm
If you have access to a self-balancing binary search tree class (or just a regular BST if you feel lucky), you should be able to fill a tree with the integers from all the files by parsing each file individually (noting that a tree has unique elements) and then simply traverse the tree. Excluding file operations, it should be O(n log n).
how much memory would a binary search tree take compared to loading the values individually?

The amount of memory used by a BST is proportional to the number of unique integers in addition to implementation dependent overhead.
If you're worried about being unable to load all of these into memory then you can exploit the subtrees. Implement each tree with a max size of 2ⁿ⁺¹-1 integers (for natural number n) and wrap the tree with file pointers. One pointer for the root and at most 2ⁿ⁺¹ pointers for the leaves. Each file pointer either points to null or another tree file.

Each tree file will thus require size(int)*(2ⁿ⁺¹-1) + size(*File)*(2ⁿ⁺¹+1) plus overhead in memory. There will also be operational overhead due to having to open and close files.

I'm not so sure if there are any standard libraries implementing such trees, however.

alway · « **Reply #6844 on:** January 07, 2015, 02:43:30 am »

Quote from: MagmaMcFry on January 05, 2015, 06:20:38 pm

Alternative approach: Instead of keeping a filehandle open, you could store a short string buffer per file and refill that with the next part of your file every time it runs out.
Either you have lots of files open at once, or you need to open and close every file approximately Lots of times. You decide which is better. It probably depends on the numbers in your problem.

A variation of my approaches would be to run multiple iterations of merging just N files at once and writing the intermediate files to disk temporarily. It'd be much slower, but it would guarantee O(1) CPU memory usage. Again, you decide. I'm not familiar with those performance specifics.

In addition to aforementioned stuff with file handles, there may be a hard limit to the number of files you are able to keep open at one time. I'm not sure exactly what your limit will be, but a google search suggests it is dependent on your environment and such, but is probably around the 512-8192 range. So be sure to test that before embarking down any particular route that requires lots of them.

As for the costs of accessing large numbers of files and such scattered around, keep in mind that defragmenting a hard drive to improve speed is a thing. If you are using an SSD, I don't believe it will be as much of an issue, but if you are using an actual spinning platter... Well, it's a bloody long time. https://en.wikipedia.org/wiki/Hard_disk_drive#Time_to_access_data Several milliseconds per file, according to that. Overall, I recommend going with McFry's suggestion of merging it all to a single source rather than opening and closing files or such. Even that first access of thousands of files, you're probably talking several seconds of runtime.

miauw62 · « **Reply #6845 on:** January 07, 2015, 09:52:36 am »

I've done a tutorial on C++, but I want to actually put this into practice a bit to aid in learning.
Somebody in IRC suggested coding pong, but I actually have no idea how to start on this. I've never done my own rendering or anything. I know there are libraries for these things, but I dont know which ones I'd best use or how to use them.

Reelya · « **Reply #6846 on:** January 07, 2015, 10:03:24 am »

Start with curses (ascii), then move onto graphics packages. Tetris is a good first project for ascii

miauw62 · « **Reply #6847 on:** January 07, 2015, 10:54:09 am »

Is using curses on Windows a good idea, though?

Reelya · « **Reply #6848 on:** January 07, 2015, 10:57:44 am »

Sure, use PDcurses, and make sure you start with a completely blank project (not the windows console project) if you used VC++

Get the pdcdllw.zip file, which is the easiest way to get started. Just include the .h file, link the lib and put the dll in root folder for the project.

http://sourceforge.net/projects/pdcurses/files/pdcurses/3.4/

The tutorials you can find are all for ncurses, but I found that everything they mention worked perfectly for pdcurses (which is the open source windows port of the project).

Putnam · « **Reply #6849 on:** January 07, 2015, 05:03:35 pm »

Quote from: miauw62 on January 07, 2015, 10:54:09 am

Is using curses on Windows a good idea, though?

yes

Rose · « **Reply #6850 on:** January 08, 2015, 03:56:32 am »

I have a class that acts as a sort of fuzzy lookup table for DF materials, where I can apply a color to, for example, all leaves, by selecting materials by "PLANT:*:LEAF" and applying a green color to all of them.

It does this right now by going through the entire list of all materials every time a new color is applied, and testing the name against each one. Obviously, this is very slow.

Anybody have any ideas how to optimize it?

Most obvious optimization I can think of is to detect if there's no wildcards in the input string, and in that case just directly get the exact name, but that doesn't solve the problem for the rest.

This is the class:
https://github.com/JapaMala/armok-vision/blob/master/Assets/MapGen/MaterialMatcher.cs

Reelya · « **Reply #6851 on:** January 08, 2015, 04:34:15 am »

Break up the materials into a heirarchical database. Have a root node, and branches for each main type, INORGANIC, PLANT etc. Every node in the database becomes a token. You need to pre-prepare specialized data-structures to get this much faster than it is now.

Then, do matches on tokens that have already been split up, rather than an entire mat def with wildcards. e.g. PLANT would contain a vector referencing all PLANTs, so if your top-level search matches "PLANT" you only have to look inside the PLANT vector for future matches. Each PLANT would have a vector for all materials it contains, and this would point at a vector, e.g. a "LEAF" vector, which references all things with "LEAF" as a bottom-level material type.

For "LEAF" as a concept you could have all the plants with leaves containing a reference to a LEAF vector, this would tell you which plants have leaves, one way to do this would be by searching all the plants materials. But the LEAF data structure could actually contain references to all things that have leaves. So when you get the Query PLANT:*:LEAF, you could work backwards. Scan the LEAF vector, and for each item check if the mid-string matches the direct ancestor, you can actually skip the check altogether by detecting a pure '*' wildcard in the middle, then check for each one "is it a plant?" just in case some animal has leaves (Ents maybe).

Putnam · « **Reply #6852 on:** January 08, 2015, 05:08:28 am »

Looks like C# is the de facto language for DF library-ish stuff, huh.

Rose · « **Reply #6853 on:** January 08, 2015, 05:14:44 am »

It actually isn't, it's C++.

I'm using C# because Unity uses it, and also it's so nice.

Putnam · « **Reply #6854 on:** January 08, 2015, 05:29:41 am »

I mean non-DFHack library-ish stuff, heh.

Which isn't much, to be fair.

Bay 12 Games Forum

News:

Author Topic: if self.isCoder(): post() #Programming Thread (Read 903734 times)

MagmaMcFry

Re: if self.isCoder(): post() #Programming Thread

da_nang

Re: if self.isCoder(): post() #Programming Thread

ed boy

Re: if self.isCoder(): post() #Programming Thread

da_nang

Re: if self.isCoder(): post() #Programming Thread

alway

Re: if self.isCoder(): post() #Programming Thread

miauw62

Re: if self.isCoder(): post() #Programming Thread

Reelya

Re: if self.isCoder(): post() #Programming Thread

miauw62

Re: if self.isCoder(): post() #Programming Thread

Reelya

Re: if self.isCoder(): post() #Programming Thread

Putnam

Re: if self.isCoder(): post() #Programming Thread

Rose

Re: if self.isCoder(): post() #Programming Thread

Reelya

Re: if self.isCoder(): post() #Programming Thread

Putnam

Re: if self.isCoder(): post() #Programming Thread

Rose

Re: if self.isCoder(): post() #Programming Thread

Putnam

Re: if self.isCoder(): post() #Programming Thread