Back on topic.... Something Blacken pointed out to me via PM: the more coding-trick-based challenges (e.g. "write a quine") aren't particularly helpful as far as actually progressing your skills goes. They're far from useless, and I thoroughly enjoy them, but if you're doing the challenges mainly to further yourself, you'd probably be better off tackling more generally applicable ones: algorithm design, or data structures (linked lists, trees, etc.) and how they work.
The challenges Alexhans wrote up in the original post would be a great launching point for someone fairly new to a language, since they'd teach input/output and parsing (and the high-resolution timer thing is just useful in general). Perhaps we should try to put together some increasingly complex challenges along similar lines? Seems to me a series of challenges that teaches a language, but forces the user to learn to "RTFM/STFW" rather than just copying code from a tutorial, would be a far more effective teaching device anyway. Besides, it'd be a far more constructive way to funnel the talents of the more experienced programmers (which I am most definitely not, although I'd be glad to help in whatever way I can).
Meh, just my two cents, take it or leave it, for what it's worth.
This is the best post in the span of multiple pages, and not just because I'm cited in it. (Unbunch your panties, you lot, that was a joke!)
Now for the new best post.
Algorithms and data structures, not quines and minor string manipulation, are how you learn to work
regardless of language. While I strongly advocate that newbies avoid C/C++, that's mostly because both are terribly inconsistent with themselves and with everybody else; virtually every other modern language (except maybe Perl and PHP) is about equivalent as far as I'm concerned. If you know how to write a binary search tree in Python, you can do it in Java or C# or whatever else very trivially. (You can write it in C++ without too much trouble, either.)
If you want a solid, rigorous set of exercises that will drive home most of the core essentials, here's one. If you follow this list, do yourself a favor and
don't skip any, even if you think you know them; I promise it'll make sense. I've taught this shit before, and this is a slightly simplified port of my outline from when I was teaching it. Reusing your own code (if you've already written a singly-linked list, for example) is fine. Some of these are decidedly nontrivial exercises, but they should all be relatively doable in the given order. If you want to look up others' code to see how they did it, go for it.
This is basically the first three semesters of computer science at my university, so if you can master this, you'll be in a great place if your intention is to go to college for computer science or for programming.
I encourage you to post your answers to the non-code questions here (yes, qwertyuiopas, it's still a "programming challenge," about 70% of programming is writing what you think and documenting what you did), but please put them in spoiler tags. (I'd appreciate it if you'd keep my numbering system and mark that outside spoiler tags, so we can all be sure of exactly what you're working on.)
If using C++,
use references instead of pointers at all times unless indicated otherwise. There is a reason for this; you'll figure it out as we go along. If your language does not support generics (C++ templates count), then you need a different language--sorry.
- Build a singly-linked list class (should be doable in two classes: list and node). This linked list should accept integers as data values. With this singly-linked list, do the following:
- Read up on Big O notation. Make sure that you're familiar with the difference between Big O runtime and Big O space complexity; both can be important.
- What is the Big O runtime of searching your list?
- What is the Big O runtime of inserting an element at the beginning of the list?
- The end of the list?
- The middle of the list?
- (really easy one) What is the Big O space complexity of your list?
- Build an iterator for this list. This iterator should not have a publicly accessible constructor; a nested class is likely the easiest way. They will be created by the list's main class and returned for user manipulation.
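For reference, here's a minimal Python sketch of the singly-linked list with a nested iterator (all class and method names here are my own invention, not part of the exercise spec):

```python
class SinglyLinkedList:
    """Minimal singly-linked list of integers with a nested iterator."""

    class _Node:
        def __init__(self, value, next=None):
            self.value = value
            self.next = next

    class _Iterator:
        # No public constructor by convention; instances are handed out
        # by the list's __iter__ below.
        def __init__(self, node):
            self._node = node

        def __next__(self):
            if self._node is None:
                raise StopIteration
            value = self._node.value
            self._node = self._node.next
            return value

    def __init__(self):
        self._head = None

    def push_front(self, value):
        # O(1): new node simply points at the old head.
        self._head = self._Node(value, self._head)

    def __iter__(self):
        return self._Iterator(self._head)
```

Insertion at the head is O(1) precisely because nothing after the head needs touching; compare that against your answers to the middle/end questions above.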
- Build a doubly-linked list, and then an iterator for that list.
- Build a vector of integers (an automatically resizing array). Keep in mind that having to resize with every add kind of sucks; you may want to add some reserve space with every resize.
- Read this explanation about amortized Big O. Why is this relevant to building a vector?
- What's the amortized Big O of insertions on this vector?
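Python's built-in list is already a vector of exactly this sort, but as an illustration of the doubling strategy (a sketch; the starting capacity of 4 is an arbitrary choice of mine):

```python
class IntVector:
    """Resizing array; capacity doubles, so appends are amortized O(1)."""

    def __init__(self):
        self._capacity = 4
        self._size = 0
        self._data = [None] * self._capacity

    def append(self, value):
        if self._size == self._capacity:
            self._grow()                    # rare O(n) copy
        self._data[self._size] = value      # common O(1) write
        self._size += 1

    def _grow(self):
        self._capacity *= 2
        new_data = [None] * self._capacity
        new_data[: self._size] = self._data
        self._data = new_data

    def __getitem__(self, index):
        if not 0 <= index < self._size:
            raise IndexError(index)
        return self._data[index]

    def __len__(self):
        return self._size
```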
- Build interfaces (abstract classes in C++, interfaces damn near everywhere else) that expose a sane API for a tree. I will stress again that this is an interface; if you have any logic code in this class you did it wrong. This tree should accept integers as data values.
- Build a binary tree that implements your interface from #3.
- Inherit from your binary tree and build a binary search tree. (This will require additional extension of the API. That's OK.) This tree won't balance itself. That's OK. We'll do that next.
- Assume you have a balanced tree. What is the Big O runtime for finding any node (a behavior that I will refer to from now on as "Big O for searching")? (Another way of putting it, to perhaps help with a little confusion, is "what is the Big O runtime to discover that a node is not in the tree?")
- What is the degenerate case of a binary search tree? What is the Big O for searching of the degenerate case?
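A bare-bones sketch of the unbalanced binary search tree in Python, to check your shape against: insert and search only, duplicates sent right; deletion and the interface layer from the earlier exercises are left to you.

```python
class BSTNode:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


class BinarySearchTree:
    """Unbalanced BST: smaller keys go left, larger (or equal) go right."""

    def __init__(self):
        self.root = None

    def insert(self, key):
        if self.root is None:
            self.root = BSTNode(key)
            return
        node = self.root
        while True:
            if key < node.key:
                if node.left is None:
                    node.left = BSTNode(key)
                    return
                node = node.left
            else:
                if node.right is None:
                    node.right = BSTNode(key)
                    return
                node = node.right

    def contains(self, key):
        node = self.root
        while node is not None:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False
```

Feed this sorted input and you'll see the degenerate case from the question above happen right in front of you.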
- Since we've identified the degenerate case for a binary search tree, let's make sure we don't have that. Inherit from your binary search tree and build an AVL tree. This will be the hardest exercise yet for people without a lot of algorithmic background. Don't worry, they just get nastier from here.
- How does this prevent the degenerate case?
- Why are the Big O for searching, insertion, and deletion all O(log n)? (I ask this because Wikipedia shows them all on the page I just linked.)
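I won't spoil the whole AVL implementation, but the core primitive is the rotation, so here's a sketch of a left rotation on a bare node type (node layout mine; the right rotation is the mirror image):

```python
class Node:
    def __init__(self, key, left=None, right=None):
        self.key = key
        self.left = left
        self.right = right


def rotate_left(x):
    """Left-rotate the subtree rooted at x; return the new subtree root.

    BST order is preserved: everything in y's old left subtree is >= x.key
    and < y.key, so it remains valid as x's new right child.
    """
    y = x.right
    x.right = y.left
    y.left = x
    return y
```

Applied at the top of a right-leaning chain, one rotation already shortens it; the AVL bookkeeping is about deciding when and where to apply these.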
- Hope you liked the AVL tree. We're now going to build the other "easy" binary search tree: the red-black tree. A poem from a professor of mine:
From black root to tips,
black nodes agree: New leaves turn
red; from red, all black.
- Why are the Big O for searching, insertion, and deletion all O(log n)?
- Go back to your binary tree (not your binary search tree) and inherit from it to build a binary heap. Min-heap or max-heap, doesn't matter to me, although you could do both as separate classes.
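If you want something concrete to sanity-check your heap logic against, here's a min-heap sketch backed by an array rather than the tree class; the array trick (children of index i live at 2i+1 and 2i+2) is the standard shortcut, and adapting the same sift logic to your class hierarchy is the actual exercise.

```python
class MinHeap:
    """Array-backed binary min-heap."""

    def __init__(self):
        self._items = []

    def push(self, value):
        self._items.append(value)
        self._sift_up(len(self._items) - 1)

    def pop_min(self):
        items = self._items
        items[0], items[-1] = items[-1], items[0]   # move min to the end
        smallest = items.pop()
        if items:
            self._sift_down(0)                      # restore heap order
        return smallest

    def _sift_up(self, i):
        items = self._items
        while i > 0:
            parent = (i - 1) // 2
            if items[i] >= items[parent]:
                break
            items[i], items[parent] = items[parent], items[i]
            i = parent

    def _sift_down(self, i):
        items = self._items
        n = len(items)
        while True:
            smallest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < n and items[child] < items[smallest]:
                    smallest = child
            if smallest == i:
                return
            items[i], items[smallest] = items[smallest], items[i]
            i = smallest
```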
- Alright, let's step back from data structures for a moment (don't worry, we'll head back there) and look at algorithms. Using only one array, implement selection sort.
- Selection sort has an O(n²) runtime. Why is it still taught?
- In what situation is a selection sort going to be useful?
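A selection sort sketch in Python to compare your own against; the docstring's point about swaps is worth chewing on alongside the questions above.

```python
def selection_sort(items):
    """In-place selection sort: O(n^2) comparisons, but at most n-1 swaps.

    That swap bound matters when writes are far more expensive than reads.
    """
    n = len(items)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):       # find the minimum of the unsorted tail
            if items[j] < items[smallest]:
                smallest = j
        items[i], items[smallest] = items[smallest], items[i]
    return items
```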
- Implement insertion sort.
- Insertion sort has an O(n²) worst case. What differentiates it from selection sort? (Hint: read about adaptive sorts.)
- What is an online sort? Why does this matter?
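And insertion sort, sketched the same way; the while-loop condition is exactly what makes it adaptive, since on nearly-sorted input the loop body barely runs.

```python
def insertion_sort(items):
    """In-place insertion sort: O(n^2) worst case, O(n) on sorted input."""
    for i in range(1, len(items)):
        value = items[i]
        j = i - 1
        while j >= 0 and items[j] > value:
            items[j + 1] = items[j]     # shift larger elements right
            j -= 1
        items[j + 1] = value            # drop value into its slot
    return items
```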
- Using the binary heap you built earlier, implement heapsort. This can be tricky, but it's important, because it helps you realize the value of proper data storage to create an easier problem (the sorting) for yourself later.
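Here's a heapsort sketch that inlines an array-backed max-heap instead of reusing a heap class; if you built your heap as a class earlier, wiring that in instead of the inline sift is the real point of the exercise.

```python
def heapsort(items):
    """In-place heapsort: build a max-heap, then repeatedly move the max
    to the end of the shrinking unsorted region."""
    n = len(items)

    def sift_down(i, limit):
        while True:
            largest = i
            for child in (2 * i + 1, 2 * i + 2):
                if child < limit and items[child] > items[largest]:
                    largest = child
            if largest == i:
                return
            items[i], items[largest] = items[largest], items[i]
            i = largest

    for i in range(n // 2 - 1, -1, -1):     # heapify, bottom-up
        sift_down(i, n)
    for end in range(n - 1, 0, -1):
        items[0], items[end] = items[end], items[0]
        sift_down(0, end)
    return items
```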
- Implement mergesort on your doubly linked list. This one should be very quick to do.
- Mergesort is a great example of a "linearithmic", or O(n log n), algorithm. Explain why it's n log n and not some other order.
- We haven't covered threads here yet, but if you already know how to use them or want to learn them now--write a parallelized mergesort. This is among the easiest sorts to parallelize; why that is the case is left to you.
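For shape reference, here's mergesort on a plain Python list; the linked-list version the exercise asks for has the same split/merge structure, it just finds the midpoint by walking the list.

```python
def mergesort(items):
    """Stable O(n log n) mergesort; returns a new sorted list."""
    if len(items) <= 1:
        return items[:]
    mid = len(items) // 2
    left = mergesort(items[:mid])
    right = mergesort(items[mid:])
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:     # <= keeps the sort stable
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])         # one side is exhausted; dump the other
    merged.extend(right[j:])
    return merged
```

Notice the two halves never see each other until the merge; that independence is why the parallel version mentioned above is so easy.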
- I'm not going to ask you to implement quicksort, as it's kind of tricky, but you can do so optionally. However, you should study the Wikipedia page for it and be able to answer the following:
- Why can quicksort sometimes be quadratic in its runtime?
- How might you place the pivot to maximize efficiency? (This is not quite a trick question, but it is close.)
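If you do take the optional route, here's a deliberately simple, non-in-place quicksort sketch; the random pivot shown is one common response to the quadratic-runtime question, though not the only one.

```python
import random


def quicksort(items):
    """Quicksort with a random pivot; randomization makes the quadratic
    worst case vanishingly unlikely on any fixed input."""
    if len(items) <= 1:
        return items[:]
    pivot = random.choice(items)
    # Three-way partition; grouping equal keys avoids quadratic blowup
    # on inputs full of duplicates.
    less = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    greater = [x for x in items if x > pivot]
    return quicksort(less) + equal + quicksort(greater)
```

The in-place, in-array partitioning version is where the real trickiness the post mentions lives.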
- Back to data structures for a bit, but we're going to change it up. If your language supports generic programming (yes, C++ templates count), extend (yes, this means "rewrite") your single and double linked lists, your vector, and your trees/heaps to accept generic types. This can be a little trickier than it appears in the case of trees. Also take into account that many data types may not have a way to compare them with others of their type--many languages have a feature that allows you to constrain generic types to those that implement certain interfaces. Do so. (Hint: trees and binary trees don't have to have comparators; binary search trees and binary heaps do.)
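Python doesn't have compile-time generics, but a bounded TypeVar plus a Protocol from the typing module expresses the same constraint ("T must be comparable") that this exercise asks for. A sketch on Python 3.9+; SortedList here is my own toy example, not one of the exercise's structures:

```python
from typing import Generic, Protocol, TypeVar


class Comparable(Protocol):
    def __lt__(self, other) -> bool: ...


# Constrain T to types that implement __lt__, analogous to a bounded
# generic parameter in Java/C# or a concept-constrained template in C++.
T = TypeVar("T", bound=Comparable)


class SortedList(Generic[T]):
    """Keeps its elements sorted on insert; works for any orderable type."""

    def __init__(self):
        self._items: list[T] = []

    def insert(self, value: T) -> None:
        i = 0
        while i < len(self._items) and self._items[i] < value:
            i += 1
        self._items.insert(i, value)

    def to_list(self) -> list[T]:
        return list(self._items)
```

The same container works unchanged for ints, strings, or anything else with an ordering, which is the payoff the exercise is after.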
- Write a set interface. Again: if there's operational logic in this interface, urdoinitrong. This interface should be generic.
- Implement a set using your interface, where the backing data store is an array.
- What are the Big O properties of your array-backed set?
- Why is this a lousy method of building a set?
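The array-backed set really is as lousy as the questions imply; a sketch to make that concrete (every operation degenerates to a linear scan):

```python
class ArraySet:
    """Set backed by a plain list: O(n) membership test, O(n) insert."""

    def __init__(self):
        self._items = []

    def add(self, value):
        # The uniqueness check is a full scan, so every add is O(n).
        if not self.contains(value):
            self._items.append(value)

    def contains(self, value):
        return value in self._items

    def __len__(self):
        return len(self._items)
```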
- Implement a set using your interface, where the backing data store is your binary tree. The reason we're doing a second set here is because it demonstrates code reusability: you can build abstractions atop of other abstractions to actually build the data structure you really want.
- Build an associative array (also called a map). This can be built with a set; as DrPizza noted, Map<K, V> <=> Set<Pair<K, V>>, and Set<K> <=> Map<K, K>. Extending your BinaryTreeSet class may not be the best method for you; it's up to you.
- Probably the hardest thing I've suggested so far: build a generic hashtable. (You will need two generic parameters.)
- Why does hashing make the Big O of searching so fast?
- Analyze your hashing function. Is it an avalanche function? If not, do some research and implement one as your hash. Remember to handle collisions.
- What are the downsides to hashtables?
- A Google engineer specifically said to me during an interview, "we use hashtables for more stuff than you'd think." Think about Google's computer infrastructure and the properties of hashtables. Why might Google use hashtables a lot, despite their downsides?
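A sketch of a chaining hashtable in Python. Note that it cheats in two ways the exercise expects you to fix: it leans on Python's built-in hash() instead of implementing an avalanche function, and it never resizes, so long-lived tables would see their chains (and lookup times) grow.

```python
class HashTable:
    """Generic hashtable using separate chaining for collisions."""

    def __init__(self, capacity=16):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key):
        # Modulo maps the hash into the fixed bucket range; collisions
        # land in the same bucket's list.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)    # overwrite existing key
                return
        bucket.append((key, value))
        self._size += 1

    def get(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def __len__(self):
        return self._size
```

With short chains, put and get touch only one bucket, which is where the near-O(1) behavior comes from; with a bad hash, every bucket is one long chain and you're back to a linked list.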
I can't stress enough, however, that
this stuff is the foundation of being a kickass programmer. You might never write a hashtable again once you're done with this...but you'll be a much better programmer for following through this, and be better capable of understanding what you're really doing when you're writing code.
I'll expand on this more later and add additional projects if people show signs of getting to the end of this list.