I'll describe all three (plus divide and conquer) the way I was taught them and the way I see them.
Greedy Algorithms: Greedy algorithms make locally optimal choices, hoping that they lead to an optimal overall solution or a close enough approximation. For some problems, a well-designed greedy algorithm will find an optimal solution. Other times, it's simply a trade-off, gaining speed in exchange for accuracy.
Consider this extremely simple shortest-path algorithm: at every node, choose the neighbor with the shortest distance. It sounds sensible up front, but upon further examination it's easy to see how it could give a solution that really isn't the shortest path. For example, say the algorithm gives a path with the distances 2, 7, 3, 8 (20 overall). There could also be a path with the distances 3, 1, 2, 4 (10 overall), yet the algorithm would miss it because the first edge of 2 is shorter than the 3.
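To make that failure mode concrete, here's a minimal sketch of such a naive greedy path-finder; the graph, its weights, and all the names are made up for illustration. It always follows the cheapest outgoing edge, so it happily commits to the 2-cost edge and misses the cheaper overall route.

```python
# Deliberately naive greedy "shortest path": at each node, follow the
# cheapest outgoing edge and never reconsider.
def greedy_path(graph, start, goal):
    path, node, total = [start], start, 0
    while node != goal:
        # The greedy choice: neighbor with the smallest edge weight.
        node, weight = min(graph[node].items(), key=lambda kv: kv[1])
        path.append(node)
        total += weight
    return path, total

# Two routes from A to D: A-B-D costs 2 + 8 = 10, A-C-D costs 3 + 4 = 7.
graph = {
    "A": {"B": 2, "C": 3},
    "B": {"D": 8},
    "C": {"D": 4},
    "D": {},
}
print(greedy_path(graph, "A", "D"))  # (['A', 'B', 'D'], 10) -- misses the 7-cost route
```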
Dynamic Programming: I find it hard to define dynamic programming by what it is. Instead, I'm going to define it in terms of what you should use it for. Some people (and Wikipedia) will tell you that dynamic programming is useful for problems that exhibit optimal substructure and overlapping subproblems, and that it often makes use of memoization. That tells you pretty much everything and nothing at the same time.
What is Optimal Substructure?
Optimal substructure means being able to break a problem into smaller, well-defined subproblems. If you can solve these subproblems optimally and combine their solutions to get the optimal solution to the overall problem, you have optimal substructure. (NOTE: this is different from simply making the optimal local choice; the local choice is unaware of the rest of the problem, and is often just a step rather than an entire subproblem.)
Example -- Shortest paths exhibit optimal substructure too. If you know the shortest path from Vienna to Rome passes through Ravenna, then you can find that path by finding the shortest path from Vienna to Ravenna and the shortest path from Ravenna to Rome, and combining them. Likewise, many recursive problems exhibit optimal substructure. For example, Mephisto brought up Fibonacci numbers: calculating Fib(5) requires calculating Fib(4) and Fib(3); if you calculate these optimally and add them, you will have calculated Fib(5) optimally.
What are Overlapping Subproblems?
Overlapping subproblems occur when a subproblem is encountered/depended upon multiple times while solving the overall problem. Fibonacci numbers are pretty much the clearest example of this: calculating Fib(5) requires calculating Fib(4) and Fib(3), calculating Fib(4) requires calculating Fib(3) and Fib(2), and so on. You can easily see that the subproblem Fib(3) (as well as Fib(2) and Fib(1)) is used more than once.
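You can make the overlap visible by instrumenting a naive recursive Fibonacci with a call counter. This is just a sketch; the counter and the Fib(1) = Fib(2) = 1 base-case convention are my choices.

```python
from collections import Counter

# Count how many times each subproblem Fib(n) gets solved.
calls = Counter()

def fib(n):
    calls[n] += 1
    return 1 if n <= 2 else fib(n - 1) + fib(n - 2)

fib(5)
print(dict(calls))  # Fib(3) is solved twice, Fib(2) three times, Fib(1) twice
```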
What is Memoization?
There's a natural observation you can make from overlapping subproblems: if you encounter a subproblem more than once, why not just save its answer for later? That, in essence, is memoization. Instead of spending time recalculating a subproblem every time it appears, you exchange that cost for memory space by saving the answer.
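As a sketch of that idea, here's a hand-rolled memoized Fibonacci. The memo dictionary and the Fib(1) = Fib(2) = 1 base cases are my choices; Python's functools.lru_cache automates the same bookkeeping.

```python
def fib(n, memo={}):
    if n in memo:
        return memo[n]  # subproblem already solved: reuse the saved answer
    result = 1 if n <= 2 else fib(n - 1) + fib(n - 2)
    memo[n] = result    # save the answer, trading memory space for time
    return result

print(fib(5))    # 5
print(fib(100))  # returns instantly, where the naive version would take ages
```

Each Fib(n) is now computed exactly once; every later request is a dictionary lookup.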
Approaches to dynamic programming:
There are generally two approaches to dynamic programming: top-down and bottom-up. Conceptually, top-down starts from the overall problem and works its way down through the subproblems, whereas bottom-up starts from the subproblems and builds up to the overall problem. More practically, top-down works from the overall problem and saves the solutions to subproblems as they're encountered, whereas bottom-up calculates the subproblems first, then fits them together to solve the overall problem.
Let's go back to the Fibonacci example. The top-down approach to solving Fib(5) will attempt to solve Fib(4) and Fib(3), both of which eventually bottom out at Fib(1) and Fib(2). To get Fib(3), you use the two base cases Fib(1) and Fib(2), and you then save Fib(3). To solve Fib(4), you need Fib(3), which you already solved and stored, and the base case Fib(2).
The bottom-up approach to this would be to start from Fib(1) and calculate upward until you reach Fib(5).
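A minimal sketch of that bottom-up loop (again assuming Fib(1) = Fib(2) = 1), which only ever keeps the last two subproblem answers around:

```python
# Bottom-up: start from the base cases and iterate up to Fib(n).
def fib_bottom_up(n):
    if n <= 2:
        return 1
    prev, curr = 1, 1  # Fib(1), Fib(2)
    for _ in range(3, n + 1):
        prev, curr = curr, prev + curr  # each step reuses the two saved answers
    return curr

print(fib_bottom_up(5))  # 5
```

Notice there's no recursion and no cache to manage; the iteration order guarantees every subproblem is ready before it's needed.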
Divide and Conquer: I'm mentioning divide and conquer because it's easy to understand in and of itself and is included in many algorithm design classes. Divide and conquer repeatedly splits a problem into smaller subproblems until they can be solved directly, then recombines the solutions. Like dynamic programming problems, D&C problems exhibit optimal substructure; unlike dynamic programming problems, they may or may not have overlapping subproblems (they often don't). The subproblems themselves often exhibit optimal substructure and tend to be extremely similar to one another. Identifying a D&C algorithm is generally easy, but designing an efficient one can take years' worth of skill.
I'm not going to type out an example of a D&C algorithm; instead, watch animated visualizations of Mergesort and Bubblesort. Given the stark differences between the two, you should be able to see why Mergesort is D&C.
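If you do want code anyway, here's a minimal merge sort sketch (my own, not taken from any visualization) showing the split/recurse/merge shape that makes it divide and conquer:

```python
# Merge sort: split the list in half (divide), sort each half recursively
# (conquer), then merge the two sorted halves back together (combine).
def merge_sort(items):
    if len(items) <= 1:  # a list of 0 or 1 elements is already sorted
        return items
    mid = len(items) // 2
    left = merge_sort(items[:mid])
    right = merge_sort(items[mid:])
    # Merge: repeatedly take the smaller front element of the two halves.
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

print(merge_sort([5, 2, 7, 3, 8, 1]))  # [1, 2, 3, 5, 7, 8]
```

The two recursive calls don't overlap -- each element lands in exactly one half -- which is why plain recursion suffices and no memoization is needed.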
Backtracking: Backtracking builds up to a solution by extending and discarding partial candidates until zero* or more complete, correct candidates remain. A partial candidate solves part of the problem; however, continuing with that candidate may or may not be valid for the rest of the problem. For example, say you've solved one face of a Rubik's cube--this is a partial candidate for solving the whole cube. You might be able to solve the rest of the faces from this point, or the order of turns you made to solve that face might be wrong, leaving you unable to solve the cube without undoing (a.k.a. backtracking from) the current partial candidate. I like to think of backtracking as an organized guess-and-check, though that somewhat oversimplifies it. Also, backtracking can only be used on problems that have partial candidates. That is, if the steps to solve a problem branch out at different decisions which may or may not lead to the right answer, you probably have partial candidates.
*Sometimes it's useful or intended to find 0 solutions.
As Mephisto said, 8 Queens is one of the most cited examples of backtracking. Other examples include Sudoku and crossword solving algorithms, solving (not necessarily most efficiently) the knapsack problem, and more. Backtracking also forms the internals of some logic programming languages such as Prolog. It may actually be a good idea to play around with some Prolog examples to get a feel for its backtracking. After a quick look, I like these:
http://www.doc.gold.ac.uk/~mas02gw/prolog_tutorial/prologpages/search.html
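To tie this back to 8 Queens, here's a minimal backtracking sketch of my own (the names and the representation, one column index per row, are my choices). The partial candidate is the list of columns placed so far; when no column in the next row is safe, the recursion returns and the last placement is undone.

```python
# Backtracking N-queens: place one queen per row; if a placement conflicts
# with earlier rows, undo it (backtrack) and try the next column.
def solve_n_queens(n):
    solutions = []

    def safe(cols, col):
        row = len(cols)  # the row we're about to place in
        # No shared column, and no shared diagonal with any earlier queen.
        return all(c != col and abs(c - col) != row - r
                   for r, c in enumerate(cols))

    def place(cols):
        if len(cols) == n:               # a full, valid candidate
            solutions.append(tuple(cols))
            return
        for col in range(n):
            if safe(cols, col):
                cols.append(col)         # extend the partial candidate...
                place(cols)
                cols.pop()               # ...and backtrack after exploring it
    place([])
    return solutions

print(len(solve_n_queens(8)))  # 92, the classic count for an 8x8 board
```

Prolog's execution model does essentially this for you: each failed goal automatically unwinds to the most recent choice point.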