Trees
At one of my jobs I was required to write a Word Racer clone module in PHP. I fired up editplus and started to code a simple random grid solver. The dictionary I was provided with contains about 56000 words.
First approach was to load the dictionary into an array and recursively walk the grid in depth with backtracking inquiring at each step if the current combination is a valid word.
When first tested on a 3x3 square grid looking for words with length from 3 to 8 letters everything worked out smooth. After increasing the grid size to 6x6 performance droped dramatically as expected.
I took a break and tried to think of how others ( i.e. Yahoo ) do it but nothing came to my tired mind.
After googling around I ran into Jeremy Elson's personal site who describes his way at solving the problem :
This is the whole key ! This is what I was doing wrong. I should've kept the dictionary in a tree and early cut out impossible combinations. Now that I know better, I want to see the performance improvement on a C++ version of the generator.
I figured that since I need a lot of generations per minute, I should build the generator in C++ and call it in PHP with exec(), system() or any execution function.
First approach was to load the dictionary into an array and recursively walk the grid in depth with backtracking inquiring at each step if the current combination is a valid word.
When first tested on a 3x3 square grid looking for words with length from 3 to 8 letters everything worked out smooth. After increasing the grid size to 6x6 performance droped dramatically as expected.
I took a break and tried to think of how others ( i.e. Yahoo ) do it but nothing came to my tired mind.
After googling around I ran into Jeremy Elson's personal site who describes his way at solving the problem :
My solver uses a depth-first search to generate the letter combinations, with a bound on the maximum depth, and a binary search to quickly look up each letter combination in the dictionary to see if it's a word or not.
This is the whole key ! This is what I was doing wrong. I should've kept the dictionary in a tree and early cut out impossible combinations. Now that I know better, I want to see the performance improvement on a C++ version of the generator.
I figured that since I need a lot of generations per minute, I should build the generator in C++ and call it in PHP with exec(), system() or any execution function.