When digging through the Google cache, I found printview pages, which can contain hundreds or thousands of posts. These are generated for people to print, just like this one:
http://www.zoolook.nl/forum/viewtopic.p ... view=print
The old site shows the whole topic on a single print page, while the phpbb3 site shows 15 posts per page, just like the forum boards.
As you can see, there's no reference to a post ID or anything. But since the HTML code is readable, I could parse the pages into values in a table and identify the user and date for every post. The hardest part is the message itself. I can extract the message, but matching it with the existing post in the old forum's database is hard, partly because the printview page contains HTML while the database records contain phpBB's BBCode. So what I've done is this: I've put every post in a SOLR search index and let SOLR find the closest match for each message from the printview page, which gives me the post ID.
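To give an idea of the HTML versus BBCode problem, here is a minimal Python sketch of how both formats could be reduced to the same plain words before comparing them. This is not my actual code: the function names and the BBCode regex are made up for illustration, and the regex will also eat ordinary text in square brackets.

```python
import re
from html import unescape
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(fragment):
    """Strip tags from a printview message (hypothetical helper)."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(parser.parts)


# Assumption: tags look like [b], [/quote], [url=http://...] or [b:1a2b3c].
# Plain text in square brackets, such as [sic], is removed as well.
BBCODE_TAG = re.compile(r"\[/?[a-zA-Z*][^\[\]]*\]")


def bbcode_to_text(raw):
    """Strip BBCode tags from a database message (hypothetical helper)."""
    return BBCODE_TAG.sub(" ", unescape(raw))


def normalize(text):
    """Lowercase and keep only word characters, separated by single spaces."""
    return " ".join(re.findall(r"\w+", text.lower()))
```

With both sides passed through `normalize`, a printview message and its database record should end up as the same string of words, or at least close enough for a search index to rank the right post first.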
Over the past week I've written some software to match posts against the SOLR search index, and that gave me some promising results. Of course, if a message is just "Thanks" or ":-)", a match will be hard. Messages which quote other messages might give me false positives too. So I have to check the list of matches by hand. But most of the time the matching worked pretty well. I had a topic of 1500 posts and the first 700 posts were matched correctly. Pretty cool.
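The idea of flagging doubtful matches for a manual check can be sketched like this. Note that this is only an illustration: it uses Python's `difflib` as a stand-in for the SOLR query, and the thresholds (`MIN_WORDS`, `MIN_SCORE`, `MIN_MARGIN`) are assumed values, not the ones I actually used.

```python
from difflib import SequenceMatcher

MIN_WORDS = 4      # assumption: shorter messages ("Thanks", ":-)") are too ambiguous
MIN_SCORE = 0.85   # assumption: similarity needed to trust a match
MIN_MARGIN = 0.05  # assumption: required lead over the second-best candidate


def best_match(message, candidates):
    """Return (post_id, score, needs_review) for one printview message.

    candidates maps post_id -> plain-text message from the database.
    """
    best_id, best_score, runner_up = None, 0.0, 0.0
    for post_id, text in candidates.items():
        score = SequenceMatcher(None, message, text, autojunk=False).ratio()
        if score > best_score:
            # New leader: the old leader becomes the runner-up.
            best_id, runner_up, best_score = post_id, best_score, score
        elif score > runner_up:
            runner_up = score
    too_short = len(message.split()) < MIN_WORDS
    ambiguous = best_score - runner_up < MIN_MARGIN
    needs_review = too_short or ambiguous or best_score < MIN_SCORE
    return best_id, best_score, needs_review


# Made-up example data, not real forum posts.
posts = {
    101: "thanks",
    102: "i saw the concert in amsterdam last night and it was great",
    103: "thanks",
}
print(best_match("i saw the concert in amsterdam last night and it was great", posts))
print(best_match("thanks", posts))
```

The long message gets one clear winner, while "thanks" matches two posts equally well and is flagged, so it ends up on the list to check by hand. Quoted messages would show up the same way, as a small margin between the best and second-best candidate.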
Anyway, long story short: thanks to this action, I have dug through 40 printview topics and added 12215 lost posts. Amongst these are mega-threads, some with 2500+ posts, like the Teo&Tea thread, Oxygene 3D, In-Doors 2009 tour (not complete), How hot/cold is it where you are and The last film you saw in the cinema. These are results I really like. I still have to go through 29 other topics I've managed to fish out of the Google/Bing/Yandex cache. These include some other mega-threads like Some JMJ's photos, the official blog/instagram/twitter and facebook topics and some more. There are now 76863 of the 202821 posts, or almost 38%, restored. WAY more than I expected.