What you would need is

by lkt153

1) A program that'll fetch pages from the web. I suggest you use wget for that. I know there's a Windows version of it out there, just not sure where. It's controllable from the command line, so it should be your answer.

2) Build an HTML parser. That's the tricky part. Go to www.w3.org and look up the HTML specs, decide which tags to implement first and implement support for them, then keep adding others until it just works. I recommend looking into how lynx works. It's a text-mode browser that you mostly see on Linux (though there are ports for other systems, Windows included), and it's the only one I know.
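To give an idea of how the two steps fit together, here's a rough sketch in Python. It's only an illustration under a few assumptions: wget is installed and on your PATH, the function names (fetch, render_text) and the tag list are made up for this example, and the "parser" is a simple regex tokenizer that handles a handful of tags and skips the rest. It doesn't deal with script or style contents, or with attribute values that contain ">", so treat it as a starting point and not a real HTML parser.

```python
import html
import re
import subprocess

# Tags supported so far (an arbitrary starting set); each one forces a line break.
# Any other tag is skipped, but the text inside it is kept.
BLOCK_TAGS = {"p", "br", "div", "h1", "h2", "h3", "li", "tr"}

# One token is a comment, a <!DOCTYPE ...> declaration, a tag, or a run of text.
TOKEN = re.compile(
    r"<!--.*?-->|<![^>]*>|</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>|([^<]+)",
    re.DOTALL,
)


def fetch(url, out_path="page.html"):
    """Download url with wget (assumed to be on PATH) and return the page text."""
    # -q silences wget's progress output, -O picks the output file.
    subprocess.run(["wget", "-q", "-O", out_path, url], check=True)
    with open(out_path, encoding="utf-8", errors="replace") as f:
        return f.read()


def render_text(html_source):
    """Convert HTML to plain text, honoring only the tags in BLOCK_TAGS."""
    pieces = []
    for tag, text in TOKEN.findall(html_source):
        if text:
            # Decode entities such as &amp; and collapse runs of whitespace.
            pieces.append(re.sub(r"\s+", " ", html.unescape(text)))
        elif tag.lower() in BLOCK_TAGS:
            pieces.append("\n")
        # Comments, declarations and unsupported tags add nothing.
    lines = (line.strip() for line in "".join(pieces).split("\n"))
    return "\n".join(line for line in lines if line)
```

You'd use it with something like print(render_text(fetch("http://www.w3.org/"))). To support another tag, add it to BLOCK_TAGS or give it its own branch in the loop, which is the "implement a few tags first, then add more" approach from step 2.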

hth,
lkt153

Posted on Mar 18, 2008, 12:45 PM
from IP address 189.29.89.211
