Class Spider

java.lang.Object
  extended by java.lang.Thread
      extended by Spider
All Implemented Interfaces:
java.lang.Runnable

public class Spider
extends java.lang.Thread

Thread that searches the web (or only a given subset of domains) for a list of keywords.

Author:
Mark Pendergast

Nested Class Summary
 class Spider.SpiderParserCallback
          Inner class used to handle HTML parser callbacks
 
Field Summary
 
Fields inherited from class java.lang.Thread
MAX_PRIORITY, MIN_PRIORITY, NORM_PRIORITY
 
Constructor Summary
Spider(javax.swing.JTree atree, javax.swing.JTextArea amessagearea, javax.swing.JLabel astatlabel, java.lang.String astartsite, java.lang.String[] akeywordlist, java.lang.String[] aipdomainlist, int asitelimit, int adepthlimit)
          Creates a new instance of Spider
 
Method Summary
 boolean depthLimitExceeded(javax.swing.tree.DefaultMutableTreeNode node)
          Check depth of search
static java.lang.String fixHref(java.lang.String href)
          Repairs a sloppy href: flips backslashes to forward slashes and adds a missing /
 void run()
          Runs the search. Call start() (inherited from Thread) to execute it on a new thread.
 void searchWeb(javax.swing.tree.DefaultMutableTreeNode parentnode, java.lang.String urlstr)
          recursive routine to search the web
 void stopSearch()
          Stops the search.
 boolean urlHasBeenVisited(java.lang.String urlstring)
          Searches the URL search tree to check whether the specified URL has already been visited
 
Methods inherited from class java.lang.Thread
activeCount, checkAccess, countStackFrames, currentThread, destroy, dumpStack, enumerate, getContextClassLoader, getName, getPriority, getThreadGroup, holdsLock, interrupt, interrupted, isAlive, isDaemon, isInterrupted, join, join, join, resume, setContextClassLoader, setDaemon, setName, setPriority, sleep, sleep, start, stop, stop, suspend, toString, yield
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
 

Constructor Detail

Spider

public Spider(javax.swing.JTree atree,
              javax.swing.JTextArea amessagearea,
              javax.swing.JLabel astatlabel,
              java.lang.String astartsite,
              java.lang.String[] akeywordlist,
              java.lang.String[] aipdomainlist,
              int asitelimit,
              int adepthlimit)
Creates a new instance of Spider

Parameters:
atree - JTree used to display the search space
amessagearea - JTextArea used to display error/warning messages
astatlabel - JLabel used to display the number of searched sites and hits
astartsite - web site at which to start the search
akeywordlist - list of keywords to search for
aipdomainlist - list of top-level domains
asitelimit - maximum number of web pages to look at
adepthlimit - maximum number of levels down to search (controls recursion)
Method Detail

run

public void run()
Runs the search. Call start() (inherited from Thread) to execute it on a new thread.


urlHasBeenVisited

public boolean urlHasBeenVisited(java.lang.String urlstring)
Searches the URL search tree to check whether the specified URL has already been visited.

Parameters:
urlstring - url to search for
Returns:
true if the url is already in the tree
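
A visited check of this kind can be sketched as a walk over the search tree. This is only an illustration, not the actual Spider code: the class name VisitedSketch, the explicit root parameter, and the assumption that each node's user object is the URL string are all hypothetical.

```java
import java.util.Enumeration;
import javax.swing.tree.DefaultMutableTreeNode;

final class VisitedSketch {
    private VisitedSketch() {}

    /** Returns true if any node under root carries urlstring as its user object. */
    static boolean urlHasBeenVisited(DefaultMutableTreeNode root, String urlstring) {
        // Walk every node in the tree, starting with the root itself.
        Enumeration<?> nodes = root.breadthFirstEnumeration();
        while (nodes.hasMoreElements()) {
            DefaultMutableTreeNode node = (DefaultMutableTreeNode) nodes.nextElement();
            Object user = node.getUserObject();
            // Assumption: the node stores the URL (or an object whose toString() is the URL).
            if (user != null && urlstring.equals(user.toString())) {
                return true;
            }
        }
        return false;
    }
}
```

The walk is linear in the number of nodes, which is acceptable when the site limit keeps the tree small.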

depthLimitExceeded

public boolean depthLimitExceeded(javax.swing.tree.DefaultMutableTreeNode node)
Checks whether the given node lies beyond the search depth limit.

Parameters:
node - search tree node to test the depth limit of
Returns:
true if depth limit exceeded
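
One plausible way to express the depth test uses DefaultMutableTreeNode.getLevel(), which returns the number of levels between a node and the root (0 for the root). The sketch below is an assumption about the approach, not the real method: DepthSketch and the explicit depthLimit parameter are hypothetical, and the real class may compare with >= rather than >.

```java
import javax.swing.tree.DefaultMutableTreeNode;

final class DepthSketch {
    private DepthSketch() {}

    /** Returns true when node sits deeper in the tree than depthLimit levels below the root. */
    static boolean depthLimitExceeded(DefaultMutableTreeNode node, int depthLimit) {
        // getLevel() is 0 for the root, 1 for its children, and so on.
        return node.getLevel() > depthLimit;
    }
}
```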

fixHref

public static java.lang.String fixHref(java.lang.String href)
Repairs a sloppy href: flips backslashes to forward slashes and adds a missing /.

Parameters:
href - web site reference
Returns:
repaired web page reference
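
The description leaves the exact repair rules open, so the following is a minimal sketch under stated assumptions rather than the actual implementation. HrefSketch is a hypothetical class, and "adds missing /" is read here as appending a trailing slash to an absolute URL that has a host but no path.

```java
final class HrefSketch {
    private HrefSketch() {}

    /** Hypothetical repair rules; the real Spider.fixHref may behave differently. */
    static String fixHref(String href) {
        // Flip every backslash to a forward slash.
        String fixed = href.replace('\\', '/');
        // Assumption: a "missing /" means a bare host such as "http://example.com".
        int schemeEnd = fixed.indexOf("://");
        if (schemeEnd >= 0 && fixed.indexOf('/', schemeEnd + 3) < 0) {
            fixed = fixed + "/";
        }
        return fixed;
    }
}
```

Relative references without a scheme only have their backslashes flipped in this sketch.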

searchWeb

public void searchWeb(javax.swing.tree.DefaultMutableTreeNode parentnode,
                      java.lang.String urlstr)
recursive routine to search the web

Parameters:
parentnode - parent node in the search tree
urlstr - web page address to search
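
The recursion, bounded by the site and depth limits, can be illustrated without any network access. In this sketch everything except the searchWeb signature is hypothetical: CrawlSketch is an invented class, an in-memory map of links stands in for fetching and parsing pages, and a HashSet replaces the tree-based visited check for brevity.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import javax.swing.tree.DefaultMutableTreeNode;

final class CrawlSketch {
    private final Map<String, List<String>> links; // stand-in for downloading and parsing a page
    private final int siteLimit;                   // maximum number of pages to look at
    private final int depthLimit;                  // maximum tree level at which pages are added
    private final Set<String> visited = new HashSet<>();

    CrawlSketch(Map<String, List<String>> links, int siteLimit, int depthLimit) {
        this.links = links;
        this.siteLimit = siteLimit;
        this.depthLimit = depthLimit;
    }

    /** Adds urlstr under parentnode, then recurses into the links found on that page. */
    void searchWeb(DefaultMutableTreeNode parentnode, String urlstr) {
        // Stop when the page budget is spent or the page was already seen.
        if (visited.size() >= siteLimit || visited.contains(urlstr)) {
            return;
        }
        DefaultMutableTreeNode node = new DefaultMutableTreeNode(urlstr);
        parentnode.add(node);
        visited.add(urlstr);
        // Do not follow links from a page that already sits at the depth limit.
        if (node.getLevel() >= depthLimit) {
            return;
        }
        for (String href : links.getOrDefault(urlstr, Collections.<String>emptyList())) {
            searchWeb(node, href);
        }
    }

    int sitesVisited() {
        return visited.size();
    }
}
```

Because each call adds at most one node before recursing, the depth limit also bounds the call stack depth.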

stopSearch

public void stopSearch()
Stops the search.
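
How the search is stopped is not documented here. A common pattern for a Thread subclass, shown below purely as an assumption, is a volatile flag that the worker loop polls, since the inherited Thread.stop() is deprecated. StoppableWorker, stopRequested, and awaitStop are hypothetical names.

```java
final class StoppableWorker extends Thread {
    // volatile so that a write from the caller's thread is visible to the worker thread.
    private volatile boolean stopRequested = false;

    @Override
    public void run() {
        // Stand-in for the search loop: do one unit of work, then re-check the flag.
        while (!stopRequested) {
            try {
                Thread.sleep(1);
            } catch (InterruptedException e) {
                return; // treat interruption as a stop request too
            }
        }
    }

    /** Asks the worker to finish after its current unit of work. */
    void stopSearch() {
        stopRequested = true;
    }

    /** Waits up to millis for the worker to exit; returns true if it has stopped. */
    boolean awaitStop(long millis) {
        try {
            join(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !isAlive();
    }
}
```

A caller would invoke start(), later stopSearch(), and optionally awaitStop(...) before reading any results.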