java.lang.Object
   java.lang.Thread
      bdd.search.spider.Crawler
Crawler

public class Crawler extends Thread

Written by Tim Macinta 1997. Distributed under the GNU Public License
(a copy of which is enclosed with the source).

Calling the Crawler's start() method will cause the Crawler to index
all of the sites in its queue and then replace the main index with the
updated index when it completes. The Crawler's queue should be filled
with the starting URLs before calling start().
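A minimal usage sketch is shown below. It assumes that EnginePrefs
lives in the bdd.search package and can be constructed with no
arguments; neither detail is documented here.

    import java.io.File;
    import java.net.URL;
    import bdd.search.EnginePrefs;        // assumed package
    import bdd.search.spider.Crawler;

    public class CrawlExample {
        public static void main(String[] args) throws Exception {
            EnginePrefs prefs = new EnginePrefs();  // assumed no-arg constructor

            // The working directory must be private to this Crawler
            // and its Indexer.
            Crawler crawler = new Crawler(new File("crawler_work"), prefs);

            // Fill the queue with all starting URLs before start().
            crawler.addURL(new URL("http://www.example.com/"));

            crawler.start();  // indexes the queue, then swaps in the new index
        }
    }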
Constructor Summary

public Crawler(File working_dir, EnginePrefs eng_prefs)
    "working_dir" should be a directory that only this Crawler and a
    given Indexer will be accessing.
Method Summary

public void addURL(URL url_to_queue)
    Takes "url_to_queue" and adds it to this Crawler's queue of URLs.
    This method should be used to add all of the desired starting URLs
    to the queue before the Crawler is started.

public static void main(String[] arg)
    This is the method that is called when this class is invoked from
    the command line.

public static void main(File file, EnginePrefs prefs)

public static void main(File file, EnginePrefs prefs, boolean exit)

public void run()
    This is where the actual crawling occurs.

URL simplify(URL url)
    Takes "url" and removes all references to "/./" and "/../".
Field Detail

boolean exit_when_done
Constructor Detail

Crawler

public Crawler(File working_dir, EnginePrefs eng_prefs)

"working_dir" should be a directory that only this Crawler and a given
Indexer will be accessing. This means that if several Crawlers are
running simultaneously, they should all be given different
"working_dir" directories. Also, no other threads should write to this
directory (except for the selected Indexer).
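For instance, two Crawlers started at the same time need disjoint
working directories (a sketch, with EnginePrefs assumed as above):

    import java.io.File;
    import bdd.search.EnginePrefs;        // assumed package
    import bdd.search.spider.Crawler;

    public class TwoCrawlers {
        public static void main(String[] args) {
            EnginePrefs prefs = new EnginePrefs();  // assumed no-arg constructor
            // One private working directory per Crawler; never shared.
            Crawler a = new Crawler(new File("work_a"), prefs);
            Crawler b = new Crawler(new File("work_b"), prefs);
            a.start();
            b.start();
        }
    }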
Method Detail

addURL

public void addURL(URL url_to_queue)

Takes "url_to_queue" and adds it to this Crawler's queue of URLs. This
method should be used to add all of the desired starting URLs to the
queue before the Crawler is started. If the URL has already been
processed, or if it is a disallowed URL, it is not added.
main

public static void main(String[] arg)

This is the method that is called when this class is invoked from the
command line. Calling this method will cause a Crawler to be created
and started, with the starting URLs listed in a file specified by the
first argument (arg[0]). The file listing the URLs should contain only
the URLs, with each URL on a line by itself. Blank lines are allowed,
and lines beginning with "#" are treated as comments and are ignored.
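A starting-URL file in the format described above might look like this
(hypothetical contents):

    # Seed URLs for the crawler; this line is ignored
    http://www.example.com/

    http://www.example.org/index.html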
run

public void run()

This is where the actual crawling occurs.
simplify

URL simplify(URL url)

Takes "url" and removes all references to "/./" and "/../". This can
be used to help eliminate looping. Also removes all anchors (i.e.,
everything after and including a '#').
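The sketch below shows one way the described normalization could work;
it is an illustration built from the description above, not the
class's actual package-private implementation.

    import java.net.MalformedURLException;
    import java.net.URL;

    public class UrlSimplifySketch {
        static URL simplify(URL url) throws MalformedURLException {
            String file = url.getFile();   // path (+ query); the '#' anchor
                                           // lives in getRef(), not here
            // Collapse every "/./" segment.
            while (file.contains("/./")) {
                file = file.replace("/./", "/");
            }
            // Resolve "/../" by removing the preceding path segment.
            int idx;
            while ((idx = file.indexOf("/../")) > 0) {
                int prev = file.lastIndexOf('/', idx - 1);
                if (prev < 0) break;       // nothing left to pop
                file = file.substring(0, prev) + file.substring(idx + 3);
            }
            // Rebuilding without a ref drops any '#anchor' from the result.
            return new URL(url.getProtocol(), url.getHost(), url.getPort(), file);
        }
    }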