Java Doc for Crawler.java (Search Engine » BDDBot » bdd.search.spider)



java.lang.Object
   java.lang.Thread
      bdd.search.spider.Crawler

Crawler
public class Crawler extends Thread
Written by Tim Macinta 1997
Distributed under the GNU Public License (a copy of which is enclosed with the source).

Calling the Crawler's start() method will cause the Crawler to index all of the sites in its queue and then replace the main index with the updated index when it completes. The Crawler's queue should be filled with the starting URLs before calling start().
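
The lifecycle described above (fill the queue, call start(), let the thread drain the queue) can be illustrated with a small stand-in. This is only a sketch: LifecycleSketch, its String-based addURL, and the crawl helper are hypothetical names that do not come from BDDBot, and the "indexing" step merely records each URL instead of fetching it or replacing a main index.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a Thread-based crawler; not the BDDBot class.
final class LifecycleSketch extends Thread {
    private final Queue<String> queue = new ArrayDeque<>();
    private final List<String> indexed = new ArrayList<>();

    // Seeds must be queued before start() is called.
    void addURL(String url) {
        queue.add(url);
    }

    @Override
    public void run() {
        String url;
        while ((url = queue.poll()) != null) {
            // A real crawler would fetch and index the page here.
            indexed.add(url);
        }
    }

    List<String> indexed() {
        return indexed;
    }

    // Queues the seeds, starts the thread, and waits for it to finish.
    static List<String> crawl(String... seeds) {
        LifecycleSketch crawler = new LifecycleSketch();
        for (String seed : seeds) {
            crawler.addURL(seed);
        }
        crawler.start();
        try {
            crawler.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return crawler.indexed();
    }

    public static void main(String[] args) {
        System.out.println(crawl("http://example.com/", "http://example.org/"));
    }
}
```

Calling start() rather than run() is what moves the work onto the new thread; join() is used here only so the caller can read the result safely afterwards.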


Field Summary
 EnginePrefs eng_prefs
 boolean exit_when_done
 Indexer indexer
 FIFOQueue q
 Hashtable urls_done
 File working_dir

Constructor Summary
public  Crawler(File working_dir, EnginePrefs eng_prefs)
     "working_dir" should be a directory that only this Crawler and a given Indexer will be accessing.

Method Summary
public  void addURL(URL url_to_queue)
     Takes "url_to_queue" and adds it to this Crawler's queue of URLs. This method should be used to add all of the desired starting URLs to the queue before the Crawler is started.
public static  void main(String[] arg)
     This is the method that is called when this class is invoked from the command line.
public static  void main(File file, EnginePrefs prefs)
public static  void main(File file, EnginePrefs prefs, boolean exit)
public  void run()
     This is where the actual crawling occurs.
 URL simplify(URL url)
     Takes "url" and removes all references to "/./" and "/../".

Field Detail
eng_prefs
EnginePrefs eng_prefs

exit_when_done
boolean exit_when_done

indexer
Indexer indexer

q
FIFOQueue q

urls_done
Hashtable urls_done

working_dir
File working_dir

Constructor Detail
Crawler
public Crawler(File working_dir, EnginePrefs eng_prefs)
"working_dir" should be a directory that only this Crawler and a given Indexer will be accessing. This means that if several Crawlers are running simultaneously, they should all be given different "working_dir" directories. Also, no other threads should write to this directory (except for the selected Indexer).




Method Detail
addURL
public void addURL(URL url_to_queue)
Takes "url_to_queue" and adds it to this Crawler's queue of URLs. This method should be used to add all of the desired starting URLs to the queue before the Crawler is started. If the URL has already been processed or is not an allowed URL, it is not added.
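
The queueing rule above (skip URLs that were already handled or that are not allowed) might look roughly like the following. This is a sketch under assumptions: it uses ArrayDeque and HashSet in place of the FIFOQueue and Hashtable fields, works on strings rather than URL objects, marks a URL as seen when it is queued, and invents an "http/https only" policy as a placeholder for whatever EnginePrefs actually allows.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical sketch of a de-duplicating seed queue; not the BDDBot code.
final class SeedQueueSketch {
    private final Queue<String> queue = new ArrayDeque<>();
    private final Set<String> seen = new HashSet<>();

    // Assumed policy for illustration only: accept http and https URLs.
    static boolean isAllowed(String url) {
        return url.startsWith("http://") || url.startsWith("https://");
    }

    // Returns true if the URL was queued, false if it was skipped.
    boolean addURL(String url) {
        // Set.add returns false when the URL is already present.
        if (!isAllowed(url) || !seen.add(url)) {
            return false;
        }
        queue.add(url);
        return true;
    }

    // Next URL in first-in, first-out order, or null when the queue is empty.
    String next() {
        return queue.poll();
    }

    int size() {
        return queue.size();
    }

    public static void main(String[] args) {
        SeedQueueSketch q = new SeedQueueSketch();
        System.out.println(q.addURL("http://example.com/"));    // queued
        System.out.println(q.addURL("http://example.com/"));    // duplicate, skipped
        System.out.println(q.addURL("ftp://example.com/file")); // not allowed, skipped
    }
}
```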



main
public static void main(String[] arg)
This is the method that is called when this class is invoked from the command line. Calling this method will cause a Crawler to be created and started, with the starting URLs listed in a file specified by the first argument (arg[0]). The file listing the URLs should contain only the URLs, with each URL on a line by itself. Blank lines are allowed, and lines beginning with "#" are considered comments and are ignored.
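
The URL list format described above (one URL per line, blank lines allowed, "#" lines treated as comments) could be read with a helper along these lines. This is a sketch, not the BDDBot parser: UrlListParser and parse are hypothetical names, and trimming surrounding whitespace before checking for "#" is an assumption the original text does not state.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader for a seed-URL list file.
final class UrlListParser {
    // Keeps non-blank lines that do not start with '#'.
    static List<String> parse(List<String> lines) {
        List<String> urls = new ArrayList<>();
        for (String line : lines) {
            String trimmed = line.trim(); // assumption: surrounding whitespace is ignored
            if (trimmed.isEmpty() || trimmed.startsWith("#")) {
                continue;
            }
            urls.add(trimmed);
        }
        return urls;
    }

    public static void main(String[] args) throws IOException {
        // Read the file named by the first argument, or fall back to a sample.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : Arrays.asList("# seeds", "", "http://example.com/");
        for (String url : parse(lines)) {
            System.out.println(url);
        }
    }
}
```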



main
public static void main(File file, EnginePrefs prefs)



main
public static void main(File file, EnginePrefs prefs, boolean exit)



run
public void run()
This is where the actual crawling occurs.



simplify
URL simplify(URL url)
Takes "url" and removes all references to "/./" and "/../". This can be used to help eliminate looping. Also removes all anchors (i.e., everything after and including a '#').
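
A comparable normalization can be sketched with the standard library. This is not the author's implementation: UrlSimplifySketch is a hypothetical name, it works on strings rather than URL objects, and it relies on java.net.URI.normalize() to collapse "." and ".." path segments after the anchor has been cut off.

```java
import java.net.URI;

// Hypothetical sketch of URL simplification using java.net.URI.
final class UrlSimplifySketch {
    // Strips the anchor, then collapses "." and ".." path segments.
    static String simplify(String url) {
        int hash = url.indexOf('#');
        String withoutAnchor = (hash >= 0) ? url.substring(0, hash) : url;
        // normalize() drops "." segments and resolves ".." against the preceding segment.
        return URI.create(withoutAnchor).normalize().toString();
    }

    public static void main(String[] args) {
        System.out.println(simplify("http://example.com/a/./b/../c.html#top"));
    }
}
```

Reducing equivalent spellings of a path to one form lets a "URLs done" table recognize a page it has already visited, which is how this kind of normalization helps avoid crawl loops.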



Fields inherited from java.lang.Thread
final public static int MAX_PRIORITY
final public static int MIN_PRIORITY
final public static int NORM_PRIORITY

Methods inherited from java.lang.Thread
public static int activeCount()
final public void checkAccess()
native public int countStackFrames()
native public static Thread currentThread()
public void destroy()
public static void dumpStack()
public static int enumerate(Thread[] tarray)
public static Map<Thread, StackTraceElement[]> getAllStackTraces()
public ClassLoader getContextClassLoader()
public static UncaughtExceptionHandler getDefaultUncaughtExceptionHandler()
public long getId()
final public String getName()
final public int getPriority()
public StackTraceElement[] getStackTrace()
public State getState()
final public ThreadGroup getThreadGroup()
public UncaughtExceptionHandler getUncaughtExceptionHandler()
native public static boolean holdsLock(Object obj)
public void interrupt()
public static boolean interrupted()
final native public boolean isAlive()
final public boolean isDaemon()
public boolean isInterrupted()
final public synchronized void join(long millis) throws InterruptedException
final public synchronized void join(long millis, int nanos) throws InterruptedException
final public void join() throws InterruptedException
final public void resume()
public void run()
public void setContextClassLoader(ClassLoader cl)
final public void setDaemon(boolean on)
public static void setDefaultUncaughtExceptionHandler(UncaughtExceptionHandler eh)
final public void setName(String name)
final public void setPriority(int newPriority)
public void setUncaughtExceptionHandler(UncaughtExceptionHandler eh)
native public static void sleep(long millis) throws InterruptedException
public static void sleep(long millis, int nanos) throws InterruptedException
public synchronized void start()
final public void stop()
final public synchronized void stop(Throwable obj)
final public void suspend()
public String toString()
native public static void yield()

Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException
