Java Doc for CandidateURI.java (Heritrix, org.archive.crawler.datamodel)



java.lang.Object
   org.archive.crawler.datamodel.CandidateURI

All known subclasses: org.archive.crawler.datamodel.CrawlURI
CandidateURI
public class CandidateURI implements Serializable, Reporter, CoreAttributeConstants
A URI, discovered or passed-in, that may be scheduled. When scheduled, a CandidateURI becomes a CrawlURI made with the data contained herein. A CandidateURI contains just the fields necessary to perform quick in-scope analysis.

Has a flexible attribute list that will be promoted into any CrawlURI created from this CandidateURI. Use it to add custom data or state needed later during custom processing. See the accessors and setters CandidateURI.putString(String,String), CandidateURI.getString(String), etc.
author:
   Gordon Mohr



Field Summary
final public static int HIGH
     High scheduling priority.
final public static int HIGHEST
     Highest scheduling priority.
final public static int MEDIUM
     Medium priority.
final public static int NORMAL
     Normal/low priority.

Constructor Summary
protected CandidateURI()
     Constructor.
public CandidateURI(UURI u)
public CandidateURI(UURI u, String pathFromSeed, UURI via, CharSequence viaContext)

Method Summary
protected void clearAList()
public boolean containsKey(String key)
public CandidateURI createCandidateURI(UURI baseUURI, Link link)
     Utility method for creating CandidateURIs from links extracted from this CrawlURI.
public CandidateURI createCandidateURI(UURI baseUURI, Link link, int scheduling, boolean seed)
     Utility method for creating CandidateURIs from links extracted from this CrawlURI, with a given scheduling directive and seed status.
public static CandidateURI createSeedCandidateURI(UURI uuri)
public String flattenVia()
     Returns the string form of this URI's referral (via) URI.
public boolean forceFetch()
     If this method returns true, this URI should be fetched even though it has already been crawled.
public static CandidateURI fromString(String uriHopsViaString)
     Given a string containing a URI followed by optional whitespace-delimited hops-path and via information, creates a CandidateURI instance.
public AList getAList()
     Returns the attribute list; assumes that only one thread at a time will ever access a particular CandidateURI.
public synchronized String getCandidateURIString()
public String getClassKey()
     Gets the token (usually hostname + port) that indicates which "class" this CrawlURI should be grouped with, for the purposes of ensuring that only one item of the class is processed at once, that all items of the class are held for a politeness period, etc.
public int getInt(String key)
public long getLong(String key)
public Object getObject(String key)
public String getPathFromSeed()
public String[] getReports()
public int getSchedulingDirective()
public String getString(String key)
public int getTransHops()
     Tallies the number of transitive (non-simple-link) hops at the end of this CandidateURI's pathFromSeed. In some cases, URIs with more than zero but fewer than some threshold number of such hops are treated specially.
public String getURIString()
public UURI getUURI()
public UURI getVia()
public CharSequence getViaContext()
protected void inheritFrom(CandidateURI ancestor)
     Inherits (copies) the relevant keys-values from the ancestor.
public boolean isLocation()
     True if this CandidateURI was the result of a redirect.
public boolean isSeed()
public Iterator keys()
public void makeHeritable(String key)
     Makes the given key 'heritable', meaning its value will be added to descendant CandidateURIs.
public void makeNonHeritable(String key)
     Makes the given key non-'heritable', meaning its value will not be added to descendant CandidateURIs.
public boolean needsImmediateScheduling()
public boolean needsSoonScheduling()
public void putInt(String key, int value)
public void putLong(String key, long value)
public void putObject(String key, Object value)
public void putString(String key, String value)
protected UURI readUuri(String u)
public void remove(String key)
public void reportTo(String name, PrintWriter writer)
public void reportTo(PrintWriter writer)
public boolean sameDomainAs(CandidateURI other)
protected void setAList(AList alist)
     Called when making a copy of another CandidateURI.
public void setClassKey(String key)
public void setForceFetch(boolean b)
     Signals that this URI should be fetched even though it has already been crawled.
public void setIsSeed(boolean b)
     Sets the isSeed attribute of this URI.
protected void setPathFromSeed(String string)
public void setSchedulingDirective(int schedulingDirective)
public void setVia(UURI via)
public String singleLineLegend()
public String singleLineReport()
public void singleLineReportTo(PrintWriter w)
public String toString()

Field Detail
HIGH
final public static int HIGH
High scheduling priority. Scheduled after any CandidateURI.HIGHEST URIs.



HIGHEST
final public static int HIGHEST
Highest scheduling priority. Scheduled before any other URIs of its class.



MEDIUM
final public static int MEDIUM
Medium scheduling priority. Scheduled after any CandidateURI.HIGH URIs.



NORMAL
final public static int NORMAL
Normal (low) scheduling priority. Scheduled whenever possible, at the end of the queue.
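The four constants order URIs within a queue from HIGHEST down to NORMAL. A minimal sketch of that ordering, assuming the constants map to ascending integers 0 to 3 (their actual values are not given on this page) and that URIs of equal priority are served first-in, first-out. SchedulingSketch and its methods are hypothetical and not part of Heritrix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

// Sketch: order queued URIs by scheduling directive (lower value = sooner),
// falling back to arrival order so equal-priority URIs stay FIFO.
// The numeric values below are ASSUMED for illustration only.
class SchedulingSketch {
    static final int HIGHEST = 0; // before any others of its class
    static final int HIGH = 1;    // after any HIGHEST
    static final int MEDIUM = 2;  // after any HIGH
    static final int NORMAL = 3;  // whenever / end of queue

    // Hypothetical queue entry; not a Heritrix type.
    record Entry(String uri, int directive, long seq) {}

    private final PriorityQueue<Entry> queue = new PriorityQueue<>(
            (a, b) -> a.directive() != b.directive()
                    ? Integer.compare(a.directive(), b.directive())
                    : Long.compare(a.seq(), b.seq()));
    private long nextSeq = 0;

    void enqueue(String uri, int directive) {
        queue.add(new Entry(uri, directive, nextSeq++));
    }

    // Drains the queue and returns URIs in the order they would be fetched.
    List<String> drain() {
        List<String> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            out.add(queue.poll().uri());
        }
        return out;
    }

    public static void main(String[] args) {
        SchedulingSketch q = new SchedulingSketch();
        q.enqueue("http://example.com/a", NORMAL);
        q.enqueue("http://example.com/robots.txt", HIGHEST);
        q.enqueue("http://example.com/b", MEDIUM);
        q.enqueue("http://example.com/c", NORMAL);
        q.enqueue("http://example.com/d", HIGH);
        System.out.println(q.drain());
    }
}
```

Run as-is, main prints the robots.txt URI first, then /d, /b, /a and /c; the sequence number keeps /a ahead of /c because both are NORMAL and /a arrived first.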




Constructor Detail
CandidateURI
protected CandidateURI()
Constructor. Protected to block access to the default (no-argument) constructor.



CandidateURI
public CandidateURI(UURI u)

Parameters:
  u - UURI instance this CandidateURI wraps.



CandidateURI
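public CandidateURI(UURI u, String pathFromSeed, UURI via, CharSequence viaContext)

Parameters:
  u - UURI instance this CandidateURI wraps.
Parameters:
  pathFromSeed - Path (hop types) from the seed to this URI.
Parameters:
  via - URI via which this one was discovered.
Parameters:
  viaContext - Context in which this one was discovered.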




Method Detail
clearAList
protected void clearAList()



containsKey
public boolean containsKey(String key)



createCandidateURI
public CandidateURI createCandidateURI(UURI baseUURI, Link link) throws URIException
Utility method for creating CandidateURIs from links extracted from this CrawlURI.
Parameters:
  baseUURI - Base UURI for the link.
Parameters:
  link - Link to wrap in a CandidateURI.
Returns:
  New CandidateURI wrapper around the link.
throws:
  URIException -



createCandidateURI
public CandidateURI createCandidateURI(UURI baseUURI, Link link, int scheduling, boolean seed) throws URIException
Utility method for creating CandidateURIs from links extracted from this CrawlURI.
Parameters:
  baseUURI - Base UURI for the link.
Parameters:
  link - Link to wrap in a CandidateURI.
Parameters:
  scheduling - How the new CandidateURI should be scheduled.
Parameters:
  seed - True if the new CandidateURI is a seed.
Returns:
  New CandidateURI wrapper around the link.
throws:
  URIException -
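A rough sketch of what deriving a child candidate from an extracted link involves, using java.net.URI in place of UURI: the link target is resolved against the base URI, the parent's path from seed is extended by the link's hop type, and the parent becomes the child's via. The ChildCandidate record, the derive method, and the single-letter hop types ('L' for a navigation link, 'E' for an embed) are assumptions for illustration; the real method takes a Link object, and attribute inheritance is not modeled here.

```java
import java.net.URI;

// Sketch of deriving a child candidate from a discovered link.
// ChildCandidate and the hop-type letters are HYPOTHETICAL stand-ins for
// Heritrix's CandidateURI and Link types.
class DeriveSketch {
    record ChildCandidate(URI uri, String pathFromSeed, URI via, String viaContext) {}

    // Resolve the link against the base URI, extend the parent's hop path
    // with this link's hop type, and record the parent as the 'via'.
    static ChildCandidate derive(URI parent, String parentPath, URI base,
                                 String linkTarget, char hopType, String context) {
        URI resolved = base.resolve(linkTarget); // resolves and normalizes "../"
        return new ChildCandidate(resolved, parentPath + hopType, parent, context);
    }

    public static void main(String[] args) {
        URI parent = URI.create("http://example.com/dir/page.html");
        ChildCandidate c = derive(parent, "L", parent, "../img/logo.png", 'E', "img/@src");
        System.out.println(c.uri() + " " + c.pathFromSeed() + " " + c.via());
    }
}
```

Here the relative link resolves to http://example.com/img/logo.png and the child's path from seed becomes "LE": one link hop to the parent page, then one embed hop.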



createSeedCandidateURI
public static CandidateURI createSeedCandidateURI(UURI uuri)



flattenVia
public String flattenVia()
Returns the string form of this URI's referral (via) URI.
Returns:
  String version of the referral URI.



forceFetch
public boolean forceFetch()
If this method returns true, this URI should be fetched even though it has already been crawled. This also implies that this URI will be scheduled for crawl before any other waiting URIs for the same host. This value is used to refetch any expired robots.txt files or DNS lookups.
Returns:
  true if crawling of this URI should be forced.



fromString
public static CandidateURI fromString(String uriHopsViaString) throws URIException
Given a string containing a URI followed by optional whitespace-delimited hops-path and via information, creates a CandidateURI instance.
Parameters:
  uriHopsViaString - String with a URI.
Returns:
  A CandidateURI made from the passed uriHopsViaString.
throws:
  URIException -
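The fromString input is a URI optionally followed by a hops path and a via URI, separated by whitespace. A minimal parser for that shape, assuming '-' marks an absent field (a convention borrowed from crawl logs, not confirmed on this page) and returning plain strings instead of UURIs. ParsedCandidate and parse are hypothetical names.

```java
import java.util.Optional;

// Sketch of parsing "URI [hopsPath [via]]" where fields are separated by
// whitespace. Treating "-" as "absent" is an ASSUMPTION; ParsedCandidate is
// a HYPOTHETICAL type, not Heritrix's CandidateURI.
class FromStringSketch {
    record ParsedCandidate(String uri, String pathFromSeed, Optional<String> via) {}

    static ParsedCandidate parse(String uriHopsVia) {
        String[] parts = uriHopsVia.trim().split("\\s+");
        if (parts[0].isEmpty()) {
            throw new IllegalArgumentException("no URI in: " + uriHopsVia);
        }
        // Missing or "-" hops path means the URI is itself a seed (empty path).
        String hops = (parts.length > 1 && !parts[1].equals("-")) ? parts[1] : "";
        Optional<String> via = (parts.length > 2 && !parts[2].equals("-"))
                ? Optional.of(parts[2]) : Optional.empty();
        return new ParsedCandidate(parts[0], hops, via);
    }

    public static void main(String[] args) {
        System.out.println(parse("http://example.com/x.css  LE  http://example.com/"));
        System.out.println(parse("http://example.com/"));
    }
}
```

Splitting on runs of whitespace lets the sketch accept tabs or multiple spaces between fields; a real implementation would also validate the URI before constructing a candidate.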



getAList
public AList getAList()
Assumes that only one thread at a time will ever access a particular CandidateURI.
Returns:
  the attribute list.



getCandidateURIString
public synchronized String getCandidateURIString()
This candidate URI as a string, wrapped in 'CandidateURI(' and ')'.



getClassKey
public String getClassKey()
Gets the token (usually hostname + port) that indicates which "class" this CrawlURI should be grouped with, for the purposes of ensuring that only one item of the class is processed at once, that all items of the class are held for a politeness period, etc.
Returns:
  Token (usually the hostname) that indicates which "class" this CrawlURI should be grouped with.
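A sketch of one way to compute a class key of the "hostname + port" form described above, using java.net.URI. Lower-casing the host and filling in default ports 80 and 443 for http and https are assumptions; the key Heritrix actually produces may differ in format. ClassKeySketch is hypothetical.

```java
import java.net.URI;
import java.util.Locale;

// Sketch: derive a "class key" of the form host:port so that URIs on the
// same server share one politeness queue. The key format and default-port
// handling are ASSUMPTIONS; Heritrix's own key may differ.
class ClassKeySketch {
    static String classKey(URI uri) {
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("no host: " + uri);
        }
        int port = uri.getPort(); // -1 when the URI has no explicit port
        if (port == -1) {
            String scheme = uri.getScheme() == null
                    ? "" : uri.getScheme().toLowerCase(Locale.ROOT);
            port = switch (scheme) {
                case "http" -> 80;
                case "https" -> 443;
                default -> throw new IllegalArgumentException("no default port for: " + uri);
            };
        }
        return host.toLowerCase(Locale.ROOT) + ":" + port;
    }

    public static void main(String[] args) {
        System.out.println(classKey(URI.create("http://Example.com/a")));
        System.out.println(classKey(URI.create("https://example.com:8443/b")));
    }
}
```

Normalizing the default port means http://example.com/ and http://example.com:80/ land in the same class, so they share one politeness delay rather than being treated as two servers.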



getInt
public int getInt(String key)



getLong
public long getLong(String key)



getObject
public Object getObject(String key)



getPathFromSeed
public String getPathFromSeed()
path (hop-types) from seed



getReports
public String[] getReports()



getSchedulingDirective
public int getSchedulingDirective()
Returns the schedulingDirective.



getString
public String getString(String key)



getTransHops
public int getTransHops()
Tallies the number of transitive (non-simple-link) hops at the end of this CandidateURI's pathFromSeed. In some cases, URIs with more than zero but fewer than some threshold number of such hops are treated specially.

TODO: consider moving link-count in here as well, caching the calculation, and refactoring CrawlScope.exceedsMaxHops() to use this.
Returns:
  Transhop count.
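Counting trans hops amounts to scanning pathFromSeed backwards until the last simple link hop. A sketch, assuming each hop is recorded as one character and that a simple link hop is 'L' (other letters, such as 'E' for embeds or 'R' for redirects, are likewise assumed). TransHopsSketch is hypothetical.

```java
// Sketch: count the transitive hops at the END of a pathFromSeed string,
// stopping at the last simple link hop. Encoding a simple link hop as 'L'
// is an ASSUMPTION made for this illustration.
class TransHopsSketch {
    static final char SIMPLE_LINK_HOP = 'L'; // assumed hop letter

    static int transHops(String pathFromSeed) {
        int count = 0;
        for (int i = pathFromSeed.length() - 1; i >= 0; i--) {
            if (pathFromSeed.charAt(i) == SIMPLE_LINK_HOP) {
                break; // only the trailing run of non-link hops counts
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(transHops("LLE"));  // one embed after the last link
        System.out.println(transHops("LRE"));  // redirect then embed
        System.out.println(transHops("LEL"));  // ends in a simple link
    }
}
```

For "LLE" this returns 1, for "LRE" 2, and for "LEL" 0, since that path ends in a simple link; an empty path (a seed) also yields 0.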




getURIString
public String getURIString()
URI String
See Also:   CandidateURI.toString()



getUURI
public UURI getUURI()
The UURI this CandidateURI wraps.



getVia
public UURI getVia()
URI via which this one was discovered



getViaContext
public CharSequence getViaContext()
CharSequence context in which this one was discovered



inheritFrom
protected void inheritFrom(CandidateURI ancestor)
Inherit (copy) the relevant keys-values from the ancestor.
Parameters:
  ancestor - CandidateURI from which to copy heritable keys-values.



isLocation
public boolean isLocation()
True if this CandidateURI was the result of a redirect, i.e. its parent URI redirected here and this URI was the value of the 'Location:' or 'Content-Location:' HTTP header.



isSeed
public boolean isSeed()
Whether this URI is a seed.



keys
public Iterator keys()



makeHeritable
public void makeHeritable(String key)
Make the given key 'heritable', meaning its value will be added to descendant CandidateURIs. Only keys with immutable values should be made heritable, because the value instance may be shared until the AList is serialized/deserialized.
Parameters:
  key - Key to make heritable.



makeNonHeritable
public void makeNonHeritable(String key)
Make the given key non-'heritable', meaning its value will not be added to descendant CandidateURIs. Only meaningful if key was previously made heritable.
Parameters:
  key - to make non-heritable
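The heritable-key mechanism can be pictured as an attribute map plus a set of keys to propagate to descendants. In the sketch below a hypothetical AttrHolder stands in for the AList; in this sketch a copied key also stays heritable, so grandchildren inherit it too, which is an assumption rather than documented behavior. Values are copied by reference, which illustrates why only immutable values should be made heritable.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch of an attribute list whose 'heritable' keys are copied into
// descendants. AttrHolder is HYPOTHETICAL, not Heritrix's AList. Values are
// copied by reference, so mutable values would be shared between URIs.
class AttrHolder {
    private final Map<String, Object> attrs = new HashMap<>();
    private final Set<String> heritable = new HashSet<>();

    void put(String key, Object value) { attrs.put(key, value); }
    Object get(String key) { return attrs.get(key); }
    boolean containsKey(String key) { return attrs.containsKey(key); }

    void makeHeritable(String key) { heritable.add(key); }
    void makeNonHeritable(String key) { heritable.remove(key); }

    // Copy heritable keys (and, in this sketch, their heritable status)
    // from an ancestor, so they keep propagating to later generations.
    void inheritFrom(AttrHolder ancestor) {
        for (String key : ancestor.heritable) {
            if (ancestor.attrs.containsKey(key)) {
                attrs.put(key, ancestor.attrs.get(key)); // shared reference
                heritable.add(key);
            }
        }
    }

    public static void main(String[] args) {
        AttrHolder seed = new AttrHolder();
        seed.put("source-tag", "news-seeds");   // hypothetical key
        seed.put("fetch-note", "seed only");    // hypothetical key
        seed.makeHeritable("source-tag");

        AttrHolder child = new AttrHolder();
        child.inheritFrom(seed);
        System.out.println(child.get("source-tag"));         // news-seeds
        System.out.println(child.containsKey("fetch-note")); // false
    }
}
```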



needsImmediateScheduling
public boolean needsImmediateScheduling()
True if this URI needs immediate scheduling.



needsSoonScheduling
public boolean needsSoonScheduling()
True if this URI needs to be scheduled soon, but not at top priority.



putInt
public void putInt(String key, int value)



putLong
public void putLong(String key, long value)



putObject
public void putObject(String key, Object value)



putString
public void putString(String key, String value)



readUuri
protected UURI readUuri(String u)
Read a UURI from a String, handling a null value or a URIException.
Parameters:
  u - String or null from which to create the UURI.
Returns:
  the best UURI instance creatable.



remove
public void remove(String key)



reportTo
public void reportTo(String name, PrintWriter writer)



reportTo
public void reportTo(PrintWriter writer) throws IOException



sameDomainAs
public boolean sameDomainAs(CandidateURI other) throws URIException
Compares the domain of this CandidateURI with that of another CandidateURI.
Parameters:
  other - The other CandidateURI.
Returns:
  True if both are in the same domain, false otherwise.
throws:
  URIException -



setAList
protected void setAList(AList alist)
Called when making a copy of another CandidateURI.
Parameters:
  alist - AList to use.



setClassKey
public void setClassKey(String key)



setForceFetch
public void setForceFetch(boolean b)
Signals that this URI should be fetched even though it has already been crawled. Setting this to true also implies that this URI will be scheduled for crawl before any other waiting URIs for the same host. This value is used to refetch any expired robots.txt files or DNS lookups.
Parameters:
  b - set to true to force the crawling of this URI
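A toy per-host queue showing the two effects of force-fetch described above: a forced URI bypasses the already-crawled check and jumps ahead of other waiting URIs for the same host. HostQueueSketch and its methods are hypothetical; the sketch models a single host with plain strings and is not how the Heritrix frontier is implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Sketch of a single-host queue that drops already-seen URIs unless the
// candidate is marked force-fetch, in which case it is re-queued at the
// front. HostQueueSketch is HYPOTHETICAL, not a Heritrix frontier.
class HostQueueSketch {
    private final Set<String> alreadySeen = new HashSet<>();
    private final Deque<String> queue = new ArrayDeque<>();

    // Returns true if the URI was queued.
    boolean schedule(String uri, boolean forceFetch) {
        if (forceFetch) {
            alreadySeen.add(uri);
            queue.addFirst(uri); // ahead of other waiting URIs for this host
            return true;
        }
        if (!alreadySeen.add(uri)) {
            return false; // duplicate and not forced: skip
        }
        queue.addLast(uri);
        return true;
    }

    // Removes and returns the next URI to fetch, or null if none are waiting.
    String next() { return queue.pollFirst(); }

    List<String> snapshot() { return List.copyOf(queue); }

    public static void main(String[] args) {
        HostQueueSketch q = new HostQueueSketch();
        q.schedule("http://example.com/robots.txt", false);
        q.next();                                           // "fetch" robots.txt
        q.schedule("http://example.com/a", false);
        q.schedule("http://example.com/b", false);
        q.schedule("http://example.com/robots.txt", false); // dropped: already seen
        q.schedule("http://example.com/robots.txt", true);  // forced refetch
        System.out.println(q.snapshot());
    }
}
```

In main, the unforced second robots.txt request is dropped as a duplicate, while the forced one is queued ahead of /a and /b, mirroring a refetch of an expired robots.txt.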



setIsSeed
public void setIsSeed(boolean b)
Set the isSeed attribute of this URI.
Parameters:
  b - Is this URI a seed, true or false.



setPathFromSeed
protected void setPathFromSeed(String string)

Parameters:
  string - Path (hop types) from seed.



setSchedulingDirective
public void setSchedulingDirective(int schedulingDirective)

Parameters:
  schedulingDirective - The schedulingDirective to set.



setVia
public void setVia(UURI via)



singleLineLegend
public String singleLineLegend()



singleLineReport
public String singleLineReport()



singleLineReportTo
public void singleLineReportTo(PrintWriter w)



toString
public String toString()
The UURI this CandidateURI wraps, as a string. (toString() formerly returned what CandidateURI.getCandidateURIString() returns; use that method if you still need this functionality.)
See Also:   CandidateURI.getCandidateURIString()



Methods inherited from java.lang.Object
native protected Object clone() throws CloneNotSupportedException
public boolean equals(Object obj)
protected void finalize() throws Throwable
final native public Class getClass()
native public int hashCode()
final native public void notify()
final native public void notifyAll()
public String toString()
final native public void wait(long timeout) throws InterruptedException
final public void wait(long timeout, int nanos) throws InterruptedException
final public void wait() throws InterruptedException

www.java2java.com
Copyright 2009 - 12 Demo Source and Support. All rights reserved.
All other trademarks are property of their respective owners.