
Descriptor
**********

Package for parsing and processing descriptor data.

**Module Overview:**

   parse_file - Parses the descriptors in a file.

   Descriptor - Common parent for all descriptor file types.
     |- get_path - location of the descriptor on disk if it came from a file
     |- get_archive_path - location of the descriptor within the archive it came from
     |- get_bytes - similar to str(), but provides our original bytes content
     |- get_unrecognized_lines - unparsed descriptor content
     +- __str__ - string that the descriptor was made from

stem.descriptor.__init__.DocumentHandler(enum)

   Ways in which we can parse a "NetworkStatusDocument".

   Both **ENTRIES** and **BARE_DOCUMENT** provide a 'thin' document
   whose **routers** attribute is not populated. This lowers memory
   usage and upfront parsing time. If read time and memory aren't a
   concern, **DOCUMENT** provides a fully populated document instead.

   +---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | DocumentHandler     | Description                                                                                                                                                                               |
   +=====================+===========================================================================================================================================================================================+
   | **ENTRIES**         | Iterates over the contained "RouterStatusEntry". Each has a reference to the bare document it came from (through its **document** attribute).                                             |
   +---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | **DOCUMENT**        | "NetworkStatusDocument" with the "RouterStatusEntry" it contains (through its **routers** attribute).                                                                                     |
   +---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | **BARE_DOCUMENT**   | "NetworkStatusDocument" **without** a reference to its contents (the "RouterStatusEntry" are unread).                                                                                     |
   +---------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

stem.descriptor.__init__.parse_file(descriptor_file, descriptor_type=None, validate=False, document_handler='ENTRIES', **kwargs)

   Simple function to read the descriptor contents from a file,
   providing an iterator for its "Descriptor" contents.

   If you don't provide a **descriptor_type** argument then this
   automatically tries to determine the descriptor type based on the
   following...

   * The @type annotation on the first line. These are generally only
     found in the CollecTor archives.

   * The filename if it matches something from tor's data directory.
     For instance, tor's 'cached-descriptors' contains server
     descriptors.
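
   The two detection steps above can be sketched roughly as follows.
   This is an illustration only, not stem's implementation: the
   function name "guess_descriptor_type" and the "KNOWN_FILENAMES"
   mapping are hypothetical, and the mapping holds just the one
   filename this text mentions.

```python
import os

# Hypothetical mapping. Only 'cached-descriptors' is named in the text above;
# pairing it with version 1.0 is an assumption based on the type table.
KNOWN_FILENAMES = {'cached-descriptors': ('server-descriptor', 1, 0)}


def guess_descriptor_type(first_line, filename=None):
    """Return (name, major, minor), or None if nothing matches."""
    if isinstance(first_line, bytes):
        first_line = first_line.decode('ascii', 'replace')

    # Step 1: an annotation such as "@type server-descriptor 1.0".
    parts = first_line.strip().split()
    if len(parts) == 3 and parts[0] == '@type':
        major, _, minor = parts[2].partition('.')
        if major.isdigit() and minor.isdigit():
            return (parts[1], int(major), int(minor))

    # Step 2: fall back to the name of the file in tor's data directory.
    if filename is not None:
        return KNOWN_FILENAMES.get(os.path.basename(filename))

    return None
```

   Returning **None** here stands in for the case where the type can't
   be determined, which "parse_file" reports with a **TypeError**.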

   This is a handy function for simple usage, but if you're reading
   multiple descriptor files you might want to consider the
   "DescriptorReader".

   Supported descriptor types are listed below. Nearby minor versions
   are covered too (i.e. if we support 1.1 then we also support
   everything from 1.0 and most things from 1.2, but not 2.0)...

   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | Descriptor Type                           | Class                                                                                                                                           |
   +===========================================+=================================================================================================================================================+
   | server-descriptor 1.0                     | "RelayDescriptor"                                                                                                                               |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | extra-info 1.0                            | "RelayExtraInfoDescriptor"                                                                                                                      |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | microdescriptor 1.0                       | "Microdescriptor"                                                                                                                               |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | directory 1.0                             | **unsupported**                                                                                                                                 |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | network-status-2 1.0                      | "RouterStatusEntryV2" (with a "NetworkStatusDocumentV2")                                                                                        |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | dir-key-certificate-3 1.0                 | "KeyCertificate"                                                                                                                                |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | network-status-consensus-3 1.0            | "RouterStatusEntryV3" (with a "NetworkStatusDocumentV3")                                                                                        |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | network-status-vote-3 1.0                 | "RouterStatusEntryV3" (with a "NetworkStatusDocumentV3")                                                                                        |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | network-status-microdesc-consensus-3 1.0  | "RouterStatusEntryMicroV3" (with a "NetworkStatusDocumentV3")                                                                                   |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | bridge-network-status 1.0                 | "RouterStatusEntryV3" (with a "BridgeNetworkStatusDocument")                                                                                    |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | bridge-server-descriptor 1.0              | "BridgeDescriptor"                                                                                                                              |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | bridge-extra-info 1.1 or 1.2              | "BridgeExtraInfoDescriptor"                                                                                                                     |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | torperf 1.0                               | **unsupported**                                                                                                                                 |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | bridge-pool-assignment 1.0                | **unsupported**                                                                                                                                 |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | tordnsel 1.0                              | "TorDNSEL"                                                                                                                                      |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
   | hidden-service-descriptor 1.0             | "HiddenServiceDescriptor"                                                                                                                       |
   +-------------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------+
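
   The minor-version rule stated before the table can be read as a
   three-way classification. The sketch below encodes that reading;
   the function name "version_support" and its return labels are
   hypothetical and are not part of stem.

```python
def version_support(supported, found):
    """Classify a found 'major.minor' version against a supported one.

    Returns 'full', 'partial' or 'unsupported' (labels are illustrative).
    """
    s_major, s_minor = (int(part) for part in supported.split('.'))
    f_major, f_minor = (int(part) for part in found.split('.'))

    if f_major != s_major:
        return 'unsupported'   # e.g. 2.0 when we support 1.1
    if f_minor <= s_minor:
        return 'full'          # e.g. 1.0 when we support 1.1
    return 'partial'           # e.g. 1.2 when we support 1.1
```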

   If you're using **Python 3** then beware that the open() function
   defaults to text mode. **Binary mode** is strongly suggested
   because it's both faster (about 33x in the author's testing) and
   skips universal newline translation, which can cause the document
   to be misparsed.

      my_descriptor_file = open(descriptor_path, 'rb')
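
   To see what newline translation does, the following sketch writes a
   few made-up bytes with CRLF line endings to a temporary file and
   reads them back in both modes. The sample content is invented for
   the demonstration and is not a valid descriptor.

```python
import os
import tempfile

# Invented sample bytes with CRLF line endings.
raw = b'router example\r\nplatform Tor\r\n'

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Binary mode returns the bytes exactly as stored on disk.
    with open(path, 'rb') as binary_file:
        binary_content = binary_file.read()

    # Text mode (the Python 3 default) rewrites '\r\n' as '\n'.
    with open(path, 'r') as text_file:
        text_content = text_file.read()
finally:
    os.remove(path)
```

   Only the binary read preserves the carriage returns, so only it
   matches what was actually written.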

   Parameters:
      * **descriptor_file** (*str,file,tarfile*) -- path or opened
        file with the descriptor contents

      * **descriptor_type** (*str*) -- descriptor type, this is
        guessed if not provided

      * **validate** (*bool*) -- checks the validity of the
        descriptor's content if **True**, skips these checks otherwise

      * **document_handler**
        (*stem.descriptor.__init__.DocumentHandler*) -- method in
        which to parse the "NetworkStatusDocument"

      * **kwargs** (*dict*) -- additional arguments for the descriptor
        constructor

   Returns:
      iterator for "Descriptor" instances in the file

   Raises:
      * **ValueError** if the contents are malformed and validate is
        True

      * **TypeError** if we can't match the contents of the file to a
        descriptor type

      * **IOError** if unable to read from the descriptor_file

class stem.descriptor.__init__.Descriptor(contents, lazy_load=False)

   Bases: "object"

   Common parent for all types of descriptors.

   get_path()

      Provides the absolute path that we loaded this descriptor from.

      Returns:
         **str** with the absolute path of the descriptor source

   get_archive_path()

      If this descriptor came from an archive then provides its path
      within the archive. This is only set if the descriptor came from
      a "DescriptorReader", and is **None** if this descriptor didn't
      come from an archive.

      Returns:
         **str** with the descriptor's path within the archive

   get_bytes()

      Provides the ASCII **bytes** of the descriptor. This only
      differs from **str()** if you're running python 3.x, in which
      case **str()** provides a **unicode** string.

      Returns:
         **bytes** for the descriptor's contents

   get_unrecognized_lines()

      Provides a list of lines that were either ignored or had data
      that we did not know how to process. This is most common due to
      new descriptor fields that this library does not yet know how to
      process. Patches welcome!

      Returns:
         **list** of lines of unrecognized content
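
   As a rough picture of how lines end up unrecognized, the toy parser
   below splits keyword lines into known and unknown ones. It is not
   stem's parser: "KNOWN_KEYWORDS", "split_recognized" and the sample
   fields are all hypothetical.

```python
# Hypothetical subset of keywords this toy parser understands.
KNOWN_KEYWORDS = {'router', 'platform', 'published'}


def split_recognized(content):
    """Return (recognized, unrecognized) for keyword-per-line content."""
    recognized = {}
    unrecognized = []

    for line in content.splitlines():
        if not line.strip():
            continue  # skip blank lines

        keyword, _, value = line.partition(' ')
        if keyword in KNOWN_KEYWORDS:
            recognized[keyword] = value
        else:
            # Keep the whole line so the caller can inspect it later.
            unrecognized.append(line)

    return recognized, unrecognized
```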
