Enterprise Models
The Prototype Configuration Model
-
class tgt_grease.enterprise.Model.PrototypeConfig(ioc=None)
Bases: object
Responsible for scanning, detection, and scheduling configuration.
Structure of Configuration:
{
    'configuration': {
        'pkg': [],    # <-- Loaded from pkg_resources.resource_filename('tgt_grease.enterprise.Model', 'config/')
        'fs': [],     # <-- Loaded from `<GREASE_DIR>/etc/*.config.json`
        'mongo': []   # <-- Loaded from the Configuration Mongo collection
    },
    'raw': [],        # <-- All loaded configurations
    'sources': [],    # <-- List of sources found in configurations
    'source': {},     # <-- Keys are source names; values are lists of configs for that source
    'names': [],      # <-- Names of all configs, to allow dialing a config by name
    'name': {}        # <-- All configs keyed by name, to allow dialing a config by name
}
Structure of a configuration file:
{
    "name": String,
    "job": String,
    "exe_env": String,  # <-- Defaults to 'general' if not provided
    "source": String,
    "logic": {
        # <-- Logical blocks for detection
    }
}
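The comments on the in-memory structure describe how loaded configurations are indexed. The sketch below shows one way to build that structure from the three configuration lists, assuming the `source` and `name` indexes are derived from each configuration's `source` and `name` fields. `build_config_index` is a hypothetical helper, not part of the GREASE API, and the sample configurations are invented.

```python
def build_config_index(pkg, fs, mongo):
    """Assemble a PrototypeConfig-style structure from three config lists.

    Hypothetical helper for illustration; GREASE's own loader may differ.
    """
    structure = {
        'configuration': {'pkg': pkg, 'fs': fs, 'mongo': mongo},
        'raw': pkg + fs + mongo,  # every loaded configuration
        'sources': [],
        'source': {},
        'names': [],
        'name': {},
    }
    for conf in structure['raw']:
        source = conf['source']
        if source not in structure['sources']:
            structure['sources'].append(source)
        # Group configurations by the source they scan
        structure['source'].setdefault(source, []).append(conf)
        # Index by name so a single configuration can be dialed directly
        structure['names'].append(conf['name'])
        structure['name'][conf['name']] = conf
    return structure


example = build_config_index(
    pkg=[{'name': 'web_check', 'job': 'notify', 'source': 'url_source', 'logic': {}}],
    fs=[{'name': 'disk_check', 'job': 'cleanup', 'source': 'fs_source', 'logic': {}}],
    mongo=[{'name': 'web_alert', 'job': 'page', 'source': 'url_source', 'logic': {}}],
)
print(example['sources'])  # ['url_source', 'fs_source']
```

Keeping both a list (`names`, `sources`) and a dict (`name`, `source`) trades a little memory for ordered iteration plus constant-time lookup by key.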
-
ioc
IoC access
Type: GreaseContainer
-
getConfiguration()
Returns the configuration object loaded into memory.
Returns: Configuration object
Return type: dict
-
get_config(name)
Get a configuration by name.
Parameters: name (str) – Configuration name to get
Returns: Configuration if found, else an empty dict
Return type: dict
-
get_names()
Returns the list of configuration names.
Returns: List of config names
Return type: list
-
get_source(name)
Get all configurations for the named source.
Parameters: name (str) – Source name to get
Returns: Configurations for the source if found, else an empty list
Return type: list[dict]
-
get_sources()
Returns the list of sources to be scanned.
Returns: List of sources
Return type: list
-
load(reloadConf=False, ConfigurationList=None)
[Re]loads configuration data about the current execution node.
Configuration data is loaded from three places in GREASE. The first is internal to the package: configuration files added to the package's config/ directory that follow the *.config.json pattern. The second follows the same pattern but is loaded from <GREASE_DIR>/etc/. The final place GREASE looks for configuration data is the Configuration collection in MongoDB.
Parameters: - reloadConf (bool) – If True, reloads the global object; if False, returns the object already in memory
- ConfigurationList (list of dict) – If provided, loads this list of configuration dicts after validation
Note
Providing a configuration list automatically reloads the in-memory prototype configuration structure.
Returns: Current configuration information
Return type: dict
-
load_from_fs(directory)
Loads configurations from the provided directory.
Note
Pattern is *.config.json
Parameters: directory (str) – Directory to load from
Returns: Configurations
Return type: list of dict
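A minimal sketch of loading every `*.config.json` file from a directory with the standard library. `load_configs_from_dir` is a hypothetical name; this sketch does not reproduce how GREASE handles malformed JSON or validation, which would normally follow this step.

```python
import glob
import json
import os


def load_configs_from_dir(directory):
    """Load every *.config.json file in directory into a list of dicts.

    Hypothetical helper; invalid JSON raises json.JSONDecodeError here.
    """
    configs = []
    # sorted() keeps the load order deterministic across platforms
    for path in sorted(glob.glob(os.path.join(directory, '*.config.json'))):
        with open(path) as handle:
            configs.append(json.load(handle))
    return configs
```

Files that do not end in `.config.json` (for example `notes.json`) are ignored by the glob pattern.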
-
load_from_mongo()
Returns all active configurations from the Configuration collection in MongoDB.
Structure of Configuration expected in Mongo:
{
    "name": String,
    "job": String,
    "exe_env": String,           # <-- Defaults to 'general' if not provided
    "active": Boolean,           # <-- Set to true to load the configuration
    "type": "prototype_config",  # <-- Must be exactly this value; it identifies the config type
    "source": String,
    "logic": {
        # <-- Logical blocks for detection
    }
}
Returns: Configurations
Return type: list of dict
-
validate_config(config)
Validates a configuration.
The default JSON Schema is:
{
    "name": String,
    "job": String,
    "exe_env": String,  # <-- Defaults to 'general' if not provided
    "source": String,
    "logic": {
        # <-- Logical blocks for detection
    }
}
Parameters: config (dict) – Configuration to validate
Returns: True if the configuration is valid
Return type: bool
-
validate_config_list(configs)
Validates a list of configurations.
Parameters: configs (list[dict]) – Configuration list
Returns: The valid configurations
Return type: list
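The schema above can be approximated with plain type checks. This sketch swaps in hand-written checks instead of a JSON Schema validator, and applying the `exe_env` default during list validation is an assumption; the documentation does not say where GREASE fills it in.

```python
REQUIRED_STRING_KEYS = ('name', 'job', 'source')


def validate_config(config):
    """Return True if config matches the documented prototype schema."""
    if not isinstance(config, dict):
        return False
    for key in REQUIRED_STRING_KEYS:
        if not isinstance(config.get(key), str):
            return False
    # exe_env is optional; when present it must be a string
    if 'exe_env' in config and not isinstance(config['exe_env'], str):
        return False
    return isinstance(config.get('logic'), dict)


def validate_config_list(configs):
    """Keep only valid configurations, defaulting exe_env to 'general'.

    Returns copies, so the caller's dicts are not mutated.
    """
    valid = []
    for config in configs:
        if validate_config(config):
            valid.append(dict(config, exe_env=config.get('exe_env', 'general')))
    return valid
```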
-
The Base Source
-
class tgt_grease.enterprise.Model.BaseSourceClass
Bases: object
Base class for all sources to implement.
-
_data
List of data to be returned to GREASE
Type: list[dict]
-
deduplication_strength
Level of deduplication strength to use; higher values mean stronger uniqueness
Type: float
-
field_set
If None, all fields found will be deduplicated; otherwise only the listed fields will be
Type: None or list
-
deduplication_expiry
Hours to retain deduplication data
Type: int
-
deduplication_expiry_max
Maximum number of days to retain deduplication data
Type: int
-
get_data()
Returns data from the source.
Returns: List of single-dimension dictionaries for GREASE to parse through other prototypes
Return type: list[dict]
-
mock_data(configuration)
Mock the source for data.
Use this method to read through the configuration provided to you and mock getting data. This will always be called by the scan engine. Ensure you set any data to the `self._data` variable as a list of dictionaries for the engine to schedule for detection.
Parameters: configuration (dict) – Configuration for the sourcing to occur with
Note
This is the method to fill out to get data into GREASE.
Returns: Mock data from the source
Return type: list[dict]
-
parse_source(configuration)
Parse the source for data.
Use this method to read through the configuration provided to you and get data. This will always be called by the scan engine. Ensure you set any data to the `self._data` variable as a list of dictionaries for the engine to schedule for detection.
Parameters: configuration (dict) – Configuration for the sourcing to occur with
Note
This is the method to fill out to get data into GREASE.
Returns: If True, data will be scheduled for ingestion after deduplication; if False, the engine will bail out
Return type: bool
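A sketch of a custom source, assuming only what the page documents: `parse_source` and `mock_data` fill `self._data` and `get_data` returns it. `StaticListSource` and its `rows` configuration key are hypothetical. The fallback class exists only so the sketch runs without GREASE installed; it is not GREASE's implementation.

```python
try:
    from tgt_grease.enterprise.Model import BaseSourceClass
except ImportError:
    # Hypothetical stand-in so the sketch runs without GREASE installed
    class BaseSourceClass(object):
        def __init__(self):
            self._data = []

        def get_data(self):
            return self._data


class StaticListSource(BaseSourceClass):
    """Hypothetical source that emits rows listed in its configuration."""

    def parse_source(self, configuration):
        # 'rows' is an invented configuration key used only for this example
        rows = configuration.get('rows', [])
        # Keep only single-dimension dicts, copying them for safety
        self._data = [dict(row) for row in rows if isinstance(row, dict)]
        # Returning False tells the scan engine to bail out
        return bool(self._data)

    def mock_data(self, configuration):
        # Fixed, invented data standing in for a real fetch
        self._data = [{'host': 'mock-host', 'status': 'ok'}]
        return self._data
```

Returning `bool(self._data)` means an empty fetch stops the engine instead of scheduling nothing; a source that should always continue could return True unconditionally.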
-
The Base Detector
-
class tgt_grease.enterprise.Model.Detector(ioc=None)
Bases: object
Base detection class.
This is the abstract class for detectors to implement.
-
ioc
IoC access
Type: GreaseContainer
-
processObject(source, ruleConfig)
Processes an object and returns valid rule data.
The second element of the tuple returned by this method should be in this form:
{
    '<field>': Object,  # <-- If specified as a variable, return the key->value pair
    ...
}
Parameters: - source (dict) – Source data
- ruleConfig (list[dict]) – Rule configuration data
Returns: First element is a boolean for success; second is a dict of any fields returned as variables
Return type: tuple
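The return contract above, `(success, variables)`, can be illustrated with a plain function that a `Detector` subclass's `processObject` might delegate to. The rule format `{'field': str, 'value': object, 'variable': bool}` is invented for this example and is not GREASE's rule schema.

```python
def exact_match_process_object(source, rule_config):
    """Return (matched, variables) for a hypothetical exact-match rule set.

    Each rule is assumed to look like
    {'field': str, 'value': object, 'variable': bool}.
    """
    variables = {}
    for rule in rule_config:
        field = rule.get('field')
        if field not in source or source[field] != rule.get('value'):
            # Any failed rule means the object does not match
            return False, {}
        if rule.get('variable'):
            # Expose the matched key->value pair to later stages
            variables[field] = source[field]
    return True, variables


print(exact_match_process_object(
    {'host': 'web01', 'status': 'down'},
    [{'field': 'status', 'value': 'down', 'variable': True}],
))  # (True, {'status': 'down'})
```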
-
The DeDuplication Engine
-
class tgt_grease.enterprise.Model.Deduplication(ioc=None)
Bases: object
Responsible for deduplication operations.
Deduplication in GREASE is a multi-step process that ensures both performance and accuracy. The overview of this process is:
- Step 1: Identify an Object Type 1 hash match. A Type 1 Object (T1) is a SHA256 hash of a dictionary in a data list. If the entire object hashes to a match, the object is a 100% duplicate.
- Step 2: Object Type 2 matching. If no Type 1 (T1) match is found, Type 2 Object (T2) deduplication occurs. This introspects each field of the dictionary and maps it against other likely objects of the same type. If a hash match is found (source + field + value as a SHA256), that field is a 100% duplicate. If the aggregate score of all fields, or of the specified subset, is above the provided threshold, the object is a duplicate. This prevents similar objects from passing through when they are most likely updates to an original object that does not need to be computed on. If a field changes on updates that you always need to act on, exclude it by passing a field_set to the Deduplicate function that omits that field.
Object examples:
# Type 1 Object
{
    '_id': ObjectId,         # <-- MongoDB ObjectId
    'type': Int,             # <-- Always Type 1
    'hash': String,          # <-- SHA256 hash of entire object
    'expiry': DateTime,      # <-- Expiration time after which the object is deleted if no duplicates are found
    'max_expiry': DateTime,  # <-- Expiration time at which the object is deleted regardless
    'score': Int,            # <-- Number of times this object has been found
    'source': String         # <-- Source of the object
}
# Type 2 Object
{
    '_id': ObjectId,         # <-- MongoDB ObjectId
    'type': Int,             # <-- Always Type 2
    'source': String,        # <-- Source of data
    'field': String,         # <-- Field in object
    'value': String,         # <-- Value of the object's field
    'hash': String,          # <-- SHA256 of source + field + value
    'expiry': DateTime,      # <-- Expiration time after which the object is deleted if no duplicates are found
    'max_expiry': DateTime,  # <-- Expiration time at which the object is deleted regardless
    'score': Int,            # <-- Number of times this object has been found
    'parentId': ObjectId     # <-- T1 object ID of the parent
}
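The two hash types can be sketched with `hashlib`. For the T1 hash, this sketch swaps in sorted-key JSON as the canonical form rather than GREASE's `make_hashable`; unlike `make_hashable`, JSON keeps list order, so `[1, 2]` and `[2, 1]` hash differently here. The T2 hash follows the documented source + field + value concatenation and assumes field names are strings. Both function names are hypothetical.

```python
import hashlib
import json


def t1_hash(obj):
    """SHA256 of the whole object, using sorted-key JSON as canonical form."""
    canonical = json.dumps(obj, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()


def t2_hashes(source, obj):
    """One SHA256 per field, computed over source + field + value."""
    return {
        field: hashlib.sha256(
            (source + field + str(value)).encode('utf-8')
        ).hexdigest()
        for field, value in obj.items()
    }
```

Because the source is part of each T2 hash, the same field value seen from two different sources is not treated as a duplicate.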
-
ioc
IoC access for deduplication
Type: GreaseContainer
-
Deduplicate(data, source, configuration, threshold, expiry_hours, expiry_max, collection, field_set=None)
Deduplicate data.
This method deduplicates the data so that only unique objects are returned. The collection argument names the collection in which deduplication data is stored.
Parameters: - data (list[dict]) – List of single-dimensional dictionaries to deduplicate
- source (str) – Source of data being deduplicated
- configuration (str) – Configuration name provided
- threshold (float) – Level of duplication allowed in an object (the lower the threshold, the more uniqueness is required)
- expiry_hours (int) – Hours to retain deduplication data
- expiry_max (int) – Maximum days to retain deduplication data
- collection (str) – Deduplication collection to use
- field_set (list, optional) – Fields to deduplicate on
Note
expiry_hours is how many hours objects are persisted if they are not seen again.
Returns: Deduplicated data
Return type: list[dict]
-
static deduplicate_object(ioc, obj, expiry, expiry_max, threshold, source_name, configuration_name, final, collection, data_pointer=None, data_max=None, field_set=None)
Deduplicate object.
This is the method that actually deduplicates an object. If obj is found to be unique, it is appended to the final list.
Parameters: - ioc (GreaseContainer) – IoC for the instance
- obj (dict) – Object to be deduplicated
- expiry (int) – Hours to deduplicate for
- expiry_max (int) – Maximum days to deduplicate for
- threshold (float) – Level of duplication allowed in an object (the lower the threshold, the more uniqueness is required)
- source_name (str) – Source of data being deduplicated
- configuration_name (str) – Configuration being deduplicated for
- final (list) – List to append obj to if unique
- collection (str) – Name of the deduplication collection
- data_pointer (int) – If provided, used in log information relating to the thread (typically set via Deduplicate)
- data_max (int) – If provided, used in log information relating to the thread (typically set via Deduplicate)
- field_set (list) – If provided, only the listed fields are deduplicated on
Returns: Nothing; updates final in place
Return type: None
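The two-step process can be illustrated without MongoDB. This is a simplified sketch, not GREASE's algorithm: it keeps hashes in memory sets, ignores expiry and per-record scores, and scores T2 by the share of fields whose source + field + value hash was already seen (GREASE's `object_field_score` may also weigh partial string matches). An object counts as a duplicate when that share is above the threshold, as Step 2 describes. `deduplicate_in_memory` is a hypothetical name.

```python
import hashlib
import json


def _sha256(text):
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


def deduplicate_in_memory(objects, source, threshold, field_set=None):
    """Simplified T1/T2 deduplication with no MongoDB and no expiry."""
    seen_t1 = set()  # hashes of whole objects
    seen_t2 = set()  # hashes of source + field + value
    unique = []
    for obj in objects:
        t1 = _sha256(json.dumps(obj, sort_keys=True, default=str))
        if t1 in seen_t1:
            continue  # Step 1: exact duplicate of an earlier object
        seen_t1.add(t1)
        fields = field_set if field_set is not None else list(obj)
        fields = [f for f in fields if f in obj]
        hashes = [_sha256(source + f + str(obj[f])) for f in fields]
        # Step 2: share of fields already seen (1.0 = every field matched)
        score = (sum(h in seen_t2 for h in hashes) / len(hashes)) if hashes else 0.0
        seen_t2.update(hashes)
        if score > threshold:
            continue  # Similar enough to earlier objects to count as duplicate
        unique.append(obj)
    return unique
```

Passing `field_set=['msg']` in the sketch ignores changes to every other field, which is how a frequently changing field can be kept out of the score.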
-
static generate_expiry_time(hours)
Generates a UTC timestamp the given number of hours in the future.
Parameters: hours (int) – How many hours in the future to expire on
Returns: Datetime object for hours in the future
Return type: datetime.datetime
-
static generate_hash_from_obj(obj)
Takes an object and generates a SHA256 hash of it.
Parameters: obj (object) – Hashable object to generate a SHA256 hash of
Returns: Object hash
Return type: str
-
static generate_max_expiry_time(days)
Generates a UTC timestamp the given number of days in the future.
Parameters: days (int) – How many days in the future to expire on
Returns: Datetime object for days in the future
Return type: datetime.datetime
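Both expiry helpers amount to adding a `timedelta` to the current UTC time. The hypothetical functions below return timezone-aware datetimes; whether GREASE's own helpers return naive or aware values is not stated on this page.

```python
import datetime


def expiry_in_hours(hours):
    """UTC datetime `hours` in the future (timezone-aware)."""
    return datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(hours=hours)


def expiry_in_days(days):
    """UTC datetime `days` in the future (timezone-aware)."""
    return datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(days=days)
```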
-
static make_hashable(obj)
Takes a dictionary and makes a sorted tuple of strings representing flattened key-value pairs.
Parameters: obj (dict) – A dictionary
Returns: A sorted, flattened tuple of the dictionary's key-value pairs
Return type: tuple<str>
Example
{"a": ["test1", "test2"], "b": [{"test2": 21}, {"test1": 1}, {"test7": 3}], "c": "test"}
becomes…
(('a', ('test1', 'test2')), ('b', ((('test1', 1),), (('test2', 21),), (('test7', 3),))), ('c', 'test'))
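A recursive reimplementation that reproduces the documented example. It is a sketch, not GREASE's code, and `to_sorted_tuple` is a hypothetical name. One limitation: Python 3 cannot sort mixed types, so a list such as `[1, 'a']` raises `TypeError` here.

```python
def to_sorted_tuple(obj):
    """Recursively convert dicts and lists into sorted tuples."""
    if isinstance(obj, dict):
        # Sort key/value pairs so key order does not affect the result
        return tuple(sorted((key, to_sorted_tuple(value)) for key, value in obj.items()))
    if isinstance(obj, (list, tuple, set)):
        # Sort elements so list order does not affect the result
        return tuple(sorted(to_sorted_tuple(item) for item in obj))
    return obj


example = {"a": ["test1", "test2"], "b": [{"test2": 21}, {"test1": 1}, {"test7": 3}], "c": "test"}
print(to_sorted_tuple(example))
# (('a', ('test1', 'test2')), ('b', ((('test1', 1),), (('test2', 21),), (('test7', 3),))), ('c', 'test'))
```

Because the result is made only of tuples and scalars, it is itself hashable and can be fed to a SHA256 step as a stable representation.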
-
static make_hashable_helper(obj)
Recursively turns iterables into sorted tuples.
-
static object_field_score(collection, ioc, source_name, configuration_name, obj, objectId, expiry, max_expiry, field_set=None)
Returns T2 average uniqueness.
Takes a dictionary and returns the likelihood of that object being unique based on data in the collection.
Parameters: - collection (str) – Deduplication collection name
- ioc (GreaseContainer) – IoC access
- source_name (str) – Source of data to be deduplicated
- configuration_name (str) – Configuration name to be deduplicated
- obj (dict) – Single-dimensional dictionary to be compared against the collection
- objectId (str) – Mongo ObjectId of the T1 hash, used to associate fields with a T1
- expiry (int) – Hours for deduplication to wait before removing a field if not seen again
- max_expiry (int) – Days for deduplication to wait before ensuring the object is deleted
- field_set (list, optional) – List of fields to deduplicate with, if provided; otherwise all keys are used
Returns: Duplication probability
Return type: float
-
static string_match_percentage(constant, new_value)
Returns the percentage likelihood that two strings are identical.
Parameters: - constant (str) – Value to use as base standard
- new_value (str) – Value to compare constant against
Returns: Percentage likelihood of duplicate value
Return type: float
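A string similarity score of this kind can be computed with the standard library's `difflib.SequenceMatcher.ratio()`, which returns a float in [0.0, 1.0]. This page does not state which metric GREASE uses, so treat this as one plausible choice; `string_similarity` is a hypothetical name.

```python
from difflib import SequenceMatcher


def string_similarity(constant, new_value):
    """Similarity of two strings in [0.0, 1.0] via difflib's ratio().

    ratio() is 2 * matched_characters / (len(constant) + len(new_value)).
    """
    return SequenceMatcher(None, constant, new_value).ratio()


print(string_similarity('abcd', 'abce'))  # 0.75
```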
The Scheduling Engine
-
class tgt_grease.enterprise.Model.Scheduling(ioc=None)
Bases: object
Central scheduling class for GREASE.
This class routes data to nodes within GREASE.
-
ioc
IoC access for scheduling
Type: GreaseContainer
-
determineDetectionServer()
Determines the detection server to use.
Finds the detection server available for a new detection job.
Returns: MongoDB ObjectId of the server and its current job count
Return type: tuple
-
determineExecutionServer(role)
Determines the execution server to use.
Finds the execution server available for a new execution job.
Returns: MongoDB ObjectId of the server; if one cannot be found, an empty string
Return type: str
-
determineSchedulingServer()
Determines the scheduling server to use.
Finds the scheduling server available for a new scheduling job.
Returns: MongoDB ObjectId of the server and its current job count
Return type: tuple
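The three `determine*Server` methods share one shape: pick an available node for a role and report its load. The sketch below assumes "available" means the active node with that role and the fewest current jobs; the selection rule and the server record fields (`_id`, `active`, `roles`, `jobs`) are illustrative assumptions, not GREASE's collection schema. `least_loaded_server` is a hypothetical name.

```python
def least_loaded_server(servers, role):
    """Return (server_id, job_count) for the least-loaded active server.

    Each server record is assumed to look like
    {'_id': str, 'active': bool, 'roles': [str], 'jobs': int}.
    """
    candidates = [
        s for s in servers if s.get('active') and role in s.get('roles', [])
    ]
    if not candidates:
        return '', 0  # Mirrors the "empty string when none found" convention
    best = min(candidates, key=lambda s: s.get('jobs', 0))
    return best['_id'], best.get('jobs', 0)
```

Returning the job count alongside the id lets a caller update the chosen node's load after assigning work, without a second lookup.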
-
scheduleDetection(source, configName, data)
Schedule a source parse for detection.
This method takes a list of single-dimension dictionaries and schedules them for detection.
Parameters: - source (str) – Name of the source
- configName (str) – Configuration the data was sourced from
- data (list[dict]) – Data to be scheduled for detection
Returns: Scheduling success
Return type: bool
-
scheduleScheduling(objectId)
Schedule a source for job scheduling.
This method schedules a source for job scheduling.
Parameters: objectId (str) – MongoDB ObjectId to schedule
Returns: True if scheduling was successful
Return type: bool
-
The Scanning Processor
-
class tgt_grease.enterprise.Model.Scan(ioc=None)
Bases: object
Scanning class for the GREASE scanner.
This is the model that utilizes the scanners to parse the configured environments.
-
ioc
IoC for scanning
Type: GreaseContainer
-
conf
Prototype configuration instance
Type: PrototypeConfig
-
impTool
Import utility instance
Type: ImportTool
-
dedup
Deduplication instance to be used
Type: Deduplication
-
Parse(source=None, config=None)
This will read all configurations and attempt to scan the environment.
This is the primary business logic for scanning in GREASE. This method uses configurations to parse the environment and attempts to schedule the resulting data for detection.
Note
If a source is specified, only that source is parsed. If a configuration is set, only that configuration is parsed. If both are provided, the configuration is parsed only if it belongs to the provided source.
Note
If mocking is enabled, deduplication does not occur.
Parameters: - source (str) – If set, only the listed source is parsed
- config (str) – If set, only the specified config is parsed
Returns: True unless an error occurs
Return type: bool
-
static ParseSource(ioc, source, configuration, deduplication, scheduler)
Parses an individual source and attempts to schedule it.
Parameters: - ioc (GreaseContainer) – IoC instance
- source (BaseSourceClass) – Source to parse
- configuration (dict) – Prototype configuration to use
- deduplication (Deduplication) – Dedup engine instance
- scheduler (Scheduling) – Central scheduling instance
Returns: Nothing; meant to be run in a thread
Return type: None
-
generate_config_set(source=None, config=None)
Examines configuration and returns the list of configs to parse.
Note
If a source is specified, only that source is parsed. If a configuration is set, only that configuration is parsed. If both are provided, the configuration is parsed only if it belongs to the provided source.
Parameters: - source (str) – If set, only the listed source is parsed
- config (str) – If set, only the specified config is parsed
Returns: Configurations to parse for data
Return type: list[dict]
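The filtering rules in the note reduce to applying each provided filter in turn, so a configuration passes only if it matches every filter that was given. A sketch over a plain list of configuration dicts, using the `name` and `source` fields from the configuration structure; `select_configs` is a hypothetical name.

```python
def select_configs(configs, source=None, config=None):
    """Filter configurations following the documented source/config rules."""
    selected = []
    for conf in configs:
        if source is not None and conf.get('source') != source:
            continue  # Only the requested source is parsed
        if config is not None and conf.get('name') != config:
            continue  # Only the requested configuration is parsed
        selected.append(conf)
    return selected
```

With both filters set, a configuration that belongs to a different source is dropped, which matches the "parsed only if it belongs to the provided source" rule.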
-
The Detection Processor
-
class tgt_grease.enterprise.Model.Detect(ioc=None)
Bases: object
Detection class for GREASE detect.
This is the model that utilizes the detectors to parse the sources from scan.
-
ioc
IoC for detection
Type: GreaseContainer
-
impTool
Import utility instance
Type: ImportTool
-
conf
Prototype configuration tool
Type: PrototypeConfig
-
scheduler
Prototype scheduling service instance
Type: Scheduling
-
detectSource()
Performs detection on the oldest source from SourceData.
Returns: True if the detection process was successful
Return type: bool
-
detection(source, configuration)
Performs detection on a source with the provided configuration.
Parameters: - source (dict) – Key->value pairs from sourcing to detect upon
- configuration (dict) – Prototype configuration provided from sourcing
Returns: Detection results; first a boolean for success, second a dict of variables for context
Return type: tuple
-
getScheduledSource()
Queries for the oldest source that has been assigned for detection.
Returns: Source awaiting detection
Return type: dict
-
The Scheduling Processor
-
class tgt_grease.enterprise.Model.Scheduler(ioc=None)
Bases: object
Job scheduler model.
This model will attempt to schedule a job for execution.
-
ioc
IoC for scheduling
Type: GreaseContainer
-
impTool
Import utility instance
Type: ImportTool
-
conf
Prototype configuration tool
Type: PrototypeConfig
-
scheduler
Prototype scheduling service instance
Type: Scheduling
-
getDetectedSource()
Gets the oldest successfully detected source.
Returns: Object from MongoDB
Return type: dict
-
schedule(source)
Schedules a source for execution.
Returns: True if scheduling was successful
Return type: bool
-
scheduleExecution()
Schedules the oldest successfully detected source for execution.
Returns: True if scheduling is successful, else False
Return type: bool
-