Enterprise Models

The Prototype Configuration Model

class tgt_grease.enterprise.Model.PrototypeConfig(ioc=None)

Bases: object

Responsible for Scanning/Detection/Scheduling configuration

Structure of Configuration:

{
    'configuration': {
        'pkg': [], # <-- Loaded from pkg_resources.resource_filename('tgt_grease.enterprise.Model', 'config/')
        'fs': [], # <-- Loaded from `<GREASE_DIR>/etc/*.config.json`
        'mongo': [] # <-- Loaded from the Configuration Mongo Collection
    },
    'raw': [], # <-- All loaded configurations
    'sources': [], # <-- list of sources found in configurations
    'source': {}, # <-- keys are source names; each value is the list of configs for that source
    'names': [], # <-- names of all configs, to allow dialing
    'name': {} # <-- all configs keyed by their name, to allow dialing
}
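
As an illustration of how the structure above fits together, the following sketch indexes lists of configuration dictionaries by source and by name. The helper `build_structure`, its arguments, and the sample values such as 'nagios' are hypothetical and are not part of tgt_grease; the handling of repeated names is an assumption.

```python
def build_structure(pkg=None, fs=None, mongo=None):
    """Assemble the documented in-memory layout from three lists of configs.

    Hypothetical helper for illustration only; not part of tgt_grease.
    """
    pkg, fs, mongo = pkg or [], fs or [], mongo or []
    structure = {
        'configuration': {'pkg': pkg, 'fs': fs, 'mongo': mongo},
        'raw': pkg + fs + mongo,  # all loaded configurations
        'sources': [],            # distinct source names
        'source': {},             # source name -> list of configs
        'names': [],              # all config names
        'name': {},               # config name -> config
    }
    for conf in structure['raw']:
        src, name = conf['source'], conf['name']
        if src not in structure['source']:
            structure['sources'].append(src)
            structure['source'][src] = []
        structure['source'][src].append(conf)
        if name not in structure['name']:
            structure['names'].append(name)
        # Assumption: a later config with the same name replaces an earlier one.
        structure['name'][name] = conf
    return structure


example = build_structure(fs=[
    {'name': 'disk_check', 'job': 'cleanup', 'source': 'nagios', 'logic': {}},
    {'name': 'cpu_check', 'job': 'restart', 'source': 'nagios', 'logic': {}},
])
```

Here `example['source']['nagios']` holds both configurations, while `example['name']['cpu_check']` returns a single one.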

Structure of a configuration file:

{
    "name": String,
    "job": String,
    "exe_env": String, # <-- Defaults to 'general' if not provided
    "source": String,
    "logic": {
        # I need to be the logical blocks for Detection
    }
}
ioc

IOC access

Type:GreaseContainer
getConfiguration()

Returns the Configuration Object loaded into memory

Returns:Configuration object
Return type:dict
get_config(name)

Get Configuration by name

Parameters:name (str) – Configuration name to get
Returns:Configuration if found else empty dict
Return type:dict
get_names()

Returns the list of names of configs

Returns:List of config names
Return type:list
get_source(name)

Get all configurations for a source by name

Parameters:name (str) – Source name to get
Returns:Configurations if found else empty list
Return type:list[dict]
get_sources()

Returns the list of sources to be scanned

Returns:List of sources
Return type:list
load(reloadConf=False, ConfigurationList=None)

[Re]loads configuration data about the current execution node

Configuration data loads from three places in GREASE. The first is internal to the package, for cases where one manually adds files into the package in the current directory following the file pattern. The second follows the same pattern but loads from <GREASE_DIR>/etc/. The third is the configuration collection in MongoDB

Parameters:
  • reloadConf (bool) – If True this will reload the global object. False will return the object
  • ConfigurationList (list of dict) – If provided will load the list of dict for config after validation

Note

Providing a configuration automatically reloads the memory structure of prototype configuration

Returns:Current Configuration information
Return type:dict
load_from_fs(directory)

Loads configurations from provided directory

Note

Pattern is *.config.json

Parameters:directory (str) – Directory to load from
Returns:configurations
Return type:list of dict
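
A minimal sketch of this kind of directory load, using only the standard library, might look like the following. The function name `load_configs_from_directory` is hypothetical, and skipping unreadable or malformed files is an assumption rather than documented behavior.

```python
import glob
import json
import os


def load_configs_from_directory(directory):
    """Return every parseable *.config.json document found in directory."""
    configs = []
    pattern = os.path.join(directory, '*.config.json')
    for path in sorted(glob.glob(pattern)):
        try:
            with open(path) as handle:
                loaded = json.load(handle)
        except (OSError, ValueError):
            # Assumption: unreadable or malformed files are skipped, not fatal.
            continue
        if isinstance(loaded, dict):
            configs.append(loaded)
    return configs
```

Files that do not match the pattern, such as `notes.json`, are never opened by this sketch.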
load_from_mongo()

Returns all active configurations from the mongo collection Configuration

Structure of Configuration expected in Mongo:

{
    "name": String,
    "job": String,
    "exe_env": String, # <-- Defaults to 'general' if not provided
    "active": Boolean, # <-- set to true to load configuration
    "type": "prototype_config", # <-- MUST BE THIS VALUE; For it is the config type :)
    "source": String,
    "logic": {
        # I need to be the logical blocks for Detection
    }
}
Returns:Configurations
Return type:list of dict
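
The selection rule described above (active documents whose type is prototype_config) can be sketched in memory as follows. The function `select_active_configs` and the sample documents are hypothetical; the real method queries MongoDB rather than a Python list.

```python
def select_active_configs(documents):
    """Keep documents that are active and typed as prototype configs."""
    return [
        doc for doc in documents
        if doc.get('active') is True and doc.get('type') == 'prototype_config'
    ]


documents = [
    {'name': 'a', 'active': True, 'type': 'prototype_config'},
    {'name': 'b', 'active': False, 'type': 'prototype_config'},
    {'name': 'c', 'active': True, 'type': 'other'},
]
active = select_active_configs(documents)
```

With a MongoDB driver, the same selection would typically be expressed as a filter document such as `{'active': True, 'type': 'prototype_config'}`; whether GREASE uses exactly that filter is an assumption.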
validate_config(config)

Validates a configuration

The default JSON Schema is this:

{
    "name": String,
    "job": String,
    "exe_env": String, # <-- Defaults to 'general' if not provided
    "source": String,
    "logic": {
        # I need to be the logical blocks for Detection
    }
}
Parameters:config (dict) – Configuration to validate
Returns:If it is a valid configuration
Return type:bool
validate_config_list(configs)

Validates a configuration List

Parameters:configs (list[dict]) – Configuration List
Returns:The Valid configurations
Return type:list
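
The two validation methods above could be approximated by the sketch below. The names `is_valid_config` and `valid_configs` are hypothetical, and the specific checks (all four fields required, `logic` being a dict, the default written back into the config) are assumptions drawn from the schema shown, not a description of the library's actual checks.

```python
REQUIRED_FIELDS = {'name': str, 'job': str, 'source': str, 'logic': dict}


def is_valid_config(config):
    """Check the documented fields and fill in the default exe_env."""
    if not isinstance(config, dict):
        return False
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(config.get(field), expected_type):
            return False
    # The schema says exe_env defaults to 'general' when not provided.
    config.setdefault('exe_env', 'general')
    return isinstance(config['exe_env'], str)


def valid_configs(configs):
    """Return only the configurations that pass is_valid_config."""
    return [conf for conf in configs if is_valid_config(conf)]
```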

The Base Source

class tgt_grease.enterprise.Model.BaseSourceClass

Bases: object

Base Class for all sources to implement

_data

List of data to be returned to GREASE

Type:list[dict]
deduplication_strength

Level of deduplication strength to use; higher means stronger uniqueness

Type:float
field_set

If None, all fields found will be deduplicated; otherwise only the fields listed will be

Type:None or list
deduplication_expiry

Hours to retain deduplication data

Type:int
deduplication_expiry_max

Days to deduplicate for maximum

Type:int
get_data()

Returns data from source

Returns:List of single dimension dictionaries for GREASE to parse through other prototypes
Return type:list[dict]
mock_data(configuration)

Mock the source for data

Use this method to read through configuration provided to you, and mock getting data. This will always be called by the scan engine. Ensure you set any data to the `self._data` variable as a list of dictionaries for the engine to schedule for detection

Parameters:configuration (dict) – Configuration for the sourcing to occur with

Note

This is the method to fill out to get data into GREASE.

Returns:mock data from source
Return type:list[dict]
parse_source(configuration)

Parse the source for data

Use this method to read through configuration provided to you, and get data. This will always be called by the scan engine. Ensure you set any data to the `self._data` variable as a list of dictionaries for the engine to schedule for detection

Parameters:configuration (dict) – Configuration for the sourcing to occur with

Note

This is the method to fill out to get data into GREASE.

Returns:If True data will be scheduled for ingestion after deduplication. If False the engine will bail out
Return type:bool
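
To make the contract above concrete, here is a sketch of a source implementation. To stay self-contained it uses `BaseSourceStandIn`, a stand-in that mirrors only the documented `_data` attribute and `get_data()` method; it is not the real BaseSourceClass. The class `StaticHostSource` and the `hosts` configuration key are invented for illustration.

```python
class BaseSourceStandIn(object):
    """Stand-in mirroring the documented attributes; not the real base class."""

    def __init__(self):
        self._data = []

    def get_data(self):
        return self._data


class StaticHostSource(BaseSourceStandIn):
    """Hypothetical source that reads host records from its configuration."""

    def parse_source(self, configuration):
        hosts = configuration.get('hosts', [])  # 'hosts' is an invented key
        # Store single-dimension dictionaries for the engine to schedule.
        self._data = [{'hostname': h, 'status': 'unknown'} for h in hosts]
        # True asks the engine to continue; False makes it bail out.
        return bool(self._data)

    def mock_data(self, configuration):
        # Canned records standing in for a real lookup.
        return [{'hostname': 'mock-host', 'status': 'down'}]
```

A real implementation would subclass the documented base class instead of the stand-in and would fetch its records from an external system.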

The Base Detector

class tgt_grease.enterprise.Model.Detector(ioc=None)

Bases: object

Base Detection Class

This is the abstract class for detectors to implement

ioc

IOC Access

Type:GreaseContainer
processObject(source, ruleConfig)

Processes an object and returns valid rule data

Data returned in the second element of the tuple from this method should be in this form:

{
    '<field>': Object # <-- if specified as a variable then return the key->Value pairs
    ...
}
Parameters:
  • source (dict) – Source Data
  • ruleConfig (list[dict]) – Rule Configuration Data
Returns:

first element boolean for success; second dict for any fields returned as variables

Return type:

tuple
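
A detector following this return convention might look like the sketch below. The class `RegexDetectorSketch` and the rule keys `field`, `pattern`, and `variable` are all hypothetical; the actual rule format used by GREASE detectors is not described in this section, and the real base class is deliberately not imported.

```python
import re


class RegexDetectorSketch(object):
    """Hypothetical detector returning (success, variables)."""

    def processObject(self, source, ruleConfig):
        variables = {}
        for rule in ruleConfig:
            # 'field', 'pattern' and 'variable' are invented rule keys.
            value = source.get(rule['field'])
            if value is None or not re.search(rule['pattern'], str(value)):
                return False, {}
            if rule.get('variable'):
                # Only fields flagged as variables are returned as context.
                variables[rule['field']] = value
        return True, variables
```

Every rule must match for the first element to be True; the second element carries only the fields that a rule marked as a variable.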

The DeDuplication Engine

class tgt_grease.enterprise.Model.Deduplication(ioc=None)

Bases: object

Responsible for Deduplication Operations

Deduplication in GREASE is a multi-step process to ensure performance and accuracy of deduplication. An overview of the process:

  • Step 1: Identify an Object Type 1 Hash Match. A Type 1 Object (T1) is a SHA256 hash of a dictionary in a data list. If we can hash the entire object and find a match then the object is 100% duplicate.
  • Step 2: Object Type 2 Matching. If a Type 1 (T1) match cannot be found, Type 2 Object (T2) deduplication occurs. This introspects each field of the dictionary and maps it against other likely objects of the same type. If a hash match is found (source + field + value as a SHA256) then the field is 100% duplicate. If the aggregate score of all fields, or of the specified subset, is above the provided threshold then the object is a duplicate. This prevents similar objects from passing through when they are most likely updates to an original object that does not need to be computed on. If a field updates in a way that you will always need, then its exclusion will need to be passed into the Deduplicate function.

Object examples:

# Type 1 Object

{
    '_id': ObjectId, # <-- MongoDB ObjectID
    'type': Int, # <-- Always Type 1
    'hash': String, # <-- SHA256 hash of entire object
    'expiry': DateTime, # <-- Expiration time if no objects are found to be duplicate after which object will be deleted
    'max_expiry': DateTime, # <-- Expiration time for object to be deleted when reached
    'score': Int, # <-- Amount of times this object has been found
    'source': String # <-- Source of the object
}
# Type 2 Object
{
    '_id': ObjectId, # <-- MongoDB ObjectID
    'type': Int, # <-- Always Type 2
    'source': String, # <-- Source of data
    'field': String, # <-- Field in Object
    'value': String, # <-- Value of Object's field
    'hash': String, # <-- SHA256 of source + field + value
    'expiry': DateTime, # <-- Expiration time if no objects are found to be duplicate after which object will be deleted
    'max_expiry': DateTime, # <-- Expiration time for object to be deleted when reached
    'score': Int, # <-- Amount of times this object has been found
    'parentId': ObjectId # <-- T1 Object ID from parent
}
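
The two `hash` fields above can be illustrated with the standard library. In this sketch the function names `t1_hash` and `t2_hash` are hypothetical, and using sorted-key JSON as the canonical form of a dictionary is an assumption; the library may serialize objects differently before hashing.

```python
import hashlib
import json


def t1_hash(obj):
    """SHA256 over a canonical rendering of the whole dictionary."""
    # Assumption: sorted-key JSON serves as the canonical form here.
    canonical = json.dumps(obj, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode('utf-8')).hexdigest()


def t2_hash(source, field, value):
    """SHA256 over source + field + value, as described for Type 2 objects."""
    joined = '{0}{1}{2}'.format(source, field, value)
    return hashlib.sha256(joined.encode('utf-8')).hexdigest()
```

Because the keys are sorted before hashing, two dictionaries with the same content but different insertion order produce the same Type 1 hash in this sketch.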
ioc

IoC access for DeDuplication

Type:GreaseContainer
Deduplicate(data, source, configuration, threshold, expiry_hours, expiry_max, collection, field_set=None)

Deduplicate data

This method will deduplicate the data object to allow for only unique objects to be returned. The collection argument names the collection in which deduplication data will be stored

Parameters:
  • data (list[dict]) – list of single dimensional dictionaries to deduplicate
  • source (str) – Source of data being deduplicated
  • configuration (str) – Configuration Name Provided
  • threshold (float) – level of duplication allowed in an object (the lower the threshold the more uniqueness is required)
  • expiry_hours (int) – Hours to retain deduplication data
  • expiry_max (int) – Maximum days to retain deduplication data
  • collection (str) – Deduplication collection to use
  • field_set (list, optional) – Fields to deduplicate on

Note

expiry_hours is specific to how many hours objects will be persisted for if they are not seen again

Returns:Deduplicated data
Return type:list[dict]
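
The two-step flow can be sketched in memory as follows. This is not the library's implementation: the function `deduplicate_in_memory` is hypothetical, Python sets stand in for the deduplication collection, expiry handling is omitted, and the field score is computed as the share of exactly matching fields on an assumed 0.0 to 1.0 scale, which may differ from the scale GREASE uses.

```python
import hashlib
import json


def _sha256(text):
    return hashlib.sha256(text.encode('utf-8')).hexdigest()


def deduplicate_in_memory(data, source, threshold, seen_objects, seen_fields,
                          field_set=None):
    """Return the objects from data that are judged unique.

    seen_objects and seen_fields are sets standing in for the collection.
    """
    unique = []
    for obj in data:
        object_hash = _sha256(json.dumps(obj, sort_keys=True, default=str))
        if object_hash in seen_objects:
            continue  # Step 1: whole-object (T1) match, 100% duplicate
        seen_objects.add(object_hash)
        fields = field_set if field_set is not None else list(obj.keys())
        matches = 0
        for field in fields:
            field_hash = _sha256('{0}{1}{2}'.format(source, field, obj.get(field)))
            if field_hash in seen_fields:
                matches += 1
            seen_fields.add(field_hash)
        # Step 2: share of fields already seen (assumed 0.0-1.0 scale).
        score = matches / float(len(fields)) if fields else 0.0
        if score <= threshold:
            unique.append(obj)
    return unique
```

A lower threshold drops more near-duplicates, which matches the description of the threshold parameter: the lower the threshold, the more uniqueness is required.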
static deduplicate_object(ioc, obj, expiry, expiry_max, threshold, source_name, configuration_name, final, collection, data_pointer=None, data_max=None, field_set=None)

DeDuplicate Object

This is the method to actually deduplicate an object. The obj is appended to the final argument if it is found to be unique.

Parameters:
  • ioc (GreaseContainer) – IoC for the instance
  • obj (dict) – Object to be deduplicated
  • expiry (int) – Hours to deduplicate for
  • expiry_max (int) – Maximum days to deduplicate for
  • threshold (float) – level of duplication allowed in an object (the lower the threshold the more uniqueness is required)
  • source_name (str) – Source of data being deduplicated
  • configuration_name (str) – Configuration being deduplicated for
  • final (list) – List to append obj to if unique
  • collection (str) – Name of deduplication collection
  • data_pointer (int) – If provided will provide log information relating to thread (Typically used via Deduplicate)
  • data_max (int) – If provided will provide log information relating to thread (Typically used via Deduplicate)
  • field_set (list) – If provided will only deduplicate on list of fields provided
Returns:

Nothing returned. Updates final object

Return type:

None

static generate_expiry_time(hours)

Generates UTC Timestamp for hours in the future

Parameters:hours (int) – How many hours in the future to expire on
Returns:Datetime object for hours in the future
Return type:datetime.datetime
static generate_hash_from_obj(obj)

Takes an object and generates a SHA256 Hash of it

Parameters:obj (object) – Hashable object to generate a SHA256 from
Returns:Object Hash
Return type:str
static generate_max_expiry_time(days)

Generates UTC Timestamp for days in the future

Parameters:days (int) – How many days in the future to expire on
Returns:Datetime object for days in the future
Return type:datetime.datetime
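
The two expiry helpers, generate_expiry_time and generate_max_expiry_time, amount to adding an offset to the current UTC time. A sketch under that reading follows; the function names `expiry_time` and `max_expiry_time` are hypothetical, and returning timezone-aware datetimes is an assumption, since the library may return naive UTC values.

```python
import datetime


def expiry_time(hours):
    """UTC datetime the given number of hours from now."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return now + datetime.timedelta(hours=hours)


def max_expiry_time(days):
    """UTC datetime the given number of days from now."""
    now = datetime.datetime.now(datetime.timezone.utc)
    return now + datetime.timedelta(days=days)
```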
static make_hashable(obj)

Takes a dictionary and makes a sorted tuple of strings representing flattened key value pairs

Parameters:obj (dict) – A dictionary

Returns:a sorted flattened tuple of the dictionary’s key value pairs
Return type:tuple<str>

Example

{
    "a": ["test1", "test2"],
    "b": [{"test2": 21}, {"test1": 1}, {"test7": 3}],
    "c": "test"
}

becomes:

(('a', ('test1', 'test2')),
 ('b', ((('test1', 1),), (('test2', 21),), (('test7', 3),))),
 ('c', 'test'))
static make_hashable_helper(obj)

Recursively turns iterables into sorted tuples
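
One way to obtain the nested tuple shown in the make_hashable example is sketched below. It is written to reproduce that example and is not taken from the library source; in particular, it assumes the items inside any one list are mutually comparable, since sorting mixed types raises a TypeError in Python 3.

```python
def make_hashable_helper(obj):
    """Recursively turn dictionaries and iterables into sorted tuples."""
    if isinstance(obj, dict):
        return tuple(sorted(
            (key, make_hashable_helper(value)) for key, value in obj.items()
        ))
    if isinstance(obj, (list, tuple, set)):
        return tuple(sorted(make_hashable_helper(item) for item in obj))
    return obj  # scalars are returned unchanged


def make_hashable(obj):
    """Flatten a dictionary into a sorted, hashable tuple of pairs."""
    return make_hashable_helper(obj)


example = {
    "a": ["test1", "test2"],
    "b": [{"test2": 21}, {"test1": 1}, {"test7": 3}],
    "c": "test",
}
result = make_hashable(example)
```

The resulting tuple is hashable and independent of key or list order, so it can be passed to `hash()` or serialized before hashing with SHA256.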

static object_field_score(collection, ioc, source_name, configuration_name, obj, objectId, expiry, max_expiry, field_set=None)

Returns T2 average uniqueness

Takes a dictionary and returns the likelihood of that object being unique based on data in the collection

Parameters:
  • collection (str) – Deduplication collection name
  • ioc (GreaseContainer) – IoC Access
  • source_name (str) – source of data to be deduplicated
  • configuration_name (str) – configuration name to be deduplicated
  • obj (dict) – Single dimensional dictionary to be compared against collection
  • objectId (str) – T1 Hash Mongo ObjectId to be used to associate fields to a T1
  • expiry (int) – Hours for deduplication to wait before removing a field if not seen again
  • max_expiry (int) – Days for deduplication to wait before ensuring object is deleted
  • field_set (list, optional) – List of fields to deduplicate with if provided. Else will use all keys
Returns:

Duplication Probability

Return type:

float

static string_match_percentage(constant, new_value)

Returns the percentage likelihood two strings are identical

Parameters:
  • constant (str) – Value to use as base standard
  • new_value (str) – Value to compare constant against
Returns:

Percentage likelihood of duplicate value

Return type:

float
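
The section does not say which similarity measure string_match_percentage uses. As one possible illustration, the sketch below uses `difflib.SequenceMatcher` from the standard library; both the choice of measure and the 0 to 100 scale are assumptions, and the function name `match_percentage` is hypothetical.

```python
import difflib


def match_percentage(constant, new_value):
    """Similarity of two strings as a value between 0.0 and 100.0."""
    matcher = difflib.SequenceMatcher(None, str(constant), str(new_value))
    return matcher.ratio() * 100.0
```

Identical strings score 100.0 and strings with no characters in common score 0.0; partially matching strings fall in between.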

The Scheduling Engine

class tgt_grease.enterprise.Model.Scheduling(ioc=None)

Bases: object

Central scheduling class for GREASE

This class routes data to nodes within GREASE

ioc

IoC access for Scheduling

Type:GreaseContainer
determineDetectionServer()

Determines detection server to use

Finds the detection server available for a new detection job

Returns:MongoDB Object ID of server & current job count
Return type:tuple
determineExecutionServer(role)

Determines execution server to use

Finds the execution server available for a new execution job

Returns:MongoDB Object ID of server; if one cannot be found then string will be empty
Return type:str
determineSchedulingServer()

Determines scheduling server to use

Finds the scheduling server available for a new scheduling job

Returns:MongoDB Object ID of server & current job count
Return type:tuple
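
The selection criterion behind the determine* methods is not stated in this section. Assuming, purely for illustration, that the server with the fewest current jobs is preferred, the choice could be sketched as follows. The function `pick_least_loaded` and the dictionary keys `_id` and `jobs` are hypothetical.

```python
def pick_least_loaded(servers):
    """Return (server_id, job_count) for the server with the fewest jobs.

    servers is a list of dicts with invented keys '_id' and 'jobs'.
    Returns ('', 0) when no server is available.
    """
    if not servers:
        return '', 0
    chosen = min(servers, key=lambda server: server['jobs'])
    return chosen['_id'], chosen['jobs']
```

For example, `pick_least_loaded([{'_id': 'a', 'jobs': 4}, {'_id': 'b', 'jobs': 1}])` selects server 'b'.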
scheduleDetection(source, configName, data)

Schedule a Source Parse to detection

This method will take a list of single dimension dictionaries and schedule them for detection

Parameters:
  • source (str) – Name of the source
  • configName (str) – Configuration the data was sourced from
  • data (list[dict]) – Data to be scheduled for detection
Returns:

Scheduling success

Return type:

bool

scheduleScheduling(objectId)

Schedule a source for job scheduling

This method schedules a source for job scheduling

Parameters:objectId (str) – MongoDB ObjectId to schedule
Returns:If scheduling was successful
Return type:bool

The Scanning Processor

class tgt_grease.enterprise.Model.Scan(ioc=None)

Bases: object

Scanning class for GREASE Scanner

This is the model to actually utilize the scanners to parse the configured environments

ioc

IOC for scanning

Type:GreaseContainer
conf

Prototype configuration instance

Type:PrototypeConfig
impTool

Import Utility Instance

Type:ImportTool
dedup

Deduplication instance to be used

Type:Deduplication
Parse(source=None, config=None)

This will read all configurations and attempt to scan the environment

This is the primary business logic for scanning in GREASE. This method will use configurations to parse the environment and attempt to schedule

Note

If a Source is specified then only that source is parsed. If a configuration is set then only that configuration is parsed. If both are provided then the configuration will only be parsed if it is of the source provided

Note

If mocking is enabled: Deduplication will not occur

Parameters:
  • source (str) – If set will only parse for the source listed
  • config (str) – If set will only parse the specified config
Returns:

True unless error

Return type:

bool

static ParseSource(ioc, source, configuration, deduplication, scheduler)

Parses an individual source and attempts to schedule it

Parameters:
Returns:

Meant to be run in a thread

Return type:

None

generate_config_set(source=None, config=None)

Examines configuration and returns list of configs to parse

Note

If a Source is specified then only that source is parsed. If a configuration is set then only that configuration is parsed. If both are provided then the configuration will only be parsed if it is of the source provided

Parameters:
  • source (str) – If set will only parse for the source listed
  • config (str) – If set will only parse the specified config
Returns:

Returns Configurations to Parse for data

Return type:

list[dict]
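
The filtering rules in the note for generate_config_set can be expressed as a short sketch. The function `select_configs` is hypothetical and operates on a plain list of configuration dictionaries rather than on the loaded PrototypeConfig.

```python
def select_configs(all_configs, source=None, config=None):
    """Filter configs by source and/or name, following the documented rules."""
    selected = []
    for conf in all_configs:
        if source is not None and conf.get('source') != source:
            continue  # a source was requested and this config is not for it
        if config is not None and conf.get('name') != config:
            continue  # a config name was requested and this is not it
        selected.append(conf)
    return selected
```

When both arguments are given, a configuration is returned only if its name matches and it belongs to the requested source, so a mismatched pair yields an empty list.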

The Detection Processor

class tgt_grease.enterprise.Model.Detect(ioc=None)

Bases: object

Detection class for GREASE detect

This is the model to actually utilize the detectors to parse the sources from scan

ioc

IOC for scanning

Type:GreaseContainer
impTool

Import Utility Instance

Type:ImportTool
conf

Prototype configuration tool

Type:PrototypeConfig
scheduler

Prototype Scheduling Service Instance

Type:Scheduling
detectSource()

This will perform detection on the oldest source from SourceData

Returns:If detection process was successful
Return type:bool
detection(source, configuration)

Performs detection on a source with the provided configuration

Parameters:
  • source (dict) – Key->Value pairs from sourcing to detect upon
  • configuration (dict) – Prototype configuration provided from sourcing
Returns:

Detection Results; first boolean for success, second dict of variables for context

Return type:

tuple

getScheduledSource()

Queries for oldest source that has been assigned for detection

Returns:source awaiting detection
Return type:dict

The Scheduling Processor

class tgt_grease.enterprise.Model.Scheduler(ioc=None)

Bases: object

Job Scheduler Model

This model will attempt to schedule a job for execution

ioc

IOC for scanning

Type:GreaseContainer
impTool

Import Utility Instance

Type:ImportTool
conf

Prototype configuration tool

Type:PrototypeConfig
scheduler

Prototype Scheduling Service Instance

Type:Scheduling
getDetectedSource()

Gets the oldest successfully detected source

Returns:Object from MongoDB
Return type:dict
schedule(source)

Schedules source for execution

Returns:If scheduling was successful or not
Return type:bool
scheduleExecution()

Schedules the oldest successfully detected source to execution

Returns:True if scheduling is successful else False
Return type:bool