GREASE Sources

URL Parsing Source

class tgt_grease.enterprise.Sources.UrlParser.URLParser

Bases: tgt_grease.enterprise.Model.BaseSource.BaseSourceClass

Monitor URLs as a source of information

This source provides data from the URLs configured for a GREASE sourcing cluster. A generic url_source configuration looks like this:

{
    'name': 'example_source', # <-- A name
    'job': 'example_job', # <-- Any job you want to run
    'exe_env': 'general', # <-- Selected execution environment; Can be anything!
    'source': 'url_source', # <-- This source
    'url': ['google.com', 'http://bing.com', '8.8.8.8'], # <-- List of URLs to parse
    'hour': 16, # <-- **OPTIONAL** 24hr time hour to poll URLs
    'minute': 30, # <-- **OPTIONAL** Minute to poll URLs
    'logic': {} # <-- Whatever logic your heart desires
}

Note

This configuration is an example

Note

If a URL in the url parameter is not prefixed with http://, the class adds the prefix for you
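
The prefixing rule can be sketched as follows. This is an illustrative stand-in, not the class's actual code: normalize_url is a hypothetical helper, and leaving https:// URLs untouched is an assumption, since the note only mentions http://.

```python
def normalize_url(url):
    """Return url unchanged if it already has a scheme prefix, else prepend http://."""
    # Assumption: https:// counts as "already prefixed"; the note only names http://.
    if url.startswith('http://') or url.startswith('https://'):
        return url
    return 'http://' + url

# The url list from the example configuration above.
urls = ['google.com', 'http://bing.com', '8.8.8.8']
normalized = [normalize_url(u) for u in urls]
```

With this sketch, bare hosts and IP addresses gain the prefix while 'http://bing.com' passes through as-is.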

Note

Without the minute parameter, the engine will poll for the entire hour

Note

Hour and minute parameters are in UTC time

Note

To poll only once an hour, set only the minute field
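
One way to read the hour and minute rules above is as a gate evaluated against the current UTC time. The sketch below is an assumption about the engine's behavior based only on these notes; should_poll is a hypothetical name, not a GREASE function.

```python
from datetime import datetime, timezone

def should_poll(config, now=None):
    """Return True when the optional hour/minute fields match the given UTC time."""
    if now is None:
        now = datetime.now(timezone.utc)
    hour = config.get('hour')
    minute = config.get('minute')
    # A field that is absent places no restriction on polling.
    if hour is not None and int(hour) != now.hour:
        return False
    if minute is not None and int(minute) != now.minute:
        return False
    return True

at_1630 = datetime(2024, 1, 1, 16, 30, tzinfo=timezone.utc)
at_1645 = datetime(2024, 1, 1, 16, 45, tzinfo=timezone.utc)

hour_only = should_poll({'hour': 16}, at_1645)       # any minute of hour 16 matches
minute_only = should_poll({'minute': 30}, at_1630)   # minute 30 of every hour matches
```

Under this reading, a configuration with only hour set matches every minute of that hour, and one with only minute set matches once in each hour.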

mock_data(configuration)

Data from this source is mocked utilizing the GREASE Filesystem

Mock data for this source can be placed in <GREASE_DIR>/etc/*.mock.url.json. This source picks up all of these files and loads them into the returned object. Each file must follow this schema:

{
    'url': String, # <-- URL that would have been loaded
    'status_code': Int, # <-- HTTP Status code
    'headers': String, # <-- HTTP headers as a string
    'body': String # <-- HTTP response body
}
Parameters: configuration (dict) – Configuration Data for source

Note

The configuration argument is not honored here

Returns: Mocked Data
Return type: list[dict]
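
To make the mock file layout concrete, here is a minimal sketch that writes one file matching the schema above and reads back every *.mock.url.json file in a directory. A temporary directory stands in for <GREASE_DIR>/etc, load_mock_files is a hypothetical helper, and the one-JSON-object-per-file layout is an assumption. Note that the file on disk must be valid JSON (double-quoted strings), unlike the Python-style schema listing.

```python
import glob
import json
import os
import tempfile

def load_mock_files(directory, pattern):
    """Parse every file in directory matching pattern and return the objects as a list."""
    records = []
    for path in sorted(glob.glob(os.path.join(directory, pattern))):
        with open(path) as handle:
            records.append(json.load(handle))
    return records

# Sample values are made up for illustration.
sample = {
    'url': 'http://google.com',
    'status_code': 200,
    'headers': 'Content-Type: text/html',
    'body': '<html></html>',
}

with tempfile.TemporaryDirectory() as etc_dir:  # stand-in for <GREASE_DIR>/etc
    with open(os.path.join(etc_dir, 'example.mock.url.json'), 'w') as handle:
        json.dump(sample, handle)
    mocked = load_mock_files(etc_dir, '*.mock.url.json')
```

The result has the documented list[dict] shape, with one dict per mock file.
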
parse_source(configuration)

This makes a GET request to every URL in the list provided by your configuration

Parameters: configuration (dict) – Configuration of the source. See the class documentation above for more info
Returns: If True, data will be scheduled for ingestion after deduplication. If False, the engine will bail out
Return type: bool

ElasticSearch Source

class tgt_grease.enterprise.Sources.ElasticSearch.ElasticSource

Bases: tgt_grease.enterprise.Model.BaseSource.BaseSourceClass

Source data from ElasticSearch

This source queries ElasticSearch for data. A generic elastic_source configuration looks like this:

{
    'name': 'example_source', # <-- A name
    'job': 'example_job', # <-- Any job you want to run
    'exe_env': 'general', # <-- Selected execution environment; Can be anything!
    'source': 'elastic_source', # <-- This source
    'server': 'http://localhost:9200', # <-- String for ES Connection to occur
    'index': 'my_fake_index', # <-- Index to query within ES
    'doc_type': 'myData', # <-- Document type to query for in ES
    'query': {}, # <-- Dict of ElasticSearch Query
    'hour': 16, # <-- **OPTIONAL** 24hr time hour to poll ES
    'minute': 30, # <-- **OPTIONAL** Minute to poll ES
    'logic': {} # <-- Whatever logic your heart desires
}
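
As a concrete illustration of the configuration above, here is a minimal sketch with the query field filled in and a check that the non-optional keys are present. The match_all clause is standard ElasticSearch query DSL, but whether GREASE expects a full search body or only the inner clause in query is not stated here, so treat that shape as an assumption. REQUIRED_KEYS and missing_keys are hypothetical names, not part of GREASE.

```python
# Keys shown without an OPTIONAL marker in the example configuration (assumed required).
REQUIRED_KEYS = ('name', 'job', 'exe_env', 'source', 'server',
                 'index', 'doc_type', 'query', 'logic')

def missing_keys(config, required=REQUIRED_KEYS):
    """Return the required keys that are absent from config, in declaration order."""
    return [key for key in required if key not in config]

elastic_config = {
    'name': 'example_source',
    'job': 'example_job',
    'exe_env': 'general',
    'source': 'elastic_source',
    'server': 'http://localhost:9200',
    'index': 'my_fake_index',
    'doc_type': 'myData',
    'query': {'query': {'match_all': {}}},  # assumed shape: a full search body
    'logic': {},
}

problems = missing_keys(elastic_config)
```

A check like this catches a forgotten field before the source is handed to a sourcing cluster.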

Note

Without the minute parameter, the engine will poll for the entire hour

Note

Hour and minute parameters are in UTC time

Note

To poll only once an hour, set only the minute field

mock_data(configuration)

Data from this source is mocked utilizing the GREASE Filesystem

Mock data for this source can be placed in <GREASE_DIR>/etc/*.mock.es.json. This source picks up all of these files and loads them into the returned object. The data in these files should reflect what you expect ElasticSearch to return.

Parameters: configuration (dict) – Configuration Data for source

Note

The configuration argument is not honored here

Returns: Mocked Data
Return type: list[dict]
parse_source(configuration)

This opens an ElasticSearch connection and runs the query against the configured server

Parameters: configuration (dict) – Configuration of the source. See the class documentation above for more info
Returns: If True, data will be scheduled for ingestion after deduplication. If False, the engine will bail out
Return type: bool

SQL Source

class tgt_grease.enterprise.Sources.SQLSearch.SQLSource

Bases: tgt_grease.enterprise.Model.BaseSource.BaseSourceClass

Source data from a SQL Database

This source queries a SQL server for data. A generic sql_source configuration looks like this:

{
    'name': 'example_source', # <-- A name
    'job': 'example_job', # <-- Any job you want to run
    'exe_env': 'general', # <-- Selected execution environment; Can be anything!
    'source': 'sql_source', # <-- This source
    'type': 'postgresql', # <-- SQL Server Type (Only supports PostgreSQL Currently)
    'dsn': 'SQL_SERVER_CONNECTION', # <-- String representing the Environment variable used to connect with
    'query': 'select count(*) as order_total from orders where oDate::DATE = current_date', # <-- SQL Query to execute on server
    'hour': 16, # <-- **OPTIONAL** 24hr time hour to poll SQL
    'minute': 30, # <-- **OPTIONAL** Minute to poll SQL
    'logic': {} # <-- Whatever logic your heart desires
}
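
Because dsn names an environment variable rather than holding the connection string itself, resolving it takes one lookup. The sketch below shows that indirection under stated assumptions: resolve_dsn is a hypothetical helper, the connection string is a made-up value in libpq keyword/value form, and a plain dict stands in for the real environment so the example has no side effects.

```python
import os

def resolve_dsn(config, environ=os.environ):
    """Look up the connection string in the environment variable named by config['dsn']."""
    variable = config['dsn']
    dsn = environ.get(variable)
    if not dsn:
        raise KeyError('environment variable {0} is not set'.format(variable))
    return dsn

config = {'type': 'postgresql', 'dsn': 'SQL_SERVER_CONNECTION'}

# Stand-in for os.environ; the value is illustrative only.
fake_environ = {'SQL_SERVER_CONNECTION': 'host=localhost dbname=orders user=grease'}
dsn = resolve_dsn(config, fake_environ)
```

Keeping the connection string in the environment means credentials never appear in the source configuration itself.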

Note

This configuration is an example

Note

Currently, only PostgreSQL is supported

Note

Without the minute parameter, the engine will poll for the entire hour

Note

Hour and minute parameters are in UTC time

Note

To poll only once an hour, set only the minute field

mock_data(configuration)

Data from this source is mocked utilizing the GREASE Filesystem

Mock data for this source can be placed in <GREASE_DIR>/etc/*.mock.sql.json. This source picks up all of these files and loads them into the returned object. The data in these files should reflect what you expect the SQL query to return:

{
    'column expected': 'value expected'
    ...
}
Parameters: configuration (dict) – Configuration Data for source

Note

The configuration argument is not honored here

Note

A mock file should represent a single row

Returns: Mocked Data
Return type: list[dict]
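
Since each mock file represents a single row, a multi-row result needs one file per row. The sketch below illustrates that under assumptions: a temporary directory stands in for <GREASE_DIR>/etc, the file names and order_total values are invented, and reading files in sorted name order is a choice made here, not documented behavior.

```python
import glob
import json
import os
import tempfile

# Two rows of a hypothetical result set, one mock file each.
expected_rows = [{'order_total': 12}, {'order_total': 7}]

with tempfile.TemporaryDirectory() as etc_dir:  # stand-in for <GREASE_DIR>/etc
    for index, row in enumerate(expected_rows):
        path = os.path.join(etc_dir, 'row{0}.mock.sql.json'.format(index))
        with open(path, 'w') as handle:
            json.dump(row, handle)

    # Collect one dict per file, mirroring the list[dict] return type.
    rows = []
    for path in sorted(glob.glob(os.path.join(etc_dir, '*.mock.sql.json'))):
        with open(path) as handle:
            rows.append(json.load(handle))
```
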
parse_source(configuration)

This queries the SQL server to find data

Parameters: configuration (dict) – Configuration of the source. See the class documentation above for more info
Returns: If True, data will be scheduled for ingestion after deduplication. If False, the engine will bail out
Return type: bool