Overview

Since we have an Agent process running on each device, we are in a unique position to be able to capture "live" data from that device for analysis. This includes data that may be generated at high frequency, such as networking events (e.g. TCP connections), process events, etc. With this data captured, the Agent can store it, aggregate it over time and query it (e.g. via instructions), and can also report - potentially in real time - any events that may be of immediate interest to a Consumer. This provides some interesting options:

  • From a security point of view, we could provide instant notification of a connection to a specific IP address (e.g. malware calling home)
  • From a Software Asset Management (SAM) viewpoint, we could provide visibility of application usage over time
  • From a systems management point of view, we could use some intelligent trending to work out what has changed in the environment over time

Applies to

This feature applies to Tachyon v3 and above.

Platform Support

Historic data capture is available on Windows Agents, and slightly less extensive support (see below) is available for Mac, Linux and Solaris.

Data capture on Android is presently not available. 

Background

Prior to v3, the Agent supported a simple event-based mechanism, whereby an instruction could be issued that contained a "Subscribe" method call on the Agent, telling the Agent to start generating events based on a pre-defined event source. An example of this is subscribing to particular entries generated from the Windows Event Log. The Agent responds to the initial instruction with an "EventSubscribed" status code, and then later sends event data back (in the form of responses) during the instruction's lifetime. The event subscription is terminated either when the instruction expires, or if it was cancelled explicitly.

Although this mechanism offers some degree of flexibility in terms of what is captured, it is inherently limited in that it follows the standard "instruction"-based approach, where an instruction has a fixed lifetime.

This limitation is most evident when a Tachyon Consumer wishes to subscribe for event changes indefinitely (e.g. Operational Health wishing to know whenever a service starts/stops, or Security wishing to know whenever software is installed or uninstalled). The only strategy to deal with this was to renew the event subscription periodically (e.g. every 24 hours, cancelling the last instruction and creating a new one). This puts a burden on the Consumer in that it has to manage this renewal, and it is also lossy because there is potentially a window between expiry and renewal in which events would not be captured.

The Historic Data Collection feature, while not mirroring the event-based functionality exactly, offers an improved mechanism of capturing events on the Agent. The key difference is that the Agent captures the data implicitly (i.e. it is not necessary to send an instruction to the Agent to tell it to capture data), and the data captured is held Agent-side (i.e. it is not sent back to the server as it occurs).

This offers the following advantages:

  • There is no need to create and continually renew an event subscription for indefinite data capture
  • There is reduced traffic between Agents and server
  • Querying of the data can occur at the Agent, rather than capturing all data and then querying server-side

Note that this approach is not event-driven, in the sense that the Agent will not notify the server whenever a particular event occurs; captured data is held on the Agent until it is queried.

Note that the existing event-based functionality is still supported in the Agent, although its use is not actively encouraged.

Data collected

The initial (v3) implementation of this feature collects a fixed set of data, although each "collector" can be enabled/disabled independently. In the future, this will be extended to allow the capture to be more flexible and potentially policy-based (see "Extensibility" below).

Since a key driver for capturing this data is to be able to evaluate IOC evidence, the Agent will capture as much data as possible to be able to fulfill this requirement.

Some of the data required for IOC evaluation is very difficult to capture (e.g. receiving emails from a particular domain, browser access to a particular URL). In this first cut, we capture the "core" metrics which can be used for detecting IOCs.

What is collected

  • Software installs/uninstalls/presence - the Agent captures whenever software is installed/uninstalled, and also captures which software is present on a device
  • Outbound TCP connections - the Agent captures whenever an outbound TCP connection is made
  • DNS queries - the Agent captures whenever a DNS address is resolved
  • Process execution - the Agent captures whenever a process starts on the device

What is not collected

Everything else! This list is non-exhaustive, but we do not presently capture:

  • Registry/file-system changes
  • User logon/logoff events
  • Inbound network connections or listening ports
  • UDP capture (in- or outbound)
  • Application-level network data, e.g. URL access, email access

Future versions of the Tachyon Agent will capture additional data.

Implementation

The implementation of this feature has four significant components:

  • Capturing the data - i.e. working out when a particular event has happened and gathering the associated data
  • Storing the data - writing the data to an (encrypted) persistent store on the Agent, and deleting it when necessary
  • Aggregating the data - summarizing the data in a sensible way to reduce the storage requirement on the Agent
  • Querying the data - providing a mechanism for the Agent Language to be able to access the stored data

Each of these is described in more detail below.

Data capture overview

  • Data capture is taken care of by a new "Inventory" module on the Agent
    • In hindsight, this was a bit of a bad choice of name - it's not really inventory that we're capturing
  • This module manages a thread which is responsible for gathering data and storing it
  • Since this is just a "regular" Tachyon Agent module, it is optional - i.e. there are no core dependencies on this module within Tachyon itself
  • The Agent has two key mechanisms of knowing when an event occurs that is of interest - event-based and polling-based
    • Event-based relies on a source external to the Agent (normally the operating system) providing a notification to indicate that something has happened
    • Polling-based is where the Agent will periodically check a source of data and work out what has changed by looking at differences in the data returned
  • Event-based notification is preferable for the following reasons:
    • Lower overhead - the Agent does not need to query the system potentially unnecessarily
    • More timely information - data is captured as it occurs
    • Less chance of missed data - if data occurs and then disappears between poll cycles (consider a process which starts and lives for only one second), this data will be missed
  • On Windows, Tachyon uses the Event Tracing for Windows (ETW) framework where available to capture real-time events
  • On non-Windows platforms, although frameworks exist (such as DTrace) for event capture, these have not been integrated into the Agent - instead, the Agent will use a polling-based mechanism

The following matrix describes the techniques used to capture data across different operating systems:

Data / Platform            Windows                                                Mac       Linux               Solaris
Process executions         ETW on Vista and above; polling on XP                  Polling   Polling             Polling
Outbound TCP connections   ETW on Vista and above; polling on XP                  Polling   Polling             Not yet available
DNS queries                ETW on Windows 8.1 and above; polling on 8 and below   Polling   Not yet available   Not yet available
Software                   Polling on all versions                                Polling   Polling             Polling

Event Tracing for Windows (ETW)

  • Event tracing for Windows is a notoriously ugly API to use, but does provide a wealth of information in a well-defined way
  • Although ETW is technically available on Windows XP (Sysinternals Process Monitor uses it), only legacy providers are supported, which limits its usefulness for Tachyon
  • Within the Tachyon Agent, a "proxy" wrapper exists to dynamically load and use functions from the corresponding DLLs (tdh.dll, advapi32.dll and ws2_32.dll) if those are available (i.e. if the operating system is Vista or later)
  • The check for this functionality is made at the point that the Inventory module initializes; a debug-level message in the log will indicate the availability of ETW if present...
    • Initialized EventTraceProxy - Event Tracing for Windows functionality is available
  • ... and a warning message will be displayed if ETW is not available:
    • Detected Windows XP or below - Event Tracing for Windows functionality will not be available
  • It is not the purpose of this document to describe how ETW works or is used in detail, but a rough outline of the process is:
    • The process (i.e. the Agent) initializes some ETW structures to create a trace session
    • That trace session is then configured to listen to events from one or more providers
    • Part of that provider registration includes masking what level of event data should be captured
      • Some providers may not be available on specific versions of Windows, some may not capture specific events on specific operating systems, and some may use different event IDs depending on the operating system
    • Trace sessions are configured with a specific name/GUID - if a conflicting session already exists, the Agent connects to that session, cancels it, and starts a new one
    • The Agent then starts the trace session - this has to be done in a dedicated thread, which then blocks and waits for events
    • When an event occurs, a call-back occurs within the Agent, during which the Agent retrieves the data associated with that event (this is the trickiest, nastiest bit)
      • Why is it so nasty? Because the data is compressed into a binary blob which has to be decoded, and the fields within that data can be of complex types (arrays, structures, etc) which need to be decoded. There is also the concept of "mapped" values, where a value associated with the data has to be translated using a pre-defined map to yield a meaningful value
    • The Agent then enqueues the data in a specific data structure (one for process starts, one for TCP connections, etc) ready for processing by its aggregation thread
    • When the inventory module is unloaded (normally if the Agent is shutting down), the trace session is stopped
    • It goes without saying that the capturing of the data needs to happen in as efficient a manner as possible, as ETW can raise huge numbers of events in a very short space of time based on system activity
      • Within the ETW call-back, the Agent uses minimal data copying, performs no logging, and avoids thread synchronization wherever possible

Data capture in detail

  • The specifics of each type of data captured are described below.
  • For each item captured, a timestamp field (named "TS" and in Unix-epoch format) stores the date/time at which that event occurred
  • The specifics of data capture (as well as aggregation) can be configured via the Tachyon.Agent.conf file - see the Configuration section below

Capturing process executions

General

  • The Agent captures only process starts; it does not track how long the process has been running, or how much CPU-time (or user/kernel/active time) the process has used

The following fields are captured:

  • ProcessId (integer) - e.g. 178. The operating-system dependent process ID.
  • ExecutableName (string) - e.g. vmconnect.exe. The filename (including extension) of the process executable.
  • ExecutablePath (string) - e.g. \device\harddiskvolume8\windows\system32\vmconnect.exe. The path and filename of the process executable. On Windows, this is the NT-device format version of the path (as a process does not necessarily need to have been launched from a device which has a drive-letter mapping).
  • ExecutableHash (string) - e.g. dae0bb0a7b2041115cfd9b27d73e0391. The MD5 hash of the process executable.
  • CommandLine (string) - e.g. "C:\Windows\system32\VmConnect.exe" "1EUKDEVWKS1231" "TCH-CLI-WXPX86" -G "B2C72520-BBC6-4736-BBBC-5CCF50FE6666" -C "0". The full command-line of the process, including (on Windows) the executable name.
  • UserName (string) - e.g. 1E\james.davies. The name of the user who launched the process (or blank if it is a system-launched process).
  • ParentProcessId (integer) - e.g. 2088. The process ID of the process which spawned this one.
  • ParentExecutableName (string) - e.g. mmc.exe. The filename (including extension) of the executable of the process which spawned this one.

Windows

  • The Agent will use ETW on Vista and above to capture process starts, and will use polling on Windows XP
    • For ETW, the Agent uses the Microsoft-Windows-Kernel-Process provider
    • For polling, the Agent uses the CreateToolhelp32Snapshot Win32 API
  • In addition, the Agent will - on all versions of Windows - do an initial "poll" of processes when it starts capturing; this is to determine which processes were already running when the Agent started capturing data
  • The Agent stores - on all versions of Windows - a setting in persistent storage called "Inventory.ProcessesLastScan" which determines the last time the Agent checked for processes
    • This is so the Agent does not double-count processes which started when it was previously running
    • On XP, this corresponds to the last time the Agent polled; on Vista+, it corresponds to the time at which the ETW process capture logic within the Inventory module was last terminated
  • Since the Agent runs as LOCAL SYSTEM, details of almost every process will be available; however some processes may not be accessible because of permissions
  • Capturing a process command-line on Windows is not an exact science - the only way to do it is to read the memory of the corresponding process, determine where the process's environment block is, and then read what is presently stored in the "command line" part of that structure.
    • In theory, a program is at liberty to change the contents of that memory, so the command-line read may not be the true command-line that was given to the process at launch time
    • In practice, few programs actually do modify this data. An example of one that does is "vcpkgsrv.exe" - the Visual Studio packaging server. If you're curious, it looks like it removes the executable path portion of the command-line, leaving just the raw arguments.
    • Sometimes the executable name part of the command-line is quoted, sometimes it's not - it's arbitrary, depending on how the parent process launched the child; so you may see a mix of command-lines like...
      • "C:\Program Files (x86)\Microsoft Office\root\Office16\OUTLOOK.EXE" 
      • \??\C:\Windows\system32\conhost.exe 0x4
      • C:\Windows\system32\svchost.exe -k UnistackSvcGroup

Linux

  • An initial scan of processes determines which processes were already running when the Agent started capturing data, using the /proc pseudo filesystem as the source of process information. All the attributes listed above are saved.
  • A poll runs periodically, every 30 seconds by default. Processes that were running last time, based on their PIDs, are ignored. (So if an old process has terminated and a new one taken its PID, the mechanism thinks that the old process is still running.)
  • As on Windows, when the Agent shuts down, a persistent storage setting called "Inventory.ProcessesLastScan" records when the Agent last checked for processes, so that processes that were already running the last time the Agent ran are not double-counted.

Mac

  • An initial scan of processes determines which processes were already running when the Agent started capturing data, using the proc_listpids() system call to provide a list of running processes. The Agent then uses proc_pidinfo() and proc_pidpath() as the source of detailed process information, because there is no /proc file-system on Mac. All the attributes listed above are saved, with the exception of CommandLine.
  • A poll runs periodically, every 30 seconds by default. Processes that were running last time, based on their PIDs, are ignored. (So if an old process has terminated and a new one taken its PID, the mechanism thinks that the old process is still running.)
  • As on Windows, when the Agent shuts down, a persistent storage setting called "Inventory.ProcessesLastScan" records when the Agent last checked for processes, so that processes that were already running the last time the Agent ran are not double-counted.

Solaris

  • As for Linux. The /proc pseudo-filesystem is different to Linux's, but the same data is available in different places.

Capturing outbound TCP connections

General

  • The Agent captures TCP connections, not UDP connections - as UDP is inherently connectionless (each packet sent is effectively a new connection)
  • Support for IPv6 is limited; the Agent will capture the connections, but the format used to represent the target IPv6 address may differ slightly depending on the mechanism used (this will be addressed in a future release)

The following fields are captured:

  • IpAddress (string) - e.g. 132.245.77.18 or [2001:4860:4860::8888]. The target remote IP address of the connection, either an IPv4 or IPv6 address. See the notes above about consistency of IPv6 addresses in this version of the Agent.
  • Port (integer) - e.g. 443. The target remote port of the connection.
  • ProcessId (integer) - e.g. 11828. The operating-system specific identifier of the process which instigated the connection.
  • ProcessName (string) - e.g. chrome.exe. The executable filename of the process which instigated the connection. Connections originated from system-oriented processes are captured as "(system)".

Windows

  • The Agent will use ETW on Vista+ to capture TCP connections, and will use polling on Windows XP
    • For ETW, the Agent uses the Microsoft-Windows-Winsock-AFD provider
      • Note that the events generated on Vista (and correspondingly Server 2008) differ from those on Win7 (and correspondingly Server 2008 R2) and above
      • In both cases, the Agent captures initial "connect" requests, not just successful connection establishment
        • This means that an attempt to perform a connection will be captured, even if that connection does not complete (e.g. because of a timeout, or the server-side does not permit the connection)
      • The ETW data provided by the Winsock provider for connection events includes only a kernel mode process ID, not a user-mode process ID
        • To overcome this, the Agent has to also capture the "socket creation" event (which includes both user- and kernel-mode PIDs) and use the data to maintain a cached mapping between the two
        • Unreferenced entries from this cache map are aged out every 100 aggregation cycles (see below)
    • For polling, the Agent delegates to its TcpIp provider to query the active connections; on Windows, this ultimately calls the GetExtendedTcpTable Win32 API
  • In addition, the Agent will - on all versions of Windows - do an initial "poll" of existing connections to capture any connections already established at the point the module starts capturing data
    • Unlike process capturing, there is no stored value for the last time this occurred, as it is assumed that TCP connections are generally transient; also this data (i.e. connection time) is not available to the Agent
    • This means that it is possible for the Agent to double-capture a connection if that connection was established before the Agent stops monitoring, and still exists when the Agent starts monitoring again (e.g. between Agent restarts)
    • In practice, this should happen rarely
  • When the Agent captures connections via a poll, a limitation of the Windows API means that ALL established TCP connections - whether inbound or outbound - are captured; there is no way to distinguish between the two
  • Future versions of the Agent may address this by trying to correlate connections with existing open ports on the local device (i.e. try to work out if a connection is inbound if there is a corresponding listening port + IP address)

Linux

  • The Agent's TcpIp provider is used to get outgoing connections; it supports UDP and TCP (we ignore the former here) for both IPv4 and IPv6. On Linux this uses the /proc/net/tcp and /proc/net/tcp6 (and /proc/net/udp and /proc/net/udp6) pseudo-files.
    • Entries with a zero outgoing IP address are ignored, because they are listening ports.
    • Unfortunately there is no PID associated with each record, but there is an inode for the socket, so the Agent cross-references all /proc/*/fd/* values looking for a symbolic link of the form "socket:[inode]" which matches the socket's inode. This gives the associated process via its PID, and hence the process name.
  • As on Windows, there is an initial poll of existing outgoing TCP connections for both IPv4 and IPv6.
  • As for processes, there is a periodic poll, every 30 seconds by default, to detect changes.
  • Similar restrictions to the Windows implementation apply:
    • There is no stored value for the last time that connections were captured when the Agent last shut down, so there is a risk that a connection will be double-counted by the current iteration of the Agent.
    • The implementation cannot distinguish between active incoming or outgoing TCP connections, so they are all considered to be outgoing.

Mac

  • The Agent captures TCP connections, not UDP connections - as UDP is inherently connectionless (each packet sent is effectively a new connection).
  • The Agent's TcpIp provider is used to get outgoing connections, which supports TCP for both IPv4 and IPv6.
  • As on Windows, there is an initial poll of existing outgoing TCP connections for both IPv4 and IPv6.
  • The code works for all recent versions of Mac OSX, though for versions earlier than Lion (10.7) it is not possible to report the process ID for a socket, since only sysctl support with a MIB of "net.inet.tcp.pcblist" or "net.inet.udp.pcblist" is available. For later versions, the Agent can use a MIB of "net.inet.tcp.pcblist_n" or "net.inet.udp.pcblist_n", which offers a new format for protocol control blocks that includes the process associated with a socket. The Mac OSX documentation is sparse for the requisite system calls, so use was made of Apple's open-source code for the netstat utility.
  • As for processes, there is a periodic poll, every 30 seconds by default, to detect changes.
  • Similar restrictions to the Windows implementation apply:
    • There is no stored value for the last time that connections were captured when the Agent last shut down, so there is a risk that a connection will be double-counted by the current iteration of the Agent.
    • The implementation cannot currently distinguish between active incoming and outgoing TCP connections, so they are all considered to be outgoing. To determine whether a TCP connection is incoming, the Agent would need to determine whether there is a listening socket on that port.

Solaris

  • Not yet supported. This will probably involve the /dev/arp pseudo-filesystem, which has a horrific (and largely undocumented) API. This might be useful for DNS queries too.

Capturing DNS queries

General

  • The Agent attempts to capture DNS queries at the point that they are made, although on non-Windows platforms (and pre-Windows 8.1 - see below) this is not presently possible, and instead the local DNS cache is queried through polling
  • When the Agent captures DNS queries, it captures the query, not the result of that query (i.e. the Agent will capture a request to resolve an FQDN which may ultimately not be resolvable)

The following fields are captured:

  • Fqdn (string) - e.g. client-office365-tas.msedge.net. The FQDN which is being resolved.

Windows

  • The Agent will use ETW on Windows 8.1 and above to capture DNS queries, and will use polling on older versions of Windows
  • This is because DNS client resolution events were not available in the DNS ETW framework until 8.1 (see https://technet.microsoft.com/en-us/library/dn305896(v=ws.11).aspx)
    • For ETW, the Agent uses the Microsoft-Windows-DNS-Client provider
    • For polling, the Agent delegates to its TcpIp provider to query the DNS cache - this ultimately calls the DnsGetCacheDataTable Win32 API
  • When using ETW, the Agent will not perform an initial poll to establish the contents of the DNS cache
  • When polling, the Agent will capture all unique FQDNs available in the cache; new entries that appear in the cache are deemed to correspond to resolutions
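
As a conceptual sketch, the "new entries" deemed to be resolutions are simply the set difference between consecutive cache snapshots. In SQLite terms it amounts to something like the following (the snapshot table names are hypothetical - the Agent performs this comparison in code, not in SQL):

-- FQDNs present in the current cache snapshot but absent from the
-- previous one are deemed to correspond to new resolutions
SELECT Fqdn FROM CurrentCacheSnapshot
EXCEPT
SELECT Fqdn FROM PreviousCacheSnapshot;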

Linux

  • Not yet supported

Mac

  • The DNS cache is captured on Mac for every release by sending an INFO signal to the mDNSResponder service. This instructs the service to dump its DNS cache contents to /var/log/syslog.
  • The Agent parses the difference before and after the signal, looking for Addr entries (DNS A records for IPv4 address mapping) and AAAA entries (DNS AAAA records for IPv6 address mapping).
  • This approach works for Mac OSX Lion (10.7), Mountain Lion (10.8), Mavericks (10.9), Yosemite (10.10) and El Capitan (10.11). (Briefly, in a beta version of Yosemite, mDNSResponder was deprecated, but it was reinstated for the full release.)
  • On Mac OSX Sierra (10.12) the INFO signal no longer dumps to /var/log/syslog; an alternative approach for dumping the DNS cache on Sierra is currently being investigated.

Solaris

  • Not yet supported. Probably will use the /dev/arp pseudo-filesystem.

Capturing software installation data

General

  • On all platforms, the Agent will poll (via a call to the Software module) the list of installed software, and will use deltas between polls to infer installs and uninstalls
  • The Agent will assume that "new" installations/uninstallations occurred at the point of polling
  • The Agent stores in persistent storage (under the "Inventory.SoftwareInstallations" and "Inventory.SoftwareInstallationsLastScan" keys) a JSON representation of the results of the last scan of software, and the time that this scan occurred
  • If these keys are present, the Agent will, on start-up, attempt to identify installs/uninstalls which occurred while the Agent was not capturing data
    • For example, if Adobe Acrobat was present last time the Agent scanned, but is no longer present, it can infer that the program was uninstalled
    • Since the Agent has no way of knowing when this install/uninstall happened, it will mark the event as having occurred "now"
    • This may be improved in the future for installs - the Agent can generally derive at least the date on which the install happened (but not the time on Windows)
  • Unlike other data captures, the Agent also tracks the "presence" of software on the machine (not just whether it was installed or uninstalled)
    • This is described in more detail in the Data Aggregation section
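
Conceptually, the install/uninstall inference described above is a set difference between the previous and current scans. A minimal SQLite sketch of the idea (the snapshot table names are hypothetical - the Agent actually performs this comparison in code, against its stored JSON snapshot):

-- Products in the previous scan but missing from the current one are
-- inferred to be uninstalls (reverse the join direction for installs)
SELECT p.Product, p.Publisher, p.Version, p.Architecture, 1 AS IsUninstall
FROM   PreviousScan p
LEFT JOIN CurrentScan c
       ON  c.Product = p.Product
       AND c.Version = p.Version
       AND c.Architecture = p.Architecture
WHERE  c.Product IS NULL;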

The following fields are captured:

  • Product (string) - e.g. Google Chrome. The title of the software that was installed/uninstalled.
  • Publisher (string) - e.g. Google Inc. The publisher of the software that was installed/uninstalled.
  • Version (string) - e.g. 55.0.2883.87. The version of the software that was installed/uninstalled.
  • Architecture (string) - e.g. x64. The platform architecture of the software.
  • IsUninstall (integer) - 0 = install, 1 = uninstall.

Windows

  • Software installations are read from the registry from HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall and HKLM\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall
  • Per-user installations are not yet supported

Linux

Note

Linux does not distinguish between O/S packages (even the kernel) and application packages. They are all software.
  • The mechanism is like Windows in that it uses polling and the "Inventory.SoftwareInstallations" and "Inventory.SoftwareInstallationsLastScan" persistent storage keys. However, there are two variants of Linux packages: RPM and Debian-style, the latter also being used by Ubuntu. The data is accessed, as it is for all operating systems, using the Software module's installation enumerator.
  • Polls are run every 120 seconds by default.
  • For RPM-based Linuxes, we enumerate through the RPM DB using the RPM API, getting the package name, version, release, vendor, and installation time.
  • For Debian-style packages, we read through the text file /var/lib/dpkg/status. Only packages that have a Status of "installed" are recorded. There is no recorded package installation time, so that is taken from the modification time of the corresponding /var/lib/dpkg/info/package_name.list file.

Mac

  • The mechanism is like Windows in that it uses polling and "Inventory.SoftwareInstallations" & "Inventory.SoftwareInstallationsLastScan" in persistent storage. The data is accessed, as it is for all operating systems, using the Software module's installation enumerator.
  • Polls are run every 120 seconds by default.
  • The Mac Agent enumerates through installed packages using the pkgutil utility, getting the package name, version, release, vendor, and installation time.
  • The publisher is determined by reversing Product names to produce a URL. So a product com.apple.pkg.CoreADI will produce a Publisher name of apple.com and similarly a product of uk.co.bewhere.chrome.video.osx produces a Publisher of bewhere.co.uk

Solaris

  • The infrastructure is similar to the implementation for Linux (and hence Windows), but works by looking for all files matching the pattern "/var/pkg/publisher/*/pkg/*/*". Each file path itself gives the publisher, package name and version number. The last modification time of such a file is used as the package installation time.
  • There is, at the time of writing, a bug (66121) whereby packages that are "known" but not actually installed are treated as if they were installed.

Data storage

  • Captured data is stored initially in memory, and then written to disk during an aggregation cycle (see section below)
    • When the Agent shuts down, any pending data is written to disk
    • If the Agent process is terminated forcefully, any data in memory will be lost
  • The Agent uses a disk-backed, encrypted SQLite database to store captured data
  • Typically this file exists as C:\ProgramData\1E\Tachyon\Agent\DBs\Inventory.dat on Windows
    • This database is created initially when the Agent starts capturing for the first time
  • Each item of data capture (processes, TCP connections, etc) is stored in separate tables
  • Each item of data capture has a "live" table (which each captured event is appended to) and a set of aggregated tables (described fully later)
  • The tables are...

    Item                       Live table      Hourly table      Daily table      Monthly table
    Software installations     Software_Live   Software_Hourly   Software_Daily   Software_Monthly
    Process executions         Process_Live    Process_Hourly    Process_Daily    Process_Monthly
    DNS resolutions            DNS_Live        DNS_Hourly        DNS_Daily        DNS_Monthly
    Outbound TCP connections   TCP_Live        TCP_Hourly        TCP_Daily        TCP_Monthly
  • All tables contain time-stamp fields named "TS", which are stored as UTC Unix epoch numbers
    • When querying SQLite manually, you can use a query like the following to translate the TS field into something readable...
    • SELECT IpAddress, Port, ProcessId, ProcessName, DATETIME(TS, 'unixepoch') AS EventTime FROM TCP_Live
  • The tables comprising the Inventory database are accessible via the Agent Language using a "$" prefix - e.g. SELECT * FROM $TCP_Live (note that the tablenames are case sensitive)
  • If the Agent is unable to write to storage (out of disk space or other file-system problems), the write will fail, but the Agent will continue monitoring in the hope that the situation improves later

Data aggregation

  • While monitoring data, a periodic event known as the "aggregation cycle" fires
  • When this happens, the Agent will...
    • Write anything in memory to the database
    • Summarize data to hourly, daily and monthly tables (see below)
    • Delete data which is older than a certain threshold
  • The default time for aggregation is 60 seconds - in other words, it may take up to 1 minute before data captured is available in the Agent's database

The Agent follows a scheme similar to NightWatchman Enterprise in terms of how data is summarized:

  • For each item of data capture (processes, TCP connections, etc.) the Agent stores "raw" data and also data summarized by hour, by day and by month
  • The summary tables consist of a count of events for a particular time period, plus reduced summary data about those events
    • For example, whereas the "live" TCP connection table might store 23 individual rows for connections that 3 different instances of Chrome have made to a particular server on a given day, the "daily" table will simply store the count of 23 for Chrome for that server for that day
    • The same is true of the hourly and monthly tables
    • These summary tables effectively "count + group by" fields to yield aggregated data

As an example of the aggregation, going from live TCP connection table to the daily TCP connection table:

  • In this example, we are going from "live" (raw) data to daily-summarized data.
  • In doing so, we lose the "ProcessId" column (as process IDs will always differ)
  • We then COUNT by GROUPing BY the remaining data fields (IpAddress, Port, ProcessName) and the Timestamp (TS) field truncated to the day
  • This allows us to go from 5 raw records to 2 daily-summarized records (see the illustration after this list)
    • Obviously this example just contains a very small number of events - in practice, the number of events summarized would be much greater
  • This pattern holds true for the hourly and monthly tables.
  • So a row in the "hourly" table would count the number of TCP connections to (IpAddress + Port) made by (Process) within that hour
  • And a row in the "monthly" table would count the number of TCP connections to (IpAddress + Port) made by (Process) within that month
  • The other data items (Process, DNS queries, Software) all work in a similar way - some columns (whose values change frequently) are discarded, and the remaining columns are COUNTed and GROUPed BY including a truncated variation of the timestamp
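
To make this concrete, here is a hypothetical set of five raw rows in TCP_Live (timestamps shown in readable form; the stored TS values are actually Unix epoch numbers):

TS                    IpAddress       Port   ProcessId   ProcessName
2017-01-27 09:12:01   132.245.77.18   443    1120        chrome.exe
2017-01-27 09:12:04   132.245.77.18   443    1120        chrome.exe
2017-01-27 11:40:52   132.245.77.18   443    7216        chrome.exe
2017-01-27 14:03:10   132.245.77.18   443    9944        chrome.exe
2017-01-27 16:22:45   10.0.8.15       8080   2088        OUTLOOK.EXE

After dropping ProcessId, truncating TS to the day and counting, these collapse to two rows in TCP_Daily:

TS           IpAddress       Port   ProcessName   ConnectionCount
2017-01-27   132.245.77.18   443    chrome.exe    4
2017-01-27   10.0.8.15       8080   OUTLOOK.EXE   1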

To store the truncated time stamp in the summary tables...

  • For "hourly", the time is truncated to the hour and stored in Unix Epoch format - so an event that occurred at 2017-01-27 18:03:54 would be included in the summary for 2017-01-27 18:00:00
  • For "daily", the time is truncated to midnight on that day and stored in Unix Epoch format - so an event that occurred at 2017-01-27 18:03:54 would be included in the summary for 2017-01-27 00:00:00
  • For "monthly", the time is truncated to midnight on the first day of the month and stored in Unix Epoch format - so an event that occurred at 2017-01-27 18:03:54 would be included in the summary for 2017-01-01 00:00:00

The following table shows the fields which are removed and introduced during the summarization process for each type of data captured:

Data                     Field(s) removed                          Field(s) introduced
DNS                      (none)                                    LookupCount
TCP connections          ProcessId                                 ConnectionCount
Process executions       CommandLine, ProcessId, ParentProcessId   ExecutionCount
Software installations   IsUninstall                               InstallCount, UninstallCount

Note that the tables capturing software installations are slightly different to the others...

  • This is because the Agent tracks presence, as well as install/uninstall count, of software.
  • For the summarized software tables, InstallCount stores the number of times the product was installed in that period and UninstallCount stores the number of times it was uninstalled
  • It is possible (and common) for both these fields to be zero - this implies that the software was simply present during this hour/day/month and was neither installed nor uninstalled
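
For example (hypothetical rows), a product that was present throughout the period and reinstalled once on 15 January might appear in Software_Daily as:

TS           Product         Publisher     Version        Architecture   InstallCount   UninstallCount
2017-01-14   Google Chrome   Google Inc.   55.0.2883.87   x64            0              0
2017-01-15   Google Chrome   Google Inc.   55.0.2883.87   x64            1              1
2017-01-16   Google Chrome   Google Inc.   55.0.2883.87   x64            0              0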

In terms of how the data gets into the tables to begin with...

  • When the aggregation cycle fires, for each type of data being captured, the Agent first creates a temporary "load" table (e.g. DNS_Load) containing new records that have been captured since the last cycle (i.e. the records that are just in memory)
  • The Agent then appends (using INSERT INTO) these records to the corresponding "live" table
  • The Agent then merges these records into each of the three summary tables
  • It is important to understand that the summary tables are built from the load table, not the live table
    • In other words, the data for the "monthly" table is NOT fed from the "daily" table (which in turn is NOT fed from the "hourly" table) - they are ALL fed from the "load" table
    • This allows the Agent to be able to capture, say, just 12 hours' worth of hourly-summarized information without affecting its ability to capture daily-summarized information
    • This is different to how NightWatchman Enterprise behaves - NightWatchman feeds Monthly tables from Daily, and Yearly tables from Monthly

The append to the live table is very simple, but the merge into the aggregated tables is more complex:

There is no "MERGE" or "UPSERT" statement in SQLite; instead, we define a unique constraint on the aggregation tables (on all fields apart from the count-based fields), and use SQLite's "INSERT OR REPLACE" statement.
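
For instance, the daily TCP aggregation table might be declared along these lines (hypothetical DDL, reconstructed from the merge SQL below rather than taken from the Agent's source):

CREATE TABLE [Inventory.TCP_Daily]
(
    TS              INTEGER NOT NULL,   -- Unix epoch, truncated to the day
    IpAddress       TEXT    NOT NULL,
    Port            INTEGER NOT NULL,
    ProcessName     TEXT    NOT NULL,
    ConnectionCount INTEGER NOT NULL,
    -- the unique constraint over the non-count fields is what makes
    -- INSERT OR REPLACE behave as a merge
    UNIQUE (TS, IpAddress, Port, ProcessName)
);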

As an example, the SQL to merge into the TCP daily table looks something like this:

Aggregation SQL example
-- Prepare a CTE of 'changes' - records which need to be inserted or replaced in the aggregated table
-- (DATETRUNC is not a SQLite built-in; it represents the Agent's timestamp-truncation step)
WITH CTE_Changes AS
(
     -- COUNT + GROUP BY all the changed records
     SELECT TS, IpAddress, Port, ProcessName, SUM(ConnectionCount) AS ConnectionCount FROM
     (
         -- Include the new records (ones coming from the load table)...
         SELECT DATETRUNC(EventTime, 'day') AS TS, IpAddress, Port, ProcessName, 1 AS ConnectionCount
         FROM  [Inventory.TCP_Load]

         UNION ALL

         -- ... combined with the existing records from the aggregation table (but only where
         -- the existing records overlap the days we're trying to insert)
         SELECT TS, IpAddress, Port, ProcessName, ConnectionCount
         FROM  [Inventory.TCP_Daily]
         WHERE TS IN (SELECT DISTINCT DATETRUNC(EventTime, 'day') FROM [Inventory.TCP_Load])
     ) AS Agg
     GROUP BY TS, IpAddress, Port, ProcessName
)

-- Insert (or replace if the combination of TS + IpAddress + Port + ProcessName already exists)
INSERT OR REPLACE INTO [Inventory.TCP_Daily] (TS, IpAddress, Port, ProcessName, ConnectionCount)
SELECT TS, IpAddress, Port, ProcessName, ConnectionCount FROM CTE_Changes;

The SQL for other periods and data capture sources follows the same pattern.

The last step of the aggregation cycle is to delete old data:

  • Data retention is based on configuration (see below), which specifies how much data is kept in each of the tables.
  • Every N aggregation cycles (configurable per data capture source, default 3), an additional tidy-up step will be performed which deletes old data
  • This involves deleting rows from the hourly table which are older than X hours, from the daily table which are older than X days, etc.
  • This also includes deleting rows from the live table to leave a maximum of X rows (i.e. deletion from the live table is based on record count, not on time)
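
A sketch of what the groom step amounts to in SQL (hypothetical statements, shown for the DNS tables with the default retention values; the Agent's actual statements may differ):

-- Time-based grooming of the aggregated tables
DELETE FROM [Inventory.DNS_Hourly]  WHERE TS < CAST(strftime('%s', 'now', '-24 hours') AS INTEGER);
DELETE FROM [Inventory.DNS_Daily]   WHERE TS < CAST(strftime('%s', 'now', '-31 days') AS INTEGER);
DELETE FROM [Inventory.DNS_Monthly] WHERE TS < CAST(strftime('%s', 'now', '-12 months') AS INTEGER);

-- Count-based grooming of the live table: keep only the newest 5000 rows
DELETE FROM [Inventory.DNS_Live]
WHERE rowid NOT IN (SELECT rowid FROM [Inventory.DNS_Live] ORDER BY TS DESC LIMIT 5000);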

Data querying

  • The Agent Language has been extended to allow the Inventory tables to be queried directly from the language
  • Use the $ prefix combined with the table name (e.g. @myData = SELECT * FROM $Process_Live WHERE ExecutableName = "chrome.exe")
  • Note that because the Inventory tables are not created with COLLATE NOCASE, they need to be queried in a case-sensitive fashion
    • So the example above won't match "Chrome.exe" or "chrome.EXE" - to work around this, you can use WHERE ExecutableName LIKE "chrome.exe"
  • For more detail, see the relevant section in the Agent Language documentation
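
For example, either of the following forms will match regardless of case (a sketch, assuming the Agent Language accepts SQLite's standard COLLATE clause as well as LIKE):

@byLike    = SELECT * FROM $Process_Live WHERE ExecutableName LIKE "chrome.exe"
@byCollate = SELECT * FROM $Process_Live WHERE ExecutableName = "chrome.exe" COLLATE NOCASE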

Configuration

The following settings are specific to Tachyon’s historic inventory capture functionality.

Module.Inventory.Enabled

Module.Inventory.Enabled=true (default)

Determines whether historic inventory capture is enabled or disabled. 

Must be set to true or false.

Module.Inventory.NoEventTracing

Module.Inventory.NoEventTracing=false (default)

Controls whether the Agent will, on Windows, use a polling-based mechanism to capture data instead of an event-based one.

The Agent will use Windows operating system events to capture data, if the host operating system supports it. If this setting is true, the Agent will instead use a polling-based approach to capture data.

This setting is presently ignored on other operating systems.

Module.Inventory.AggregationIntervalSeconds

Module.Inventory.AggregationIntervalSeconds=60 (default)

Determines the frequency, in seconds, at which the Agent will write captured data to disk and summarize it.

More frequent aggregations will make captured data available for querying sooner, at the cost of more processing on the device.

Range is 30 to 600 (10 minutes)

The following settings are applicable to each of the data capture sources available.

The Agent supports the following capture sources:

  • DNS – captures DNS lookup requests
  • Process – captures process executions
  • Software – captures software installations and uninstallations
  • TCP – captures outbound TCP connections

The section below uses the DNS capture source as an example, however the same settings are applicable to each available capture source.
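
For example, a Tachyon.Agent.conf fragment that disables TCP capture and shortens DNS retention might look like this (illustrative values only, assuming the TCP source follows the same Module.Inventory.<Source>.* naming as the DNS example; any setting not specified keeps its default):

Module.Inventory.Enabled=true
Module.Inventory.AggregationIntervalSeconds=60
Module.Inventory.Tcp.Enabled=false
Module.Inventory.Dns.Enabled=true
Module.Inventory.Dns.HourlyRetention=12
Module.Inventory.Dns.DailyRetention=7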

Module.Inventory.Dns.Enabled

Module.Inventory.Dns.Enabled=true (default)

Controls whether this capture source is active (true) and will capture data. To disable capture of this data, use false.

Note that disabling the historic inventory capture feature using the Module.Inventory.Enabled setting will take precedence over this setting.

Module.Inventory.Dns.BufferSize

Module.Inventory.Dns.BufferSize=1000 (default)

Determines the maximum number of capture entries held in memory during an aggregation period.

The Agent will store data in memory prior to writing it to disk (as determined by the Module.Inventory.AggregationIntervalSeconds setting described above). This setting controls the size of the buffer available for this data.

If this buffer is exceeded, older entries will be discarded to make room for newer ones - for example, based on the default values, if more than 1000 DNS lookups occur within a 60-second aggregation period.

A higher value allows the Agent to capture higher volumes of events at the cost of additional memory use.

Range is 100 to 10000.

Module.Inventory.Dns.PollIntervalSeconds

Module.Inventory.Dns.PollIntervalSeconds=30 (default – see below)

Determines the frequency, in seconds, at which the capture source will poll for data.

The default value for this setting differs per capture source. The default for the Software capture source is 120 seconds (2 minutes); the default for all other capture sources is 30 seconds.

A lower value (more frequent polls) is likely to capture more data at the cost of additional processing overhead on the device.

Range is 5 to 600 (10 minutes).

This setting is ignored if the Agent is using an event-based mechanism to capture data.

Module.Inventory.Dns.AggregationsBeforeGroom

Module.Inventory.Dns.AggregationsBeforeGroom=3 (default)

Determines the number of aggregation cycles that should occur before old data (see Retention settings below) is removed from the Agent’s disk-based store. 

The Agent will store captured data for a limited time before removing it. This setting determines how frequently the “clean-up” operation will be performed. The clean-up operation happens every N aggregation cycles.

A lower value (more frequent deletion) will remove old data more quickly at the cost of additional processing overhead on the device.

Range is 1 to 50.

Module.Inventory.Dns.LiveRetention

Module.Inventory.Dns.LiveRetention=5000 (default)

Determines the maximum number of capture entries that will be stored in the Agent’s “live” disk-based storage.

The Agent stores detailed (non-aggregated) capture entries on disk, as well as aggregated capture entries per hour, day and month (see below). This setting determines the limit of the detailed entries. When the limit is reached, older entries are deleted to make room for newer ones.

A higher value allows storage of a longer period of detailed entries at the cost of additional disk space on the device. Storing more data will also cause queries on that data to take longer. 

Range is 100 to 50000.

Module.Inventory.Dns.HourlyRetention

Module.Inventory.Dns.DailyRetention

Module.Inventory.Dns.MonthlyRetention

 

Module.Inventory.Dns.HourlyRetention=24 (default)

Module.Inventory.Dns.DailyRetention=31 (default)

Module.Inventory.Dns.MonthlyRetention=12 (default)

Determines the maximum number of hours/days/months for which aggregated data will be kept in the Agent’s disk-based storage.

The Agent will discard data from its hourly/daily/monthly store to make room for newer data.

A higher value allows a longer history of data to be kept at the cost of additional disk space on the device. Storing more data will also cause queries on that data to take longer.

Note that these settings are independent of one another – for example, it is not necessary to specify an “hourly” value of 24 or greater to be able to capture “daily” values.

A value of zero means “disable data aggregation at this resolution”. Again, since the settings are independent, it is valid to disable hourly data aggregation but keep daily and monthly aggregation active.

Range is 0 (disabled) to 100.

Extensibility

  • The data collected by the Agent is presently fixed
  • In the future, we may wish to consider making this policy-based - however, we could only do this if there is a generalized mechanism available for capturing the data
  • For example, if each capture could be expressed as an ETW subscription, that would be workable, but since data generally requires different APIs to be called, making this generalized (and data-driven) is very difficult

Frequently Asked Questions

Can the Agent capture processes/TCP connections/DNS queries when it's not running?
In short, no. However, the Agent can capture items which still have a remnant available when it starts up (e.g. if a process was started before the Agent but is still running when the Agent starts, it can capture it; likewise if a TCP connection which was established before the Agent started still exists).

I just tried to ping an address - what would I expect to see captured?
You should see the "ping" process being captured and a DNS resolution on the address captured. The Agent won't capture the network traffic, because ping uses UDP and not TCP.

I just ran "dir c:\sausages\*.*" from the command prompt, but it's not captured as a process - why?
"Dir", like "type" and "echo" and lots of other DOS commands, isn't an executable in its own right - it's just a command interpreted by cmd.exe. When you run then, cmd.exe isn't launch a new process.

I've set the retention period for (some table) to 3 hours, but when I query I can sometimes see 4 hours' of information - why?
Remember that the deletion of old data doesn't happen exactly on the hour; it happens every N aggregation cycles (normally 3). The old data will eventually get tidied up, but it might take a few minutes.

What's the impact of capturing all this data?
If the Agent can use ETW, then it's very low impact. ETW itself buffers events before it delivers them to subscribers, so the Agent isn't "holding up" any of the operations that it captures. ETW is very light-weight. Polling for data isn't ideal, but is also designed to be low overhead. In practice, this means you're unlikely to notice any impact on the device (at most, possibly 1-2% CPU usage at the point that aggregation occurs).

What happens if the SQLite database gets deleted?
Then you lose the data. There's only so much that the Agent can protect against.

I don't get this whole aggregation thing - is there a table created for each month/day/hour?
No - the "hourly" table stores per-hour data for up to (normally) 24 hours. There will be a bunch of rows for 6pm, a bunch for 7pm, etc etc.

Can the Agent capture whenever I hit a URL? Or if I am downloading a PNG?
Technically, yes - at least with ETW, which actually allows you to capture on-the-wire traffic. In practice, absolutely not - the Agent would have to perform packet inspection on all network data, which would be extremely costly. The Tachyon Agent is not a firewall appliance, nor is it an anti-virus product.

How big is the database going to get?
It depends on how many events are generated, and how long you are retaining the data. With the default retention values, the database will typically be between ~25MB and ~100MB depending on how much data is generated. It would be rare for the database to exceed 100MB. You can follow the instructions on Pre-populating the Agent Historic Database to build a database with large volumes of data for testing.

How can I decrypt the database to see what's in it?
See the "Diagnosing DB Problems" section in SQLite and SQLite Encryption Extension - SEE for the details. A DB is not decryptable (without effort) by customers, only 1E.

Why is it encrypted in the first place?
Otherwise user A on a device would be able to see (potentially) what user B is doing.

So modules have methods which can be used in an instruction - does the Inventory module have any methods, or does it just capture stuff?
Good question. There are not presently any methods exposed by the Inventory module, although we may later add methods to retrieve live statistics on data capture.

When I run Windows process X, the command-line has got \?\ and all sorts of funny characters in it - why is that?
It's down to how the parent process launched that child process - we simply capture the data.

The Agent captures the MD5 hash of the process that's running - isn't that expensive?
Yes, although the Agent uses caching to prevent having to repeatedly calculate the MD5 hashes of processes that run frequently.

Can the Agent capture SHA256 instead of MD5?
Presently, no. The decision to capture MD5 is based on the fact that most IOC feeds tend to use MD5 hashes to identify files. While MD5 isn't good enough to provide a unique identity for a file, it IS good enough to (reasonably) confirm that a particular file has known content.

Why doesn't SELECT * FROM Process_Monthly WHERE ExecutableName = "Chrome.exe" return as many rows as I expect?
Remember that in SQLite comparison of data is case-sensitive. You can either change your query to use LIKE instead of equals, or you can specify COLLATE NOCASE on the operation - see http://stackoverflow.com/questions/973541/how-to-set-sqlite3-to-be-case-insensitive-when-string-comparing 

The Agent has captured DNS queries to www.facebook.com and www.sexyteens.co.uk and all this other stuff that I swear I didn't visit ('onest!) - what's going on?
Lots of your favourite web pages probably make sneaky requests in the background to some of these sites for advertising, statistic gathering, etc. Interesting, huh? 

Future data collection ideas

  • User logons (some research + work already done on this for Windows using ETW)
  • Inbound TCP connections + open ports
  • Registry changes
  • File change events
  • Local group membership changes
  • File association changes