What is Tachyon Agent Historic Data Capture?

On Windows devices, the Tachyon Agent captures events continuously, so significant events are recorded as they happen. Polling, by contrast, relies to some degree on luck: a condition that is brief enough to fall between polls is missed. In this respect Tachyon Agent Historic Data Capture is comparable to the Windows Task Manager or Perfmon. Tachyon writes the captured data to a compressed and encrypted database to keep the impact on device performance and security very low.

The data is captured and stored to a local, encrypted persistent store and then periodically aggregated into ongoing hourly, daily and monthly windows. This means that the data is held securely and the amount of data is minimized while still maintaining its usefulness.

Configuration options for each capture source are described in the public documentation reference for Tachyon Agent configuration properties.

What are the data capture sources?

The table below lists currently supported capture sources, and on which OS they are supported. Data capture on Android is presently not available.

The Agent has two mechanisms for knowing when an event of interest occurs: event-based and polling-based.

  • Event-based relies on a source external to the Agent (normally the operating system) providing a notification to indicate that something has happened
  • Polling-based is where the Agent will periodically check a source of data and work out what has changed by looking at differences in the data returned

The Windows Agent can be configured to use polling instead of ETW for individual capture sources.

When using the polling method, the polling interval is every 30 seconds.
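
The difference between the two mechanisms can be illustrated with a small simulation. This is only a sketch, assuming polls fire at fixed 30-second marks and that a condition is seen only if it is present at the moment of a poll; the process names and lifetimes are hypothetical.

```python
POLL_INTERVAL_SECONDS = 30  # polling interval stated in the text above


def seen_by_polling(start, end, interval=POLL_INTERVAL_SECONDS):
    """Return True if a poll at t = 0, interval, 2*interval, ... falls in [start, end)."""
    # Round start up to the first poll time at or after it.
    first_poll = -(-start // interval) * interval
    return first_poll < end


# Hypothetical process lifetimes as (name, start_second, end_second).
events = [
    ("long_running.exe", 5, 95),   # alive across several polls
    ("brief_tool.exe", 31, 42),    # starts and exits between two polls
    ("lucky_tool.exe", 58, 61),    # brief, but alive at the 60-second poll
]

missed = [name for name, start, end in events if not seen_by_polling(start, end)]
print(missed)
```

An event-based source would report all three starts, whereas the simulated poll only sees a brief process if it happens to be alive at a poll time.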

| Historic data source | Description | Windows | MacOS | Linux | Solaris |
|---|---|---|---|---|---|
| DNS resolutions | The Agent captures whenever a DNS address is resolved. | ETW on Windows 8.1 and above; polling on Windows 8 and below | Polling | Not yet available | Not yet available |
| Process executions | The Agent captures whenever a process starts on the device. | ETW on Windows Vista and above; polling on Windows XP | Polling | Polling | Polling |
| Software installations | The Agent captures which software is present on a device, and when it is installed and uninstalled. | Polling on all versions | Polling | Polling | Polling |
| Outbound TCP connections | The Agent captures whenever an outbound TCP connection is made. | ETW on Windows Vista and above; polling on Windows XP | Polling | Polling | Not yet available |

How do I retrieve the data from the Tachyon Agent devices?

Live and aggregated historic data is available in inventory tables. 

| Historic data source | Live tables | Hourly tables | Daily tables | Monthly tables |
|---|---|---|---|---|
| DNS resolutions | $DNS_Live | $DNS_Hourly | $DNS_Daily | $DNS_Monthly |
| Process executions | $Process_Live | $Process_Hourly | $Process_Daily | $Process_Monthly |
| Software installations | $Software_Live | $Software_Hourly | $Software_Daily | $Software_Monthly |
| Outbound TCP connections | $TCP_Live | $TCP_Hourly | $TCP_Daily | $TCP_Monthly |

Example - querying historic captured data
/* Sum the number of connections made per process today */
SELECT    SUM(ConnectionCount) AS Connections
,         ProcessName
FROM      $TCP_Daily
WHERE     TS = DATETRUNC(STRFTIME("%s", "now"), "day")
GROUP BY  ProcessName;
Example - querying live captured data
SELECT * FROM $Process_Live WHERE ExecutableName = "chrome.exe"

Note that because the inventory tables are not created with COLLATE NOCASE, string comparisons against them are case-sensitive. The example above therefore won't match "Chrome.exe" or "chrome.EXE". To work around this, use LIKE, which is case-insensitive for ASCII characters in SQLite: WHERE ExecutableName LIKE "chrome.exe"
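
The case-sensitivity behaviour in the note above can be reproduced with plain SQLite. The sketch below uses Python's built-in sqlite3 module and a hypothetical stand-in table (Demo_Live with a Name column), not the Agent's own tables or query language; whether an explicit COLLATE NOCASE is accepted by the Agent Language is not confirmed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in table, created without COLLATE NOCASE as described above.
conn.execute("CREATE TABLE Demo_Live (Name TEXT)")
conn.executemany(
    "INSERT INTO Demo_Live VALUES (?)",
    [("chrome.exe",), ("Chrome.exe",), ("chrome.EXE",), ("notepad.exe",)],
)


def count(where_clause):
    """Count rows of Demo_Live matching the given WHERE clause."""
    return conn.execute(
        "SELECT COUNT(*) FROM Demo_Live WHERE " + where_clause
    ).fetchone()[0]


exact = count("Name = 'chrome.exe'")                   # binary, case-sensitive comparison
like = count("Name LIKE 'chrome.exe'")                 # LIKE ignores ASCII case by default
nocase = count("Name = 'chrome.exe' COLLATE NOCASE")   # explicit collation on the comparison
print(exact, like, nocase)
```

Bear in mind that LIKE treats "%" and "_" as wildcards, so a name containing an underscore can match more rows than intended.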

How is the data managed?

The Tachyon Agent automatically aggregates and grooms data in each inventory table. Aggregation intervals and data retention are configurable in the Agent configuration file.  

  • Default aggregation cycle interval is every 60 seconds, therefore it may take up to a minute before an event appears in an aggregated table
  • Default retention for live tables is 5000 entries provided at least 3 aggregation cycles have occurred (older entries are deleted to make room for new entries)
  • Default retention for hourly tables is 24 hours.
  • Default retention for daily tables is 31 days.
  • Default retention for monthly tables is 12 months.

Historic data capture inventory schema

The following table shows the fields which exist only in the Live tables or only in the Aggregated (Hourly, Daily, Monthly) tables.

| Historic data source | Fields that only exist in Live tables | Fields that only exist in Aggregated tables |
|---|---|---|
| DNS resolutions | n/a | LookupCount |
| Process executions | CommandLine, ProcessId, ParentProcessId | ExecutionCount |
| Software installations | IsUninstall | InstallCount, UninstallCount |
| Outbound TCP connections | ProcessId | ConnectionCount |

Timestamps (TS column) are truncated in the aggregated tables.

  • Hourly - time is truncated to the hour and stored in Unix Epoch format - so an event that occurred at 2017-01-27 18:03:54 would be included in the summary for 2017-01-27 18:00:00
  • Daily - time is truncated to midnight on that day and stored in Unix Epoch format - so an event that occurred at 2017-01-27 18:03:54 would be included in the summary for 2017-01-27 00:00:00
  • Monthly - time is truncated to midnight on the first day of the month and stored in Unix Epoch format - so an event that occurred at 2017-01-27 18:03:54 would be included in the summary for 2017-01-01 00:00:00
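
The truncation rules above can be sketched as follows. This is an illustration only: it assumes truncation is performed in UTC, and the function name truncate_ts is hypothetical rather than part of the Agent.

```python
from datetime import datetime, timezone


def truncate_ts(ts, period):
    """Truncate a Unix epoch timestamp to the start of its hour, day or month (UTC assumed)."""
    dt = datetime.fromtimestamp(ts, tz=timezone.utc)
    if period == "hour":
        dt = dt.replace(minute=0, second=0, microsecond=0)
    elif period == "day":
        dt = dt.replace(hour=0, minute=0, second=0, microsecond=0)
    elif period == "month":
        dt = dt.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    else:
        raise ValueError("period must be 'hour', 'day' or 'month'")
    return int(dt.timestamp())


# The event used in the examples above: 2017-01-27 18:03:54.
event = int(datetime(2017, 1, 27, 18, 3, 54, tzinfo=timezone.utc).timestamp())

for period in ("hour", "day", "month"):
    truncated = truncate_ts(event, period)
    print(period, truncated, datetime.fromtimestamp(truncated, tz=timezone.utc))
```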

DNS resolutions

The Agent attempts to capture DNS queries at the point that they are made, although on non-Windows platforms (and pre-Win 8.1 - see below), this is not presently possible and instead the local DNS cache is queried through polling.

When the Agent captures DNS queries, it captures the query, not the result of that query. That is, the Agent will capture a request to resolve an FQDN which may ultimately not be resolvable.

| Field | Datatype | Sample value | Tables | Description |
|---|---|---|---|---|
| Fqdn | string | client-office365-tas.msedge.net | $DNS_Live, $DNS_Hourly, $DNS_Daily, $DNS_Monthly | The FQDN which is being resolved. |
| LookupCount | integer | 1234 | $DNS_Hourly, $DNS_Daily, $DNS_Monthly | Sum of resolutions per Fqdn within the aggregation period. |

  • When using ETW, the Agent will not perform an initial poll to establish the contents of the DNS cache
  • When polling, the Agent will capture all unique FQDNs available in the DNS cache; new entries that appear in the cache are deemed to correspond to resolutions

Process executions

The Agent captures process starts; it does not track how long the process has been running, or how much CPU-time (or user/kernel/active time) the process has used.

| Field | Datatype | Sample value | Tables | Notes |
|---|---|---|---|---|
| ProcessId | integer | 178 | $Process_Live | Operating-system dependent process ID. |
| ExecutableName | string | vmconnect.exe | All | The filename (including extension) of the process executable. |
| ExecutablePath | string | \device\harddiskvolume8\windows\system32\vmconnect.exe | All | The path and filename of the process executable. On Windows, this is the NT-device format version of the path (as a process does not necessarily need to have been launched from a device which has a drive-letter mapping). |
| ExecutableHash | string | dae0bb0a7b2041115cfd9b27d73e0391 | All | The MD5 hash of the process executable. |
| ExecutionCount | integer | 1234 | $Process_Hourly, $Process_Daily, $Process_Monthly | |
| CommandLine | string | "C:\Windows\system32\VmConnect.exe" "1EUKDEVWKS1231" "TCH-CLI-WXPX86" -G "B2C72520-BBC6-4736-BBBC-5CCF50FE6666" -C "0" | $Process_Live | The full command-line of the process, including (on Windows) the executable name. |
| UserName | string | 1E\james.davies | All | The name of the user who launched the process (or blank if it is a system-launched process). |
| ParentProcessId | integer | 2088 | $Process_Live | The process ID of the process which spawned this one. |
| ParentExecutableName | string | mmc.exe | All | The filename (including extension) of the executable of the process which spawned this one. |


Each time the Tachyon Agent starts, it does an initial scan of processes before it starts capturing. To prevent double-counting, a persistent storage setting called "Inventory.ProcessesLastScan" records the last time the Agent checked for processes. This corresponds to the last time the Agent polled or, if ETW is used, the time when the Agent inventory module was last terminated.

On Windows, the Agent runs as LOCAL SYSTEM, so details of almost every process are available; however, some processes may not be accessible because of permissions.

The executable name part of the command-line is sometimes quoted and sometimes not, depending on how the parent process launched the child, so you may see a mix of command-lines like...

    • "C:\Program Files (x86)\Microsoft Office\root\Office16\OUTLOOK.EXE" 
    • \??\C:\Windows\system32\conhost.exe 0x4
    • C:\Windows\system32\svchost.exe -k UnistackSvcGroup
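
When post-processing CommandLine values, the mix of forms shown above has to be handled by the consumer. The following is a minimal sketch of one way to do that, assuming an unquoted executable path contains no spaces (an unquoted path with spaces is genuinely ambiguous); split_executable is a hypothetical helper, not an Agent function.

```python
def split_executable(command_line):
    """Split a Windows-style command line into (executable, arguments)."""
    text = command_line.strip()
    if text.startswith('"'):
        # Quoted form: the executable runs up to the matching closing quote.
        closing = text.find('"', 1)
        if closing == -1:
            exe, args = text[1:], ""
        else:
            exe, args = text[1:closing], text[closing + 1:].strip()
    else:
        # Unquoted form: assume the executable ends at the first space.
        exe, _, args = text.partition(" ")
        args = args.strip()
    # Drop the NT object-namespace prefix if present.
    if exe.startswith("\\??\\"):
        exe = exe[4:]
    return exe, args


samples = [
    r'"C:\Program Files (x86)\Microsoft Office\root\Office16\OUTLOOK.EXE" ',
    r'\??\C:\Windows\system32\conhost.exe 0x4',
    r'C:\Windows\system32\svchost.exe -k UnistackSvcGroup',
]
for sample in samples:
    print(split_executable(sample))
```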

Capturing outbound TCP connections

General

  • The Agent captures TCP connections, not UDP connections - as UDP is inherently connectionless (each packet sent is effectively a new connection)
  • Support for IPv6 is limited; the Agent will capture the connections, but the format used to represent the target IPv6 address may differ slightly depending on the mechanism used (this will be addressed in a future release)

The following fields are captured:

| Field | Datatype | Sample value | Notes |
|---|---|---|---|
| IpAddress | string | 132.245.77.18 or [2001:4860:4860::8888] | The target remote IP address of the connection, either an IPv4 or IPv6 address. See notes above about consistency of IPv6 addresses in this version of the Agent. |
| Port | integer | 443 | The target remote port of the connection |
| ProcessId | integer | 11828 | The operating-system specific identifier of the process which instigated the connection |
| ProcessName | string | chrome.exe | The executable filename of the process which instigated the connection. Connections originated from system-oriented processes are captured as "(system)". |

Windows

  • The Agent will use ETW on Vista+ to capture TCP connections, and will use polling on Windows XP
    • For ETW, the Agent uses the Microsoft-Windows-Winsock-AFD provider
      • Note that the events generated on Vista (and correspondingly Server 2008) differ from those on Win7 (and correspondingly Server 2008 R2) and above
      • In both cases, the Agent captures initial "connect" requests, not just successful connection establishment
        • This means that an attempt to perform a connection will be captured, even if that connection does not complete (e.g. because of a timeout, or the server-side does not permit the connection)
      • The ETW data provided by the Winsock provider for connection events includes only a kernel mode process ID, not a user-mode process ID
        • To overcome this, the Agent has to also capture the "socket creation" event (which includes both user- and kernel-mode PIDs) and use the data to maintain a cached mapping between the two
        • Unreferenced entries from this cache map are aged out every 100 aggregation cycles (see below)
    • For polling, the Agent delegates to its TcpIp provider to query the active connections; on Windows, this ultimately calls the GetExtendedTcpTable Win32 API
  • In addition, the Agent will - on all versions of Windows - do an initial "poll" of existing connections to capture any connections already established at the point the module starts capturing data
    • Unlike process capturing, there is no stored value for the last time this occurred, as it is assumed that TCP connections are generally transient; also this data (i.e. connection time) is not available to the Agent
    • This means that it is possible for the Agent to double-capture a connection if that connection was established before the Agent stops monitoring, and still exists when the Agent starts monitoring again (e.g. between Agent restarts)
    • In practice, this should happen rarely
  • When the Agent captures connections via a poll, a limitation of the Windows API means that ALL established TCP connections - whether inbound or outbound - are captured; there is no way to distinguish between the two
  • Future versions of the Agent may address this by trying to correlate connections with existing open ports on the local device (i.e. try to work out if a connection is inbound if there is a corresponding listening port + IP address)
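
The correlation idea mentioned in the last bullet could look something like the sketch below. This is not what the Agent does today; it is a hypothetical heuristic that treats a connection as probably inbound when its local endpoint matches a listening socket. The Connection type, the function name and the sample addresses are all illustrative assumptions.

```python
from typing import NamedTuple


class Connection(NamedTuple):
    local_ip: str
    local_port: int
    remote_ip: str
    remote_port: int


WILDCARD_ADDRESSES = ("0.0.0.0", "::")  # listeners bound to all interfaces


def is_probably_inbound(conn, listeners):
    """listeners is a set of (ip, port) pairs for sockets in the listening state."""
    if (conn.local_ip, conn.local_port) in listeners:
        return True
    return any((ip, conn.local_port) in listeners for ip in WILDCARD_ADDRESSES)


# Hypothetical data: a listener on port 3389 and two established connections.
listeners = {("0.0.0.0", 3389)}
rdp = Connection("10.0.0.5", 3389, "10.0.0.9", 51000)
web = Connection("10.0.0.5", 49822, "132.245.77.18", 443)

print(is_probably_inbound(rdp, listeners), is_probably_inbound(web, listeners))
```

The heuristic can misclassify a connection if a local process both listens on a port and makes outbound connections from that same port, which is why it is only "probably" inbound.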

Linux

  • The Agent's TcpIp provider is used to get outgoing connections. The provider supports both UDP and TCP over IPv4 and IPv6, but only TCP is used here. On Linux it reads the /proc/net/tcp and /proc/net/tcp6 (and /proc/net/udp and /proc/net/udp6) pseudo-files. Entries with a zero remote IP address are ignored because they are listening ports. These records carry no PID, but they do carry the socket's inode, so the Agent cross-references all /proc/*/fd/* entries, looking for a symbolic link of the form "socket:[inode]" that matches the socket's inode. This yields the owning PID, and hence the process name.
  • As on Windows, there is an initial poll of existing outgoing TCP connections for both IPv4 and IPv6.
  • As for processes, there is a periodic poll, every 30 seconds by default, to detect changes.
  • Similar restrictions to the Windows implementation apply:
    • There is no stored value for the last time that connections were captured when the Agent last shut down, so there is a risk that a connection will be double-counted by the current iteration of the Agent.
    • The implementation cannot distinguish between active incoming or outgoing TCP connections, so they are all considered to be outgoing.
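
The /proc-based lookup described above can be sketched in a few lines. This is an illustration rather than the Agent's implementation: it handles IPv4 only, assumes a little-endian host (which determines the byte order of the hex addresses in /proc/net/tcp), and the function names and sample rows are hypothetical.

```python
import os
import socket
import struct

SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 00000000:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12345 1 0000000000000000 100 0 0 10 0
   1: 0500000A:C29E 124DF584:01BB 01 00000000:00000000 00:00000000 00000000  1000        0 67890 1 0000000000000000 20 4 30 10 -1
"""


def parse_proc_net_tcp(text):
    """Return (remote_ip, remote_port, inode) for rows with a non-zero remote address."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with an index such as "0:"; this also skips the header.
        if len(fields) < 10 or not fields[0].rstrip(":").isdigit():
            continue
        remote_hex, port_hex = fields[2].split(":")
        if int(remote_hex, 16) == 0:
            continue  # zero remote address: a listening socket
        remote_ip = socket.inet_ntoa(struct.pack("<I", int(remote_hex, 16)))
        rows.append((remote_ip, int(port_hex, 16), int(fields[9])))
    return rows


def find_pid_for_inode(inode, proc_root="/proc"):
    """Scan /proc/<pid>/fd/* for a 'socket:[inode]' link and return the owning PID."""
    target = "socket:[%d]" % inode
    for pid in filter(str.isdigit, os.listdir(proc_root)):
        fd_dir = os.path.join(proc_root, pid, "fd")
        try:
            for fd in os.listdir(fd_dir):
                if os.readlink(os.path.join(fd_dir, fd)) == target:
                    return int(pid)
        except OSError:
            continue  # process exited, or its descriptors are not readable
    return None


print(parse_proc_net_tcp(SAMPLE))
```

On a live Linux system the same parser can be fed the contents of /proc/net/tcp, and each inode passed to find_pid_for_inode; reading other users' descriptors generally requires root.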

Mac

  • The Agent captures TCP connections, not UDP connections - as UDP is inherently connectionless (each packet sent is effectively a new connection).
  • The Agent's TcpIp provider is used to get outgoing connections, which supports TCP for both IPv4 and IPv6.
  • As on Windows, there is an initial poll of existing outgoing TCP connections for both IPv4 and IPv6.
  • The code works for all recent versions of Mac OS X. On versions earlier than Mac OS X Lion (10.7), however, it is not possible to report the process ID for a socket, because sysctl only supports a MIB of "net.inet.tcp.pcblist" or "net.inet.udp.pcblist". On later versions the Agent can use a MIB of "net.inet.tcp.pcblist_n" or "net.inet.udp.pcblist_n", which offers a newer format for protocol control blocks that includes the process associated with a socket. The Mac OS X documentation for the requisite system calls is sparse, so the implementation drew on Apple's open-source code for the netstat utility.
  • As for processes, there is a periodic poll, every 30 seconds by default, to detect changes.
  • Similar restrictions to the Windows implementation apply:
    • There is no stored value for the last time that connections were captured when the Agent last shut down, so there is a risk that a connection will be double-counted by the current iteration of the Agent.
    • The implementation cannot currently distinguish between active incoming and outgoing TCP connections, so they are all considered to be outgoing. To determine whether a TCP connection is incoming, the Agent would need to determine whether there was a listening socket on that port.

Solaris

  • Not yet supported. Will probably involve the /dev/arp pseudo filesystem which has a horrific(ally non-documented) API. This might be useful for DNS queries too.

Capturing software installation data

General

  • On all platforms, the Agent will poll (via a call to the Software module) the list of installed software, and will use deltas between polls to infer installs and uninstalls
  • The Agent will assume that "new" installations/uninstallations occurred at the point of polling
  • The Agent stores in persistent storage (under the "Inventory.SoftwareInstallations" and "Inventory.SoftwareInstallationsLastScan" keys) a JSON representation of the results of the last scan of software, and the time that this scan occurred
  • If these keys are present, the Agent will, on start-up, attempt to identify installs/uninstalls which occurred while the Agent was not capturing data
    • For example, if Adobe Acrobat was present last time the Agent scanned, but is no longer present, it can infer that the program was uninstalled
    • Since the Agent has no way of knowing when this install/uninstall happened, it will mark the event as having occurred "now"
    • This may be improved in the future for installs - the Agent can generally derive at least the date on which the install happened (but not the time on Windows)
  • Unlike other data captures, the Agent also tracks the "presence" of software on the machine (not just whether it was installed or uninstalled)
    • This is described in more detail in the Data Aggregation section
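
The delta-based inference described above can be sketched as a set difference between two scans. This is a simplified illustration, not the Agent's code: diff_scans is a hypothetical name, each scan is modelled as a set of (Product, Publisher, Version, Architecture) tuples, and the Adobe Acrobat details are invented sample values.

```python
import time


def diff_scans(previous, current, now):
    """Infer install/uninstall events from two scans, stamping each with the poll time."""
    events = []
    for product, publisher, version, arch in sorted(current - previous):
        events.append({"TS": now, "Product": product, "Publisher": publisher,
                       "Version": version, "Architecture": arch, "IsUninstall": 0})
    for product, publisher, version, arch in sorted(previous - current):
        events.append({"TS": now, "Product": product, "Publisher": publisher,
                       "Version": version, "Architecture": arch, "IsUninstall": 1})
    return events


# Hypothetical scans: Adobe Acrobat was present at the last scan but is now gone.
previous_scan = {
    ("Adobe Acrobat", "Adobe", "11.0", "x86"),
    ("Google Chrome", "Google Inc.", "55.0.2883.87", "x64"),
}
current_scan = {
    ("Google Chrome", "Google Inc.", "55.0.2883.87", "x64"),
}

events = diff_scans(previous_scan, current_scan, now=int(time.time()))
print(events)
```

Because only the two snapshots are compared, an upgrade appears as an uninstall of the old version plus an install of the new one, and every event carries the time of the poll rather than the time of the actual change.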

The following fields are captured:

| Field | Datatype | Sample value | Notes |
|---|---|---|---|
| Product | string | Google Chrome | The title of the software that was installed/uninstalled |
| Publisher | string | Google Inc. | The publisher of the software that was installed/uninstalled |
| Version | string | 55.0.2883.87 | The version of the software that was installed/uninstalled |
| Architecture | string | x64 | The platform architecture of the software |
| IsUninstall | integer | 0 | 0 = install, 1 = uninstall |

Windows

  • Software installations are read from the registry from HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall and HKLM\SOFTWARE\Wow6432Node\Microsoft\Windows\CurrentVersion\Uninstall
  • Per-user installations are not yet supported

Linux

Note

Linux does not distinguish between O/S packages (even the kernel) and application packages. They are all software.
  • The mechanism is like Windows in that it uses polling and "Inventory.SoftwareInstallations" & "Inventory.SoftwareInstallationsLastScan" in persistent storage. However, there are 2 variants of Linux packages: RPM and Debian-style, the latter also being used for Ubuntu. The data is accessed, as it is for all operating systems, using the Software module's installation enumerator.
  • Polls are run every 120 seconds by default.
  • For RPM-based Linux distributions, we enumerate through the RPM DB using the RPM API, getting the package name, version, release, vendor, and installation time.
  • For Debian-style packages, we read through the text file /var/lib/dpkg/status. Only packages that have a Status of "installed" are recorded. There is no recorded package installation time, so that is taken from the modification time of the corresponding /var/lib/dpkg/info/package_name.list file.
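
For the Debian-style case, the status file is a sequence of blank-line-separated stanzas of "Key: value" fields. The sketch below shows one way to pick out installed packages; it is illustrative only, the function names are hypothetical, and the sample package entries are made up. In the real file the Status field has three words (for example "install ok installed"), and the last word is the state being checked.

```python
import os

SAMPLE_STATUS = """\
Package: curl
Status: install ok installed
Version: 7.68.0-1ubuntu2
Description: command line tool for transferring data
 with URL syntax

Package: oldtool
Status: deinstall ok config-files
Version: 1.0-1
"""


def installed_packages(status_text):
    """Return (name, version) for every stanza whose Status ends in 'installed'."""
    packages = []
    for stanza in status_text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Lines starting with whitespace continue the previous field.
            if line[:1] in (" ", "\t") or ":" not in line:
                continue
            key, _, value = line.partition(":")
            fields[key] = value.strip()
        status = fields.get("Status", "").split()
        if status and status[-1] == "installed":
            packages.append((fields.get("Package"), fields.get("Version")))
    return packages


def list_file_mtime(package, info_dir="/var/lib/dpkg/info"):
    """Approximate install time: modification time of the package's .list file, if any."""
    path = os.path.join(info_dir, package + ".list")
    return os.path.getmtime(path) if os.path.exists(path) else None


print(installed_packages(SAMPLE_STATUS))
```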

Mac

  • The mechanism is like Windows in that it uses polling and "Inventory.SoftwareInstallations" & "Inventory.SoftwareInstallationsLastScan" in persistent storage. The data is accessed, as it is for all operating systems, using the Software module's installation enumerator.
  • Polls are run every 120 seconds by default.
  • The Mac Agent enumerates through installed packages using the pkgutil utility, getting the package name, version, release, vendor, and installation time.
  • The publisher is determined by reversing Product names to produce a URL. So a product com.apple.pkg.CoreADI will produce a Publisher name of apple.com and similarly a product of uk.co.bewhere.chrome.video.osx produces a Publisher of bewhere.co.uk
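
The reversal can be sketched as below. The text does not say how the Agent decides how many leading labels make up the domain, so the rule used here (three labels when a two-letter country code is followed by a common second-level label, otherwise two) and the SECOND_LEVEL_LABELS set are assumptions made purely for illustration.

```python
# Assumed set of second-level labels used under country-code domains (illustrative only).
SECOND_LEVEL_LABELS = {"co", "com", "org", "net", "ac", "gov"}


def publisher_from_product(product):
    """Derive a publisher domain from a reverse-DNS style package identifier."""
    labels = product.split(".")
    uses_country_code = (
        len(labels) >= 3
        and len(labels[0]) == 2
        and labels[1] in SECOND_LEVEL_LABELS
    )
    take = 3 if uses_country_code else 2
    return ".".join(reversed(labels[:take]))


for product in ("com.apple.pkg.CoreADI", "uk.co.bewhere.chrome.video.osx"):
    print(product, "->", publisher_from_product(product))
```

Both examples from the text come out as described (apple.com and bewhere.co.uk), but identifiers that do not follow the reverse-DNS convention would produce meaningless publishers under this rule.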

Solaris

  • The infrastructure is similar to the implementation for Linux (and hence Windows), but works by looking for all files matching the pattern "/var/pkg/publisher/*/pkg/*/*". Each file path itself gives the publisher, package name and version number. The last modification time of such a file is used as the package installation time.
  • At the time of writing there is a bug (66121) whereby packages that are "known" but not actually installed are treated as if they were installed.

Data storage

  • Captured data is stored initially in memory, and then written to disk during an aggregation cycle (see section below)
    • When the Agent shuts down, any pending data is written to disk
    • If the Agent process is terminated forcefully, any data in memory will be lost
  • The Agent uses a disk-backed, encrypted SQLite database to store captured data
  • Typically this file exists as C:\ProgramData\1E\Tachyon\Agent\DBs\Inventory.dat on Windows
    • This database is created initially when the Agent starts capturing for the first time
  • Each item of data capture (processes, TCP connections, etc) is stored in separate tables
  • Each item of data capture has a "live" table (which each captured event is appended to) and a set of aggregated tables (described fully later)
  • The tables are...

    | Item | Live table | Hourly table | Daily table | Monthly table |
    |---|---|---|---|---|
    | Software installations | Software_Live | Software_Hourly | Software_Daily | Software_Monthly |
    | Process executions | Process_Live | Process_Hourly | Process_Daily | Process_Monthly |
    | DNS resolutions | DNS_Live | DNS_Hourly | DNS_Daily | DNS_Monthly |
    | Outbound TCP connections | TCP_Live | TCP_Hourly | TCP_Daily | TCP_Monthly |
  • All tables contain time-stamp fields named "TS", which are stored as UTC Unix epoch numbers
    • When querying the SQLite database manually, you can use a query like the following to translate the TS field into something readable...
    • SELECT IpAddress, Port, ProcessId, ProcessName, DATETIME(TS, "unixepoch") AS EventTime FROM TCP_Live
  • The tables comprising the Inventory database are accessible via the Agent Language using a "$" prefix - e.g. SELECT * FROM $TCP_Live (note that the tablenames are case sensitive)
  • If the Agent is unable to write to storage (out of disk space or other file-system problems), it will fail but continue monitoring in the hope this situation will improve later

Data aggregation

  • While monitoring data, a periodic event known as the "aggregation cycle" fires
  • When this happens, the Agent will...
    • Write anything in memory to the database
    • Summarize data to hourly, daily and monthly tables (see below)
    • Delete data which is older than a certain threshold
  • The default time for aggregation is 60 seconds - in other words, it may take up to 1 minute before data captured is available in the Agent's database

The Agent follows a scheme similar to NightWatchman Enterprise in terms of how data is summarized:

  • For each item of data capture (processes, TCP connections, etc.) the Agent stores "raw" data and also data summarized by hour, by day and by month
  • The summary tables consist of a count of events for a particular time period, and reduced summary data about those events
    • For example, whereas the "live" TCP connection table might store 23 individual rows for connections that 3 different instances of Chrome have made to a particular server on a given day, the "daily" table will simply store the count of 23 for Chrome for that server for that day
    • The same is true of the hourly and monthly tables
    • These summary tables effectively "count + group by" fields to yield aggregated data

As an example of the aggregation, going from live TCP connection table to the daily TCP connection table:

  • In this example, we are going from "live" (raw) data to daily-summarized data.
  • In doing so, we lose the "ProcessId" column (as process IDs will always differ)
  • We then COUNT by GROUPing BY the remaining data fields (IpAddress, Port, ProcessName) and the Timestamp (TS) field truncated to the day
  • This allows us to go from 5 raw records to 2 daily-summarized records
    • Obviously this example just contains a very small number of events - in practice, the number of events summarized would be much greater
  • This pattern holds true for the hourly and monthly tables.
  • So a row in the "hourly" table would count the number of TCP connections to (IpAddress + Port) made by (Process) within that hour
  • And a row in the "monthly" table would count the number of TCP connections to (IpAddress + Port) made by (Process) within that month
  • The other data items (Process, DNS queries, Software) all work in a similar way - some columns (whose values change frequently) are discarded, and the remaining columns are COUNTed and GROUPed BY including a truncated variation of the timestamp
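
The live-to-daily summarization described above can be reproduced with plain SQLite. The sketch below uses Python's sqlite3 module with an in-memory table and five hypothetical raw rows; it assumes day truncation is done in UTC by subtracting TS modulo 86400, and it is an illustration of the "count + group by" pattern rather than the Agent's actual aggregation code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TCP_Live (TS INTEGER, IpAddress TEXT, Port INTEGER, "
    "ProcessId INTEGER, ProcessName TEXT)"
)
# Five hypothetical raw events, all on 2017-01-27 (UTC).
raw_rows = [
    (1485540234, "132.245.77.18", 443, 11828, "chrome.exe"),
    (1485540300, "132.245.77.18", 443, 11828, "chrome.exe"),
    (1485543900, "132.245.77.18", 443, 9120, "chrome.exe"),
    (1485547500, "10.0.0.7", 445, 4, "(system)"),
    (1485551100, "10.0.0.7", 445, 4, "(system)"),
]
conn.executemany("INSERT INTO TCP_Live VALUES (?, ?, ?, ?, ?)", raw_rows)

# Drop ProcessId, truncate TS to midnight, then count per remaining field combination.
daily = conn.execute(
    """
    SELECT TS - (TS % 86400) AS DayTS, IpAddress, Port, ProcessName,
           COUNT(*) AS ConnectionCount
    FROM TCP_Live
    GROUP BY DayTS, IpAddress, Port, ProcessName
    ORDER BY ConnectionCount DESC
    """
).fetchall()

for row in daily:
    print(row)
```

The five raw rows collapse into two daily rows: the two Chrome process IDs are merged because ProcessId is not part of the grouping.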

To store the truncated time stamp in the summary tables...

  • For "hourly", the time is truncated to the hour and stored in Unix Epoch format - so an event that occurred at 2017-01-27 18:03:54 would be included in the summary for 2017-01-27 18:00:00
  • For "daily", the time is truncated to midnight on that day and stored in Unix Epoch format - so an event that occurred at 2017-01-27 18:03:54 would be included in the summary for 2017-01-27 00:00:00
  • For "monthly", the time is truncated to midnight on the first day of the month and stored in Unix Epoch format - so an event that occurred at 2017-01-27 18:03:54 would be included in the summary for 2017-01-01 00:00:00

The following table shows the fields which are removed and introduced during the summarization process for each type of data captured:

| Data | Field(s) removed | Field(s) introduced |
|---|---|---|
| DNS | (none) | LookupCount |
| TCP connections | ProcessId | ConnectionCount |
| Process executions | CommandLine, ProcessId, ParentProcessId | ExecutionCount |
| Software installations | IsUninstall | InstallCount, UninstallCount |

Note that the tables capturing software installations are slightly different to the others...

  • This is because the Agent tracks presence, as well as install/uninstall count, of software.
  • For the summarized software tables, InstallCount stores the number of times the product was installed in that period and UninstallCount stores the number of times it was uninstalled
  • It is possible (and common) for both these fields to be zero - this implies that the software was simply present during this hour/day/month and was neither installed nor uninstalled