Device Deduplication
Several scenarios can lead to the same device being reported more than once within 1E, and these duplicates can find their way into Experience Analytics and other applications. The main issue is duplicates having different TachyonGuids. Because TachyonGuid is the primary distinct identifier of a device, the total number of devices calculated within the system can easily be higher than the real-world number. It also means that data associated with a device (for example, Experience Analytics performance data) will eventually be split among multiple TachyonGuids, affecting access to the device's data as well as affecting aggregations.
In most cases, deduplicating using the device FQDN should suffice, because it is reasonable to assume that multiple devices with the same FQDN are the same device. However, this is not true in all cases. For various reasons, some organizations do not use FQDN to identify a device; they may instead uniquely identify devices based on Make, Model and Serial Number. For this reason, the deduplication feature can be configured with one or more device attributes to uniquely identify a distinct device.
Note
The Coordinator runs a device grooming task that deletes devices from the TachyonMaster database. This deletes devices that have not connected to the platform in a set number of days (99 by default). When devices are groomed, a deleted event is sent. The tachyonGuid specifies the device being deleted; for delete events, the payload will be null.
Configuration
1E stores its Device Deduplication configuration in: <INSTALLDIR>\Tachyon\Coordinator\Tachyon.Server.Coordinator.exe.config
Note
After making any changes, reboot the server or restart the 1E Tachyon Coordinator service.
To change when and how often the Device Deduplication process runs, add or edit the DeviceDeduplication setting:
<module assemblyName="Tachyon.Server.Coordinator" providerName="Tachyon.Server.Coordinator.Scheduling.SchedulingModule">
  <settings>
    <add key="Crontab1" value="50 23 * * * LogDevicesSeenInTheLast7Days" />
    <add key="Crontab2" value="15 * * * * MaintainPolicyRulePartitioning" />
    <add key="Telemetry" value="40 23 * * 5 SendTelemetryStats" />
    <add key="DeviceDeduplication" value="0 1 * * 0 DeviceDeduplication {DeleteDevices:'false',UpdateDeviceData:'false',DeleteOrphanedDeviceData:'false',DeviceIdentifier:['FQDN']}" />
  </settings>
</module>
The numbers that you see after value are a crontab schedule expression. The schedule 0 1 * * 0 means "at 01:00 hours, any day of the month, any month of the year, on weekday 0 (Sunday)". In other words, this configuration runs the Device Deduplication process at 1AM UTC every Sunday.
You can use an online tool such as https://crontab.guru/ to verify your crontab schedule expression.
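As a rough illustration of how a scheduling value breaks down, the Python sketch below splits the string into its five crontab fields, the task name, and the optional arguments text. The function name split_schedule_value and the field labels are hypothetical, chosen for this example only; this is not a description of how the Coordinator itself parses the setting.

```python
def split_schedule_value(value):
    """Split a scheduling value into crontab fields, task name and arguments.

    Hypothetical helper for illustration; not part of the 1E platform.
    """
    # Five crontab fields, then the task name, then (optionally) the arguments.
    parts = value.split(None, 6)
    if len(parts) < 6:
        raise ValueError("expected five crontab fields followed by a task name")
    labels = ("minute", "hour", "day_of_month", "month", "day_of_week")
    schedule = dict(zip(labels, parts[:5]))
    task = parts[5]
    arguments = parts[6] if len(parts) > 6 else None
    return schedule, task, arguments


value = ("0 1 * * 0 DeviceDeduplication "
         "{DeleteDevices:'false',UpdateDeviceData:'false',"
         "DeleteOrphanedDeviceData:'false',DeviceIdentifier:['FQDN']}")
schedule, task, arguments = split_schedule_value(value)
print(schedule)
print(task)
```

Reading the fields by name (minute 0, hour 1, weekday 0) can make it easier to check a schedule before saving the configuration file.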
Warning
Please ensure there are no spaces in the arguments text (the text within the braces {}).
Do not change the other crontab keys unless advised by 1E.
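One way to avoid stray spaces is to generate the arguments text rather than type it by hand. The Python sketch below is a hypothetical helper (build_arguments is not a 1E tool) that assembles the text in the same shape as the example setting and rejects any pass that contains whitespace.

```python
def build_arguments(delete_devices=False, update_device_data=False,
                    delete_orphaned_device_data=False,
                    device_identifier=("FQDN",)):
    """Build the arguments text in the shape shown in the example setting.

    Hypothetical helper; each item in device_identifier is one pass, written
    as a comma-separated list of attribute names with no spaces.
    """
    for one_pass in device_identifier:
        if any(ch.isspace() for ch in one_pass):
            raise ValueError("pass contains whitespace: %r" % one_pass)

    def flag(enabled):
        return "'true'" if enabled else "'false'"

    passes = ",".join("'%s'" % one_pass for one_pass in device_identifier)
    return ("{DeleteDevices:" + flag(delete_devices)
            + ",UpdateDeviceData:" + flag(update_device_data)
            + ",DeleteOrphanedDeviceData:" + flag(delete_orphaned_device_data)
            + ",DeviceIdentifier:[" + passes + "]}")


default_arguments = build_arguments()
print(default_arguments)
```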
Modifying when Device Deduplication runs
By default, Device Deduplication runs at 1AM UTC every Sunday. Another process called Experience Synchronization runs at 2AM UTC daily. It is very important that Experience Synchronization is not running when Device Deduplication starts. Therefore, if you change the time when either of these processes starts, you must ensure that the Experience Synchronization process starts after the Device Deduplication process starts, ideally with a gap of at least 1 hour to avoid overlap. You may need a longer gap for very large systems.
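A quick way to sanity-check the gap between two schedules is to compare their start times. The Python sketch below is a simplified, hypothetical check: it only understands plain numeric minute and hour fields, and it compares start times within a single day.

```python
def start_minutes(cron_expr):
    """Return the start time of a crontab expression as minutes after midnight."""
    minute, hour = cron_expr.split()[:2]
    if not (minute.isdigit() and hour.isdigit()):
        raise ValueError("only plain numeric minute and hour fields are supported")
    return int(hour) * 60 + int(minute)


def start_gap_minutes(dedup_cron, sync_cron):
    """Minutes from the Device Deduplication start to the Experience Synchronization start."""
    return start_minutes(sync_cron) - start_minutes(dedup_cron)


# Default schedules described above.
gap = start_gap_minutes("0 1 * * 0", "0 2 * * *")
print(gap)  # 60
```

A result below 60, or a negative value, would suggest the two schedules are closer together than the recommended 1 hour gap.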
Arguments
Running with all "Update..." and "Delete..." arguments set to false makes no database changes and is considered a diagnostic/practice run. After such a run, you can analyze the DeviceDeduplication and DeviceDeduplicationLog tables to determine how many duplicates are in the environment based on the DeviceIdentifier input. This provides a projected impact for a subsequent run with one or more of these arguments set to true. Alternatively, you can modify the DeviceIdentifier value to include more, fewer or different columns, to more closely target what you expect a truly unique device to be.
Argument | Default | Usage |
---|---|---|
Instance | Instance number of last run +1 | This can optionally be added to the list of parameters in the app setting. By default, the orchestration will handle this automatically. If you specify it, previous instance numbers can be found in the DeviceDeduplicationLog table. A use case for specifying this is retroactive execution; see the Instance section below for more details. |
DeleteDevices | False | Delete duplicate devices from both databases, and any management group associations for those devices. |
UpdateDeviceData | False | Relink TachyonExperience performance data from duplicate devices (that will be deleted) to the singular active device that will be kept. |
DeleteOrphanedDeviceData | False | Remove TachyonExperience performance data that is associated with a device that no longer exists. |
DeviceIdentifier | ['FQDN'] | This is an array of one or more passes, with each pass made up of one or more device attributes (column names) from the Device table in the TachyonMaster database. Please refer to Multiple Passes of Deduplication below for more details about using multiple passes. You can use any column from the Device table that is a device attribute; however, you should only use attributes that uniquely identify a device. Device attributes that are commonly used to uniquely identify a device are: Fqdn, SMBiosGuid, MAC, SerialNumber, Manufacturer, Model, Domain, Name, User, and Location. Some of these attributes can be used on their own, whilst others must be used in combination. You should avoid using TachyonGuid, or any of the date columns. Each pass must be wrapped in single quotes. The attributes specified within a pass are used to create a single hash per device. When using multiple passes, or multiple attributes within a pass, they must each be separated by a comma. For example: DeviceIdentifier:['SerialNumber,SMBiosGuid','Manufacturer,Model,Location','FQDN',] Note: Any devices that contain identical values across all the specified attributes will be assigned the same hash. Of these devices, the one that has the most recent LastConnUtc value will be marked as ToBeDeleted=0 whilst the others will be marked as ToBeDeleted=1. This is how we identify duplicates as well as the single active device that we want to keep. The CreatedUtc that is retained for the kept device is not necessarily the earliest or the latest of all its duplicates. Note: If the column schema of the Device table changes between runs, then the DeviceDeduplication table will have to be backed up and deleted before the next run. This will generate a new version of the table in the next run with up-to-date device columns, which also means new columns can now be specified within the DeviceIdentifier value. |
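The hash and ToBeDeleted behavior described for DeviceIdentifier can be pictured with a small Python sketch. It is only an approximation under stated assumptions: a tuple of attribute values stands in for the real hash (whose algorithm is not documented here), the device records are made-up sample data, and mark_duplicates is a hypothetical name.

```python
from collections import defaultdict
from datetime import datetime


def mark_duplicates(devices, attributes):
    """Return {TachyonGuid: ToBeDeleted} for devices that share attribute values.

    Devices with identical values across all attributes form one group. The
    member with the most recent LastConnUtc is kept (0); the rest are marked 1.
    """
    groups = defaultdict(list)
    for device in devices:
        key = tuple(device[name] for name in attributes)  # stand-in for the hash
        groups[key].append(device)

    marks = {}
    for members in groups.values():
        if len(members) < 2:
            continue  # a device with no duplicates is left out
        keep = max(members, key=lambda d: d["LastConnUtc"])
        for device in members:
            marks[device["TachyonGuid"]] = 0 if device is keep else 1
    return marks


# Made-up sample data for illustration only.
devices = [
    {"TachyonGuid": "guid-a", "Fqdn": "pc01.example.com", "LastConnUtc": datetime(2023, 1, 5)},
    {"TachyonGuid": "guid-b", "Fqdn": "pc01.example.com", "LastConnUtc": datetime(2023, 3, 9)},
    {"TachyonGuid": "guid-c", "Fqdn": "pc02.example.com", "LastConnUtc": datetime(2023, 2, 1)},
]
marks = mark_duplicates(devices, ["Fqdn"])
print(marks)
```

Here guid-b connected most recently, so it is the device to keep, and guid-a is marked as its duplicate.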
The following table describes the system behavior for every permutation of boolean arguments passed to DeviceDeduplication:
DeleteDevices | UpdateDeviceData | DeleteOrphanedDeviceData | Device Deduplication process behavior |
---|---|---|---|
F | F | F | This is the default configuration (3 x False). Before you change these settings, you should perform a diagnostic/practice run where no changes will be made to the devices or their data. Check the DeviceDeduplication and DeviceDeduplicationLog tables to analyze the results. |
T | F | F | Delete devices identified as duplicates, do not update or merge any existing performance data. |
F | T | F | Do not delete devices identified as duplicates, reassign duplicate devices' performance data to the single device that has been identified as the most recent instance of the device. *1 |
F | F | T | Do not delete devices identified as duplicates, do not update or merge any existing performance data. Delete any performance data that is associated with a TachyonGuid that no longer exists in the TachyonMaster Device table. This might be useful as a database cleanup exercise to remove "unparented data". *1 *2 |
T | T | F | Delete devices identified as duplicates and merge any existing performance data that was associated with duplicates. |
F | T | T | Do not delete devices identified as duplicates, reassign duplicate devices' performance data to the single device that has been identified as the most recent instance of the device. Finally, after relinking the data, delete any performance data that is associated with a TachyonGuid that no longer exists in the TachyonMaster Device table. *1 |
T | F | T | Delete devices identified as duplicates, do not update or merge any existing performance data. Delete any performance data that is associated with a TachyonGuid that no longer exists in the TachyonMaster Device table. *2 |
T | T | T | Delete devices identified as duplicates, reassign duplicate devices' performance data to the single device that has been identified as the most recent instance of the device. Finally, after relinking the data, delete any performance data that is associated with a TachyonGuid that no longer exists in the TachyonMaster Device table. |
*1 - This could lead to continued inflated device counts in the system
*2 - Be careful as this data will be deleted and cannot be recovered or merged at a later date even if the TachyonGuid that reported it comes online again.
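The permutations table and its footnotes can be condensed into two rules, sketched below in Python. This is a reading of the table rather than product logic, and describe_run is a hypothetical name: footnote *1 applies when devices are not deleted but data is relinked or orphaned data is removed, and footnote *2 applies when orphaned data is removed without relinking first.

```python
def describe_run(delete_devices, update_device_data, delete_orphaned_device_data):
    """Summarise a run from the three boolean arguments (a reading of the table)."""
    # The list order is for readability; it is not a claim about execution order.
    actions = []
    if update_device_data:
        actions.append("relink performance data to the kept device")
    if delete_devices:
        actions.append("delete duplicate devices")
    if delete_orphaned_device_data:
        actions.append("delete orphaned performance data")

    footnotes = []
    if not delete_devices and (update_device_data or delete_orphaned_device_data):
        footnotes.append("*1")  # duplicates remain, so device counts stay inflated
    if delete_orphaned_device_data and not update_device_data:
        footnotes.append("*2")  # data is deleted without being relinked first
    return {"diagnostic": not actions, "actions": actions, "footnotes": footnotes}


summary = describe_run(False, False, True)
print(summary["footnotes"])
```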
Instance
Each run of Device Deduplication will result in entries in the DeviceDeduplicationLog table. All rows for the run will be assigned an instance number. The Instance number is also used to share context between the TachyonMaster and TachyonExperience parts of the process.
Retroactive execution: An old instance can also be passed to the process via the app setting. When an old instance is provided, duplicates will not be recalculated; instead, the list of devices from that previous run will be used (from the DeviceDeduplication table). This will also be logged in the DeviceDeduplicationLog table, as an entry that reads "This is a rerun of a previous instance. Previous history from the DeviceDeduplication table will be used for this instance". In an execution such as this, any new value for DeviceIdentifier will be ignored; however, all other new arguments will be applied.
Multiple Passes of Deduplication
The DeviceIdentifier argument is a comma-separated list of passes, each pass itself being a comma-separated list of device attributes. Device Deduplication runs the passes in the order specified. Each pass further refines duplicate devices against the results of the previous pass. There is no limit to the number of passes specified.
For example, the passes listed in the table below would be run if the DeviceIdentifier is set to DeviceIdentifier:['SerialNumber,SMBiosGuid','Manufacturer,Model,Location','FQDN',]:
Pass number | Device attributes used | Identify duplicates based on... |
---|---|---|
1 | SerialNumber, SMBiosGuid | Devices in the TachyonMaster Device table that have the same SerialNumber AND SMBiosGuid. |
2 | Manufacturer, Model, Location | Devices identified as duplicates in pass 1 that also have the same Manufacturer, Model and Location. |
3 | FQDN | Devices identified as duplicates in pass 2 that also have the same FQDN. |
The results of pass 3 alone are then used for the rest of the process and considered duplicates going forward.
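The refinement across passes can be sketched in Python as repeated grouping, where each pass only looks inside the duplicate groups left by the previous pass. This is an illustrative approximation: tuples of attribute values stand in for the real hashes, the device records are made-up sample data, and refine_duplicates is a hypothetical name.

```python
from collections import defaultdict


def refine_duplicates(devices, passes):
    """Return the groups of devices still considered duplicates after every pass."""
    candidate_groups = [list(devices)]
    for attributes in passes:
        next_groups = []
        for group in candidate_groups:
            buckets = defaultdict(list)
            for device in group:
                buckets[tuple(device[name] for name in attributes)].append(device)
            # Only buckets with two or more devices remain duplicate candidates.
            next_groups.extend(b for b in buckets.values() if len(b) > 1)
        candidate_groups = next_groups
    return candidate_groups


def device(guid, serial, smbios, location, fqdn):
    # Made-up sample record; Manufacturer and Model are fixed for brevity.
    return {"TachyonGuid": guid, "SerialNumber": serial, "SMBiosGuid": smbios,
            "Manufacturer": "Contoso", "Model": "X1", "Location": location, "FQDN": fqdn}


devices = [
    device("guid-a", "S1", "B1", "London", "pc01.example.com"),
    device("guid-b", "S1", "B1", "London", "pc01.example.com"),
    device("guid-c", "S1", "B1", "Paris", "pc01.example.com"),   # differs in pass 2
    device("guid-d", "S2", "B2", "London", "pc01.example.com"),  # differs in pass 1
]
identifier = ["SerialNumber,SMBiosGuid", "Manufacturer,Model,Location", "FQDN"]
passes = [one_pass.split(",") for one_pass in identifier]
groups = refine_duplicates(devices, passes)
print([[d["TachyonGuid"] for d in g] for g in groups])
```

With this sample data only guid-a and guid-b survive all three passes, even though all four devices share the same FQDN.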
Experience Synchronization
If any of the Boolean arguments are set to true, Device Deduplication may affect existing data in the system, whether through relinking to a new TachyonGuid or removing as orphaned data. For this reason, when the Device Deduplication process completes (with any settings set to true), it triggers a full Experience Synchronization. By default, the regular Experience Synchronization process is incremental, and runs at 2AM UTC daily.
The full Experience Synchronization includes a full process of the TachyonExperience aggregated data, which is initiated regardless of the current ProcessMode option specified (GlobalSetting table in the TachyonExperience database). A full process is required because data that 1E typically handles as additive can have historical data modified by the DeviceDeduplication process, which could otherwise lead to inaccurate counts and aggregations.
For large environments, the full process of the aggregable data is likely to be a long-running background process. It will not impact data displayed in the UI until it has completed. If it conflicts with an ongoing Experience Synchronization process, it will be skipped. Server resource demand is likely to increase during an aggregation process. Be aware that, depending on the resources available, this could affect the query response time of the UI.
The latest Accumulated Hotfix for 1E Platform Server allows you to add the ExperienceSync setting to the Coordinator configuration file. You do not need to add or modify this unless advised by 1E.
<module assemblyName="Tachyon.Server.Coordinator" providerName="Tachyon.Server.Coordinator.Scheduling.SchedulingModule">
  <settings>
    <add key="Crontab1" value="50 23 * * * LogDevicesSeenInTheLast7Days" />
    <add key="Crontab2" value="15 * * * * MaintainPolicyRulePartitioning" />
    <add key="Telemetry" value="40 23 * * 5 SendTelemetryStats" />
    <add key="DeviceDeduplication" value="0 1 * * 0 DeviceDeduplication {DeleteDevices:'false',UpdateDeviceData:'false',DeleteOrphanedDeviceData:'false',DeviceIdentifier:['FQDN']}" />
    <add key="ExperienceSync" value="0 2 * * * ExperienceSynchronization" />
  </settings>
</module>
The numbers that you see after value are a crontab schedule expression. The schedule 0 2 * * * means "at 02:00 hours, any day of the month, any month of the year, any day of the week". In other words, this configuration runs the Experience Synchronization process at 2AM UTC daily.
You can use an online tool such as https://crontab.guru/ to verify your crontab schedule expression.
Logging and analysing results
The Device Deduplication process creates logs in the Tachyon.Coordinator.log file.
The DeviceDeduplicationLog table - in the TachyonMaster database - logs more granular detail including the arguments used for the instance, details of each pass if more than one is specified, numbers of duplicates found, as well as rows affected in specific tables by various actions.
The DeviceDeduplication table - in the TachyonMaster database - stores a list of devices to be kept and their duplicates for the instance. The ToBeDeleted column is either 0 (false) for a device to be kept, or 1 (true) for a duplicate to be deleted. Kept and duplicate devices are associated by the Hash column.
With all arguments set to false, you can review whether the specified DeviceIdentifier is able to uniquely identify devices and genuine duplicates. Once you are sure the DeviceIdentifier is suitable, you can then change other arguments from false to true according to your needs.
When the DeleteDevices argument is set to true, the Device table is updated for each device where older duplicates of it have been identified. Each device will have a timestamp set for the LastDeduplicationUtc column, as well as an Experience event logged. The timestamp is updated when one or more new duplicates are found. There is also a DeduplicationCount column, which represents the number of complete Device Deduplication processes where a device has been found to have duplicates. These values will only be set on the device identified as being the most recent device that will be kept, not its duplicates.
Devices with a LastDeduplicationUtc value can be found using the below SQL query.
Genuine devices that have had duplicates identified
SELECT *
FROM [TachyonMaster].[dbo].[Device]
WHERE LastDeduplicationUtc IS NOT NULL
  AND DeduplicationCount IS NOT NULL
The Experience event can be viewed within Experience Analytics (Devices → {Select a device} → Logs).
The details of the log will show what attributes were used when the device was identified as having duplicates. If multiple passes were specified, they will be delimited with a '#' in the order of execution. The IdentifyingHash value can be used to query the Hash column of the DeviceDeduplication table, in the TachyonMaster database.
This will provide all device details of the device itself as well as all of its duplicate devices.
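The lookup by hash can be tried out against a stand-in table. The Python sketch below uses an in-memory SQLite database purely for illustration; the real table lives in the TachyonMaster SQL Server database. Only the Hash and ToBeDeleted columns come from the description above. The other columns, the sample rows, and the devices_for_hash function are assumptions made for this example.

```python
import sqlite3

# In-memory stand-in for the DeviceDeduplication table (assumed, reduced schema).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE DeviceDeduplication "
    "(TachyonGuid TEXT, Fqdn TEXT, Hash TEXT, ToBeDeleted INTEGER)"
)
conn.executemany(
    "INSERT INTO DeviceDeduplication VALUES (?, ?, ?, ?)",
    [
        ("guid-a", "pc01.example.com", "hash-1", 1),  # duplicate to be deleted
        ("guid-b", "pc01.example.com", "hash-1", 0),  # device to be kept
        ("guid-e", "pc07.example.com", "hash-2", 0),
    ],
)


def devices_for_hash(connection, identifying_hash):
    """Return the kept device first, then its duplicates, for one hash value."""
    cursor = connection.execute(
        "SELECT TachyonGuid, Fqdn, ToBeDeleted FROM DeviceDeduplication "
        "WHERE Hash = ? ORDER BY ToBeDeleted",
        (identifying_hash,),
    )
    return cursor.fetchall()


rows = devices_for_hash(conn, "hash-1")
print(rows)
```

Sorting by ToBeDeleted puts the kept device (0) ahead of its duplicates (1), which makes the relationship easy to read.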