1E 23.11 (SaaS)

FileSystem.FindFileByName

Method

FindFileByName

Module

FileSystem

Library

Core

Action

Finds a file by name.

Parameters

FileName (string): The name of the file to be found. The name can include wildcard characters ? and *.

Warning

Using a catch-all wildcard such as FileSystem.FindFileByName(FileName:"*", Fast:true) is never a good idea: it causes all files to be returned and could have a negative effect on system performance. Wherever possible, use specific wildcards to identify the files needed.

Note

Case-insensitive wildcard matching is supported only on Windows.
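
As a sketch of the advice above, the calls below narrow the pattern instead of matching everything. The file names are hypothetical, the // comment style is assumed to be accepted, and the call syntax simply follows this page's own Example section.

```
// Matches ips.txt, ips-old.txt and similar names, rather than every file on disk
FileSystem.FindFileByName(FileName:"ips*.txt");

// ? stands in for a single character, e.g. report-2019.csv or report-2023.csv
FileSystem.FindFileByName(FileName:"report-20??.csv");
```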

Fast (boolean; optional, default true): Chooses between the fast Windows NTFS search technology (true) and the previous, slower but fully reliable technology (false).

Note

Ignored for non-NTFS filesystems.

Unix searches with Fast=false and NTFS searches with Fast=true take a similar order of time (unless the Unix search comes across a mounted Windows share). NTFS searches with Fast=false are significantly slower.

Recurse (boolean; optional, default true): Recurse into sub-folders. If this is set to false and no folder is specified, only the root folder of each drive/mount-point will be searched. If this is set to false and a folder is specified, only that folder on each drive/mount-point will be searched (if there is such a folder on that drive/mount-point).

Note

Turning off recursion will force use of safe (non-fast) search mode, even if fast mode is requested. However, since folders are then not recursed into, a non-recursive search ought in most cases to complete quite quickly.

Folder (string; optional, defaulting to the root folder): The folder to search in. The folder must be a 'root relative' path e.g. \\windows\\system32 or /var/log. The search will look in such folders on each drive (Windows) or mount-point (Unix). The folder path may include . and .. parts, which will be normalised into the equivalent canonical path e.g. \\windows\\system\\..\\system32 will search in \\windows\\system32.

Note

Specifying a folder will force use of safe (non-fast) search mode, even if fast mode is requested.
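
A minimal sketch of a folder-scoped search, using only the parameters documented above. The folder and file names are illustrative, the // comment style is assumed to be accepted, and the doubled backslashes in the Windows path are assumed to be string escaping, as in this page's own Example section.

```
// Unix: look only in /var/log on each mount-point, without descending into sub-folders
FileSystem.FindFileByName(FileName:"*.log", Folder:"/var/log", Recurse:false);

// Windows: search \windows\system32 and its sub-folders on each drive
FileSystem.FindFileByName(FileName:"hosts", Folder:"\\windows\\system32");
```

Per the notes above, both calls run in the non-fast search mode, so it is the narrower scope that keeps them quick.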

TimeoutSecs (integer; optional, default 600): If the operation has not completed within the specified time period, it will fail with an error.

Exclude (table; optional): A single-column table in which each row contains a root-relative directory path to exclude from the recursive search.

Note

On Windows, excluded paths can contain drive letters (e.g. C:\\) to skip scanning whole volumes.

Ensure the exclude paths are complete paths starting from root.

Return values

For each file found:

  • FileName (string): The full path.

  • Size (int): The size in bytes.

  • Hash (string): The SHA-256 hash of the contents of the file.

Old Unicode filenames and v5.1+

Disks holding files created more than two decades ago may well contain filenames with characters that were valid UCS-2 (in the range 0xd800 to 0xdfff) but that do not form valid UTF-16 surrogate pairs, or that are simply incorrectly encoded. Such names can exist on the file system quite happily, because NTFS still allows them.

However, when these filenames are converted to UTF-8, the preferred internal representation in the 1E Client, they do not translate exactly, so the original reference to the exact file name can be lost. This translation problem causes a warning like the following to be issued in the client logs:

2020-03-30 14:50:37,048 WARN - Invalid UTF-16 sequence found starting at offset 24 just after this UTF-8 version [C:\dev\research\ucs2\bad??]

Because the log is also written in UTF-8, the 1E Client cannot write the offending UTF-16 characters into it, so the converted UTF-8 text is shown instead. In this case the invalid values have been replaced with question marks, because they are not valid UTF-16 and so cannot be converted to UTF-8. The question marks simply indicate where in the string the problem lies.

This limitation is fundamental and applies to string conversion in general, not just to filenames, but this method (and similar ones) is likely to encounter badly formed UTF-16 on an old file system.

When this occurs in this method, then the result set will contain a row like this:

FileName                Size          Hash
c:\dodgy\unic?de.t?t    4294967295    cannot-initialise-hashing

This is because the filename cannot actually be found once it has been mutated into UTF-8 with missing character markers.
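
Rows like this can be filtered out before further processing. The sketch below assumes that the placeholder Hash value shown above is what the method returns whenever a file cannot be reopened, that a result set can be assigned to a table variable and queried with SELECT in the same way as @exclude in the Example section, and that // comments are accepted. Treat all three as assumptions rather than documented behaviour. The folder name is hypothetical.

```
@files = FileSystem.FindFileByName(FileName:"*.txt", Folder:"\\dodgy");

// Keep only the rows whose contents could actually be hashed
SELECT FileName, Size, Hash
FROM @files
WHERE Hash != "cannot-initialise-hashing";
```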

The likelihood of encountering file names like this is very low: they have to be constructed by a scripting language and are unlikely to work with most modern programs anyway.

For further context, Windows used UCS-2 from Windows NT 3.1 (1993) onwards and switched to UTF-16 with Windows 2000 (NT 5.0), but any existing UCS-2 filenames had to remain usable. This is, however, ancient history.

Example

FileSystem.FindFileByName(FileName:"ips.txt", Fast:true);

@exclude =
SELECT
        column1 AS Path
FROM
        ( VALUES
                ("C:\\"),
                ("E:\\IgnoredFolder")
        );

FileSystem.FindFileByName(FileName:"ips.txt", Exclude:@exclude);

Platforms

  • Windows

  • Linux

  • macOS

Notes

By default, this method searches all fixed disks, which is an expensive process and may take some time. Where possible, use a specific folder search and/or disable recursion to speed up the search.

On non-Windows platforms, symbolically linked files are ignored.