On this page:

Thinking Relationally

It is important that when writing SCALE we think Relationally, the on demand training talks about why this is important and here we are going to carry out some exercises to practice the theory discussed there. As a brief reminder, thinking relationally vs procedurally:

• We are accustomed to thinking procedurally
• Procedural approaches are often inefficient and slow
• Thinking relationally requires you to focus on ‘sets’ of data, instead of individual data items

When we think and develop relationally then the query engine is able to hand off the required processing the query optimizer can then multi thread our queries, saving time and processing power.

Basic Relational Operators

The most basic relational operators are Project and Restrict, and we are using the proper names, as if you look at the implementation in SQL you would see things like Select and Where, and if you actually look at SQL its an implementation of the Relational Model, it is not the Relational Model, and this confuses people or they just didn’t know they are not the same thing.

Project

You already did this, in Exercise 2 steps 4-6, when you did the Select from @IPAddresses did you realise this was a Project operation?

Restrict

You already did this, in Exercise 2 step 8, when you did the Where IPVersion = "IPv6" again did you realise this was a Restrict operation?

If you didn't remember that those were examples of Project and Restrict then perhaps consider re taking the on demand module on SCALE - Introduction to the Relational Model

JOIN

Here we will look at Joins

It is possible to do a Select * from table1, table2 which is in effect a join, but this is inefficient as we then return what is called the Cartesian product of all the rows, you could then restrict the results using a where to filter the results you really want, but this is a really inefficient way of returning records.

The Cartesian Product of two (or more) tables is just a new table. If table A has 12 rows and table B has 6 rows, then the Cartesian Product of Table A and B has 72 rows. You can see that this can grow quite quickly.

Exercise: If table A and table B both had 1,000 rows each, what would be the Cartesian Product?. What if you added a table C with 1,000 rows as well?



1ETRNW73
  1. If TIMS is already open, close it and relaunch it from the desktop
  2. In the Instruction window, run the following query
  3. @Connections = Network.GetConnections();
  4. Note the output we have created a temp table with the output data. We can now manage that data using SELECT statements.
  5. Beneath the existing code, input the following and click Run
  6. @Processes = OperatingSystem.GetProcesses();
  7. Note the output is different but we now have information in the @Processes table, and we can carry out operations on these two tables
  8. Beneath the existing code, input the following and Click Run
  9. SELECT c.ProcessID as "Con ProcessID", c.ProcessName as "Con ProcessName", c.RemotePort as "Con Remote Port", p.Executable as "Proc Executable", p.ProcessID as "Proc ProcessID", p.startuptime as "Start up Time" FROM @Processes p
    JOIN @Connections c
    ON c.ProcessID = p.ProcessID;
    The additional lines have:
    Selected spectific data from both the @Processes and the @Connections tables, the tables are aliased as p and c respectively to make it simpler to refer to them.
    We have given each column an as and a name so we can see in the output where the data was retrieved from.
    And we have joined on the ProcessID in each table
    It would have been possible to do a Select * from @Processes, @Connections which is in effect a join, but this is inefficient as we then return what is called the Cartessian product of all the rows, and could have then restricted the results using a where to filter the results really wanted, but that’s a really inefficient way of returning records.

LEFT JOIN

Left Join looks at the query and says ok if I don’t get a match, then I will return the null records

We will now look at this

1ETRNW73
  1. In the existing code block amend the JOIN to be as shown (add LEFT to the JOIN command) and Click Run
  2. LEFT JOIN @Connections c
    Additional rows are returned, there are a number of rows where the process ID is null, and therefore were not returned before, it may be that these additional records highlight an issue with a slow running machine and without the left join we would not have seen this in the original query

Sub Queries

It is also possible to do multi table queries another way and that is via a sub query. These are very versatile and useful and should be used in SCALE where they are appropriate.  They are a way to bolt together a set of independent data into a set of columns


1ETRNW73
  1. Clear the TIMs window and enter the following code and Click Run
  2. @Connections = Network.GetConnections();
    @Processes = OperatingSystem.GetProcesses();
    SELECT *, (Select Startuptime from @Processes p where p.ProcessiD = c.ProcessID) Starttime from @Connections c;

    The example shows selecting all columns in the outer table, Connections. Aliased to c, and this has been done to add the StartupTime from the Processes table, aliased as p to make it clear that the variables are in different tables.
    And we then use the p.ProcessID to match the c.ProcessID on the outer table.

    Corelated subqueries and Joins are very closely related concepts, and sometimes one is better than the other, but we want you to know that both exist. Many people learn elements of SQL by seeing what others do and don’t get any deeper knowledge.

Aggregation

Aggregation allows us to group rows by some criteria


1ETRNW73
  1. Clear the TIMs window and enter the following code and Click Run
  2. @Connections = Network.GetConnections();
    Select RemoteAddress, Count(*) from @Connections
    GROUP BY RemoteAddress;

    We used the function Count(*), it is also possible to use a column in the Count function, this returns the number of rows that match the aggregation.

    At the end of the query we use Group By that is our aggregation clause, grouping by LocalAddress, the aggregation will add up the number of rows and the count becomes a new column derived from the results returned

    This was a project operation. The data was selected from the table, there was no where restriction or taking a subset of the data, the count was an aggregation

  3. Change the select statement to read
  4. Select RemoteAddress, Count(RemoteAddress) from @Connections
    We used the function Count(RemoteAddress), this was an example of using a column in the Count function, again this returns the number of rows that match the aggregation.


    Aggregation doesn’t use a where clause for the restrict clause, and it has its own, rather than where is that you can’t overload a where clause, you can have a where already and then add a group by and Having clauses.

    The having clause can thought of as another kind of where clause if you like.

  5. Add the following line beneath the GROUP BY line (remove the semi colon on this line)
  6. HAVING count(RemoteAddress) >2;
  7. We have chosen to only return rows where the count is greater than 2.

IN

Used to test if an expression matches any value in a list of values. It is used to help reduce the need for multiple OR conditions in a SELECT, INSERT, UPDATE, or DELETE statement.

1ETRNW73

  1. Clear the TIMs window and enter the following code and Click Run
  2. @Processes = OperatingSystem.GetProcesses();
    SELECT * FROM @Processes 
    WHERE Executable IN ("chrome.exe", "csrss.exe", "System" );
    This returns only rows where the executable name matches the value in the list, it is also possible to combine this with a NOT and the results would be those records which did not match an entry in the list

EXISTS


The EXISTS operator is used to test for the existence of any record in a subquery.  The EXISTS operator returns true if the subquery returns one or more records.

1ETRNW73

  1. Clear the TIMs window and enter the following code and Click Run
  2. @Processes = OperatingSystem.GetProcesses();
    SELECT * FROM @Processes 
    WHERE EXISTS (SELECT Executable FROM @Processes WHERE Executable = "chrome.exe");
    Did you expect only rows that matched the EXISTS criterion? Exists evaluates and if a match is found returns true, and then the main query will run. So all rows in the table will be returned if Exists is true.

Common Table Expressions

A Common Table Expression (CTE) is the result set of a query which exists temporarily and for use only within the context of a larger query. Much like a derived table, the result of a CTE is not stored and exists only for the duration of the query. 

1ETRNW73

  1. Clear the TIMs window and enter the following code
  2. SELECT x FROM
    (
       WITH ten(x) AS ( 
    This was the beginning of a Recursive CTE, and we began with a table called Ten, and a column in the table called x
  3. Enter the following lines of code beneath that just entered
  4. SELECT 1 
         UNION ALL 
              SELECT x+1 FROM ten WHERE x < 10
          )

    These work by having an anchor, which is a select, and this is the termination of the recursion, and a Union all with the body

    select x+1 from ten, which is the thing we are making, where x < 10

  5. Enter the following lines of code beneath that just entered, and click run
  6. SELECT x FROM ten
    );
    When we do the Select x from 10 is we end up with 10 rows

    And CTE is the closest to views that are available in SQLite, as you could use the theory behind the 10x to create something rather complicated and then perform actions on it, this doesn’t have to be recursive, you could leave the UNION all out and replace the body with another query.

    The point is that when you subsequently query its as if it were a table, whose body is some arbitrary complicated piece of SQL, this used to be done using a view, but that isn’t available in SQLite

Lab Summary

In this lab, we learned some more basic and not so basic functions that can be used in TIMS. And started to think Relationally, this may have been how you already thought when coding but perhaps didn't realise it.