querying subnets03 Mar 2007 It's been a busy week, too busy to play around with HPPC code.
I've been reading and rereading the Feldmann paper that first proposes a FIS tree and trying to make sense of their space/time complexities. I got no further...
Anyway, I might not have been able to play with code, but I had a couple of remarks and ideas about it.
First, I use a list of numbers stored with each treenode: the rule id's of all rules that match the referenced node.
But there's a problem with it. Rules change and so do the numbers...
If I insert a rule at the top of a chain, it will be rule 1 and all other rules will shift 1 position. To keep the rules in sync with the tree, I'd have to update all numbers in that tree which would be enormously inefficient.
So, instead of using numbers, I know use a list of void pointers. Maybe they will point to the original rule, or to another structure that contains the rule ID.
Another idea I had originated from my job. I maintain a firewall with 9000 rules (all listed in a bash script... no my idea)
From time to time, a network administrator will ask to add or remove a rule, list all rules that apply to his network, or claim that the firewall blocks certain traffic.
Debugging a firewall that size is madness. Especially because the rules are not really structured and lots of filtering is done.
I had the idea to create a tool that filters all rules for a certain subnet, from the ruleset. Listing all rules that apply to a certain subnet would not be a problem then. Debugging a network problem could be done in the same way by looking which rules apply to a certain packet traversing the firewall. And adding/removing rules... well I've wanted to restructure the entire firewall anyway. A tool to do structured queries would be a nice thing to have when doing that (verifying that the new firewall does the same filtering as the old)
Querying a single IP address is easy when you have a segment tree. Just descend down the matching nodes of the tree and add all matching rule numbers to a list.
Querying an entire subnet is less easy. Matching a /17 network for example would be the same as matching 32000 IP's. Of course it would be stupid to do it like this while we can abuse the treestructure.
(The picture shows the dataset under test and the query was for 18.104.22.168/16)
There are 2 cases to consider: the subnet we're looking for has an exact match in the tree, or it doesn't.
If it has an exact match, then we can just descend the matching nodes of the tree and add the relevant rules untill we reach the subnet-node (just as we would do for an IP). And because everything under the subnet-node is by definition a part of the subnet we are looking for, we can add ALL rules under that subnet node.
If there is no exact match (like in the picture, the queried subnet would be located where the red arrow points), we need to look for the subtree under the last matching node, attached to the correct branch. The treenode we find there will be a part of the subnet we are querying for and we can add all it's rule numbers.
While implementing this I ran into a bug. In my dataset 127.0.0.0/8 and 10.0.0.0/8 were added to the tree under 0.0.0.0/0. This results in a redundant node 0.0.0.0/1 at which point both 127/8 (01111111) and 10/8 (00001010) can be added in different branches. My code added 127/8 under branch 0 and 10/8 under branch 1: the exact opposite of what it should have been.
I tracked the bug down to a piece of the code looking like this:
branch_a = ...;
branch_a = ...;
Just to show how much a typo can screw things up :)