Critical Incident Management: 4/8/12

It is not unusual for an organization to simply purchase and install technology without consideration of its active role in the rest of the IT infrastructure. The end result is often a system with greatly limited technical functionality requiring significantly greater effort to integrate with other systems or major process changes to compensate.

There are numerous ways that a VM system can provide data to maximize the benefits to security and operations. Among those systems to be considered are change and incident management. These systems are commonly found in mid-size to large organizations trying to maturely and consistently extract higher performance from IT services.

1 Change Management

Change management is a critical component of the remediation process. As previously discussed, when a vulnerability is found, the details can be sent automatically to a change management system to initiate a change process.

Interfacing the VM system with change management is never as straightforward as vendors tend to suggest. Some custom development is almost always needed, typically to convert data types, formats, and communication methods. The vendor’s product may deliver SNMP traps for newly discovered vulnerabilities, yet the change system may require XML or use an e-mail listener process. Someone will inevitably have to code an interface between these two completely different technologies. Whatever the interface method, the following common data elements are exchanged:

vulnerability details sent to change system,
vulnerability event identifier sent to the change system,
change status update sent to vulnerability system once remediation is complete, and
reopening or re-creating the change if the vulnerability is still present.
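
As an illustration of the kind of glue code this involves, the following is a minimal sketch, assuming the change system accepts and returns XML. The element names (changeRequest, vulnEventId, changeStatus), the event identifier, and the function names are hypothetical; a real change system defines its own schema.

```python
import xml.etree.ElementTree as ET


def build_change_request(event_id, host, details):
    """Serialize a vulnerability finding as an XML change request.

    Element names are hypothetical; a real change system has its own schema.
    """
    root = ET.Element("changeRequest")
    ET.SubElement(root, "vulnEventId").text = event_id
    ET.SubElement(root, "host").text = host
    ET.SubElement(root, "details").text = details
    return ET.tostring(root, encoding="unicode")


def parse_status_update(xml_text):
    """Read the change system's status update back into a plain dict."""
    root = ET.fromstring(xml_text)
    return {
        "event_id": root.findtext("vulnEventId"),
        "status": root.findtext("status"),
    }


# Outbound: vulnerability details plus the event identifier.
request_xml = build_change_request("EVT-1001", "10.0.0.12", "Example finding")

# Inbound: the status update sent once remediation is complete.
update = parse_status_update(
    "<changeStatus><vulnEventId>EVT-1001</vulnEventId>"
    "<status>complete</status></changeStatus>"
)
```

The event identifier travels in both directions so that the VM system can match a status update to the original finding.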

2 Incident Management

As with change management, incident management may be part of the operational support system portfolio in your organization. The interfacing issues are similar and will vary by process. In some cases, organizations prefer to handle changes in a change system and reserve incidents for a very specific set of circumstances.

On the other hand, it is not unusual to generate an incident for tracking the vulnerability and remediation process, and then to use the change management system to track only the impact on systems, resources, and processes when that change is made.

In either case and as previously mentioned, the completion of a change will initiate either the closure of an incident or the immediate notification of completion to the VM system. The VM system will then reassess the target to determine successful compliance.
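
The closing loop described here can be sketched roughly as follows. This is only an illustration under stated assumptions: the ticket fields, the action names, and the still_vulnerable callable (a stand-in for a VM rescan of the target) are all hypothetical.

```python
def on_change_completed(ticket, still_vulnerable):
    """Return the follow-up actions after a change is reported complete.

    ticket: dict with 'target' and, optionally, 'incident_id' (hypothetical fields).
    still_vulnerable: callable(target) -> bool, standing in for a VM rescan.
    """
    actions = []
    if ticket.get("incident_id"):
        # An incident tracks the remediation, so completion closes it.
        actions.append(("close_incident", ticket["incident_id"]))
    else:
        # No incident: notify the VM system directly.
        actions.append(("notify_vm_system", ticket["target"]))

    # Either way, the VM system reassesses the target.
    if still_vulnerable(ticket["target"]):
        actions.append(("reopen_change", ticket["target"]))
    else:
        actions.append(("mark_remediated", ticket["target"]))
    return actions


fixed = on_change_completed(
    {"target": "10.0.0.12", "incident_id": "INC-7"}, lambda target: False
)
not_fixed = on_change_completed({"target": "10.0.0.13"}, lambda target: True)
```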

3 Intrusion Prevention

Some vendors have attempted to integrate their products with IPSs only to find that IPS vendors have dreams of competing against the VM vendor. This has led to a few failed integration attempts that would otherwise have benefited the customer greatly. If you are lucky enough to have an IPS that is compatible with vulnerability data from a selected vendor, then by all means take a hard look at the benefits, keeping in mind that many obstacles remain.

Standards of format compatibility are a significant obstacle. Although we discuss standards at length in this book, few vendors fully support them in the IPS world, or the vulnerability world, for that matter. At this time, the two industries are so far apart in interoperability that only a demanding customer base will be able to influence change. However, the basic idea is that if a new vulnerability is discovered on a target system, then the appropriate upstream IPS will be notified to activate the signature that would protect the asset until it is properly remediated.

There are two major benefits to this type of integration. First, a vulnerability is protected until full remediation can be completed, which lowers the overall dynamic vulnerability level in the environment. Second, the IPS optimizes its performance since only the necessary rules are activated above the standard policy implementation. This is particularly important when a very expensive IPS is heavily loaded on a busy DMZ segment and a hardware upgrade does not offer sufficient cost–benefit.
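
The rule-selection idea can be sketched as a simple set calculation. This is a minimal sketch, assuming a lookup table from vulnerability identifiers to IPS signature identifiers exists; the identifiers and the function name below are placeholders, not any vendor's format.

```python
def extra_signatures(open_vulns, vuln_to_signature, base_policy):
    """Signatures to activate on top of the standard policy.

    open_vulns: vulnerability IDs currently unremediated on downstream assets.
    vuln_to_signature: hypothetical mapping from vulnerability ID to signature ID.
    base_policy: signature IDs already enabled by the standard policy.
    """
    needed = {vuln_to_signature[v] for v in open_vulns if v in vuln_to_signature}
    return needed - set(base_policy)


mapping = {"VULN-1": "SIG-100", "VULN-2": "SIG-200", "VULN-3": "SIG-300"}
base = {"SIG-100"}

before_fix = extra_signatures({"VULN-1", "VULN-2"}, mapping, base)
after_fix = extra_signatures({"VULN-1"}, mapping, base)  # VULN-2 remediated
```

Once VULN-2 is remediated, its signature drops out of the result, so the IPS carries only the rules it needs.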

4 SEIM

Security event and incident management (SEIM) integration is generally easier to accomplish. SEIM vendors make compatibility with myriad data sources an important selling point. The collection of data is their strongest suit and it is very likely that they will accept data from VM systems with little modification.

If your organization has an SEIM program, you would be remiss in not accepting this important data feed. Where IPS integration is not possible, the SEIM program can at least use the data to determine the severity of an incident and escalate it accordingly. If your SEIM vendor does not readily support at least one of the major VM products, then they have likely chosen poorly.
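
One way to picture this escalation is the sketch below. It assumes the SEIM can tell which vulnerability an observed attack exploits and that VM findings are available per host; the severity scale, field names, and identifiers are hypothetical.

```python
SEVERITY_LEVELS = ["low", "medium", "high", "critical"]


def adjusted_severity(event, open_vulns_by_host):
    """Raise an event's severity one level if the target is actually vulnerable.

    event: dict with 'host', 'severity', and 'exploits' (a vulnerability ID).
    open_vulns_by_host: dict mapping host -> set of open vulnerability IDs,
    as supplied by the VM system.
    """
    level = SEVERITY_LEVELS.index(event["severity"])
    if event["exploits"] in open_vulns_by_host.get(event["host"], set()):
        level = min(level + 1, len(SEVERITY_LEVELS) - 1)
    return SEVERITY_LEVELS[level]


vm_findings = {"10.0.0.12": {"VULN-2"}}
hit = adjusted_severity(
    {"host": "10.0.0.12", "severity": "medium", "exploits": "VULN-2"}, vm_findings
)
miss = adjusted_severity(
    {"host": "10.0.0.99", "severity": "medium", "exploits": "VULN-2"}, vm_findings
)
```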

Active Scanning Architecture

Beyond the basic architecture of active scanning and the bandwidth it consumes, one other key consideration is the management of several hardware devices. While this may seem trivial to an organization with hundreds or even thousands of servers, the staff maintaining the VM system is usually limited and requires unique training. They rely on resources of other locations and departments to maintain devices with which they are generally unfamiliar. Many of these devices have command lines, serial ports, and network requirements that may not be fully understood. Although many of the administrative responsibilities can be centralized and automated, there are inevitably malfunctions in the device or the environment that need to be corrected on-site.

For example, network connectivity can be lost at some point between the management server and the device itself. Despite all of the available tools, it may not be possible to determine the cause of the failure. A local engineer will have to check the physical connections, switch configuration, device power, and logical network configuration. This may involve plugging in a serial cable, configuring a terminal, logging in with local administrator privileges, and performing command-line functions. In all likelihood, the engineer has not done this in the six or eight months since the device was deployed. The VM operator will have to provide written or verbal instructions and receive some feedback. The use of a network KVM (keyboard video mouse) device is helpful but not perfect. The physical environment may still require inspection. If the network connection to the site is lost, then little can be done remotely.

Additionally, replacement of devices that have failed may be difficult but not for want of technical expertise. Some countries have import duties and restrictions on technologies that can extend the replacement cycle for months. Certain locations seem particularly unfriendly to commerce, particularly where technology is concerned. Russia, Venezuela, and even Mexico can be very resistant to receiving technology, to the detriment of their own citizens. It is even possible that final delivery in some locations may call for a small bribe to the delivery person. Ultimately, a virtual machine version of a product, if available, can be sent electronically and made operational overnight.

With an understanding of these issues, your plan will have to carefully consider the number of devices, location of each, skills of local personnel, languages, reliability of network, available bandwidth, power requirements, environmental conditions, customs procedures, and vendor presence and inventory. There are some clever ways to accommodate deficiencies in many of these components. Depending on your specific challenges, one or more of these strategies will help.

1 Determining the Number of Devices

Determine the number of devices required and include a growth factor, keeping in mind that a larger number of devices is simply more difficult to manage. Determine the number of networks, the strength of the network connections, and the number of targets. From this information, an estimate can be made of the total scanning load.

Then, estimate the amount of time required to scan a set of those targets with the candidate vendor’s product. A test of a scan under specific conditions provides the average time per host. Extensive testing is recommended since every environment is unique and will respond differently to the various approaches of vendors.

For example, a company has several networks in separate physical locations, as shown in Table 1. The table shows the bandwidth provisioned, utilized, and available at each site. To estimate accurately how long a network audit takes, two test scans of different sample sizes are run at each site, using the largest practical target samples. The difference between the two samples must be substantial enough that the difference in scan times is meaningful; that difference yields the average time required to scan a target. Some simple math then gives a close estimate of how much time is overhead for gathering and transmitting results to the reporting server. Following is the overall process for determining the number of required scanners:

Review the networks and select a representative sample of sites, WAN connection types, bandwidths, and host types. The type of WAN connection and bandwidth will impact the response time of scan activities and, on a large scale, the total time. Connections such as frame relay will show longer response times than higher speed Asynchronous Transfer Mode (ATM) circuits or dedicated private lines. Bandwidth will have an impact as well, but only up to a limit. Scanning activity is also constrained by device hardware and by protocol connection limits.
Select a sample size (S1) of targets to scan in the representative networks. This sample size should be at least 10 percent of the total number of hosts but not less than 10. A second sample size (S2) should also be taken that is larger than the first by at least 50 percent and not less than 20. This will assure that the sample size is sufficient to show a meaningful difference in the results.
Capture measurements to complete Table 1. In the table, calculate the time required to scan each target (TT) by subtracting the time to scan the first sample (T1) from the time to scan the second sample (T2), and dividing by the difference in the number of hosts in the two samples: TT = (T2 – T1) / (S2 – S1). This is the time required to scan a single host absent the overhead for gathering, formatting, and transmitting results.
Calculate the amount of time required for the scanning overhead (OH) mentioned earlier. The overhead is the amount of time required to audit the larger sample minus the time to scan all of the S2 targets: OH = T2 – (S2 * TT). The point is that there is a significant difference between the amount of time to scan targets and the time to complete all of the activities of the audit.
Extrapolate and estimate the time to scan the entire network (X) by multiplying the total number of hosts in the network (N) by TT and adding the overhead: X = (N * TT) + OH. The final column in Table 1 shows this number. The sum of these figures, converted to hours, shows the device time required if a single scanner were to audit all of the networks from a single location one after another. Although this scenario almost never happens with so broad a geographical distribution, it does provide a feel for the amount of device time required.
Decide on how often an audit of every target in the organization is needed. This will indicate the amount of time you have to allocate a single scanner for scanning targets and ultimately lead to the total number of scanners needed. A typical network should have audits performed at least weekly to be sure that newly discovered or reported vulnerabilities are captured on reports and that remediation activities are taking place in a timely fashion.
Create a proposed schedule for auditing. Since many targets will have to be in use during a scan, a schedule will have to be created with the information from the previous step. With a schedule, you can determine how many targets can be audited by a single scanner over a day or week. Note that you will have to consider local time, office customs, and target availability when making this schedule. Figure 1 shows a sample schedule. Looking at each time slot for audits will show how many audits must take place at a given time. This sample has four time slots per day each spanning a six-hour period on a Greenwich Mean Time (GMT) clock.

Figure 1: Audit schedule GMT chart.
Get the vendor’s recommendation on how many simultaneous audits can be performed by one device and how much impact that will have on scan performance. For example, a scan of network A may take 45 minutes when performed alone. But, when performed with another simultaneous scan of a network of equal specification, the scan may take 30 percent longer on average, assuming none of the network devices in between introduces more delay.
Estimate the number of devices needed. Referring again to Table 1, you can assess how many scanners are required by counting the number of simultaneous audits that must be performed in the busiest time slot and dividing by the number of simultaneous audits the vendor recommends per device. In some cases, you can reduce the number of scanners by rearranging the audit schedule. N.B.: Always leave room for error and growth. This arrangement shows that if the audits are performed from a central location, the work is probably manageable with a single device, provided the device can conduct two audits simultaneously without significant performance degradation.
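
The final step can be sketched as a small calculation. This is a rough sketch, assuming the schedule is a list of (network, time slot) pairs and that the vendor has supplied a per-device limit on simultaneous audits; the slot labels, network names, and the growth_factor parameter are illustrative assumptions.

```python
import math
from collections import Counter


def scanners_needed(schedule, audits_per_device, growth_factor=0.0):
    """Estimate scanner count from the busiest time slot in the schedule.

    schedule: list of (network, slot) pairs, one per planned audit.
    audits_per_device: simultaneous audits the vendor recommends per scanner.
    growth_factor: fractional headroom for error and growth (0.25 = 25 percent).
    """
    per_slot = Counter(slot for _, slot in schedule)
    peak = max(per_slot.values(), default=0)
    return math.ceil(peak * (1 + growth_factor) / audits_per_device)


# Hypothetical schedule: at most two audits share any one slot.
schedule = [
    ("HQ", "Mon 00:00 GMT"),
    ("Dallas", "Mon 00:00 GMT"),
    ("Hong Kong", "Mon 06:00 GMT"),
    ("Santiago", "Tue 00:00 GMT"),
]

no_headroom = scanners_needed(schedule, audits_per_device=2)
with_headroom = scanners_needed(schedule, audits_per_device=2, growth_factor=0.25)
```

Rearranging the schedule to lower the peak slot count is what reduces the result, which matches the advice above about negotiating time slots.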

Table 1: Scan Time Estimation Chart
LOCATION	TARGETS	BWDTH (KBPS)	UTIL (%)	AVAIL (KBPS)	AUDIT TIME, 10 TARGETS (MIN)	AUDIT TIME, 20 TARGETS (MIN)	TIME PER TARGET (MIN)	ESTIMATE, ALL TARGETS (MIN)
HQ	450	N/A	N/A	N/A	9	11.5	0.25	119.0
Dallas	260	2000	72	560	13	15	0.2	63.0
Hong Kong	241	4000	45	2200	12	16	0.4	104.4
Santiago	75	384	80	76.8	11	15	0.4	37.0
Mexico City	245	4000	52	1920	8	13	0.5	125.5
Chicago	325	4000	62	1520	11	14.5	0.35	121.3
Atlanta	175	1544	70	463.2	13	16	0.3	62.5
San Francisco	310	4000	55	1800	9	12	0.3	99.0
						Total Hours Required		12.2
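
The figures in Table 1 can be reproduced with a short calculation. The sketch below assumes the audit times are in minutes and that the two samples contained 10 and 20 targets, as the table indicates; the function and variable names are illustrative.

```python
def estimate_audit_minutes(total_hosts, s1, t1, s2, t2):
    """Return (time per target, overhead, full-network estimate) in minutes.

    s1, s2: sample sizes; t1, t2: measured audit times for those samples.
    """
    per_target = (t2 - t1) / (s2 - s1)   # scan time per host, without overhead
    overhead = t2 - s2 * per_target      # gathering, formatting, transmitting
    estimate = total_hosts * per_target + overhead
    return per_target, overhead, estimate


# (targets, audit time for 10 targets, audit time for 20 targets) from Table 1
sites = {
    "HQ": (450, 9, 11.5),
    "Dallas": (260, 13, 15),
    "Hong Kong": (241, 12, 16),
    "Santiago": (75, 11, 15),
    "Mexico City": (245, 8, 13),
    "Chicago": (325, 11, 14.5),
    "Atlanta": (175, 13, 16),
    "San Francisco": (310, 9, 12),
}

estimates = {
    name: estimate_audit_minutes(hosts, 10, t1, 20, t2)[2]
    for name, (hosts, t1, t2) in sites.items()
}
total_hours = sum(estimates.values()) / 60

for name, minutes in estimates.items():
    print(f"{name}: {minutes:.1f} min")
print(f"Total hours required: {total_hours:.1f}")
```

For HQ, for instance, the time per target is (11.5 – 9) / 10 = 0.25 minutes, the overhead is 11.5 – (20 * 0.25) = 6.5 minutes, and the estimate is (450 * 0.25) + 6.5 = 119 minutes.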

If it were determined that, due to operational constraints, the audit of “Chicago-Mfg” could only be performed on Wednesdays, then there may be contention for a time slot. Negotiating compromises among the operational requirements of each site can resolve such issues and further minimize the number of devices. Also note that server networks can often be audited on weekends and late evenings, which frees daytime slots for auditing user workstation networks.

Every network is unique and results may vary. Testing is an effective tool but not perfect in predicting the idiosyncrasies of networks and systems. Leaving some flexibility in requirements for adaptation is essential. Some advance planning and testing can save a lot of money when making a purchase. It will also enhance the credibility of the system and its operators in the eyes of the user population. Once the scanners are deployed, any top-end VM system should provide a means to report on scanner resource utilization, so the system manager can reallocate in a changing environment.

2 Select Locations Carefully

The location selection should be based on:

the number of targets to be scanned locally,
the bandwidth available to other adjacent networks,
the skill availability and skill level of the support staff for the location, and
the regulatory restrictions on IT issues such as privacy and union work rules.

Naturally, other factors such as shipping costs and taxes should also be taken into account. Basic deployment logistics such as power supplies, rack space, and network physical layout should be considered as well but are typically minor factors.
