
Use Software Reliability Growth Model (SRGM) for Residual Defects Estimation

Summary:

It is often observed that the testing (and hence defect-fixing) cycle gets elongated during product development. The manager is asked questions like, "How long will testing go on?", "How many defects still remain in the software?", "How do we predict the reliability of the current software?" At such times it is better to use statistical methods to take informed decisions, rather than guessing from past experience. The method described here is based on the Non-Homogeneous Poisson Process (NHPP) model and the Confidence Interval (CI). It uses historical sample test data to predict how many residual defects remain in the software system and how many more days of testing are needed to achieve at least a 95% confidence level.

Keywords:

Software Reliability Growth Model, SRGM, Residual Defects, Test Cycle, NHPP, Confidence interval

Description:

A Software Reliability Growth Model (SRGM) attempts to correlate defect-detection data with estimated residual defects over time. Knowing the residual defects, informed decisions can be taken about releasing the code. For such analysis, we need a breakup of actual test cases executed and actual defects found, preferably day-wise, till date.

When the cumulative defect count is plotted against time, we get a concave curve. This curve should flatten out as time progresses; that is the sign of growing reliability of the software. If the curve is not flattening out, or is zigzagging, the software is clearly nowhere near reliable. The Goel-Okumoto (G-O) model curve is concave in shape.

The other type of defect-prediction model is based on "Defect Density" and "Software Size"; it is described in another article of mine. In this article, let us focus on the reliability model based on Goel-Okumoto and the Confidence Interval (CI). This model is based on parameters such as cumulative defects, amount of testing (in days) and statistical techniques (G-O and CI).

The amount of testing done can be measured in calendar time as well as in number of tests run. In this example let us use calendar time (number of days) as the parameter.

Goel-Okumoto Residual Defects

The following steps will help arrive at the residual-defect estimate.

Step #1: Prepare the data as per the table shown in the Software Reliability Growth Model figure. Ensure cumulative defects and the rate of change are computed. Note the values of total cumulative defects (a), test-case efficiency or rate of defect detection (b) and current time (t).

Software Reliability Growth Model

In the above example, we have used the current cumulative defects, 825, as parameter "a" and a test-case efficiency of 0.27 as parameter "b". We want to predict the defects on the 25th day. Using the formulas below for µ(t) and CI_Res, we predict 880 overall defects at the 95% confidence level and find that another 10 days of testing would be needed.

Step #2: Compute µ(t), the predicted defects till date, using the Goel-Okumoto model

Use the formula µ(t) = a(1 − e^(−bt)), a > 0, b > 0, to get the predicted defects till date, where

µ(t) = current predicted defects, using the Goel-Okumoto model,

a = initially taken as the total defects detected till date,

b = the rate at which the defect rate decreases, i.e. the rate at which defects are detected.

Confidence Standard Distribution

Step #3: Compute CI_Res, to get the estimated defects with a 95% confidence interval

Let us use a 95% confidence level to predict defects, given by CI_Res = µ(t) + Z_(α/2) * SQRT(µ(t)). The letter α represents the total shaded area outside the interval and α/2 represents the shaded area in each tail of the normal distribution. The 95% confidence level, represented by Z_(α/2), therefore corresponds to 1.96.

A Confidence Interval (CI) gives an estimated range of values from a given set of sample data. The confidence level is the probability associated with a confidence interval, expressed as a percentage (%). Usually the confidence interval is placed symmetrically around the mean, i.e. the estimate is assumed to be normally distributed. Since we are estimating residual defects, a wider confidence interval gives a higher probability that our prediction falls in the right range. Hence a 95% confidence level is the most frequently used and a more reasonable choice than 99% or 90%.

Confidence Interval Probability

Step #4: Compute the additional days required to find these defects.

To predict the additional days required to find these defects, invert the Goel-Okumoto formula: t = −LN(1 − µ(t)/a)/b, where LN is the natural logarithm (the minus sign makes t positive, since the logarithm of a value below 1 is negative). All other parameters are the same as above.
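For readers who want to reproduce the numbers, here is a minimal C++ sketch of Steps #2 to #4 using the example values from Step #1 (a = 825, b = 0.27, t = 25). Feeding the CI estimate back in as the updated total when computing the remaining days is my reading of the worked example above, so treat that part as an assumption.

#include <cmath>
#include <cstdio>

// Goel-Okumoto mean value function: predicted cumulative defects at time t.
double goel_okumoto_mu(double a, double b, double t) {
    return a * (1.0 - std::exp(-b * t));
}

// Upper limit of the 95% confidence interval on mu(t), per the CI_Res formula above.
double ci_res(double mu, double z = 1.96) {
    return mu + z * std::sqrt(mu);
}

// Days of testing needed to detect 'defects' out of an expected total 'a'
// (the Goel-Okumoto formula inverted for t).
double days_to_detect(double defects, double a, double b) {
    return -std::log(1.0 - defects / a) / b;
}

int main() {
    const double a = 825.0;  // total defects detected till date (Step #1)
    const double b = 0.27;   // test-case efficiency / detection rate
    const double t = 25.0;   // current day

    double mu = goel_okumoto_mu(a, b, t);  // ~824 predicted defects till date
    double ci = ci_res(mu);                // ~880 defects at 95% confidence

    // Assumption: treat the CI estimate as the updated expected total and ask how
    // long it takes to detect the defects found so far -- this reproduces the
    // "another 10 days" figure quoted in Step #1.
    double extra_days = days_to_detect(mu, ci, b);

    std::printf("mu(t) = %.0f, CI_Res = %.0f, additional days = %.1f\n",
                mu, ci, extra_days);
    return 0;
}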

Conclusion: This article assumes that defects originate within the software boundary and hence are not influenced by hidden failures due to external environmental factors. Where such factors exist, the reliability model needs to be adapted to account for external influences. Use this article to estimate residual defects or to validate reliability models. Do contact me for additional information or a working spreadsheet.


AGILE metrics: Use minimal and meaningful ones

Summary:

Many organizations, especially those with a dedicated Quality Assurance department, recommend a range of AGILE metrics to be tracked. Just by looking at the list, one gets confused about where to start or what the minimal required metrics are. This is true for beginners as well as proficient users of AGILE. So here is my list of recommended metrics for AGILE projects.

Keywords:

AGILE, Metrics, Sprints, Burn-down chart, Sprint backlogs

Description:

Sprint burn-down chart: Use it to compare the remaining effort to deliver the work with the remaining effort-capacity of the team for a given sprint. If the chart, when extrapolated, indicates that it is not possible to complete the work, then consider de-scoping work items or applying time-boxing. This is similar to the effort-deviation analysis done in traditional project execution models.

Agile Metrics Burn Down Chart

Sprint Backlog: This is a measure of progress across sprints: whether the team is delivering work items after each sprint as planned. Obviously, the team should aim at minimizing the backlog. An increasing trend in the backlog should be a cause of concern for the project manager. It may happen that the backlog grows so much that an additional sprint is required to address it.

Agile Metrics Sprint Backlog

Velocity: It is a measurement of how much a team can deliver in each sprint. It is also useful for finding the loading of the team members. The velocity of the team is impacted by scope change, team-member changes, tool changes, etc.

Agile Velocity

Defect Backlog: It is a measurement of how well each sprint contains its planned content. Defects can be thought of as functionality spilling over into the next sprints.

Agile Defect Backlog

Co-relation of metrics

Now let us understand how these metrics relate to each other. For example, a higher defect backlog in the current sprint leads to lower delivered functionality, which increases the sprint backlog and lowers velocity. Similarly, a normal burn-down chart combined with an increasing sprint backlog is a contradiction in itself. A drop in team velocity should show up in the burn-down chart and result in an increase in the sprint backlog. For a healthy project, the defect backlog should decrease sprint after sprint; a constant or increasing defect backlog with no sprint backlog should be considered a contradiction. Project stakeholders should understand these dynamics and question the project status based on the relationships between the metrics.

As a next step, rules can be set based on organization norms to make the tracking of AGILE projects more quantitative.
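As an illustration of such rules, here is a minimal C++ sketch; the metric structure and the two checks are my own assumptions about how the contradictions above could be encoded, not a prescribed standard.

#include <cstdio>
#include <vector>

// Per-sprint snapshot of the metrics discussed above (illustrative only).
struct SprintMetrics {
    int sprint;
    double velocity;       // work delivered in the sprint (e.g. story points)
    int sprintBacklog;     // planned items not delivered by sprint end
    int defectBacklog;     // open defects at sprint end
    bool burnDownOnTrack;  // does the extrapolated burn-down reach zero in time?
};

// Flag the contradictions described in the paragraph above.
void checkConsistency(const std::vector<SprintMetrics>& history) {
    for (size_t i = 1; i < history.size(); ++i) {
        const SprintMetrics& prev = history[i - 1];
        const SprintMetrics& cur = history[i];
        if (cur.burnDownOnTrack && cur.sprintBacklog > prev.sprintBacklog)
            std::printf("Sprint %d: burn-down looks normal but the sprint backlog grew\n", cur.sprint);
        if (cur.defectBacklog >= prev.defectBacklog && cur.sprintBacklog == 0)
            std::printf("Sprint %d: defect backlog is not shrinking despite an empty sprint backlog\n", cur.sprint);
    }
}

int main() {
    std::vector<SprintMetrics> history = {
        {1, 30.0, 0, 5, true},
        {2, 30.0, 2, 6, true},  // backlog grew although the burn-down looked fine
    };
    checkConsistency(history);
    return 0;
}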


Test Estimation: Quantitative Approach

Summary

Test estimation and test planning are the most neglected areas of project planning. People usually put testing effort at 30% of the overall project or development effort; even the testing team is sized at 30% of the development team. This is so high-level that test tracking becomes meaningless. Let us try to go beyond these high-level estimates and use some statistical methods to arrive at the test cases to be written, the defects to be found and the number of test iterations required. We are targeting Java, C++ and C# based development projects.

Keywords: Test planning, defect density, test estimation

Description

The main concept is to use defect density and test-case efficiency to get quantitative data for test estimation. Here the driving factor is defect density per unit size. Some organizations measure density per KLOC of code (generally in the C-language world); others may use defect density per Function Point (FP) or per Use Case Point (UCP). We will use defect density per UCP. As per CMMi norms the defect density comes to 2.7 defects per UCP, or 1.8 defects per FP, or 0.02 defects per LOC (20 defects per KLOC). Given the density, we can arrive at the total defects to be captured during project execution: simply multiply UCP * 2.7 defects/UCP to get the total number of defects.

After getting the total number of defects (still called estimated defects), let us get the estimated test-case count. How does one get this? Let us use another metric called Test-Case Efficiency, which is normally 40%, though it can vary from company to company. Test-case efficiency tells us how efficient the test cases are at finding defects. When we say 40% efficient, it means that out of 100 test cases, 40 are capable of finding defects in the first round of the QA cycle. Dividing the total number of defects by the test-case efficiency gives the total number of test cases required. Using the productivity of test-case development and test-case execution, one can then arrive at the test effort. Here is a sample guideline.

Test Estimation

The last item in planning is: how many QA cycles will be required? Let us start again with the total number of defects. Assuming all other parameters are controlled, 20% of defects will be re-opened or newly introduced; this gives us the count for the 2nd QA cycle. Likewise, iterating on this count gives the total cycles required for the planning phase.
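To make the arithmetic concrete, here is a minimal C++ sketch of the calculation described above. The project size of 300 UCP and the "stop when less than one defect is expected" rule are assumptions for illustration; the 2.7 defects/UCP density, 40% test-case efficiency and 20% carry-over figures come from the text.

#include <cstdio>

int main() {
    const double ucp = 300.0;               // assumed project size in Use Case Points
    const double defectsPerUcp = 2.7;       // CMMi norm quoted in the article
    const double testCaseEfficiency = 0.4;  // 40% of test cases find a defect in round 1
    const double carryOverRate = 0.2;       // 20% re-opened or newly introduced per cycle

    double totalDefects = ucp * defectsPerUcp;                  // 810 estimated defects
    double totalTestCases = totalDefects / testCaseEfficiency;  // 2025 test cases

    std::printf("Estimated defects:    %.0f\n", totalDefects);
    std::printf("Estimated test cases: %.0f\n", totalTestCases);

    // QA cycles: keep iterating while the expected carry-over is still significant.
    double remaining = totalDefects;
    int cycles = 0;
    while (remaining >= 1.0) {
        ++cycles;
        std::printf("QA cycle %d: ~%.0f defects expected\n", cycles, remaining);
        remaining *= carryOverRate;  // 20% of this cycle's defects spill over
    }
    std::printf("Planned QA cycles: %d\n", cycles);
    return 0;
}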

Though this method assumes CMMi metrics and quality levels, it can act as a guideline for the team. I would encourage people to use it as a starting target. The main benefit is granular tracking of the testing phase.


Software Architecture: In Product Lines Way

Summary:

Most organizations produce families of similar products or systems, differentiated by features. For example, the mobile phone vendor Nokia produced about 25 products per year (over a period of up to 4 years) which were sold in many countries with multiple language support, multiple protocols and varying sizes and features. Apple is another example, where the iPod and iPhone shared many reusable components. It has to be a business and technical strategy "to employ reuse" of components, in hardware as well as in software.

As a result of reuse, the efficiency and productivity of teams improve, which directly translates into quick and sustainable revenue over time. This leads to the concept of "Software Product Lines" (SPL), where a software product is built with strategic reuse of components and assets. The reusable components are built once and "maintained" over a long period, while varying features are "developed" specific to a product. In other words, the products are built on top of a core platform of components. The potential reuse can happen in architecture, design, development, testing and also in people. In this article, we will focus on reuse in software architecture.

Keywords: Software Architecture, Software Product Lines, SPL

Description

The Software Product Lines concept focuses on "planning for change" (what will change from product to product) and "separating the reusable" (what will remain as core components). In other words, there should be a clear demarcation between the common and the variable parts of the products. The software architecture should define concrete places where variation can be inserted. The unit of variation can be thought of as a "feature" of the product. A functional view of SPL is shown in the diagram below: products share a common platform and can strategically choose parts.

SPL Functional View

Let us look at the steps required to identify the features that will vary, i.e. the variants. One technique is to ask ourselves whether a feature needs to be inherited, plugged in through an interface, or parameterized. The answer decides which object-oriented design mechanism applies and also the applicable design pattern (from GoF). Once a variation is identified, create an abstraction for it.

For reusable components, as they say, each reusable component must be usable first. A platform of reusable components should be able to adapt to different product needs. Such components, once developed, go into maintenance mode, and products are built over them.

Let us build the software architecture for an online hotel and travel booking company as an example. The company needs an air-travel product and a hotel-booking product. The air-travel product may support different payment methods (credit, debit, etc.), may need to sell insurance and may award mileage points. These are specific features of air travel, so they are called variations. However, both products need a similar user-authentication mechanism along with the same session-management techniques. These parts go into the platform over which the air-travel and hotel products are built.

Now let us move on to the design patterns (from GoF) for building the Platform and the Features. The Strategy design pattern (a behavioral pattern) is well suited for implementing the Platform and Features. As seen in the UML diagram below, the Platform Manager is implemented as a façade; using this pattern, products can present a uniform face.

To extend the platform capabilities further, session management can be implemented in multiple ways, e.g. cookie-based or HTTP session management. The content management (CMS) server can be implemented with Drupal or Liferay as the content provider. Similarly, logging can be done using log4j or any other proprietary mechanism. Multiple products can share the same platform, while each product team focuses on its specific feature development.
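A minimal C++ sketch of this idea follows; the class and method names are illustrative assumptions rather than the article's UML. Each platform capability is a Strategy interface with interchangeable implementations, and a PlatformManager façade hides the chosen strategy from the products.

#include <iostream>
#include <memory>
#include <string>

// Strategy interface for the session-management capability of the platform.
class SessionStrategy {
public:
    virtual ~SessionStrategy() = default;
    virtual void startSession(const std::string& user) = 0;
};

// Interchangeable implementations: cookie-based and HTTP-session-based.
class CookieSession : public SessionStrategy {
public:
    void startSession(const std::string& user) override {
        std::cout << "Cookie-based session for " << user << "\n";
    }
};

class HttpSession : public SessionStrategy {
public:
    void startSession(const std::string& user) override {
        std::cout << "HTTP session for " << user << "\n";
    }
};

// Facade: products talk only to the PlatformManager, never to the concrete strategies.
class PlatformManager {
public:
    explicit PlatformManager(std::unique_ptr<SessionStrategy> session)
        : session_(std::move(session)) {}
    void login(const std::string& user) { session_->startSession(user); }
private:
    std::unique_ptr<SessionStrategy> session_;
};

int main() {
    // The air-travel product picks one strategy; the hotel product could pick another.
    PlatformManager airTravelPlatform(std::make_unique<CookieSession>());
    airTravelPlatform.login("traveller42");
    return 0;
}

The payment-gateway and logging choices discussed next would become further strategy interfaces behind the same façade.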

Similarly, for the payment gateway, the air-travel product may want to support only credit-card and debit-card transactions, while the hotel product may want to support net banking. Such decisions can be left to the product managers as they develop features. With this as a guideline, an architect can implement the SPL concept with a team of experts. Refer to the detailed SPL UML.

Following are a few points to be aware of, especially ones which will impact the success of this concept:

  • Lack of an expert in a position of control and visibility during implementation
  • Reluctance of the team to accept the need for change
  • Lack of support from senior management
  • Failure to train the actual team on the basics of the approach

Benefits

This approach improves time-to-market for new product development, allowing product managers to focus on product-specific features. Internal teams can be organized and aligned: one team maintaining the platform and another developing products. This approach also leads to better build and release engineering practices.

I am sure that the SPL concept can be implemented using alternate frameworks and patterns. This article only gives guidance to build SPL in a language-neutral way. It is certainly worth attempting!

References

  1. http://www.sei.cmu.edu/productlines/
  2. “Software Architecture in Practice” 2nd edition – L. Bass, P. Clements, R. Kazman.

UCP Estimation Method: A Different Approach

Summary:

The Use Case Point (UCP) estimation method does not yield correct results for very high-level requirements, because it considers only transactions as the basis for arriving at the complexity of each use case. Here a transaction means a round-trip sequence of steps or tasks to be performed for a given use case. With recent industry trends such as social connectivity, mobile-device connections, analytics and cloud platforms, product complexity has increased many-fold. For a single transaction to complete, multiple technologies, domains and servers need to interact. Hence the real complexity of a use case is not reflected through transactions alone; one should take into account the number of systems involved along with the transactions for a given use case. A purely transaction-level view at this high level thus undermines the complexity involved.

Keywords:

Use case point, Software Estimation, UCP

Description:

The suggestions given in this article are best suited when the requirements are available only at a very high summary level or at user-goal level. We are not considering estimation where more granular use cases at sub-feature or sub-function level are available; in that situation the standard (transaction-based) UCP is better suited. This article assumes that the reader knows the UCP estimation method.

The standard UCP estimation method derives use-case complexity from the number of transactions, as shown in Table 1. After counting the Simple, Average and Complex use cases, the weight factors are multiplied to arrive at the Unadjusted Use Case Weight (UUCW). With this, it is observed that the distribution does not follow a normal distribution (bell curve), as simple use cases tend to outnumber the others, which does not happen in practice.

Table 1: Standard UCP Complexity Table

To get a more realistic estimate, it is recommended to add the number of systems involved into the complexity decision. Following the guidelines given in Table 2, one can arrive at a more normal distribution of use cases. A "system" can be thought of as a component or sub-system with a clearly defined interface for the exchange of data, for example a payment gateway, content management server, database, file server or ERP. A single transaction which requires data exchange with, say, three such sub-systems is bound to be complex in nature.

Table 2: Suggested UCP Complexity table

Here a use case is simple if both the number of transactions and the number of systems involved equal 1; average if either the number of transactions or the number of systems involved equals 2; and complex if either the number of transactions or the number of systems involved is 3 or more. One reading of these rules is sketched in the code below.
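A minimal C++ sketch of this classification follows. The exact cut-offs live in Table 2 (not reproduced here), so the thresholds below are my reading of the prose and should be adjusted to the actual table; complexity is checked from complex down to simple so that the strongest rule wins.

#include <cstdio>

enum class Complexity { Simple, Average, Complex };

// Classify a use case from its transaction count and the number of systems it
// touches. Thresholds are an assumption based on the prose above, not the
// authoritative Table 2.
Complexity classify(int transactions, int systems) {
    if (transactions >= 3 || systems >= 3) return Complexity::Complex;
    if (transactions == 2 || systems == 2) return Complexity::Average;
    return Complexity::Simple;  // both counts are 1
}

int main() {
    struct { const char* name; int tx; int sys; } useCases[] = {
        {"View booking",   1, 1},  // simple
        {"Search flights", 2, 1},  // average
        {"Pay and ticket", 2, 3},  // complex: touches gateway, DB, ERP
    };
    const char* labels[] = {"Simple", "Average", "Complex"};
    for (const auto& uc : useCases) {
        std::printf("%-14s -> %s\n", uc.name,
                    labels[static_cast<int>(classify(uc.tx, uc.sys))]);
    }
    return 0;
}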

After this step, the rest of the UCP estimation follows the standard process. With this, the UCP estimate will be more practical and accurate. Do try it out.

-Sunil


Polynomial Hash Function for dictionary words

Summary:

A simple hash function for dictionary words adds up the ASCII values of the characters. However, it is not good enough, as many words have the same sum, which results in a lot of collisions. The alternative is to use polynomial coefficients. I assume readers are aware of simple hashing techniques.

Keywords: Hash, Polynomial hash function, Hash index

Description:

The main objective of this article is to find out how effective a polynomial hash function is, by measuring its load factor and collision ratio. Generally, dictionary words are hashed by adding up their ASCII values: first add the values to get a cumulative sum, then take its modulo with the table size. But this method causes a lot of collisions because of the simple addition. The polynomial hash instead weights each character by an increasing power of a coefficient (a prime number). Let us start by representing a word by its ASCII values, so consider a word STR = a0, a1, a2, a3, …, an-1 as an example. To compute the polynomial hash index, we use the formula

Hash Index = a0*X^0 + a1*X^1 + a2*X^2 + a3*X^3 + … + an-1*X^(n-1)

X is a coefficient taken from a prime number such as 11, 31, 37, 41, etc. For the experiment below, I have taken 11.

We will make the hash index more dynamic by considering it as a function of its constituents, so we add the cumulative sum and the word length into the formula as shown below.

Sum = ∑ ai, for i = 0 to n-1 (cumulative sum)

Key = (mod(Hash Index, Sum) * String Length) mod Table Size

The C++ code for generating the polynomial index follows.

Polynomial Hash Index Source
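Since the original listing appears here only as an image, below is a minimal C++ sketch reconstructing the formulas above (coefficient X = 11, table size 503). It is an illustration of the described method rather than the author's exact code; intermediate overflow is simply allowed to wrap, which is harmless for hashing.

#include <cstdio>
#include <string>

const unsigned long long X = 11;           // prime coefficient from the article
const unsigned long long TABLE_SIZE = 503; // hash table size used in the experiment

// Polynomial hash key: Key = (mod(HashIndex, Sum) * length) mod TableSize,
// where HashIndex = sum of a_i * X^i and Sum = sum of a_i (ASCII values).
unsigned long long polynomialKey(const std::string& word) {
    unsigned long long hashIndex = 0, sum = 0, power = 1;
    for (unsigned char c : word) {
        hashIndex += c * power;  // a_i * X^i (wraps on overflow for long words)
        sum += c;                // cumulative ASCII sum
        power *= X;
    }
    if (sum == 0) return 0;      // guard against empty strings
    return ((hashIndex % sum) * word.length()) % TABLE_SIZE;
}

int main() {
    const char* words[] = {"apple", "apply", "hash", "table"};
    for (const char* w : words)
        std::printf("%-6s -> bucket %llu\n", w, polynomialKey(w));
    return 0;
}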

I wrote C++ code to build a hash table of size 503 and tried to hash 750 dictionary words, using the load factor and collision ratio as measurements of efficiency. The load factor (LF) = items/size increases linearly, as seen in the graph. Let us then plot the collision ratio (CR) = #collisions/items, which measures how many of the total words cause collisions.

Polynomial hash function graph

Observations:

From the above graph, a couple of very important observations are as follows.

1) We achieved a collision ratio of 0.4 when the table was 100% full (i.e. about 500 words), meaning 60% of the words could be inserted without any collision. This is a fairly good distribution.

2) When the table was loaded beyond its capacity (here 1.5 times), the slope of the collision ratio did not change. Refer to the collision ratio between 500 and 750 words on the X-axis: it holds its projected line instead of producing disproportionately more collisions.

3) From the graph below it is seen that the words are distributed uniformly across the 503 buckets. The R-squared value is 0.016 (the square of the correlation coefficient). Since it is much closer to 0 than to 1, we can say there is no linear relationship between the input and the hash value.

Polynomial Hash Distribution

The trend line is also observed to decline slightly as the number of inputs grows, indicating a uniform distribution of words.

Other applications of hashing include compilers using hash tables to implement symbol tables, game programs using hash tables to keep track of board positions, and online spelling checkers.

-Sunil

 


Understanding BACnet Protocol

Introduction:
BACnet, or Building Automation and Control network, is an ASHRAE (American Society of Heating, Refrigerating and Air-Conditioning Engineers) standard. It is a modern, complex and powerful communications protocol. Open-source implementations are available that support BACnet over RS-232, ARCNET, IP (BACnet/IP) or plain Ethernet.
This protocol defines data communication services for computer equipment used for monitoring and control of HVAC&R systems. It provides a comprehensive set of messages for conveying encoded binary, analog and alphanumeric data between devices.

The objective of this article is to give an overview of the BACnet protocol and provide details about adding new services to an open-source protocol stack. For more details about the protocol, refer to the BACnet specification.

Keywords: BACnet protocol, Building Automation

Description:

The BACnet protocol is based on a four-layer collapsed architecture which corresponds to the physical, data link, network and application layers of the OSI reference model (as shown in the figure).

BACnet Stack

The source code can be downloaded from http://sourceforge.net/projects/bacnet

Application Layer

The general layout of the application-layer protocol control information is shown in the figure below.

BACnet App Layer

The PDU Type field is always present and indicates the service primitive used. Six groups of services are possible: Confirmed Service, Unconfirmed Service, Segment Acknowledge, Reject, Abort and Error, each expressed by a different PDU Type number.
The Confirmed Service is used to guarantee delivery: each request carries a unique Invoke ID, and the response confirms the request using this number. The Invoke ID is omitted when the Unconfirmed (unacknowledged) Service is selected in the PDU Type field. Segmented messages are indicated by the PDU flags. Finally, the Reject, Abort and Error services provide error detection and recovery.

Objects

Objects model control devices (as in the OO concept). An object is a data structure that stores device information. All objects are referenced by their Object Identifier property; thus Device ID + Object Identifier provides a unique mechanism for referencing every object in the control-system network (a small sketch follows at the end of this section). Each object type has a different set of properties; refer to the BACnet specification for more details. Services (methods) are provided to access these properties.

BACnet Object

Some of these objects are ANALOG-INPUT, ANALOG-OUTPUT, BINARY-INPUT and BINARY-OUTPUT. A read access of an input object's Present-Value property returns a value that represents the state of a hardware-dependent analog or binary input. A write access to an output object's Present-Value changes the state of a hardware-dependent analog or binary output.
Most of the objects and properties are encoded inside the PDU to ensure secure control of the system. To encode these objects, properties and their values inside the PDU, a special combination of tags is used.
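To make the Device ID + Object Identifier referencing concrete, here is a small illustrative C++ sketch; the enum values and field names are placeholders, not the spec's encodings.

#include <cstdio>

// Illustrative only: the real object-type codes and identifier encoding are
// defined in the BACnet specification, not here.
enum class ObjectType { AnalogInput, AnalogOutput, BinaryInput, BinaryOutput, Device };

// Device ID + Object Identifier (type + instance) uniquely names an object
// anywhere on the control-system network, as described above.
struct ObjectReference {
    unsigned deviceId;   // instance number of the owning Device object
    ObjectType type;     // e.g. ANALOG-INPUT
    unsigned instance;   // instance of that object within the device
};

int main() {
    ObjectReference presentValueSource{1001, ObjectType::AnalogInput, 3};
    std::printf("device %u, object type %d, instance %u\n",
                presentValueSource.deviceId,
                static_cast<int>(presentValueSource.type),
                presentValueSource.instance);
    return 0;
}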

Services

Services are the set of functions used by the protocol service provider and the protocol service user. The services provided by the application layer are divided into six groups. BACnet defines confirmed application services based on a client-server communication model: the BACnet-user that assumes the role of the client is called the "requesting BACnet-user" and the BACnet-user that assumes the role of the server is called the "responding BACnet-user". The client-server model does not apply to unconfirmed services.

BACnet_Services

A client requests a service from a server via a particular service-request instance/number; the server provides the service and responds to the request. The Invoke ID is used to track the service request.

Alarm and Event Services

Sometimes it is necessary to inform an operator that a particular event has occurred, e.g. an alarm. This group of services provides functions to notify a remote device when a specified value has changed or a specified event has occurred, to acknowledge a notification, and to subscribe to a list of notified remote devices.

If a remote user wants to be notified when the value of a specified property of a particular object changes, it has to invoke the subscribeCOV service, which adds this remote user to a list. When the value of this property changes, the device itself invokes the COVNotification service for each remote user in the subscriber list. This service can be sent as a confirmed or unconfirmed message, as chosen by the subscriber. For notification of any other event, the device uses the Event Notification service. This service can also be sent as a confirmed or unconfirmed message, and some events, e.g. those of type alarm, may require an operator acknowledgment, which is conveyed using the Acknowledge Alarm service. Finally, it is possible to get a summary of every event-initiating object of a specified device using getEnrollmentSummary; the objects can be selected by filters that define search criteria. For a summary of active alarms only, i.e. only objects in an abnormal state with notification type alarm, the getAlarmSummary service is adequate and faster.

File Access Services

For accessing objects of type File there are two services available: the atomicReadFile service and the atomicWriteFile service. These perform an open-read-close operation and an open-write-close operation, respectively, on the contents of the specified file. The file can be accessed as records or as a stream of octets.

Object Access Services

This group provides services to access and manipulate the properties of objects. The value of one specified property of an object can be requested with the readProperty service; this function allows read access to any property of any object. Modification of property values is conveyed by the writeProperty service. An object property that is a list requires special functions to add and remove one or more list elements; these are provided by the addListElement and removeListElement services. It is possible to create new instances of an object using the createObject service; the behavior of such created objects depends on the implementation and on the device itself, and initial values for the object's properties may be provided with this function. Existing objects can be deleted using the deleteObject service. A typical example is the Event Enrollment object, which contains the information required for managing events.

Remote Device Management Services

The who-Is service is used to determine all devices and their network addresses; the devices respond using the i-Am service and identify themselves to the requesting device. The I-Have service is the response to the who-Has service, but it can also be used to advertise the existence of a given named object or object identifier. The Time Synchronization service is used to notify a device of the correct current time, so that the device may synchronize its internal clock with other devices.

Virtual Terminal Services

The maintenance and advanced diagnostic features of an operating device are supported through these services. BACnet defines three services to establish a VT session, exchange character-oriented data and terminate the session: the VTOpen service establishes a VT session, the VTData service transfers data, and the VTClose service terminates a previously established session or a list of sessions.

Network Security Services

Network security can be achieved by exchanging and requesting 56-bit Data Encryption Standard (DES) cryptographic keys. A common session key can be obtained from a specified key server using the requestKey service.

Architecture of the BACnet Stack – Single Threaded

The following section describes the open-source protocol stack and gives an understanding of how a given request travels through the various code segments. Example: a client requests a file read.

BACnet Design Flow

How to add a new service

The following steps give a brief overview of adding a new service to BACnet. Before going into the details, the pre-requisites are as follows.
Pre-requisite #1: You must have a good understanding of Clauses 12, 13, 20 and 21 of the BACnet specification. It is a must.
Pre-requisite #2: You should know the architecture of the source code.
Steps 

1) Copy the source code of a similar service. For writing an unconfirmed service, copy the source code of an existing unconfirmed service. The source code for a given service ideally consists of 4 files, for example:
a. utm.c – encoding and decoding of the APDU
b. utm.h – data structures and prototype declarations
c. h_utm.c – receive-handler code. This is a callback function which decodes the APDU; further IDU-ESMS-specific processing should be done here.
d. s_utm.c – send API for the initiator; calls the functions that encode the APDU.
2) Then use the detailed description of the service entities given in Clause 21 of the BACnet standard. The trick is reading the ASN.1 and figuring out when to use context or application tags and when to use opening and closing context tags.
3) If the entity is listed with brackets [], it is context tagged. There are separate functions for encoding application tags and for encoding context tags. Obviously, if encoding is done using context tags then decoding must be done using context tags.
4) If there are no brackets, it is application tagged.
5) If there is a complex data type (like ABSTRACT SYNTAX), it is wrapped in opening and closing context tags.
6) Everything ends up being encoded as primitive data types using functions from bacdcode.c. This file contains encoding and decoding functions for all types of data, e.g. unsigned, string, enumerated, etc.
7) The header file should contain the structure definition as per Clause 21; this clause also defines the types of the variables within that structure.
8) After writing the send and receive APIs, register (add) the receive handler using "apdu_set_confirmed_handler" or "apdu_set_unconfirmed_handler" (or the corresponding error handlers), as sketched after these steps.
9) Then start unit testing your code.
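Here is a minimal sketch, in the spirit of the stack's C sources, of what the skeleton for a hypothetical unconfirmed service ("utm") might look like. The registration call commented at the end is the one named in step 8; the service-choice value, structure layout and raw memcpy encoding are simplified placeholders, so use the real encode/decode helpers from bacdcode.c and the actual handler signatures from the stack version you download.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical payload for the new unconfirmed service (would live in utm.h). */
typedef struct {
    uint32_t device_instance;
    uint8_t  status;
} UTM_DATA;

/* s_utm.c: encode the service payload into an APDU buffer (simplified --
   a real implementation would use the encode_* helpers from bacdcode.c). */
int utm_encode_apdu(uint8_t *apdu, const UTM_DATA *data) {
    int len = 0;
    apdu[len++] = 0x10;              /* PDU type: Unconfirmed-Request (illustrative) */
    apdu[len++] = 0x77;              /* hypothetical service choice for "utm" */
    memcpy(&apdu[len], data, sizeof(UTM_DATA));
    len += (int)sizeof(UTM_DATA);
    return len;                      /* number of octets written */
}

/* h_utm.c: receive handler, invoked by the stack when this service arrives. */
void handler_utm(uint8_t *service_request, uint16_t service_len, void *src) {
    (void)src;
    UTM_DATA data;
    if (service_len >= sizeof(UTM_DATA)) {
        memcpy(&data, service_request, sizeof(UTM_DATA));
        /* ...service-specific processing goes here (file c above)... */
    }
}

/* At start-up, register the handler as described in step 8. The function name is
   from the article; its exact signature and the service enum vary by stack version:
   apdu_set_unconfirmed_handler(SERVICE_UNCONFIRMED_UTM, handler_utm); */

int main(void) {
    uint8_t apdu[64];
    UTM_DATA out = {1234, 1};
    int len = utm_encode_apdu(apdu, &out);
    handler_utm(&apdu[2], (uint16_t)(len - 2), NULL);  /* skip the 2-octet header */
    printf("encoded %d octets\n", len);
    return 0;
}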

With this, the reader should be able to get started with BACnet development and testing.
