Software code audit: Know what's in your software

by Kamal Hassin, Protecode

This vendor-written primer has been edited by ITworld, but readers should note it will likely favor the submitter's approach.

Software is rarely written from scratch. Resourceful software development organizations and developers use a combination of previously created code, commercial software and open source software, and their own creative content to produce the desired software product or functionality. And so anytime software changes hands there is a need to understand its composition, its pedigree, its ownership, and any third-party (including open source software) licenses or obligations that govern its use by its new owners.

A software code audit (not to be confused with a software audit, which generally has to do with making sure you have paid for the software applications you are using in your organization) identifies the building blocks (files or software modules or packages, or even five lines of external code) that are used in a product or exist in the code inventory of an organization.

The audit process establishes code ownership, authorship, package versions, export restrictions, and any licensing or copyright obligations around third-party content in the code portfolio. Software code audits can also highlight alignment with an organization's policies on the use or delivery of software, and pinpoint code reuse between different portfolios within or across organizations.

A common mistake is to start a code audit only in the last step of a transaction. Starting the audit in anticipation of a transaction allows for timely correction of any shortcomings detected during the audit. You certainly do not want to delay a transaction because of uncertainties uncovered during the audit.

What you need to know before you start

• the objectives of the audit, to understand the company or product being audited
• the specific business of the target company
• its third-party software practices
• the software environment used in the target company
• its open source adoption policy (if any)

In some cases, not all of the code to be audited is in one place, or it must first be "assembled."

Depending on the size of the project, this part of the audit process can take 1-5 days.

Software code scanning and detection

Once the legal framework is in place, the code is available, and the environment discovery process is complete, an automated scanning application is set up. The complete job is broken into logically meaningful segments (for example, identifiable subprojects and modules), and then the code scanning is carried out.
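As a rough illustration of breaking a job into segments, the sketch below groups file paths by their top-level directory. It assumes that top-level directories correspond to subprojects, which will not hold for every portfolio; the function name and the "(root)" label are made up for this example and do not come from any particular scanning product.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def segment_by_top_level(paths):
    """Group relative file paths by their first directory component.

    Assumption: each top-level directory is one subproject or module.
    Files that sit directly in the root go into a "(root)" segment.
    """
    segments = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        key = parts[0] if len(parts) > 1 else "(root)"
        segments[key].append(path)
    return dict(segments)


if __name__ == "__main__":
    example = ["ui/app.js", "ui/lib/grid.js", "core/main.c", "README"]
    for name, files in segment_by_top_level(example).items():
        print(name, len(files))
```

Each segment can then be scanned and reported on separately, which keeps the results for a very large portfolio reviewable.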

The scanning application will generate ownership warnings, such as proprietary code without appropriate headers or copyright information, or conflicting license information.
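The two kinds of warning mentioned above can be sketched in a few lines. This is a deliberately simplified example, not how any commercial scanner works: it assumes that a copyright notice appears within the first 20 lines of a file and that licenses are declared with SPDX-License-Identifier tags. Real tools also compare code against databases of known open source projects. More than one license tag in a file is not always a conflict (some code is dual-licensed), so the sketch only flags it for review.

```python
import re

# Assumed conventions: a "Copyright (c) ..." or "Copyright 2012 ..." notice,
# and SPDX-License-Identifier tags for license declarations.
COPYRIGHT_RE = re.compile(r"copyright\s+(\(c\)|©|\d{4})", re.IGNORECASE)
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def check_file(text, header_lines=20):
    """Return a list of warning strings for the text of one source file."""
    warnings = []
    header = "\n".join(text.splitlines()[:header_lines])
    if not COPYRIGHT_RE.search(header):
        warnings.append("missing copyright header")
    licenses = set(SPDX_RE.findall(text))
    if len(licenses) > 1:
        warnings.append(
            "multiple license identifiers: " + ", ".join(sorted(licenses))
        )
    return warnings


def scan(files):
    """files maps path -> file text; returns path -> warnings for flagged files."""
    flagged = {}
    for path, text in files.items():
        warnings = check_file(text)
        if warnings:
            flagged[path] = warnings
    return flagged
```

Files that come back flagged are candidates for manual review by the audit staff rather than confirmed problems.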

The reports created by the automated solution are reviewed by the audit staff, and a final executive report is assembled. Depending on the size of the audit project, this step can take as little as a couple of days (small project containing thousands of files) and up to two weeks (for a very large portfolio of hundreds of thousands of software files).

End results of a software code audit

The end result of a software code audit is a combination of two reports.

The first is a high-level executive overview report that is custom created by the audit staff. This report defines the software code audit environment, the process used, and the major findings, in simple graphical and tabular format. Attention is drawn to specific packages, files or licenses. Information on commercial or open source software components, a description of what each piece of software does, who created it, and related references on public-domain project websites should be provided.

Important information such as copyright owners, licenses associated with the discovered software packages, and optionally encryption or export obligations, is tabulated. The text of all licenses that are discovered is included with this report. The report lists all external content, including complete third-party software files, modules or projects, or snippets of code that have a code structure similar to known open source projects. The findings of a software code audit must be verifiable; therefore references or hyperlinks to all information that is discovered should be provided.

The second report is a detailed machine-generated report, listing all packages, files, licenses, copyrights, etc. associated with all software files in the target portfolio, and, optionally, a license obligation report, summarizing the obligations associated with all licenses found in the portfolio.
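To make the shape of such a machine-generated report concrete, the sketch below rolls hypothetical per-file scan records up into counts per license and a list of detected packages. The record fields ("path", "package", "license") are assumptions made for this example; actual report formats vary by tool.

```python
from collections import Counter


def summarize(records):
    """Roll per-file records up into a portfolio-level summary.

    records: list of dicts with hypothetical keys "path", "package"
    (None when no known package was matched) and "license".
    """
    license_counts = Counter(r["license"] for r in records)
    packages = sorted({r["package"] for r in records if r["package"]})
    return {
        "files": len(records),
        "licenses": dict(license_counts),
        "packages": packages,
    }


if __name__ == "__main__":
    example = [
        {"path": "lib/z/inflate.c", "package": "zlib", "license": "Zlib"},
        {"path": "lib/z/deflate.c", "package": "zlib", "license": "Zlib"},
        {"path": "src/main.c", "package": None, "license": "Proprietary"},
    ]
    print(summarize(example))
```

A license obligation report would build on the same per-license grouping, with the obligations themselves supplied by legal review rather than by code.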

How much does a software code audit cost?

Generally the cost of an audit is proportional to the complexity of the project, which in turn can be roughly defined as the number of files in the target portfolio, the nature of the packages (commercial or public domain) used in the portfolio, and the information that is available about those packages. Most audits (from thousands up to hundreds of thousands of software files) fall within a $5K-$40K range.

If you’re planning a specific transaction involving software assets, whether it’s an M&A, equity investment, product introduction, demand for IP indemnity, commercialization of research or other event, conduct a software code audit as early as possible in the transaction. Knowing what’s in the code can speed up transaction times and reduce costs associated with fixing problems at the last minute.

Kamal Hassin is Director of R&D and Product Management at Protecode.
