System Agility, Data Agility

The term agility has become standard in the software industry for an organization's ability to modify its product quickly, generally in small iterative steps, in response to customer feedback, changes in the competitive landscape and so on. The agility of a software product can be measured as the latency between the motivation for a design change and the availability of that change to the user, moderated by some degree of quality assurance, regression testing and so on.

An agile engineering environment depends on deep investment in certain processes and rigour. Engineers must be able to build the software, run a battery of regression tests, rely on the semantics of an API because a strong suite of unit tests backs it, and so on.

That being said, there is another aspect of agility that is becoming more and more relevant: data agility. It is quite possible, and somewhat common, to build data processing systems that depend on some specific distribution of features in the input data. This is particularly the case with supervised machine learning systems. Given a set of inputs, the learning algorithm models the distributions in those inputs in order to set parameters that are then used at run time to make predictions. While you may have an agile engineering practice for the code, dependencies on the qualities of the input, and assumptions about it, can leave you unable to be agile with respect to the data.

Data agility is achieved either when the system is designed to be independent of certain qualities of the input data, or when there are well-defined processes, tests and analytical tools that radically reduce the time from identifying a new data source to shipping it in production.
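
As one illustration of the kind of test meant here, the following is a minimal sketch, not a recommended implementation. It compares one numeric feature in a candidate data source against the data the system was built on, using the two-sample Kolmogorov-Smirnov statistic (the largest gap between the two empirical distribution functions). The function names and the threshold of 0.2 are assumptions made for the example; a real system would choose its own features, statistics and limits.

```python
from bisect import bisect_right


def ks_statistic(reference, candidate):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the largest absolute gap between the empirical CDFs of the
    two samples: 0.0 for identical samples, 1.0 for fully separated ones.
    """
    ref = sorted(reference)
    cand = sorted(candidate)
    if not ref or not cand:
        raise ValueError("both samples must be non-empty")

    gap = 0.0
    # Both empirical CDFs are step functions, so the largest gap
    # occurs at one of the observed values.
    for x in set(ref) | set(cand):
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_cand = bisect_right(cand, x) / len(cand)
        gap = max(gap, abs(cdf_ref - cdf_cand))
    return gap


def check_new_source(reference, candidate, threshold=0.2):
    """Return (ok, statistic) for one feature of a candidate data source.

    The threshold is a hypothetical value chosen for illustration only.
    """
    stat = ks_statistic(reference, candidate)
    return stat <= threshold, stat


if __name__ == "__main__":
    existing = [1.0, 2.0, 3.0, 4.0, 5.0]
    shifted = [11.0, 12.0, 13.0, 14.0, 15.0]
    print(check_new_source(existing, existing))
    print(check_new_source(existing, shifted))
```

A check like this can run automatically whenever a new source is proposed, so that a mismatch in the input distribution shows up in minutes rather than after the data has reached production.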

System agility is not data agility, and aiming for data agility requires an upfront investment in tools specifically for that purpose.