By Cameron Sim, CEO, Crewspark Inc
Heartbleed's Fast Response (describing the current security data community)
The Heartbleed security bug (http://heartbleed.com/), a vulnerability in the very widely adopted security library OpenSSL, was identified in April 2014. Although the bug was estimated to affect most of the world's connected computers, a fix was identified and rapidly adopted within a few days.
The IT security community has matured over a number of decades and continues to promote constant evolution and innovation through an open community and standard delivery channels. New security breaches will continue for the foreseeable future, but a fabric to share and deliver solutions across the globe now exists. Within Big Data, and specifically at the confluence of data science, analytics and IT engineering, similarly telling trends are beginning to surface.
Innovations in Big Data (How are big data and data science evolving)
As the IT and data analytics communities evolve and find further value across academia, business and other sectors, operational and commercial standards are sure to emerge that will shape how organizations interact in the future. Today, the process of one company sending data to another (by FTP or another method) is inefficient and unreliable; it will be superseded by widely adopted standards that ensure the format and quality of the underlying data, amongst other metrics. Organizations will not only be able to connect swiftly with each other to transfer analytics data, but also take part in an analytics network with operational efficiency similar to that of the financial markets.
Automated Data Science, with Standards
At a similar pace, greater automation in analytics data processing, enabled by maturing data science practices (as well as the need to constantly reduce TCO within quantitative environments), will focus on storing and organizing data for reusability and traceability from processing and reporting. How data is collected, stored, manipulated, versioned, processed and reconciled back to its original source is central to achieving a mature and efficient data management platform, now and certainly in the future.
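As a rough illustration of reconciling processed data back to its original source, the sketch below fingerprints a data set before and after each processing step and checks that the chain of fingerprints is unbroken. It is a minimal sketch under stated assumptions, not a description of any existing platform: the `Lineage` class, the `fingerprint` helper, the step names and the sample rows are all hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field


def fingerprint(rows):
    """Return a stable SHA-256 digest for a list of JSON-serializable rows."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


@dataclass
class Lineage:
    """Hypothetical audit trail: one entry per processing step."""
    source_digest: str
    steps: list = field(default_factory=list)

    def record(self, name, before, after):
        # Store digests of the data going into and coming out of the step.
        self.steps.append(
            {"step": name, "input": fingerprint(before), "output": fingerprint(after)}
        )

    def traces_to_source(self):
        """True when every step's input matches the previous output, back to the source."""
        expected = self.source_digest
        for step in self.steps:
            if step["input"] != expected:
                return False
            expected = step["output"]
        return True


# Made-up sample rows, for illustration only.
raw = [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": None}]
lineage = Lineage(source_digest=fingerprint(raw))

cleaned = [r for r in raw if r["amount"] is not None]
lineage.record("drop_missing_amounts", raw, cleaned)

typed = [{**r, "amount": float(r["amount"])} for r in cleaned]
lineage.record("cast_amount_to_float", cleaned, typed)

print(lineage.traces_to_source())
```

If a step were recorded against data that did not come from the previous step, `traces_to_source()` would return `False`, which is the kind of signal a reconciliation process could act on.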
Another sure advancement in Data Science is the commoditization of algorithms, especially accuracy-based logic such as prediction and recommendation. Algorithms that infer insight from data have inherent value, and that value can be quantified in terms of accuracy. A shareable algorithm might also record its assumptions, biases and other metadata that add further context.
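One way to picture a shareable algorithm with its accuracy and context attached is a small metadata record that travels with it. The sketch below is only an assumption about what such a record could look like; the `AlgorithmCard` class, its field names and the example values are hypothetical placeholders, not measured results or an established standard.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AlgorithmCard:
    """Hypothetical metadata published alongside a shareable algorithm."""
    name: str
    version: str
    task: str                 # e.g. "prediction" or "recommendation"
    accuracy: float           # fraction between 0 and 1
    evaluated_on: str         # description of the evaluation data
    assumptions: tuple = ()
    known_biases: tuple = ()

    def __post_init__(self):
        # Reject accuracy values that cannot be a valid fraction.
        if not 0.0 <= self.accuracy <= 1.0:
            raise ValueError("accuracy must be a fraction between 0 and 1")

    def to_json(self):
        """Serialize the card so it can be exchanged between organizations."""
        return json.dumps(asdict(self), sort_keys=True)


# Placeholder values, for illustration only.
card = AlgorithmCard(
    name="churn-predictor",
    version="1.0.0",
    task="prediction",
    accuracy=0.87,
    evaluated_on="hold-out sample of historical customer records",
    assumptions=("customers are billed monthly",),
    known_biases=("under-represents customers with under 3 months of history",),
)

print(card.to_json())
```

Keeping the record immutable and serializable means the same description can be versioned, compared and exchanged along with the algorithm it describes.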
The modular and reusable approach to data and algorithms will usher in new standards of automated data science, fostering rapid innovation, enabling easier adoption and promising greater data-sharing opportunities for every academic, data scientist or organization.
Paramount to any future data marketplace is the ability to ensure a standardized data format. Although government initiatives like www.data.gov have made great strides in making public data sets available, many services still expose vastly different formats and reporting tools, and much more needs to be done to ensure proper standardization and adoption of public data. Central to any global initiative is a governing body that defines and advocates the standards to which participants must conform. At present, no organization or group has both the global footprint and the community adoption needed to play that role for such a network.
A Corporate Data Marketplace
In the new world of analytics-driven organizations, and with a sustaining governance model in place for unstructured or procured data, marketplace interactions at the institutional level will enable even the smallest adopters to subscribe to, buy or sell analytics data with any other organization. It is also inevitable that the common practices of data science will transition from manual data preparation towards greater levels of automation and, in time, towards a collaborative and inclusive network that fosters far greater levels of innovation than have been achieved to date. As in the IT security industry, standardization and collaboration lead to unification through a robust community, one that can be established once critical mass is achieved in Big Data maturity.