Data can be grouped into three categories:
- Structured Data is normally organized into rows and columns. The most popular forms of structured data are SQL (relational) databases and Excel spreadsheets. These relational databases usually sit behind the existing applications that run the business.
- Semi-Structured Data is not organized in rows and columns the way fully structured data is. Instead, it is organized using metadata or tags. An example of this would be JSON. At EMC we use JSON as the default format for data returned by REST API calls to our storage arrays.
- Unstructured Data is files and rich media. It is made up of documents, images, music, movies, etc.
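To make the three categories concrete, here is a minimal Python sketch using only the standard library. The table name, column names, and sample values are made up for illustration, and the byte string is simply the standard 8-byte PNG file signature standing in for an image file.

```python
import json
import sqlite3

# Structured: a fixed set of columns, and every row has the same shape.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.execute("INSERT INTO customers (name, city) VALUES (?, ?)", ("Acme", "Boston"))
structured_row = conn.execute("SELECT id, name, city FROM customers").fetchone()

# Semi-structured: keys (tags) describe the values, and records can nest
# or differ in shape from one another.
semi_structured = json.loads(
    '{"name": "Acme", "contacts": [{"email": "ops@example.com"}]}'
)

# Unstructured: opaque bytes with no rows, columns, or tags to query.
# (Hypothetical stand-in: the first 8 bytes of any PNG image.)
unstructured = b"\x89PNG\r\n\x1a\n"

print(structured_row)
print(semi_structured["contacts"][0]["email"])
print(len(unstructured), "bytes")
```

The structured row can be queried by column, the JSON record can be walked by key, and the image bytes can only be stored and handed to something that understands the file format.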
The biggest challenge with Structured Data for a lot of my customers is that their management tools (normally a SQL database) do not scale easily. A lot of startups I work with use a DevOps model to get their software to market faster. The tools for managing structured data are normally not conducive to the agile development methodologies that a lot of new companies have already embraced. Traditionally, companies used a waterfall development cycle, which could take months or years to produce any results. Now small teams work in agile sprints, iterating quickly and pushing code every week or two, some even multiple times a day. This is a big reason why GitHub went from opening its doors in 2008 to becoming a multi-million-dollar company and “the spot” for software engineers on the web.
The hardest part of dealing with semi-structured data is that each type can use a different schema, so there is never going to be a “one size fits all” mold for the different semi-structured formats, like XML and JSON. Both are commonly used in Java, but they are very different. XML is a markup language that can add extra information to free-flowing plain text, whereas an object notation like JSON is used to represent objects. JSON looks more like the data structures used in programming languages (like Java). Both are excellent examples of semi-structured data, but they are used completely differently.
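The difference between markup and object notation is easier to see side by side. The sketch below uses Python's standard library rather than Java to keep it short; the sample sentence, the tag names, and the `sku` attribute are invented for illustration.

```python
import json
import xml.etree.ElementTree as ET

# XML as markup: tags annotate pieces of a free-flowing sentence.
xml_text = "<note>Ship the <item sku='A1'>array</item> by <date>Friday</date>.</note>"
root = ET.fromstring(xml_text)
sentence = "".join(root.itertext())   # the plain text with the tags stripped
sku = root.find("item").get("sku")    # extra information carried by the markup

# JSON as object notation: the same facts as nested key/value pairs,
# which parse directly into dictionaries and lists.
json_text = '{"item": {"sku": "A1", "name": "array"}, "due": "Friday"}'
order = json.loads(json_text)

print(sentence)
print(sku, order["item"]["sku"], order["due"])
```

In the XML version the sentence survives intact and the tags add meaning on top of it. In the JSON version there is no sentence at all, only an object whose fields a program reads by name.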
Unstructured Data is growing exponentially, with capacity doubling every few years. Over 80% of data in corporations is unstructured, according to IDC. If not secured correctly, all of this growth can create a lot of risk for companies. Security is an ever-growing concern for businesses, and more data means more to protect. Another challenge with unstructured data is that traditional SQL databases have no way to manage it.
“There’s also no way, using a relational database, to effectively address data that’s completely unstructured or unknown in advance.” –MongoDB
It requires a different method of management than traditional applications can provide. This is paving the way for applications like MongoDB to enter the market in a big way. These are just some of the many challenges that the variety and volume of data are creating in the industry.