Big Data & Brews: Informatica on Database Schemas

By Stefan Groschupf

My chat with Informatica’s Anil Chakravarthy touched on the subject of database schemas, ETL and dynamic mapping. With the growing number of data sources and complexity, Anil argues that a purely static schema has only limited use and that flexibility is critical. He also points out that technology doesn’t have to provide the perfect answer, but it should save time, which to me, is the most valuable asset.

Enjoy the next episode of Big Data & Brews!


Stefan: What’s your perspective? As you said, there’s a growing number of data sources and more insights shape up as you’re enriching more the data. Is it really hard to define the static schema that we used to do?

Anil: Yeah, absolutely. Let me actually, just because it’s good conversation, I’ll start with the other extreme because the schema discussion usually goes from either …

Stefan: Black to white.

Anil: Yeah. Either everything is fixed or nothing is fixed. As you mentioned earlier, when I was at Symantec, one of the businesses, product groups, that I ran was data loss prevention, the DLP business. There, there is no schema. It’s basically, how do you un-structure data, especially over email? Somebody might be sending social security numbers, etc. What do you do in those cases? DLP became a very successful category by just having essentially regular expressions. That you look for certain data.

Has that been enough? Clearly not because you look at what’s going on in the world of breeches etc. It’s necessary, but not sufficient. That’s what has shaped our world view. You don’t want to insist on schema everywhere. There will be many, many types of data where you can do perfectly good processing without schema. That is not sufficient by itself. Even in the world of security that we’ve been talking about now, like we just talked about, you need to understand metadata. You need to understand what is valuable data. You cannot combine it with other schemaless… You might, for example, have SharePoint documents where you’ll never get any schema, but they still contain valuable information in order to protect data and process it. You …read more

Read more here:: Datameer CEO Blot


Leave a Comment

Your email address will not be published. Required fields are marked *