Podcast Content Studio
Sales and Marketing Assets from Kitcaster™
June 18, 2021
Podcast / Host Links
Article Summary 📝
Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk
On June 18, 2021, Kirk Marple, the CEO of Unstruk Data, was interviewed by Tobias Macey on the “Data Engineering Podcast.” In this episode, Unstruk Data is described as a “data warehouse” for unstructured data. The company offers automated data preparation through metadata enrichment, integrated compute, and graph-based search. Marple, who founded the company during the Covid-19 pandemic, wanted to address the problem of organizing image-based data, an issue that spans diverse industries. Addressing the sheer volume of media data, Marple speaks to Unstruk Data’s mission to provide a layer of structure that improves organization and analysis for industries that accumulate and rely on image-based data. Unstruk Data aims to create an accessible zone for metadata that is often left in disorganized pools, using comments, tags, and geospatial and time logs to organize data in a way that can be tailored to each client’s needs.
In this episode, you will learn
- What Unstruk Data does for metadata organization.
- The thought process behind the founding of the company.
- How to understand what unstructured data is.
- The state of the industry that uses unstructured data.
- Unstruk Data’s workflow of ingesting and enriching unstructured data.
- The future of Unstruk Data.
Tobias Macey – “Because of the fact that some of these files could potentially be gigabytes in size, just for an individual object, I’m wondering how you handle things like…being able to manage any sort of transfers? Optimizing the processing and extraction of information from these files, so that it doesn’t explode your usage bill and end up causing you to be sort of upside down in terms of your profit model, and just sort of the overall complexities of dealing with these large and complex file objects.”
Kirk Marple – “Even if we’re taking data from, say, an S3 bucket, I mean, we do at the very least index it, and create thumbnails and things like that. So today, we’re actually using a caching model where we do bring it in, we archive it for a short amount of time. And then we have storage policies: we can either archive it permanently, we could put it into cold storage, or we just throw it away after we’re done processing…Everything is really managed in a streaming model. So, we don’t bring whole files into memory, anywhere. That’s a real key architectural choice, I think, to your point of not blowing up. We can deal with very large files…in the media world, you’re dealing with terabyte files sometimes. We don’t see it in the industrial use cases much, those file sizes, but we have had to re-architect around, say, like millions of points in a point cloud. Very large point cloud structures have been a critical design point that we’ve had to put a bit more thought into. It’s good, I have a great front-end team that thinks through ‘Okay, like when are we going to break a browser?’ ‘How much can we put into a browser and think about the memory side of it?’ I’m more of a back end, so it’s good to have that balance of people thinking about it from both sides of where your limitations are.”
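The streaming model Marple describes can be illustrated with a minimal Python sketch. This is not Unstruk's implementation; the function names (`iter_chunks`, `index_stream`) and the 1 MiB chunk size are assumptions made for illustration. It shows the general idea of indexing a large object by reading fixed-size chunks, so memory use stays bounded no matter how large the file is.

```python
import hashlib
import io
from typing import BinaryIO, Dict, Iterator, Union

# Hypothetical chunk size chosen for illustration (1 MiB per read).
CHUNK_SIZE = 1024 * 1024


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


def index_stream(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Dict[str, Union[int, str]]:
    """Compute a size and checksum for a stream without loading it whole.

    Only one chunk is held in memory at a time.
    """
    digest = hashlib.sha256()
    total = 0
    for chunk in iter_chunks(stream, chunk_size):
        digest.update(chunk)
        total += len(chunk)
    return {"size_bytes": total, "sha256": digest.hexdigest()}


if __name__ == "__main__":
    # An in-memory stream stands in for a file or object-store download.
    sample = io.BytesIO(b"example media bytes" * 1000)
    print(index_stream(sample, chunk_size=4096))
```

The same loop shape would apply to a file opened with `open(path, "rb")` or to any object that exposes a `read(n)` method.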
Tobias Macey – “I’m interested in digging a bit further into the actual data modeling aspects. And for industrial use cases, you might have some ontological concepts of, you know, ‘I have a physical location, I have a number of workers, I have equipment.’ I’m wondering if you can just talk through some of the data and domain modeling that goes into the Unstruk platform and how you think about the extensibility of that, and the workflow of specifying a sort of custom ontology for building this entity graph?”
Kirk Marple – “Yeah, this is really interesting, and this is where a lot of my background in the DSPs. I did a lot with audio metadata, pulling in data from all the different broadcast studios, trying to kind of correlate all the different music and album and all those different things together. And I started to apply similar methods. But you’re right, I mean, it doesn’t always mean it’s not as formalized in this domain as you might get like in an IMDb structure and…a kind of more Spotify-type music structure. We’ve taken our approach, and this is a sort of prescriptive approach, of, there’s a concept of a tag. So we kind of kept it simple that most everything maps to a tag in the system. From a user perspective, there can be user-generated tags that a human assigns to a piece of content, or machine-generated tags that, say, a machine learning computer vision algorithm produces or that we get through entity extraction. So initially, we’ve tried to map everything to basically a data model that is related to the schema.org model… So if they want to build their own data models, eventually, or I mean, extend with their own metadata, they can via our API. Out of the box, they get a really solid, robust, extensible data model that’s somewhat generic in the sense that with this tagging model.”
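The tagging approach described in this answer can be sketched as a small data model. This is a hypothetical illustration, not Unstruk's schema or API: the class names (`TagSource`, `Tag`, `ContentItem`), the `schema_type` field, and the example URI are all assumptions. It shows a single tag concept that records whether a tag was assigned by a person or by a machine, and which schema.org-style type it maps to.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class TagSource(Enum):
    """Who produced a tag: a person or an automated process."""
    USER = "user"
    MACHINE = "machine"


@dataclass(frozen=True)
class Tag:
    name: str
    source: TagSource
    # Hypothetical field: name of a schema.org type such as "Place" or "Product".
    schema_type: str = "Thing"


@dataclass
class ContentItem:
    uri: str
    tags: List[Tag] = field(default_factory=list)

    def add_tag(self, tag: Tag) -> None:
        """Attach a tag, ignoring exact duplicates."""
        if tag not in self.tags:
            self.tags.append(tag)

    def tags_from(self, source: TagSource) -> List[Tag]:
        """Return only the tags produced by the given source."""
        return [t for t in self.tags if t.source is source]


if __name__ == "__main__":
    item = ContentItem(uri="s3://example-bucket/site-photo.jpg")
    item.add_tag(Tag("crane", TagSource.MACHINE, "Product"))
    item.add_tag(Tag("needs-review", TagSource.USER))
    print([t.name for t in item.tags_from(TagSource.MACHINE)])
```

Keeping one tag type with a source field, rather than separate user and machine structures, mirrors the "most everything maps to a tag" idea while leaving room for custom metadata to be added as further fields.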
Additional topics discussed
- How Kirk Marple came upon this area for optimization.
- Parallel metadata problems between audio-visual industries and software engineering.
- When not to use Unstruk.
- Unexpected uses of Unstruk.
- Customer involvement and collaborative potential with Unstruk.
Tobias Macey is an engineer who currently manages the Technical Operations Team at MIT Open Learning. He also owns Boundless Notions LLC, where he offers advice on data infrastructure and cloud implementation. Macey noticed that, among podcasts about data science, few focused specifically on data engineering, and he set out to fill that gap. Data Engineering Podcast is meant to offer insight into new projects and platforms so that listeners stay up to date and in the loop. Macey also hosts the podcast “Podcast.__init__.”
Kirk Marple is the CEO of Unstruk. He began his career in media management and entertainment, owning a transcoding company. After selling it to diversify his prospects, Marple found inspiration in other areas of data management, such as the automotive industry. To Marple, the use of data for car imagery and closed captioning had parallels in “time-based telemetry.” However, Marple noticed problems with how unstructured data and metadata are stored in file-based formats, and decided to build a system to store, organize, and analyze metadata within pools of stored information.
Resources and links mentioned in the show