Adventures in Data Enrichment

by An Trotter March 2, 2026

iManage announced its use of AI for document enrichment at ConnectLive in 2022. I was delighted to be invited to ConnectLive 2023 to discuss the ongoing fruits of this collaboration at Hearst, along with Jack Lees and Nachum Sokolic, in a session moderated by Robert Florendine, Knowledge Engineer at iManage.

Our legal department is seeking to solve issues with search, primarily to retrieve language that fits a particular use case, but also to retrieve a specific document. Contributing factors include: (1) users failing to select or apply the correct metadata field values, or doing so inconsistently; and (2) users not using the recommended search criteria or search type. We wanted to leverage AI to apply metadata values consistently and to assist with retrieval.

The beta project offered an opportunity to leverage automated document classification, data point extraction, and metadata generation to assist with search.

Following a successful proof of concept (POC), we migrated to the RAVN search engine (shout-out to Tony Chermsirivatana, who managed the cleanup and migration of our 400K-document dataset). iManage then conducted client search interviews with team members representing basic, advanced, and non-standard use cases. That gave us rich user stories and analysis to draw on for our work going forward.

iManage summarized the results and provided recommendations that we are now jointly addressing. One component is a recently launched pilot of document classification and a new advanced search capability.

As is typical in many organizations, the team has for a number of years sought, but failed to reach, a consensus on how best to categorize and tag documents. By leveraging an existing model and clearly defining and communicating that model to our users, we hope to develop the consistent approach that has otherwise eluded us. This will enable us to surface the information we are looking for, and not just the records we ourselves have classified as individuals.

Our hypothesis is that, while individuals might make different choices from the algorithm, an objective standard that everyone can rely upon will satisfy most. The goal is a taxonomy one accepts as a given, like the Dewey Decimal System. iManage’s work with SALI is an important step towards reaching a similar standard.

In having more meaningful metadata associated with documents in Work, we are surfacing new information. It will be interesting to see what insights are generated as a result of that additional data. As curated generative AI emerges, we anticipate that pulling new insights may well accelerate. As Robert said, “the information experience is shifting.”

In legal ops, our goal is to help legal professionals operate at the top of their practice and to reduce time spent on ministerial work. As RAVN develops to provide an intuitive and robust way to categorize and retrieve data quickly, our professionals can focus on the quality of their work product and on producing it more quickly.

We are getting ready to release our first set of iManage use principles, addressing matter creation, repositories appropriate to different use cases, document naming and versioning protocols, and precedent or knowledge repository guidance. I suspect it will be an iterative process, but I am hopeful it will put us on the right road.

If you are an iManage customer, you can view a recording of the session (Thursday, June 8th at 11:30 AM CDT) at ConnectLive 2023.

Slide reading "ConnectLive 2023: What is Data Enrichment and Why It Matters"