Approaches in using Generative AI for Enterprise Content Management

Allen Chan
Aug 1, 2023

Co-authors: Luigi Pichett, Pierre Feillet, Yazan Obeidi

See other related stories in the “Approaches in using Generative AI” series:

  1. Enterprise Content Management
  2. Decision Automation
  3. Workflow Automation Part 1
  4. Workflow Automation Part 2

Generative AI has been rapidly evolving, enabling different and more sophisticated interactions with Large Language Models (LLMs) like those available in IBM watsonx.ai. In this series of articles, we will take a use case based approach to look at how we can leverage LLMs together with existing automation technologies like Workflow, Content Management, Document Processing and Decisions to enable new solutions.

In this article, we will look at a popular LLM use case: leveraging LLMs to handle semantic question answering over information. Specifically, we will focus on how to apply the technique to information stored in an Enterprise Content Management system like IBM FileNet Content Manager. Many FileNet deployments hold millions or even billions of documents, and FileNet provides extensive search capabilities based on the metadata and textual content associated with each document. However, that search is syntactic: it is not aware of synonyms or abbreviations, nor can it summarize or simplify the results; it simply returns the documents that match the specified keywords. A flexible Q&A system that understands contextual questions and can summarize information from all that buried knowledge would be tremendously beneficial. In this brief article, we examine a few possible solution patterns that allow us to apply LLMs to support a more flexible semantic Q&A system over the documents in your repository.

We review four possible approaches to accomplish this use case by utilizing Generative AI interactions among a selected set of enterprise documents, an input query and an underlying LLM, each offering unique advantages and challenges.

Considerations

In designing a solution, we must consider several factors:

  1. Training an LLM can be expensive, requiring hundreds of GPUs for days or weeks. Assuming a document has 1,000 tokens (likely more), a million documents will contain 1 billion tokens, and the cost to fine-tune an existing LLM with 1 billion tokens is going to be very high [1] (see the back-of-envelope sketch after this list). While parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) [2] are a topic of active research, fine-tuning can still lead to undesirable outcomes such as catastrophic forgetting and bias towards the fine-tuning training set that corrupts pre-trained behavior [3], reducing multi-task performance and even performance on the fine-tuned task.
  2. Access control, data privacy, and sensitive data. Many documents in an enterprise content repository have strict access control and contain sensitive and private data. Care must be taken to make sure that information does not make its way into the LLMs unintentionally: there is no known way to ask an LLM to completely erase knowledge it has been trained on.
  3. LLMs are designed to process “language” data, not Excel, PDF, JSON, PowerPoint, or photos, so we need to understand how to “normalize” the information we want to feed to the LLM. This requires particular attention, as the choice of a “good” textual representation depends on the prompt and response behavior we want to achieve for a given task.
  4. Truthfulness and traceability: we want the model to answer questions based on facts stored in the ECM, not to make things up (“hallucinate”) by blending information from the ECM with its base foundation model. In addition, we must be able to trace the answers back to the original documents.
  5. Not all LLMs are created equal: they use different algorithms, different training data, different data weightings, and different pre- and post-processing. The quality of the answers therefore depends on picking the right LLM for the problem domain.
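As a rough illustration of the first consideration, here is a back-of-envelope sketch of how quickly token counts grow with repository size. The 1,000-tokens-per-document figure is the assumption stated above, and the script is purely illustrative.

```python
# Back-of-envelope estimate of the token volume involved in fine-tuning
# on an ECM repository. All figures here are illustrative assumptions.

def estimate_fine_tuning_tokens(num_documents: int, avg_tokens_per_doc: int = 1_000) -> int:
    """Total number of tokens the fine-tuning corpus would contain."""
    return num_documents * avg_tokens_per_doc

if __name__ == "__main__":
    for docs in (1_000_000, 100_000_000, 1_000_000_000):
        tokens = estimate_fine_tuning_tokens(docs)
        print(f"{docs:>13,} documents -> ~{tokens:>17,} tokens to fine-tune on")
```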

A quick view of the approaches

In this article, we will cover 4 possible approaches to solving the problem. By examining these approaches, we aim to provide insights into how we can leverage Generative AI effectively for ECM content, enabling LLMs to produce more contextually relevant and informed responses, while also considering LLM aspects like token limitations, temperature, inference costs, and performance.

1. Feed documents from ECM directly into LLMs for fine tuning

Description: In this approach, we feed compatible documents (keeping in mind that LLMs can only accept textual information) into the LLM for fine-tuning. Here, the role of the Prompt Generator is to produce (prompt, response) pairs to fine-tune the model. This has the potential of feeding hundreds of millions or billions of tokens into the model. Once the fine-tuning is complete, we can use the usual prompt-engineering techniques to ask questions about the knowledge.

Figure 1. Feed documents from ECM directly into LLMs for fine tuning
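As a minimal sketch of what the Prompt Generator might produce, the snippet below serializes (prompt, response) pairs as JSONL, a common input format for fine-tuning jobs. The sample documents, the naive pairing strategy, and the file name are illustrative assumptions, not a FileNet or watsonx.ai API.

```python
import json

# Hypothetical: in a real deployment these records would be exported from the
# ECM repository after access-control and privacy review.
documents = [
    {"title": "Travel expense policy", "text": "Employees may claim up to ..."},
    {"title": "Data retention policy", "text": "Customer records are retained for ..."},
]

def to_training_pairs(doc: dict) -> list[dict]:
    """Turn one document into (prompt, response) pairs for fine-tuning.
    The single pair below is a naive illustration; real pipelines would
    generate many question/answer pairs per document."""
    return [
        {
            "prompt": f"Summarize the policy titled '{doc['title']}'.",
            "response": doc["text"],
        }
    ]

with open("fine_tuning_data.jsonl", "w") as f:
    for doc in documents:
        for pair in to_training_pairs(doc):
            f.write(json.dumps(pair) + "\n")
```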

Pro: This approach is straightforward and easy to implement. It allows users to input prompts and receive generated text directly from the LLM, making it accessible to a wide range of applications. It requires minimal preprocessing and setup.

Con: This approach might work for experiments, but in real life it will lead to token explosion, expensive and lengthy fine-tuning, and leakage of sensitive data (as there is NO role-based access control in LLMs). In addition, since we are now mixing enterprise data with foundational data, there is a real risk of hallucination in the answers provided by the model.

2. Leverage ECM Built-in Search

Description: In this approach, the Q&A client first uses FileNet’s search to retrieve the relevant documents; the system then feeds the content of those documents as part of the prompt and asks the LLM to limit its answers to information that is in the prompt.

Figure 2. Leverage ECM Built-in Search
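A minimal sketch of this flow, assuming a hypothetical search_repository helper that wraps the ECM keyword search and a caller-supplied generate function for the LLM call; neither is a real product API.

```python
def search_repository(keywords: str, max_docs: int = 3) -> list[str]:
    """Hypothetical wrapper around the ECM's keyword (syntactic) search.
    In a real system this would call the FileNet search API and return the
    extracted text of the matching documents; here it returns a stub."""
    return ["<text of matching document 1>", "<text of matching document 2>"][:max_docs]

def build_grounded_prompt(question: str, passages: list[str]) -> str:
    """Assemble a prompt that restricts the LLM to the retrieved documents."""
    context = "\n\n".join(f"[Document {i + 1}]\n{p}" for i, p in enumerate(passages))
    return (
        "Answer the question using ONLY the documents below. "
        "If the answer is not in the documents, say you do not know.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

def answer(question: str, generate) -> str:
    """`generate` is any callable that sends a prompt to the hosted LLM."""
    passages = search_repository(question)            # keyword match only
    return generate(build_grounded_prompt(question, passages))
```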

Pro: This approach is straightforward and easy to implement. It allows users to input prompts and receive generated text directly from the LLM, making it accessible to a wide range of applications. It limits the token count to only the information that is passed to the model in the prompt and requires minimal preprocessing and setup. Because we instruct the LLM to return only information present in the search results, we gain some form of access control and limit the risk of hallucination.

Con: Since we first perform a syntactic search to retrieve relevant documents based on the specified keywords, it will only return information when there is a direct match. Furthermore, the user may have to curate the results manually if there are many, or extract the particular relevant context(s) within the retrieved document(s) if the token count is too great. While approaches such as summarizing the document context with an LLM are possible in general, in this case the user must also ensure that the input to this process is semantically relevant.

3. Refining Input Context with Knowledge pre-processor

Description: In this approach, a Knowledge pre-processor is used to optimize the ECM input context. This may include filtering out sensitive data along with other aspects of data preparation: cleansing, de-duplication, classification, and so on. Thanks to this context-optimization step, a modestly wider selection of ECM documents can be used and combined, along with the user query, into a prompt to the underlying LLM. The system hosting the LLM then generates a response based on the input prompt.

Figure 3. Refine Input Context
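A minimal sketch of what such a Knowledge pre-processor might do: redact obviously sensitive values, drop duplicate passages, and trim the context to a token budget before prompting. The regular expressions and the word-count token estimate are simplistic illustrations, not a production data-privacy or data-quality solution.

```python
import hashlib
import re

SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # US SSN-like numbers
    re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),    # e-mail addresses
]

def redact(text: str) -> str:
    """Mask obviously sensitive values before they can reach the prompt."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub("[REDACTED]", text)
    return text

def deduplicate(passages: list[str]) -> list[str]:
    """Drop exact duplicates (after normalization) to save tokens."""
    seen, unique = set(), []
    for p in passages:
        digest = hashlib.sha256(p.strip().lower().encode()).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(p)
    return unique

def prepare_context(passages: list[str], max_tokens: int = 3_000) -> str:
    """Build an optimized context that fits a rough token budget."""
    cleaned = deduplicate([redact(p) for p in passages])
    context, used = [], 0
    for p in cleaned:
        approx_tokens = len(p.split())             # crude token estimate
        if used + approx_tokens > max_tokens:
            break
        context.append(p)
        used += approx_tokens
    return "\n\n".join(context)
```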

Pro: By leveraging an optimized context that can span more knowledge from the ECM store, this approach has the potential to enhance the quality of generated responses while limiting the disclosure of sensitive data from the corpus.

Con: Similar considerations to pattern #2 apply: the LLM may still not fully grasp the nuances of the context, leading to potentially inaccurate or irrelevant outputs, and the usable context is still limited by the number of tokens the underlying LLM accepts in a single interaction, along with the associated costs.

4. Adopting a VectorDB for Semantic Search

Description: In this approach, we leverage a technique called Retrieval-Augmented Generation (RAG) [4]. The data-preparation phase uses a GenAI embeddings-generation service and a Vector Database (or VectorDB, a database optimized to store and quickly retrieve the long numerical “embeddings” associated with phrases and document fragments) to store the embeddings generated from trusted ECM documents. The shorter the “distance” between two vector embeddings inside the DB, the greater the semantic similarity between the two concepts/phrases they originated from. No upfront LLM tuning occurs during this phase.

Figure 4. Adopting a VectorDB for Semantic Search

During the Q&A phase, the input user query is also pre-processed to obtain its vector-embedding representation, which allows the system to perform a semantic search and select the n most relevant document fragments from the VectorDB to be used as qualified context for the LLM query.

Finally, the input user query is combined with the qualified context retrieved from the VectorDB into a compact prompt to the underlying LLM, which can provide an answer leveraging its conversational ability, along with enhanced accuracy and traceable evidence back to the relevant context. The number of LLM tokens needed during the interaction is limited to the document fragments that are semantically similar to the user query, based on the configurable VectorDB lookup.
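A minimal end-to-end sketch of this RAG flow, using an in-memory store and cosine similarity in place of a real VectorDB; embed stands in for any embeddings-generation service and generate for the LLM call, both hypothetical placeholders here.

```python
import numpy as np

class InMemoryVectorStore:
    """Stand-in for a real VectorDB: keeps (embedding, fragment) pairs in memory."""

    def __init__(self):
        self.vectors: list[np.ndarray] = []
        self.fragments: list[str] = []

    def add(self, fragment: str, embedding) -> None:
        vec = np.asarray(embedding, dtype=float)
        self.vectors.append(vec / np.linalg.norm(vec))
        self.fragments.append(fragment)

    def search(self, query_embedding, top_n: int = 4) -> list[str]:
        q = np.asarray(query_embedding, dtype=float)
        q = q / np.linalg.norm(q)
        scores = np.array([v @ q for v in self.vectors])   # cosine similarity
        best = np.argsort(scores)[::-1][:top_n]
        return [self.fragments[i] for i in best]

def index_documents(fragments: list[str], embed, store: InMemoryVectorStore) -> None:
    # Data-preparation phase: embed trusted ECM document fragments once, upfront.
    for fragment in fragments:
        store.add(fragment, embed(fragment))

def answer(question: str, embed, generate, store: InMemoryVectorStore) -> str:
    # Q&A phase: embed the query, retrieve the most similar fragments,
    # and use them as the only context for the LLM.
    context = store.search(embed(question))
    prompt = (
        "Answer using only the context below and cite the fragment you used.\n\n"
        + "\n\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
        + f"\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```

In practice, embed and generate would be the embeddings and text-generation endpoints of the hosting platform, and the in-memory store would be replaced by a proper VectorDB.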

Pro: Adopting a VectorDB provides a rich knowledge base for RAG interactions, where the LLM can augment its responses with contextually relevant information. This approach can lead to more accurate and insightful outputs, especially in complex scenarios where contextual understanding is crucial. Additionally, it reduces the burden of fine-tuning and manual data curation, along with limiting the number of tokens required during LLM interactions and associated costs. Finally, if the VectorDB is indexed over document “chunks” then the retrieval can be much more targeted to relevant sections of documents with greater token efficiency than in the other 3 cases.

Con: Implementing a VectorDB can involve significant setup and maintenance effort. Ensuring the VectorDB contains relevant and up-to-date information also requires continuous updates and data-quality control, although this work is anticipated in the preparation phase. The effectiveness of the VectorDB largely depends on the availability and comprehensiveness of the ECM-stored knowledge. Finally, the strategy used to fragment the text representation into chunks also depends on the (prompt, response) behavior desired for a given task.

Summary

The comparison of the described approaches reveals a spectrum of trade-offs in terms of performance, implementation complexity, forecasted costs, and quality.

Fine-tuning LLMs (approach 1) enhances performance and domain-specific understanding. But it comes with significant computational costs, requires access to relevant datasets, and increases risks of model overfitting, limiting generalization to diverse tasks.

The simple prompting approach (approach 2) is easy to deploy and requires minimal resources, making it suitable for quick application development. However, it may lack context awareness and precision, resulting in generic responses, and it suffers from LLM scalability limits. Nevertheless, refining the input context (approach 3) can improve response quality at the cost of additional pre-processing.

Adopting a VectorDB for upfront semantic search significantly enhances response relevance and accuracy by augmenting LLMs with specialized knowledge. While the implementation complexity is higher, the long-term benefits of a rich knowledge base and reduced fine-tuning requirements can outweigh the initial costs.

The Q&A RAG (“Retrieval-Augmented Generation”) pattern, which uses embeddings and Vector Databases to generate, store, and retrieve embeddings from trusted documents and later enable focused-context Q&A interactions with LLMs, is a recently revamped and promising trend in the Generative AI space. The approach promises to overcome LLM scalability limitations in terms of tokens and related interaction costs, which are typical of the prompt-engineering approaches when dealing with a large set of ECM documents.

From a quality standpoint, the pre-processing of the user query, via embedding generation and a semantic search in the VectorDB, enables focused LLM interactions that have the potential, as long as the VectorDB content is kept relevant and current, to provide more meaningful answers, reduce LLM hallucinations, and offer traceable sources to the relevant context.


Please also read part 2 of the series: Approaches in Using Generative AI for Business Automation: The Path to Comprehensive Decision Automation

References

[1] https://www.hpcwire.com/2023/04/06/bloomberg-uses-1-3-million-hours-of-gpu-time-for-homegrown-large-language-model/

[2] https://arxiv.org/abs/2106.09685

[3] https://arxiv.org/abs/2301.11293

[4] https://www.promptingguide.ai/techniques/rag#


Allen Chan

Allen Chan is an IBM Distinguished Engineer and CTO for Business Automation, building products to get work done better and faster with Automation and AI.