DocDoctor

Inspiration

Our team is a diverse group of CS majors who work across many different fronts (robotics, IoT, AI/ML, full-stack), but the one thing we could all agree on was that chat interfaces are overused and outdated. From the beginning, we wanted to build a service where users could use gen-AI to bridge the gap between initial actions and consistent, timely follow-through: interfaces that anticipate questions with a high likelihood of near-immediate relevance. We also wanted to build AI applications that would drive advances in the enterprise.

As we brainstormed, one of our group members recounted a funny story about a family friend (we'll call him John) who worked at a call center. John struggled to find documentation for a problem that was identical to a case from the previous week. He couldn't find it at the instant he needed it because he was limited by antiquated lookup solutions that didn't use semantic targeting. John remarked on constantly juggling looking up information and helping the customer, neither of which he could do without detracting from the other.

John's troubles are not unique: while countless services exist that either transcribe calls or efficiently search and display documentation in a user-friendly way, none integrates these features intuitively and reliably. From a consumer standpoint, call centers are infamous for long hold times; from an operator standpoint, services can't be provided at peak efficiency without equally efficient access to information. Furthermore, most semantic searches fail on jargon-dense text chunks, where geometric similarity alone struggles to disambiguate relevance.

We designed DocDoctor from the ground up with efficiency during calls in mind, ensuring that customers can leave with fruitful solutions and sales reps can focus on problem-solving without being inhibited by a lack of information.

What it does

DocDoctor, a "documentation doctor", is an extension to CRM software built to assist company representatives in help-desk-esque calls. The core pipeline works as follows:

  • The tool transcribes the call live in the "Transcript" sidebar, marking relevant events that can be expanded on.
  • The user has the option to either highlight words or phrases from the transcript and search relevant "reports", or click words/phrases that are smart-highlighted by an LLM-powered backend as contextually relevant.
    • "Reports" are summarized interactions that are logged with a title, relevant problem, and final solution.
  • All reports populate the "Canvas" as freeform cards with summarized information about the key problem and solution; each card can be expanded for greater detail.
  • Additionally, a "Search" sidebar allows you to pull relevant reports for any keyword or phrase you enter manually.
  • At the end of a call, a window pops up that allows you to modify an LLM-initialized report and add it to a database. You also have the option to manually create and add reports to the database.

How we built it

Implementation

Call Client (Python CLI tool)

  • Uses ElevenLabs for low-latency speech-to-text and keeps a rolling .wav buffer of the active speaker so transcriptions improve with context.
  • A simple voice meter provides visual confirmation that audio capture is healthy.
  • On startup, the client requests a new document ID from the router, lets the user mark the current speaker (customer or representative), and continuously sends async transcription snippets plus speaker metadata to the backend for real-time appending (see the sketch after this list).
  • Future plans: integrate speaker diarization to auto-detect speakers, and build a web-based call client that integrates directly into the web UI.
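
A minimal sketch of that client loop, under stated assumptions: `record_audio_chunk` and `transcribe` are hypothetical placeholders (the real client uses ElevenLabs for transcription), and the router address, endpoints, and payload shape here are illustrative, not the repo's actual API.

```python
import time

import requests

ROUTER = "http://localhost:8080"  # assumed router address

def record_audio_chunk() -> bytes:
    """Placeholder for mic capture; the real client records raw WAV audio."""
    return b""

def transcribe(wav_bytes: bytes) -> str:
    """Placeholder for the ElevenLabs speech-to-text call."""
    return ""

def run_client(current_speaker: str = "representative") -> None:
    # On startup, request a fresh document ID from the router.
    doc_id = requests.post(f"{ROUTER}/documents/new").json()["id"]
    rolling = b""  # rolling .wav buffer of the active speaker
    while True:
        rolling += record_audio_chunk()
        # Transcribing the whole buffer (not just the newest chunk) is what
        # lets later transcriptions improve with added context.
        snippet = transcribe(rolling)
        # Ship the snippet plus speaker metadata for real-time appending.
        requests.post(
            f"{ROUTER}/documents/{doc_id}/append",
            json={"speaker": current_speaker, "text": snippet},
        )
        time.sleep(0.5)  # pacing between async snippets
```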

Frontend Stack (Angular, TypeScript)

  • The web client is the support rep's primary UI: live transcript, ML-driven smart highlights, and a canvas of relevant reports.
  • Clicking a highlighted phrase triggers a Cobweb-backed search and surfaces ranked report cards for quick reference.

Backend Stack (Golang router + MongoDB)

  • The router orchestrates flows by accepting transcript snippets, forwarding them to the ML stack for relevancy labeling, persisting annotated transcripts, and requesting verbose problem–solution summaries on websocket close.
  • These summaries are uploaded to the MongoDB database as new documentation (see the sketch after this list).
  • It also streams updates to the frontend and exposes API endpoints for retrieving documents via the Cobweb search API.
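
For a concrete picture of what gets persisted, here is a hedged pymongo sketch of a report document. The actual router writes these from Go; the connection string, collection, and field names below are assumptions for illustration, not the repo's schema.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
reports = client["docdoctor"]["reports"]           # assumed db/collection names

# The shape mirrors a "report" as described above: a title, the relevant
# problem, and the final solution, plus a link back to its transcript.
reports.insert_one({
    "title": "Router fails to boot after firmware update",
    "problem": "Customer's router was stuck in a boot loop after a firmware update.",
    "solution": "Factory reset via the pinhole button, then reflash the firmware.",
    "transcript_id": "doc-1234",  # hypothetical reference to the annotated transcript
})
```

Retrieval then goes through the Cobweb search API rather than Mongo text queries, so the database stays a simple document store.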

ML Stack (Google Gemini, Cobweb, SentenceTransformers)

  • Traditional relational databases are limited in their semantic-search capabilities and impose an extremely rigid schema, which keeps them from scaling. This gave rise to vector-store databases, which encode and retrieve dense vector representations based on geometric similarity. However, as mentioned above, even vector databases fail on highly semantically targeted datasets.
  • We use a symbolic cognitive algorithm, Cobweb, to retrieve our documents instead of traditional methods.
    • Cobweb is an unsupervised algorithm that uses an information-theoretic definition of similarity for embeddings (rather than geometric similarity metrics like the dot product), calculated over an incrementally built tree topology. It matches geometric similarity in performance and time complexity.
    • Additionally, case studies have been documented where data rich with niche vocabulary fails with traditional dot-product search but functions with Cobweb.
    • We train a Cobweb database on the problems recorded in prior reports and build an API around it that returns the top-k results for a given keyword or phrase.
    • We also apply Principal Component Analysis (PCA) and Independent Component Analysis (ICA) to efficiently disambiguate the embeddings before feeding them to our Cobweb-backed database (see the sketches after this list).
  • To tag text and summarize each transcript into a report, we make use of Google Gemini 2.5's zero-shot capabilities. With careful prompting, we were able to reliably summarize transcripts by the relevant problem(s) and solution(s) discussed.
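
To make the retrieval pipeline concrete, here is a minimal sketch under stated assumptions: SentenceTransformers supplies the embeddings, scikit-learn's PCA and FastICA stand in for the disambiguation step, and the Cobweb interface is left as commented-out pseudocode since the real Cobweb API may differ. The model name and component counts are illustrative choices.

```python
from sentence_transformers import SentenceTransformer
from sklearn.decomposition import PCA, FastICA

# Problem statements from prior reports form the training corpus;
# in practice this list would be much larger.
problems = [
    "Router stuck in a boot loop after a firmware update",
    "VPN client drops its connection every few minutes",
]

encoder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
embeddings = encoder.encode(problems)

# PCA decorrelates the embedding dimensions; ICA then separates them into
# statistically independent components, helping disambiguate jargon-dense
# chunks before they reach the Cobweb tree. Component counts are capped by
# the corpus size so this toy example still runs.
n_pca = min(64, len(problems))
reduced = PCA(n_components=n_pca).fit_transform(embeddings)
independent = FastICA(n_components=min(32, n_pca), random_state=0).fit_transform(reduced)

# Hypothetical Cobweb interface: incrementally fit each vector, then query
# top-k by information-theoretic (not geometric) similarity.
# tree = CobwebTree()
# for vec, text in zip(independent, problems):
#     tree.ifit(vec, label=text)
# top_k = tree.query(encoder.encode(["boot loop"])[0], k=5)
```

And a sketch of the zero-shot summarization step, assuming the google-generativeai Python SDK; the model string and prompt wording are illustrative, not taken from the repo.

```python
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-2.5-flash")  # assumed model string

transcript = (
    "CUSTOMER: My router keeps rebooting after the latest firmware update...\n"
    "REP: Let's try a factory reset via the pinhole button..."
)
prompt = (
    "Summarize this support call as a report with three fields: a short "
    "title, the relevant problem(s), and the final solution(s).\n\n" + transcript
)
print(model.generate_content(prompt).text)
```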

Challenges we ran into

Most modern projects are glorified LLM wrappers that use a chat-focused interface to jump-start the process, and designing an interface that predicts probable questions before they are asked proved challenging. Our UI went through many iterations to ensure that call operators could take advantage of our framework in a way that differentiated it from superfluous chatbot search engines by providing intuitive functionality. Creating an interface that felt intentional required a lot of thought.

Additionally, we pivoted many times to identify a project with a quantifiable need. We really wanted to build a solution that would champion the use of safe AI, but struggled to see what unique innovations we could make within the time frame allotted. We found ourselves constantly working to converge on an idea that encompassed our diverse goals and priorities as a group.

We also have a very modular stack; coordinating the development of our frontend and backend, the software for interfacing with calls in real time, and the ML-specific backend required a lot of planning, and doing it all in parallel was certainly a challenge. Many times we had to halt all development and return to the whiteboard to sketch out our vision, what the final product entailed, and whether specific details were in scope. Fortunately, each time we did this, we all walked away with clearer direction than ever.

Accomplishments that we're proud of

We’re proud of how DocDoctor evolved from a small idea into a cohesive, scalable system that blends real-time transcription, intuitive design, and advanced AI search. Our robust AI pipeline—built with ElevenLabs, Google Gemini 2.5, and a Cobweb-backed database—performs reliably across diverse inputs with low latency and strong contextual accuracy.

We’re especially proud of the interface, which lets agents interact naturally with transcripts, including highlighting phrases, retrieving reports, and generating summaries, without breaking focus. Implementing Cobweb for information-theoretic retrieval was another major milestone, enabling precise matches even in jargon-heavy data.

Equally, we value our collaboration and iterative design—coordinating four distinct technical stacks while keeping a shared vision. And above all, we’re proud that DocDoctor reflects safe, human-centered AI, empowering users rather than replacing them.

What we learned

One of our most important lessons was that building a cohesive project with a shared vision among four programmers of different backgrounds is a LOT of work. We were worried about how much time we spent in the brainstorming stage, but we are all thoroughly satisfied with our final product. We know we wouldn't have achieved this cohesion if we hadn't fleshed out every idea we had.

Additionally, the integration step is HARD: we returned to the whiteboard several times to ensure that our full stack flowed from start to finish. Communication, even over-communication, became key as we made sure we all shared the same understanding of what we were building.

What's next for DocDoctor

DocDoctor is still the product of a hackathon, but its modular architecture and extensible backend make it primed for growth. Our next steps focus on scalability, deeper AI integration, and broader enterprise adaptability.

  • Expanding beyond prior reports:
    • Currently, DocDoctor’s semantic search retrieves insights from previous call reports. Our next milestone is to expand the scope of retrievable data to include SOPs, technical manuals, knowledge base articles, and internal documentation. This would allow DocDoctor to function as a unified, enterprise-wide source of truth, capable of pulling relevant information from any organizational dataset.
  • Advancements in Cobweb and hybrid retrieval:
    • While Cobweb already outperforms traditional vector databases in jargon-heavy domains, we plan to explore hybrid retrieval pipelines that combine Cobweb’s information-theoretic clustering with dense embedding search for greater precision and scalability.
  • Deeper CRM integration:
    • Our next iteration will focus on integrating directly with CRM platforms such as Salesforce, HubSpot, and Zendesk, enabling DocDoctor to automatically update tickets, log call outcomes, and provide contextual recommendations during live customer interactions.
  • Lightweight deployment and on-prem support:
    • We envision containerized builds of DocDoctor that can run on-premises or in private clouds, ensuring compliance with sensitive-data requirements and enabling cost-efficient deployment at scale. We have the advantage of not storing any transcript data permanently, and our Dockerized system is poised for deployment.

Ultimately, DocDoctor’s future lies in bridging human expertise with AI assistance at every layer of enterprise communication, transforming reactive customer support into a proactive, data-driven, and intuitive workflow.
