
GenAI & Data Engineering Workshop [Part-1]: Enhanced Safety Assistant Through Intelligence


From Curiosity to Creation: Inside Azilen’s GenAI & Data Engineering Workshop Series

Every breakthrough begins with curiosity, the kind that asks “what if” and “why not.” At Azilen, that spirit took shape in our GenAI & Data Engineering Workshop.

The idea was simple: give our engineers the space to explore ideas that push the boundaries of how intelligence and data can work together.

No fixed outcomes. No predefined paths. Just exploration guided by a shared belief that real innovation starts when curiosity meets craftsmanship.

Across multiple teams, we saw raw ideas evolve into defined solutions – some conceptual, some functional, all driven by purpose. Each team identified a real-world challenge, shaped it into a definition, and translated it into a working direction for how GenAI and data engineering can intersect to solve enterprise problems.

This blog series captures those definitions – the thinking behind them, the problems they address, and the possibilities they unlock.

It’s a window into how curiosity, when guided by engineering intent, becomes creation.

Reimagining Industrial Safety with GenAI and Data Engineering

Industrial safety depends on timely action and accurate information.

Our team set out to make both effortless through an AI-powered Industrial Safety Assistant that supports on-ground technicians with instant guidance, fault insights, and preventive measures, all powered by GenAI and Data Engineering.

This project combines RAG-based knowledge retrieval, contextual response generation, and real-time operational data – designed to work in fast, high-risk environments where every second matters.

The initiative was driven by a cross-functional team that brought together engineering depth and domain intuition:

→ Dipesh Bhavsar – Technical Project Manager

→ Vipul Makwana – Technical Leader

→ Palak Patel – Scrum Master

→ Ghanshyam Savaliya – Senior Software Engineer II

→ Alpesh Padariya – Senior Software Engineer II

→ Darshit Gandhi – Senior Software Engineer II

→ Swati Babariya – Senior Software Engineer I

→ Rency Ruparelia – Software Engineer

→ Bhavya Thakkar – Associate Software Engineer

→ Jil Patel – Associate Software Engineer

Together, they showcased how industrial safety can evolve from compliance to intelligence, where every query, alert, and recommendation is powered by data awareness and AI reasoning.

Gen AI & Data Engineering Workshop

The Industrial Safety Crisis

The numbers that shaped this definition speak for themselves:

→ 2.8 million+ annual industrial injuries worldwide

→ 5,190 daily workplace fatalities

→ $170 billion global economic impact

→ 60% of these accidents are preventable

When the team looked deeper, the manufacturing industry alone accounted for 403,000 injuries per year, construction reported 275,000, and mining showed an 87% higher fatality rate than average. The average cost per incident? $42,000.

For a domain where every second counts, delayed information or unclear instructions can directly cost human lives.

Root Causes Behind the Numbers

The team began by mapping the hidden factors behind these numbers. Their analysis uncovered four core causes:

1️⃣ Information Access: 45% of safety or operational guidance arrives too late.

2️⃣ Language Barriers: 67% of blue-collar workers face comprehension challenges with safety documentation.

3️⃣ Training Gaps: 38% report inadequate preparation for machine handling and risk response.

4️⃣ Emergency Response: 23% of crisis responses are delayed due to a lack of contextual direction.

This diagnosis helped the team define what their RAG system must solve – the gap between critical information and the worker who needs it, right when they need it.

The Opportunity of AI in Industrial Safety

Industrial safety may look like a compliance function, but the data shows a far larger opportunity.

The industrial safety market stands at $45 billion, growing at a 12% CAGR (2024–2030), with 78% of processes still manual.

In addition, the ROI potential of augmenting human capability with intelligent systems could reach 340% within just two years.

For enterprises, this is a chance to turn safety from a cost center into a measurable efficiency and productivity driver.

With the problem and potential clear, it’s time to step inside the build – the architecture, the logic, and the choices behind it.

Technical Specifications of GenAI-Powered Industrial Safety Assistant

The team engineered every layer to achieve real-time responsiveness and contextual accuracy in high-risk environments.

| Component | Technology | Specifications | Performance |
|---|---|---|---|
| LLM Engine | OpenAI GPT-4o | 128K context, 4,096 output tokens | 2s response time |
| Embeddings | text-embedding-3-small | 1,536 dimensions, 8,191 tokens | 95% accuracy rate |
| Vector Store | Vectra | Dot product, cosine similarity | 100ms retrieval |
| Chunking Strategy | Recursive + Symmetric | 100 tokens, 200 overlap | 98% content preservation |
| Backend API | NestJS | Async workers, WebSocket support | 1,000+ req/sec |

System Flow: From Manuals to Real-Time Machine Intelligence

Every industrial machine comes with a detailed manual, often buried in PDFs that are hard to navigate when a worker is already hands-on with the equipment.

The team addressed this friction by designing a seamless flow that converts static documentation into dynamic, conversational intelligence.

Here’s how the system works:

Diagram: AI-driven RAG system for industrial worker support

1️⃣  Document Ingestion: Admin uploads the machine manual (PDF). Text and images are extracted automatically.

2️⃣ Data Processing: Extracted content goes through chunking and detailing to preserve context. Each chunk is converted into embeddings and stored in VectraDB.

3️⃣ User Query Embedding: The worker’s query (e.g., “What voltage should I use for this AC?”) is also converted into an embedding vector.

4️⃣ Similarity Search: The system retrieves the top-k matching chunks from VectraDB using cosine similarity for contextual precision.

5️⃣ LLM Response Generation: Matched chunks and user query are combined in a prompt and processed through GPT-4o, generating an accurate, guided response.

6️⃣ Recursive Function Calling: For incomplete replies (like “refer to appendix”), the system uses recursive function calls until a full, coherent answer is ready, including both text and relevant images.
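To make steps 3 to 5 concrete, here is a minimal TypeScript sketch of the query path. It is illustrative rather than the production code: the `ManualChunk` shape, the `answerWorkerQuery` function, and the in-memory top-k search are assumptions standing in for the VectraDB lookup, and only the model names come from the specifications above.

```typescript
import OpenAI from "openai";

// Hypothetical shape of a stored manual chunk; the real VectraDB schema may differ.
interface ManualChunk {
  text: string;
  embedding: number[];
}

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Cosine similarity between two equal-length vectors.
const cosine = (a: number[], b: number[]): number => {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
};

export async function answerWorkerQuery(query: string, chunks: ManualChunk[]): Promise<string> {
  // Step 3: embed the worker's query with the same model used for the manual.
  const { data } = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: query,
  });
  const queryVector = data[0].embedding;

  // Step 4: top-k similarity search (in-memory stand-in for the VectraDB lookup).
  const topK = chunks
    .map((c) => ({ ...c, score: cosine(queryVector, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, 5);

  // Step 5: combine the matched chunks with the query and let GPT-4o answer.
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "Answer using only the provided manual excerpts." },
      {
        role: "user",
        content: `Manual excerpts:\n${topK.map((c) => c.text).join("\n---\n")}\n\nQuestion: ${query}`,
      },
    ],
  });
  return completion.choices[0].message.content ?? "";
}
```

In the real flow, step 6 would wrap this call in a loop that re-queries whenever the model signals an incomplete answer (for example, a reference to an appendix).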


Effective Techniques to Split Documents

Splitting documents the right way makes a direct impact on retrieval quality. A poorly split document can lose context mid-sentence, while a well-structured one helps the LLM understand exactly what the user is asking for.

Here’s how the team experimented with different techniques to find the most effective strategy.

1. Fixed Length

The baseline method splits the text after a fixed number of characters, say every 1,000 characters.

This approach works fast but can break context – a sentence about “valve pressure” might get divided halfway, reducing retrieval accuracy.
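For reference, a fixed-length splitter takes only a few lines; the function name and the 1,000-character default below are illustrative.

```typescript
// Naive fixed-length splitter: cuts every `size` characters, even mid-sentence.
function splitFixed(text: string, size = 1000): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += size) {
    chunks.push(text.slice(i, i + size));
  }
  return chunks;
}
```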

2. Recursive

Here, the document is split by logical block units in descending order: page, then paragraph, sentence, and word.

This ensures every vector represents a complete thought or instruction, ideal for technical manuals where each paragraph carries distinct operational meaning.
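A simplified sketch of that fall-through idea, assuming character-based limits and a fixed separator hierarchy (a production splitter would also merge small pieces back toward the limit and add overlap):

```typescript
// Illustrative recursive splitter: try the coarsest separator first (page breaks),
// then fall back to paragraphs, sentences, and finally words when a piece is still too long.
const SEPARATORS = ["\f", "\n\n", ". ", " "]; // page > paragraph > sentence > word

function splitRecursive(text: string, maxChars = 1000, level = 0): string[] {
  if (text.length <= maxChars || level >= SEPARATORS.length) return [text];
  return text
    .split(SEPARATORS[level])
    .filter((piece) => piece.trim().length > 0)
    .flatMap((piece) => splitRecursive(piece, maxChars, level + 1));
}
```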

3. Semantic Meaning-Based

This technique focuses on conceptual continuity rather than length.

Imagine a window of 200 words: the model embeds it, then slides to the next 200 words, compares both embeddings, and checks if they point in the same semantic direction.

→ If both talk about “electrical calibration,” they merge into one chunk.

→ If the next part shifts to “cooling system safety,” the model starts a new segment.

This keeps related ideas together and avoids mixing unrelated topics within a single vector.
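A sketch of that sliding-window comparison, assuming a generic `embed` function and an illustrative 0.8 merge threshold (neither is a value from the project):

```typescript
// Compact cosine helper for comparing adjacent windows.
const cosine = (a: number[], b: number[]) =>
  a.reduce((s, v, i) => s + v * b[i], 0) / (Math.hypot(...a) * Math.hypot(...b));

// Merge adjacent windows that stay on the same topic; start a new chunk on a topic shift.
async function splitSemantic(
  windows: string[],                          // e.g. consecutive 200-word windows
  embed: (text: string) => Promise<number[]>, // any embedding model
  threshold = 0.8
): Promise<string[]> {
  if (windows.length === 0) return [];
  const chunks: string[] = [];
  let current = windows[0];
  let currentVec = await embed(current);

  for (const next of windows.slice(1)) {
    const nextVec = await embed(next);
    if (cosine(currentVec, nextVec) >= threshold) {
      current += " " + next; // same semantic direction: grow the chunk
    } else {
      chunks.push(current);  // topic shift: close the chunk and start a new one
      current = next;
    }
    currentVec = nextVec;
  }
  chunks.push(current);
  return chunks;
}
```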

4. Hybrid Strategy

The best results came from combining both semantic and recursive methods.

This hybrid method preserved context, completeness, and performance – a balance crucial for high-precision RAG applications like this safety assistant.

Word Embeddings: The Language Behind AI

Embeddings turn words into numbers so machines can understand meaning, context, and relationships. Words with similar meanings appear closer together in this vector space.

| Word | Vector |
|---|---|
| King | (0.8, 0.6) |
| Queen | (0.82, 0.58) |
| Man | (0.4, 0.3) |
| Woman | (0.42, 0.28) |

Here, “man” and “woman” are close to each other, and so are “king” and “queen.”

Now imagine this: king – man + woman = queen.

This simple equation shows how vectors capture meaning: subtracting “man” from “king” isolates the idea of royalty, and adding “woman” brings it back as female royalty, landing on “queen.”

In our system, embeddings work the same way – they help connect machine parts, safety actions, and user queries with the right answers, based on meaning rather than just keywords.
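Using the toy 2-D vectors from the table, the arithmetic works out exactly; here is a small sketch (real embeddings have 1,536 dimensions, but the idea is the same):

```typescript
// Toy 2-D vectors from the table above.
const vectors: Record<string, [number, number]> = {
  king: [0.8, 0.6],
  queen: [0.82, 0.58],
  man: [0.4, 0.3],
  woman: [0.42, 0.28],
};

// king - man + woman
const target = [
  vectors.king[0] - vectors.man[0] + vectors.woman[0], // 0.82
  vectors.king[1] - vectors.man[1] + vectors.woman[1], // 0.58
];

// Nearest word by straight-line distance.
const nearest = Object.entries(vectors)
  .map(([word, v]) => ({ word, dist: Math.hypot(v[0] - target[0], v[1] - target[1]) }))
  .sort((a, b) => a.dist - b.dist)[0].word;

console.log(nearest); // "queen"
```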

From Query to Vector: Retrieving Semantic Representations

Every time a user asks a question, the system turns that query into a vector using the same embedding model applied to the documents.

The next step is finding which document vectors lie closest to this query vector, meaning which pieces of information share the same context.

The Basic Idea: Distance Between Vectors

The simplest way to measure closeness is through Euclidean Distance, the straight-line distance between two points in space.

Euclidean distance: d(A, B) = √( Σᵢ (Aᵢ − Bᵢ)² )

If two vectors are close together, their meanings are likely related.
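In code, it is just the straight-line formula applied component by component:

```typescript
// Straight-line distance between two embedding vectors of equal length.
function euclidean(a: number[], b: number[]): number {
  return Math.sqrt(a.reduce((sum, v, i) => sum + (v - b[i]) ** 2, 0));
}
```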

But here’s the challenge: in real-world language models, vector magnitudes can vary a lot.

Two sentences may have similar meanings but different lengths or structures, which can distort Euclidean results.

Why Cosine Similarity Works Better

To handle this, we use Cosine Similarity. Instead of measuring distance, it measures the angle between two vectors.

Cosine similarity: cos(θ) = (A · B) / (‖A‖ × ‖B‖)

When two vectors point in the same direction (regardless of their size), it means they share the same meaning.

In simple terms:

Cosine = 1 → Same meaning

Cosine = 0 → Unrelated

Cosine = -1 → Opposite meaning

This method focuses on semantic direction, which makes it ideal for comparing text embeddings.
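The same idea in code: compute the dot product, then divide out the two magnitudes so only direction matters.

```typescript
// Cosine similarity: compares direction only, so vector magnitude drops out.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```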

Role of DOT Product

The dot product is a faster mathematical way to achieve the same comparison.

Dot product: A · B = Σᵢ (Aᵢ × Bᵢ)

It multiplies corresponding vector values and sums the results to check alignment. If two vectors point in the same direction, the product is higher.

That’s why in real-time retrieval, cosine similarity and dot product work hand-in-hand for both accuracy and performance.
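One detail worth noting: many embedding models, including OpenAI’s text-embedding family, return vectors already normalized to unit length, and for unit vectors the dot product is exactly the cosine similarity, minus the division. A quick sketch:

```typescript
// Dot product: multiply matching components and sum.
const dot = (a: number[], b: number[]) => a.reduce((sum, v, i) => sum + v * b[i], 0);

// Scale a vector to unit length.
const normalize = (v: number[]) => {
  const len = Math.hypot(...v);
  return v.map((x) => x / len);
};

const a = normalize([0.8, 0.6]);
const b = normalize([0.82, 0.58]);
console.log(dot(a, b)); // ≈ 0.9996 — the same value cosine similarity gives for these vectors
```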

Engineering with Curiosity and Intent

Every layer of this project, from chunking logic to embedding precision, reflected a mindset of curiosity. The team treated each component as an experiment, learning how language, context, and retrieval interplay to create intelligence that feels natural.

Choosing NestJS over Python wasn’t just a tech decision; it was a statement that GenAI belongs wherever great engineering happens. The team redefined conventions and proved that performance, scalability, and AI innovation can live in the same stack.

This journey captured what we value most at Azilen – the pursuit of engineered intelligence that feels purposeful, reliable, and human-aware.


Top FAQs on GenAI-Powered Industrial Safety Assistant

1. Why was this project important to build?

Because industrial safety is a massive challenge. Every day, thousands of accidents happen due to delayed guidance or missing information. This system makes that critical information instantly available to the worker, exactly when they need it.

2. How does the RAG system actually work here?

Admins upload machine manuals into the system. The system breaks down the content, extracts text and images, creates embeddings, and stores them. When a worker asks something, the system finds the closest match through similarity search and uses an LLM to generate a precise, context-aware response.

3. How does this solution help improve workplace safety?

It reduces delays in getting machine-related guidance, removes language barriers, and helps workers handle emergency responses faster. Essentially, it turns every worker into a better-informed operator, cutting down risk and downtime.

4. Can this system adapt to different industries?

Yes, absolutely. Whether it’s manufacturing, mining, oil & gas, or construction – as long as there are documents and machines involved, this system can be trained and deployed for that context.

5. What makes this project unique compared to traditional AI chatbots?

Traditional bots rely on static scripts or general-purpose data. This one is built specifically for industrial use – it learns from company manuals, supports multimodal data (text + images), and delivers verified information from real documents.

Azilen Technologies
Team Azilen

Azilen Technologies is an Enterprise AI development company. The company collaborates with organizations to propel their AI development journey from idea to implementation and all the way to AI success. From data & AI to Generative AI, Agentic AI, and MLOps, Azilen engages with companies to build a competitive AI advantage with the right mix of technology skills, knowledge, and experience.
