Azure vs. AWS vs. Local GPU: Which RAG Architecture Fits Your Business?

In my 10 years working in the tech industry, I have seen waves of trends, but few have moved as fast as Generative AI. The numbers back this up: McKinsey reports that by 2025, 67% of telecom respondents expect to use GenAI for more than 30% of their daily tasks. 

 

But statistics are one thing; production is another. 

 

Today, my colleagues and I are seeing a massive rise in RAG (Retrieval-Augmented Generation) solutions. We are moving beyond simple chatbots to building complex systems that “talk” to your internal documentation. Based on the projects I’ve delivered for our customers, from retail to finance, here is a breakdown of the architectures we actually use, the problems we’ve solved, and my honest advice on when not to use GenAI. 

Why We Are Betting on RAG

When I talk to clients, I explain that a standard chatbot is like a brilliant improviser who sometimes makes things up. RAG changes that. It forces the bot to look at your specific data before answering. 

 

We have built RAG solutions for everything from internal “Q&A over documentation” tools to customer-facing retail bots where users ask about their specific order status. In my experience, this approach is the only way to guarantee accuracy, domain adaptability, and data privacy. 
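
To make that concrete, here is a minimal sketch of the retrieve-then-answer loop. It assumes the sentence-transformers library for embeddings; the documents, the model name, and the final LLM call are placeholders, not the setup from any particular customer project.

```python
# Minimal RAG sketch: embed your documents, find the chunks closest to the
# question, and force the model to answer from them.
# Assumes sentence-transformers; documents and the downstream LLM are placeholders.
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

docs = [
    "Order 1042 shipped on 3 June and should arrive within 5 business days.",
    "Returns are accepted within 30 days of delivery with the original receipt.",
    "Premium members get free express shipping on orders over 50 EUR.",
]
doc_vectors = embedder.encode(docs, normalize_embeddings=True)

def retrieve(question: str, k: int = 2) -> list[str]:
    """Return the k document chunks most similar to the question."""
    q_vec = embedder.encode([question], normalize_embeddings=True)[0]
    scores = doc_vectors @ q_vec              # cosine similarity (vectors are unit-length)
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

def grounded_prompt(question: str) -> str:
    """Build the prompt you would send to whichever LLM you deploy."""
    context = "\n".join(retrieve(question))
    return (
        "Answer using ONLY the context below. If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

print(grounded_prompt("Where is my order 1042?"))
```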

The Three Architectures We Use

Over the last year, I’ve overseen deployments across three different environments. Here is how they compare in the real world: 

1. The On-Premise / GPU Approach (Llama & Hugging Face)

We built a chatbot at Levi9 using purely internal infrastructure. This was for a customer who needed absolute control. 
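
As a rough illustration of what that looks like, the sketch below loads an open-weights instruct model through Hugging Face transformers and answers strictly from a supplied snippet of documentation. The model ID, the documentation text, and the question are illustrative placeholders, not the actual project's stack.

```python
# Illustrative only: an open-weights instruct model served from your own GPUs,
# so prompts and documents never leave the network. The model ID, the snippet
# of "internal documentation" and the question are placeholders.
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.1-8B-Instruct",   # example; any local instruct model works
    torch_dtype=torch.bfloat16,
    device_map="auto",                          # spread layers across available GPUs
)

context = (
    "VPN access is requested through the IT portal under 'Remote access'. "
    "Approval normally takes one business day."
)  # in the real system this text comes from the retrieval step

messages = [
    {"role": "system", "content": "Answer strictly from the provided documentation."},
    {"role": "user", "content": f"Documentation:\n{context}\n\nQuestion: How do I get VPN access?"},
]

output = generator(messages, max_new_tokens=200)
print(output[0]["generated_text"][-1]["content"])   # the assistant's reply
```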

 

Follow these principles: 

2. The Azure Cloud Approach

For one of our retail customers, we deployed a solution using the Microsoft stack. 
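
A minimal sketch of this flavour, assuming Azure OpenAI for generation and Azure AI Search for retrieval; the endpoints, keys, index name, deployment name, and the "content" field are placeholders for whatever your own environment defines.

```python
# Sketch only: Azure AI Search retrieves the relevant chunks, Azure OpenAI answers
# from them. All names below are placeholders, not a specific customer's setup.
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

search = SearchClient(
    endpoint=os.environ["SEARCH_ENDPOINT"],
    index_name="retail-docs",                              # hypothetical index
    credential=AzureKeyCredential(os.environ["SEARCH_KEY"]),
)
llm = AzureOpenAI(
    azure_endpoint=os.environ["AOAI_ENDPOINT"],
    api_key=os.environ["AOAI_KEY"],
    api_version="2024-02-01",
)

question = "What is the status of order 1042?"
hits = search.search(search_text=question, top=3)
context = "\n".join(hit["content"] for hit in hits)        # assumes a 'content' field

response = llm.chat.completions.create(
    model="gpt-4o",                                        # your deployment name
    messages=[
        {"role": "system", "content": "Answer only from the supplied context."},
        {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
    ],
)
print(response.choices[0].message.content)
```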

3. The AWS Approach (Bedrock & Guardrails)

I have also worked with Amazon Bedrock, and there is one feature here that I find particularly useful for corporate clients: Guardrails. 
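
Here is a short sketch of what attaching a guardrail looks like with boto3's Converse API. The guardrail identifier, version, and model ID are placeholders; the guardrail itself (denied topics, PII masking, word filters) is defined separately in your AWS account.

```python
# Sketch of calling a model on Amazon Bedrock with a guardrail attached.
# The guardrail ID/version and the model ID are placeholders.
import boto3

bedrock = boto3.client("bedrock-runtime", region_name="eu-west-1")

response = bedrock.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",        # example model
    messages=[{"role": "user", "content": [{"text": "Summarise our refund policy."}]}],
    guardrailConfig={
        "guardrailIdentifier": "my-guardrail-id",            # placeholder
        "guardrailVersion": "1",
        "trace": "enabled",                                  # inspect what was blocked and why
    },
)
print(response["output"]["message"]["content"][0]["text"])
```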

When I Don't Use GenAI

This might sound counter-intuitive coming from an AI proponent, but sometimes GenAI is the wrong tool. 

 

I am currently working on a project involving sensitive financial data. In this specific case, we made the hard decision not to use Generative AI. Why? Because we cannot risk hallucinations. When dealing with bank details and financial reporting, “mostly accurate” isn’t good enough. 

 

In these scenarios, I always pivot back to classical AI and basic statistics. We need models that are mathematically interpretable and verifiable. If you are in a high-risk industry, my advice is to prioritize interpretability over the “cool factor” of GenAI. 
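
For illustration only (this is not the financial project described above), here is what "interpretable and verifiable" can look like in practice: an ordinary least-squares model on synthetic data, where every coefficient, confidence interval, and p-value can be read and audited directly.

```python
# Purely illustrative, on synthetic data: a plain regression with no black box,
# so every coefficient and p-value can be checked by hand.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
monthly_spend = rng.uniform(100, 1000, size=200)     # synthetic features
tenure_months = rng.uniform(1, 60, size=200)
risk_score = 0.002 * monthly_spend - 0.01 * tenure_months + rng.normal(0, 0.1, 200)

X = sm.add_constant(np.column_stack([monthly_spend, tenure_months]))
model = sm.OLS(risk_score, X).fit()
print(model.summary())    # coefficients, std errors, p-values: fully auditable
```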

Challenges & My Recommendations

If you are ready to build, here are a few hurdles I’ve faced and how I suggest you handle them: 

The Playground is Open: It’s Time to Experiment

If there is one thing my decade in tech has taught me, it’s that you cannot learn this technology from slides alone. We are in a unique moment where the barrier to entry is incredibly low. Whether you are an individual developer or a CTO, you have access to the same powerful tools we use—Amazon Bedrock, Azure OpenAI, and the endless library of models on Hugging Face. 

 

My advice? Don’t wait for the “perfect” use case. 

 

Go explore. Spin up a sandbox environment. Pull a model from Hugging Face and try to break it. See what happens when you feed it your own data. Test the limits of what these architectures can do. The landscape is shifting under our feet, and the only way to stay relevant is to be the one testing the ground. 

 

We are just scratching the surface of what RAG can do. I’ve shared my map of the territory—now it’s time for you to start walking the path. 

Written by: Ana-Maria Cehan, Data Tech Lead, Levi9
Published: 4 February 2026
