Feb 10, 2025
This post details how to set up a robust and scalable infrastructure for Marqo AI applications using Kubernetes, Vespa, and a development proxy server. This setup will allow you to develop, test, and deploy Marqo AI applications efficiently.
1. Why RAG (Retrieval-Augmented Generation)?
RAG is a crucial technique for enhancing the capabilities of AI models, particularly in scenarios where access to up-to-date and specific information is vital.
Problem: Large language models (LLMs) are trained on vast datasets, but their knowledge is static and can become outdated. They also lack specific knowledge about your internal data or domain.
Solution: RAG addresses this by combining the power of LLMs with a retrieval mechanism. Before generating a response, the system retrieves relevant information from a knowledge base (like a vector database). This ensures that the AI model has the most relevant and current information to generate accurate and contextually appropriate responses.
Benefits:
Improved Accuracy: RAG models can provide more accurate answers by grounding their responses in specific data.
Reduced Hallucinations: By using retrieved data, the model is less likely to generate fabricated information.
Up-to-date Information: RAG allows the AI to access the latest information, making it more reliable for dynamic environments.
Customizable Knowledge: You can tailor the knowledge base to your specific domain, ensuring that the AI understands your specific use case.
Reduced Training Costs: You can avoid retraining the entire model to include new information.

2. Benefits of Marqo AI
Marqo AI is a powerful framework designed to simplify the development of AI applications. It offers several key benefits:
Ease of Use: Marqo AI is designed to be user-friendly, allowing developers to quickly build and deploy AI-powered applications without extensive expertise in machine learning.
Integration with Vector Databases: It seamlessly integrates with vector databases like Vespa, enabling efficient indexing and retrieval of information for RAG applications.
Scalability: Marqo AI is built with scalability in mind, allowing you to easily handle increasing data volumes and user traffic.
Flexibility: It supports various AI tasks, such as semantic search, recommendation systems, and more.
Customization: It allows you to customize the AI models and algorithms to fit your specific needs.
Developer-Friendly: It provides an extensive set of tools and libraries, making it easy for developers to build and deploy AI applications.
Focus on Developer Experience: Marqo AI is designed to be intuitive and easy to use, allowing developers to focus on building their applications rather than dealing with complex configurations.
Part of Forgemaster AI: Marqo is a key component in our knowledge assistant for developers, which aims to streamline onboarding, reduce costs, and enhance productivity.
3. How to Get Started with Minikube
All the code and components can be found at https://github.com/ForgemasterAI/marqo-navigator
👩🍳 NB! If you decide to follow along, you'll need to clone the repository first.
Minikube is essential for setting up a local Kubernetes environment for development and testing.
Installation:
First, download and install the Minikube binary for your platform.
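On a Linux x86_64 host, the installation step can be sketched as follows (see the Minikube documentation for other platforms):

```shell
# Download the latest Minikube binary for Linux x86_64 and
# install it into /usr/local/bin (assumes curl is available).
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
rm minikube-linux-amd64

# Confirm the binary is on the PATH.
minikube version
```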
Starting Minikube:
The minikube start command initializes a single-node Kubernetes cluster. Optionally, pass --gpus all and --cpus=8 to enable GPU support and allocate 8 CPUs for AI workloads.
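For example:

```shell
# Start a basic single-node cluster with default resources.
minikube start

# Or, for AI workloads: enable GPU passthrough and allocate 8 CPUs.
# (GPU support requires the docker or kvm2 driver and the NVIDIA
# container toolkit installed on the host.)
minikube start --gpus all --cpus=8
```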
Verification: Use kubectl get nodes to verify that the cluster is running.
Why Minikube? Minikube allows you to run a full Kubernetes cluster on your local machine, which is ideal for developing and testing your infrastructure and applications before deploying to a larger cluster.
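A healthy cluster reports a single Ready node (age and version will vary):

```shell
kubectl get nodes
# Typical output:
# NAME       STATUS   ROLES           AGE   VERSION
# minikube   Ready    control-plane   ...   ...
```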
4. Installation and Content
This section details installing Marqo AI, Vespa, and setting up a development proxy.
Installing Marqo AI and Vespa:
The k8s-vespa.sh script in the repository automates the deployment of Marqo AI and Vespa on the Kubernetes cluster, applying the Kubernetes manifests that deploy these components as pods.
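Running the script might look like this (the script name comes from the repository; its location at the repository root is an assumption):

```shell
# Clone the repository and run the deployment script.
git clone https://github.com/ForgemasterAI/marqo-navigator.git
cd marqo-navigator
chmod +x k8s-vespa.sh
./k8s-vespa.sh

# Watch the Marqo and Vespa pods come up.
kubectl get pods -w
```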
Setting up the Development Proxy Server (Optional):
This step installs Node.js, pnpm, the project dependencies, and DNS utilities, and then deploys the Marqo application.
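A sketch of those steps on a Debian/Ubuntu host (package names and the pnpm invocations are assumptions based on a typical Node.js workflow):

```shell
# Install Node.js, npm, and DNS utilities, then pnpm globally.
sudo apt-get update
sudo apt-get install -y nodejs npm dnsutils
sudo npm install -g pnpm

# From the proxy server's directory: install dependencies
# and start the development server.
pnpm install
pnpm run dev
```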
Content Overview:
The k8s-vespa.sh script deploys Vespa as a StatefulSet, which runs the vector database.
The proxy server setup uses Node.js and pnpm to run a development server, facilitating debugging and local development.
The Marqo application is deployed to the Vespa server using a curl command.
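That deployment step uses Vespa's deploy API. A sketch, assuming the config server is exposed via a service named vespa-configserver (a placeholder for whatever your manifests define) and the application package is zipped as app.zip:

```shell
# Forward the Vespa config server port from the cluster.
kubectl port-forward svc/vespa-configserver 19071:19071 &

# Deploy the application package via Vespa's deploy API.
curl --header "Content-Type: application/zip" \
     --data-binary @app.zip \
     http://localhost:19071/application/v2/tenant/default/prepareandactivate
```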
5. Conclusion
This setup provides a solid foundation for developing and deploying scalable AI applications using Marqo AI. By leveraging Kubernetes, Vespa, and a development proxy, you can efficiently manage your infrastructure and accelerate your development process.
Key Takeaways:
Kubernetes with Minikube enables local development and testing.
Vespa provides a scalable vector database for Marqo AI.
RAG is crucial for accurate and up-to-date AI responses.
Marqo AI simplifies the development of AI applications.
6. Next Steps: Scaling More Content Nodes
To handle larger datasets and more traffic, you’ll need to scale your infrastructure. Here are the next steps:
Horizontal Scaling of Vespa:
Increase the number of Vespa nodes to handle more data and queries.
Load Balancing:
Configure a load balancer to distribute traffic across multiple Vespa nodes.
Monitoring and Logging:
Implement a monitoring system (e.g., SigNoz) to track the performance of your infrastructure.
Set up logging to diagnose issues effectively.
Database Optimization:
Optimize Vespa configurations and indexing strategies for performance.
Infrastructure as Code (IaC):
Use tools like Pulumi to manage your infrastructure, ensuring reproducibility and version control.
Content Ingestion:
Implement a robust content ingestion pipeline to handle the continuous addition of new data.
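As a concrete example of the horizontal-scaling step: if the Vespa content nodes run as a StatefulSet, adding replicas can be as simple as the following (the name vespa-content and the label selector are placeholders for whatever your manifests define):

```shell
# Scale the Vespa content tier from 1 to 3 replicas.
kubectl scale statefulset vespa-content --replicas=3

# Verify the new pods join the cluster.
kubectl get pods -l app=vespa-content
```

Note that Vespa itself must also be told about the new content nodes (via the node count in its services.xml) and will redistribute data once that change is deployed.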
By following these steps, you can build a scalable and resilient infrastructure for your Marqo AI applications.
7. What's next? Helm Chart Development for Marqo AI
The next instalment will focus on creating a Helm chart based on the infrastructure setup discussed in Part 1. This will facilitate easier deployment and scaling of Marqo AI applications on Kubernetes, streamlining the configuration and management processes.
Key Topics in Part 2:
Introduction to Helm and its benefits for Kubernetes deployments.
Step-by-step guide to building a Helm chart tailored for Marqo AI and Vespa.
Best practices for using Helm to manage application lifecycles.
Don’t miss out on this essential continuation! Be sure to check out updates on our projects, including contributions to the Marqo Navigator repository on GitHub. Join the community and stay informed about the latest developments!