
Template Report

· 2 min read

1. Executive Summary

  • Objective: Briefly explain why the research was conducted (e.g., solve a problem, improve efficiency).
  • Key Findings: Summarize the major conclusions and recommendations.
  • Impact: Highlight the potential benefits of the new technology.

2. Introduction

  • Background: Describe the current situation, challenges, or limitations that prompted the research.
  • Purpose: State the purpose of the research (e.g., evaluate feasibility, compare solutions).
  • Scope: Define the boundaries of the study (e.g., specific use cases, teams, or systems).

3. Research Methodology

  • Approach: Describe the methods used to gather and evaluate information (e.g., testing, surveys, benchmarking).
  • Criteria: List the evaluation criteria (e.g., performance, scalability, ease of use, cost).
  • Tools/Resources: Mention any tools, datasets, or environments used.

4. Technology Overview

  • Description: Provide a high-level overview of the technology.
  • Features: Highlight key features or innovations.
  • Market Adoption: Briefly mention how widely the technology is adopted or supported.

5. Analysis and Findings

  • Testing and Experiments: Document the tests or prototypes you built and the results.
  • Comparison with Alternatives: Compare the new technology with current solutions or competitors.
  • Strengths and Weaknesses: List the pros and cons of the technology based on your findings.

6. Use Case Applicability

  • Potential Use Cases: Identify specific scenarios where the technology could be applied.
  • Limitations: Highlight areas where the technology may not be suitable.
  • Integration Feasibility: Discuss how easily the technology can be integrated into existing systems.

7. Cost-Benefit Analysis

  • Implementation Costs: Estimate costs (e.g., licensing, training, migration).
  • Return on Investment (ROI): Discuss potential savings or revenue increases.
  • Risk Assessment: Highlight risks associated with adoption.

8. Recommendations

  • Adoption Plan: Recommend whether or not to proceed and provide an implementation timeline.
  • Training and Support Needs: Outline any training or support required.
  • Further Research: Suggest additional areas for exploration if needed.

9. Conclusion

  • Summary of Findings: Recap the most important insights.
  • Final Recommendation: Reiterate your stance on adopting the technology.

10. Appendices

  • References: List all sources used during the research.
  • Data and Charts: Include any additional data, graphs, or tables supporting your analysis.
  • Technical Details: Provide detailed technical notes or configurations.

Technical Report: NVIDIA H100 GPU Floating Point Capabilities and INT4/8 Precision Impact

· 5 min read

1. Executive Summary

  • Objective: To evaluate the performance implications and trade-offs of switching from floating point to INT4/8 bit precision modes on NVIDIA H100 GPUs for AI workloads.
  • Key Findings: Switching to INT4/8 precision can deliver 2-4x improvements in computational throughput and significant memory efficiency gains with manageable accuracy trade-offs when implemented correctly.
  • Impact: Switching to INT4/8 precision can enable larger model deployments, lower inference latency, and higher throughput, while maintaining acceptable accuracy for many AI applications.

2. Introduction

  • Background: An H100 node with 8 GPUs, each with 80 GB of VRAM, provides 640 GB in total. This is not enough to run the full DeepSeek-R1: loading the weights in FP8 alone takes about 671 GB, with 400 GB+ more set aside for the KV cache. We can conclude that a full DeepSeek-R1 runs in production on 16 H100 GPUs (2 nodes). To run it on a single H100 node, alternative strategies must be considered. As AI models grow increasingly large, GPU memory and computational demands present significant challenges for efficient deployment and operation, and traditional floating-point precision may be unnecessarily resource-intensive for many inference workloads. (A quick sizing sketch follows at the end of this section.)

  • Purpose: To assess the benefits of adopting lower precision INT4/8 computation on H100 GPUs compared to traditional floating point formats.
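
To make the memory arithmetic above concrete, the following is a minimal back-of-the-envelope sketch, not a measurement. The bytes-per-parameter values are the standard ones for each format; the script estimates weight storage only and deliberately ignores KV cache and activation overhead.

```python
# Rough VRAM estimate for holding model weights at a given precision.
# Weight storage only; KV cache, activations and framework overhead are excluded.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "fp8": 1.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(params_billion: float, precision: str) -> float:
    """GB needed just to hold the weights of a model with the given parameter count."""
    return params_billion * BYTES_PER_PARAM[precision]

if __name__ == "__main__":
    node_vram_gb = 8 * 80  # one H100 node: 8 GPUs x 80 GB = 640 GB
    for precision in ("fp16", "fp8", "int4"):
        weights = weight_memory_gb(671, precision)  # DeepSeek-R1: 671B parameters
        verdict = "fits" if weights < node_vram_gb else "does not fit"
        print(f"R1 weights in {precision}: {weights:.0f} GB "
              f"({verdict} in a 640 GB node, before KV cache)")
```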


3. Research Methodology

  • Approach: Technical specification analysis, performance benchmarking, and review of published research on quantization techniques for the H100 architecture.

  • Tools/Resources: NVIDIA H100 technical documentation, PyTorch/TensorFlow quantization frameworks, benchmark datasets for accuracy validation, and TensorRT optimization suite.


4. Technology Overview

  • Description: The NVIDIA H100 (Hopper architecture) GPU features fourth-generation Tensor Cores and a new Transformer Engine that support multiple precision formats including FP64, FP32, TF32, FP16, FP8, INT8, and INT4.
  • Features:
    • Transformer Engine for adaptive precision management
    • Hardware-accelerated quantization/dequantization operations
  • Market Adoption: The H100 represents NVIDIA's flagship data center GPU with growing adoption across major cloud providers and AI research institutions for large-scale AI training and deployment.

5. Analysis and Findings

  • Comparison with Alternatives:

    | Precision Format | Throughput | Memory Usage | Accuracy Impact | Use Case Suitability |
    | --- | --- | --- | --- | --- |
    | FP32/TF32 | Baseline | High | None | Training, Scientific |
    | FP16 | Medium | Medium | Minimal | Training, Inference |
    | FP8 | High | Low | Low-Medium | Training, Inference |
    | INT8 | Very High | Very Low | Medium | Primarily Inference |
    | INT4 | Ultra High | Ultra Low | High | Inference Only |
  • Strengths and Weaknesses:

    • Strengths:
      • Significantly increased computational throughput (2-4x); INT4 precision can bring an additional ~59% speedup compared to INT8
      • Reduced memory footprint and memory bandwidth usage, enabling larger models
      • Lower latency for inference workloads
      • Hardware-accelerated quantization support (a minimal post-training INT8 quantization sketch is included at the end of this section)
    • Weaknesses:
      • Reduced numerical precision and range
      • Potential accuracy degradation
      • Implementation complexity requiring specialized expertise
      • Not suitable for all model architectures or operations
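
As a concrete illustration of the INT8 path, the sketch below applies PyTorch post-training dynamic quantization to a toy model. This is a generic, CPU-side weight-quantization example rather than an H100/TensorRT deployment recipe; the model and layer choice are assumptions for illustration only.

```python
import torch
import torch.nn as nn

# Toy stand-in for a real network (illustrative only).
model = nn.Sequential(
    nn.Linear(512, 1024),
    nn.ReLU(),
    nn.Linear(1024, 256),
)
model.eval()

# Post-training dynamic quantization: Linear weights are stored in INT8,
# activations are quantized on the fly at inference time.
quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 512)
with torch.no_grad():
    print("fp32 output shape:", model(x).shape)
    print("int8 output shape:", quantized(x).shape)
```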

6. Use Case Applicability

  • Limitations:

    • May not be suitable for models with high numerical sensitivity
  • Integration Feasibility:

    • Well-supported through NVIDIA's software stack (TensorRT, CUDA)
    • PyTorch/TensorFlow integration available through quantization libraries
    • Requires model-specific calibration and validation
    • May require architecture-specific modifications for optimal results

7. Cost-Benefit Analysis

  • Implementation Costs:

    • Engineering time for model quantization and validation
    • Potential accuracy recovery work through quantization-aware training
    • Testing and qualification across various inputs
  • Return on Investment (ROI):

    • 2-4x increase in inference throughput per GPU
    • Proportional reduction in infrastructure costs
    • Ability to deploy models 2-4x larger within the same memory constraints (a worked cost example follows at the end of this section)
  • Risk Assessment:

    • Accuracy degradation may impact user experience
    • Maintenance complexity with mixed precision workflows
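
To make the ROI framing concrete, here is a small worked example. The hourly rate and baseline throughput are hypothetical placeholders, not benchmark results; only the relative effect of the speedup matters.

```python
# Hypothetical inputs: an H100 rented at $2.50/hr serving 1,000 output tokens/s at baseline.
gpu_cost_per_hour = 2.50
baseline_tokens_per_s = 1_000

def cost_per_million_tokens(speedup: float) -> float:
    """Serving cost per 1M generated tokens at a given throughput multiplier."""
    tokens_per_hour = baseline_tokens_per_s * speedup * 3600
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

for speedup in (1.0, 2.0, 4.0):
    print(f"{speedup:.0f}x throughput -> ${cost_per_million_tokens(speedup):.3f} per 1M tokens")
```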

8. Recommendations

  • Adoption Plan:

    1. Initial Phase: Identify candidate models for INT8/4 conversion based on throughput needs and accuracy tolerance
    2. Testing Phase: Implement post-training quantization and validate accuracy against benchmarks
    3. Optimization Phase: Apply quantization-aware training where necessary
    4. Deployment Phase: Gradual rollout with monitoring
  • Further Research:

    • Exploration of hybrid precision approaches for model-specific optimization
    • Evaluation of emerging quantization techniques
    • Comparing TensorRT-LLM and Ollama
  • Possible Workflow (a minimal 4-bit loading sketch follows after this list):

    • Download model from Hugging Face → Prefer models with GPTQ or QLoRA support (4-bit).
    • Load with transformers + AutoModelForCausalLM → Set quantization configs.
    • (Optional) Quantize with GPTQ → Skip if model is already quantized; else run GPTQ tooling.
    • Fine-tune with QLoRA → Use peft, bitsandbytes, transformers, accelerate (fit LLaMA-2 13B easily on H100 in 4-bit).
    • Evaluate and deploy locally → Wrap with FastAPI, vLLM, or Hugging Face's text-generation-inference
  • Untrained Model:

    (Figure: possible setup)
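
To illustrate the workflow above, here is a minimal sketch of loading a causal LM in 4-bit with transformers and bitsandbytes and attaching a LoRA adapter via peft. The model name and the LoRA hyperparameters are placeholders rather than a validated recipe.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

MODEL_ID = "meta-llama/Llama-2-13b-hf"  # placeholder; any causal LM on the Hub works

# 4-bit NF4 quantization config (QLoRA-style loading via bitsandbytes).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, quantization_config=bnb_config, device_map="auto"
)

# Attach small trainable LoRA adapters on top of the frozen 4-bit base model.
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()
```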

9. Conclusion

  • Summary of Findings: INT4/8 precision modes on H100 GPUs offer substantial performance and efficiency benefits with manageable accuracy trade-offs for suitable workloads.
  • Final Recommendation: Adopt INT8 precision broadly for inference workloads with selective INT4 usage for less sensitive components. Maintain higher precision for training and numerically sensitive operations.

10. Appendices

  • References:
    • NVIDIA H100 Technical Documentation
    • "Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference"
    • "ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers" (Yao et al.)
    • "GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers" (Frantar et al.)

Evaluating the Effectiveness of Open-Source LLM Models, Required Sizing, and R&D Direction

· 5 min read
Software Engineer @ Aladintech

1. Executive Summary

Objective

  • Evaluate the effectiveness of DeepSeek models, including:
    • Reliability
    • Response time
    • Model suitability per agent type
  • Optimize cost and resource sizing
  • Assess usage feasibility

Key Findings

  • DeepSeek models provide strong performance when used with proper configuration and alignment to task type.
  • High-end models (70B+) require expensive infrastructure and should be carefully evaluated for ROI.
  • Smaller distilled models can achieve practical efficiency and reliability when tuned correctly.

Impact

  • Improved agent reliability and response time
  • Flexible deployment options (local/private inference)
  • Cost-effective sizing strategies and modular flow architecture

2. Introduction

Background

Current Needs

  • Need for private deployable models
  • Suboptimal performance on CPU
  • Lack of full testing across models and tasks

Challenges

  • Too many model options to benchmark exhaustively
  • Risks include:
    • Slow response times
    • Inaccurate outputs (especially with 7B models)
  • Difficulty in measuring efficiency

Understanding Threats

Proposed Solution

  • Benchmark and test candidate models
  • Use model classification and evaluation metrics

Purpose

  • Create a measurable framework for evaluating models
  • Classify model capabilities, usage limits, and costs
  • Select 3 high-reliability agents for further testing
  • Explore optimization strategies for agents
  • Propose reliability enhancement techniques
    Reference:
    • Secure data access and privacy
    • Improve user experience

Scope

  • Conduct a controlled evaluation (no production deployment yet)
  • Provide a test suite and benchmarking guidance

3. Research Methodology

Approach

Usage & Environment

  • Environment: Ollama, SGLang (ref)
  • Tools: Ollama, Langchain, n8n (prebuilt SDK available)

Tested Model Variants

  • DeepSeek-V3 (General-purpose)
  • DeepSeek-R1 (Reasoning tasks)
  • DeepSeek-VL2 (Image+Text)
  • Janus (Multi-modal)
  • DeepSeek-Coder (Code-focused)

Evaluation Metrics

  • Total duration
  • Load duration
  • Prompt evaluation (count, duration, rate)
  • Overall evaluation (count, duration, rate)
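
These metrics map directly onto the fields returned by Ollama's generate endpoint. The sketch below assumes a local Ollama server on the default port and that the named model has already been pulled; it collects the timing fields for a single prompt.

```python
import requests

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local Ollama endpoint
MODEL = "deepseek-r1:32b"  # assumption: model already pulled via `ollama pull`

resp = requests.post(
    OLLAMA_URL,
    json={"model": MODEL, "prompt": "Explain the KV cache in one paragraph.", "stream": False},
    timeout=600,
)
data = resp.json()

ns = 1e9  # durations are reported in nanoseconds
print("total_duration (s): ", data["total_duration"] / ns)
print("load_duration (s):  ", data["load_duration"] / ns)
print("prompt_eval_count:  ", data["prompt_eval_count"])
print("eval_count:         ", data["eval_count"])
print("eval rate (tok/s):  ", data["eval_count"] / (data["eval_duration"] / ns))
```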

Hardware Sizing Reference

Assumed usage: 8 hours per day, on demand:

  • On-demand rental for research and application-oriented work.
  • Monthly rental when training models.

| Model | Params (B) | VRAM (GB) | Recommended GPU | CPU Recommendation | RAM (GB) | Price |
| --- | --- | --- | --- | --- | --- | --- |
| 700B | 671 | 1342 | 16x NVIDIA A100 80GB | AMD EPYC 9654 / Intel Xeon Platinum 8490H | 2048+ | 2500$ |
| 14B | 14 | 6.5 | RTX 3080 10GB | Ryzen 9 7900X / i9-13900K | 64+ | N/A |
| 32B | 32 | 14.9 | 1 x A6000 | Threadripper 7980X / Xeon W9-3495X | 128+ | N/A |
| 70B | 70 | 32.7 | 1 x H100 | EPYC 9654 / Xeon Platinum 8490H | 256+ | 1200$ |

4. Analysis and Findings

Benchmark Environment

Configs Used

  • Dual RTX 5070Ti: 32GB VRAM, 64GB RAM, i5-14600KF (~$0.5/hr)
  • H100 NVL (single): 94GB VRAM, 100GB RAM, EPYC 9354 (~$2.5/hr)
  • Dual H100 NVL (~$5/hr)

Model Benchmarks

  • llama3.1 (70B): General-purpose
  • DeepSeek-R1 (32B, 70B): For reasoning, solution generation
  • DeepSeek-Coder (33B): For code explanation/suggestions
  • DeepSeek-LLM (67B): General-purpose

Observations

  • RTX 5070Ti can handle up to 32B models with tuning; ideal for narrow-scope agents (e.g., coding assistants).
  • For 70B models, 5070Ti setup is slow (up to 4 mins response).
  • H100 NVL is optimal for real-time inference with 70B+ models.
  • Larger models (700B) are currently impractical for cost and infra reasons.

Model Comparison Insights

  • 1x A6000 is sufficient for 32B models with proper prompt tuning.
  • 1x H100 can support 70B models for testing/research.
  • Models with vision capabilities require pre-processing: PDF → Text → Embedding → Vector DB → Retriever → LLM.
  • Multi-model workflows (e.g., classification + reasoning) improve accuracy and performance:
    User → Model 1 (classifier/tuner) → Model 2 (responder) → Output
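
A minimal sketch of that two-stage flow against a local Ollama server is shown below; the model names, the routing categories, and the prompt wording are illustrative assumptions rather than a tested configuration.

```python
import requests

OLLAMA_CHAT = "http://localhost:11434/api/chat"  # default local Ollama endpoint

def ask(model: str, prompt: str) -> str:
    """Single non-streaming chat call to a locally served model."""
    resp = requests.post(
        OLLAMA_CHAT,
        json={"model": model, "messages": [{"role": "user", "content": prompt}], "stream": False},
        timeout=600,
    )
    return resp.json()["message"]["content"]

user_request = "Refactor this recursive Python function into an iterative one."

# Stage 1: a small, fast model classifies the request (categories are illustrative).
category = ask(
    "deepseek-r1:14b",
    "Classify the following request as exactly one word, CODE or GENERAL:\n" + user_request,
).strip().upper()

# Stage 2: route to a specialised responder model based on the classification.
responder = "deepseek-coder:33b" if "CODE" in category else "deepseek-llm:67b"
print(ask(responder, user_request))
```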

5. Use Case Applicability

Suitable Use Cases

  • Domain-specific AI agents (e.g., code generation, Q&A bots)
  • Parallel model inference to boost reliability
  • On-prem inference for sensitive data handling
  • Coding support agents

Limitations

  • GPU-dependent
  • Lacks native image/PDF input unless extended with external modules
  • Large model cost constraints

Integration Feasibility

  • High if using LangChain/n8n SDKs
  • Moderate effort for on-prem setup (requires infrastructure and monitoring)

6. Cost-Benefit Analysis

Implementation Costs

  • GPU rental via Vast.ai for prototyping (~$0.5 to $4/hr)
  • Setup and tuning time
  • DevOps and monitoring for production use

ROI & Savings

  • Local inference = no token cost (vs. OpenAI)
  • Smaller tuned models yield significant savings
  • Modular flows reduce infrastructure duplication

Risks

  • Over-investing in large models without maximizing smaller ones
  • Long response times = poor UX
  • Lack of model support for some input types (e.g., images)

7. Recommendations

Adoption Plan

  • Build and tune agents first
  • Start with 1x H100 or A6000
  • Adopt a flow-based architecture
  • Use multiple models for task specialization
  • Consider unsupervised learning and prompt chaining

Training & Support

  • Document setup and tuning best practices
  • Evaluate in-house vs. external support

Further Research

  • Model tuning for cost-efficiency
  • Explore hybrid models (reasoning + coding)
  • Improve model interaction reliability

8. Conclusion

Summary

  • DeepSeek offers strong performance when deployed with appropriate infrastructure.
  • Smaller models (14B–32B) provide good results with tuning.
  • 700B+ models are not currently cost-effective; we cannot yet make full use of their capabilities.

Final Recommendation

  • Use 1x A6000 or H100 for research and mid-sized deployments.
  • Optimize agent design and build modular flows.
  • Focus on maximizing potential of 32B models before scaling up.

Integrating n8n into Aladintech.co V2

· 3 min read
Software Engineer @ Aladintech

1. Overview

  • Problem: Integrate n8n into Aladintech version 2.
  • Use case: Automate workflows using n8n.
  • Impact: With a workflow management tool, we can design and manage work flows and integrate Aladintech's currently fragmented tools.

2. Tool Overview

  • Description: n8n is a workflow automation platform that lets you build automated processes.
  • Features: Drag-and-drop nodes, custom nodes, and built-in integrations with the APIs of many tools such as email, Jira, Google Sheets, etc.

3. Use Cases

  • Potential:

    • Automating work processes:
      • The team is currently building out processes, starting with the HRMs module. Khương is advising and will send back the three work items of the HRMs module, including:

        • Recruitment process: input screening, interview question bank, skill assessment matrix
        • Training process: training paths for different positions such as BE & FE
        • Evaluation process: performance reviews and promotion roadmap for employees.

        => Build automation for the processes that can be automated. Since every process is a sequence of steps A => B, we can automate the tasks that do not require a human. For example, when interviewing a candidate, automatically generate the question list and email it to the candidate.

      • Addressing current outstanding issues:

        • Management systems are fragmented, often for external reasons such as a customer requiring a particular tool.
        • Examples:
          • A customer requires JIRA => our JIRA is self-hosted => out of sync.
          • Management via Google Sheets => statistics and evaluation are difficult and scattered.

        (Figure 1: Build flow.)

  • Limitations & technical issues:

    • Jira integration has not been wired up yet => more time is needed to integrate / check the APIs of our self-hosted apps.
    • Complex flows will need someone to design them (for example, a custom API of another system, or a node not yet supported by n8n)

4. Testing self-hosted n8n


5. Integration & Deployment Plan

  • Self-hosted deployment:
    • Straightforward.
    • Prebuilt flows need to be set up for the team.
    • Basic usage is drag-and-drop; a little training is needed once the flows are built.
  • Deployment plan:
    • Finish defining the processes and model them as flow diagrams.
    • Build the flows in n8n (a minimal trigger sketch follows below).
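
As a small illustration of how other Aladintech services could hand work off to an n8n flow, the sketch below posts a candidate record to an n8n Webhook-trigger node. The host, webhook path, and payload fields are assumptions for illustration, not an existing endpoint.

```python
import requests

# Assumed self-hosted n8n instance and an assumed Webhook trigger path.
N8N_WEBHOOK_URL = "http://n8n.internal.example:5678/webhook/hr-interview-intake"

# Hypothetical payload for the recruitment flow (e.g. to generate interview
# questions and email them to the candidate automatically).
candidate = {
    "name": "Nguyen Van A",
    "email": "candidate@example.com",
    "position": "Backend Engineer",
}

resp = requests.post(N8N_WEBHOOK_URL, json=candidate, timeout=30)
resp.raise_for_status()
print("n8n workflow triggered, status:", resp.status_code)
```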