Background Removal and Replacement Using Vision AI and Vision Engineering
100% Increase in Services Offered
50% Increase in Processing Capacity
90% Daily Processing Cost Savings
Customer Overview
Our client, a US-based niche product photography service provider in the automobile industry, wanted to grow their business through digitization. They provided an application that let auto dealers take professional-grade photos of car interiors and exteriors across 37 compositions (camera angles). Our client then polished these photos using both automated and manual processes to deliver 4K high-definition results to their customers.
Project Overview
The client relied on a third-party background removal and replacement tool and paid high per-image fees to process more than 70,000 images daily. They did not use the tool for videos because every frame would have to be processed, which raised costs and made it hard to stay competitive. They wanted to reduce this dependency and build Intellectual Property (IP) by developing a better-performing background removal and replacement tool that could scale to both photo and video processing.
Challenges
Building a superior background removal solution with 4K resolution results, surpassing a leading tool, while meeting tight deadlines & budget constraints.
- The solution must provide high-quality outcomes better than the third-party tool our client used and help deliver 4K high-definition results for all 37 compositions.
- It should perform fine-grained background removal around the narrow edges of car components and for imagery visible through the car’s windows and windshield.
- It must generate the car’s drop shadow and manage the visibility (transparency level) of the new background as seen through the car’s windows.
- Building a custom model required labeling and training on a massive dataset covering all 37 car compositions, around 20,000 4K images. Doing this within a tight deadline was a significant challenge.
- We must research and evaluate different foundation models to identify the most suitable AI model we can utilize to build this solution.
- We must choose between two approaches: utilizing a base AI model that requires full training or a pre-trained model that needs fine-tuning.
- The solution must integrate with our client’s existing infrastructure without major code changes.
Solution
Using the latest vision model, vision engineering, annotation & high-capacity GPUs, we achieved precise background removal & replacement.
- Using a sample of 4K resolution images, we trained and evaluated two open-source foundation models: the proven U2Net and the more recent BiRefNet. Based on the results, we selected BiRefNet.
- We chose to fine-tune a general pre-trained model instead of fully training a base model, because fine-tuning needs a smaller dataset and less compute while still aiming for better and faster results.
- We used 20,000 4K resolution images from 37 compositions to train the AI model. To annotate all images in under 2 weeks, we used the CVAT.ai annotation tool and a team of 10 human annotators.
- We rented five A100 GPUs (80 GB VRAM each, CUDA 12.6) from Vast.ai and ran them in parallel for distributed training of our AI model.
- To handle the narrow edges of car interiors accurately and to create the drop shadow effect, we used OpenCV for vision engineering. We also adjusted opacity levels so that the new background shows through the car’s windows with the right transparency.
- We built a scalable REST API similar to that of the third-party tool our client previously used, so integration required minimal code changes.
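As an illustration of the drop shadow and window transparency steps described above, the following is a minimal sketch of how a predicted matte could be composited over a new background. It is not the project's actual code: the function name `composite_car`, its parameters, and the assumption that window and shadow masks are already available are all hypothetical, and NumPy stands in here for the array operations (in practice an OpenCV call such as `cv2.GaussianBlur` could be used to soften the shadow mask first).

```python
import numpy as np


def composite_car(foreground, alpha, background,
                  window_mask=None, window_opacity=0.6,
                  shadow_mask=None, shadow_strength=0.5):
    """Blend a car cutout over a new background (illustrative sketch).

    foreground, background: float arrays in [0, 1], shape (H, W, 3)
    alpha:       float array in [0, 1], shape (H, W); 1 = car, 0 = background
    window_mask: optional bool array (H, W); True where glass is (assumed given)
    shadow_mask: optional float array (H, W) in [0, 1]; 1 = darkest shadow
    """
    fg = foreground.astype(np.float32)
    bg = background.astype(np.float32)
    a = alpha.astype(np.float32).copy()

    if shadow_mask is not None:
        # Darken the new background where the drop shadow should fall.
        bg = bg * (1.0 - shadow_strength * shadow_mask[..., None])

    if window_mask is not None:
        # Cap the car's opacity inside glass regions so the new
        # background remains partly visible through the windows.
        a[window_mask] = np.minimum(a[window_mask], window_opacity)

    # Standard alpha blend: car where alpha is high, background elsewhere.
    a3 = a[..., None]
    return fg * a3 + bg * (1.0 - a3)
```

Lowering `window_opacity` makes more of the new background visible through the glass, while `shadow_strength` controls how dark the shadow appears. Both defaults are placeholder values that would need tuning per composition.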
Benefits
- Developing an in-house solution allowed the client to create valuable Intellectual Property (IP), enhancing their competitive advantage.
- It lowered operational costs by eliminating per-image fees & scaled image processing without increasing expenses.
- The solution’s versatility supported photo and video processing, expanding the client’s service offerings to auto dealers.
- It delivered superior image quality, significantly improving outputs and reducing the need for extensive manual editing by human editors.
Technology
- BiRefNet
- OpenCV
- Python
- Hugging Face
- FastAPI
- Vast.ai
- CVAT.ai
Industry
- Automobile/Automotive
Conclusion
Developing an in-house solution enhanced the client’s competitive advantage and reduced operational costs by eliminating per-image fees. The solution supported photo and video processing, expanded service offerings, and delivered superior image quality while lowering the editing workload for human editors.