Computer Vision using LLMs

Akhil Jain
4 min read · Sep 26, 2024

Last week, I had the opportunity to build a remarkable Computer Vision project that could perform a variety of tasks without having to worry about the underlying machine learning model, training, or testing. This is a stark contrast to the days before the rise of Large Language Models (LLMs), when such a project would have taken days or even weeks to develop.

The core of this solution lies in its ability to extract images from the video feed of the factory floor and ingest them into Amazon S3. My custom-built Python code on AWS Lambda then taps into the power of Anthropic’s Claude 3.5 Sonnet model (hosted on Amazon Bedrock) to analyze these images. Using a carefully crafted prompt, the model determines whether a worker’s attire complies with the established safety equipment requirements, returning an output of 0 (non-compliance) or 1 (compliance).
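
To make this concrete, here is a minimal sketch of what that Bedrock call can look like in Python. The prompt wording and the `check_compliance` helper name are illustrative choices rather than the exact production code, and the sketch assumes the extracted frames are JPEGs:

```python
import base64
import json

import boto3

bedrock = boto3.client("bedrock-runtime")

# Illustrative prompt; the production prompt is more carefully crafted.
PROMPT = (
    "You are a safety compliance inspector. Look at the worker in this image. "
    "Reply with exactly one character: 1 if the worker is wearing both a hard "
    "hat and a safety vest, 0 otherwise."
)

def check_compliance(image_bytes: bytes) -> int:
    """Ask Claude 3.5 Sonnet on Bedrock whether the worker is compliant."""
    body = {
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 5,
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "image",
                        "source": {
                            "type": "base64",
                            "media_type": "image/jpeg",  # assumes JPEG frames
                            "data": base64.b64encode(image_bytes).decode("utf-8"),
                        },
                    },
                    {"type": "text", "text": PROMPT},
                ],
            }
        ],
    }
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20240620-v1:0",
        body=json.dumps(body),
    )
    result = json.loads(response["body"].read())
    # The model replies with "0" or "1" as instructed by the prompt.
    return int(result["content"][0]["text"].strip())
```

Constraining the model to a single-character answer keeps the output trivially machine-parseable, so no downstream text parsing is needed.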

By proactively identifying potential safety risks, organizations can take immediate action to address them, reducing the likelihood of costly lawsuits, increased insurance premiums, and reputational damage.

Let’s dive deep into the technical details.

Technical Details

An image upload to S3 triggers the Lambda function, which detects whether the workers are wearing safety equipment, specifically hard hats and safety vests.
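
The wiring is a standard S3-event-driven Lambda. A minimal handler sketch, reusing the hypothetical `check_compliance` helper from above (error handling and downstream alerting omitted), might look like:

```python
from urllib.parse import unquote_plus

import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    """Triggered by S3 ObjectCreated events for extracted video frames."""
    results = []
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Fetch the uploaded frame and run the compliance check.
        image_bytes = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        verdict = check_compliance(image_bytes)  # 0 = non-compliant, 1 = compliant
        results.append({"image": key, "compliant": verdict})
    return results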


Written by Akhil Jain

Sr. Solutions Architect (Big Data, IoT, ML) at Amazon Web Services | https://www.linkedin.com/in/akhiljain01/
