Classify images in real time using labels
Convert spoken words into text
InsectSAM + GroundingDINO Inference