Computer Vision – Introduction
Definition
A field of AI that enables computers to understand and interpret images and videos
as humans do.
Goals: Recognition, measurement, search, interaction with visual data.
Definitions:
o Ballard & Brown (1982) – Build explicit, meaningful descriptions of
physical objects from images.
o Trucco & Verri (1998) – Compute 3D properties from one or more images.
o Stockman & Shapiro (2001) – Make useful decisions about objects/scenes
from images.
Scope of Computer Vision
1. Perception & Interpretation – Identify objects, people, scenes, activities.
2. Measurement – Extract 3D world properties.
3. Search & Organization – Manage and retrieve visual content.
Human vs. Computer Vision
Computers: Good at simple, repetitive tasks, given large amounts of data.
Humans: Superior in complex, context-based tasks.
Trend: AI improving rapidly, changing what is “hard”.
Why is Vision Hard?
1. Incomplete Information – Unknown camera settings, lighting, object shape.
2. Inverse Problem – The 3D real-world scene must be inferred from 2D image data.
3. High-Dimensional Data – Computationally expensive.
4. No complete model of human visual system.
5. Context dependency – Deciding what is important in an image.
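The inverse problem and the high dimensionality listed above can be illustrated with a small sketch. It assumes an ideal pinhole camera with no lens distortion; the function name `project`, the focal length of 1.0, and the sample points are hypothetical choices made only for this illustration.

```python
def project(point, focal_length=1.0):
    """Project a 3D point (X, Y, Z) onto the image plane of an ideal pinhole camera."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (Z > 0)")
    # Perspective division: depth Z is divided out and cannot be read back.
    return (focal_length * x / z, focal_length * y / z)


# Two different 3D points on the same viewing ray (assumed sample values).
near_point = (1.0, 2.0, 4.0)
far_point = (2.0, 4.0, 8.0)  # same direction, twice as far away

near_pixel = project(near_point)
far_pixel = project(far_point)
print(near_pixel, far_pixel)  # both are (0.25, 0.5)

# High dimensionality: raw values in one Full HD RGB frame.
width, height, channels = 1920, 1080, 3
values_per_frame = width * height * channels
print(values_per_frame)  # 6220800
```

Both points land on the same image coordinates, so a single image cannot tell them apart without extra information such as a second view or a known object size. The last lines show that even one frame already holds more than six million raw values.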
Importance
The visual cortex occupies ~50% of the macaque brain; ~1/3 of the human brain is
dedicated to vision.
Billions of images/videos captured daily (Flickr, Facebook, Instagram, YouTube).
Applications
Consumer Electronics: QR codes, panorama, night mode, FaceID.
Healthcare: Medical imaging, diagnosis, surgery planning.
Security: Biometric authentication, CCTV monitoring.
Industry: Robotics, automated inspection, OCR, number plate recognition.
Transportation: Self-driving cars, driver monitoring.
Space: Mars rovers, telescopes.
Entertainment: AR/VR, film VFX, virtual sports replay.
Retail: Cashier-less checkout, theft detection.
Creative: Image/video generation (GANs), photo editing.
Key Research Areas
Reconstruction: 3D/4D scene recovery from photos or depth cameras.
Recognition: Object detection, face recognition, classification.
Generation: Image-to-image translation, GANs.
Visual Search: Content-based retrieval.
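Content-based retrieval can be sketched in a few lines. The example below is only one possible approach, assuming images are flat lists of 0-255 grayscale values and that an intensity histogram is an adequate descriptor; real systems usually rely on learned features. The names `histogram`, `l1_distance`, `search`, and the toy database are hypothetical.

```python
def histogram(pixels, bins=4):
    """Normalized intensity histogram of a flat list of 0-255 grayscale values."""
    counts = [0] * bins
    for value in pixels:
        counts[min(value * bins // 256, bins - 1)] += 1
    total = len(pixels)
    return [count / total for count in counts]


def l1_distance(h1, h2):
    """Sum of absolute differences between two histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def search(query, database, bins=4):
    """Return database image names ordered from most to least similar to the query."""
    query_hist = histogram(query, bins)
    return sorted(
        database,
        key=lambda name: l1_distance(query_hist, histogram(database[name], bins)),
    )


# Toy "images" (assumed values, six pixels each).
database = {
    "dark": [10, 20, 30, 40, 50, 60],
    "bright": [200, 210, 220, 230, 240, 250],
    "mixed": [10, 250, 20, 240, 120, 130],
}
query = [15, 25, 35, 45, 55, 70]  # a mostly dark query image

ranking = search(query, database)
print(ranking)  # ['dark', 'mixed', 'bright']
```

The query is compared by content (its intensity distribution), not by filename or tags, which is the core idea behind content-based retrieval.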
Why Study Computer Vision?
Huge data availability from internet and sensors.
Broad application across multiple domains.
Critical in automation, AI, and human-computer interaction.