![](https://static.wixstatic.com/media/3062a4_4f6f442b09a047f98fa75705b35b0b55~mv2.png/v1/fill/w_633,h_356,al_c,q_85,enc_avif,quality_auto/3062a4_4f6f442b09a047f98fa75705b35b0b55~mv2.png)
Computer vision allows computers and systems to derive meaningful information from digital images, videos and other visual inputs, with the aim of performing and automating tasks that human vision can do. It requires large amounts of visual data. As Microsoft describes it, computer vision applications use inputs from sensing devices together with AI, machine learning and deep learning to imitate human vision. These applications can recognise patterns in visual data and use them to determine the content of other images.
Computer vision is a multidisciplinary field that we can regard as a subfield of artificial intelligence and machine learning. It involves both specialised methods and general learning algorithms. In its white paper, PwC says new computer vision techniques allow organisations to collect novel intelligence about their businesses: people, products, assets and documents. Computer vision will have a tremendous impact on our lives.
![](https://static.wixstatic.com/media/3062a4_5fa4118c397940489e8c30ef16336ab8~mv2.png/v1/fill/w_415,h_233,al_c,q_85,enc_avif,quality_auto/3062a4_5fa4118c397940489e8c30ef16336ab8~mv2.png)
History of Computer Vision
Understanding the past is how we can understand the future of computer vision. Computer vision emerged as a domain as early as the 1960s, heavily influenced by symbolist philosophy. Marvin Minsky was one of the first computer scientists to outline an approach to building AI systems based on perception.
The basic mechanism of computer vision is to first extract meaningful features from the raw pixels, then match these features to known, labelled ones to achieve recognition. PCA (Principal Component Analysis, Explained: Principal Component Analysis (PCA) | by Anjaneya Tripathi | Analytics Vidhya | Medium) was successfully applied for the first time to complex recognition problems, such as face classification. Another method, appearing in the late 90s and revolutionising the domain, is SIFT (Scale-Invariant Feature Transform, Introduction to SIFT (Scale Invariant Feature Transform) | by Deepanshu Tyagi | Data Breach | Medium). Over the years, we have developed more advanced methods with more robust ways of extracting key points or computing and combining discriminative features.
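To give a flavour of the idea behind PCA — reducing high-dimensional pixel data to a few informative features — here is a minimal NumPy sketch. The data and dimensions are invented for illustration; the historical face-classification pipelines were far more elaborate:

```python
import numpy as np

# Toy "image" dataset: 6 samples of 8-dimensional flattened pixel vectors.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))

# Centre the data, then take the SVD; the rows of Vt are the principal components.
mean = X.mean(axis=0)
Xc = X - mean
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2                          # keep only the 2 most significant components
components = Vt[:k]            # shape (2, 8)
projected = Xc @ components.T  # each image is now described by just 2 features

print(projected.shape)  # (6, 2)
```

Recognition then works by comparing these compact feature vectors instead of raw pixels, which is both faster and more robust to noise.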
Machine learning plays a crucial part in tackling image classification. In the 90s, we developed more statistical ways to discern images based on their features. SVMs (Support Vector Machines, Support Vector Machine — Introduction to Machine Learning Algorithms | by Rohith Gandhi | Towards Data Science) were for a long time the default solution for learning a mapping from complex structures to simple labels. Over the years, the computer vision community has adopted other machine learning algorithms, including random forests, bags of words, Bayesian models and neural networks.
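To illustrate the idea of learning a mapping from features to labels, the sketch below trains a minimal linear SVM with a hinge-loss update (a Pegasos-style toy on invented, linearly separable data — not any production pipeline):

```python
import numpy as np

# Tiny linearly separable dataset; labels are in {-1, +1}.
X = np.array([[ 2.0,  2.0], [ 1.5,  2.5], [ 2.5,  1.5],
              [-2.0, -2.0], [-1.5, -2.5], [-2.5, -1.5]])
y = np.array([1, 1, 1, -1, -1, -1])

w, b = np.zeros(2), 0.0
lr, lam = 0.1, 0.01          # learning rate and regularisation strength

for _ in range(100):
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) < 1:   # inside the margin: hinge loss is active
            w += lr * (yi * xi - lam * w)
            b += lr * yi
        else:                              # outside the margin: only shrink w
            w -= lr * lam * w

preds = np.sign(X @ w + b)   # all six points end up correctly labelled
```

In real image classification the inputs would be extracted features (PCA projections, SIFT descriptors, and so on) rather than raw 2D points, but the learning principle is the same.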
![](https://static.wixstatic.com/media/3062a4_3d57430065164c5db555099fa83eb138~mv2.png/v1/fill/w_426,h_529,al_c,q_85,enc_avif,quality_auto/3062a4_3d57430065164c5db555099fa83eb138~mv2.png)
In the 50s, Frank Rosenblatt came up with the perceptron, a machine learning algorithm inspired by neurons and the building block of the first neural networks. Later research papers showed how we could train networks with many layers of perceptrons stacked one after another using a straightforward scheme – backpropagation. The first convolutional neural network (CNN) was developed and applied to character recognition with some success. However, these methods were computationally heavy and could not scale to larger problems.
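Rosenblatt's learning rule is simple enough to sketch in a few lines. The toy example below (an illustration, not historical code) learns the linearly separable AND function by nudging the weights whenever a point is misclassified:

```python
import numpy as np

# Perceptron learning rule on the linearly separable AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, -1, -1, 1])    # AND, with labels in {-1, +1}

w = np.zeros(2)
b = 0.0
for _ in range(20):              # a few passes suffice for separable data
    for xi, yi in zip(X, y):
        if yi * (np.dot(w, xi) + b) <= 0:   # misclassified point
            w += yi * xi                    # Rosenblatt's update rule
            b += yi

preds = np.sign(X @ w + b)       # matches y after convergence
```

The perceptron convergence theorem guarantees this loop terminates on linearly separable data — but a single perceptron cannot learn non-separable functions like XOR, which is exactly the limitation that multi-layer networks and backpropagation later overcame.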
The internet transformed data science, ushering in big data and the golden age of data science. With so much accessible data covering multiple use cases, new doors opened, along with new challenges. Hardware kept becoming cheaper and faster, roughly following Moore’s Law, and became better suited to computer vision. GPUs became affordable and popular with the advent of the new millennium.
Deep learning refers to deeper neural networks with multiple hidden layers – additional layers set between the input and output layers. Each layer processes its inputs and passes the results to the subsequent layers, all trained to extract increasingly abstract information. Deep learning kept growing until a significant breakthrough in 2012 gave it its contemporary prominence, when Google demonstrated how advances in cloud computing could be applied to computer vision.
![](https://static.wixstatic.com/media/3062a4_295cc54271df41a3a8fe31f9bc0cb337~mv2.png/v1/fill/w_500,h_456,al_c,q_85,enc_avif,quality_auto/3062a4_295cc54271df41a3a8fe31f9bc0cb337~mv2.png)
Why do we use computer vision?
The main tasks of computer vision include content recognition, video analysis, content-aware image editing and scene reconstruction. Computer vision is applied across industries. For example, insurance companies leverage it to analyse damaged assets. The automotive industry has used computer vision with deep learning for scene analysis, automated lane detection and road sign reading. The media world uses computer vision to recognise images on social media and identify brands. Every day, people upload more than 300 million photos to Facebook, the largest social media platform, and every minute 510,000 comments are posted and 293,000 statuses are updated. Retailers are interested in analysing the shopping carts of in-store shoppers to detect items and make recommendations. Healthcare companies apply computer vision to disease detection in MRI scans or use it with deep learning for radiology tasks; around 90% of all medical data is image-based.
Computer vision can spot defects that may not be visible to the human eye. It can help analyse every football player's performance. It lets us spot counterfeit money and prevent fraud. It makes automated checkout possible for a better customer experience. It supports identifying areas of concern for cancer patients. It helps us detect early signs of plant disease. It enables us to detect leaks and spills from pipelines. It helps us differentiate between staged and real auto damage.
![](https://static.wixstatic.com/media/3062a4_ba5bd38166804e02ae11aff4a9b0394c~mv2.png/v1/fill/w_592,h_320,al_c,q_85,enc_avif,quality_auto/3062a4_ba5bd38166804e02ae11aff4a9b0394c~mv2.png)
The Future of Computer Vision
In the future, computer vision will perform a broader range of functions. We will be able to train models more effortlessly, and they will be able to discern more from images than they do now. We can also use computer vision in conjunction with other technologies or other subsets of AI for more potent applications.
For instance, we can combine image captioning applications with natural language generation (NLG) to describe the objects in the surroundings for visually challenged people. Computer vision will have a crucial role in the development of artificial general intelligence and artificial superintelligence by allowing such systems to process information beyond human capabilities.
Computer vision AI must happen close to the edge to attain the real-time responsiveness that many systems require. Video processing involves large data sets, so we need appropriate processing and storage resources close to the edge to avoid latency. Storage needs to be transparent to simplify processing, so that the system does not need to know where the data is stored to make the best use of it. It also needs to work seamlessly with accelerated compute resources while scaling out locally to meet increasing capacity demands. AI is the future of computer vision. With more data, solutions will become more adaptive, which may lead to dramatic improvements in the performance of systems, automated processes and decision making. A real-time continuous improvement loop becomes possible, in which a model constantly retrains itself.
Considering the capabilities of present-day computer vision, it might be hard to believe that there are still unexplored benefits and applications of the technology. The future of computer vision will create paths for AI systems as human as us. Nonetheless, there are a few obstacles to overcome, and the biggest could be demystifying the black box of AI: a computer vision model may function effectively, yet its inner workings can be incomprehensible, making its outcomes hard to explain.
We should validate a computer vision model's design to ensure it works correctly for its specific real-world application. Organisations looking to adopt computer vision technology, or to enhance their current implementation, should concentrate on the insights they want to gain and the specific outcomes they want to achieve. We can now deliver better-quality insights faster and at scale. We should also monitor computer vision systems to ensure safety, reliability, privacy and morality; ethical frameworks can be a starting point.
References:
Hands-On Computer Vision with Tensorflow 2 (Dr. Benjamin Planche and Eliot Andres)