Towards a Smart Bionic Eye: AI-powered artificial vision for the treatment of incurable blindness

Objective. How can we return a functional form of sight to people who are living with incurable blindness? Despite recent advances in the development of visual neuroprostheses, the quality of current prosthetic vision is still rudimentary and does not differ much across different device technologies. Approach. Rather than aiming to represent the visual scene as naturally as possible, a Smart Bionic Eye could provide visual augmentations through the means of artificial intelligence-based scene understanding, tailored to specific real-world tasks that are known to affect the quality of life of people who are blind, such as face recognition, outdoor navigation, and self-care. Main results. Complementary to existing research aiming to restore natural vision, we propose a patient-centered approach to incorporate deep learning-based visual augmentations into the next generation of devices. Significance. The ability of a visual prosthesis to support everyday tasks might make the difference between abandoned technology and a widely adopted next-generation neuroprosthetic device.


Introduction
How can we return a functional form of sight to people who are living with incurable blindness? Few disabilities affect human life more than the loss of the ability to see. Although recent advances in gene and stem cell therapies (e.g. Russell  ) are showing great promise as near-future treatment options for end-stage retinal degeneration, and some affected individuals can be treated with surgery or medication, there are currently no effective treatments for many people blinded by severe degeneration or damage to the retina, the optic nerve, or cortex. In such cases, an electronic visual prosthesis (bionic eye) may be the only option (Fernandez 2018, Roska and Sahel 2018). Analogous to cochlear implants, these devices electrically stimulate surviving cells in the visual pathway to evoke visual percepts (phosphenes). Whereas there is only one regulatory-approved gene therapy (Luxturna), three visual prostheses have been commercialized over the years (Second Sight's Argus II, Retina Implant AG's Alpha-AMS, and Pixium Vision's IRIS II). Existing devices generally provide an improved ability to localize high-contrast objects and to perform basic orientation & mobility tasks (Geruschat et al 2012, Karapanos et al 2021).
However, the prosthetic vision generated by current retinal implants is still rudimentary and does not differ much across different device technologies (Erickson-Davis and Korzybska 2021). Analogous to the first generation of cochlear implants, these devices have relied on straightforward signal processing and encoding schemes, assuming that each electrode in the array can be thought of as a 'pixel' in an image (Dagnelie et al 2007, Chen et al 2009, Perez-Yus et al 2017, Sanchez-Garcia et al 2019); to generate a complex visual experience, one then simply needs to turn on the right combination of pixels. In contrast, current prosthesis users report seeing highly distorted phosphenes, which vary in shape across subjects as well as electrodes and often fail to assemble into more complex percepts (Wilke et al 2011, Beauchamp et al 2020, Erickson-Davis and Korzybska 2021, Fernández et al 2021). In the case of epiretinal implants, these distortions are largely due to inadvertent activation of passing axon fibers (Rizzo et al 2003), but other device technologies based on electrical stimulation of visual cortex or optogenetics may face related issues. On the one hand, optogenetic prostheses may cause perceptual distortions due to differences in temporal dynamics between the optogenetic molecules and normal photopigments (Fine and Boynton 2015). On the other hand, although there is a long history of patients reporting punctate percepts (sometimes described as 'a star in the sky') in response to single-electrode stimulation of the visual cortex (Dobelle and Mladejovsky 1974, Evans et al 1979, Dobelle 2000, Bosking et al 2017), more recent work has highlighted that the percepts resulting from multi-electrode stimulation cannot be explained by a summative model based on single-electrode phosphenes (Barry et al 2020, Beauchamp et al 2020, Fernández et al 2021).
While much work has focused on either making use of these documented distortions (Srivastava et al 2009, Kiral-Kornek et al 2013, Bruce and Beyeler 2022) or finding ways to avoid them (Vilkhu et al 2021, Granley et al 2022, de Ruyter van Steveninck et al 2022), these often theoretical insights have yet to be incorporated into a new generation of implantable technology.

Towards a smart bionic eye
Rather than aiming to one day restore natural vision with visual prostheses (which may remain elusive until we fully understand the neural code of vision), we might be better off thinking about how to create practical and useful artificial vision now. Specifically, a visual prosthesis has the potential to provide visual augmentations through the means of artificial intelligence (AI) based scene understanding (see figure 1), tailored to specific real-world tasks that are known to affect the quality of life of people who are blind (e.g. wayfinding & navigation, face recognition, self-care). With recent breakthroughs in deep learning-based computer vision and AI, it is timely to consider how this work may best complement existing lines of animal and human behavioral research to inform the design of a next-generation visual prosthesis.
Instead of aiming to represent the visual scene as naturally as possible, a Smart Bionic Eye could locate the misplaced keys in the living room (figure 1, 'Visual search'), read out medication labels ('Screen reader'), inform a user about people's gestures and facial expressions ('Conversation') during social interactions, or warn of nearby obstacles and outline safe paths ('Navigation') when the user is going for a walk. Such a device could take inspiration from existing low vision aids (Htike et al 2021), which do not promise any kind of sight restoration, but increasingly rely on AI to deliver functionality at a practical level (e.g. Microsoft's Seeing AI and Google Lookout are using computer vision to identify packaged food, and screen readers to read visually captured text aloud).
Indeed, we are not the first to point out that computer vision (and more generally: deep learning-based AI) may have an important role to play in visual prosthesis design (Barnes 2012, Islam et al 2019, Barnes 2014, Rasla and …). However, although these studies are valuable in that they provide insights and specific hypotheses about the role of image processing and stimulus optimization for prosthetic vision, most of them were based on hypothetical future devices, did not involve prosthesis patients, or relied on overly simplified simulations that assumed phosphenes to be small, isolated, and independent light sources. It is therefore unclear how these findings would translate to real prosthesis patients. Only a handful of studies have validated their computer vision algorithms on sighted subjects viewing prosthetic vision simulations. Nevertheless, with recent advances in computer vision and AI, now is the time to revisit these ideas. It is only through the advent of deep learning that we can extract depth from a single image (without the need for extra sensors and bulky peripherals), that we can segment objects according to semantic labels, or that we can converse with an AI that understands our intention. In addition, the rapid development of deep learning-specific hardware (e.g. Intel's Neural Compute Stick) may soon allow these models to be deployed in real time in an energy-efficient way. To this end, several studies have already made use of deep learning to optimize stimulation strategies in an end-to-end fashion (Granley et al 2022, de Ruyter van Steveninck et al 2022) and applied these concepts to task-based scene simplification (Han et al 2021, Küçükoglu et al 2022, White et al 2022). Ultimately, the ability of a visual prosthesis to support everyday tasks might make the difference between abandoned technology and a widely adopted next-generation neuroprosthetic device.
Indeed, when Retina Implant AG (maker of the Alpha-IMS/AMS subretinal implants) dissolved in March 2019, they cited their device not leading to 'the concrete benefit in everyday life of those affected' as one of the main reasons for shutting down.

The scientific challenge
How do we arrive at a Smart Bionic Eye? Achieving this ambitious goal will certainly require the engineering of next-generation visual prostheses with large electrode counts (Ferlauto et al 2018, Chen et al 2020, Shah and Chichilnisky 2020) and the development of sophisticated AI systems. However, the challenge is less about dreaming up new computer vision algorithms and more about identifying the design principles and visual cues that are best suited to augment the visual scene in a way that supports behavioral performance for a potentially heterogeneous end-user demographic. For example, humans are able to flexibly adapt their visual navigation strategies depending on the visual cues that are available to them: in texture-rich environments they might use optic flow, but in texture-scarce environments they might rely on the perceived location of the goal, together with extraretinal information about their head and eye position (Turano et al 2005). Furthermore, these strategies change under central and peripheral vision loss (Turano et al 2001). How do we know which visual navigation cues are best suited for visual prosthesis patients?
Another concern is that the vision tests typically used in clinics and psychophysics laboratories (e.g. perimetry, acuity, contrast sensitivity, orientation discrimination) are not designed to test the ability of prosthetic devices to restore vision (Peli 2020). The main reason for this is the nature of the multialternative forced choice paradigm that is typically used to administer these tests. As such, they may not measure what the researchers intended, either because nuisance variables may provide spurious cues that can be learned in repeated training or because the tests can be passed without form vision (Peli 2020). Consequently, superior performance on these tests does not necessarily imply sight restoration.

The proposed solution
To address these challenges, we propose a patient-centered approach to incorporating AI-powered visual augmentations into the next generation of implantable technology.
Most prosthesis designs share a common set of components: a camera to capture images, generally mounted on glasses; a video processing unit (VPU) that transforms the visual scene into patterns of electrical stimulation and transmits this information through a radio-frequency link to the implanted device; and an electrode array implanted somewhere along the visual pathway.
The conventional approach to stimulus encoding, as implemented by previously commercialized devices such as Alpha-AMS and Argus II, is typically very simple, assuming a linear relationship between the gray level of a pixel in the captured image and the stimulating amplitude (figure 2(A)). Several studies have already proposed more sophisticated stimulus encoding strategies to recreate a desired neural activity pattern over a given temporal window (Shah et …) … the perceptual consequences of the resulting neural activity. We thus suggest an iterative workflow that begins and ends with the patient (see figure 2(B)). In line with research practices in the human-computer interaction (HCI) community, the first step is to identify the information needs of the end user through a series of qualitative and quantitative studies. This may involve low vision users navigating a virtual environment to (e.g.) avoid obstacles or reach a goal location. Their struggles and challenges may then inform the visual cues that are required to perform the task (Hoogsteen et al 2022), which may lead to task-specific visual augmentation strategies. These strategies can be refined using qualitative feedback from the end user (i.e. in which way do they prefer the information to be presented?) as well as their behavioral performance (i.e. which strategies are most effective?). Finally, strategies that perform well in the training environment can be tested on real prosthesis patients. Below we expand on these ideas.
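The conventional linear encoding mentioned above can be illustrated with a minimal sketch. It is not the implementation of any commercial device: the function name, the block-averaging step, and the `max_amp_ua` ceiling are assumptions made purely for illustration. The image is assumed to be a list of rows of gray levels in 0-255, at least as large as the electrode grid.

```python
def encode_linear(image, grid_rows, grid_cols, max_amp_ua=100.0):
    """Map a grayscale image to one stimulation amplitude per electrode.

    Each electrode is treated as a single 'pixel': the image is split into
    grid_rows x grid_cols blocks, each block is averaged, and the mean gray
    level is scaled linearly to [0, max_amp_ua] (hypothetical ceiling).
    """
    height, width = len(image), len(image[0])
    amplitudes = []
    for r in range(grid_rows):
        # Row range of the image block covered by electrode row r
        r0, r1 = r * height // grid_rows, (r + 1) * height // grid_rows
        row_amps = []
        for c in range(grid_cols):
            c0, c1 = c * width // grid_cols, (c + 1) * width // grid_cols
            block = [image[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            mean_gray = sum(block) / len(block)
            # Linear relationship between gray level and amplitude
            row_amps.append(max_amp_ua * mean_gray / 255.0)
        amplitudes.append(row_amps)
    return amplitudes


# Usage: a 4 x 4 image, dark on the left and bright on the right, 2 x 2 electrodes
image = [[0, 0, 255, 255] for _ in range(4)]
amps = encode_linear(image, grid_rows=2, grid_cols=2)
```

A mapping of this kind ignores how the resulting stimulation is actually perceived, which is the gap the iterative, patient-centered workflow is meant to address.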

Patient-centered design
As pointed out by Htike et al (2020) and Erickson-Davis and Korzybska (2021), the majority of research on visual prostheses (and more generally: low vision aids) has focused on the technical aspects rather than the usability of these devices. One promising development has been the Functional Low-Vision Observer Rated Assessment (FLORA), a tool to provide a subjective assessment to capture the functional visual ability and well-being of visual prosthesis patients (Geruschat et al 2015). While it is encouraging to see increasing adoption of FLORA by the community (Geruschat et al 2016, Karapanos et al 2021), in practice it is often employed as an external validation tool that constitutes the very last step of the design process (a proof of concept, so to speak). However, if the proof of concept fails, researchers must start over and try again until they have found a better way to improve FLORA performance of their subjects. This is in stark contrast to research practices of the HCI community, which typically aims to incorporate end users in the decision making and development during every step of the design process (Rubin and Chisnell 2011, Lee et al 2017). In particular, patient-centered design (PCD) is a methodology that aims to make systems usable and useful by first and foremost focusing on the needs and requirements of the patient (Reis et al 2011, Light 2019). Using a combination of clinical and technical tests, feedback and questionnaires, PCD can inform what potential end users may want out of a visual prosthesis, where and how they would use it, and what features they would consider essential. These tests may be conducted during each stage of the design process to ensure that development proceeds with the user as the center of focus (Rubin and Chisnell 2011).
While this feedback may not be the solution to all problems related to the optimal encoding of visual information, it may represent an important first step towards developing more usable prosthetic devices that may complement existing lines of research that focus on prototyping with animal models or simulation systems. In a recent systematic review (Kasowski et al 2022), we showed that although there is no shortage of publications that demonstrate a proof-of-concept augmentation strategy, less emphasis has been placed on understanding the usability of their proposed technology. Involving appropriate end users in all stages of the design process may ultimately improve the effectiveness and accessibility of the technology as well as user satisfaction (Schicktanz et al 2015).

Virtual prototyping
Due to the unique requirements of working with bionic eye recipients (e.g. constant assistance, increased setup time, travel cost), experimentation with new stimulation strategies remains time-consuming and expensive.
In the interim, a more cost-effective and increasingly popular alternative might be to rely on an immersive virtual reality (VR) prototype based on simulated prosthetic vision (SPV) (Zapf et al 2014, Thorn et al 2020, Sanchez-Garcia et al 2020a, Kasowski and Beyeler 2022). Here, the classical method relies on sighted subjects wearing a VR head-mounted display (HMD), who are then deprived of natural viewing and only perceive phosphenes displayed in the HMD. This allows sighted participants to 'see' through the eyes of the bionic eye user, taking into account their head and/or eye movements as they explore a virtual environment. The visual scene can then be manipulated according to any desired image processing or visual augmentation strategy (Han et al 2021).
In order for simulation results to translate to real prosthesis patients, simulations should rely on psychophysically validated phosphene models and employ a restricted field of view that necessitates head scanning (Kasowski and Beyeler 2022). In addition, sighted participants in SPV studies are often sampled from the university's undergraduate population (for practical reasons). Their age, navigational affordances, and experience with low vision may therefore be drastically different from real bionic eye users, who tend to not only be older and prolific cane users but also receive extensive vision rehabilitation training. For instance, Williams et al (2014) compared sighted and blind navigation and found that both groups understand navigation differently, leading sighted people to struggle in guiding blind companions. Furthermore, blind people use a combination of devices and technology to complement their existing orientation and mobility skills (Williams et al 2014), which may lead to a wide variety of navigation styles (Ahmetovic et al 2019, Htike et al 2020). An important step towards designing more usable visual prosthetics may thus be to recruit age-appropriate participants for SPV studies. If done right, the use of a VR prototype may drastically speed up the development process by testing theoretical predictions in high-throughput experiments, the best of which can be validated and improved upon in an iterative process with the bionic eye recipient in the loop (Kasowski et al 2021).
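To make the notion of SPV concrete, the sketch below renders a frame under the simplest possible assumption: every electrode produces a small, isolated, independent Gaussian blob. This is exactly the kind of oversimplified phosphene model that the text cautions against relying on; a psychophysically validated model would replace the Gaussian term. All names and parameters (`render_phosphenes`, `spacing_px`, `sigma_px`) are hypothetical.

```python
import math


def render_phosphenes(amplitudes, spacing_px=8, sigma_px=2.0):
    """Render a grid of electrode amplitudes as a brightness frame.

    Simplifying assumption: each electrode yields an independent Gaussian
    phosphene centered on its own grid cell, and phosphenes add linearly.
    """
    rows, cols = len(amplitudes), len(amplitudes[0])
    height, width = rows * spacing_px, cols * spacing_px
    frame = [[0.0] * width for _ in range(height)]
    for r in range(rows):
        for c in range(cols):
            # Phosphene center in pixel coordinates (middle of the grid cell)
            cy = (r + 0.5) * spacing_px
            cx = (c + 0.5) * spacing_px
            for y in range(height):
                for x in range(width):
                    d2 = (y + 0.5 - cy) ** 2 + (x + 0.5 - cx) ** 2
                    frame[y][x] += amplitudes[r][c] * math.exp(
                        -d2 / (2.0 * sigma_px ** 2)
                    )
    return frame


# Usage: two electrodes side by side, only the left one active
frame = render_phosphenes([[1.0, 0.0]])
```

In a VR prototype, a frame like this would be recomputed as the head or eyes move and shown in the HMD in place of the natural scene.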

Visual augmentations to support real-world tasks
Most visual prostheses are equipped with an external VPU capable of applying simple image processing techniques to the video feed in real time. For instance, edge detection and contrast maximization are already routinely used in current devices. In the near future, these techniques may include deep learning-based algorithms aimed at improving a patient's scene understanding.
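As a rough illustration of the two classical techniques named above, the following sketch implements min-max contrast stretching and a Sobel edge detector on a small grayscale image. It is a generic textbook version written for readability, not the processing pipeline of any particular VPU; the function names are assumptions.

```python
import math


def stretch_contrast(image):
    """Rescale gray levels linearly so they span the full range [0, 1]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [[0.0 for _ in row] for row in image]
    return [[(v - lo) / (hi - lo) for v in row] for row in image]


def sobel_edges(image):
    """Return the Sobel gradient magnitude; border pixels are left at 0."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column
            gx = (image[y - 1][x + 1] + 2 * image[y][x + 1] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y][x - 1] - image[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row
            gy = (image[y + 1][x - 1] + 2 * image[y + 1][x] + image[y + 1][x + 1]
                  - image[y - 1][x - 1] - 2 * image[y - 1][x] - image[y - 1][x + 1])
            out[y][x] = math.hypot(gx, gy)
    return out


# Usage: a vertical step edge between columns 1 and 2
image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_edges(stretch_contrast(image))
```

The resulting edge map could then be passed to whatever stimulus encoding the device uses.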
Based on this premise, researchers have developed various image optimization strategies, and assessed their performance by having sighted observers conduct daily visual tasks under SPV (Dagnelie et al 2007, Al-Atabany et al 2010, McCarthy et al 2014, Li et al 2018). Simulation allows a wide range of computer vision systems to be developed and tested without requiring implanted devices. SPV studies suggest that one benefit of image processing may be to provide an importance mapping that can aid scene understanding; that is, to enhance certain image features or regions of interest, at the expense of discarding less important or distracting information (Boyle et …) … A so-far unexplored application domain concerns the use of visual question answering (VQA) to help a user retrieve misplaced items or orient themselves in their environment (figure 3(D)). VQA models (e.g. Antol et al 2015) are able to give a visual answer to a verbal question; for example, in response to the question 'How many giraffes are drinking water?' and a given image, the network would respond by drawing bounding boxes around all the giraffes drinking from the water hole (but not the other ones, even if they are standing by the water hole). In the context of the Smart Bionic Eye, VQA models would allow a user to ask questions such as 'Where did I put my keys again?', and the system would respond by segmenting the keys in the prosthetic image while the user is looking around the room (see also figure 1).
Other concrete examples to support practical tasks might include (a) an outdoor navigation mode, where we may need to test the utility of highlighting nearby obstacles, highlighting the goal location, or outlining structural edges to let a user orient themselves in the environment, and (b) a conversation mode, where we may need to test the utility of highlighting different facial features to allow for face discrimination, highlighting the person that is currently speaking to determine whether they are addressing the user or someone else, or notifying the user of people entering or leaving the room. Importantly, these ideas should constitute only the beginning of a conversation with potential end users, such that the proposed solution can be iteratively refined based on both qualitative feedback from real patients and quantitative measures from virtual patients with the VR prototype.
It is easy to see how the above deep learning techniques could become an integral part of the Smart Bionic Eye once they reach a certain maturity that allows them to be used in unstructured environments. In the future, these visual augmentations could be combined with GPS to give directions, warn users of impending dangers in their immediate surroundings, or even extend the range of 'visible' light with the use of an infrared sensor (Sadeghi et al 2021). Once the quality of the generated artificial vision experience reaches a certain threshold, there are a lot of exciting avenues to pursue.

Challenges & limitations
Despite its potential, development of a Smart Bionic Eye faces a number of challenges and limitations, which we briefly address below.

Risks & benefits
At the core of the question about whether to develop and implant a Smart Bionic Eye lies a risk/benefit assessment. Indeed, the AI-powered algorithms outlined above could also be used as input to other low-vision devices such as smart glasses and sensory substitution devices, which do not necessitate risky and invasive surgery. Future patients thinking about whether to implant should therefore not only consider device safety and efficacy data in their decision, but should also be informed about less invasive alternatives that may deliver similar benefits.
That being said, one advantage that a Smart Bionic Eye could offer over nonvisual alternatives is a combination of both a conventional 'natural vision' mode next to a number of 'artificial vision' modes designed to support everyday tasks. Such a device (though invasive and expensive) might thus be superior to other accessibility aids such as smartphone apps and sensory substitution devices, because it could directly tap into the visual cortex of a blind user to make them see. On the other hand, one might also consider a next-generation device to combine the benefits of prosthetic vision with other sensory augmentations (Kvansakul et al 2020).

Neural code of vision
A major outstanding challenge is translating electrode stimulation into a code that the brain can understand. Interactions between the device electronics and the underlying neurophysiology can lead to perceptual distortions that severely limit the quality of the generated visual experience (Fine and Boynton 2015, Erickson-Davis and Korzybska 2021). One possibility is thus that we must first address fundamental questions about the neural code of vision (Abbasi and Rizzo 2021) and (the lack of) cortical plasticity in adult visual cortex (Beyeler et al 2017), before we can explore AI-based visual augmentations.
However, since the goal is not primarily to create natural vision, it suffices that phosphene characteristics are distinct and stable over time, which is the case for current implants (Luo et al 2016, Fernández et al 2021). In addition, there often exists a numeric or symbolic forward model, constrained by empirical data, that can predict a neuronal or ideally perceptual response to the applied stimulus (Bosking et al 2017). To find the stimulus that will elicit a desired response, one essentially needs to find the inverse of the forward model, which can be achieved in a number of ways (Spencer et al 2019, Fauvel and Chalk 2022, Granley et al 2022).
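One of the simplest ways to invert a forward model is numerical optimization. The sketch below assumes a toy linear forward model in which each percept value is a weighted sum of electrode amplitudes (the off-diagonal weights stand in for crosstalk between electrodes) and recovers the stimulus by projected gradient descent. This is an illustrative assumption only: the forward models cited above are generally nonlinear, and none of the cited studies is claimed to use this exact procedure. The names `forward` and `invert`, the step size, and the iteration count are hypothetical.

```python
def forward(weights, stim):
    """Toy forward model: predicted response = weights @ stim."""
    return [sum(w * s for w, s in zip(row, stim)) for row in weights]


def invert(weights, target, steps=2000, lr=0.05):
    """Find non-negative amplitudes whose predicted response matches target.

    Minimizes 0.5 * ||forward(stim) - target||^2 by gradient descent,
    clipping amplitudes at zero after every step.
    """
    n_out, n_in = len(weights), len(weights[0])
    stim = [0.0] * n_in
    for _ in range(steps):
        pred = forward(weights, stim)
        err = [p - t for p, t in zip(pred, target)]
        # Gradient of the squared error w.r.t. stim is weights^T @ err
        grad = [sum(weights[i][j] * err[i] for i in range(n_out))
                for j in range(n_in)]
        stim = [max(0.0, s - lr * g) for s, g in zip(stim, grad)]
    return stim


# Usage: two electrodes with 50% crosstalk; recover a known stimulus
weights = [[1.0, 0.5], [0.5, 1.0]]
target = forward(weights, [1.0, 0.2])
stim = invert(weights, target)
```

With a nonlinear, differentiable forward model the same loop applies in principle, with the analytic gradient replaced by automatic differentiation or a learned inverse network.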

Robustness & safety
It can be downright dangerous to allow computer vision algorithms to operate in the real world without people in the loop. These AI systems can make serious mistakes that no sane human would make (Hole and Ahmad 2021). For example, it is possible to make subtle changes to images and objects that fool vision-based AI systems into misclassifying objects. This can have grave consequences if the system is relied upon to warn of impending dangers, such as an approaching car, where a false negative could be fatal.
However, this issue is not unique to the Smart Bionic Eye, but affects applications ranging from self-driving cars to remote sensing and medical imaging. While more work is needed to improve the robustness of vision-based AI systems in real-world scenarios, potential solutions may range from techniques to improve model performance under naturally induced image corruptions and alterations (Drenkow et al 2021) to human-machine partnership (Patel et al 2019, Fauvel and Chalk 2022).

Engineering
Even if the stimulus encoding problem and safety issues are solved, there remains the question of how to fit a sophisticated AI system on a low-power, portable 'edge device' such as a VPU.
Although still an active field of research, a potential solution may take the form of a serverless cloud service (Zhang et al 2021), as is currently being developed for Internet of Things solutions, or deep learning-specific neuromorphic hardware, such as Intel's Neural Compute Stick. While the latter has the potential to dramatically improve the latency, robustness, and power consumption compared to traditional computers, new computer vision algorithms are needed to process the unconventional output of neuromorphic sensors to unlock their potential (Gallego et al 2022, Sanchez-Garcia et al 2022). In addition, since people who are blind tend to spend a lot of time indoors (Jeamwatthanachai et al 2019), it is not outlandish to assume that a Smart Bionic Eye could be shipped with a central desktop computer that would handle most of the computationally expensive processing while communicating wirelessly with the external glasses of the implant.

Conclusion
In this letter, we propose to complement existing lines of bionic vision research with a patient-centered approach that considers the possibility of a visual prosthesis to function as an AI-powered visual aid. This Smart Bionic Eye would harness recent developments in deep learning-based computer vision and AI to provide useful visual augmentations for everyday tasks.
To enable such a technology, we first need to address fundamental questions at the intersection of neuroscience, engineering, and HCI to better understand how visual prostheses interact with the human visual system to shape perception (Beyeler et al 2017, Abbasi and Rizzo 2021) and to identify visual augmentation strategies that best support specific real-world tasks (Han et al 2021). This advance in technology could improve the ability of a visual prosthesis to support everyday tasks and lead to a successful next-generation neuroprosthetic device.

Data availability statement
No new data were created or analysed in this study.