Information-reduction methods determine
navigation performance in simulated
prosthetic vision in virtual reality
Master Thesis
June 30, 2023
R. E. Lucas
Student Number: 6540384
Supervisor: P. C. Klink
2nd Supervisor: M. Naber
Master Artificial Intelligence
Faculty of Science
Utrecht University
Netherlands
Abstract
While there are numerous innovations that help blind and partially sighted people improve
their quality of life, some tasks remain difficult. Neurotechnological innovations can restore
a rudimentary form of artificial vision (AV) with brain implants that stimulate the brain
and create the perception of 'phosphenes'. These are dots of light with spatial locations that
correspond to the locations of the stimulating electrodes in the brain. In this paper, we
investigate how different information-reduction methods affect the usability of AV for navigation by
simulating different types of AV in a virtual reality environment. The main problem with currently
existing implants is their limited resolution. We therefore propose a "walkable path" implementation
that guides navigating users to avoid obstacles and stay on the sidewalk. With this path,
we extract the information relevant for navigation from the visual scene. Through a simulated
prosthetic vision (SPV) study we compared this walkable-path approach with two other
methods: a semantic segmentation algorithm and a combination of semantic segmentation and
the walkable path. Different phosphene densities were compared for each method. We found
that adding a guidance path to the visual scene improves navigation performance. Performance
did not necessarily improve when more phosphenes were added. Subjective evaluations showed
that people preferred having only a path over having both a path and the environment, or
only the environment. These results are a step towards functionally adaptable
prosthetic vision systems.
1 Introduction
Worldwide there are over 253 million blind and partially sighted (BPS) people ("World Blind Union",
n.d.). While in the past this indicated a life of poverty and few opportunities, this changed
due to innovations and societal support for BPS people. First came the invention of Braille, which made
reading possible for BPS people, and then the use of guidance animals and probing canes, which
allowed them to travel on their own. Furthermore, research concerning assistive technologies
has increased over the past years (Bhowmick & Hazarika, 2017). Technological advances have
been made through the creation of VoiceOver and apps like Seeing AI, which allow BPS people
to interact with the world through their phone. These technologies also allow children to use
educational materials and interact more with classmates, which helps their general development
(Mulloy et al., 2014).
Nonetheless, while these assistive devices increase support and quality of life (Lancioni & Singh,
2014), they do not improve vision in the literal sense. There are still many struggles that BPS people
face in their daily life. For example, smartphone use still has its challenges due to mobile touch-screen
interfaces and small keyboards (Rodrigues et al., 2020). Navigation and mobility can also
be difficult because BPS people cannot rely on vision to guide them and must use other senses
to determine where obstacles are and which road they should take (Giudice, 2018; Kemp, 1981).
Besides practical problems, BPS people also have a higher risk of depression and other social
problems (Kemp, 1981; Koenes & Karshmer, 2000). The improvement of actual vision could help
BPS people overcome some practical struggles and thereby improve their
mental state.
That is why there is an increasing interest in the development of Artificial Vision (AV) (Bertozzi
et al., 2002; Fernandes et al., 2012; Humayun & de Juan, 1998; Soria et al., 2006). The applications
of AV are diverse; it can be applied in the development of robots or road vehicles (Bertozzi et al.,
2002; Soria et al., 2006), but more importantly, AV can also be used to help BPS people regain
sight by implementing it in neurotechnological visual prosthetic devices (Fernandes et al.,
2012).
In this paper, we will focus on the use of AV to restore a rudimentary form of sight in BPS
people. We will mainly focus on the application of AV for navigation and investigate the efficiency
of several algorithms and AV-parameters.
1.1 Theoretical Background
The application of AV in BPS people is typically achieved through electrical stimulation. Hu-
mayun and de Juan (1998) found that the electrical stimulation of the visual cortex resulted in
the appearance of blobs of light, which are called phosphenes. These phosphenes could be used
to compensate for the lack of photoreceptors in patients (Humayun & de Juan, 1998). With the
discovery of these phosphenes, visual prostheses could be developed. Visual prostheses are devices
that can evoke a visual percept through different stimulation methods, such as the use of electrical
stimulation (Fernandes et al., 2012) or optogenetics (Barrett et al., 2014). There are currently
roughly two different types of visual prostheses: retinal prostheses and cortical prostheses. Retinal
prostheses evoke percepts by stimulating the retinal neurons to compensate for damaged photore-
ceptors (Pio-Lopez et al., 2021). Cortical prostheses, on the other hand, stimulate the visual cortex
to evoke visual percepts (Liu & Humayun, 2014). Humayun and de Juan (1998) discussed that when
these visual prostheses elicit meaningful visual percepts, they could help BPS people by restoring
some of their vision.
Since then, extensive research has been conducted for the development of useful visual prostheses
(Chen et al., 2009; De Ruyter Van Steveninck, Güçlü, et al., 2022; De Ruyter Van Steveninck, Van
Gestel, et al., 2022; Dobelle et al., 1974). In 1974, two participants were implanted with an early
visual prosthesis (Dobelle et al., 1974). Both participants were able to recognize simple patterns,
such as letters. In 2012, the recognition of letters improved further, and the prostheses could even
be used for mobility and orientation (Fernandes et al., 2012).
The improvement of mobility and orientation is an important aspect of the development of AV
and assistive technologies. This is because navigation is a difficult task for BPS people (Giudice, 2018;
Kemp, 1981). They often need the assistance of sighted people or a guidance dog (Bousbia-Salah
et al., 2007). There are often obstacles that need to be avoided, but that should be detected first, a
feat that strongly relies on vision in sighted people. When using a probing cane, the user can find
obstacles that are close by, but they cannot know what obstacles are further ahead (Bousbia-Salah
et al., 2007). However, with the use of phosphenes, obstacles further ahead could be
anticipated and avoided (De Ruyter Van Steveninck, Van Gestel, et al., 2022).
Nevertheless, the problem that often arises with this type of phosphene vision is that its
effectiveness depends on the number of phosphenes, which is determined by the number of implanted
electrodes. However, the space that can be stimulated with implanted electrodes in the human
primary visual cortex is very limited (van der Grinten et al., 2022). It is therefore important to
determine the desired number of phosphenes before implanting any electrodes in the brain of a BPS
person. This is because the placement of these visual prostheses can be very invasive, and in the past
some implementations were experienced as rushed and ill-prepared (Chen et al., 2009; Dobelle
et al., 1974; Fernandes et al., 2012). A solution for this problem is the use of simulated prosthetic
vision (SPV). Phosphene configurations can be tested noninvasively with SPV experiments that
are run with sighted participants. In such experiments, sighted people experience phosphene or
prosthetic vision through a simulation. This can for instance be done using a Virtual Reality (VR)
setup or by navigating on a computer through a virtual environment (Bollen et al., 2019; De Ruyter
Van Steveninck, Van Gestel, et al., 2022). These experiments can then be used to evaluate the
minimum requirements that are needed to restore a certain ability (Vergnieux et al., 2017). In gen-
eral, such simulations can thus help with the development of prostheses by determining the optimal
number, placement, and processing of phosphenes and electrodes without damaging a participant
(Chen et al., 2009).
While SPV is a commonly used study method, it also has some drawbacks. One of the problems
is that these simulations often lack realism and biological plausibility (van der Grinten et al., 2022).
For example, the interaction between phosphenes in SPV studies is not the same as in real prostheses
(Chen et al., 2009). Furthermore, the temporal dynamics between phosphenes need to reflect the
delay that can occur between the onset and offset of a phosphene (Dobelle et al., 1974). Namely,
the phosphene response and duration change depending on the stimulus presentation (Chen et
al., 2009). Due to these shortcomings, some current results of SPV studies cannot be applied to
prosthetic vision. This creates a problem for the development of visual prostheses. If we cannot
properly study a prosthesis without invasively stimulating a person’s brain, we might need to look
at other options to improve the life of BPS people.
The problems with AV and mainly SPV studies were also noted by van der Grinten et al. (2022).
As said before, SPV studies lack biological plausibility. Chen et al. (2009) noted that for SPV studies
to be relevant, it is important that simulated phosphenes resemble those evoked by a real
visual prosthesis. That is why van der Grinten et al. (2022) developed a biologically plausible
phosphene simulator. This simulator allows phosphene vision to take several characteristics of
the cortex into account, which results in phosphenes that should look very similar to those evoked
by electrical cortical stimulation. It works in real-time and therefore can be used in behavioural
experiments such as SPV studies (van der Grinten et al., 2022). Using this simulator in SPV studies
will make the results a much stronger prediction for real visual prostheses.
While this simulator helps to study prostheses, the limited number of phosphenes relative to the
rich visual scenes we typically perceive still causes a substantial proportion of information to be lost
(Sanchez-Garcia et al., 2020). This is a problem for more complex tasks, such as navigation, which
require more in-depth interpretation and precise visual cues (Sanchez-Garcia et al.,
2020). Simple environments can be expressed in phosphene simulations, but complex,
real-life environments cannot (De Ruyter Van Steveninck, Van Gestel, et al., 2022).
That is why helpful information-reduction methods need to be researched and used in (simulations
of) visual prostheses.
Different algorithms have been used and developed to extract information from a visual scene.
A commonly used algorithm is an edge-detection algorithm. This algorithm extracts visual gra-
dients from all areas of the visual scene (De Ruyter Van Steveninck, Van Gestel, et al., 2022).
The advantage of this method is its ability to exclude noise, which is useful in real-world images
(Truchetet et al., 2001). However, this method still retains a lot of information and requires more
phosphenes to capture the environment than a method that removes more of it.
An alternative is a surface-boundary algorithm (De Ruyter Van Steveninck, Güçlü, et al., 2022).
A strict version of this algorithm removes all within-surface information and background textures
from a visual scene. With this simplification, a trade-off is created between interpretability and
informativity (De Ruyter Van Steveninck, Van Gestel, et al., 2022). Another algorithm that can be
used to visualize a scene is semantic segmentation. Sanchez-Garcia et al. (2020) clustered informa-
tion based on semantic meaning. This way objects fall under a semantic category and correspond
to a certain meaning. In their study, they combined the use of instance segmentation with the use
of semantic segmentation. This was done to both highlight the useful areas and show the edges of objects
and environments. Using this algorithm improved the recognition of objects and rooms compared
to a no-processing method, which caused an overload of information (Sanchez-Garcia et al., 2020).
De Ruyter Van Steveninck, Van Gestel, et al. (2022) did not find any improvement in their study on
mobility performance when only using an information reduction algorithm. Sanchez-Garcia et al.
(2020), on the other hand, found that recognition performance improved when combining an infor-
mation reduction algorithm with the important aspects in the scene. This means that highlighting
useful parts of the environment and extracting only useful information can improve performance.
More studies have been conducted to test information reduction. Vergnieux et al. (2017) studied
navigation and mobility. They focused on wayfinding and compared different methods of information
display to see how well people could identify landmarks and remember the map of unknown
environments. They found that only rendering the edges of the environment was beneficial for
participants' performance, showing that minimal information is sufficient to find one's
way in an unknown environment. Fornos et al. (2005) studied the minimization of information in
reading. By minimizing the sections of a text that were shown, participants were better able to read
the text (Fornos et al., 2005).
Lastly, Bollen et al. (2019) used simulated phosphene vision for emotion recognition. They com-
pared an edge-detection algorithm to a simpler but still powerful image-processing algorithm.
By reducing the amount of information, they found that the accuracy of emotion recognition
saturated at a grid of 5k phosphenes. With the edge-detection algorithm, on the other hand,
saturation was only reached around 10k phosphenes. This illustrates that when only useful informa-
tion is used, in this case, the mouth, eyes, nose and facial contours, fewer phosphenes are necessary
to reach maximal performance.
1.2 This Paper
In this paper, we will further investigate the use of artificial vision in the improvement of
navigation for blind and partially sighted people. Since the most prominent current problem seems
to be the limited amount of information that can be represented with phosphene vision, we propose
a different method to process visual input, namely a "walkable path". The walkable path is a line,
constructed from phosphenes, that indicates where the user can walk
on the sidewalk without hitting obstacles. This makes optimal use of a low phosphene resolution,
while still enabling people to navigate through complex scenes.
To evaluate this method, we conducted an SPV experiment that tested performance on a navigation
task. We will compare three information-reduction methods for processing visual input to
evaluate the difference in performance and experience: 1) a semantic segmentation (SS) method,
2) a walkable path (WP) method, and 3) a combination of these two methods (SW).
We hypothesize that the walkable path improves navigation performance, and we thus expect
a higher performance in the methods where a walkable path is present (WP & SW) compared to
the condition where this path is absent (SS). This expectation mainly stems from the fact that
limited information can increase performance (Bollen et al., 2019; De Ruyter Van Steveninck, Van
Gestel, et al., 2022). Furthermore, we expect the subjective evaluation of phosphene vision to be
more positive in the conditions where a walkable path is present because the information is more
focused on the task. We expect a subjective preference for the SW method over the WP method,
since we think that having the environment present may feel more natural and provides a reference
for movement.
We will also compare different phosphene densities. For each method we compare the perfor-
mance across simulations with 64, 625, or 1000 phosphenes. We expect performance to improve
with higher phosphene numbers.
2 Methods
Before starting the data collection, an Ethics and Privacy Quick Scan was performed. This scan
by the Utrecht University Research Institute of Information and Computing Sciences classified this
research as low risk. No full ethics review or privacy assessment was required. This research was
executed according to the protocol of the Faculty of Social Sciences.
2.1 Participants
Data was collected from 20 participants: 10 males and 10 females aged 19-49 years (M = 25.1,
SD = 6.37). They participated on a voluntary basis and could withdraw at any moment during the
experiment. They were, if applicable, compensated with student credit. Overall, participants did
not have any visual impairments (contact lenses and glasses were allowed). One participant had
amblyopia but did not turn out to be an outlier. Nine participants had some experience with Virtual
Reality (VR) and twelve participants had none. Seven participants
described themselves as gamers. Two female participants were excluded from the study because
they could not complete the experiment due to extreme nausea in the VR environment.
2.2 Materials
2.2.1 VR
This experiment was conducted using a Virtual Reality (VR) setup. We used an HTC VIVE
PRO headset with a portable battery. The headset had a 1440x1600 resolution per eye, a 98°
horizontal field of view, and a refresh rate of 90 Hz. The headset had a wireless connection to two
base stations that were used to track the movement of the headset and controllers. The tracking
frequency was 1000 Hz. There were two controllers with multiple buttons. In this experiment we
used the trackpad on top of the controllers and the index trigger on the bottom of the controllers.
2.2.2 Unity
The experiment was built in Unity© using the Unity documentation for help (Technologies,
n.d.). The base of the environment was built by Art Equilibrium (2020) and represented the streets
of New York. This environment was adapted for this experiment by writing additional scripts for
an experimental pipeline and by adding some objects in the environment. The adaptations for the
working of the simulator and different shaders for this environment were done by De Ruyter van
Steveninck (2023).
Shaders To create the black-and-white edge contrast, an edge detection shader was used. This shader
was created by van der Grinten et al. (2022). It uses a Sobel filter to create the edges and does
not have a smoothing step. Figure 1b sub-panel (I) shows a representation of the world with only edge
detection. The other sub-panels show the world with each of the information-reduction methods
combined with edge detection.
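The shader itself runs on the GPU inside Unity; purely as an illustration, the following minimal
Python sketch applies the same operation, a Sobel filter without a smoothing step, to a grayscale
image. The threshold value is an assumption for illustration, not a parameter taken from the shader.

import numpy as np

def sobel_edges(image, threshold=0.3):
    """Sobel edge detection without a smoothing step.
    Expects a 2D grayscale image with values in [0, 1] and
    returns a binary edge map. The threshold is illustrative."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)  # horizontal gradient kernel
    ky = kx.T                                                         # vertical gradient kernel
    h, w = image.shape
    gx = np.zeros((h, w))
    gy = np.zeros((h, w))
    # Convolve interior pixels; the borders are left at zero for simplicity.
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            patch = image[i - 1:i + 2, j - 1:j + 2]
            gx[i, j] = np.sum(kx * patch)
            gy[i, j] = np.sum(ky * patch)
    magnitude = np.hypot(gx, gy)  # gradient magnitude
    return magnitude > threshold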
Semantic Segmentation The contrast between objects was created with semantic segmentation.
Each object was assigned to one of the following categories: plants, cars, houses, road, props, signs,
fences, walkable path, or default.
Each category contained objects with corresponding semantic meanings. Every category had a
different colour, and edges were created between objects of different categories. This is shown in
figure 1a sub-panel (II).
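Conceptually, these between-category edges can be derived from a per-pixel label map. The
following Python sketch is a simplification of the shader logic, with an assumed integer encoding
of the categories; it marks a pixel as an edge whenever its category differs from that of a neighbour.

import numpy as np

# The categories used in the experiment, with an assumed integer encoding by index.
CATEGORIES = ["plants", "cars", "houses", "road", "props",
              "signs", "fences", "walkable path", "default"]

def category_boundaries(labels):
    """Given a 2D array of per-pixel category indices, mark the pixels
    where the category changes between neighbouring pixels."""
    edges = np.zeros(labels.shape, dtype=bool)
    edges[:, 1:] |= labels[:, 1:] != labels[:, :-1]  # changes between horizontal neighbours
    edges[1:, :] |= labels[1:, :] != labels[:-1, :]  # changes between vertical neighbours
    return edges

With only two categories (the path versus everything else), the same logic yields the walkable-path
rendering described below.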
Figure 1: Visual representation of the information-reduction methods applied to the environment,
before and after applying edge detection.
(a) Before applying the edge detection algorithm: I) before any processing, II) the SS method,
III) the SW method, IV) the WP method.
(b) After applying the edge detection algorithm: I) only edge detection, II) the SS method,
III) the SW method, IV) the WP method.
Walkable Path Throughout the environment, a line was drawn that avoided obstacles and kept
participants on the sidewalk. By following this path, participants should be able to walk a perfect
route.
Under the hood, this worked the same as the semantic segmentation, but with only two categories
and thus two colours, so edges existed only between the path and the rest of the environment.
This is shown in figure 1a sub-panel (IV).
2.2.3 Phosphene Simulator
For the phosphene simulator, we used an implementation of the simulator by van der Grinten
et al. (2022). This is a biologically plausible simulator that ensures that the results of this study will
be informative for real cortical implants. Three different files were generated with either 64, 625
or 1000 phosphenes within a field of view of 50 degrees. The locations of these phosphenes were
determined in a uniform randomized manner using polar coordinates. This can be seen in figure 2.
The size of the receptive field of the phosphenes was set to 5, relative to the phosphene
size, which was recorded in dynamic visual acuity. Phosphenes became smaller and denser when
they were in the fovea and bigger once they spread out towards the peripheral field. Phosphenes
became activated when edges came within the field of view.
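As a rough illustration of how such a layout could be generated (this is not the implementation
of van der Grinten et al. (2022)), the Python sketch below samples eccentricity and angle uniformly
in polar coordinates, which concentrates phosphenes near the fovea, and lets phosphene size grow
with eccentricity. The size constants are assumed, illustrative values.

import numpy as np

def phosphene_layout(n, fov_deg=50.0, seed=0):
    """Sample n phosphene centres in polar coordinates. Uniform sampling
    of eccentricity and angle yields a higher density near the fovea;
    sizes grow with eccentricity (the constants are illustrative)."""
    rng = np.random.default_rng(seed)
    ecc = rng.uniform(0.0, fov_deg / 2.0, n)  # eccentricity in degrees
    angle = rng.uniform(0.0, 2 * np.pi, n)    # polar angle in radians
    x = ecc * np.cos(angle)                   # horizontal position in degrees
    y = ecc * np.sin(angle)                   # vertical position in degrees
    size = 0.5 + 0.2 * ecc                    # assumed linear growth with eccentricity
    return x, y, size

# The three densities used in the experiment would then be generated as
# phosphene_layout(64), phosphene_layout(625) and phosphene_layout(1000).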
2.2.4 Questionnaire
A questionnaire was used to evaluate the subjective experience of the participants. We measured
how easy they found the task, how mentally and physically challenging it was, whether they felt
stressed or comfortable, if they were aware of the environment and if they needed more guidance.
Participants responded on a 7-point Likert scale indicating how much they agreed with each statement:
1 indicated they strongly agreed, 4 indicated that they neither agreed nor disagreed, and 7 indicated
that they strongly disagreed.
The following statements were used to evaluate the subjective experience:
Figure 2: Phosphene densities within a 50° field of view.
Note. I) 64 phosphenes, II) 625 phosphenes, III) 1000 phosphenes.
1. The overall task I was assigned was easy to complete.
2. The navigation task was mentally challenging.
3. The navigation task was physically challenging.
4. I felt stressed while completing the navigation task.
5. I felt comfortable while navigating in the VR environment.
6. I was aware of my surroundings.
7. More guidance during the task would be beneficial for me.
8. The choices I made throughout the task were based on my understanding of the environment.
After collecting the responses, some questions were re-scaled so that an overall score could be
interpreted. Questions 2, 3, 4, 7, and 8 were reverse-scored by replacing the low numbers with the
corresponding high numbers. After that, an overall score could be calculated. This score represents
a positive or negative experience, where a higher score means a more negative experience and a
lower score a more positive experience.
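A minimal sketch of this scoring step, assuming each participant's responses are stored as a list
of eight integers in question order:

REVERSED_ITEMS = {2, 3, 4, 7, 8}  # questions that are reverse-scored

def overall_score(responses):
    """Reverse-score the flagged items on the 7-point scale (1 <-> 7,
    2 <-> 6, ...) and sum the answers; a higher total corresponds to a
    more negative experience."""
    total = 0
    for question, answer in enumerate(responses, start=1):
        if question in REVERSED_ITEMS:
            answer = 8 - answer  # map 1->7, 2->6, ..., 7->1
        total += answer
    return total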
2.3 Conditions & Task
All participants experienced all three information-reduction methods: semantic segmentation
(SS), walkable path (WP), and these two methods combined (SW). In the SS condition the participants
had to find their own way and deduce from the environment where they could and could not
walk. In the WP condition participants only had a path to guide them. This path showed them
where they could walk to stay on the sidewalk or crossroad without hitting obstacles. In the
SW condition participants had the path to guide them but could also interpret the environment in
their own way.
For each method three different phosphene densities were compared: 64, 625, or 1000
phosphenes. So, in total there were 9 different conditions that all participants experienced. Partici-
pants were divided into six groups. Each group experienced the information-reduction methods in a
different order, starting with either SS, WP, or SW. This counterbalancing was done to avoid recency
and learning effects. An overview of the different conditions
can be found in figure 3.
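A minimal sketch of this counterbalancing scheme, assuming the six groups simply correspond to
the six possible orderings of the three methods:

from itertools import permutations

METHODS = ["SS", "WP", "SW"]
GROUP_ORDERS = list(permutations(METHODS))  # all six possible method orders

def order_for_group(group_number):
    """Return the method order for a group numbered 1 to 6."""
    return GROUP_ORDERS[(group_number - 1) % len(GROUP_ORDERS)]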
Figure 3: Different phosphene densities for each information-reduction method
Note. The figures (A, B, C) on the first row contain 64 phosphenes. The figures (D, E, F) on the
second row contain 625 phosphenes. The figures (G, H, I) on the third row contain 1000 phosphenes.
The first column (A, D, G) presents the WP method, the second column (B, E, H) the SS method
and the third column (C, F, I) the SW Method. For panels F and I the path is accentuated with
a line.
In each trial, participants were asked to complete a navigation task. They either had to get
some bread at the bakery, post a letter, find a taxi, or throw away the trash as can be seen in figure
4. Each task had a different starting point at which participants heard which goal they had to find.
An audio voice guided them in the right direction. This voice told them either to go left or right,
or that the destination was on their left or right side. If there were no directions, participants
were instructed to walk straight ahead. When arriving at the finish, participants had to either
bump into or walk into it.
2.4 Procedure
After entering the lab, the participants were asked to sign an informed consent form and received
a brief instruction about the task. Then the VR headset was put on their head and adjusted to the
right settings.
First, participants had to type in their participant number and group number in the menu.
After that the participant was allowed to practice in the VR environment with guidance of the