5-HT2A antagonist MDL100907 showed inconsistent effect on decision-making during reward-guided behavior

We investigated whether the 5-HT2A antagonist MDL100907 affects decision-making during reward-guided behavior. Three rhesus monkeys received an intramuscular injection of MDL100907 and then performed a decision-making schedule task. In this task, the monkey chose one of two options with different reward values, each consisting of a combination of workload and amount of liquid reward. Error rate, reaction time, and choice probability between the two options did not change consistently across monkeys after MDL100907 administration. We speculate either that 5-HT2A is not the receptor subtype that affects reward-based decision-making, or that the wide distribution of 5-HT2A across the cortex leads to inconsistent effects on specific behavioral parameters among subjects.

*Correspondence to: Munetaka Shidara PhD, Faculty of Medicine, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Doctoral Program in Kansei, Behavioral and Brain Science, Graduate School of Comprehensive Human Sciences, University of Tsukuba, 1-1-1 Tennodai, Tsukuba, Ibaraki 305-8577, Japan, E-mail: mshidara@md.tsukuba.ac.jp


Introduction
The ascending serotonergic pathway originating from the dorsal raphe nucleus (DR) innervates the whole forebrain and regulates a broad range of cognitive functions. The relation between reward-guided behavior and serotonergic function has been extensively explored. Many studies have reported that brain serotonin levels affect the discounting of reward value, which is determined by the amount of reward and by the delay or workload required to obtain it [1][2][3][4][5]. The diversity of serotonin receptor subtypes and their anatomical distributions complicates the understanding of serotonergic roles. Among the 14 receptor subtypes, which ones are most involved in reward-guided behavior remains uncertain. Here we hypothesized that the 5-HT2A subtype might be involved in reward-guided behavior because this receptor is widely and densely expressed in the neocortex [6,7]. Converging evidence suggests that subdivisions of the frontal cortex are implicated in reward value calculation. Several parts of the frontal cortex are essential elements of one of the basal ganglia-thalamocortical loops, which is necessary for generating behavior in response to emotionally or motivationally salient stimuli [8]. Pharmacological manipulation of 5-HT2A may therefore alter reward-guided behavior. To test this hypothesis, we injected the selective 5-HT2A antagonist MDL100907 intramuscularly into rhesus monkeys 15 min before they began a decision-making schedule task, and examined its effects on choice and on reward schedule execution.

Materials and methods
We used three rhesus monkeys (monkey H: ~7.3 kg, monkey P: ~6.4 kg, monkey Y: ~11.4 kg). The experiments were approved by the Animal Care and Use Committee of the University of Tsukuba, and were conducted in strict accordance with the guidelines for the Care and Use of Laboratory Animals of the University of Tsukuba.
In the drug condition, the monkeys received an intramuscular injection of MDL100907 (0.002 mg/kg) in 1 ml of solution.
A PET study showed that approximately 50% of 5-HT2A receptors in the monkey brain are occupied at this dose [9]. In the control condition, 1 ml of vehicle was injected. The experiment lasted two weeks, and drug and control sessions were run on alternate days (week 1: Mon control, Tue drug, Wed control, Thu drug; week 2: Tue control, Wed drug, Thu control, Fri drug). The decision-making schedule task began 15 min after the injection. The task was composed of two parts: a decision-making part and a reward schedule part [10]. Briefly, the monkey sat in a primate chair with three touch-sensitive bars attached to the front panel: the center bar and the right and left choice bars. When the monkey touched the center bar, the decision-making part started. Two white rectangular cues were presented on a CRT display as the choice options. The length and brightness of each cue indicated the amount of workload (1, 2, 3, or 4 repeats of a simple visual discrimination) and reward (1, 2, 3, or 4 drops of water; 1 drop = 0.15 ml), respectively. From these 4 × 4 = 16 combinations of workload and reward, the computer randomly selected two cues and presented them to the monkey, giving $\binom{16}{2} = 120$ possible pairs. To make a decision, the monkey had to touch the choice bar on the same side as the chosen cue between 150 and 3000 ms after cue onset; any response outside this 150-3000 ms window was counted as an error. After a successful choice, the reward schedule part began: the monkey had to complete the chosen number of red-green color discriminations to earn the promised reward. If the monkey failed to respond within 1 s after the onset of the green color, the trial was scored as an error and repeated until it was completed successfully.
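The cue set and the response windows described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' task-control software: the (workload, reward) tuple encoding, the random left/right assignment, and the names `draw_trial` and `classify_latency` are hypothetical. The 150-3000 ms choice window comes from the text, and the 150-1000 ms window for the reward schedule part comes from the Table 4 error definitions. The "bar error" category is not modeled.

```python
import itertools
import random
from typing import Optional, Tuple

# Hypothetical encoding: a cue is a (workload, reward) tuple.
Cue = Tuple[int, int]

WORKLOADS = (1, 2, 3, 4)  # repeats of the red-green color discrimination
REWARDS = (1, 2, 3, 4)    # drops of water
DROP_ML = 0.15            # volume of one drop (ml)

CUES = list(itertools.product(WORKLOADS, REWARDS))  # 4 x 4 = 16 cues
PAIRS = list(itertools.combinations(CUES, 2))       # C(16, 2) = 120 unordered pairs


def draw_trial(rng: random.Random) -> Tuple[Cue, Cue]:
    """Draw two distinct cues for the left and right positions.

    Random side assignment is an assumption; the text only says that two
    cues were picked at random.
    """
    left, right = rng.sample(CUES, 2)
    return left, right


def classify_latency(rt_ms: Optional[float],
                     window: Tuple[float, float] = (150, 3000)) -> str:
    """Label a response by its latency (ms) from stimulus onset.

    (150, 3000) is the choice window of the decision-making part;
    (150, 1000) matches the error definitions of the reward schedule part.
    """
    lo, hi = window
    if rt_ms is None or rt_ms > hi:
        return "late error"
    if rt_ms < lo:
        return "early error"
    return "valid"


if __name__ == "__main__":
    rng = random.Random(0)
    left, right = draw_trial(rng)
    print("left cue:", left, "right cue:", right)
    print("water if the left cue is chosen (ml):", left[1] * DROP_ML)
    print(classify_latency(420), classify_latency(90), classify_latency(None))
```

Enumerating the pairs explicitly makes the count of 120 easy to check. The window is passed as a parameter so that the same function can be reused for both task parts.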
The R statistical programming language was used for all analyses. We identified the day on which the monkey earned the fewest reward drops. For every other day, we used only the behavioral data recorded before the cumulative number of earned drops reached that minimum. From these data, reaction time was measured as the time from visual target presentation to the onset of the behavioral response, and the error rate was calculated as the number of error trials divided by the total number of performed trials.
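The reward-matched data selection can be sketched as follows. This is a hedged illustration under several assumptions. Each day is represented as a flat list of trials with hypothetical fields (`rt_ms`, `error`, `drops`). The trial on which the running total reaches the minimum is kept, and later trials are dropped. Mean reaction time is taken over successful trials only. The text does not specify any of these details, and the authors worked in R rather than Python.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Trial:
    rt_ms: Optional[float]  # stimulus onset to response onset; None if no response
    error: bool             # True if the trial was scored as an error
    drops: int              # reward drops delivered on this trial (0 if none)


def truncate_to_least_reward(days: Dict[str, List[Trial]]) -> Dict[str, List[Trial]]:
    """Keep, for every day, only the trials performed before that day's running
    reward total reaches the smallest daily total across all days."""
    least = min(sum(t.drops for t in trials) for trials in days.values())
    kept: Dict[str, List[Trial]] = {}
    for day, trials in days.items():
        running, subset = 0, []
        for t in trials:
            if running >= least:  # the least-rewarded day's total has been reached
                break
            subset.append(t)
            running += t.drops
        kept[day] = subset
    return kept


def error_rate(trials: List[Trial]) -> float:
    """Number of error trials divided by all performed trials."""
    return sum(t.error for t in trials) / len(trials)


def mean_rt(trials: List[Trial]) -> float:
    """Mean reaction time over successful trials (an assumption)."""
    return mean(t.rt_ms for t in trials if not t.error and t.rt_ms is not None)


if __name__ == "__main__":
    days = {  # toy data, not from the study
        "day1": [Trial(300, False, 2), Trial(None, True, 0), Trial(400, False, 3), Trial(500, False, 1)],
        "day2": [Trial(350, False, 4), Trial(450, False, 4), Trial(380, False, 4)],
    }
    kept = truncate_to_least_reward(days)
    for day, trials in kept.items():
        print(day, len(trials), error_rate(trials), mean_rt(trials))
```

In this toy example, day 1 earns 6 drops and sets the minimum. Day 2 is therefore cut after its second trial, at which point its running total (8 drops) has passed 6.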
We calculated the percentage of choices of each option across all 120 pairs of reward schedules. To analyze choice behavior, we used the same model as Setogawa et al. (2019); that is, the subjective reward value of a choice target is discounted by workload, which in our task comprises both physical effort and delay. Our previous report [11] showed that the subjective reward value, $v$, is discounted exponentially:

$$v = r\,e^{-kD} \tag{1}$$

where $r$ is the number of reward drops, $k$ is the discount rate, and $D$ is the number of visual discriminations. The chosen side is determined from the difference in subjective value between the two options through the following sigmoid function:

$$p = \frac{1}{1 + e^{-a\,(v_{\mathrm{left}} - v_{\mathrm{right}})}} \tag{2}$$

where $v_{\mathrm{left}}$ and $v_{\mathrm{right}}$ are the values of the left and right targets, respectively, $a$ determines the steepness, and $p$ is the probability of choosing the left target. Using the optim function in R, we estimated the discount rate $k$ from the four control-condition days. Treating this $k$ as the monkey's innate discount rate, we tested whether the choice probability differed between the drug and control conditions by logistic regression, using a generalized linear model (GLM) with a binomial error distribution and logit link:

$$\operatorname{logit} P(y = 1) = \beta_0 + \beta_1 x_{\mathrm{value}} + \beta_2 x_{\mathrm{condition}} \tag{3}$$

The dependent variable $y$ is binary (0/1), coding a left or right choice. The predictor $x_{\mathrm{value}}$ is the difference in value between the left and right cues, $x_{\mathrm{condition}}$ codes the vehicle or drug condition, and $\beta_0$, $\beta_1$, and $\beta_2$ are regression coefficients.
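The choice model above (exponential discounting $v = r\,e^{-kD}$ followed by a sigmoid of the left-right value difference) can be fitted to choice data as sketched below. Several parts of this sketch are assumptions. The authors used R's optim, whereas this sketch swaps in a plain maximum-likelihood grid search so that it needs only the Python standard library. The objective (Bernoulli negative log-likelihood), the sign convention (p is the probability of choosing left), the parameter grids, and the simulated data are not taken from the study. The function names are hypothetical.

```python
import math
import random
from itertools import product
from typing import List, Sequence, Tuple

Cue = Tuple[int, int]  # hypothetical encoding: (D, r) = (discriminations, reward drops)
CUES: List[Cue] = list(product((1, 2, 3, 4), repeat=2))


def value(cue: Cue, k: float) -> float:
    """Exponentially discounted value: v = r * exp(-k * D)."""
    D, r = cue
    return r * math.exp(-k * D)


def p_left(left: Cue, right: Cue, k: float, a: float) -> float:
    """Sigmoid choice rule: probability of choosing the left cue."""
    return 1.0 / (1.0 + math.exp(-a * (value(left, k) - value(right, k))))


def neg_log_likelihood(k: float, a: float, trials) -> float:
    """trials: list of (left_cue, right_cue, chose_left)."""
    nll = 0.0
    for left, right, chose_left in trials:
        p = min(max(p_left(left, right, k, a), 1e-12), 1.0 - 1e-12)  # avoid log(0)
        nll -= math.log(p if chose_left else 1.0 - p)
    return nll


def fit_grid(trials, ks: Sequence[float], as_: Sequence[float]) -> Tuple[float, float]:
    """Maximum-likelihood (k, a) by exhaustive search over the two grids."""
    best = min((neg_log_likelihood(k, a, trials), k, a) for k in ks for a in as_)
    return best[1], best[2]


def simulate(n: int, k: float, a: float, seed: int = 0):
    """Synthetic choices generated from the model itself; no real data."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n):
        left, right = rng.sample(CUES, 2)
        trials.append((left, right, rng.random() < p_left(left, right, k, a)))
    return trials


K_GRID = [0.05 * i for i in range(21)]    # 0.00 ... 1.00
A_GRID = [0.5 * i for i in range(1, 13)]  # 0.5 ... 6.0

if __name__ == "__main__":
    data = simulate(1500, k=0.3, a=2.0)
    print("estimated (k, a):", fit_grid(data, K_GRID, A_GRID))
```

A grid search is slow but transparent and cannot stall in a local minimum on a coarse two-parameter grid. A gradient-based optimizer such as the one behind R's optim would give finer estimates.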

Results
First, we examined behavior during the decision-making part. In the drug condition, the numbers of successful vs. failed choices were 482 vs. 88, 1066 vs. 65, and 240 vs. 24 for monkeys Y, H, and P, respectively. In the control condition, the corresponding numbers were 468 vs. 19, 1059 vs. 51, and 245 vs. 9 (Figure 1 and Table 1). The overall error rate was calculated from these numbers. Although the overall error rate was significantly higher in the drug condition in two monkeys (χ² test with FDR correction; χ² = 37.2, p < 0.001 in monkey Y; χ² = 5.78, p = 0.024 in monkey P), it was not in the remaining monkey (monkey H; χ² = 1.29, p = 0.26). Nor did we find consistent changes in the rate of any error subtype (early, bar, or late error; Table 1). For reaction time, the movement required to choose the left cue differed from that required to choose the right cue. We therefore sorted decision-making reaction times by chosen side and compared them between the two conditions. Again, the results were not consistent across the three monkeys (Table 2).
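The error-rate comparisons above can be recomputed from the reported counts. The exact R calls are not stated, so this sketch makes two assumptions: a χ² test of independence on each drug-vs-control 2 × 2 table with Yates' continuity correction (the default of R's chisq.test for 2 × 2 tables), and Benjamini-Hochberg FDR correction across the three monkeys (R's p.adjust with method "fdr"). Under these assumptions the sketch gives χ² ≈ 37.2, 1.29, and 5.78 for monkeys Y, H, and P, and an adjusted p ≈ 0.024 for monkey P, which agrees with the values reported above. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def chisq_2x2_yates(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Chi-square test of independence for the 2 x 2 table [[a, b], [c, d]]
    with Yates' continuity correction; returns (statistic, p value), df = 1."""
    n = a + b + c + d
    num = n * max(0.0, abs(a * d - b * c) - n / 2) ** 2
    den = (a + b) * (c + d) * (a + c) * (b + d)
    stat = num / den
    return stat, math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square with 1 df


def p_adjust_bh(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg (FDR) adjusted p values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running = min(running, pvals[i] * m / rank)
        adjusted[i] = running
    return adjusted


# Decision-making part: (drug success, drug failure, control success, control failure)
COUNTS: Dict[str, Tuple[int, int, int, int]] = {
    "Y": (482, 88, 468, 19),
    "H": (1066, 65, 1059, 51),
    "P": (240, 24, 245, 9),
}

if __name__ == "__main__":
    monkeys = list(COUNTS)
    results = [chisq_2x2_yates(*COUNTS[m]) for m in monkeys]
    adjusted = p_adjust_bh([p for _, p in results])
    for m, (stat, _), q in zip(monkeys, results, adjusted):
        print(f"monkey {m}: chi2 = {stat:.2f}, FDR-adjusted p = {q:.3g}")
```

For a 2 × 2 table, the shortcut formula n(|ad − bc| − n/2)² / (row and column totals) is algebraically equal to summing (|O − E| − 0.5)² / E over the four cells. With one degree of freedom, the χ² upper tail reduces to erfc(√(χ²/2)).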
Although the discount rate appeared to increase in the drug condition in all monkeys (left columns of Table 3), this did not translate into a consistent change in choice probability. In the decision-making part, only monkey Y showed a significant change in choice probability in the drug condition; the other two monkeys did not (right columns of Table 3). Thus, the results were inconsistent across monkeys.
Next, we examined the behavioral parameters during the reward schedule part. In the drug condition, the numbers of successful vs. failed visual discrimination trials used in the analyses were 958 vs. 31, 2149 vs. 105, and 471 vs. 131 for monkeys Y, H, and P, respectively. In the control condition, those numbers were 958 vs. 31, 2123 vs. 148, and 476 vs. 101, respectively. The overall error rate and the error rate of each error type are shown in Figure 2 and Table 4. Two monkeys showed significant differences in overall error rate between the drug and control conditions. However, one of them showed a higher error rate in the control condition (monkey H; 0.0468 vs. 0.065 in the drug and control conditions, respectively; χ² test, χ² = 6.86, p = 0.013, FDR-corrected), whereas the other showed a higher error rate in the drug condition (monkey Y; 0.15 vs. 0.031; χ² = 85, p < 0.001, FDR-corrected). The remaining monkey did not show a significant difference (monkey P; 0.22 vs. 0.18; χ² = 3.1, p = 0.08, FDR-corrected). Taken together with the changes in the error rate of each error subtype (Table 4), we conclude that the drug did not produce consistent changes across the three monkeys.

Figure 1. The horizontal axis shows the drug and control conditions (labeled "ant." and "cont.") for each monkey. Error types are indicated by different gray levels. An "early error" was defined as a bar touch less than 150 ms after cue onset, a "bar error" as two or more bar touches and/or touches to two or more bars, and a "late error" as no touch to a choice bar within 3000 ms after cue onset. Asterisks indicate significance (**p < 0.01, *p < 0.05); all p values were FDR-corrected.
For reaction time in the reward schedule part, one monkey showed a significantly longer and another a significantly shorter reaction time in the drug condition than in the control condition, whereas the third monkey showed no significant change. Thus, the results were again inconsistent across monkeys (Table 5).

Discussion
We investigated whether monkey behavior during the decision-making schedule task was affected by administration of the 5-HT2A antagonist MDL100907. The results indicate that intramuscular MDL100907 did not consistently affect behavioral parameters across monkeys, either during choice or during reward schedule execution. Several explanations are possible. First, 5-HT2A may not be the receptor subtype responsible for this role. Given the number of serotonin receptor subtypes, it would not be surprising if others, such as 5-HT1A or 5-HT4, were more important. Second, because 5-HT2A is widely distributed in the brain, its effects in different regions might counteract one another if 5-HT2A plays a different role in each region. 5-HT2A is densely expressed across broad areas of the neocortex [7]. Clinical symptoms such as anxiety, aggression, and hallucinations may be related to 5-HT2A activation [12]. Functionally, 5-HT2A has been implicated in emotion, emotion-related memory, learning, and food intake [13,14]. Taken together, 5-HT2A activity appears to regulate the excitability of cortical areas involved in a broad range of functions. Antagonizing this receptor might therefore interfere with multiple steps of decision-making during reward-guided behavior. If the relative contribution of each step differs among individual monkeys, the net changes would be inconsistent. A further complication is that 5-HT acts on the major dopaminergic pathways, and 5-HT2A facilitates dopamine release [15]. Local injection of MDL100907 into a specific brain region, particularly a circumscribed reward-related area such as the striatum, will be necessary to examine the role of 5-HT2A in reward-based decision-making.

Conclusion
Using three rhesus monkeys, we investigated the effect of the 5-HT2A antagonist MDL100907 on performance of a reward-based decision-making schedule task. None of the task performance measures (error rate, reaction time, discount rate, and choice probability) changed consistently after MDL100907 administration. Although 5-HT2A receptors are distributed throughout almost the entire neocortex and regulate cortical excitability in various mental functions, reward value discounting and schedule task execution in our task paradigm do not appear to be their main role.

Table 4. Error rate of each error type in the reward schedule part. Early error: bar release earlier than 150 ms after the green color appeared; bar error: touches to the left and/or right bars; late error: failure to release the bar within 1000 ms after the green color appeared (**p < 0.01).

Figure 2. The error rate in the reward schedule part. The axis labels are the same as in Figure 1. The "early error" in the reward schedule part was a bar release earlier than 150 ms after the green color appeared. The "bar error" was a touch to the left and/or right bars. The "late error" was a failure to release the bar within 1000 ms after the green color appeared. Asterisks indicate significance (**p < 0.01, *p < 0.05).