read page 6 again: how they have deployed their algorithm
→ Multi-agent DRL system, called CoTV
→ Controls both traffic light signals and connected autonomous vehicles (CAVs)
:: can reduce travel time, fuel consumption, and emissions all together
→ Easy to deploy: each traffic light controller cooperates with only the one CAV nearest to it on each incoming road
→ Most existing research in sustainable urban traffic control adjusts either traffic light signals or vehicle speed
→ How is CoTV different from other MARL?
: Other MARL approaches use action-dependent strategies (the action of one agent depends on the actions of other agents)
→ CoTV relies only on the exchange of states between agents within the range of one intersection: action-independent MARL
CoTV, trained with PPO, obtains up to a 30% reduction in both travel time and fuel consumption
→ Which vehicle is selected for coordination?
:: The traffic light controller in CoTV selects the CAV closest to the intersection on each incoming road as the CAV agent. This builds on the concept of 'platooning', where a group of vehicles closely follows a leading vehicle to improve overall traffic efficiency. By controlling the leading vehicle on a particular road, the rest of the vehicles on the same road can potentially form a 'platoon', following the leader and passing through the intersection more efficiently.
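A minimal sketch of the selection rule above, to make it concrete. The `Vehicle` fields, the road names, and the assumption that each vehicle's distance to the intersection is already known are all hypothetical; this is not taken from the CoTV codebase.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    veh_id: str
    road_id: str                 # incoming road the vehicle is travelling on
    dist_to_intersection: float  # metres to the intersection (assumed to be available)
    is_cav: bool                 # human-driven vehicles cannot become agents


def select_cav_agents(vehicles):
    """Return {road_id: Vehicle}: the CAV closest to the intersection on each incoming road."""
    agents = {}
    for veh in vehicles:
        if not veh.is_cav:
            continue
        best = agents.get(veh.road_id)
        if best is None or veh.dist_to_intersection < best.dist_to_intersection:
            agents[veh.road_id] = veh
    return agents


vehicles = [
    Vehicle("cav_a", "north_in", 80.0, True),
    Vehicle("cav_b", "north_in", 35.0, True),
    Vehicle("hdv_c", "north_in", 10.0, False),  # closest overall, but not a CAV
    Vehicle("cav_d", "east_in", 60.0, True),
]
selected = {road: veh.veh_id for road, veh in select_cav_agents(vehicles).items()}
print(selected)
```

Note that the human-driven vehicle is skipped even though it is nearest: only CAVs can be controlled, so only they can lead a platoon in this scheme.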
What info do the CAV and the traffic light controller exchange?
→ Communication schemes are designed so that CAVs send their speed, acceleration, and location, and traffic light controllers send their current signal phase, to each other.
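A rough sketch of the exchange, to see what each side ends up observing. The class names, field names, and the flat-list observation layout are assumptions made for illustration; the observations shown are partial and only cover the exchanged values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CavState:
    speed: float
    acceleration: float
    location: float  # position on the road (assumed here: metres to the intersection)


@dataclass(frozen=True)
class SignalState:
    phase: int  # index of the current signal phase


def controller_observation(own, road_traffic, cav):
    """Controller view: own phase, traffic on its roads, status of the closest CAV."""
    return [float(own.phase), *road_traffic, cav.speed, cav.acceleration]


def cav_observation(own, signal):
    """CAV view: own motion state plus the received signal phase."""
    return [own.speed, own.acceleration, own.location, float(signal.phase)]


cav = CavState(speed=8.0, acceleration=-0.5, location=42.0)
signal = SignalState(phase=2)
tl_obs = controller_observation(signal, [0.3, 0.1], cav)
cav_obs = cav_observation(cav, signal)
print(tl_obs)
print(cav_obs)
```

Only states cross between the agents here; neither function takes the other agent's action as an input.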
Problems in DRL:
1) Every agent (CAV or traffic light controller) interacts with the same environment, causing more uncertainty about traffic convergence
2) Scalability issues: the computational cost of joint actions increases exponentially with the number of agents
3) Rewards can be individual, regional, or global
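Quick arithmetic for problem 2, assuming every agent has the same small discrete action set (2 actions each here, which is a simplification: CAV actions are continuous in CoTV). The function names are made up for this note.

```python
def joint_action_count(n_agents, actions_per_agent=2):
    """Size of the joint action space when agents' actions are chosen together."""
    return actions_per_agent ** n_agents


def independent_action_count(n_agents, actions_per_agent=2):
    """Total actions to evaluate when each agent chooses on its own."""
    return actions_per_agent * n_agents


for n in (1, 4, 16):
    print(n, joint_action_count(n), independent_action_count(n))
```

The joint count grows exponentially with the number of agents, while the independent count grows only linearly.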
System overview
: how to design the system
→ 1) System design goals
2) System components (traffic lights and CAVs)
3) Design of state, action, reward
4) Cooperation scheme (Vehicle-to-Everything, V2X)
5) Training process: using PPO, with parameter sharing applied
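To pin down what parameter sharing means in step 5, a toy sketch. It assumes sharing is per agent type (one policy for all traffic light controllers, one for all CAVs), which should be checked against the codebase. `Policy` is a stand-in that only counts updates, not a real PPO network, and the agent ids are invented.

```python
class Policy:
    """Stand-in for a PPO policy network; it only counts how often it is updated."""

    def __init__(self, name):
        self.name = name
        self.n_updates = 0

    def update(self, batch):
        # A real implementation would run a PPO gradient step on the batch here.
        self.n_updates += 1


# One set of parameters per agent type (assumption), shared by all agents of that type.
shared = {"traffic_light": Policy("tl_policy"), "cav": Policy("cav_policy")}


def policy_for(agent_id):
    """Map an agent id to the shared policy of its type (the id prefix is hypothetical)."""
    agent_type = "cav" if agent_id.startswith("cav_") else "traffic_light"
    return shared[agent_type]


agent_ids = ["tl_0", "tl_1", "cav_0", "cav_1", "cav_2"]
for agent_id in agent_ids:
    # Every agent's experience updates the same shared parameters for its type.
    policy_for(agent_id).update(batch=None)

print(shared["traffic_light"].n_updates, shared["cav"].n_updates)
```

With sharing, the number of networks to train stays fixed as more intersections or CAVs are added.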
1) System design goals:
→ Reduced travel time
→ Lower fuel consumption (by speed control of CAVs)
→ Longer time to collision
→ Ease of deployment (few scalability issues)
2) System components:
Traffic light controllers and CAVs.
→ A) Traffic light controller
1) Action - 1 or 0. 1 - switch to the next phase at the next timestep
0 - keep the current phase unchanged
2) State - 3 parts: current signal phase, traffic on the roads that the traffic light controller controls, status of the closest vehicle (speed, acceleration)
3) Reward - Penalty of intersection pressure.
→ Intersection pressure: the difference N_in - N_out (sum of vehicles on the incoming roads minus sum of vehicles on the outgoing roads). The reward is the penalty (negative) of this pressure.
C is the max road capacity: the max number of vehicles on a given road (length of road / length of a single car)
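A sketch of the pressure-based reward as described in these notes. Two things are assumptions to check against the paper: that each road's vehicle count is divided by its capacity C, and that the signed difference is used rather than its absolute value. Function names are invented for this note.

```python
def road_capacity(road_length_m, vehicle_length_m):
    """C: max number of vehicles that fit on a road."""
    return road_length_m / vehicle_length_m


def intersection_pressure(incoming, outgoing):
    """incoming / outgoing: lists of (vehicle_count, capacity) pairs, one per road."""
    n_in = sum(count / cap for count, cap in incoming)    # normalising by C is an assumption
    n_out = sum(count / cap for count, cap in outgoing)
    return n_in - n_out


def traffic_light_reward(incoming, outgoing):
    """Reward is the penalty of intersection pressure."""
    return -intersection_pressure(incoming, outgoing)


cap = road_capacity(200.0, 5.0)        # 40 vehicles per road
incoming = [(20, cap), (10, cap)]      # crowded incoming roads
outgoing = [(4, cap), (6, cap)]        # mostly empty outgoing roads
reward = traffic_light_reward(incoming, outgoing)
print(cap, reward)
```

More queued vehicles on the incoming roads than on the outgoing ones gives positive pressure and therefore a negative reward, so the controller is pushed to release the queue.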
B) CAV
1) Action - continuous: the acceleration, in the range -3 to 3.
2) State - its own speed and acceleration + those of the preceding vehicle + distance to the preceding vehicle + distance to the approaching intersection + current signal phase of the approaching traffic light
3) Reward - deviation of the average speed v from the max speed v* + acceleration, where speed and acceleration are taken over all vehicles K located on the same road as the CAV agent.
→ How the reward function accounts for low fuel cost and higher average speed: the speed-deviation term pushes vehicles towards v* (higher average speed), and the acceleration term discourages harsh speed changes, which increase fuel consumption.
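A sketch of a reward with the two terms listed above, since the exact formula is not captured in these notes. The normalisation by v* and by a maximum acceleration, the equal weighting of the two terms, and the use of absolute values are all assumptions; `a_max = 3.0` is taken from the action range.

```python
def cav_reward(speeds, accelerations, v_max, a_max=3.0):
    """Penalise speed deviation from v_max and acceleration magnitude.

    speeds / accelerations cover all K vehicles on the same road as the CAV agent.
    """
    k = len(speeds)
    if k == 0:
        return 0.0
    avg_speed = sum(speeds) / k
    speed_penalty = abs(v_max - avg_speed) / v_max                      # 0 when the road flows at v_max
    accel_penalty = sum(abs(a) for a in accelerations) / (k * a_max)    # 0 when speeds are steady
    return -(speed_penalty + accel_penalty)


reward_rough = cav_reward([10.0, 12.0, 14.0], [0.0, -1.5, 3.0], v_max=15.0)
reward_ideal = cav_reward([15.0, 15.0], [0.0, 0.0], v_max=15.0)
print(reward_rough, reward_ideal)
```

In this form the best achievable reward is 0: every vehicle on the road travels at v* without accelerating or braking.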
Meanwhile, the cooperation schemes among agents
(i.e., the traffic light controller and the approaching CAV
agents) only rely on the information exchange of states, not
actions. This means the action for a certain agent is selected
independently from other agents’ actions. Therefore, CoTV
avoids the exponentially increased complexity of joint actions for MARL using action-dependent design.
→ Using FLOW with SUMO:
Models compared:
1) Baseline
2) FlowCAV
3) PressLight
4) GLOSA
5) I-CoTV
6) M-CoTV
7) CoTV
Next steps
1) Study the algo and how it's implemented (codebase)
2) How they measure the evaluation metrics (seems trivial, but get the exact answer)
3) Then try to implement the other models: work out the state, action, and reward for each, and how they implement them