Last week we looked at Human-centric Digital Twins from the perspective of Virtual Reality. We saw how humans are not only affected by the end product, but can have an active say in it.
This week we narrow our focus, moving from Digital Twins informed by humans to Digital Twins about humans in the spaces they live in.
What could simulations of the microscopic details of our bodies, from moving an ankle to sneezing, mean for future living? This week’s expert, He Wang, gives us a look into the world of crowd simulation…
He Wang (pictured centre) is a Lecturer at the School of Computing at the University of Leeds. His expertise is in computer graphics and computer vision, and nowadays he focuses on machine learning and applied machine learning within those fields. Aside from lecturing and research, He is the director of the “High Performance Graphics and Game Engineering” programme, which runs as a degree at undergraduate and master’s level and teaches people how to make games, focusing on the hard skills that come from mathematics, programming and physics. He is also the academic lead in artificial intelligence and visualization at the newly established Centre for Immersive Technology at Leeds, and is closely involved in other interdisciplinary themes such as VirtuoCity, which we heard from last week.
He Wang’s research expertise is closely related to Digital Twins through 3D visualization and crowd simulation. Today he sits down with Slingshot Founder David McKee to discuss what Digital Twins could mean for his research in crowd simulation, and what that could mean at the microscopic level.
In my mind, a Digital City consumes the data from the ever-changing physical city, so the two cities are like co-dependent species: the digital city feeds on all the data generated in the physical city, which acts as its food provision. The digital city’s digestive system would then process that data, and the outcome would, for example, inform decision-making in the physical city.
There are, in general, two kinds of crowd behaviour researchers within engineering and Urban Analytics. The first type is interested only in high-level statistics: they do crowd simulation or crowd behaviour modelling, but they mainly care about clusters and aggregate measures such as population density. I belong to the second type, which does what we call microscopic simulation.
In my crowd modelling practice, every single person matters. The level of detail I go into when modelling humans goes all the way down to individual body motions. For example, if you are walking, something as small as your arm swing is important to me. That is not important to people in Urban Analytics, who only need to treat each individual as a 2D particle moving across a big landscape.
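To make the contrast concrete, here is a minimal sketch of that macroscopic view: every person reduced to a 2D point, with only an aggregate statistic, population density per grid cell, retained. The space size, head count and cell size are illustrative assumptions, not taken from any particular Urban Analytics tool.

```python
import numpy as np

# Macroscopic view: each person is just a 2D point; the quantity of
# interest is an aggregate such as population density per grid cell.
rng = np.random.default_rng(0)
positions = rng.uniform(0, 100, size=(500, 2))  # 500 people in a 100 m x 100 m space

def density_map(positions, area_size=100.0, cell_size=10.0):
    """Count people per grid cell and convert to persons per square metre."""
    n_cells = int(area_size / cell_size)
    counts, _, _ = np.histogram2d(
        positions[:, 0], positions[:, 1],
        bins=n_cells, range=[[0, area_size], [0, area_size]],
    )
    return counts / cell_size**2

rho = density_map(positions)
print(f"peak density: {rho.max():.2f} persons/m^2")
```

Everything about how a body actually moves is thrown away at this level, which is exactly the simplification microscopic simulation tries to undo.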
From this perspective I see residents, or “space users” as I like to call them, as one of the central units generating the data behind a Digital Twin, and they should therefore be given priority when it comes to decision making and policy making.
So the whole idea is to make decision-making human-centred, which could change how our cities look and behave in the future. Let me give you a futuristic example:
In the future, walls and pillars could move easily, controlled by AI-driven methods. That would allow you to optimize the space configuration to facilitate the human activities within it. Making that happen would take a collective effort from many kinds of people.
This example would be my ideal take on what a Digital Twin could do for us in the future.
That’s a very good question. In one sentence you have touched on multiple problems in my field that we haven’t been able to solve completely. The three main challenges for me right now are:
Modelling Subconscious Behaviours
In most of the existing work in crowd simulation, you have probably noticed that individuals are treated as 2D discs. That is a massively oversimplified version of a human, which makes it not very useful in situations like COVID-19, and this is the first concrete challenge. We need to be able to model at least the low-level 3D motions, like steering, acceleration, deceleration, arm swing and leg swing: the low-level behaviours we probably never think about consciously but just do.
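As a rough illustration of what modelling even one of those low-level behaviours involves, here is a minimal steering sketch for a single agent that accelerates towards a goal and decelerates on arrival. All the parameters (walking speed, acceleration limit, slow-down radius) are invented for illustration and are not from any published pedestrian model.

```python
import numpy as np

MAX_SPEED = 1.4    # comfortable walking speed, m/s (illustrative)
MAX_ACCEL = 0.8    # acceleration limit, m/s^2 (illustrative)
SLOW_RADIUS = 2.0  # start decelerating within 2 m of the goal
DT = 0.1           # simulation timestep, s

def steer(pos, vel, goal):
    """One steering step: accelerate towards the goal, brake on arrival."""
    to_goal = goal - pos
    dist = np.linalg.norm(to_goal)
    # Walk at full speed far away; slow down smoothly inside the slow radius.
    target_speed = MAX_SPEED * min(1.0, dist / SLOW_RADIUS)
    desired_vel = to_goal / (dist + 1e-9) * target_speed
    # Steering = desired change in velocity, capped by the acceleration limit.
    accel = (desired_vel - vel) / DT
    norm = np.linalg.norm(accel)
    if norm > MAX_ACCEL:
        accel *= MAX_ACCEL / norm
    vel = vel + accel * DT
    return pos + vel * DT, vel

pos, vel = np.zeros(2), np.zeros(2)
for _ in range(200):
    pos, vel = steer(pos, vel, goal=np.array([10.0, 5.0]))
print(pos)  # the agent ends up close to the goal
```

Arm and leg swing would sit on top of this as a further layer of detail, which is what makes full-body microscopic simulation so demanding.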
Capturing Sufficient and Accurate Data
High-level behaviours are also hard to crack. In traditional empirical modelling, researchers summarize human behaviours into categories, then use reasoning and logic, or even the simplest finite state machines, to predefine all the behaviours (a toy example follows below). The challenge is that many rule-based systems don’t capture enough information to make accurate predictions: they can simulate plausible behaviours in general, but it is very hard to use them for relatively accurate predictions in specific scenarios. With machine learning methods, including deep learning, we can build models with enough capacity to learn complex behaviours, but we do not have enough data to feed them.
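For a flavour of that rule-based approach, here is a toy finite state machine for a station visitor. The states, observations and transition rules are entirely made up for illustration; a real empirical model would predefine far more behaviours.

```python
# Hand-written transition rules: (current state, observation) -> next state.
# Every behaviour must be anticipated and encoded by the modeller.
TRANSITIONS = {
    ("entering", "has_ticket"): "to_platform",
    ("entering", "no_ticket"): "to_ticket_office",
    ("to_ticket_office", "queue_short"): "buying_ticket",
    ("to_ticket_office", "queue_long"): "to_information_desk",
    ("to_information_desk", "informed"): "to_ticket_office",
    ("buying_ticket", "done"): "to_platform",
}

def step(state, observation):
    """Advance the FSM; unknown (state, observation) pairs keep the state."""
    return TRANSITIONS.get((state, observation), state)

state = "entering"
for obs in ["no_ticket", "queue_long", "informed", "queue_short", "done"]:
    state = step(state, obs)
    print(state)
```

Anything the modeller did not anticipate simply never happens, which is why such systems struggle to predict specific real scenarios.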
The next challenge, especially for research like mine that focuses on individual motions, is our need to consume large amounts of data. The sensor networks out there are not enough at the moment: they don’t cover everywhere, and most of the time they don’t track at the level of detail we require. Secondly, even where we can track, we still need a mechanism to distil the data and remove irrelevant information, because the data size is going to be huge. Only after all the data has been processed and stored appropriately can we begin to model.
Building Explainable Models
The biggest challenge is that you want your model to be explainable. The older, explainable paradigms of research can explain things but cannot take much data in. Ironically, the newer methods can take in a lot more data, but they are not explainable, and humans do not trust these kinds of black-box systems. So a combination of both needs to be attempted, at the very least, to address the explainability issue in crowd modelling.
Treating all humans as my lab rats! OK, that’s the wrong thing to say, but in my ideal world I could attach sensors to people directly to see what they see, what they feel, and their full-body motions. I’m talking about full-body motion, not just a point in 2D space: I would want to know when you swing your left arm while you swing your right leg.
Of course, that’s not going to happen. A less accurate but more feasible way is to equip spaces with sensors such as video cameras and Wi-Fi stations, maybe even using people’s mobile phones where possible, so that all the sensor data can be used to infer people’s activities. We’re only talking about low-level activities at this point. With data fusion at that level you might be able to get a full picture of low-level motions, but the current situation is that most spaces are not sensorised at all.
People have been using Markov Decision Processes to model these low-level decision-making processes for many years. In the context of reinforcement learning or statistical machine learning, you can come up with a model, but it is very hard for that model to consume large amounts of data; this was the situation before deep learning. It is really difficult to capture all the heterogeneity in the data.
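As a concrete example of this pre-deep-learning style of model, here is a minimal tabular Q-learning sketch on a toy corridor MDP. The five-cell corridor, reward scheme and hyperparameters are all invented for illustration; a real pedestrian MDP would have a far richer state space, which is exactly where the data problem begins.

```python
import numpy as np

N_STATES, N_ACTIONS = 5, 2   # corridor cells 0..4; actions: 0 = left, 1 = right
GOAL = 4                     # reaching cell 4 yields reward 1
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.3

rng = np.random.default_rng(0)
Q = np.zeros((N_STATES, N_ACTIONS))  # tabular action-value estimates

for _ in range(500):  # episodes
    s = 0
    while s != GOAL:
        # Epsilon-greedy action selection.
        a = int(rng.integers(N_ACTIONS)) if rng.random() < EPS else int(Q[s].argmax())
        s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
        r = 1.0 if s2 == GOAL else 0.0
        # Q-learning update: move Q(s,a) towards r + gamma * max_a' Q(s',a').
        Q[s, a] += ALPHA * (r + GAMMA * Q[s2].max() - Q[s, a])
        s = s2

print(Q.argmax(axis=1)[:GOAL])  # learned policy for cells 0..3: always move right
```

The whole model fits in a 5x2 table; scaling the state space to anything like a real crowd’s heterogeneity is where this approach runs out of capacity.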
What we are doing right now is seeing whether we can leverage both kinds of methodologies (reinforcement learning or statistical machine learning, and deep learning), combining the data capacity of deep learning with the explainability of traditional statistical machine learning methods such as Bayesian networks or reinforcement learning. There have been some attempts in this direction, in both theory and applications. For example, deep reinforcement learning has been used to play Go against humans, as in AlphaGo, and it is really good at it. But again, in my opinion a lot of these systems are used as optimizers that find the best strategy to carry out actions, rather than being able to explain why people act the way they do.
Explaining why people do what they do remains a research challenge. Let’s use a train station as an example. Suppose you see a surge of flow from the front entrance to the ticket office, then to the information desk, then back to the ticket office, and finally to a platform. You can model it with reinforcement learning, and you can observe that 20% of the people in this surge went to the information desk and back to the ticket office before going to the platform. But you won’t be able to understand what caused that for each person in the surge.
As humans we can intuitively interpret some of it. Maybe these people really wanted to catch a train but got confused: that is why they went to the ticket office first, then to the information desk to gather more information, then back to the ticket office to buy a ticket, and then waited for the train on the platform. Or maybe they were stressed because their trains had been cancelled, so they rushed into the station and bought tickets to board a new train. All these high-level inferences are missing, and space managers need them. That is the key difference model explainability can bring.
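A small sketch of the gap being described: from observed trajectories we can recover route statistics (the “20%” kind of number), but nothing in the data itself explains intent. The trajectories below are invented for illustration.

```python
from collections import Counter

# Observed routes through the station (invented data).
trajectories = [
    ["entrance", "ticket_office", "information_desk", "ticket_office", "platform"],
    ["entrance", "ticket_office", "platform"],
    ["entrance", "ticket_office", "platform"],
    ["entrance", "ticket_office", "information_desk", "ticket_office", "platform"],
    ["entrance", "ticket_office", "platform"],
]

routes = Counter(tuple(t) for t in trajectories)
for route, n in routes.items():
    print(f"{n / len(trajectories):.0%} of people: " + " -> ".join(route))
# This tells us that 40% detoured via the information desk, but not whether
# they were confused, stressed by a cancellation, or something else entirely.
```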
Data bias is a really complicated issue. Back in 2019, Georgia Tech published a paper investigating systematic bias in pedestrian detection algorithms: they found that self-driving cars may be likelier to hit Black pedestrians than white pedestrians. This example shows that, so far, there is no good automatic way of avoiding data bias.
In my area, where you model a crowd, personal space is a big factor, and it differs greatly between cultures. In highly populated countries, people tend to be used to staying very close to each other. When I first came to this country and saw people queue at a bus stop, it was a real eye-opener for me: each person stood about 1.5 metres from the next, and they naturally formed a queue. Although there were only five or six people, the queue took up a lot of space on the street. I suddenly realized that personal space is a big factor here.
There are lots of other high-level factors we need to understand, even if we don’t model those high-level behaviours directly, because they influence low-level decision making. The space determines the density, the density greatly influences the velocity at which people can move, and that in turn influences the overall efficiency of the transportation system. I’m not only talking about differences between ethnicities but also between cultures and cities. In my area I see this not as a modelling problem but as a problem of what data is available: if the bias already exists in the data, there is no good way to make the models realize that automatically. So this goes back to your data collection question. If we’re talking about one city, you cannot avoid the bias; if we were talking about multiple countries, we might be able to.
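To make the density-velocity coupling concrete, here is a sketch using a Weidmann-style “fundamental diagram” of pedestrian flow, with the commonly quoted fitted constants. The free-flow speed itself varies across cultures and cities, which is precisely where bias in the available data would enter such a model.

```python
import numpy as np

V_FREE = 1.34   # free-flow walking speed, m/s (commonly quoted average)
RHO_MAX = 5.4   # jam density, persons/m^2
K = 1.913       # fitted shape parameter, persons/m^2

def walking_speed(rho):
    """Mean walking speed as a function of crowd density (persons/m^2)."""
    rho = np.clip(rho, 1e-6, RHO_MAX)
    return V_FREE * (1.0 - np.exp(-K * (1.0 / rho - 1.0 / RHO_MAX)))

for rho in [0.5, 1.0, 2.0, 4.0]:
    print(f"density {rho:.1f} p/m^2 -> speed {walking_speed(rho):.2f} m/s")
# Denser crowds walk slower, which propagates into overall transport efficiency.
```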
A Digital Twin of a city is one where the digital twins of the residents live a couple of minutes ahead of you in time.
This is more a description of a Digital Twin than a definition for me. I try to interpret Digital Twins in terms of their functionality in prediction and use that as a description. I do believe different people will have different definitions.
The biggest impact for me is that it could save lives in situations like stampedes.
The craziest impact is that it could one day turn our city into a big transformer, one that keeps transforming itself to facilitate the predicted behaviours of humans.
David: So it’s really about creating an organism. Making it organic.
Organism, that’s a very good word, because, remember, I mentioned that the digital twin and the physical city are like two co-dependent species? I see the digital and physical twins co-evolving in time.
I’ve noticed that a lot of academics have been talking about this Digital Twin idea, but I would like to raise this research vision to the level of a Research Challenge and a Global Challenge, because I don’t think it is feasible within one domain. In my opinion it has to be an effort across all domains and all countries. This should be in UN discussions, not only talked about at the University of Leeds.