Teaching a vehicle to see in real time

You can identify a stop sign instantly. Night or day, fair weather or rain, directly in front of you or skewed to the side, your brain still tells you that you’re seeing a stop sign. Teaching a computer to make that same connection regardless of the conditions, however, is a challenge.

Building a scalable network that detects and identifies objects as quickly as your brain does starts with the vehicle’s vision. Forward-facing cameras and radar will soon be standard equipment in all cars. Those cameras will see everything we see, but they have to be connected to a system that can accurately identify the things in the field of view.


As the first step, the navigation system must be taught what to look for. Teaching a computer to identify a particular sign is a huge task. To reliably learn a new sign, any system needs example pictures of that sign in all conditions and from all possible angles. This results in hundreds of thousands of images of the same sign.

Manually labeling all those images is incredibly inefficient. To solve this, HERE begins with a smaller, more manageable set of images, which is used to prompt the AI to find the sign in question. Using state-of-the-art machine learning, the algorithm then uses all of the other images to teach itself to find the sign in any conditions. This enables the system to add new objects to the database rapidly, reliably, and at a massive scale.
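
To make the idea concrete, here is a minimal sketch of one common way a system can "teach itself" from a small starting set: self-training, also called pseudo-labeling. This is an assumption for illustration only; the article does not say which technique HERE uses. The function names, the toy two-number "features" standing in for images, and the `margin` threshold are all hypothetical.

```python
def centroid(points):
    """Average of a list of equal-length feature vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def dist(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def self_train(labeled, unlabeled, margin=1.0, rounds=5):
    """Grow a small labeled set by pseudo-labeling confident examples.

    labeled:   dict mapping label -> list of feature vectors (at least 2 labels)
    unlabeled: list of feature vectors with no label
    margin:    hypothetical confidence threshold; an example is accepted only
               if its nearest class is at least this much closer than the next
    Returns (grown labeled dict, examples that stayed unlabeled).
    """
    labeled = {k: list(v) for k, v in labeled.items()}  # do not mutate input
    remaining = list(unlabeled)
    for _ in range(rounds):
        # "Model" for this round: one centroid per class.
        centroids = {k: centroid(v) for k, v in labeled.items()}
        still_unsure = []
        for x in remaining:
            ranked = sorted(centroids, key=lambda k: dist(x, centroids[k]))
            best, runner_up = ranked[0], ranked[1]
            gap = dist(x, centroids[runner_up]) - dist(x, centroids[best])
            if gap >= margin:
                labeled[best].append(x)   # confident: adopt the predicted label
            else:
                still_unsure.append(x)    # ambiguous: leave for a later round
        if len(still_unsure) == len(remaining):
            break                         # nothing new was learned; stop early
        remaining = still_unsure
    return labeled, remaining


seed = {"stop_sign": [(0.0, 0.0)], "other": [(10.0, 10.0)]}
pool = [(1.0, 1.0), (9.0, 9.0), (5.0, 5.0)]
grown, unsure = self_train(seed, pool)
```

In this toy run, the two examples that sit close to a seed are absorbed into its class, while the one exactly halfway between the classes stays unlabeled. A real system would likely send such ambiguous cases to a human reviewer.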

Using this approach, HERE built an extended database of objects. The next task is identifying those objects in real time.

Consider the stop sign example from before. Imagine that stop sign is in front of a vehicle that can take a picture of the intersection ahead. Unfortunately, that single picture is of limited use. A 2D picture of an intersection may contain a multitude of important objects: the stop sign, crosswalk markings, turn markings painted on the road, the middle lane divider, the posted speed limit, and quite a bit more. The system’s AI might detect the objects, but their distance and relative position would be extremely difficult to determine from a single picture.

To solve this, the HERE system takes multiple pictures of the environment, at a rate of 20 to 30 pictures per second. As the vehicle moves, each picture is paired with GPS location data. The navigation system then triangulates where objects are in the scene, which transforms a series of 2D images into a 3D environment model.
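
As a rough illustration of the triangulation step, the sketch below locates an object from two camera positions and the direction in which the object was seen from each. It is deliberately simplified and not HERE's implementation: it works in a flat 2D plane, assumes the GPS fixes have already been converted to local coordinates in metres, and assumes the bearing to the object has already been extracted from each picture. The function name and parameters are hypothetical.

```python
import math


def triangulate(p1, bearing1, p2, bearing2):
    """Intersect two sight lines in a flat 2D plane.

    p1, p2:             (x, y) camera positions in metres
    bearing1, bearing2: direction to the object from each position, in
                        radians, counter-clockwise from the +x axis
    Returns the (x, y) position of the object, or None if the two sight
    lines are (nearly) parallel and no reliable intersection exists.
    """
    d1 = (math.cos(bearing1), math.sin(bearing1))
    d2 = (math.cos(bearing2), math.sin(bearing2))
    # Solve p1 + t*d1 == p2 + s*d2 for t using the 2D cross product.
    denom = d1[0] * d2[1] - d1[1] * d2[0]
    if abs(denom) < 1e-9:
        return None
    dx, dy = p2[0] - p1[0], p2[1] - p1[1]
    t = (dx * d2[1] - dy * d2[0]) / denom
    return (p1[0] + t * d1[0], p1[1] + t * d1[1])


# The car drives 10 m along the x axis and sights the same sign twice.
first_fix, second_fix = (0.0, 0.0), (10.0, 0.0)
sign = triangulate(
    first_fix, math.atan2(5.0, 20.0),   # bearing measured at the first fix
    second_fix, math.atan2(5.0, 10.0),  # bearing measured at the second fix
)
```

Here the two sight lines cross at approximately (20, 5), meaning the sign is 20 m ahead of the first position and 5 m to the side. With dozens of pictures per second, a real system would have many such sight lines per object and could combine them to reduce the effect of GPS and camera error.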


In this example, we have a database of objects that can be identified in any environment. That database is paired with location data and a 3D view of the car’s environment. These two pieces, joined together in a scalable distributed network, are how a HERE autonomous car can orient itself on the road and respond instantly to changing conditions.

The ecosystem of information extends well beyond this example. When differences between the HD Live Map and a car’s data are detected, the information must be added to the cloud network. That new information has to be processed on its own, then distributed to other vehicles when deemed necessary. This is what makes Self Healing Maps possible, and it’s how HERE is continuing to enable autonomous vehicles.
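
The first step of that loop, detecting differences between the map and what a car actually sees, might look something like the sketch below. It is a simplified assumption, not HERE's method: objects are plain `(kind, x, y)` tuples in local metres, two objects "match" if they are the same kind and lie within a hypothetical `tolerance_m` of each other, and the function name is made up for illustration.

```python
import math


def diff_against_map(map_objects, observed, tolerance_m=2.0):
    """Compare what the car observed with what the map expects.

    map_objects, observed: lists of (kind, x, y) tuples, positions in metres
    tolerance_m:           hypothetical matching radius for "same object"
    Returns (new_objects, missing_objects):
      new_objects     - observed by the car but absent from the map
      missing_objects - present in the map but not observed by the car
    """
    def matches(a, b):
        same_kind = a[0] == b[0]
        close = math.hypot(a[1] - b[1], a[2] - b[2]) <= tolerance_m
        return same_kind and close

    new_objects = [o for o in observed
                   if not any(matches(o, m) for m in map_objects)]
    missing_objects = [m for m in map_objects
                       if not any(matches(m, o) for o in observed)]
    return new_objects, missing_objects


stored = [("stop_sign", 20.0, 5.0), ("speed_limit", 50.0, 4.0)]
seen = [("stop_sign", 20.5, 5.2), ("crosswalk", 22.0, 0.0)]
new_objects, missing_objects = diff_against_map(stored, seen)
```

In this toy case the stop sign matches despite a small position error, the crosswalk is reported as new, and the speed limit sign is reported as missing. A single car's report would presumably not be enough to change the map by itself, which fits the article's point that new information is processed before it is distributed to other vehicles.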
