
Auto-driving Cars Datasets

A quick look at the world of autonomous vehicles shows that events are unfolding quickly, and almost simultaneously, among the market leaders.

Competition there is fast and fierce. Tesla, Uber, and Google are among the most familiar and prominent competitors in this field.

In June 2018, Elon Musk (Tesla's CEO) announced that the new version (Version 9) of Tesla's software for auto-driving cars would be available by August 2018.

At almost the same time, Andrew Ng announced the launch of a six-month testing period for their first self-driving car in Texas. The test will run with a human driver on board, ready to intervene in an emergency.

Uber's CEO, Khosrowshahi, said that self-driving cars will be safer than human drivers, but that they are still learning, like undergraduate students.

The announcements by Elon Musk, Andrew Ng, and Khosrowshahi were not the result of a few months of work, but of a long period of hard work. Cars produced by Tesla since October 2016 have all the hardware necessary for self-driving (an additional option for $8,000). On the other side, the company behind the Texas test was established in 2015 by one of Andrew Ng's graduate students.

It is now clear that what we see in 2018 started two to three years ago, or maybe earlier.

Autonomous driving has many advocates and many opponents. From my point of view, like everything else, it has pros and cons.


Pros:

  • For investors and employers: no long or hard recruitment process, no salaries, and no labor disputes, raise requests, or strikes.
  • For passengers: no fear of getting lost on the way.
  • For passengers: time to read or have fun while traveling to their destination.
  • For governments: more data becomes available, and hence more control over the local population; governments can also shift the burden of public transportation to private companies.


Cons:

  • Loss of freedom:

Authoritarian governments or hackers could take control of auto-driving systems or steal citizens' data.

  • Accidents:

In Tempe, Arizona, in March 2018, a self-driving Uber vehicle failed to spot a woman walking her bicycle across the road. The road was empty and it was at night. There was a safety driver on board, but the driver failed to hit the brakes in time. We can expect more accidents like this until the systems learn well. Always remember that these systems are learning like undergraduate students. No one knows how many people might be killed before those systems graduate.

  • Unemployment:

Drivers will no longer be needed, which means downsizing and more job cuts.

Does this process need a training dataset?

Like most applications that rely on AI algorithms, self-driving needs big datasets. UC Berkeley has released an open-source self-driving dataset, which can be considered the largest dataset in this field (100,000 video sequences, each approximately 40 seconds long, in 720p quality).
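To get a rough sense of that scale, the quoted figures can be turned into total hours of footage. This is only back-of-the-envelope arithmetic: the function name `total_hours` is made up for illustration, and the 40-second clip length is approximate.

```python
# Figures quoted in the text for the UC Berkeley dataset.
NUM_SEQUENCES = 100_000
SECONDS_PER_SEQUENCE = 40  # approximate clip length, per the text


def total_hours(num_sequences: int, seconds_each: float) -> float:
    """Convert a number of clips and a per-clip length into hours of video."""
    return num_sequences * seconds_each / 3600


hours = total_hours(NUM_SEQUENCES, SECONDS_PER_SEQUENCE)
days = hours / 24

print(f"About {hours:,.0f} hours of video ({days:.1f} days of continuous footage)")
```

Under those assumptions, the dataset amounts to roughly 1,111 hours of driving video, or a little over 46 days of continuous footage.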

ApolloScape (released by Baidu Apollo), KITTI, and Cityscapes are other datasets in this field. ApolloScape is billed as 10 times larger than similar datasets, including KITTI and Cityscapes, while the UC Berkeley dataset is reported to be 800 times bigger than Baidu's ApolloScape!

ApolloScape:

  • Open-source dataset billed as the largest self-driving dataset in the world
  • 26 pre-defined semantic items
  • Baidu Apollo is the technology behind it
  • Name: ApolloScape

UC Berkeley:

  • UC Berkeley has open-sourced the world's largest and most diverse self-driving dataset
  • Contains 100,000 video sequences, each approximately 40 seconds long and in 720p quality
  • 800 times bigger than Baidu's ApolloScape
  • Name: BDD100K

You can also find other datasets for auto-driving cars, such as the NVIDIA Self Driving Car Training Set. I recommend reading this paper, which covers 27 existing publicly available datasets. It will help you determine which training dataset is the best fit for the algorithm you are using.

If you are seeking more guidance, several professional organizations and companies offer consultation services. John Snow Labs is becoming one of the leading companies in the field, with a large and diverse team of professionals in different domains dealing with data science, artificial intelligence, and healthcare machine learning datasets.

The methodology

– Data Collection:

Data about public roads is collected by a car equipped with different sensors and equipment (camera, GPS, radar, and LiDAR), and the output of these sensors is stored on any suitable storage medium. Training datasets are supposed to contain data about weather conditions, traffic at different times of the day, pedestrian areas, busy intersections, multiple lane markings, and as many different driving scenarios as possible.
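One way to check whether a collected dataset really spans those conditions is to tag each clip with its conditions and look for combinations that never occur. The sketch below is a minimal illustration, not the method used by any of the datasets mentioned here; the `Clip` record, its fields, and the `coverage_gaps` helper are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Clip:
    """Hypothetical metadata record for one recorded driving clip."""
    clip_id: str
    weather: str      # e.g. "clear", "rainy"
    time_of_day: str  # e.g. "day", "night"
    scene: str        # e.g. "intersection", "pedestrian_area"


def coverage_gaps(clips, weathers, times_of_day):
    """Return the (weather, time_of_day) pairs that no clip covers."""
    seen = Counter((c.weather, c.time_of_day) for c in clips)
    return [pair for pair in product(weathers, times_of_day) if seen[pair] == 0]


# Made-up sample records, for illustration only.
clips = [
    Clip("a", "clear", "day", "intersection"),
    Clip("b", "rainy", "day", "pedestrian_area"),
    Clip("c", "clear", "night", "highway"),
]

gaps = coverage_gaps(clips, ["clear", "rainy"], ["day", "night"])
print(gaps)  # [('rainy', 'night')]
```

In this toy sample, no clip was recorded on a rainy night, so that is the scenario to collect next.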

You can find a detailed description of the BDD100K videos database in this paper.

– A full software stack:

The software must include motion planning, localization, and mapping features. Moreover, the system is supposed to identify and classify the hand gestures used by traffic police, in addition to traffic lights.

There is no doubt that the field is progressing very fast, with a major change arriving every 3-6 months. I think autonomous-driving experts are preparing a lot of surprises to reveal by 2019.

Whether you are an advocate or an opponent, I am sure that one day you will enjoy your trip drinking coffee and reading a book, leaving aside the burdens and stress of driving. I believe that this day is coming soon.
