Data Checks
Match of code-description for accsmaps
The tables ‘allaccs’, ‘casualties’ and ‘vehicles’ can be exported from accmaps as csv files and then imported. A number of the fields are exported as codes so the codes have to replaced by the matching description for this analysis. Some of the codes can be linked to the fields in the open dataset guide linked to the DfT annual report. Others, such as contributory factors, are found only in the stats20 guide. T
Summary of other cleaning etc of imported data
- Convert all date character strings to date format.
- Convert all police_ref fields to upper case then trim, otherwise they won’t join properly.
- Check data type matches for all rows.
- Convert all vehicle and casualty references to three digits.
- Rename some columns to add clarity.
- In the contributory factor fields, replace NA with blank.
- Create longitude and latitude columns using easting and northing.
- Add longitude, latitude, easting and northing to casualties table.
- Merge the four partial accident description fields.
- Rename all motorcycle types into the single description “Motorcycle”.
- Add datetime, location, accident description, district into casualties table.
- Add vehicle type of each casualty into casualty table.
- Create cyclist casualties dataframe.
- Join each row of cyclist casualties dataframe to corresponding accident row (same accident can apply to more than one casualty).
- Create column showing where contributory factor is assigned to cyclist casualty.
- Map contributory factors to road safety factors.
Check for errors in the csv imports
DfT and Surrey Police data are validated by the respective organisations. The additional check here is to look for duplicates or missing rows in the imported data.
Check GB data - csv files imported from DfT road safety data site
NB all years before 2008 in imported data filtered out.
Accidents table
| Number of columns | 37 |
| Number of rows | 2,119,964 |
Vehicles table
| Number of columns | 34 |
| Number of rows | 3,884,403 |
Casualties table
| Number of columns | 21 |
| Number of rows | 2,804,090 |
Results of duplicate row check and any duplicate deletion required
Accidents
| accident_index | accident_year |
|---|
## [1] "No duplicate rows"
Number of rows in table after removing any duplicates: 2,119,964.
Vehicles
| accident_index | accident_year | vehicle_reference |
|---|
## [1] "No duplicate rows"
Number of rows in table after removing any duplicates: 3,884,403.
Casualties
| accident_index | accident_year | casualty_reference |
|---|---|---|
| 200804CC08101 | 2008 | 6 |
| 200804CC08101 | 2008 | 6 |
| 200804TA08267 | 2008 | 2 |
| 200804TA08267 | 2008 | 2 |
| 200904EA09322 | 2009 | 2 |
| 200904EA09322 | 2009 | 2 |
| 201104CK11089 | 2011 | 3 |
| 201104CK11089 | 2011 | 3 |
| 201104FJ11013 | 2011 | 4 |
| 201104FJ11013 | 2011 | 4 |
| 201204CL12218 | 2012 | 1 |
| 201204CL12218 | 2012 | 1 |
| 201304CJ13162 | 2013 | 3 |
| 201304CJ13162 | 2013 | 3 |
| 201304EA13001 | 2013 | 3 |
| 201304EA13001 | 2013 | 3 |
| 2013471305010 | 2013 | 1 |
| 2013471305010 | 2013 | 1 |
| 201404CJ14198 | 2014 | 3 |
| 201404CJ14198 | 2014 | 3 |
| 201404WA14102 | 2014 | 3 |
| 201404WA14102 | 2014 | 3 |
| 201604ED16270 | 2016 | 2 |
| 201604ED16270 | 2016 | 2 |
## Number of duplicate rows = 12
Number of rows in table after removing any duplicates: 2,804,078.
Check Surrey data - csv files imported from Accsmap
Number of unique police references in each export
Should be the same for all three.
| Table export | Number of unique police references |
|---|---|
| casualties | 57700 |
| vehicles | 57699 |
| accidents | 57700 |
Does the most recent casualty match?
| Table export | Most recent record |
|---|---|
| casualties | 2024-12-31 |
| collisions | 2024-12-31 10:15:00 |
| accidents | 2024-12-31 |
Number of police references in one table but not another
Totals should all be 0. The listings should all be <0 rows>.
Total number
| Comparison | Number |
|---|---|
| Refs in casualties but not in vehicles | 1 |
| Refs in casualties but not in accidents | 0 |
| Refs in vehicles but not in casualties | 0 |
| Refs in vehicles but not in accidents | 0 |
| Refs in accidents but not in casualties | 0 |
| Refs in accidents but not in vehicles | 1 |
Listing of anomalies
Refs in casualties but not in vehicles
## date police_ref
## 1 2011-04-16 SP59738/11
Refs in casualties but not in accidentss
## [1] date police_ref
## <0 rows> (or 0-length row.names)
Refs in vehicles but not in casualties
## [1] date police_ref
## <0 rows> (or 0-length row.names)
Refs in vehicles but not in accidents
## [1] date police_ref
## <0 rows> (or 0-length row.names)
Refs in accidents but not in casualties
## [1] datetime police_ref
## <0 rows> (or 0-length row.names)
Refs in accidents but not in vehicles
## datetime police_ref
## 1 2011-04-16 16:55:00 SP59738/11
Check imported data tables against published DfT data
This checks that the total numbers used on this site are the same as DfT published data.
Data import from Accsmaps compared to DfT data
This is covered in the data sources page.
Data import from DfT compared to published DfT data
DfT publish the annual total of cyclist casualties in GB through table RAS0201, so this should be the same as in the csv files imported from their site:
| ras40006.ras0201_year | total.from.ras40006.ras0201 | csv import total | difference |
|---|---|---|---|
| 2010 | 17185 | 17185 | 0 |
| 2011 | 19215 | 19215 | 0 |
| 2012 | 19091 | 19091 | 0 |
| 2013 | 19438 | 19438 | 0 |
| 2014 | 21287 | 21287 | 0 |
| 2015 | 18844 | 18844 | 0 |
| 2016 | 18477 | 18477 | 0 |
| 2017 | 18321 | 18321 | 0 |
| 2018 | 17550 | 17550 | 0 |
| 2019 | 16884 | 16883 | -1 |
| 2020 | 16294 | 16294 | 0 |
| 2021 | 16458 | 16458 | 0 |
| 2022 | 15693 | 15693 | 0 |
Miscellaneous subsetting checks
This is a place for recording of the reason for an unexpected number of rows has been found in a sub-set. The total number of cyclist casualties in the GB data is 283,096 and in the Surrey data 8,132.
GB - DfT data
Mapping number of GB cyclist casualties
Some of the GB data are missing longitude and latitude, so cannot be presented on the GB map. Of the total of 283,096 cyclist casualties recorded in GB, 32 have had to be excluded from the map.
Crash types - avoiding ‘many to many’ relationship between casualty and accident tables
To create the crash types, the table of GB cyclist casualties is joined to the accident table. The latter has the number of participants in the accident by mode type. However, as in some accidents there were two or more cyclist casualties, only distinct rows the from table of GB cyclist casualties have been used. This reduces the number of rows in the table of GB cyclist casualties from the correct number of 283,096 rows down to 280,124, so a slightly less than full picture is presented.
Crash types ‘other’
‘Other’ crash types are excluded from the totals on the Comparisons: other places page. This is because the number of ‘other’ = 202 - so is relatively small.
Surrey - Accsmap data
Crash types ‘other’
‘Other’ crash types are excluded from the totals on the how page. This is because the number of ‘other’ = 5 - so is relatively small; each of these are 2+ vehicles with ped and cyclist casualties.