People injured whilst cycling in Surrey

Information from police records

Data Checks

Match of code-description for accsmaps

The tables ‘allaccs’, ‘casualties’ and ‘vehicles’ can be exported from accmaps as csv files and then imported. A number of the fields are exported as codes so the codes have to replaced by the matching description for this analysis. Some of the codes can be linked to the fields in the open dataset guide linked to the DfT annual report. Others, such as contributory factors, are found only in the stats20 guide. T

Summary of other cleaning etc of imported data

  1. Convert all date character strings to date format.
  2. Convert all police_ref fields to upper case then trim, otherwise they won’t join properly.
  3. Check data type matches for all rows.
  4. Convert all vehicle and casualty references to three digits.
  5. Rename some columns to add clarity.
  6. In the contributory factor fields, replace NA with blank.
  7. Create longitude and latitude columns using easting and northing.
  8. Add longitude, latitude, easting and northing to casualties table.
  9. Merge the four partial accident description fields.
  10. Rename all motorcycle types into the single description “Motorcycle”.
  11. Add datetime, location, accident description, district into casualties table.
  12. Add vehicle type of each casualty into casualty table.
  13. Create cyclist casualties dataframe.
  14. Join each row of cyclist casualties dataframe to corresponding accident row (same accident can apply to more than one casualty).
  15. Create column showing where contributory factor is assigned to cyclist casualty.
  16. Map contributory factors to road safety factors.

Check for errors in the csv imports

DfT and Surrey Police data are validated by the respective organisations. The additional check here is to look for duplicates or missing rows in the imported data.

Check GB data - csv files imported from DfT road safety data site

NB all years before 2008 in imported data filtered out.

Accidents table

Number of columns 37
Number of rows 2,119,964

Vehicles table

Number of columns 34
Number of rows 3,884,403

Casualties table

Number of columns 21
Number of rows 2,804,090

Results of duplicate row check and any duplicate deletion required

Accidents

accident_index accident_year
## [1] "No duplicate rows"

Number of rows in table after removing any duplicates: 2,119,964.

Vehicles

accident_index accident_year vehicle_reference
## [1] "No duplicate rows"

Number of rows in table after removing any duplicates: 3,884,403.

Casualties

accident_index accident_year casualty_reference
200804CC08101 2008 6
200804CC08101 2008 6
200804TA08267 2008 2
200804TA08267 2008 2
200904EA09322 2009 2
200904EA09322 2009 2
201104CK11089 2011 3
201104CK11089 2011 3
201104FJ11013 2011 4
201104FJ11013 2011 4
201204CL12218 2012 1
201204CL12218 2012 1
201304CJ13162 2013 3
201304CJ13162 2013 3
201304EA13001 2013 3
201304EA13001 2013 3
2013471305010 2013 1
2013471305010 2013 1
201404CJ14198 2014 3
201404CJ14198 2014 3
201404WA14102 2014 3
201404WA14102 2014 3
201604ED16270 2016 2
201604ED16270 2016 2
## Number of duplicate rows = 12

Number of rows in table after removing any duplicates: 2,804,078.

Check Surrey data - csv files imported from Accsmap

Number of unique police references in each export

Should be the same for all three.

Table export Number of unique police references
casualties 57700
vehicles 57699
accidents 57700

Does the most recent casualty match?

Table export Most recent record
casualties 2024-12-31
collisions 2024-12-31 10:15:00
accidents 2024-12-31

Number of police references in one table but not another

Totals should all be 0. The listings should all be <0 rows>.

Total number

Comparison Number
Refs in casualties but not in vehicles 1
Refs in casualties but not in accidents 0
Refs in vehicles but not in casualties 0
Refs in vehicles but not in accidents 0
Refs in accidents but not in casualties 0
Refs in accidents but not in vehicles 1

Listing of anomalies

Refs in casualties but not in vehicles

##         date police_ref
## 1 2011-04-16 SP59738/11

Refs in casualties but not in accidentss

## [1] date       police_ref
## <0 rows> (or 0-length row.names)

Refs in vehicles but not in casualties

## [1] date       police_ref
## <0 rows> (or 0-length row.names)

Refs in vehicles but not in accidents

## [1] date       police_ref
## <0 rows> (or 0-length row.names)

Refs in accidents but not in casualties

## [1] datetime   police_ref
## <0 rows> (or 0-length row.names)

Refs in accidents but not in vehicles

##              datetime police_ref
## 1 2011-04-16 16:55:00 SP59738/11

Check imported data tables against published DfT data

This checks that the total numbers used on this site are the same as DfT published data.

Data import from Accsmaps compared to DfT data

This is covered in the data sources page.

Data import from DfT compared to published DfT data

DfT publish the annual total of cyclist casualties in GB through table RAS0201, so this should be the same as in the csv files imported from their site:

ras40006.ras0201_year total.from.ras40006.ras0201 csv import total difference
2010 17185 17185 0
2011 19215 19215 0
2012 19091 19091 0
2013 19438 19438 0
2014 21287 21287 0
2015 18844 18844 0
2016 18477 18477 0
2017 18321 18321 0
2018 17550 17550 0
2019 16884 16883 -1
2020 16294 16294 0
2021 16458 16458 0
2022 15693 15693 0

Miscellaneous subsetting checks

This is a place for recording of the reason for an unexpected number of rows has been found in a sub-set. The total number of cyclist casualties in the GB data is 283,096 and in the Surrey data 8,132.

GB - DfT data

Mapping number of GB cyclist casualties

Some of the GB data are missing longitude and latitude, so cannot be presented on the GB map. Of the total of 283,096 cyclist casualties recorded in GB, 32 have had to be excluded from the map.

Crash types - avoiding ‘many to many’ relationship between casualty and accident tables

To create the crash types, the table of GB cyclist casualties is joined to the accident table. The latter has the number of participants in the accident by mode type. However, as in some accidents there were two or more cyclist casualties, only distinct rows the from table of GB cyclist casualties have been used. This reduces the number of rows in the table of GB cyclist casualties from the correct number of 283,096 rows down to 280,124, so a slightly less than full picture is presented.

Crash types ‘other’

‘Other’ crash types are excluded from the totals on the Comparisons: other places page. This is because the number of ‘other’ = 202 - so is relatively small.

Surrey - Accsmap data

Crash types ‘other’

‘Other’ crash types are excluded from the totals on the how page. This is because the number of ‘other’ = 5 - so is relatively small; each of these are 2+ vehicles with ped and cyclist casualties.