Last month I had an interesting problem of standardizing dates in an input.
The dataset I got had a date field filled from different document types. There were valid dates in most of the cases but there were some numbers too which should not be considered as dates.
The challenge is to first identify whether the given value is date and if so, convert it to a standardized date format.
I went with the datetime package which offers robust functions for date/time manipulation
The two functions I used are srtptime and strftime
I used a list of know patterns I could gather from the dataset. You could increase the number of patterns as per your requirement.
Code
Output
The dataset I got had a date field filled from different document types. There were valid dates in most of the cases but there were some numbers too which should not be considered as dates.
The challenge is to first identify whether the given value is date and if so, convert it to a standardized date format.
I went with the datetime package which offers robust functions for date/time manipulation
The two functions I used are srtptime and strftime
- strptime function accepts a string and a pattern and converts the string to datetime if the string matches the pattern provided. If not it throws an exception
- strftime function formats datetime in the given pattern.
I used a list of know patterns I could gather from the dataset. You could increase the number of patterns as per your requirement.
Code
Output
Is "21-2-2019" a date?: True
Standardized date: 2019-02-21 Is "21" a date?: False |