https://dreaife-team.atlassian.net/browse/DREAITE-39
在看菜鸟的pandas对格式错误清洗时,发现菜鸟提供的代码在我现在的版本跑不通。
把报错在网上找了半天都是把报错errors参数给修改的。
最后重看了下报错信息,发现把format改成mixed,告诉pandas数据格式混合就可以(汗),应该是python3版本太新的问题
报错代码:
import pandas as pd
# 第三个日期格式错误data = { "Date": ['2020/12/01', '2020/12/02' , '20201226'], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
df['Date'] = pd.to_datetime(df['Date'])
print(df.to_string())错误信息:
ValueError: time data "20201226" doesn't match format "%Y/%m/%d", at position 2. You might want to try: - passing `format` if your strings have a consistent format; - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format; - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.修复代码:
import pandas as pd
# 第三个日期格式错误data = { "Date": ['2020/12/01', '2020/12/02' , '20201226'], "duration": [50, 40, 45]}df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
df['Date'] = pd.to_datetime(df['Date'], format='mixed')# df['Date'] = pd.to_datetime(df['Date'],format="%Y/%m/%d",errors='ignore')
print(df.to_string())
https://dreaife-team.atlassian.net/browse/DREAITE-39
While looking at Runoob’s guide on cleaning formatting errors in pandas, I found that the code Runoob provides doesn’t run on my current version.
Most online resources for this error suggest modifying the errors parameter.
Finally, after re-reading the error message, I realized that changing the format to ‘mixed’—to indicate mixed data formats—solves it (sweat). It seems to be an issue with my Python 3 version being too new.
Error code:
import pandas as pd
# 第三个日期格式错误data = { "Date": ['2020/12/01', '2020/12/02' , '20201226'], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
df['Date'] = pd.to_datetime(df['Date'])
print(df.to_string())Error message:
ValueError: time data "20201226" doesn't match format "%Y/%m/%d", at position 2. You might want to try: - passing `format` if your strings have a consistent format; - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format; - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.Fixed code:
import pandas as pd
# 第三个日期格式错误data = { "Date": ['2020/12/01', '2020/12/02' , '20201226'], "duration": [50, 40, 45]}df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
df['Date'] = pd.to_datetime(df['Date'], format='mixed')# df['Date'] = pd.to_datetime(df['Date'],format="%Y/%m/%d",errors='ignore')
print(df.to_string())
https://dreaife-team.atlassian.net/browse/DREAITE-39
pandas のフォーマットエラーのデータクリーニングを見ていると、初心者向けサイト が提供したコードが、私の現在のバージョンでは動作しないことに気づいた。
エラーをネットで探しても、ほとんどが errors 引数を修正することに関するものだった。
最後にエラーメッセージを再度確認すると、format を mixed に変更すれば、データ形式が混在していることを pandas に伝えられる、ということになる(汗)。おそらく Python 3 のバージョンが新しすぎる問題だと思われる。
エラーコード:
import pandas as pd
# 第三个日期格式错误data = { "Date": ['2020/12/01', '2020/12/02' , '20201226'], "duration": [50, 40, 45]}
df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
df['Date'] = pd.to_datetime(df['Date'])
print(df.to_string())エラーメッセージ:
ValueError: time data "20201226" doesn't match format "%Y/%m/%d", at position 2. You might want to try: - passing `format` if your strings have a consistent format; - passing `format='ISO8601'` if your strings are all ISO8601 but not necessarily in exactly the same format; - passing `format='mixed'`, and the format will be inferred for each element individually. You might want to use `dayfirst` alongside this.修正コード:
import pandas as pd
# 第三个日期格式错误data = { "Date": ['2020/12/01', '2020/12/02' , '20201226'], "duration": [50, 40, 45]}df = pd.DataFrame(data, index = ["day1", "day2", "day3"])
df['Date'] = pd.to_datetime(df['Date'], format='mixed')# df['Date'] = pd.to_datetime(df['Date'],format="%Y/%m/%d",errors='ignore')
print(df.to_string())部分信息可能已经过时









