Creating Dataframe With Json Keys
I have a JSON file which resulted from YouTube's iframe API and I want to put this JSON data into a pandas dataframe, where each JSON key will be a column, and each record should b
Solution 1:
A Pythonic Solution would be to use the keys and values API of the Python Dictionary.
it should be something like this:
ls = [
"{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}",
"{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"
]
ls = [json.loads(j) for j in ls]
keys = [j.keys() for j in ls] # this will get you all the keys
vals = [j.values() for j in ls] # this will get the values and then you can do something with it
print(keys)
print(values)
Solution 2:
easiest way is to leverage json_normalize
from pandas
.
import json
from pandas.io.json import json_normalize
input_dict = [
"{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}",
"{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"
]
input_json = [json.loads(j) for j in input_dict]
df = json_normalize(input_json)
Solution 3:
I think you are asking to break down your key and values and want keys as a column,and values as a row: This is my approach and plz always provide how your expected output should like
ChainMap flats your dict in key and values and pretty much is self explanatory.
data = ["{\"timemillis\":1563467467703,\"date\":\"18.7.2019\",\"time\":\"18:31:07,703\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:02\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.3,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}","{\"timemillis\":1563467468705,\"date\":\"18.7.2019\",\"time\":\"18:31:08,705\",\"videoId\":\"0HJx2JhQKQk\",\"startSecond\":\"0\",\"stopSecond\":\"90\",\"playerStateNumeric\":1,\"playerStateVerbose\":\"Playing\",\"curTimeFormatted\":\"0:03\",\"totalTimeFormatted\":\"9:46\",\"playoutLevelPercent\":0.5,\"bufferLevelPercent\":1.4,\"qual\":\"large\",\"qualLevels\":[\"hd720\",\"large\",\"medium\",\"small\",\"tiny\",\"auto\"],\"playbackRate\":1,\"playbackRates\":[0.25,0.5,0.75,1,1.25,1.5,1.75,2],\"playerErrorNumeric\":\"\",\"playerErrorVerbose\":\"\"}"]
import json
from collections import ChainMap
data = [json.loads(i) for i in data]
data = dict(ChainMap(*data))
keys = []
vals = []
for k,v in data.items():
keys.append(k)
vals.append(v)
data = pd.DataFrame(zip(keys,vals)).T
new_header = data.iloc[0]
data = data[1:]
data.columns = new_header
#startSecond playbackRates playbackRate qual totalTimeFormatted timemillis playerStateNumeric playerStateVerbose playerErrorNumeric date time stopSecond bufferLevelPercent playerErrorVerbose qualLevels videoId curTimeFormatted playoutLevelPercent
#0 [0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2] 1 large 9:4615634674677031Playing18.7.201918:31:07,703901.4 [hd720, large, medium, small, tiny, auto] 0HJx2JhQKQk 0:020.3
Post a Comment for "Creating Dataframe With Json Keys"