
Remove Unicode escape sequences and spaces from a JSON response using Python

I am posting data to a REST API and receiving a JSON response. The response contains Unicode escape sequences (\u000d\u000a). I tried many approaches to remove them from the JSON data, but nothing worked. After removing them, I want to validate the data against a schema. Following is my code snippet:

import json
import logging
from jsonschema import validate

if __name__ == '__main__':

    schema = {
        "$schema": "http://json-schema.org/draft-04/schema#",
        "type": "object",
        "properties": {
            "SOURCE": {
                "type": "string"
            },
            "TIMESTAMP": {
                "type": "string"
            },
            "TAGERRORS": {
                "type": "array",
                "items": [
                    {
                        "type": "object",
                        "properties": {
                            "TAGNAME": {
                                "type": "string"
                            },
                            "ERROR": {
                                "type": "string"
                            }
                        },
                        "required": [
                            "TAGNAME",
                            "ERROR"
                        ]
                    }
                ]
            }
        },
        "required": [
            "SOURCE",
            "TIMESTAMP",
            "TAGERRORS"
        ]
    }

    response_dict ='"{\\u000d\\u000a  \\"SOURCE\\": \\"APPDEV\\",\\u000d\\u000a  \\"TIMESTAMP\\": \\"2022-04-19 12:29:27\\",\\u000d\\u000a  \\"TAGERRORS\\": []\\u000d\\u000a}"'
    response_dict = response_dict.replace("\\u000d\\u000a\\s*", "")
    print(response_dict)
    my_json = json.loads(response_dict)
    # validate(instance=my_json, schema=schema)

    # print(my_json)

The line response_dict = response_dict.replace("\\u000d\\u000a\\s*", "") does not work and gives the following result:
"{\u000d\u000a \"SOURCE\": \"APPDEV\",\u000d\u000a \"TIMESTAMP\": \"2022-04-19 12:29:27\",\u000d\u000a \"TAGERRORS\": []\u000d\u000a}"
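A likely cause, offered as a sketch and not a confirmed diagnosis: `str.replace` matches its first argument literally and does not interpret regular expression syntax, so the pattern only matches text that contains the literal characters `\s*`. The snippet below contrasts it with `re.sub` on a shortened payload; the names `raw`, `unchanged`, and `cleaned` are made up for illustration.

```python
import re

# Shortened version of the payload: a JSON string literal whose content
# still contains the escape sequences \u000d\u000a as plain text.
raw = '"{\\u000d\\u000a  \\"SOURCE\\": \\"APPDEV\\"\\u000d\\u000a}"'

# str.replace searches for the literal characters "\s*", which never occur
# in the payload, so nothing is replaced.
unchanged = raw.replace("\\u000d\\u000a\\s*", "")

# re.sub interprets \s* as "zero or more whitespace characters".
cleaned = re.sub(r"\\u000d\\u000a\s*", "", raw)

print(unchanged == raw)
print(cleaned)
```

Note that `cleaned` is still a JSON string literal wrapped in double quotes, so a single `json.loads` call on it would return a `str`, not a `dict`.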
I also tried the following regex to remove the Unicode characters, but it fails during schema validation.

import re

def removeunicode(text):
    text = re.sub(r'\\[u]\S\S\S\S[s]', "", text)
    text = re.sub(r'\\[u]\S\S\S\S', "", text)
    return text

my_json = json.loads(removeunicode(response_dict))
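One possible explanation for the validation failure, stated as an assumption: the response appears to be JSON-encoded twice, so the first `json.loads` returns a `str` instead of a `dict`, and the schema's `"type": "object"` check rejects it. Under that assumption, decoding a second time avoids the regex altogether, because the first pass turns `\u000d\u000a` into real CR/LF characters, which JSON permits as whitespace between tokens. The helper name `decode_nested_json` and the variable `response_text` are hypothetical.

```python
import json

def decode_nested_json(text):
    """Decode JSON once, and once more if the result is itself a string."""
    value = json.loads(text)
    if isinstance(value, str):
        # The payload was a JSON string that contains a JSON document.
        value = json.loads(value)
    return value

# Same payload as in the question.
response_text = '"{\\u000d\\u000a  \\"SOURCE\\": \\"APPDEV\\",\\u000d\\u000a  \\"TIMESTAMP\\": \\"2022-04-19 12:29:27\\",\\u000d\\u000a  \\"TAGERRORS\\": []\\u000d\\u000a}"'

my_json = decode_nested_json(response_text)
print(type(my_json).__name__)
print(my_json)
```

The resulting dictionary can then be passed to `validate(instance=my_json, schema=schema)` as in the original snippet.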

Can you please help me resolve the issue? Thanks.



source https://stackoverflow.com/questions/71928024/remove-unicode-string-and-spaces-from-json-response-using-python
