
PySpark action df.show() returns Java error

I was setting up my Spark development environment via the Anaconda package on my Windows 10 desktop. I had this same setup working fine on this machine before; after doing some cleanup and reinstalling from scratch, I now get errors when I ask Spark to show data. Initializing Spark, loading data into a DataFrame, and importing libraries all work fine, but calling the show action fails. It seems to be something in my environment settings. What am I doing wrong?

Environment:

spark-3.1.2-bin-hadoop2.7 (SPARK_HOME & HADOOP_HOME)    
jdk1.8.0_281 (JAVA_HOME)    
Anaconda Spyder IDE    
winutils (for Hadoop 2.7.7)
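
Since the problem appeared after a reinstall, one quick sanity check (not from the original post) is to confirm that the variables listed above are actually visible to the Python process that Spyder launches. The names come from the list above; the values are machine-specific:

```python
import os

# Print the environment variables Spark depends on, as seen by this
# interpreter. If any shows "<not set>", the new install did not carry
# the old configuration over.
for var in ("SPARK_HOME", "HADOOP_HOME", "JAVA_HOME"):
    print(var, "=", os.environ.get(var, "<not set>"))
```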

Python 3.9.7 (default, Sep 16 2021, 16:59:28) [MSC v.1916 64 bit (AMD64)] Type "copyright", "credits" or "license" for more information.

IPython 7.29.0 -- An enhanced Interactive Python.

import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import Row
from pyspark.sql.types import StringType, StructType, StructField
import pyspark.ml

spark = SparkSession.builder.getOrCreate()

data2 = [(1, "James Smith"), (2, "Michael Rose"),
         (3, "Robert Williams"), (4, "Rames Rose"), (5, "Rames rose")
         ]
df2 = spark.createDataFrame(data=data2, schema=["id", "name"])

df2.printSchema()
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
root
 |-- id: long (nullable = true)
 |-- name: string (nullable = true)

df2.show()

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
[Stage 0:>                                                          (0 + 1) / 1]
Python was not found; run without arguments to install from the Microsoft Store, or disable this shortcut from Settings > Manage App Execution Aliases.
Traceback (most recent call last):

  File "C:\Users\***\AppData\Local\Temp/ipykernel_25396/2272422252.py", line 1, in <module>
    df2.show()

  File "C:\Users\***\anaconda3\lib\site-packages\pyspark\sql\dataframe.py", line 484, in show
    print(self._jdf.showString(n, 20, vertical))

  File "C:\Users\***\anaconda3\lib\site-packages\py4j\java_gateway.py", line 1309, in __call__
    return_value = get_return_value(

  File "C:\Users\***\anaconda3\lib\site-packages\pyspark\sql\utils.py", line 111, in deco
    return f(*a, **kw)

  File "C:\Users\***\anaconda3\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(

Py4JJavaError: An error occurred while calling o36.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (DellXPS executor driver): org.apache.spark.SparkException: Python worker failed to connect back.
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:182)
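
The "Python worker failed to connect back" failure, together with the "Python was not found" message above, usually means the executor could not launch a Python interpreter and fell back to the Windows Store `python` alias. A minimal, hedged sketch of one common remedy, assuming the interpreter running the session (`sys.executable`, the Anaconda python.exe here) is the one Spark should use; `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` are real Spark-recognized environment variables and must be set before the session is created:

```python
import os
import sys

# Point both the driver and the worker processes at the interpreter
# running this script, so Spark does not resolve "python" via PATH
# (where the Microsoft Store alias can shadow the real interpreter).
os.environ["PYSPARK_PYTHON"] = sys.executable
os.environ["PYSPARK_DRIVER_PYTHON"] = sys.executable
```

After setting these, restarting the kernel and recreating the session before calling df2.show() would test the fix; alternatively, the error message itself suggests disabling the aliases under Settings > Manage App Execution Aliases.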


source https://stackoverflow.com/questions/70174906/pyspark-action-df-show-returns-java-error
