Questions tagged [hadoop]

Hadoop is an Apache open-source project that provides software for reliable and scalable distributed computing. The core consists of a distributed file system (HDFS) and a resource manager (YARN). Various other open-source projects, such as Apache Hive, use Apache Hadoop as a persistence layer.

0 votes
0 answers
9 views

Azure Databricks Hadoop Streaming Error for Read From Apache Iceberg

We are building out a data lakehouse and upgrading our Databricks runtime from 12.2 LTS to 14.3 LTS to support Python 3.10. We are able to write into our Iceberg tables, but reading those tables ...
Daniel Brenner
0 votes
0 answers
18 views

Java orc-tools-2.0.1 can't use this version

I saw new versions of the hadoop and orc-tools libraries, so I decided to update my project. Here are my new libraries: implementation group: 'org.apache.hadoop', name: 'hadoop-hdfs', version: '3.4.0' ...
LukiBoy
0 votes
0 answers
25 views

Unable to read a dataframe from s3

I am getting the following error: 24/07/25 21:29:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 24/07/25 21:29:53 ...
Minu • 7
0 votes
0 answers
8 views

How to read MapContext from jobClient (or another API)?

I'm submitting a job to a Hadoop cluster. This job has a file and is using InputFormatClass = NLineInputFormat. After the job starts, it will create several map tasks, each one with a line of the ...
user3783810
1 vote
1 answer
21 views

How do I include a static hql in other hql files?

I have multiple hql files that contain repeating initial code. So when I make changes to that bit of code, I have to change it in 12 files. I don't know if I'm using the wrong keyword search, but I am ...
Patricia Flickner
0 votes
0 answers
16 views

java.lang.UnsatisfiedLinkError in PySpark when writing to Parquet file on Windows

I have written the following code: from pyspark.sql.functions import col, udf from pyspark.sql.types import StringType import re import os os.environ['HADOOP_HOME'] = 'D:\\hadoop' os.environ['PATH'] +...
ArpanMona
0 votes
2 answers
31 views

Too many "Authorized committer" errors after upgrading to Pyspark==3.5.1

The problem: I have recently upgraded my apps to run on Spark 3.5.1 + YARN 3.3.6 and am observing frequent failures saying "Authorized committer". The apps run PySpark and I observe the error ...
akki • 2,202
0 votes
0 answers
20 views

Seeking insights and tools for troubleshooting issues with Hadoop3 and Hive3 [closed]

Our team is currently in the process of upgrading a system that performs statistical processing using Hadoop and Hive. We are upgrading from Hadoop 0.20 and Hive 0.1.7 to Hadoop 3.3.3 and Hive 3.1.3. ...
hisa • 11
0 votes
0 answers
19 views

Docker Hadoop Installation Error: HADOOP_HOME and hadoop.home.dir are unset

I'm setting up a Hadoop environment using Docker and encountering an error during installation: My setup: Using Docker-compose.yml to configure multiple services (namenode, datanode, resourcemanager, ...
Jay Padhiyar
-1 votes
0 answers
21 views

HDFS Namenode fails to switch to active when active namenode goes down (v 3.4.0)

When one worker node goes down in HDFS HA, the namenode is still expected to run. For example, nn0 is active and nn1 is standby, and nn0 goes into a terminating or crash-loop state. Then nn1 is expected to switch to ...
mpsimham
0 votes
0 answers
24 views

How to use API and API key on Python [closed]

I am trying to implement API keys for Alpha Vantage, Bloomberg, and NewsAPI to load data into Hadoop using Spark: ALPHA_VANTAGE_API_URL = "https://www.alphavantage.co/query?function=...
Onuh John Edoh Adanu
0 votes
1 answer
20 views

How to understand the result of yarn queue status

When I run the following command to see the status of my queue: $ yarn queue -status my-queue Queue Information : Queue Name : my-queue State : RUNNING Capacity : 10.0% ...
Tom • 6,196
0 votes
0 answers
15 views

Apache Oozie JA008 error - job state changed from SUCCEEDED to FAILED

I'm running Oozie HA 5.2.1 on EMR and I have an issue with this temporary directory. I have a workflow which has start node -> action node -> end node. The job starts running -> runs for 10-15 ...
Stefan Ss
0 votes
0 answers
11 views

Apache Ranger Yarn Plugin Installation: Class Loading and Logging Configuration Problems

Class Loading Issue with Commons Logging: I have confirmed the presence of commons-logging*.jar in the directory /usr/bigtop/3.2.0/usr/lib/hadoop-yarn/share/hadoop/hdfs/lib/. Despite this, I am ...
Sobit • 1
0 votes
0 answers
16 views

The JMX data obtained from Flume is empty. How to get the correct result?

I set flume-env.sh JAVA_OPTS= -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5445 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false. I ...
ray • 43
