Questions tagged [hadoop]
Hadoop is an open-source Apache project that provides software for reliable, scalable distributed computing. The core consists of a distributed file system (HDFS) and a resource manager (YARN). Various other open-source projects, such as Apache Hive, use Apache Hadoop as a persistence layer.
hadoop
44,417
questions
0
votes
0
answers
9
views
Azure Databricks Hadoop streaming error when reading from Apache Iceberg
We are building out a data lakehouse and upgrading our Databricks runtime from 12.2 LTS to 14.3 LTS to support Python 3.10. We are able to write into our Iceberg tables, but reading those tables ...
0
votes
0
answers
18
views
Java: orc-tools 2.0.1 can't be used after upgrading
I saw new versions of the hadoop and orc-tools libraries, so I decided to update my project.
These are my new libraries:
implementation group: 'org.apache.hadoop', name: 'hadoop-hdfs', version: '3.4.0'
...
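Mixing Hadoop and ORC artifact versions on one classpath is a common source of this kind of breakage. A hedged sketch of keeping each family on a single version (the artifact list beyond hadoop-hdfs is an assumption, not taken from the question; ORC 2.x also targets newer JDKs, so the project's Java toolchain may need updating too):

```groovy
dependencies {
    // Keep all Hadoop artifacts on one version...
    implementation group: 'org.apache.hadoop', name: 'hadoop-hdfs', version: '3.4.0'
    implementation group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.4.0'
    // ...and all ORC artifacts on one version, rather than mixing releases.
    implementation group: 'org.apache.orc', name: 'orc-core', version: '2.0.1'
}
```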
0
votes
0
answers
25
views
Unable to read a dataframe from S3
I am getting the following error:
24/07/25 21:29:43 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/07/25 21:29:53 ...
0
votes
0
answers
8
views
How to read MapContext from jobClient (or another API)?
I'm submitting a job to a Hadoop cluster.
This job has a file and uses InputFormatClass = NLineInputFormat.
After the job starts, it creates several map tasks, each one with a line of the ...
1
vote
1
answer
21
views
How do I include a static HQL file in other HQL files?
I have multiple HQL files that contain repeating initial code, so when I make changes to that bit of code, I have to change it in 12 files. I don't know if I'm using the wrong keyword search, but I am ...
0
votes
0
answers
16
views
java.lang.UnsatisfiedLinkError in PySpark when writing to Parquet file on Windows
I have written the following code:
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType
import re
import os
os.environ['HADOOP_HOME'] = 'D:\\hadoop'
os.environ['PATH'] +...
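On Windows, an UnsatisfiedLinkError from the Hadoop native layer usually means winutils.exe and hadoop.dll are missing from %HADOOP_HOME%\bin, or the environment is set after the JVM has already started. A minimal sketch, assuming the D:\hadoop path from the question:

```python
import os

# Assumed install location from the question; winutils.exe and
# hadoop.dll must actually exist in D:\hadoop\bin for this to work.
os.environ['HADOOP_HOME'] = 'D:\\hadoop'
os.environ['PATH'] = os.environ['HADOOP_HOME'] + '\\bin;' + os.environ.get('PATH', '')

# The environment must be set BEFORE the SparkSession (and its JVM)
# is created, since hadoop.home.dir is read at JVM startup:
# from pyspark.sql import SparkSession
# spark = SparkSession.builder.appName("parquet-write").getOrCreate()
# spark.range(10).write.mode("overwrite").parquet("out.parquet")
```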
0
votes
2
answers
31
views
Too many "Authorized committer" errors after upgrading to PySpark 3.5.1
The problem
I have recently upgraded my apps to run on Spark 3.5.1 + YARN 3.3.6, and I am observing frequent failures saying "Authorized committer". The apps run PySpark and I observe the error ...
0
votes
0
answers
20
views
Seeking insights and tools for troubleshooting issues with Hadoop 3 and Hive 3 [closed]
Our team is currently in the process of upgrading a system that performs statistical processing using Hadoop and Hive. We are upgrading from Hadoop 0.20 and Hive 0.1.7 to Hadoop 3.3.3 and Hive 3.1.3. ...
0
votes
0
answers
19
views
Docker Hadoop Installation Error: HADOOP_HOME and hadoop.home.dir are unset
I'm setting up a Hadoop environment using Docker and encountering an error during installation:
My setup:
Using docker-compose.yml to configure multiple services (namenode, datanode, resourcemanager, ...
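The "HADOOP_HOME and hadoop.home.dir are unset" error usually means the variable is not exported inside the container's environment. A hedged docker-compose fragment showing the usual fix (the image tag and install path are assumptions, not taken from the question):

```yaml
services:
  namenode:
    image: apache/hadoop:3   # assumed image/tag
    environment:
      # Export the variable in the container so the JVM can resolve
      # hadoop.home.dir at startup.
      - HADOOP_HOME=/opt/hadoop
      - PATH=/opt/hadoop/bin:/opt/hadoop/sbin:/usr/local/bin:/usr/bin:/bin
```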
-1
votes
0
answers
21
views
HDFS NameNode fails to switch to active when the active NameNode goes down (v3.4.0)
When one node goes down in HDFS HA, the NameNode service is still expected to run.
Ex: nn0 is active and nn1 is standby, and nn0 goes down into a terminating or crash-loop state. Then nn1 is expected to switch to ...
0
votes
0
answers
24
views
How to use an API and API key in Python [closed]
I am trying to implement API keys for Alpha Vantage, Bloomberg, and NewsAPI to load data into Hadoop using Spark
ALPHA_VANTAGE_API_URL = "https://www.alphavantage.co/query?function=...
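For the Alpha Vantage part, a minimal sketch of building the request URL with the standard library, keeping the key out of source code (TIME_SERIES_DAILY, the IBM symbol, and the ALPHA_VANTAGE_API_KEY variable name are illustrative assumptions):

```python
import os
from urllib.parse import urlencode

ALPHA_VANTAGE_API_URL = "https://www.alphavantage.co/query"

def build_query_url(function, symbol, api_key):
    """Build an Alpha Vantage query URL; the key is read from the
    environment rather than hard-coded into the Spark job."""
    params = {"function": function, "symbol": symbol, "apikey": api_key}
    return ALPHA_VANTAGE_API_URL + "?" + urlencode(params)

url = build_query_url("TIME_SERIES_DAILY", "IBM",
                      os.environ.get("ALPHA_VANTAGE_API_KEY", "demo"))
```

The JSON response could then be fetched (e.g. with requests) and parallelized into a Spark DataFrame for the load into Hadoop.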
0
votes
1
answer
20
views
How to understand the output of yarn queue -status
When I run the following command to see the status of my queue:
$ yarn queue -status my-queue
Queue Information :
Queue Name : my-queue
State : RUNNING
Capacity : 10.0%
...
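In the Capacity Scheduler, the Capacity figure is expressed relative to the parent queue, so a queue's absolute share of the cluster is the product of the capacities along its path from the root. A small sketch of that arithmetic (the hierarchy and percentages are made up, not taken from the question):

```python
def absolute_capacity(path_capacities):
    """Multiply per-level capacities (each a percent of its parent)
    down the queue hierarchy to get the share of total cluster
    resources, as a percentage."""
    share = 1.0
    for pct in path_capacities:
        share *= pct / 100.0
    return share * 100.0

# root (100%) -> parent (50% of root) -> my-queue (10% of parent):
# my-queue is guaranteed roughly 5% of the whole cluster.
print(absolute_capacity([100.0, 50.0, 10.0]))
```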
0
votes
0
answers
15
views
Apache Oozie JA008 error - job state changed from SUCCEEDED to FAILED
I'm running Oozie HA 5.2.1 on EMR and I have an issue with this temporary directory. I have a workflow which has start node -> action node -> end node. The job starts running -> runs for 10-15 ...
0
votes
0
answers
11
views
Apache Ranger Yarn Plugin Installation: Class Loading and Logging Configuration Problems
Class Loading Issue with Commons Logging: I have confirmed the presence of commons-logging*.jar in the directory /usr/bigtop/3.2.0/usr/lib/hadoop-yarn/share/hadoop/hdfs/lib/. Despite this, I am ...
0
votes
0
answers
16
views
The JMX data obtained from Flume is empty. How do I get the correct result?
In flume-env.sh I set JAVA_OPTS= -Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=5445 -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false.
I ...
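One frequent cause is shell syntax: written as `JAVA_OPTS= -D...` (with a space after `=`), the variable is set to an empty string and none of the flags reach the JVM. A minimal sketch of the usual flume-env.sh fix, quoting the whole option string and exporting it (the port is taken from the question):

```shell
# flume-env.sh -- quote the whole option string and export it so the
# Flume launcher actually passes every -D flag to the JVM.
export JAVA_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=5445 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
```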