
All Questions

0 votes · 1 answer · 22 views

How to filter a dataframe by a column from a different dataframe?

I want to filter a dataframe by a column of Strings from a different dataframe. val booksReadBestAuthors = userReviewsWithBooksDetails.filter(col("authors").isin(userReviewsAuthorsManyRead:_*)) ...
asked by Joanna Kois
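No answer is shown in this listing; a minimal sketch of one common pattern, with toy dataframes standing in for the question's data (the column and dataframe names are assumptions):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Toy stand-ins for the question's dataframes.
val userReviewsWithBooksDetails =
  Seq(("Dune", "Herbert"), ("Emma", "Austen")).toDF("title", "authors")
val topAuthors = Seq("Herbert").toDF("authors")

// isin takes literal values, not a column from another dataframe,
// so collect the author names to the driver first.
val names = topAuthors.select("authors").distinct().collect().map(_.getString(0))
val filtered = userReviewsWithBooksDetails.filter(col("authors").isin(names: _*))

// For large author lists, a semi-join stays distributed instead:
// userReviewsWithBooksDetails.join(topAuthors, Seq("authors"), "left_semi")
```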
0 votes · 1 answer · 31 views

What is the maximum number of entries an array in a Spark column can hold?

I've created a struct with the data of some columns combined. Large numbers of these structs now occur for my unique identifier values. I want to combine these structs into an array using collect_list....
asked by M.S.Visser
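For reference, the struct-plus-collect_list pattern the question describes looks like the sketch below (toy data). Spark imposes no fixed element-count limit on an array column; the practical bound is that each resulting row must fit in executor memory, and a single serialized field cannot exceed roughly 2 GB:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{collect_list, struct}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, "a", 10), (1, "b", 20), (2, "c", 30)).toDF("id", "k", "v")

// Combine several columns into a struct, then gather the structs
// per id into one array column.
val grouped = df
  .groupBy("id")
  .agg(collect_list(struct($"k", $"v")).as("entries"))
```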
0 votes · 1 answer · 75 views

How to create a dataframe on RocksDB (SST files)

We hold our documents in RocksDB. We will be syncing these RocksDB SST files to S3. I would like to create a dataframe on the SST files and later run an SQL query. When I googled, I was not able to ...
asked by chendu (729)
0 votes · 0 answers · 22 views

Flattening nested JSON with backslashes in an Apache Spark Scala dataframe

{ "messageBody": "{\"task\":{\"taskId\":\"c6d9fb0e-42ba-4a3e-bd39-f2a32a6958c1\",\"serializedTaskData\":\"{\\\"clientId\\\":\\\&...
asked by Vanshaj Singh
0 votes · 0 answers · 34 views

Spark: read special characters from the content of a .dat file in Scala without corrupting it

I have to read all the special characters in some dat file (e.g. testdata.dat) without corruption and initialise it into a dataframe in Scala using Spark. I have one dat file (e.g. testdata.dat),...
asked by Prantik Banerjee
0 votes · 0 answers · 42 views

Spark recomputes the cached dataframes

Working on a Spark application written in Scala. It has six functions. Each function takes two dataframes as input, processes them, and emits one result DF. I am caching the result of each function's ...
asked by Karthik (63)
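The question has no answer here; one frequent cause of this symptom, sketched with toy data, is that cache() is lazy and nothing ever materializes it:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val expensive = Seq(1, 2, 3).toDS().map(_ * 2)

// cache() only marks the plan; without an action each downstream
// job may recompute the full lineage from scratch.
val cached = expensive.cache()
cached.count()          // forces materialization once

// Subsequent uses now read from the cached partitions.
val a = cached.filter(_ > 2)
val b = cached.map(_ + 1)
```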
1 vote · 2 answers · 68 views

Scala - Convert a Map to a Dataframe where the Keys are the Column Titles

I wish to create a dataframe by using a map such that the keys of the map are the column titles, and the values of the map are the data itself. In Python and PySpark, this can be done quite easily in ...
asked by Sam (23)
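One way to do this in Scala, sketched without reference to the thread's actual answers (all-string values assumed for simplicity):

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val spark = SparkSession.builder.master("local[*]").getOrCreate()

val m = Map("name" -> "Sam", "city" -> "Oslo")

// Keys become the column names; the values form a single row.
val schema = StructType(m.keys.toSeq.map(StructField(_, StringType)))
val row = Row(m.values.toSeq: _*)
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(row)), schema)
```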
0 votes · 1 answer · 28 views

Is it possible to use the Spark DataFrame/Dataset API with accumulators?

I read and filter data, and need to count how each filter operation affects the result. Is it possible to somehow mix in Spark accumulators while using the DataFrame/Dataset API? Sample code: sparkSession.read ....
asked by Capacytron (3,709)
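It is possible with the typed Dataset API; a minimal sketch (toy data, not the thread's answer):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val dropped = spark.sparkContext.longAccumulator("dropped")

val ds = Seq(1, 5, 12, 7, 20).toDS()
val kept = ds.filter { n =>
  val ok = n >= 10
  if (!ok) dropped.add(1)   // count rows this filter removes
  ok
}

kept.count()                 // run an action before reading the value
println(s"rows dropped: ${dropped.value}")
```

Caveat: accumulator values are only reliable after an action has run, and can over-count if Spark retries or recomputes a stage.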
0 votes · 1 answer · 17 views

Group Spark dataframe column values based on a variable scale

I have a dataframe with survey results. Each question has a varying numeric scale from 4 - 6. I would like to bucket results based on the scale, with the highest two answers being good results and the ...
asked by P201_eng
0 votes · 1 answer · 32 views

Illegal start of simple expression when calling a Scala function

I have a function declared outside the main method to melt a wide data frame that I got from this post: How to unpivot Spark DataFrame without hardcoding column names in Scala? def melt(preserves: Seq[...
asked by P201_eng
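A compiling version of the standard melt/unpivot helper the question starts from, for comparison (parameter names beyond `preserves` are assumptions; this is not the linked post verbatim):

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

// Unpivot: keep `preserves`, turn each column in `melts` into
// a (variable, value) pair and explode the pairs into rows.
def melt(df: DataFrame, preserves: Seq[String], melts: Seq[String],
         varName: String = "variable", valName: String = "value"): DataFrame = {
  val pairs = melts.map(c => struct(lit(c).as(varName), col(c).as(valName)))
  df.select(preserves.map(col) :+ explode(array(pairs: _*)).as("_kv"): _*)
    .select(preserves.map(col) :+ col(s"_kv.$varName") :+ col(s"_kv.$valName"): _*)
}

val wide = Seq(("q1", 3.0, 4.0)).toDF("id", "a", "b")
val long = melt(wide, Seq("id"), Seq("a", "b"))
```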
0 votes · 0 answers · 34 views

Scala: create a dataframe column from an array where the array size is variable

I have a variable like val activityId = "activity_" + activityNum + "_id". The variable activityNum is incremented in a loop (1,2,3,...), so I want to create an array where it can ...
asked by Bennani Med
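The question is unanswered here; a sketch of one way to build the column references in a loop and splat them into `array()` (the `activity_N_id` naming follows the question, the range bound is an assumption):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{array, col}

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq((1, 2, 3)).toDF("activity_1_id", "activity_2_id", "activity_3_id")

// Build the column names in a loop, then combine them into one array column.
val activityCols = (1 to 3).map(n => col(s"activity_${n}_id"))
val withArray = df.withColumn("activity_ids", array(activityCols: _*))
```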
0 votes · 1 answer · 52 views

How to read a dataframe with inferSchema set to true

I have a dataframe df1 where all the columns are strings (100+ columns); now I want to cast them to the appropriate types with inferSchema. Like, for example, what we do if we have a CSV file and we want the ...
asked by Sam (1)
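For context, inferSchema is a read-time option, not something applied to an existing dataframe; a minimal sketch (the file path is hypothetical):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").getOrCreate()

// Spark samples the file and picks column types instead of
// defaulting every column to string.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("data.csv")   // hypothetical path

df.printSchema()
```

For a dataframe that is already loaded with all-string columns, the types have to be cast explicitly (e.g. `col("x").cast("int")`), since inference only happens when reading from a source.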
0 votes · 1 answer · 42 views

Convert a Spark DataFrame to a slightly different case class?

I have some data in HDFS that is in parquet-protobuf. Due to some project constraints, I want to read that data with a Spark DataFrame (easy) and then convert it to a case class that is slightly ...
asked by Tonyx (61)
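The usual pattern for this, sketched with a toy case class (the field names are assumptions, not the asker's schema): shape the dataframe to match the target case class with `select`, then let `.as[T]` and the implicit encoder do the conversion.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

case class Person(name: String, ageYears: Int)

val df = Seq(("Ada", "36")).toDF("name", "age")

// Rename and cast so the columns line up with the case class fields,
// then convert to a typed Dataset.
val ds = df
  .select(col("name"), col("age").cast("int").as("ageYears"))
  .as[Person]
```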
0 votes · 1 answer · 31 views

Get a reference to a case class from its fully qualified name to convert a dataframe to a dataset

I have the fully qualified names of case classes. For my use case, at runtime I need to get a reference to the case class to convert a dataframe to a dataset. E.g. I have the FQN: com.org.common....
asked by zolo (7)
-1 votes · 1 answer · 25 views

Filter out and log null values from a Spark dataframe

I have this dataframe:
+------+-------------------+-----------+
|brand |original_timestamp |weight     |
+------+-------------------+-----------+
|BR1   |1632899456         |4.0        |
|BR2   |...
asked by Nab (138)
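One straightforward approach, sketched on toy data matching the question's columns (not the thread's accepted answer): split the dataframe once into null and non-null rows, keep the clean side, and log the other.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder.master("local[*]").getOrCreate()
import spark.implicits._

val df = Seq(("BR1", Some(4.0)), ("BR2", None)).toDF("brand", "weight")

// Keep the clean rows; log the rows with null weights.
val bad  = df.filter(col("weight").isNull)
val good = df.filter(col("weight").isNotNull)

bad.collect().foreach(r => println(s"dropped row with null weight: $r"))
```

Collecting the bad rows to the driver for logging is fine for small volumes; for large ones, write them to a quarantine path instead.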
