All Questions
2,353
questions
0
votes
1
answer
22
views
How to filter a dataframe by a column from a different dataframe?
I want to filter a dataframe by a column using Strings from a different dataframe.
val booksReadBestAuthors = userReviewsWithBooksDetails.filter(col("authors").isin(userReviewsAuthorsManyRead:_*))
...
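A minimal sketch of the isin approach, assuming the author names live in a second dataframe called authorsDF (the name is illustrative); for large lists a left-semi join avoids collecting to the driver:
import org.apache.spark.sql.functions.col

val userReviewsAuthorsManyRead: Seq[String] =
  authorsDF.select("authors").distinct().collect().map(_.getString(0)).toSeq

val booksReadBestAuthors =
  userReviewsWithBooksDetails.filter(col("authors").isin(userReviewsAuthorsManyRead: _*))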
0
votes
1
answer
31
views
What is the maximum number of entries an array in a Spark column can hold?
I've created a struct that combines the data of some columns. Many of these structs now occur per unique identifier value. I want to combine these structs into an array using collect_list....
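A minimal sketch of the collect_list step, with id, colA and colB as assumed column names; in practice the constraint is less the array type itself (elements are indexed with an Int) than the fact that the whole collected row has to fit in executor memory:
import org.apache.spark.sql.functions.{col, collect_list, struct}

val packed = df
  .withColumn("details", struct(col("colA"), col("colB")))
  .groupBy(col("id"))
  .agg(collect_list(col("details")).as("details_list"))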
0
votes
1
answer
75
views
How to create a dataframe on RocksDB (SST files)
We hold our documents in RocksDB and will be syncing these RocksDB SST files to S3. I would like to create a dataframe on the SST files and later run an SQL query. When I googled, I was not able to ...
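Spark has no built-in SST reader; a driver-side sketch using the RocksDB Java bindings (assuming org.rocksdb is on the classpath, keys and values are UTF-8 strings, and the local path is illustrative; real volumes would need the per-file work moved onto executors):
import org.rocksdb.{Options, ReadOptions, RocksDB, SstFileReader}

RocksDB.loadLibrary()
val reader = new SstFileReader(new Options())
reader.open("/local/sync/from-s3/000001.sst")
val it = reader.newIterator(new ReadOptions())
it.seekToFirst()
val kvs = scala.collection.mutable.ArrayBuffer.empty[(String, String)]
while (it.isValid) {
  kvs += ((new String(it.key(), "UTF-8"), new String(it.value(), "UTF-8")))
  it.next()
}
val docs = spark.createDataFrame(kvs.toSeq).toDF("key", "value")
docs.createOrReplaceTempView("docs") // then spark.sql("SELECT ... FROM docs")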
0
votes
0
answers
22
views
Flattening nested JSON with backslashes in an Apache Spark Scala Dataframe
{
"messageBody": "{\"task\":{\"taskId\":\"c6d9fb0e-42ba-4a3e-bd39-f2a32a6958c1\",\"serializedTaskData\":\"{\\\"clientId\\\":\\\&...
0
votes
0
answers
34
views
Spark: Read special characters from the content of a .dat file without corrupting it in Scala
I have to read all the special characters in a .dat file (e.g. testdata.dat) without corruption and load them into a dataframe in Scala using Spark.
I have one .dat file (e.g. testdata.dat),...
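Corrupted special characters usually mean the wrong charset at read time; a minimal sketch, with the delimiter and encoding as assumptions:
val df = spark.read
  .option("delimiter", "|")
  .option("encoding", "ISO-8859-1") // try the charset the file was written with
  .csv("hdfs:///data/testdata.dat")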
0
votes
0
answers
42
views
Spark recomputes the cached Dataframes
I am working on a Spark application written in Scala. It has six functions; each takes two Dataframes as input, processes them, and emits one result DF. I am caching the result of each function's ...
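cache() is lazy, which is the usual cause here: a minimal sketch, with processStep standing in for one of the six functions, forcing materialization with an action before downstream functions read the result:
val step1 = processStep(dfA, dfB).cache()
step1.count() // materialize the cache so it is not recomputed downstream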
1
vote
2
answers
68
views
Scala - Convert Map to Dataframe where Keys are the Column Titles
I wish to create a dataframe using a map such that the keys of the map are the column titles and the values of the map are the data itself. In Python and PySpark, this can be done quite easily in ...
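A minimal sketch assuming string values and one row per map (keys and values of the same Map iterate in a consistent order):
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StringType, StructField, StructType}

val m: Map[String, String] = Map("name" -> "abc", "city" -> "xyz")
val schema = StructType(m.keys.toSeq.map(k => StructField(k, StringType)))
val df = spark.createDataFrame(
  spark.sparkContext.parallelize(Seq(Row.fromSeq(m.values.toSeq))),
  schema)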
0
votes
1
answer
28
views
Is it possible to use the Spark Dataframe/Dataset API with accumulators?
I read and filter data, and I need to count how each filter operation affects the result.
Is it possible to somehow mix in Spark accumulators while using the Dataframe/Dataset API?
Sample code:
sparkSession.read
....
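A minimal sketch assuming a Dataset[String] named ds: bump a LongAccumulator inside the filter predicate and read it after an action (updates only arrive once an action runs, and stage retries can inflate the count):
val dropped = spark.sparkContext.longAccumulator("droppedRows")

val kept = ds.filter { s =>
  val keep = s.nonEmpty
  if (!keep) dropped.add(1)
  keep
}
kept.count()
println(s"rows filtered out: ${dropped.value}")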
0
votes
1
answer
17
views
Group Spark dataframe column values based on a variable scale
I have a dataframe with survey results. Each question has a varying numeric scale from 4 to 6. I would like to bucket results based on the scale, with the highest two answers being good results and the ...
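A minimal sketch assuming columns answer (the response) and scale (the question's maximum, 4-6), so the top two values on each scale count as good:
import org.apache.spark.sql.functions.{col, when}

val bucketed = df.withColumn("bucket",
  when(col("answer") >= col("scale") - 1, "good").otherwise("other"))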
0
votes
1
answer
32
views
Illegal start of simple expression when calling a Scala function
I have a function declared outside the main method to melt a wide data frame that I got from this post: How to unpivot Spark DataFrame without hardcoding column names in Scala?
def melt(preserves: Seq[...
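That compiler error usually means the def sits where an expression is expected (or a brace is unbalanced), not that the body is wrong. For reference, a sketch of a complete melt along the lines of the linked post (parameter names beyond preserves are assumptions; the melted columns must share a type):
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{array, col, explode, lit, struct}

def melt(df: DataFrame, preserves: Seq[String], toMelt: Seq[String],
         keyName: String = "key", valueName: String = "value"): DataFrame = {
  val kvs = explode(array(
    toMelt.map(c => struct(lit(c).as(keyName), col(c).as(valueName))): _*))
  df.select(preserves.map(col) :+ kvs.as("kv"): _*)
    .select(preserves.map(col) :+ col(s"kv.$keyName") :+ col(s"kv.$valueName"): _*)
}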
0
votes
0
answers
34
views
Scala: create a dataframe column from an array where the array size is variable
I have a variable like
val activityId = "activity_" + activityNum + "_id"
The variable activityNum is incremented in a loop (1, 2, 3, ...),
so I want to create an array where it can ...
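A minimal sketch: generate the column names over the loop's range (the upper bound 3 is illustrative) and build one array column from them:
import org.apache.spark.sql.functions.{array, col}

val activityIds = (1 to 3).map(n => s"activity_${n}_id")
val withIds = df.withColumn("activity_ids", array(activityIds.map(col): _*))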
0
votes
1
answer
52
views
How to read a dataframe with inferSchema as true
I have a dataframe df1 in which all the columns are strings (100+ columns); now I want to cast them to the appropriate types with inferSchema.
For example, like what we do when we have a CSV file and we want the ...
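inferSchema belongs to the file readers, so one workaround (a sketch; the path is illustrative) is to round-trip the all-string df1 through CSV:
val tmp = "/tmp/df1_as_csv"
df1.write.mode("overwrite").option("header", "true").csv(tmp)
val typed = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv(tmp)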
0
votes
1
answer
42
views
Convert a Spark DataFrame to a slightly different case class?
I have some data in HDFS that is in parquet-protobuf.
Due to some project constraints, I want to read that data into a Spark DataFrame (easy) and then convert it to a case class that is slightly ...
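A minimal sketch with an assumed target shape: rename and cast columns until they line up with the case class (defined at top level), then call as[]:
case class BookTarget(id: Long, title: String)

import spark.implicits._
val ds = df
  .withColumnRenamed("book_title", "title") // bridge the slight schema difference
  .select($"id".cast("long"), $"title")
  .as[BookTarget]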
0
votes
1
answer
31
views
Get a reference to a case class from its fully qualified name to convert a dataframe to a dataset
I have the fully qualified names of case classes. For my use case, at runtime I need to get a reference to the case class in order to convert a dataframe to a dataset.
e.g.
I have the FQN as: com.org.common....
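A sketch of one reflective route, under the assumption that manufacturing a TypeTag from the runtime mirror is acceptable here; the usage FQN is hypothetical, since the question's value is truncated:
import org.apache.spark.sql.{Encoder, Encoders}
import scala.reflect.runtime.{universe => ru}

def productEncoder(fqn: String): Encoder[Product] = {
  val mirror = ru.runtimeMirror(getClass.getClassLoader)
  val tpe = mirror.staticClass(fqn).toType
  val tag = ru.TypeTag[Product](mirror, new scala.reflect.api.TypeCreator {
    def apply[U <: scala.reflect.api.Universe with Singleton](m: scala.reflect.api.Mirror[U]): U#Type =
      tpe.asInstanceOf[U#Type]
  })
  Encoders.product[Product](tag)
}

// usage (hypothetical class name): df.as(productEncoder("com.example.MyTask"))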
-1
votes
1
answer
25
views
Filter out and log null values from a Spark dataframe
I have this dataframe:
+------+-------------------+-----------+
|brand |original_timestamp |weight     |
+------+-------------------+-----------+
|BR1   |1632899456         |4.0        |
|BR2   |...
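A minimal sketch: split on the null check so the offending rows can be logged (or written out) before being dropped:
import org.apache.spark.sql.functions.col

val nullRows = df.filter(col("weight").isNull)
nullRows.collect().foreach(r => println(s"dropping row with null weight: $r"))
val cleaned = df.filter(col("weight").isNotNull)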