Batching support #29

Closed
aesteve opened this issue Nov 3, 2020 · 7 comments

aesteve commented Nov 3, 2020

Hello, and thanks a lot for your work.

I'd like to replace the following JDBC code:

        val updateFlags = conn.prepareStatement("UPDATE table SET field='something' WHERE id = ?")
        ids.toSet().forEach { id ->
            updateFlags.setLong(1, id)
            updateFlags.addBatch()
        }
        updateFlags.executeBatch()
        updateFlags.close()

This runs decently fast (a few hundred ms for 10k entries).

I naively tried to run:

    let now = Instant::now();
    let mut stmt = conn.prepare("UPDATE table SET field='something' WHERE id = :1", &[StmtParam::FetchArraySize(10_000)])?;
    for id in &ids {
        stmt.execute(&[id])?;
    }
    println!("Time for {} updates: {}", ids.len(), now.elapsed().as_millis());
    conn.commit()

But unfortunately it takes far too much time as the number of ids increases (more than 10s for 10k entries).

I see that "Batch support" is mentioned in the README.
I guess this covers such a feature, am I right?

If so, well, thank you!

And could you please notify me by closing this issue whenever you have time to implement it?
This way I would know I can go back to dealing with Oracle in Rust!

Thanks a lot.

rgutierrez2004 commented Mar 21, 2021

The StmtParam::FetchArraySize option is for queries; using it for updates will only increase the amount of memory your program needs.

The batch support works. I personally tested the latest version of rust-oracle with a 19c database and performance is quite good.
Furthermore, if you put all the logic inside a SQL block and send it with rust-oracle, similar to what you would do with a SQL tool where you run your SQL scripts, you will notice that performance with rust-oracle is better than with the Java thin driver that most SQL tools use.

In summary, your code should look like this:

let ids: Vec<i32> = (0..10000).map(|v| v + 1).collect();
let now = Instant::now();
let mut stmt = conn.prepare("UPDATE table SET field='something' WHERE id = :1", &[])?;
for id in &ids {
  stmt.execute(&[id])?;
}
println!("Time for {} updates: {}", ids.len(), now.elapsed().as_millis());
conn.commit()

Some numbers: running the above code on two VMs on the same machine, one running the Rust program and the other the DB in a container, with 10k records I got 1970 ms. Putting all the logic inside a SQL string block and running it with the same stmt.execute got only 104 ms, which is faster than my 145 ms using a SQL tool with the Java thin driver.
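
For reference, here is a rough sketch of what I mean by putting the logic inside a SQL string block: a single anonymous PL/SQL block sent to the server in one call. The table and column names are the placeholders used above and the loop is made up for illustration; the actual script depends on your schema, so this is not the exact script I timed.

    // One round trip: the whole loop runs server-side in an anonymous PL/SQL block.
    // "table" and "field" are the placeholder names used in this thread.
    let plsql = r#"
    BEGIN
        FOR i IN 1..10000 LOOP
            UPDATE table SET field = 'something' WHERE id = i;
        END LOOP;
    END;
    "#;
    conn.execute(plsql, &[])?;
    conn.commit()?;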

Hope this helps,
R

aesteve (author) commented Mar 23, 2021

Thanks for your comment.
I'll give it another try; I must have been doing something wrong with batching.

aesteve closed this as completed Mar 23, 2021
kubo (owner) commented Mar 23, 2021

Sorry for not replying for such a long time.

Batching is efficient when the network distance between the client and the server is long.
When a network round trip requires 1ms, inserting 10k rows one by one requires at least 10s, excluding time spent on the server side.
If 1000 rows are sent in a batch, it decreases to 10ms.

Well, I'll implement it in two or three weeks.

kubo reopened this Mar 23, 2021
kubo (owner) commented Apr 15, 2021

I added batch support. See the following doc comment. I'll release it after I add a notice that batch errors and row counts are available only when both the client and the server are Oracle 12.1 or later.

rust-oracle/src/batch.rs

Lines 227 to 406 in 0f62963

/// Statement batch, which inserts, updates or deletes more than one row at once
///
/// Batching is efficient when the network distance between the client and
/// the server is long. When a network round trip requires 1ms, inserting
/// 10k rows using [`Statement`] consumes at least 10s excluding time spent
/// in the client and the server. If 1000 rows are sent in a batch, it
/// decreases to 10ms.
///
/// # Usage
///
/// 1. [`conn.batch(sql_stmt, batch_size).build()`][Connection#method.batch] to create [`Batch`].
/// 2. [`append_row()`](#method.append_row) for each row. Rows in the batch are sent to
/// the server when the number of appended rows reaches the batch size.
/// **Note:** The "batch errors" option mentioned later changes this behavior.
/// 3. [`execute()`](#method.execute) at the end to send rows which
/// have not been sent by `append_row()`.
///
/// ```
/// # use oracle::Error;
/// # use oracle::test_util;
/// # let conn = test_util::connect()?;
/// # conn.execute("delete from TestTempTable", &[])?;
/// let sql_stmt = "insert into TestTempTable values(:1, :2)";
/// let batch_size = 100;
/// let mut batch = conn.batch(sql_stmt, batch_size).build()?;
/// for i in 0..1234 {
///     batch.append_row(&[&i, &format!("value {}", i)])?;
/// }
/// batch.execute()?;
/// // Check the number of inserted rows.
/// assert_eq!(conn.query_row_as::<i32>("select count(*) from TestTempTable", &[])?, 1234);
/// # Ok::<(), Error>(())
/// ```
///
/// # Error Handling
///
/// There are two modes when invalid data are in a batch.
///
/// 1. Stop executions at the first failure and return the error information.
/// 2. Execute all rows in the batch and return an array of the error information.
///
/// ## Default Error Handling
///
/// `append_row()` and `execute()` stop execution at the first failure and return
/// the error information. There is no way to know which row failed.
///
/// ```
/// # use oracle::Error;
/// # use oracle::test_util;
/// # let conn = test_util::connect()?;
/// # conn.execute("delete from TestTempTable", &[])?;
/// let sql_stmt = "insert into TestTempTable values(:1, :2)";
/// let batch_size = 10;
/// let mut batch = conn.batch(sql_stmt, batch_size).build()?;
/// batch.append_row(&[&1, &"first row"])?;
/// batch.append_row(&[&2, &"second row"])?;
/// batch.append_row(&[&1, &"first row again"])?; // -> ORA-00001: unique constraint violated.
/// batch.append_row(&[&3, &"third row ".repeat(11)])?; // -> ORA-12899: value too large for column
/// batch.append_row(&[&4, &"fourth row"])?;
/// let result = batch.execute();
/// match result {
///     Err(Error::OciError(dberr)) => {
///         assert_eq!(dberr.code(), 1);
///         assert!(dberr.message().starts_with("ORA-00001: "));
///     }
///     _ => panic!("Unexpected batch result: {:?}", result),
/// }
///
/// // Check the inserted rows.
/// let mut stmt = conn.prepare("select count(*) from TestTempTable where intCol = :1", &[])?;
/// assert_eq!(stmt.query_row_as::<i32>(&[&1])?, 1);
/// assert_eq!(stmt.query_row_as::<i32>(&[&2])?, 1);
/// assert_eq!(stmt.query_row_as::<i32>(&[&3])?, 0);
/// assert_eq!(stmt.query_row_as::<i32>(&[&4])?, 0);
/// # Ok::<(), Error>(())
/// ```
///
/// ## Error Handling with batch errors
///
/// [`BatchBuilder.with_batch_errors`][] changes
/// the behavior of `Batch` as follows:
/// * `execute()` executes all rows in the batch and returns an array of the error information
/// with row positions in the batch when the errors are caused by invalid data.
/// * `append_row()` doesn't send rows internally when the number of appended rows reaches
/// the batch size. It returns an error when the number exceeds the size instead.
///
/// ```
/// # use oracle::Error;
/// # use oracle::test_util;
/// # let conn = test_util::connect()?;
/// # conn.execute("delete from TestTempTable", &[])?;
/// let sql_stmt = "insert into TestTempTable values(:1, :2)";
/// let batch_size = 10;
/// let mut batch = conn.batch(sql_stmt, batch_size).with_batch_errors().build()?;
/// batch.append_row(&[&1, &"first row"])?;
/// batch.append_row(&[&2, &"second row"])?;
/// batch.append_row(&[&1, &"first row again"])?; // -> ORA-00001: unique constraint violated.
/// batch.append_row(&[&3, &"third row ".repeat(11)])?; // -> ORA-12899: value too large for column
/// batch.append_row(&[&4, &"fourth row"])?;
/// let result = batch.execute();
/// match result {
///     Err(Error::BatchErrors(mut errs)) => {
///         // sort by position because errs may not preserve order.
///         errs.sort_by(|a, b| a.offset().cmp(&b.offset()));
///         assert_eq!(errs.len(), 2);
///         assert_eq!(errs[0].code(), 1);
///         assert_eq!(errs[1].code(), 12899);
///         assert_eq!(errs[0].offset(), 2); // position of `[&1, &"first row again"]`
///         assert_eq!(errs[1].offset(), 3); // position of `[&3, &"third row ".repeat(11)]`
///         assert!(errs[0].message().starts_with("ORA-00001: "));
///         assert!(errs[1].message().starts_with("ORA-12899: "));
///     }
///     _ => panic!("Unexpected batch result: {:?}", result),
/// }
///
/// // Check the inserted rows.
/// let mut stmt = conn.prepare("select count(*) from TestTempTable where intCol = :1", &[])?;
/// assert_eq!(stmt.query_row_as::<i32>(&[&1])?, 1);
/// assert_eq!(stmt.query_row_as::<i32>(&[&2])?, 1);
/// assert_eq!(stmt.query_row_as::<i32>(&[&3])?, 0); // value too large for column
/// assert_eq!(stmt.query_row_as::<i32>(&[&4])?, 1);
/// # Ok::<(), Error>(())
/// ```
///
/// # Affected Rows
///
/// Use [`BatchBuilder.with_row_counts`][] and [`Batch.row_counts`][] to get affected rows
/// for each input row.
///
/// ```
/// # use oracle::Error;
/// # use oracle::sql_type::OracleType;
/// # use oracle::test_util;
/// # let conn = test_util::connect()?;
/// # conn.execute("delete from TestTempTable", &[])?;
/// # let sql_stmt = "insert into TestTempTable values(:1, :2)";
/// # let batch_size = 10;
/// # let mut batch = conn.batch(sql_stmt, batch_size).build()?;
/// # batch.set_type(1, &OracleType::Int64)?;
/// # batch.set_type(2, &OracleType::Varchar2(1))?;
/// # for i in 0..10 {
/// #     batch.append_row(&[&i])?;
/// # }
/// # batch.execute()?;
/// let sql_stmt = "update TestTempTable set stringCol = :stringCol where intCol >= :intCol";
/// let mut batch = conn.batch(sql_stmt, 3).with_row_counts().build()?;
/// batch.append_row_named(&[("stringCol", &"a"), ("intCol", &9)])?; // update 1 row
/// batch.append_row_named(&[("stringCol", &"b"), ("intCol", &7)])?; // update 3 rows
/// batch.append_row_named(&[("stringCol", &"c"), ("intCol", &5)])?; // update 5 rows
/// batch.execute()?;
/// assert_eq!(batch.row_counts()?, &[1, 3, 5]);
/// # Ok::<(), Error>(())
/// ```
///
/// # Bind Parameter Types
///
/// Parameter types are decided by the value passed to [`Batch.append_row`][], [`Batch.append_row_named`][]
/// or [`Batch.set`][]; or by the type specified by [`Batch.set_type`][]. Once the
/// type is determined, there is no way to change it except in the following case.
///
/// For the user's convenience, when the length of a character data type is too short,
/// the length is extended automatically. For example:
/// ```no_run
/// # use oracle::Error;
/// # use oracle::sql_type::OracleType;
/// # use oracle::test_util;
/// # let conn = test_util::connect()?;
/// # let sql_stmt = "dummy";
/// # let batch_size = 10;
/// let mut batch = conn.batch(sql_stmt, batch_size).build()?;
/// batch.append_row(&[&"first row"])?; // allocate 64 bytes for each row
/// batch.append_row(&[&"second row"])?;
/// //....
/// // The following line extends the internal buffer length for each row.
/// batch.append_row(&[&"assume that data lenght is over 64 bytes"])?;
/// # Ok::<(), Error>(())
/// ```
/// Note that extending the internal buffer requires a memory copy from the existing buffer
/// to the newly allocated buffer. If you know the maximum data length, it is better
/// to set the size with [`Batch.set_type`][].

kubo (owner) commented Apr 18, 2021

Rust-oracle 0.5.1 was published. It includes batching support.
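
For the use case in the first comment, the code would look roughly like this. This is an untested sketch: "table", "field" and "ids" are the placeholders from the earlier comments, and the batch size of 1000 is arbitrary.

    let batch_size = 1000;
    let sql = "UPDATE table SET field = 'something' WHERE id = :1";
    let mut batch = conn.batch(sql, batch_size).build()?;
    for id in &ids {
        // Rows are sent to the server each time batch_size rows have been appended.
        batch.append_row(&[id])?;
    }
    // Send the rows that have not been sent yet.
    batch.execute()?;
    conn.commit()?;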

IdemenB commented Apr 26, 2021

Hi @rgutierrez2004, could you give a bit more detail on "putting all the logic inside a SQL string block", please? Do you mean also including the ID and timestamp generation in the SQL script? If yes, can you share the final script as an example?

Thanks in advance!

> Some numbers: running the above code on two VMs on the same machine, one running the Rust program and the other the DB in a container, with 10k records I got 1970 ms. Putting all the logic inside a SQL string block and running it with the same stmt.execute got only 104 ms, which is faster than my 145 ms using a SQL tool with the Java thin driver.

aesteve (author) commented Apr 27, 2021

Thanks a lot @kubo! Should I close the issue, or let you do it?

kubo closed this as completed Apr 29, 2021